Rules for Community Annotation


Last Update: 2002-11-08 Kara

Background

These tables are used to store the data submitted by CGD colleagues about a particular paper.

An annotation set (ANNOTATION_SET) consists of a reference (REFLINK), a colleague (COLLEAGUE), and one or more annotations (ANNOTATION).

An annotation (ANNOTATION) consists of one or more bits of information (ANNOTATION_DETAILS) about a gene (or genes), categorized into a Topic and a Sub-Topic. The annotation description (ANNOTATION) refers to all the bits of information together.

Some sample data:

ANNOTATION_SET
=============
set_no	coll_no
1	33
2	64

ANNOTATION
=========
annot_no	set_no	sgdid	topic		subtopic   desc
1		1	S001	DNA/RNA		DNA Modif  did DNA modif
2		1	S001	Interactions	Genetic	   only in S288c
3		1	S005	Interactions	Genetic	   only in S288c
4		2	S200	Alleles/Strains	Deletion   tested ura3-4 in sigma 
                                                           background

ANNOTATION_DETAIL
=================
detail_no	name			value		annot_no
1		coord			1-500		1
2		suppressor of		ABC1		2
3		suppressed by		ACT11		3
4		allele			ura3-4		4
5		background		sigma		4
6		phenotype description	Dominant	4
7		phenotype description	Inviable	4

TABLE Annotation_Set
========
annotation_set_no               Required.
				Primary key.  Oracle sequence number.
				Cannot be updated.
				NUMBER(10)

colleague_no		        Required.
				Foreign key to Colleague table.
				NUMBER(10)	

date_created			Required.
				On insert, set to SYSDATE.
				On update, keep original date.
 
created_by                      Required.
				On insert, set to USER


Other notes:
============
If a row in annotation_set is deleted, then corresponding rows in
community_annotation and annotation_detail should also be deleted.

Every row in the annotation_set table must be associated with a row in the reflink table. This will be enforced via software. In the future, we might want to add this to our nightly checking script.

This table is somewhat dangerous because it is difficult to determine what the unique key should be. Part of the problem is that the reference_no is stored in the reflink table to be consistent with the way that we handle references in the database. However, even if we decide to put the reference_no in Annotation_Set, the combination of colleague_no + reference_no would not necessarily be unique. For example, a colleague may log on and enter one set of annotations for a reference, submit, then come back to enter another set of annotations for that same paper. I had initially thought that we could add DATE_CREATED, but if the colleague comes back the same day, this would violate the unique constraint. One idea for the future is to add an additional column that is the unix process ID of the web session, but to get this in production sooner rather than later, we might want to go with what we have now, keeping in mind the potential problems of not having a unique constraint on this table.


TABLE Community_Annotation
========
community_annotation_no         Required.
				Primary key.  Oracle sequence number.
				Cannot be updated.
				NUMBER(10)

annotation_set_no               Required.
				Foreign key to Annotation_Set table.
				NUMBER(10)

sgdid                           Required.
				VARCHAR2(40)

topic                           Required.
				VARCHAR2(40)


sub_topic			Required.  
				VARCHAR2(40)


description			Required.  
				VARCHAR2(480)


Other notes:
============

Note that in the future, topic and sub_topic will likely become coded
values.  However, until we start using this interface and get user
feedback, we will wait to add these to the code table and constrain
these values only via software.

Note that there is added complexity for the future coded values of topic and sub_topic, b/c to truly constrain them properly, we need to store the relationship between the two (when topic = A, sub_topic can be B, C, or D).


TABLE Annotation_Detail
========
annotation_detail_no            Required.
				Primary key.  Oracle sequence number.
				Cannot be updated.
				NUMBER(10)

community_annotation_no         Required.
				Foreign key to Community_Annotation table.
				NUMBER(10)

detail_name			Required.
				VARCHAR2(40)

detail_value			Required.  
				VARCHAR2(480)


Other notes:
============
Note that in the future, detail_name will likely become a coded
value. 
 

Phrases to use for the detail_name, detail_value when the Topic in the Annotation table is "Interactions". Examples can be found above in the sample data section. Interactions: Genetic ======================= "is synthetically lethal with" Gene A "is synthetically lethal" with Gene B Gene A "is synthetically lethal" with Gene B "is a suppressor of" (annotation_no 2,3) Gene A "is a suppressor of" Gene B Gene B "is suppressed by" Gene A "is epistatic to" Gene A "is epistatic to" Gene B Gene A "is epistatic to" Gene B "other" whatever is typed in (reciprocal) Interactions: Regulatory ========================== Gene A "is regulated by" Gene B. Gene B "regulates" Gene A.