These tables are used to store the data submitted by CGD colleagues about a particular paper.
An annotation set (ANNOTATION_SET) consists of a reference (REFLINK), a colleague (COLLEAGUE), and one or more annotations (ANNOTATION).
An annotation (ANNOTATION) consists of one or more bits of information (ANNOTATION_DETAILS) about a gene (or genes), categorized into a Topic and a Sub-Topic. The annotation description (ANNOTATION) refers to all the bits of information together.
ANNOTATION_SET
=============
set_no coll_no
1 33
2 64
ANNOTATION
=========
annot_no set_no sgdid topic subtopic desc
1 1 S001 DNA/RNA DNA Modif did DNA modif
2 1 S001 Interactions Genetic only in S288c
3 1 S005 Interactions Genetic only in S288c
4 2 S200 Alleles/Strains Deletion tested ura3-4 in sigma
background
ANNOTATION_DETAIL
=================
detail_no name value annot_no
1 coord 1-500 1
2 suppressor of ABC1 2
3 suppressed by ACT11 3
4 allele ura3-4 4
5 background sigma 4
6 phenotype description Dominant 4
7 phenotype description Inviable 4
TABLE Annotation_Set ======== annotation_set_no Required. Primary key. Oracle sequence number. Cannot be updated. NUMBER(10) colleague_no Required. Foreign key to Colleague table. NUMBER(10) date_created Required. On insert, set to SYSDATE. On update, keep original date. created_by Required. On insert, set to USER Other notes: ============ If a row in annotation_set is deleted, then corresponding rows in community_annotation and annotation_detail should also be deleted.Every row in the annotation_set table must be associated with a row in the reflink table. This will be enforced via software. In the future, we might want to add this to our nightly checking script.
This table is somewhat dangerous because it is difficult to determine what the unique key should be. Part of the problem is that the reference_no is stored in the reflink table to be consistent with the way that we handle references in the database. However, even if we decide to put the reference_no in Annotation_Set, the combination of colleague_no + reference_no would not necessarily be unique. For example, a colleague may log on and enter one set of annotations for a reference, submit, then come back to enter another set of annotations for that same paper. I had initially thought that we could add DATE_CREATED, but if the colleague comes back the same day, this would violate the unique constraint. One idea for the future is to add an additional column that is the unix process ID of the web session, but to get this in production sooner rather than later, we might want to go with what we have now, keeping in mind the potential problems of not having a unique constraint on this table.
TABLE Community_Annotation ======== community_annotation_no Required. Primary key. Oracle sequence number. Cannot be updated. NUMBER(10) annotation_set_no Required. Foreign key to Annotation_Set table. NUMBER(10) sgdid Required. VARCHAR2(40) topic Required. VARCHAR2(40) sub_topic Required. VARCHAR2(40) description Required. VARCHAR2(480) Other notes: ============ Note that in the future, topic and sub_topic will likely become coded values. However, until we start using this interface and get user feedback, we will wait to add these to the code table and constrain these values only via software.Note that there is added complexity for the future coded values of topic and sub_topic, b/c to truly constrain them properly, we need to store the relationship between the two (when topic = A, sub_topic can be B, C, or D).
TABLE Annotation_Detail ======== annotation_detail_no Required. Primary key. Oracle sequence number. Cannot be updated. NUMBER(10) community_annotation_no Required. Foreign key to Community_Annotation table. NUMBER(10) detail_name Required. VARCHAR2(40) detail_value Required. VARCHAR2(480) Other notes: ============ Note that in the future, detail_name will likely become a coded value.Phrases to use for the detail_name, detail_value when the Topic in the Annotation table is "Interactions". Examples can be found above in the sample data section. Interactions: Genetic ======================= "is synthetically lethal with" Gene A "is synthetically lethal" with Gene B Gene A "is synthetically lethal" with Gene B "is a suppressor of" (annotation_no 2,3) Gene A "is a suppressor of" Gene B Gene B "is suppressed by" Gene A "is epistatic to" Gene A "is epistatic to" Gene B Gene A "is epistatic to" Gene B "other" whatever is typed in (reciprocal) Interactions: Regulatory ========================== Gene A "is regulated by" Gene B. Gene B "regulates" Gene A.