These tables are used for storing raw sequence data and archived sequences in the database.
The "production" version of the sequence will be stored in the display_seq and chromosome tables. Currently, there are 1 to 4 rows for each feature in the display_seq table, depending on the feature_type. For example, ORFs will have 4 rows: genomic DNA, coding DNA, genomic DNA with 1000 bp 5' and 3' flanking regions, and protein sequence. Note that we might at some point want to store subfeatures' sequences as well.
When there is a change to a feature's coordinates without a change to the underlying sequence, the coordinates are stored in the feature_update table. The coordinates of the feature in the feature table are updated, the feature's sequences in the raw_sequence table are updated, and the previous coordinates are stored in the feature_update table. In addition, the version of the chromosome that was associated with the feature is also stored so that the old sequence of a feature can be recreated.
When there is an actual change to the chromosomal sequence, the entire sequence of the chromosome is archived in the chromosome_archive table. In addition, each change that occurred is recorded in the chromosome_update table. This table can store individual nucleotide changes (eg. T to C), and can also store chunks of sequence changes up to 4000 nucleotides (eg. delete 1 kb). Some sample data can be found here.
One example scenario:
The current version of all sequences
(features and chromosomes) is 2002-05-23. On 2002-06-01, there is a
change to the feature coordinates for YFL039C. In the rows in the
display_seq table for YFL039C, the sequences will be updated and the
feature_version in the feature table will now be 2002-06-01. In the
feature_update table, the old coordinates for YFL039C will be stored,
and the feature_version in feature_update will be 2002-05-23. In
addition, changes to subfeatures associated with YFL039C will be
stored in the subfeature_update table. The chr_version will be
2002-05-23, which refers to a row in the chromosome table. This allows
us to re-construct what the original sequence of YFL039C was using the
coordinates and the chromosome.
Later, on 2002-10-01, the
chromosome sequence is changed, so the chromosome sequence is archived
in the chromosome_archive table (and specific changes recorded in
chromosome_update). In chromosome_archive, the chr_version will be
2002-05-23; the chr_seq in the chromosome table is updated, and the
version will now be 2002-10-01. We will still be able to re-construct
the sequence of the old version of YFL039C using the chr_version in
the feature_update table, which will now be referring to a row in the
chromosome_archive table
TABLE FEATURE
TABLE DISPLAY_SEQ
==================
display_seq_no Required.
Oracle sequence number.
Primary key.
Cannot be updated.
NUMBER(10)
display_seq Required. Uppercase.
CLOB
display_seq_type Required. Coded.
On insert, check value against code table.
VARCHAR2(40)
seq_length Required.
NUMBER(10)
feature_no Required.
VARCHAR2(10)
Foreign key to feature table.
date_created Required.
On insert, set to SYSDATE.
On update, keep original date.
created_by Required.
On insert, set to USER
Other notes:
============
Size issues: the largest ORF is 14733 nucleotides (YLR106C); there are
4 ORFs over 10 Kb, though most are below 4 Kb. The smallest is 75
nucleotides.
Some examples:
Most rows in the feature table will have 4 rows in the raw_seq table:
seq_type = 'Genomic DNA'
seq_type = 'Genomic DNA +/- 1 kb'
seq_type = 'Coding sequence'
seq_type = 'Protein translation'
For features with alternative translations, the feature will have the
above 4 rows plus:
seq_type = 'Alternative translation'
==================
This table already exists, but we are adding the following column:
feature_version Required.
DATE
This column stores the current version of the feature.
TABLE FEATURE_UPDATE
==================
coord_archive_no Required.
Oracle sequence number.
Primary key.
Cannot be updated.
NUMBER(10)
old_start_coord Required.
NUMBER(10)
new_start_coord Optional.
NUMBER(10)
old_stop_coord Required.
NUMBER(10)
new_stop_coord Optional.
NUMBER(10)
chr_version Required.
DATE
feature_version Required.
DATE
feature_no Required.
NUMBER(10)
Foreign key to FEATURE table.
date_created Required.
On insert, set to SYSDATE.
On update, keep original date.
created_by Required.
On insert, set to USER
Other notes:
============
The chr_version is the version of the chromosome with which the previous
version of the feature was associated. It may be the current version
of the chromosome in the chromosome table, or an archived version of
the chromosome in the chromosome_archive table.
Note that feature_version is not unique; there may be two different
changes to the same feature on the same day. In this case, there will
be two rows in feature_update pointing to the same row in the feature
table with the same feature_version.
TABLE SUBFEATURE_UPDATE
==================
subfeature_update_no Required.
Oracle sequence number.
Primary key.
Cannot be updated.
NUMBER(10)
old_start_coord Required.
NUMBER(10)
new_start_coord Optional.
NUMBER(10)
old_stop_coord Required.
NUMBER(10)
new_stop_coord Optional.
NUMBER(10)
feature_update_no Required.
NUMBER(10)
Foreign key to FEATURE table.
subfeature_no Required.
NUMBER(10)
Foreign key to FEATURE table.
date_created Required.
On insert, set to SYSDATE.
On update, keep original date.
TABLE CHROMOSOME
==================
This table already exists, but we are adding the following columns:
chr_version Required.
DATE
chr_seq Required.
CLOB
chr_version stores the current version of the chromosome, while
chr_seq stores the actual chromosome sequence.
TABLE CHROMOSOME_ARCHIVE
==================
chr_archive_no Required.
Oracle sequence number.
Primary key.
Cannot be updated.
NUMBER(10)
chr_seq Required.
CLOB
chr_seq_length Required.
NUMBER(10)
chr_version Required.
DATE
date_created Required.
On insert, set to SYSDATE.
On update, keep original date.
created_by Required.
On insert, set to USER
TABLE CHROMOSOME_UPDATE
==================
chr_update_no Required.
Oracle sequence number.
Primary key.
Cannot be updated.
NUMBER(10)
chr_update_type Required. Coded.
On insert, check value against code table.
VARCHAR2(40)
start_coord Required.
NUMBER(10)
stop_coord Required.
NUMBER(10)
date_created Required.
On insert, set to SYSDATE.
On update, keep original date.
created_by Required.
On insert, set to USER
old_seq Optional.
VARCHAR2(4000)
new_seq Optional.
VARCHAR2(4000)
Other notes:
============
Some valid values for chromo_change_type are: Deletion, Insertion,
Base change. We can also have a type like "Large change", where 4000
bp chunks are replaced.