Index of /download/CurrentNotes

Icon  Name                              Last modified      Size  Description
[DIR] Parent Directory                                        -
[   ] AllGaps.tab                       29-Jun-2008 18:15   39K
[TXT] DeletedGenesSinceAssembly21.txt   02-Jul-2008 01:09   112
[TXT] DeletedOrfsSinceAssembly20.txt    28-Nov-2007 00:41   17K
[   ] GenesWithIntrons.tab              29-Jun-2008 18:15   28K
[TXT] NewGenesSinceAssembly21.txt       02-Jul-2008 01:09   112
[TXT] NewOrfsSinceAssembly20.txt        28-Nov-2007 00:41   19K
[   ] ORFsWithAdjustments.tab           29-Jun-2008 18:15   20K

The CurrentNotes download directory contains the following 
files and subdirectories:

-----------------------------------------------

NewGenesSinceAssembly21.txt

Lists genes that have been added to CGD since the release of 
Assembly 21 (including genes with non-translated products). 
For each gene, the genomic sequence is provided in FASTA format; 
for ORFs, the coding sequence (CDS) and the predicted translation 
are provided as well. 
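
The FASTA records in these text files can be read with a few lines of 
Python.  The following is a minimal sketch, not a CGD tool:  the function 
name read_fasta is hypothetical, and because the layout of the header 
lines in these files is not described here, each header is kept as an 
opaque string.  Any text that precedes the first ">" line is skipped.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

A file would typically be passed in as an open file handle, for example 
read_fasta(open("NewGenesSinceAssembly21.txt")).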

-----------------------------------------------

NewOrfsSinceAssembly20.txt

Lists ORFs that have been added to CGD since the release of Assembly 20. 
The genomic sequence, the coding sequence (CDS), and the 
predicted translation of each ORF are provided in FASTA format. 

-----------------------------------------------

DeletedGenesSinceAssembly21.txt

Lists genes that have been deleted from CGD since the release of 
Assembly 21 (including genes with non-translated products). 
For each gene, the genomic sequence is provided in FASTA format; 
for ORFs, the coding sequence (CDS) and the predicted translation 
are provided as well. 

-----------------------------------------------

DeletedOrfsSinceAssembly20.txt

Lists ORFs that have been deleted from CGD since the release of Assembly 
20. The genomic sequence, the coding sequence (CDS), and the 
predicted translation of each ORF are provided in FASTA format. 

-----------------------------------------------

GenesWithIntrons spreadsheet:

The intron data published in the paper Mitrovich QM, Tuch BB, Guthrie 
C, Johnson AD.  Computational and experimental approaches double the number 
of known introns in the pathogenic yeast Candida albicans.  Genome Res. 
2007 Apr;17(4):492-502 have been incorporated into Assembly 20 in CGD.
The updates have been carried forward and reflected in the Assembly 21 
annotation.  Assembly 19 coordinates will not be updated.

Please note that this dataset now allows CGD to clearly separate the 
introns in Assembly 20 from the gaps that were introduced by the 
Annotation Working Group (AWG) to compensate for presumed sequencing 
errors that interrupt ORFs. These gaps are now labeled "adjustments" in 
CGD, and they should be considered markers for regions that require 
resequencing, rather than corrections to the assembly.  Features labeled 
"intron" are expected to be bona fide introns in vivo (in contrast to 
non-biological "adjustments").

As of May 2007, there are 18 ORFs that have both introns and adjustments; 
these ORFs are present on both the GenesWithIntrons and ORFsWithAdjustments 
spreadsheets.  One additional ORF falls into this category, orf19.10708 
(MTLalpha2), but this ORF is not contained in the haploid assemblies A20 or 
A21 and it is therefore omitted from this list.  

The spreadsheet includes the following fields:
Gene Name, Locus Name, Coordinates, Gap Length(s), Type of Gap(s)

Coordinates:  comma-delimited coordinate ranges specify each segment of the 
coding sequence (CDS); coordinates are inclusive.  The W or C following the 
coordinates indicates that the ORF is on the Watson or Crick strand, 
respectively.  
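
As an illustration of the Coordinates field, here is a minimal parsing 
sketch.  It assumes that each range is written as start-end and that the 
strand letter is the last character of the field (for example, 
"100-250,300-420W"); that exact spelling is an assumption and should be 
checked against the file.  The function names are hypothetical.

```python
import re
from typing import List, Tuple

Segment = Tuple[int, int]


def parse_cds_coordinates(field: str) -> Tuple[List[Segment], str]:
    """Split a Coordinates field into CDS segments and a strand letter.

    Assumed layout (not confirmed by the README): 'start-end,start-end' + W or C.
    """
    text = field.strip()
    if not text or text[-1].upper() not in ("W", "C"):
        raise ValueError(f"no W/C strand letter in {field!r}")
    strand = text[-1].upper()
    segments: List[Segment] = []
    for part in text[:-1].split(","):
        match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized coordinate range {part!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments, strand


def segment_length(start: int, end: int) -> int:
    """Length of one segment; both endpoints count because coordinates are inclusive."""
    return abs(end - start) + 1
```

Because the coordinates are inclusive, a segment from 100 to 250 covers 
151 bases, not 150.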

Gap Length(s):  If there are multiple gaps, the individual lengths are 
separated by commas.  Numbers over 999 are written without thousands 
separators, so every comma in this field separates two lengths.  

UTR introns:  ORFs that have 5' UTR introns are not included on the 
GenesWithIntrons sheet because the introns fall outside of the ORF 
boundaries; the 5' UTR introns are displayed as an Assembly 20 track 
in the GBrowse Genome Browser in CGD. 

Please note that this file lists all genes with introns, including 
genes with non-translated gene products (e.g., tRNA genes).


-----------------------------------------------

ORFsWithAdjustments spreadsheet:

The intron data published in the paper Mitrovich QM, Tuch BB, Guthrie 
C, Johnson AD.  Computational and experimental approaches double the number 
of known introns in the pathogenic yeast Candida albicans.  Genome Res. 
2007 Apr;17(4):492-502 have been incorporated into Assembly 20 in CGD.
The updates have been carried forward and reflected in the Assembly 21 
annotation.  Assembly 19 coordinates will not be updated.

Please note that this dataset now allows CGD to clearly separate the 
introns in Assembly 20 from the gaps that were introduced by the 
Annotation Working Group (AWG) to compensate for presumed sequencing 
errors that interrupt ORFs. These gaps are now labeled "adjustments" in 
CGD, and they should be considered markers for regions that require 
resequencing, rather than corrections to the assembly.  Features labeled 
"intron" are expected to be bona fide introns in vivo (in contrast to 
non-biological "adjustments").

As of May 2007, there are 18 ORFs that have both introns and adjustments; 
these ORFs are present on both the GenesWithIntrons and ORFsWithAdjustments 
spreadsheets.  One additional ORF falls into this category, orf19.10708 
(MTLalpha2), but this ORF is not contained in the haploid assemblies A20 or 
A21 and it is therefore omitted from this list.  The coordinates of the 
adjustments in the ORFs that also contain introns were determined at CGD.  
The coordinates of all of the other adjustments came from the Annotation 
Working Group.  

The spreadsheet includes the following fields:
ORF Name, Locus Name, Coordinates, Gap Length(s), Type of Gap(s)

Coordinates:  comma-delimited coordinate ranges specify each segment of the 
coding sequence (CDS); coordinates are inclusive.  The W or C following the 
coordinates indicates that the ORF is on the Watson or Crick strand, 
respectively.  

Gap Length(s):  If there are multiple gaps, the individual lengths are 
separated by commas.  Numbers over 999 are written without thousands 
separators, so every comma in this field separates two lengths.  
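
Since commas appear only between lengths, the gap length field can be 
split on commas directly.  A minimal sketch follows; the function name is 
hypothetical, and the handling of a leading minus sign assumes that 
adjustments of negative length are written as ordinary negative integers.

```python
from typing import List


def parse_gap_lengths(field: str) -> List[int]:
    """Split a comma-separated gap length field into integers.

    Safe to split on ',' because numbers over 999 carry no internal commas.
    """
    return [int(part) for part in field.split(",") if part.strip()]
```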

-----------------------------------------------

AllGaps spreadsheet: 

This sheet has a separate row for every gap (intron OR adjustment) within 
an ORF.  ORFs with multiple gaps have multiple rows on the sheet.  

Note that the coordinates on the AllGaps sheet are the coordinates of the 
gap, rather than the coordinates of the CDS.  There is a difference in the 
reporting of coordinates for adjustments of negative length vs. all other 
gaps:  gap coordinates are inclusive (as are exon coordinates), UNLESS the 
gap is an adjustment of negative length.  Looking at the "gap length" field 
should clarify any cases in which there may be a question.
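
The inclusive convention can be checked against the "gap length" field.  
The sketch below is an assumption-laden illustration with hypothetical 
function names:  it treats a gap of positive length as spanning both of 
its endpoints, and it declines to check adjustments of negative length, 
because the coordinate convention for those is not spelled out here.

```python
from typing import Optional


def inclusive_span(start: int, end: int) -> int:
    """Number of bases from start to end when both endpoints are counted."""
    return abs(end - start) + 1


def coordinates_match_length(start: int, end: int, gap_length: int) -> Optional[bool]:
    """Compare gap coordinates with the reported gap length.

    Returns None for adjustments of negative length, whose coordinates
    are not inclusive and therefore cannot be checked this way.
    """
    if gap_length < 0:
        return None
    return inclusive_span(start, end) == gap_length
```

For example, a gap reported at 1001 to 1084 with a length of 84 is 
consistent with inclusive coordinates.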

This list does not include 5' UTR introns, which are displayed as a track 
in the GBrowse Genome Browser in CGD.  

-----------------------------------------------