Index of /download/CurrentNotes
Name Last modified Size Description
AllGaps.tab 29-Jun-2008 18:15 39K
DeletedGenesSinceAssembly21.txt 02-Jul-2008 01:09 112
DeletedOrfsSinceAssembly20.txt 28-Nov-2007 00:41 17K
GenesWithIntrons.tab 29-Jun-2008 18:15 28K
NewGenesSinceAssembly21.txt 02-Jul-2008 01:09 112
NewOrfsSinceAssembly20.txt 28-Nov-2007 00:41 19K
ORFsWithAdjustments.tab 29-Jun-2008 18:15 20K
The CurrentNotes download directory contains the following
files and subdirectories:
-----------------------------------------------
NewGenesSinceAssembly21.txt
Lists genes that have been added to CGD since the release of
Assembly 21 (including genes with non-translated products).
The genomic sequence of each gene and, for ORFs, the
coding sequence (CDS) and the predicted translation are
provided in FASTA format.
-----------------------------------------------
NewOrfsSinceAssembly20.txt
Lists ORFs that have been added to CGD since the release of Assembly 20.
The genomic sequence, the coding sequence (CDS), and the predicted
translation of each ORF are provided in FASTA format.
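
The FASTA records in these files can be read with a few lines of
Python. The following is a minimal sketch, not a CGD tool: the function
name read_fasta and the example headers are hypothetical, and the exact
header layout used in the CGD files is not described here, so the
header text is kept whole. Any lines that appear before the first ">"
header are ignored.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; keep the full header text as the key.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated onto the current record.
            records[header] += line
    return records


# Made-up records for illustration only (not real CGD data).
example = [
    ">hypothetical_orf genomic",
    "ATGGCT",
    "TAA",
    ">hypothetical_orf translation",
    "MA*",
]
records = read_fasta(example)
print(records["hypothetical_orf genomic"])
```

To read one of the downloaded files, pass an open file handle in place
of the example list.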
-----------------------------------------------
DeletedGenesSinceAssembly21.txt
Lists genes that have been deleted from CGD since the release of
Assembly 21 (including genes with non-translated products).
The genomic sequence of each gene and, for ORFs, the
coding sequence (CDS) and the predicted translation are
provided in FASTA format.
-----------------------------------------------
DeletedOrfsSinceAssembly20.txt
Lists ORFs that have been deleted from CGD since the release of Assembly
20. The genomic sequence, the coding sequence (CDS), and the
predicted translation of each ORF are provided in FASTA format.
-----------------------------------------------
GenesWithIntrons spreadsheet:
The intron data published in the paper Mitrovich QM, Tuch BB, Guthrie
C, Johnson AD. Computational and experimental approaches double the number
of known introns in the pathogenic yeast Candida albicans. Genome Res.
2007 Apr;17(4):492-502 have been incorporated into Assembly 20 in CGD.
The updates have been carried forward and reflected in the Assembly 21
annotation. Assembly 19 coordinates will not be updated.
Please note that this dataset now allows CGD to clearly separate the
introns in Assembly 20 from the gaps that were introduced by the
Annotation Working Group (AWG) to compensate for presumed sequencing
errors that interrupt ORFs. These gaps are now labeled "adjustments" in
CGD, and they should be considered markers for regions that require
resequencing, rather than corrections to the assembly. Features labeled
"intron" are expected to be bona fide introns in vivo (in contrast to
non-biological "adjustments").
As of May 2007, there are 18 ORFs that have both introns and adjustments;
these ORFs are present on both the GenesWithIntrons and ORFsWithAdjustments
spreadsheets. One additional ORF falls into this category, orf19.10708
(MTLalpha2), but this ORF is not contained in the haploid assemblies A20 or
A21 and it is therefore omitted from this list.
The spreadsheet includes the following fields:
Gene Name, Locus Name, Coordinates, Gap Length(s), Type of Gap(s)
Coordinates: comma-delimited coordinate ranges specify each segment of the
coding sequence (CDS); coordinates are inclusive. The W or C following the
coordinates indicates that the ORF is on the Watson or Crick strand,
respectively.
Gap Length(s): If there are multiple gaps, the individual lengths are
separated by commas. Numbers over 999 do not have commas within them.
UTR introns: ORFs that have 5' UTR introns are not included on the
GenesWithIntrons sheet because the introns fall outside of the ORF
boundaries; the 5' UTR introns are displayed as an Assembly 20 track
in the GBrowse Genome Browser in CGD.
Please note that this file lists all genes with introns, including
genes with non-translated gene products (e.g., tRNA genes).
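
The Coordinates and Gap Length(s) fields described above can be split
into numbers as sketched below. This is only an illustration under
stated assumptions: the exact text layout of the Coordinates field is
not given in these notes, so the sketch assumes ranges written as
"start-end", separated by commas, with the strand letter (W or C) as
the last character. The function names and the example values are
hypothetical.

```python
import re
from typing import List, Tuple


def parse_coordinates(field: str) -> Tuple[List[Tuple[int, int]], str]:
    """Split a field such as '1000-1200,1300-1500W' into segments and strand.

    Assumed layout (not confirmed by the notes): comma-delimited
    'start-end' ranges followed by a single W or C.
    """
    text = field.strip()
    if not text or text[-1].upper() not in ("W", "C"):
        raise ValueError(f"expected a trailing W or C in {field!r}")
    strand = text[-1].upper()
    segments: List[Tuple[int, int]] = []
    for part in text[:-1].split(","):
        match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized coordinate range {part!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments, strand


def parse_gap_lengths(field: str) -> List[int]:
    """Split comma-separated gap lengths.

    Numbers over 999 contain no commas, so splitting on commas is safe.
    """
    return [int(part) for part in field.split(",") if part.strip()]


# Made-up values for illustration only.
segments, strand = parse_coordinates("1000-1200,1300-1500W")
lengths = parse_gap_lengths("99,1024")
print(segments, strand, lengths)
```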
-----------------------------------------------
ORFsWithAdjustments spreadsheet:
The intron data published in the paper Mitrovich QM, Tuch BB, Guthrie
C, Johnson AD. Computational and experimental approaches double the number
of known introns in the pathogenic yeast Candida albicans. Genome Res.
2007 Apr;17(4):492-502 have been incorporated into Assembly 20 in CGD.
The updates have been carried forward and reflected in the Assembly 21
annotation. Assembly 19 coordinates will not be updated.
Please note that this dataset now allows CGD to clearly separate the
introns in Assembly 20 from the gaps that were introduced by the
Annotation Working Group (AWG) to compensate for presumed sequencing
errors that interrupt ORFs. These gaps are now labeled "adjustments" in
CGD, and they should be considered markers for regions that require
resequencing, rather than corrections to the assembly. Features labeled
"intron" are expected to be bona fide introns in vivo (in contrast to
non-biological "adjustments").
As of May 2007, there are 18 ORFs that have both introns and adjustments;
these ORFs are present on both the GenesWithIntrons and ORFsWithAdjustments
spreadsheets. One additional ORF falls into this category, orf19.10708
(MTLalpha2), but this ORF is not contained in the haploid assemblies A20 or
A21 and it is therefore omitted from this list. The coordinates of the
adjustments in the ORFs that also contain introns were determined at CGD.
The coordinates of all of the other adjustments came from the Annotation
Working Group.
The spreadsheet includes the following fields:
ORF Name, Locus Name, Coordinates, Gap Length(s), Type of Gap(s)
Coordinates: comma-delimited coordinate ranges specify each segment of the
coding sequence (CDS); coordinates are inclusive. The W or C following the
coordinates indicates that the ORF is on the Watson or Crick strand,
respectively.
Gap Length(s): If there are multiple gaps, the individual lengths are
separated by commas. Numbers over 999 do not have commas within them.
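
The ORFs that have both introns and adjustments can be found by
intersecting the Locus Name columns of the two sheets. The sketch
below assumes that the .tab files are tab-delimited, that the fields
appear in the order listed above (so Locus Name is the second column),
and that there is no header row; none of this is confirmed by these
notes. The helper name locus_names and all of the example rows are
hypothetical.

```python
import csv
import io
from typing import Set, TextIO


def locus_names(handle: TextIO, column: int = 1) -> Set[str]:
    """Return the set of values found in one column of a tab-delimited sheet."""
    names: Set[str] = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip short or empty rows rather than failing on them.
        if len(row) > column and row[column].strip():
            names.add(row[column].strip())
    return names


# Made-up rows (not real CGD data), in the documented field order:
# name, locus name, coordinates, gap length(s), type of gap(s).
introns_sheet = io.StringIO(
    "GENE1\tlocusA\t10-50,80-120W\t29\tintron\n"
    "GENE2\tlocusB\t200-260,300-400,402-500W\t39,1\tintron,adjustment\n"
)
adjustments_sheet = io.StringIO(
    "GENE2\tlocusB\t200-260,300-400,402-500W\t39,1\tintron,adjustment\n"
    "GENE3\tlocusC\t500-560,562-700W\t1\tadjustment\n"
)

# Loci listed on both sheets have at least one intron and one adjustment.
both = locus_names(introns_sheet) & locus_names(adjustments_sheet)
print(sorted(both))
```

If the real files do start with a header row, the header text would
appear in both sets and should be discarded before comparing.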
-----------------------------------------------
AllGaps spreadsheet:
This sheet has a separate row for every gap (intron OR adjustment) within
an ORF. ORFs with multiple gaps have multiple rows on the sheet.
Note that the coordinates on the AllGaps sheet are the coordinates of the
gap, rather than the coordinates of the CDS. There is a difference in the
reporting of coordinates for adjustments of negative length vs. all other
gaps: gap coordinates are inclusive (as are exon coordinates), UNLESS the
gap is an adjustment of negative length. Looking at the "gap length" field
should clarify any cases in which there may be a question.
This list does not include 5' UTR introns, which are displayed as a track
in the GBrowse Genome Browser in CGD.
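
The relationship between CDS coordinates (as on the other two sheets)
and gap coordinates (as on this sheet) can be illustrated as follows.
With inclusive coordinates, the gap between two adjacent CDS segments
runs from one past the end of the first segment to one before the start
of the next. The sketch rests on an assumption that these notes do not
confirm: that an adjustment of negative length corresponds to adjacent
segments that overlap. Under that reading, the computed start exceeds
the computed end, which shows why an inclusive range cannot describe
such a gap. The function name and the example values are hypothetical.

```python
from typing import List, Tuple


def gaps_between_segments(
    segments: List[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    """Return (gap_start, gap_end, gap_length) for each pair of adjacent segments.

    Segment coordinates are treated as inclusive. Each segment is
    normalized so that start <= end, and segments are sorted by position.
    """
    ordered = sorted((min(a, b), max(a, b)) for a, b in segments)
    gaps: List[Tuple[int, int, int]] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Inclusive gap: prev_end + 1 through next_start - 1.
        length = next_start - prev_end - 1
        # When length is negative the segments overlap, so the start
        # and end returned here are not a valid inclusive range.
        gaps.append((prev_end + 1, next_start - 1, length))
    return gaps


# Made-up coordinates for illustration only.
print(gaps_between_segments([(100, 200), (251, 400)]))
print(gaps_between_segments([(100, 200), (200, 400)]))
```

For rows where the computed length is negative, rely on the "gap
length" field rather than on the reported coordinates.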
-----------------------------------------------