Index of /download/sequence/C_albicans_SC5314

Name               Last modified      Size
Parent Directory                      -
Assembly22/        2023-06-29 09:54   -
Assembly21/        2023-06-29 09:53   -
Assembly20/        2023-06-29 09:53   -
Assembly19/        2023-06-29 09:53   -
Assembly6/         2023-06-29 09:54   -
Assembly5/         2023-06-29 09:54   -
Assembly4/         2023-06-29 09:54   -
454_sequence/      2023-06-29 09:53   -

The /download/sequence/C_albicans_SC5314 directory contains sequences
from the different assemblies of the genome sequencing project for
Candida albicans SC5314.

Current files for Assembly 22 are generated weekly and reflect the
most current information at CGD.  Most of the files in this directory
are in FASTA format; Assembly 20, 21 and 22 files in EMBL
format may be downloaded from the /Assembly20/current/EMBL_format/,
/Assembly21/current/EMBL_format/ and /Assembly22/current/EMBL_format/
subdirectories, respectively.

All files are gzip compressed. Several freely available software
options can decompress gzipped files on Windows.  The software and
other useful information are available on these web sites:
- WinZip (http://www.winzip.com/)
- Stuffit (http://www.stuffit.com/)
- Gzip (http://www.gzip.org/)

The gzip user's manual is available at:
http://www.math.utah.edu/docs/info/gzip_toc.html
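
The gzipped FASTA files can also be read directly, without a separate
decompression step. The following is a minimal sketch using only the
Python standard library; the function names and the file name
"example.fasta.gz" are hypothetical and are not actual CGD names.

```python
import gzip


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text handle in FASTA format."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header line closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_gzipped_fasta(path):
    """Open a gzip-compressed FASTA file in text mode and list its records."""
    with gzip.open(path, "rt") as handle:
        return list(read_fasta(handle))


# Hypothetical usage:
# records = read_gzipped_fasta("example.fasta.gz")
```

Opening the file in "rt" mode lets gzip decode the text while it is
read, so the uncompressed file never has to be written to disk.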

Additional sequence documentation is found on the CGD web site at:
http://www.candidagenome.org/help/SequenceHelp.shtml

------------------------------------------------

/Assembly22/

This directory contains sequence files for Assembly 22.

As of June 2011, the version designation appears in the name of each
of the current sequence files that are available at CGD, so the exact
source of the sequence data is always clear, as described at:
http://www.candidagenome.org/help/SequenceHelp.shtml

Version designations appear in the following format:
sXX-mYY-rZZ
where XX, YY, and ZZ are zero-padded integers. 

XX is incremented when there is any change to the underlying genomic
(i.e., chromosome) sequence.

YY is incremented when there is any change to the coordinates of any
feature annotated in the genome (e.g., any change in location or
boundary, or addition or removal of a feature from the annotation). YY
is reset to "01" when XX is incremented (when a sequence change is
made).

ZZ is incremented in response to curatorial changes that affect
information that appears in the GFF file, specifically gene names,
gene aliases, gene IDs, gene descriptions, feature types (e.g., gene
or pseudogene), and ORF classifications or qualifiers (e.g., Verified,
Uncharacterized, Deleted, Merged). The file will be checked on a
weekly basis, as well as any time that the GFF file is regenerated
manually, to see if changes have occurred that warrant a change in the
ZZ number. ZZ is reset to "01" when XX or YY is incremented (when a
sequence change is made, or when the coordinates of any feature are
updated).
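
The increment and reset rules above can be sketched in a few lines of
Python. This is an illustration only, not the code CGD uses: the
names Version, parse_version and bump are hypothetical, and a padding
width of two digits is assumed from the "01" reset value.

```python
import re
from typing import NamedTuple

VERSION_RE = re.compile(r"s(\d+)-m(\d+)-r(\d+)")


class Version(NamedTuple):
    s: int  # XX: genomic sequence version
    m: int  # YY: feature coordinate version
    r: int  # ZZ: curatorial (GFF information) version

    def __str__(self):
        # Assumes two-digit zero padding, as in "01".
        return f"s{self.s:02d}-m{self.m:02d}-r{self.r:02d}"


def parse_version(text):
    """Extract the first sXX-mYY-rZZ designation found in text."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version designation in {text!r}")
    return Version(*(int(group) for group in match.groups()))


def bump(version, change):
    """Return the next version for a 'sequence', 'coordinates' or 'curation' change."""
    if change == "sequence":
        # XX is incremented; YY and ZZ are reset to 01.
        return Version(version.s + 1, 1, 1)
    if change == "coordinates":
        # YY is incremented; ZZ is reset to 01.
        return Version(version.s, version.m + 1, 1)
    if change == "curation":
        # Only ZZ is incremented.
        return Version(version.s, version.m, version.r + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

For example, under these assumptions a coordinate change applied to
s07-m01-r19 yields s07-m02-r01, while a sequence change yields
s08-m01-r01.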


/Assembly22/current/       

This directory contains the most current version of the sequences;
files are updated weekly.


/Assembly22/current/EMBL_format/

This directory contains current gene and sequence data from the
C. albicans Assembly 22 genome in EMBL file format.  Files in this
directory are generated weekly and reflect the most current
information at CGD.

/Assembly22/archive/

This directory contains archived versions of the Assembly 22
sequences.  The sequences are checked for changes weekly and a new
file is added whenever there has been a change.  The date of the
update is included in the filename.

------------------------------------------------

/Assembly21/

This directory contains sequence files for Assembly 21.

As of June 2011, the version designation appears in the name of each
of the relevant Assembly 21 (current) sequence files that are
available at CGD, so the exact source of the sequence data is always
clear, as described at:
http://www.candidagenome.org/help/SequenceHelp.shtml


/Assembly21/current/       

This directory contains the most current version of the sequences;
files are updated weekly.


/Assembly21/current/EMBL_format/

This directory contains current gene and sequence data from the
C. albicans Assembly 21 genome in EMBL file format.  Files in this
directory are generated weekly and reflect the most current
information at CGD.

/Assembly21/archived_as_released/
               
This directory contains Candida albicans Assembly 21 (A21), as
released to CGD by the A21 collaborators and described in van het Hoog
et al., 2007: http://genomebiology.com/content/pdf/gb-2007-8-4-r52.pdf


/Assembly21/archive/

This directory contains archived versions of the Assembly 21
sequences.  The sequences are checked for changes weekly and a new
file is added whenever there has been a change.  The date of the
update is included in the filename.


------------------------------------------------

/Assembly20/

This directory contains sequence files for Assembly 20.
The files in this directory have NOT been updated since October 2008.

** PLEASE NOTE: Assembly 20 Sequence Advisory
**
** posted October 19, 2006, updated October 25, 2006
**
** The collaborative group who generated Assembly 20 has discovered that
** the sequence traces that they had been using to fill some of the gaps
** and determine overlaps between Assembly 19 contigs were derived from
** strain WO-1, rather than from the reference strain, SC5314.
**
** Please see http://www.candidagenome.org/help/Assembly20_Advisory.shtml
** for the latest information and status updates.


/Assembly20/current/       

This directory contains the last updated version of the sequences,
as of Oct 6 2008.


/Assembly20/current/EMBL_format/

This directory contains current gene and sequence data from the
C. albicans Assembly 20 genome in EMBL file format.  Files in this
directory are NOT updated and reflect the Assembly 20
information at CGD as of Oct 6 2008.


/Assembly20/archive/

This directory contains archived versions of the Assembly 20
sequences.  Before Oct 6 2008 the sequences were checked for changes
weekly and a new file was added whenever there had been changes.


------------------------------------------------

/Assembly19/

This directory contains sequence files for Assembly 19.


Note that, for the directories below, all Assembly 19 files are
available in both haploid and diploid versions; the haploid versions
are identified by '_haploid' in the file name.  The haploid versions
of these files were created by omitting features mapping to contigs
in the assembly whose names begin with 'Contig19-20', as these
contigs contain the second copy of the alleles identified in the
diploid Assembly 19.
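
The haploid filtering rule described above can be illustrated with a
short Python sketch. It is not the script that produced the
'_haploid' files; it assumes tab-delimited GFF input with the contig
name in the first column, and the function name haploid_gff_lines is
hypothetical.

```python
DIPLOID_COPY_PREFIX = "Contig19-20"


def haploid_gff_lines(lines, prefix=DIPLOID_COPY_PREFIX):
    """Yield GFF lines, omitting features on contigs whose names begin with prefix."""
    for line in lines:
        # Comment, directive and blank lines carry no feature; keep them.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Assumption: the contig name is the first tab-delimited column.
        contig = line.split("\t", 1)[0]
        if not contig.startswith(prefix):
            yield line
```

Because the function accepts any iterable of lines, it can be applied
to an open file handle as well as to a list held in memory.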


/Assembly19/current/       

This directory contains the most current version of the sequences;
files are not updated routinely.


/Assembly19/archived_as_released/
               
This directory contains DNA sequences for diploid Assembly 19 as
produced by the Candida Sequencing project at the Stanford Genome
Technology Center. These files are simply copies of the SGTC files,
and are here for archival purposes.


/Assembly19/archive/

This directory contains archived versions of the Assembly 19
sequences.  Before they were permanently archived, the A19 sequences
were checked for changes weekly and a new file was added whenever
there were changes.


/Assembly19/SC3514_traces/

This directory contains the original SC5314 sequence trace files and
quality scores generated by the Stanford Genome Technology Center
(SGTC).  The Candida albicans server at the SGTC was taken offline in
October 2006, and these sequence data were provided by the SGTC to
CGD.



------------------------------------------------

/Assembly6/
/Assembly5/
/Assembly4/

These directories contain archived files from Assemblies 6, 5, and 4
of the Candida albicans (strain SC5314) genome from the Stanford
Genome Technology Center (see Jones et al. (2004) The diploid genome
sequence of Candida albicans. Proc Natl Acad Sci U S A
101(19):7329-34. URL:
http://www.pnas.org/cgi/content/full/101/19/7329).  The Candida
albicans server at the SGTC was taken offline in October 2006, and
these sequence data were provided by the SGTC to CGD for archival
purposes.