Index of /download/sequence/C_albicans_SC5314/Assembly20/current

Name                                              Last modified     Size
Parent Directory                                                    -
Ca20_chromosomes.fasta.gz                         2023-06-29 09:53  4.0M
EMBL_format/                                      2023-06-29 09:53  -
orf_coding_assembly_20.fasta.gz                   2023-06-29 09:53  3.0M
orf_genomic_1000_assembly_20.fasta.gz             2023-06-29 09:53  6.0M
orf_genomic_assembly_20.fasta.gz                  2023-06-29 09:53  3.0M
orf_trans_all_assembly_20.fasta.gz                2023-06-29 09:53  2.1M
other_features_genomic_1000_assembly_20.fasta.gz  2023-06-29 09:53  112K
other_features_genomic_assembly_20.fasta.gz       2023-06-29 09:53  20K
other_features_no_introns_assembly_20.fasta.gz    2023-06-29 09:53  10K

This directory contains the most recently updated version of the
sequences, as of Oct 6 2008.
                             
** PLEASE NOTE: Assembly 20 Sequence Advisory
** 
** posted October 19, 2006, updated October 25, 2006
** 
** The collaborative group who generated Assembly 20 has discovered that 
** the sequence traces that they had been using to fill some of the gaps 
** and determine overlaps between Assembly 19 contigs were derived from 
** strain WO-1, rather than from the reference strain, SC5314. 
** 
** Please see http://www.candidagenome.org/help/Assembly20_Advisory.shtml
** for the latest information and status updates.


sequence of Assembly 20 chromosomes:

	Ca20_chromosomes.fasta.gz

Contains DNA sequences for haploid Assembly 20, and the mitochondrial
chromosome sequence generated in the original SGTC sequencing project.
Haplotype information was not preserved during generation of Assembly
20.  The sequences called 'chromosomes' by the Assembly 20
collaborators may more precisely be described as 'reftigs' because
they are mosaics of haplotypes, rather than representative of a single
haploid genome in the sequenced strain.  Please note Assembly 20
issues described in detail at:
http://www.candidagenome.org/help/SequenceHelp.shtml
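
As a minimal sketch of how the chromosome file could be inspected without
decompressing it first, the Python below reads a gzipped FASTA file and
reports the length of each sequence.  The function names read_fasta and
sequence_lengths are illustrative and are not part of any CGD tool; the
only assumption about the file is that it is ordinary gzip-compressed
FASTA, as stated elsewhere in this README.

```python
import gzip


def read_fasta(handle):
    """Yield (header, sequence) pairs from a text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(path):
    """Return a list of (header, length) for a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        return [(header, len(seq)) for header, seq in read_fasta(handle)]


if __name__ == "__main__":
    # Assumes the file has been downloaded into the working directory.
    for header, length in sequence_lengths("Ca20_chromosomes.fasta.gz"):
        print(header, length)
```

The same reader should work for the other .fasta.gz files listed here,
since they share the same format.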


sequence with introns for all ORFs:
	orf_genomic_assembly_20.fasta.gz


sequence with no introns for all ORFs:
	orf_coding_assembly_20.fasta.gz                   


sequences with introns, plus 1000 bp of untranslated region upstream
and downstream, for all ORFs:
	orf_genomic_1000_assembly_20.fasta.gz


translation of all ORF regions:
	orf_trans_all_assembly_20.fasta.gz                
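
This README does not state how the translations were produced.  As a
hedged illustration only, the Python sketch below translates a coding
sequence under the assumption that CTG is read as serine, as in the
alternative yeast nuclear code (NCBI translation table 12) generally
applied to C. albicans.  The translate function and the table
construction are illustrative and are not taken from the CGD pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# Assumption: CTG encodes serine rather than leucine (NCBI table 12).
CODON_TABLE["CTG"] = "S"


def translate(dna):
    """Translate a DNA coding sequence; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

Comparing the output of such a function on entries from
orf_coding_assembly_20.fasta.gz against the matching entries in
orf_trans_all_assembly_20.fasta.gz would be one way to check whether
this assumption holds for these files.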


sequences from the systematic C. albicans sequence for the following
feature types: CEN, rRNA, tRNA, snRNA, snoRNA, ncRNA genes
(other types will be added in future):

	other_features_genomic_assembly_20.fasta.gz                  
	other_features_no_introns_assembly_20.fasta.gz 


genomic sequence for the above features plus 1000 bp of upstream and downstream sequence:

	other_features_genomic_1000_assembly_20.fasta.gz 
	

/Assembly20/current/EMBL_format/ 

This directory contains gene and sequence data from the
C. albicans Assembly 20 genome in EMBL file format.  The files in this
directory are NOT updated and reflect the Assembly 20
information at CGD as of Oct 6 2008.

------------------------------------------------

The files in this directory are in FASTA format. All files are gzip
compressed. There are several freely available software options for
decompressing gzipped files on Windows.  The software and other
useful information are available on these web sites:
 
- WinZip (http://www.winzip.com/)
- Stuffit (http://www.stuffit.com/)
- Gzip (http://www.gzip.org/)
    
and the gzip user's manual:
http://www.math.utah.edu/docs/info/gzip_toc.html
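
Where none of the tools above is at hand, the files can also be
decompressed with the Python standard library.  The following is a
minimal sketch; the function name gunzip and the output file name are
illustrative choices, not something prescribed by CGD.

```python
import gzip
import shutil


def gunzip(src_path, dest_path):
    """Decompress the gzip file at src_path into dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so large files are not held in memory at once.
        shutil.copyfileobj(src, dest)


if __name__ == "__main__":
    # Assumes the file has been downloaded into the working directory.
    gunzip("orf_coding_assembly_20.fasta.gz", "orf_coding_assembly_20.fasta")
```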

Additional sequence documentation is found on the CGD web site at:
http://www.candidagenome.org/help/SequenceHelp.shtml