This directory contains downloadable GFF files for the genome of C. albicans SC5314. The features described in the files include chromosomes, Contig19s, ORFs, tRNAs, centromeres, sequence gaps, etc. Please see http://song.sourceforge.net/gff3.shtml for a detailed description of the Generic Feature Format (GFF).

The file C_albicans_SC5314_A21_features.gff contains the current CGD annotation based on Assembly 21 of the C. albicans SC5314 genome sequence. It is updated weekly.

The file candida_21_with_chromosome_sequences.gff.gz contains the current CGD annotation and the current genomic sequence of all chromosomes based on Assembly 21 of the C. albicans SC5314 genome sequence. The annotations in this file are the same as those in C_albicans_SC5314_A21_features.gff above. The chromosome sequences are given in the "##FASTA" section at the end of the file, in accordance with the GFF3 file format specification. This file is updated weekly.

The file C_albicans_SC5314_A21_intergenic.gff lists the intergenic regions between coding regions in Assembly 21. The file also gives the length of each intergenic sequence and its percent GC and AT content. It is updated weekly.

The file candida_20.gff contains the CGD annotation based on Assembly 20 of the C. albicans genome sequence. It has NOT been updated since Oct 6, 2008.

The file suspect_WO1_regions_reduced.gff contains all of the regions that were flagged by the BRI as potentially derived from strain WO-1, rather than the reference strain, SC5314. CGD compared the 1 kb flanking parts of each suspect region to Contig19 sequences and iteratively reduced the flanking region from the gap end by 100 bp (10%) until either there was 100% identity with the contig, or no 100% identical match could be found. Thus, the section of the flanking region that aligns perfectly with the contig has been removed from the suspect list. These are the regions that appear with the label "Suspect WO1" in the Genome Browser on the CGD web site.
Please see http://www.candidagenome.org/help/Assembly20_Advisory.shtml for additional details.

The file candida_19.gff contains CGD annotation from Assembly 19 of the C. albicans genome sequence. This file is a snapshot of the annotation as of September 2006, immediately before the data from Assembly 20 were loaded into CGD. This file is archival; it will not be updated.

The file Assem19mapping.gff contains mappings of historic assemblies to Assembly 19 supercontigs. BLAST analysis was performed to map contig and ORF sequences from each of the older assemblies to the Assembly 19 supercontigs. Please see http://candidagenome.org/download/mapping_historic_assemblies/ for further details on the analysis procedure and separate mapping files for individual assemblies.

The file Assem20mapping.gff contains mappings of historic assemblies to Assembly 20 chromosomes. BLAST analysis was performed to map contig and ORF sequences from each of the older assemblies to the Assembly 20 chromosomes. Please see http://candidagenome.org/download/mapping_historic_assemblies/ for further details on the analysis procedure and separate mapping files for individual assemblies.

The file Assem21mapping.gff contains mappings of historic assemblies to Assembly 21 chromosomes. BLAST analysis was performed to map contig and ORF sequences from each of the older assemblies to the Assembly 21 chromosomes. Please see http://candidagenome.org/download/mapping_historic_assemblies/ for further details on the analysis procedure and separate mapping files for individual assemblies.

The file A19_ForcheSNPs.gff contains all the SNP locations from Forche A, Magee PT, Magee BB, May G. Genome-wide single-nucleotide polymorphism map for Candida albicans. Eukaryotic Cell. 2004 Jun;3(3):705-14. SNP locations were mapped to Assembly 19 contigs using the original marker sequences.
The file A20_ForcheSNPs.gff contains all the SNP locations from Forche A, Magee PT, Magee BB, May G. Genome-wide single-nucleotide polymorphism map for Candida albicans. Eukaryotic Cell. 2004 Jun;3(3):705-14. SNP locations were mapped to Assembly 20 chromosomes using the original marker sequences.

The file A21_ForcheSNPs.gff contains all the SNP locations from Forche A, Magee PT, Magee BB, May G. Genome-wide single-nucleotide polymorphism map for Candida albicans. Eukaryotic Cell. 2004 Jun;3(3):705-14. SNP locations were mapped to Assembly 21 chromosomes using the original marker sequences.

The file A21_gaps_Nruns.gff lists every stretch of 3 or more consecutive unknown bases, denoted as N, on the A21 chromosomes.

The files 5prime_utr_intron_A20.gff and 5prime_utr_intron_A21.gff contain the 5' UTR intron data published in Mitrovich QM, Tuch BB, Guthrie C, Johnson AD. Computational and experimental approaches double the number of known introns in the pathogenic yeast Candida albicans. Genome Res. 2007 Apr;17(4):492-502. These introns are mapped to Assembly 20 and Assembly 21, respectively.

The file Unannotated_transcripts_Sellam_et_al.gff contains novel, unannotated transcripts detected in tiling microarray experiments from Sellam A, Hogues H, Askew C, Tebbji F, van het Hoog M, Lavoie H, Kumamoto CA, Whiteway M, Nantel A. Experimental annotation of the human pathogen Candida albicans coding and noncoding transcribed regions using high-resolution tiling arrays. Genome Biol 2010; 11(7):R71.

The file Unannotated_transcripts_Tuch_et_al_2010.gff contains novel, unannotated transcriptionally active regions detected by strand-specific sequencing of RNA from white and opaque cells, described in Tuch BB, Mitrovich QM, Homann OR, Hernday AD, Monighetti CK, De La Vega FM, Johnson AD (2010) The transcriptomes of two heritable cell types illuminate the circuit governing their differentiation.
PLoS Genet 6(8).

The file Unannotated_transcripts_Bruno_et_al_2010.gff contains novel transcriptionally active regions detected in high-throughput sequencing of cDNA (RNA-seq) under several environmental conditions, described in Bruno VM, Wang Z, Marjani SL, Euskirchen GM, Martin J, Sherlock G, Snyder M (2010) Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res 20(10):1451-8.

The file Jones_PMID_15123810_Polymorphisms.gff contains all polymorphisms discussed in Jones et al. (2004) The Diploid Genome of Candida albicans. PNAS 101:7329-7334. Polymorphism locations were mapped to Assembly 21 by using BLAST to find exact matches to the 50 bp of flanking sequence on each side of each polymorphism. Locations for "Deletion" type polymorphisms indicate the region that is deleted, including the start and stop coordinates. Locations for "Insertion" type polymorphisms indicate that an insertion has been made in the homolog sequence immediately AFTER the location specified. Locations for "Substitution" type polymorphisms indicate the site of a single nucleotide substitution.
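
The coordinate conventions for the three polymorphism types can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not CGD's own tooling: it assumes the polymorphism type is stored in the third (type) column of each GFF line, which should be checked against the actual file, and that for an insertion the end coordinate is the base after which the insertion occurs. The function names and the sample line (including the "chr1" sequence name) are made up for the example.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    attributes: str


def parse_gff_line(line: str) -> Optional[Feature]:
    """Parse one GFF3 data line; return None for blanks, comments and directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[8])


def describe(feature: Feature) -> str:
    """Explain what a feature's coordinates mean for each polymorphism type.

    ASSUMPTION: the type ("Deletion", "Insertion", "Substitution") is read
    from the GFF type column; the real file may encode it elsewhere.
    """
    kind = feature.type.lower()
    if kind == "deletion":
        # Start and stop are both part of the deleted region.
        length = feature.end - feature.start + 1
        return f"bases {feature.start}-{feature.end} deleted ({length} bp)"
    if kind == "insertion":
        # ASSUMPTION: the insertion follows the end coordinate.
        return f"insertion in the homolog immediately after base {feature.end}"
    if kind == "substitution":
        return f"single nucleotide substitution at base {feature.start}"
    raise ValueError(f"unrecognized polymorphism type: {feature.type!r}")


if __name__ == "__main__":
    # Made-up example line; not taken from the real file.
    sample = "chr1\tJones\tDeletion\t100\t104\t.\t.\t.\tID=example"
    feature = parse_gff_line(sample)
    if feature is not None:
        print(describe(feature))
```

Because the Deletion coordinates include both the start and the stop base, the deleted length is end - start + 1 rather than end - start.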