gene_association.cgd.gz Contains all GO annotations for CGD (protein and RNA) The gene_association.cgd.gz file uses the standard file format for gene_association files of the Gene Ontology (GO) Consortium. A more complete description of the file format is found here: http://www.geneontology.org/doc/GO.annotation.html#file Columns are: Contents: 1) DB - database contributing the file (always "CGD" for this file) 2) DB_Object_ID - CGDID 3) DB_Object_Symbol - see below 4) Qualifier (optional) - 'NOT', 'contributes_to', or 'colocalizes_with' qualifier for a GO annotation, when needed 5) GO ID - unique numeric identifier for the GO term 6) DB:Reference(|DB:Reference) - the reference associated with the GO annotation 7) Evidence - the evidence code for the GO annotation 8) With (or) From (optional) - any With or From qualifier for the GO annotation 9) Aspect - which ontology the GO term belongs in (see note below) 10) DB_Object_Name(|Name) (optional) - a name for the gene product in words, e.g. 'acid phosphatase' 11) DB_Object_Synonym(|Synonym) (optional) - see below 12) DB_Object_Type - type of object annotated, e.g. gene, protein, etc. 13) taxon(|taxon) - taxonomic identifier of species encoding gene product 14) Date - date GO annotation was made 15) Assigned_by - source of the annotation Notes on particular columns: Column 3 - When a Standard Gene Name (e.g. C. albicans CDC28 or COX2) has been conferred, it will be present in Column 3. When no Gene Name has been conferred, the ORF Name (e.g., C. albicans orf19.6632 or C. glabrata CAGL0K12694g) will be present in column 3. Column 9 - Aspect C = Cellular Component F = Molecular Function P = Biological Process Column 11 - The ORF Name (e.g., orf19.6632, CAGL0K12694g) will be the first name present in Column 11. Any other names (except the Standard Name, which will be in Column 3 if one exists), including Aliases used for the gene, will also be present in this column. Column 15 - GO annotations in CGD are either assigned by CGD curators or assigned computationally (see http://www.candidagenome.org/cgi-bin/reference/reference.pl?dbid=CAL0121033 and http://www.candidagenome.org/cgi-bin/reference/reference.pl?dbid=CAL0142013). Previously, CGD also included IEA annotations based on predictions made by the Annotation Working Group (see Braun BR, et al. (2005) A human-curated annotation of the Candida albicans genome. PLoS Genet 1(1):36-57. PMID: 16103911). Many of these predictions were replaced with literature-based annotations during reference curation at CGD, and the remainder were archived on 10 Nov 2010. Note: This file contains ALL of the GO curation at CGD, whereas the gene_association file that is available on the GO consortium (GOC) web site, http://www.geneontology.org/, has been filtered according to GOC guidelines, which are discussed in more detail at http://www.geneontology.org/GO.annotation.shtml. Before the Annotation Working Group annotations were archived in November 2010 (see above), these annotations were being filtered from the GOC version of the file because they had an IEA evidence code and were over a year old. Note: There was an error in the "Qualifier" column of the gene_association file between April 23, 2008 and October 14, 2008. The 'contributes_to' and 'colocalizes_with' qualifiers were noted incorrectly as the 'NOT' qualifier in this file between those dates. This error has been corrected as of October 15, 2008. Note: The files are gzip compressed tab-delimited text files. There are several freely available software options for decompressing gzipped files using Windows. The software and other useful information is available on these web sites: - WinZip (http://www.winzip.com/) - Stuffit (http://www.stuffit.com/) - Gzip (http://www.gzip.org/ and the gzip user's manual: http://www.math.utah.edu/docs/info/gzip_toc.html