Index of /download/homology/orthologs/C_albicans_SC5314_S_cerevisiae

Icon  Name                                                          Last modified      Size  Description
[DIR] Parent Directory                                                                    -
[TXT] C_albicans_SC5314_S_cerevisiae_orthologs.txt                  19-Oct-2011 14:59   90K
[TXT] inparanoid_output.10-19-2011.txt                              19-Oct-2011 14:59  1.6M
[   ] orf_trans_all_Candida_Assembly_21_haploid.10-19-2011.fasta.gz 19-Oct-2011 14:59  2.3M
[   ] orf_trans_all_Saccharomyces.10-19-2011.fasta.gz               19-Oct-2011 14:59  2.4M
[   ] pompep.10-19-2011.fasta.gz                                    19-Oct-2011 14:59  1.5M
[   ] rejected_sequences.pompep.10-19-2011.fasta.gz                 19-Oct-2011 14:59  3.0K

This directory contains input and output files from CGD orthology mapping.

Orthology assignments are determined using InParanoid software (version 4.1 as of June, 2011).
More information about InParanoid is available on their project website:  
http://inparanoid.sbc.su.se/

The Candida "orf_trans_all" files are the protein input files, obtained from CGD.

The Schizosaccharomyces pombe proteome is used as the outgroup in the analysis.  
These data are contained in the "pompep" file, which is obtained from the Sanger Institute. 

For analyses with Saccharomyces cerevisiae proteins, the latest S. cerevisiae 
"orf_trans_all" file is obtained from the Saccharomyces Genome Database, 
http://www.yeastgenome.org/

The raw output from the analysis is contained in the "inparanoid_output" file.

The "orthologs.txt" file is a processed output file containing the mappings 
in a tab-delimited text format listing gene names and IDs. 
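
As a rough illustration of reading a tab-delimited file like this one, the
Python sketch below splits each line into fields. The column order and the
comment prefixes are assumptions (they are not described here), and the
sample rows are made-up placeholders, not real CGD or SGD identifiers.

```python
import csv
import io


def read_tab_delimited(handle, comment_prefixes=("#", "!")):
    """Yield each data row as a list of fields.

    Blank lines and lines starting with a comment prefix are skipped.
    The comment prefixes are an assumption, not something the README states.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip empty lines and lines containing only whitespace.
        if not row or not any(field.strip() for field in row):
            continue
        if row[0].startswith(comment_prefixes):
            continue
        yield [field.strip() for field in row]


# Made-up placeholder rows; the real column layout may differ.
sample = io.StringIO(
    "# hypothetical header comment\n"
    "GENE_A\tID0001\tGENE_B\tID0002\n"
    "\n"
    "GENE_C\tID0003\tGENE_D\tID0004\n"
)
rows = list(read_tab_delimited(sample))
```

To read the real file, pass an open file handle, for example
open("C_albicans_SC5314_S_cerevisiae_orthologs.txt", newline=""), and check
the first few rows to see which column holds which name or ID.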

The "rejected_sequences.pompep" file contains information about ortholog mappings 
that were rejected by InParanoid based on closer relatedness to the outgroup.

The dates (in MM-DD-YYYY format) in the file names indicate when 
the input files were downloaded and the latest set of ortholog predictions was generated.
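
For scripts that need to know which release a file belongs to, the date can be
pulled out of the file name. This is a minimal Python sketch; the function name
date_from_filename is hypothetical, and it assumes the date always appears as
MM-DD-YYYY as described above.

```python
import re
from datetime import date

# Two digits, two digits, four digits, separated by hyphens and not
# embedded in a longer run of digits.
DATE_IN_NAME = re.compile(r"(?<!\d)(\d{2})-(\d{2})-(\d{4})(?!\d)")


def date_from_filename(name):
    """Return the MM-DD-YYYY date embedded in a file name, or None if absent."""
    match = DATE_IN_NAME.search(name)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    return date(year, month, day)


release = date_from_filename("pompep.10-19-2011.fasta.gz")
```

File names without a date, such as the processed "orthologs.txt" file, return
None.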

The analysis only includes ORFs that are present in the latest reference assembly;
orthologs are not computed for "deleted ORFs" that were present in a previous 
assembly or a previous version of the reference annotation (although, for historical 
traceability, these ORFs have Locus Summary pages on the CGD web site).  
 
The ortholog mappings are updated quarterly to ensure
that the predictions are based on the most up-to-date information.

The ortholog mappings are automatically generated, with no
curator intervention.  Thus, there will occasionally be pairings that
would not be reported with a different scoring matrix.  In the interests of
automating the process, we do not intend to hand-curate the ortholog
pairs at this time.

Stringent cutoffs were set for the InParanoid analysis: BLOSUM80 (instead of the 
default BLOSUM62), and an InParanoid score of 100%.
For proteins that did not have an ortholog meeting
these criteria, we used BLASTp with the same parameters as were used
by InParanoid (-F "m S" -M BLOSUM80) and an expectation value (E)
cutoff of 1e-5 to identify their best hit.  These mappings are available from the 
http://candidagenome.org/download/homology/best_hits/
directory.
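
The exact command line used for the best-hit search is not given here. As a
hedged sketch, the parameters above could be passed to the legacy NCBI
"blastall" program as shown below in Python. The use of blastall, the function
name build_blastp_command, and the query and database names are all
assumptions for illustration.

```python
import shlex


def build_blastp_command(query_fasta, database, evalue="1e-5"):
    """Build an argument list for a legacy NCBI blastall BLASTp search.

    The file and database names passed in are hypothetical placeholders.
    """
    return [
        "blastall",
        "-p", "blastp",    # protein-protein search
        "-i", query_fasta, # query sequences (FASTA)
        "-d", database,    # formatted protein database
        "-F", "m S",       # filter setting quoted above, passed as one argument
        "-M", "BLOSUM80",  # scoring matrix quoted above
        "-e", evalue,      # expectation value cutoff
    ]


cmd = build_blastp_command("query_proteins.fasta", "target_proteins")
printable = shlex.join(cmd)  # shell-quoted form, useful for logging
```

Keeping the command as a list means the "m S" filter string stays a single
argument when the list is handed to subprocess.run, with no shell quoting
needed.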