Index of /download/homology/orthologs/C_glabrata_CBS138_C_albicans_SC5314
Name                                                Last modified      Size  Description
Parent Directory                                                       -
C_glabrata_CBS138_C_albicans_SC5314_orthologs.txt   19-Oct-2011 15:51  101K
inparanoid_output.10-19-2011.txt                    19-Oct-2011 15:51  1.5M
orf_trans_all_Candida_albicans.06-07-2011.fasta.gz  07-Jun-2011 11:54  2.3M
orf_trans_all_Candida_albicans.10-19-2011.fasta.gz  19-Oct-2011 15:51  2.3M
orf_trans_all_Candida_glabrata.10-19-2011.fasta.gz  19-Oct-2011 15:51  1.9M
pompep.10-19-2011.fasta.gz                          19-Oct-2011 15:51  1.5M
rejected_sequences.pompep.10-19-2011.fasta.gz       19-Oct-2011 15:51  3.6K
This directory contains input and output files from CGD orthology mapping.
Orthology assignments are determined using the InParanoid software (version 4.1 as of June 2011).
More information about InParanoid is available on their project website:
http://inparanoid.sbc.su.se/
The Candida "orf_trans_all" files are the protein input files, obtained from CGD.
The Schizosaccharomyces pombe proteome is used as the outgroup in the analysis.
These data are contained in the "pompep" file, which is obtained from the Sanger Institute.
For analyses with Saccharomyces cerevisiae proteins, the latest S. cerevisiae
"orf_trans_all" file is obtained from the Saccharomyces Genome Database,
http://www.yeastgenome.org/
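The protein input files are gzip-compressed FASTA files, so they can be read without unpacking them first. The following is a minimal sketch using only the Python standard library; the function name read_fasta is hypothetical, and the header layout of the CGD files is not assumed (the whole header line is returned as-is).

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    The header is returned without the leading '>' and is not parsed further,
    because the header layout is not described in this README.
    """
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)
```

For example, sum(1 for _ in read_fasta("pompep.10-19-2011.fasta.gz")) would count the sequences in the outgroup file.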
The raw output from the analysis is contained in the "inparanoid_output" file.
The "orthologs.txt" file is a processed output file containing the mappings
in a tab-delimited text format listing gene names and IDs.
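Because the "orthologs.txt" file is tab-delimited, it can be loaded with a generic reader. The sketch below makes two assumptions that are not stated in this README: that lines starting with "#" are comments, and that nothing is known about the column order, so each row is returned as a plain list of fields. The function name read_ortholog_rows is hypothetical.

```python
import csv


def read_ortholog_rows(path):
    """Return the rows of a tab-delimited file as lists of string fields.

    Assumption: blank lines and lines starting with '#' carry no data.
    No column meaning is assumed; inspect the file to see its layout.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for fields in reader:
            if not fields or fields[0].startswith("#"):
                continue
            rows.append(fields)
    return rows
```

Printing the first few rows is a quick way to check which columns hold gene names and which hold IDs before relying on their positions.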
The "rejected_sequences.pompep" file contains information about ortholog mappings
that were rejected by InParanoid based on closer relatedness to the outgroup.
The dates (in MM-DD-YYYY format) in the file names indicate when
the input files were downloaded and the latest set of ortholog predictions was generated.
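The MM-DD-YYYY stamp can be extracted from a file name programmatically, for instance to pick the most recent of several input files. This is a small sketch; the helper name file_date is hypothetical, and it assumes the stamp is always delimited by periods, as in the file names listed in this directory.

```python
import re
from datetime import date

# Matches ".MM-DD-YYYY." as it appears in names such as
# "inparanoid_output.10-19-2011.txt".
DATE_STAMP = re.compile(r"\.(\d{2})-(\d{2})-(\d{4})\.")


def file_date(name):
    """Return the date embedded in a file name, or None if there is none."""
    match = DATE_STAMP.search(name)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    return date(year, month, day)
```

With this helper, max(names, key=file_date) selects the newest file from a list of dated names, such as the two Candida albicans "orf_trans_all" files.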
The analysis only includes ORFs that are present in the latest reference assembly;
orthologs are not computed for "deleted ORFs" that were present in a previous
assembly or a previous version of the reference annotation (although, for historical
traceability, these ORFs have Locus Summary pages on the CGD web site).
The ortholog mappings are updated quarterly to ensure
that the predictions are based on the most up-to-date information.
The ortholog mappings are automatically generated, with no
curator intervention. Thus, some pairings will occasionally differ from
those obtained with a different scoring matrix. In the interests of
automating the process, we do not intend to hand-curate the ortholog
pairs at this time.
Stringent cutoffs were set for the InParanoid analysis: BLOSUM80 (instead of the
default BLOSUM62), and an InParanoid score of 100%.
For proteins that did not have an ortholog meeting
these criteria, we used BLASTp with the same parameters as
InParanoid (-F "m S" -M BLOSUM80) and an expectation value (E) cutoff
of 1e-5 to identify their best hit. These mappings are available from the
http://candidagenome.org/download/homology/best_hits/
directory.
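The best-hit search described above could be reproduced along the following lines. This is only a sketch under stated assumptions: the -F and -M options quoted above are options of the legacy NCBI "blastall" program, so that program is assumed here; the query file, database name, and tabular output option (-m 8) are illustrative choices and are not taken from this README. The function only builds the argument list and does not run BLAST.

```python
def blastp_command(query_fasta, database, evalue="1e-5"):
    """Build an argument list for a legacy blastall BLASTp search.

    From the README: -F "m S", -M BLOSUM80, and an E-value cutoff of 1e-5.
    Assumed for illustration: the blastall binary, the query and database
    arguments, and tabular output (-m 8).
    """
    return [
        "blastall", "-p", "blastp",
        "-i", query_fasta,   # hypothetical query FASTA file
        "-d", database,      # hypothetical formatted protein database
        "-F", "m S",         # filter setting, as used by InParanoid
        "-M", "BLOSUM80",    # scoring matrix
        "-e", evalue,        # expectation value cutoff
        "-m", "8",           # assumption: tabular output
    ]
```

Passing the list to subprocess.run avoids shell quoting problems with the "m S" argument, which contains a space.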