This directory contains the input sequences that were used to determine orthology assignments between C. albicans Assembly 21 and S. cerevisiae using InParanoid version 3.0 (http://inparanoid.cgb.ki.se/). It also contains the output file that InParanoid generated. A file containing the processed output, which lists the orthology assignments, is provided as well.

To run InParanoid, the haploid complement of C. albicans proteins from CGD was compared to the latest set of S. cerevisiae proteins from SGD (as of April 7th, 2008). The set of C. elegans proteins from the Sanger Institute (wormpep 188) was used as an outgroup. Stringent cutoffs were set: the BLOSUM80 scoring matrix (instead of the default BLOSUM62) and an InParanoid score of 100%. In total, 3453 ortholog mappings met these criteria. InParanoid used BLAST version 2.2.18.

Note that the ortholog pairings were generated automatically, with no curator intervention. As a result, some pairings might not occur with a different scoring matrix. In the interests of automating the process, we do not intend to hand-curate the ortholog pairs at this time.

Please also note that the Assembly 21-based mapping does not compute or display orthologs for C. albicans ORFs that were present in Assembly 19 and subsequently deleted from Assembly 21. The orthologs of these ORFs are present in the Assembly 19-based mapping file.

Some C. albicans proteins did not have an ortholog that met these criteria. For these proteins, we used BLASTp to identify their best hit in the S. cerevisiae protein complement. We used the same parameters that InParanoid used (-F "m S" -M BLOSUM80) and an expectation value (E) cutoff of 1e-5. Best hits were identified for 1373 of the C. albicans proteins. These data are available in the same format as the ortholog data.

The following files are available:

orf_trans_all_Candida_Assembly_21_haploid.fasta.gz - the C. albicans haploid protein complement
orf_trans_all_Saccharomyces.fasta.gz - the S. cerevisiae protein complement
wormpep188.fasta.gz - the C. elegans protein set used as an outgroup
inparanoid_output.txt - the raw output from InParanoid
rejected_sequences.wormpep.fasta.gz - the sequences rejected due to the worm outgroup
CA_SC_orthologs.txt - the processed output, listing the orf19 ID, the SGDID, and the gene/ORF name from SGD
best_hits.txt - best hits (E-value cutoff of 1e-5) for proteins without orthologs
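The best-hit selection for proteins without orthologs can be approximated as follows. This is a minimal sketch and not the pipeline CGD actually ran. It assumes the BLASTp search wrote tabular output (blastall -m 8, which has 12 tab-separated columns, with the E-value in column 11 and the bit score in column 12). For each query, it keeps the highest-scoring subject among hits at or below the 1e-5 cutoff. The function name best_hits and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# E-value cutoff stated in this README for the best-hit search.
E_VALUE_CUTOFF = 1e-5


def best_hits(
    lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF
) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) for its top-scoring hit.

    Assumes BLAST tabular output (blastall -m 8): 12 tab-separated columns,
    E-value in column 11 and bit score in column 12 (1-based).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\r\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # hit does not pass the E-value cutoff
        current = best.get(query)
        # Keep the hit with the highest bit score; on a tie, the first one seen wins.
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: query_1 has two significant hits, query_2 has none.
sample = [
    "query_1\tsubject_a\t45.2\t300\t150\t5\t1\t300\t1\t298\t2e-50\t200.0",
    "query_1\tsubject_b\t30.1\t250\t160\t8\t1\t250\t5\t250\t1e-20\t95.5",
    "query_2\tsubject_c\t22.0\t100\t70\t3\t1\t100\t1\t100\t0.01\t35.0",
]
print(best_hits(sample))  # {'query_1': ('subject_a', 2e-50, 200.0)}
```

Ranking by bit score rather than by E-value is a design choice. Bit scores do not depend on database size, and they separate very strong hits whose E-values both round to 0.0.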
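The processed files can be loaded with a short script. The README names the three columns of CA_SC_orthologs.txt, and best_hits.txt uses the same format, but it does not state the delimiter. This sketch therefore assumes tab-separated lines with no header row. The names OrthologPair, read_pairs and by_orf19 are hypothetical. Grouping by orf19 ID is only a precaution in case a C. albicans ORF appears on more than one line.

```python
from typing import Dict, Iterable, List, NamedTuple


class OrthologPair(NamedTuple):
    orf19_id: str  # C. albicans orf19 identifier
    sgdid: str     # S. cerevisiae SGDID
    sgd_name: str  # gene/ORF name from SGD


def read_pairs(lines: Iterable[str]) -> List[OrthologPair]:
    """Parse (orf19 ID, SGDID, SGD name) rows.

    Assumes tab-separated columns in that order and no header row; the
    README names the columns but does not state the delimiter.
    """
    pairs: List[OrthologPair] = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pairs.append(OrthologPair(fields[0], fields[1], fields[2]))
    return pairs


def by_orf19(pairs: Iterable[OrthologPair]) -> Dict[str, List[OrthologPair]]:
    """Group pairs by C. albicans ID, in case an ORF appears on several lines."""
    index: Dict[str, List[OrthologPair]] = {}
    for pair in pairs:
        index.setdefault(pair.orf19_id, []).append(pair)
    return index


# Example use (file name from this README; format is assumed as above):
# with open("CA_SC_orthologs.txt") as fh:
#     index = by_orf19(read_pairs(fh))
```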