CGD Help: Protein Physicochemical Properties
The CGD Protein Physicochemical Properties Page displays a number of
properties and statistics calculated directly from the predicted ORF translation,
assuming no post-translational processing or modification. This is an unrealistic
assumption for most proteins, and users requiring accurate values for these properties
should consult the literature for experimentally derived values. Nevertheless,
the values may serve as a helpful snapshot, giving the user a general idea of
the physicochemical nature of the subject protein.
Property values are calculated locally, making use of several methods from
ExPASy's ProtParam tool.
Codon usage statistics are also computed locally, using the
The top of the page provides a site-wide quick search, links to the major CGD informational resources,
and a bar with links to popular tools, such as a local BLAST search. Beneath this is a series of tabs linking to other
locus-specific information pages, including Locus Summary,
Locus History, Literature,
Gene Ontology, and
The number and compositional percentage (100 x NumberAA/protein length) of each
standard amino acid (AA).
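The composition figures can be reproduced with a few lines of Python. This is a minimal sketch, not CGD's own code; the function name `aa_composition` and the handling of a trailing stop symbol are assumptions made for illustration.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(protein):
    """Return {residue: (count, percent)} for each standard amino acid.

    percent = 100 x NumberAA / protein length, as described above.
    A trailing '*' (stop symbol) is dropped before counting (assumption).
    """
    seq = protein.upper().rstrip("*")
    if not seq:
        raise ValueError("empty protein sequence")
    counts = Counter(seq)
    length = len(seq)
    return {aa: (counts[aa], 100.0 * counts[aa] / length) for aa in STANDARD_AA}


if __name__ == "__main__":
    for aa, (n, pct) in aa_composition("MKKA*").items():
        if n:
            print(aa, n, round(pct, 1))
```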
The total number of C, H, N, O and S atoms, as well as the chemical formula for the
Estimated using the method of
Pace et al.,
which calculates the sum of (NumberAA x Extinction CoefficientAA)
for three amino acids that absorb at 280 nm:
tyrosine, tryptophan, and the dimeric amino acid cystine (two cysteine [Cys]
residues covalently joined through a disulfide bond).
Two extinction coefficients are displayed:
- For the fully reduced protein (no cystines present).
- For the fully oxidized protein (all Cys residues exist as half-cystines).
The absorbance of the protein at 280 nm (A280, or OD280) for a 1 g/L solution
(1 cm path length) is calculated by dividing the extinction coefficient by the
molecular weight of the protein.
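The extinction coefficient and absorbance calculation described above can be sketched as follows. The per-residue coefficients (Tyr 1490, Trp 5500, cystine 125, in M^-1 cm^-1) are the values given in the ProtParam documentation; that CGD uses exactly these numbers is an assumption, and the function names are hypothetical.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as listed in the
# ProtParam documentation; assumed here to match the values CGD uses.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficients(protein):
    """Return (reduced, oxidized) extinction coefficients.

    reduced:  no cystines present, only Tyr and Trp contribute.
    oxidized: every pair of Cys residues is counted as one cystine.
    """
    seq = protein.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    n_cystine = seq.count("C") // 2  # an odd Cys is left unpaired
    reduced = n_tyr * EXT_TYR + n_trp * EXT_TRP
    oxidized = reduced + n_cystine * EXT_CYSTINE
    return reduced, oxidized


def absorbance_280(ext_coeff, mol_weight):
    """Absorbance of a 1 g/L solution: extinction coefficient / molecular weight (Da)."""
    return ext_coeff / mol_weight
```

For example, `extinction_coefficients("WYCC")` gives one Trp plus one Tyr for the reduced form and adds a single cystine for the oxidized form.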
- Average Hydropathy. Values greater than zero indicate a relatively hydrophobic protein;
negative values indicate a relatively hydrophilic protein. Calculated as the sum of
Kyte and Doolittle
hydropathy values for all amino acids in the protein, divided by the protein length.
- Aromaticity Score. The compositional fraction in the protein of aromatic amino acids
(phenylalanine, tyrosine, and tryptophan), calculated as in
Lobry and Gautier.
- Aliphatic Index. The relative volume of the protein occupied by aliphatic side chains. Positively
correlated with thermostability of globular proteins. Calculated using the method of
as the sum of (Molar %AA x VolumeAA) for alanine, leucine, isoleucine and valine
(where VolumeAA is the relative value compared to alanine).
- Instability Index. Values greater than 40 indicate that the protein may be
unstable in vitro. Calculated using the method of
Guruprasad et al., which
assigns a weighted instability value to each dipeptide in the protein. These values were derived
from an analysis that found a significant difference in the occurrence of certain dipeptides
between stable and unstable proteins.
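Three of the indices listed above reduce to simple sums over the sequence and can be sketched in Python. This is an illustrative sketch rather than CGD's implementation: the function names are hypothetical, the hydropathy table is the published Kyte and Doolittle scale, and the relative side-chain volumes (2.9 for valine, 3.9 for isoleucine and leucine) are the ones given in the ProtParam documentation and are assumed to apply here. The Instability Index is left out because it depends on the full dipeptide weight table from Guruprasad et al.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The functions below assume a non-empty sequence of standard residues only.


def average_hydropathy(seq):
    """Sum of hydropathy values divided by the protein length."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq):
    """Fraction of residues that are Phe, Tyr, or Trp."""
    return sum(seq.count(aa) for aa in "FYW") / len(seq)


def aliphatic_index(seq):
    """Sum of (mole percent x relative volume) for Ala, Val, Ile, and Leu.

    Relative volumes (Ala = 1) are assumed from the ProtParam documentation.
    """
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```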
The codon usage indices below tend to correlate with gene expression levels. Very low
index values may indicate an incorrect gene model. See the
note for more information.
- Codon Bias Index (CBI). The relative abundance in the gene of codons that occur
most frequently in the organism. The baseline codon usage is computed using the set of verified coding genes in organisms with well-characterized gene sets (such as Candida albicans); in organisms without well-characterized gene sets, the baseline codon usage is calculated with a set of predicted protein-coding genes containing all of the verified ORFs plus the ORFs with orthologs in Candida albicans.
- Codon Adaptation Index (CAI). The relative abundance in the gene of codons
that occur most frequently in a set of highly expressed genes.
- Frequency of Optimal Codons (FOP). The ratio of optimal codons (determined from
a set of highly expressed genes) to synonymous codons.
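As a rough illustration of how two of these indices can be computed, the sketch below follows the commonly used formulations: CAI as the geometric mean of per-codon relative adaptiveness values, and FOP as optimal codons divided by all scored codons. Whether CGD computes them in exactly this way is an assumption. The function names, the 0.5 floor for codons absent from the reference set, and the caller-supplied `families` table (amino acid to synonymous codons, multi-codon families only) are choices made for this sketch.

```python
import math


def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring a ragged tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_counts, families):
    """w(codon) = count / highest count among its synonymous codons.

    reference_counts: codon counts from a set of highly expressed genes.
    Zero counts are floored at 0.5 so that log(w) stays defined (sketch choice).
    """
    w = {}
    for codons in families.values():
        counts = {c: max(reference_counts.get(c, 0), 0.5) for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            w[codon] = n / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of the gene that have a w value."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def fop(cds, optimal, families):
    """Optimal codons divided by all codons from the scored synonymous families."""
    scored = {c for codons in families.values() for c in codons}
    codons = [c for c in split_codons(cds) if c in scored]
    if not codons:
        return float("nan")
    return sum(1 for c in codons if c in optimal) / len(codons)


if __name__ == "__main__":
    # Toy two-family table and made-up reference counts, for illustration only.
    families = {"K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}
    reference = {"AAA": 10, "AAG": 40, "GAA": 30, "GAG": 10}
    w = relative_adaptiveness(reference, families)
    print(cai("AAGGAAAAA", w), fop("AAGGAAAAA", {"AAG", "GAA"}, families))
```

A real calculation would need the full synonymous-codon table for the organism's genetic code and a reference set of highly expressed genes; neither is specified here.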