CGD Help: BLAST Searches

Contents

Description
Using BLAST
Accessing the BLAST Search Page

Description

BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a fast algorithm for searching protein or DNA sequence databases. BLAST is best used for sequence similarity searching, rather than for motif searching.

A fairly complete on-line guide to BLAST searching can be found at the NCBI BLAST Help Manual. CGD has a separate help document for the BLAST results page. Documentation about WU-BLAST2 is posted here.

BLAST searches offered by CGD allow users to compare any query sequence to Candida sequence datasets. To search other fungal sequences, use SGD's Fungal BLAST tool. To search other datasets, NCBI BLAST can be used.

Using BLAST

First, enter the sequence that you would like to compare in Step 1, "Enter your query sequence."

The sequence can be entered directly in the box provided. (Alternatively, the sequence may be uploaded from a text file using the "Optional: upload a local sequence TEXT file" section near the bottom of the page.)

In Step 2, "Select one or more target genomes," select the organism dataset(s) against which your sequence will be compared.

More than one target dataset may be available for a given organism. For example, for C. albicans SC5314, we provide the option to search against Assembly 21 (haploid, chromosome-level genome assembly) or Assembly 19 (diploid, contig-level assembly).

In Step 3, "Select Target Sequence Dataset," choose the type of dataset against which your sequence will be compared. Options include: genomes (chromosomal or contig sequences), genes (ORFs plus any intronic sequence), coding sequences (ORFs, with intronic sequence removed), proteins (translations of ORF sequences), genomic sequence of non-coding features (including intronic sequence), and sequence of non-coding features with intronic sequences removed. The selection available for an individual organism will depend on the annotation available for the organism, and some types of dataset are not available for some of the organisms included.

In Step 4, "Choose Appropriate BLAST Program," select the type of search to run. CGD offers these BLAST programs to accommodate different types of searches:

BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset;
TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence dataset;
BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence dataset;
BLASTP compares an amino acid query sequence against a protein sequence dataset;
TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands).

Program options are limited by Query Sequence type and Target Sequence Dataset choice. We try to guess your Query Sequence type from its text content. If that guess is wrong, you can override it using the radio-button selection option for "DNA" or "protein" located below the program selection pull-down menu.
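
The way the query type and dataset type narrow the program choice can be sketched in a few lines. This is only an illustration, not CGD's actual code: the function names and the 90% nucleotide-letter threshold used for guessing are assumptions made for the example.

```python
# Illustrative sketch only; the names and the 0.9 threshold are assumptions,
# not CGD's actual implementation.
DNA_LETTERS = set("ACGTUN")

# (query type, dataset type) -> programs, following the list of programs above.
PROGRAMS = {
    ("DNA", "nucleotide"): ["BLASTN", "TBLASTX"],
    ("DNA", "protein"): ["BLASTX"],
    ("protein", "protein"): ["BLASTP"],
    ("protein", "nucleotide"): ["TBLASTN"],
}


def guess_query_type(sequence: str, threshold: float = 0.9) -> str:
    """Guess "DNA" if most letters are nucleotide codes, else "protein"."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("query contains no sequence letters")
    dna_fraction = sum(c in DNA_LETTERS for c in letters) / len(letters)
    return "DNA" if dna_fraction >= threshold else "protein"


def available_programs(sequence: str, dataset_kind: str) -> list:
    """List the programs that fit the guessed query type and the dataset kind."""
    return PROGRAMS[(guess_query_type(sequence), dataset_kind)]


print(available_programs("ATGGCTNNACGT", "protein"))
print(available_programs("MKVLAAGIVGLLLAQ", "nucleotide"))
```

A short protein made up mostly of A, C, G, and T residues would be misclassified by a guess of this kind, which is why the "DNA"/"protein" override exists.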

NOTE
For BLASTX and TBLASTX searches:
You may choose an alternate genetic code to use for query translation. Queries for which this may be appropriate include DNA sequences from most Candida albicans-related species (use 12: Alternative Yeast Nuclear Code) or mitochondrial DNA sequences. C. glabrata uses the standard code, Translation Table 1, for translation of its nuclear genome. See the Non-standard Genetic Codes Help Page for more details. The default code for CGD is: 12: Alternative Yeast Nuclear Code.

In Step 5, you may submit the query, or clear the form to re-enter data.

Note: Two additional, optional sections of the BLAST submission form allow (1) query submission using a text file, and (2) customization of BLAST parameters, respectively.

1) Sequences can be submitted for a BLAST search in two different ways. The sequence can be uploaded from a local text file with FASTA, GCG, or RAW formatting, or the sequence can be typed or pasted into the Query Sequence window. (Note: The contents of an uploaded sequence file will not be displayed in the Query Sequence window of the search page.)

To use the Upload Local File option:

Macintosh
1. Click on the Browse button
2. Click on folders to open them, and on the file to upload it
PC
1. Click on the Browse button
2. Change the file type from "HTML" to "all files"
3. Click on folders to open them, and on the file to upload it
UNIX
1. Click on the Browse button
2. Change *.html to * at the end of the string in the Filter box
3. Click on a folder and then the Filter button to open the folder
4. Click on a file and then the OK button to upload it
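
To check a local file before uploading it, the FASTA and RAW formats mentioned above can be read with a short script. This is a minimal sketch under stated assumptions: the function name is hypothetical, only the first FASTA record is read, and GCG files are not handled.

```python
# Illustrative sketch; read_query is a hypothetical helper, not part of CGD.
# Handles FASTA (a ">" header line followed by sequence lines) and RAW text.
def read_query(text: str) -> tuple:
    """Return (header, sequence) for the first record in FASTA or RAW text."""
    header = ""
    seen_header = False
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header or parts:
                break  # a second record starts here; this sketch reads only one
            header = line[1:].strip()
            seen_header = True
            continue
        # Keep sequence letters and "*" (stop); drop digits and spaces.
        parts.append("".join(ch for ch in line if ch.isalpha() or ch == "*"))
    return header, "".join(parts).upper()


print(read_query(">example_query test\nATGC\natgc\n"))
print(read_query("atg cat 123\nGGG"))
```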

2) Other options are available, including the ability to add a note to the BLAST output, or to receive the results by email.

Changing other search parameters can also change the outcome of the BLAST search:

You may choose to allow (default) or disallow gapped alignments using the Yes/No option on the interface.

BLAST searches are subject to filtering. A filter will remove repetitive sequences from a query, so that the results of the BLAST search will be less numerous and, ideally, more informative. For nucleic acid query sequences, the "dust" filter is used as the default. For all other searches, the "seg" filter is the default. You can remove filtering using the On/Off option on the interface.
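
The idea behind filtering can be illustrated with a simple windowed-entropy mask. Note that this is NOT the "dust" or "seg" algorithm; it is a simplified stand-in, and the window size, entropy cutoff, and function names are assumptions chosen for the example.

```python
# Simplified stand-in for low-complexity filtering. This is NOT dust or seg;
# the window size and entropy cutoff are arbitrary assumptions.
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per letter) of the letters in a window."""
    total = len(window)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12,
                        min_entropy: float = 1.0, mask_char: str = "N") -> str:
    """Replace every window whose entropy is below min_entropy with mask_char."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


# The poly-A tail is masked; the mixed-composition start is kept.
print(mask_low_complexity("ACGTACGTACGT" + "A" * 20))
```

For a protein query, "X" would be the more usual mask character.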

The Expect threshold ("E") reflects the number of matches expected to be found by chance. If the Expect value (E-value) calculated for a match is greater than the Expect threshold, the match will not be reported. The E threshold default is set to 10. Decreasing the E threshold increases the stringency of the search, so fewer matches are reported; increasing the E threshold decreases the stringency, so more matches are reported.
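
The effect of the threshold can be shown with a few made-up hits. The hit names, scores, and E-values below are invented for illustration, and the `Hit` type and `report_hits` function are hypothetical, not part of any BLAST output format.

```python
# Illustrative sketch; the hits are invented and the names are hypothetical.
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    score: float
    evalue: float


def report_hits(hits, e_threshold: float = 10.0):
    """Keep hits whose E-value does not exceed the threshold, best first."""
    kept = [h for h in hits if h.evalue <= e_threshold]
    return sorted(kept, key=lambda h: h.evalue)


hits = [
    Hit("hit_A", 250.0, 1e-60),
    Hit("hit_B", 32.0, 4.5),
    Hit("hit_C", 28.0, 25.0),
]

# Default threshold of 10: hit_C (E-value 25) is dropped.
print([h.subject for h in report_hits(hits)])
# Stricter threshold: only the strong match remains.
print([h.subject for h in report_hits(hits, e_threshold=1e-5)])
```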

The default scoring matrix used is BLOSUM62; however, other matrices may be selected from the pull-down menu provided on the interface.

The number of alignments displayed on the results page is customizable.

The user can also change the word length (W). BLAST first searches for a perfect match of at least the word length; once such a match is found, it tries to extend it into a high-scoring segment pair (HSP). The default W value for BLASTN is 11; for all other programs the default is 3. If the word length is less than 11, the query sequence must be shorter than 5000 bp.
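
The word-matching step described above can be sketched as an exact-word lookup. This is a teaching sketch only: it finds the seed positions and does not perform the extension into HSPs, and the function name is hypothetical.

```python
# Teaching sketch of exact-word seeding; extension into HSPs is not shown.
# find_seeds is a hypothetical name, not a BLAST function.
from collections import defaultdict


def find_seeds(query: str, subject: str, word_length: int = 11):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - word_length + 1):
        index[query[i:i + word_length]].append(i)
    seeds = []
    for j in range(len(subject) - word_length + 1):
        for i in index.get(subject[j:j + word_length], ()):
            seeds.append((i, j))
    return seeds


query = "ACGTACGTTGCA"
subject = "TTACGTTGCAAA"
print(find_seeds(query, subject, word_length=8))   # shared 9-base stretch gives seeds
print(find_seeds(query, subject, word_length=11))  # no 11-base word in common
```

A smaller W finds seeds in shorter or more divergent matches but produces more of them to extend, which is one reason a short word length is paired with a limit on query length.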

If a query sequence is short (less than about 30 residues), the user may want to adjust the Cutoff Score ("S") to a lower value, which will result in a less stringent criterion for reporting matches.

A note on translation tables:
In C. albicans, nuclear-encoded proteins are translated using Translation Table 12 (Alternative Yeast Nuclear), whereas mitochondrial-encoded proteins are translated using Translation Table 4 (Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma). For BLAST searches where a nucleotide dataset must be translated (TBLASTN and TBLASTX), the CGD BLAST tool uses Table 4 for translation of the mitochondrial dataset, and Table 12 for translation of the datasets containing nuclear genes that are available as BLAST target datasets in CGD.

When the user enters a nucleotide query sequence for a BLAST search that requires its translation (BLASTX and TBLASTX), a translation table must also be chosen for the query. To handle these query sequences accurately, set the "Query translation table" parameter to the table that is appropriate for the query sequence. By default, the user-supplied query sequence is translated using the same table that is appropriate for the dataset against which it is being searched (i.e., Table 4 is used if the query sequence is being BLASTed against the mitochondrial dataset, and Table 12 in all other cases). However, if, for example, an S. cerevisiae nucleotide sequence were being used in a BLASTX or TBLASTX search, then Translation Table 1 (the Standard table) should be selected.

Please see NCBI's Taxonomy browser and Translation Table web page for more information about alternate translation tables.
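
The practical effect of choosing a table can be seen by translating the same short sequence three ways. The sketch below hard-codes the two differences relevant here: in Table 12 the codon CTG encodes serine rather than leucine, and in Table 4 the codon TGA encodes tryptophan rather than stop. The function names are hypothetical and the example sequence is made up.

```python
# Illustrative sketch; function names are hypothetical. The standard code is
# written in the usual TCAG codon order.
BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(changes=None):
    """Build a codon -> amino acid map from the standard code plus changes."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, STANDARD))
    table.update(changes or {})
    return table


TABLES = {
    1: build_table(),               # Standard
    4: build_table({"TGA": "W"}),   # Mold Mitochondrial etc.: TGA is Trp
    12: build_table({"CTG": "S"}),  # Alternative Yeast Nuclear: CTG is Ser
}


def translate(dna: str, table_id: int = 12) -> str:
    """Translate the first reading frame; unknown codons become "X"."""
    table = TABLES[table_id]
    dna = dna.upper().replace("U", "T")
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


for table_id in (1, 12, 4):
    print(table_id, translate("ATGCTGTGA", table_id))
# 1 ML*
# 12 MS*
# 4 MLW
```

A query translated with the wrong table therefore carries a substitution at every CTG codon (or a spurious stop at every TGA codon), which can lower alignment scores.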

Accessing the BLAST Search Page

BLAST can be accessed by selecting the BLAST link on the menu bar at the top of most CGD web pages, or by using the Sequence Analysis Tools menu on the right-hand sidebar of any CGD Locus Page, where the user is given the option of a BLAST search page with the sequence already filled in.

