CGD Help: Batch Download



This resource allows simultaneous retrieval of DNA and protein sequences, basic information, GO annotations, phenotype annotations, or orthologs and Best Hits for genes and other chromosomal features, starting with either a list of their names or a list of chromosomal/contig regions in which the features of interest are located.

CGD's Batch Download tool and Gene/Sequence Resources tool both allow you to retrieve sequences in batch for a list of regions. The difference between the batch options of these two tools is that Batch Download retrieves only the sequences of the features (protein-coding and RNA genes, centromeres, etc.) that are annotated within the specified region(s), while Gene/Sequence Resources retrieves the entire nucleotide sequence between the coordinates specified in a list.
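The difference between the two batch options can be illustrated with a small sketch. Everything in it is made up for illustration: the 20-nucleotide sequence, the feature names and coordinates, and the function names are hypothetical and are not part of CGD. The sketch also assumes 1-based inclusive coordinates with start less than or equal to stop, and treats a feature as "within" a region only when it lies entirely inside it.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # assumed 1-based, inclusive
    stop: int   # assumed 1-based, inclusive


def features_within(features, region_start, region_stop):
    """Batch Download style: keep only features lying entirely inside the region."""
    return [
        f for f in features
        if region_start <= f.start and f.stop <= region_stop
    ]


def region_sequence(chromosome_seq, region_start, region_stop):
    """Gene/Sequence Resources style: the whole sequence between the coordinates."""
    return chromosome_seq[region_start - 1:region_stop]


# Made-up data, for illustration only.
chromosome = "ACGTACGTACGTACGTACGT"
features = [
    Feature("featA", 2, 6),
    Feature("featB", 8, 12),
    Feature("featC", 11, 18),  # extends past the region, so it is skipped
]

inside = features_within(features, 1, 12)
print([f.name for f in inside])
print(region_sequence(chromosome, 1, 12))
```

In this toy case the first call returns only the features that fit inside coordinates 1 to 12, while the second returns every nucleotide in that span regardless of annotation.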

The data that can be retrieved using this tool are also found in files available at our Download site.

Using the Batch Download Tool

The Batch Download tool allows you to enter a list of gene identifiers or chromosomal/contig regions and select the types of data that you want to retrieve.

Step 1: Your Input
Two options are available in Step 1.
  1. Option 1: You may enter Feature names (e.g., orf19.2203), Standard Gene names (e.g., ACT1), CGDIDs (e.g., CAL0001571), or a mix of all three types of identifier. These may be typed or pasted into the input box, or you may upload a text file of these names or identifiers. In both cases, the input should contain one name or identifier per line. Only genetic names that are the CGD Standard Names should be used, not Alias names. Note: only genes from one species at a time can be downloaded through Batch Download.

  2. Option 2: Enter sequence coordinates (chromosomal coordinates or contig coordinates, as appropriate) to retrieve all features between these coordinates. In this option you can either pick one chromosome/contig at a time or upload a file of chromosomal regions, one region per line, in the following format:
     chromosome_number or contig_number[tab]start_coordinate[tab]stop_coordinate
    If no coordinates are entered, all the features in the selected chromosome or contig will be retrieved.

    Note that this option will not retrieve data for features that lie only partially within the input coordinates.

    The name of the strain must also be chosen (from the pull-down menu) as part of the input to the Batch Download tool.
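The two upload formats described in Step 1 can be prepared with a short script. This is a minimal sketch, not a CGD utility: the function names, file names, and the region values are hypothetical, and the chromosome label "1" is only a placeholder for whatever chromosome or contig number applies to your strain.

```python
import os
import tempfile


def write_identifier_file(path, identifiers):
    """Write one name or identifier per line, skipping blank entries."""
    cleaned = [s.strip() for s in identifiers if s.strip()]
    with open(path, "w", newline="\n") as fh:
        fh.write("\n".join(cleaned) + "\n")
    return cleaned


def write_region_file(path, regions):
    """Write one region per line as: number<TAB>start<TAB>stop."""
    lines = [f"{name}\t{int(start)}\t{int(stop)}" for name, start, stop in regions]
    with open(path, "w", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines


# Example identifiers taken from this help page; the regions are made up.
with tempfile.TemporaryDirectory() as tmp:
    id_path = os.path.join(tmp, "identifiers.txt")
    region_path = os.path.join(tmp, "regions.txt")

    ids = write_identifier_file(id_path, ["orf19.2203", "ACT1", " CAL0001571 ", ""])
    region_lines = write_region_file(region_path, [("1", 1000, 5000), ("1", 20000, 45000)])

    with open(id_path) as fh:
        id_file_text = fh.read()
    with open(region_path) as fh:
        region_file_text = fh.read()

print(id_file_text)
print(region_file_text)
```

Stripping whitespace and dropping empty lines before upload is a design choice of this sketch; whether the tool tolerates stray whitespace is not stated here.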

Step 2: Choosing a Data type
For a given set of Feature names/Standard names/CGDIDs, you can simultaneously retrieve data for multiple data types. Data are always output in FASTA format except when using the Chromosomal Coordinates option. The following types of data may be retrieved:

Interpreting the results

If your input list of Feature/Gene names includes Alias names, names that have been deleted or merged, or names that do not exist, an error message will be displayed. From this intermediate page you can either go back, edit your list to remove those names, and resubmit, or click 'Proceed', which retrieves the data for your input list with the invalid names excluded.
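If you keep your own list of accepted names, the same split into valid and invalid entries can be done before submitting. The sketch below is only an illustration of that idea: the function name, the local set of accepted names, and the entry "NOT_A_GENE" are hypothetical, and it assumes an exact, case-sensitive match, which may not be how the tool itself compares names.

```python
def split_valid_invalid(requested, accepted_names):
    """Partition requested names into those found in accepted_names and the rest."""
    valid, invalid = [], []
    for name in requested:
        if name in accepted_names:
            valid.append(name)
        else:
            invalid.append(name)
    return valid, invalid


# Hypothetical local list of accepted names (examples from this help page).
accepted = {"orf19.2203", "ACT1", "CAL0001571"}
requested = ["ACT1", "NOT_A_GENE", "orf19.2203"]

valid, invalid = split_valid_invalid(requested, accepted)
print("would be retrieved:", valid)
print("would be excluded:", invalid)
```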

The different types of data that you requested are retrieved and stored in separate files that are listed in a table on the results page. Please note that not every requested data type will be available or applicable for every input Gene/Feature name. For example, the phenotype data type may not be available for all the features, while the data type 'ORF translation' is not applicable for DNA features like RNA genes. A Gene/Feature is not included in a file if there are no data of that type for it. The number of features included in each file is shown in the table on the results page.
The file name, which has the process ID appended to it, is descriptive of the type of data it contains, and the size of the file is displayed in the last column of the table. Clicking on the file name should download the file to the desktop of your local machine (depending on the browser). The file that contains your data will be available for 6 hours from the time it was requested.
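For downloaded files that are in FASTA format, the number of records can be checked against the feature count shown in the results table. The following is a minimal sketch that relies only on the general FASTA convention that each record begins with a line starting with ">". The sample text and record names are made up, and the layout of the header lines in actual CGD files is not assumed.

```python
def read_fasta(lines):
    """Return a dict mapping the first word of each header to its sequence."""
    chunks = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


# Made-up FASTA text standing in for a downloaded file.
sample = """>featA made-up description
ACGT
ACGT
>featB
TTGA
"""

records = read_fasta(sample.splitlines())
print(len(records), "records")
for record_name, seq in records.items():
    print(record_name, len(seq))
```

To read a real file, pass an open file handle instead of `sample.splitlines()`, for example `read_fasta(open(path))` where `path` is wherever your browser saved the download.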

Go to the Batch Download Tool
