CGD

CGD Help: Batch Data Download



Description

This resource allows simultaneous retrieval of DNA and protein sequences for a list of Standard gene names, Feature names, or CGDIDs (identifiers assigned at CGD). Data that can be retrieved using this tool are also included in one or more files on our Download site.

Using the Batch Download Tool

The Batch Download tool allows you to enter a list of Standard gene names, Feature names, or CGDIDs and select the types of data that you want to retrieve.

Step 1: Your Input
Two options are available in Step 1.
  1. Option 1: Enter Feature names (e.g., orf19.2203), Standard gene names (e.g., ACT1), or CGDIDs (e.g., CAL0001571) in the input box, or upload a file of Feature names, Standard gene names, or CGDIDs. An uploaded file should be saved as a text file with one Feature name, Standard gene name, or CGDID per line.
    OR
  2. Option 2: Enter chromosomal coordinates (from Assembly 21 or 20) or contig coordinates (from Assembly 19) to retrieve all features between these coordinates. You can either pick one chromosome or contig at a time, or upload a file of chromosomal regions in the following format:
     chromosome_number[tab]start_coordinate[tab]stop_coordinate
    
    If no coordinates are entered, all the features on the selected chromosome will be retrieved. Option 2 does not retrieve data for features that lie partially within and partially outside the input coordinates.
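The two upload formats described in Step 1 can be prepared with a short script. The following is a minimal Python sketch, not part of the CGD tool: the function names are hypothetical, and the check in `fully_within` assumes that region boundaries are inclusive, which this page does not state. It only illustrates the rule that features overlapping a region boundary are not retrieved.

```python
from pathlib import Path


def write_names_file(names, path):
    """Write one Feature name, Standard gene name, or CGDID per line."""
    # Drop surrounding whitespace and empty entries before writing.
    cleaned = [n.strip() for n in names if n.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def write_regions_file(regions, path):
    """Write chromosome_number<TAB>start_coordinate<TAB>stop_coordinate lines."""
    lines = []
    for chrom, start, stop in regions:
        # Coordinates are written as plain integers, separated by tabs.
        lines.append(f"{chrom}\t{int(start)}\t{int(stop)}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def fully_within(feature_start, feature_stop, region_start, region_stop):
    """Return True only if the feature lies entirely inside the region.

    Assumption: boundaries are inclusive. Start and stop are sorted first,
    so the check does not depend on which coordinate is larger.
    """
    f_lo, f_hi = sorted((feature_start, feature_stop))
    r_lo, r_hi = sorted((region_start, region_stop))
    return r_lo <= f_lo and f_hi <= r_hi
```

For example, `write_names_file(["orf19.2203", "ACT1", "CAL0001571"], "names.txt")` produces a three-line text file suitable for Option 1. The chromosome value and coordinates passed to `write_regions_file` are up to you; check the tool itself for the chromosome labels it accepts.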

Step 2: Choosing a Data type
For a given set of Feature names/Standard names/CGDIDs, you can simultaneously retrieve data for multiple data types. Data are always output in FASTA format except when using the Chromosomal Coordinates option. The following types of data may be retrieved:

Interpreting the results

If your input list of Feature/Gene names includes alias names, names that have been deleted or merged, or names that do not exist, an error message will be displayed. From this intermediate page you can either go back, edit your list to exclude those names, and resubmit, or click 'Proceed' to retrieve data for the remaining names in your list, with the invalid names excluded.
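If you choose to go back and edit a long list by hand, a small helper can remove the names that were reported as invalid. This is a minimal Python sketch under stated assumptions: `exclude_invalid` is a hypothetical helper, the invalid names are copied manually from the error message, and names are compared exactly after trimming whitespace.

```python
def exclude_invalid(names, invalid):
    """Return the input names in their original order, minus the invalid ones."""
    # Names reported as invalid, copied by hand from the error message.
    rejected = {n.strip() for n in invalid}
    kept = []
    for name in names:
        name = name.strip()
        # Skip blank entries and anything in the rejected set.
        if name and name not in rejected:
            kept.append(name)
    return kept


# "NOTAGENE1" is a made-up placeholder for a name the tool rejected.
cleaned = exclude_invalid(["orf19.2203", "NOTAGENE1", "ACT1"], ["NOTAGENE1"])
print("\n".join(cleaned))
```

The printed list can be pasted back into the input box or saved as a text file for upload.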

The different types of data that you requested are retrieved and stored in separate files, which are listed in a table on the results page. Please note that data for every requested type may not be available, or may not be applicable, for every input Gene/Feature name. For example, the phenotype data type may not be available for all features, while the data type 'ORF translation' is not applicable to DNA features such as ARS elements or RNA genes. A Gene/Feature is not included in a file if there is no data of that type for it. The number of features included in each file is shown in the table on the results page.
The file name, which has the process ID appended to it, describes the type of data the file contains, and the size of the file is displayed in the last column of the table. Clicking on the file name should download the file to your local machine (the save location depends on your browser). The file that contains your data will be available for 6 hours from the time it was requested.
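After downloading a FASTA-format result file, you can compare the number of records it contains with the number of features shown in the results table. The following is a minimal Python sketch that uses only the standard library. The function name `read_fasta` is hypothetical, and no assumption is made about the layout of the header lines beyond the leading `>`. It does not apply to files produced with the Chromosomal Coordinates option, which are not in FASTA format.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:].strip(), []
            elif line and header is not None:
                # Sequence lines are concatenated; blank lines are skipped.
                chunks.append(line)
    # Store the final record after the last line has been read.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling `len(read_fasta(path))` on a downloaded file gives the record count to check against the table.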

Go to the Batch Download Tool

