Before uploading or using any data, you must create a project. Each piece of data on the platform lives inside a project. You can create a project by clicking on New Project on the Project list page.
There are three options for uploading your data in the Add menu:
Upload Data from Computer - Use your web browser to upload data from your computer. You have to stay logged in and keep the browser open until the upload has completed.
Add Data from Server - Specify the URL of a globally accessible server from which the file will be transferred to the platform.
Copy Data from Project - Copy data from another project on the platform.
NOTE: For very large files, the Upload Agent, a command-line tool, is the fastest and most reliable way to transfer data to the platform.
From the Projects tab in the main toolbar, click Resources and select the folder named Demo Data.
Select the Quickstart folder. This folder contains two files with paired-end sequencing reads from chromosome 20 of exome SRR100022 (SRR100022_20_1.fq.gz and SRR100022_20_2.fq.gz) from the 1000 Genomes Project. The full exome data are available in the SRR100022 folder, but for the sake of this demo, we use the smaller dataset.
Select all of the data in the Quickstart folder by checking the box next to the Name header.
Finally, click Add to add the data to your project.
You can collaborate within the platform by sharing projects with other DNAnexus users at various access levels. To share a project with a collaborator, click the Share icon in the upper right corner of the project page.
Type the username or the email address of an existing DNAnexus user.
Pick an access level.
Click Add User.
Repeat the above steps to add more users at the same time.
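Access levels are cumulative: each higher level includes the permissions of the levels below it. The following minimal sketch models that ordering, assuming the four project permission levels DNAnexus uses (VIEW, UPLOAD, CONTRIBUTE, ADMINISTER). The `AccessLevel` enum and the `has_at_least` helper are hypothetical illustrations, not part of any DNAnexus API.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Project access levels, ordered from least to most privileged.

    Assumption: the names mirror DNAnexus project permission levels;
    the numeric values exist only to make the ordering comparable here.
    """
    VIEW = 1
    UPLOAD = 2
    CONTRIBUTE = 3
    ADMINISTER = 4


def has_at_least(granted: AccessLevel, required: AccessLevel) -> bool:
    """Return True if a collaborator's granted level satisfies the required one."""
    return granted >= required


# A collaborator added at CONTRIBUTE also has VIEW and UPLOAD rights,
# but not the rights reserved for ADMINISTER.
collaborator = AccessLevel.CONTRIBUTE
print(has_at_least(collaborator, AccessLevel.VIEW))        # True
print(has_at_least(collaborator, AccessLevel.ADMINISTER))  # False
```

Using an ordered enum keeps the "includes everything below it" rule in a single comparison instead of a table of per-level permissions.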
You can analyze the Quickstart data imported earlier by running the provided DNAnexus apps. Select Tools Library from the Tools menu in the top navigation to see which apps are available.
On the Apps page, you will see a list of apps. Installing an app bookmarks the app for your later reference and makes it easy to add that app to workflows. Install the following two apps:
BWA-MEM FASTQ Read Mapper
Vendor Human Exome GATK-Lite Pipeline
Apps and workflows run within a particular project and place any output objects in that same project. Let's return to your project page to run the apps you've just installed.
Select Projects in the top navigation.
Select your project.
In the Manage view for a project, you can run an app by clicking on one of the two buttons:
Run - This lets you pick a single app to run, or create a single-use, disposable workflow.
New Workflow - This creates a new workflow capable of running multiple stages of apps. This allows you to string together multiple analysis steps that depend on each other.
Let's build a workflow using the two apps we've installed.
Click New Workflow.
Click Add a Step in the view that opens.
Click on BWA-MEM FASTQ Read Mapper and Vendor Human Exome GATK-Lite Pipeline, in that order. Note: the workflow progresses from top to bottom. If you added apps in the wrong order, you can click and drag on steps in the workflow to change their order.
Set the inputs for the BWA-MEM FASTQ Read Mapper step.
Click on the box for Reads (*.fq.gz). Add the SRR100022_20_1.fq.gz file. Note that the workflow only allows you to add files that match the file extensions specified by the input.
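The extension restriction behaves like a shell-style filename pattern: an input declared as *.fq.gz only accepts names ending in .fq.gz. Here is a minimal sketch of that idea using Python's standard `fnmatch` module. It is not the platform's actual implementation, and both `matching_files` and the extra `notes.txt` file are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_files(filenames, pattern):
    """Keep only the files whose names match an input's extension pattern.

    fnmatchcase is used so matching is case-sensitive on every OS.
    """
    return [name for name in filenames if fnmatchcase(name, pattern)]


project_files = [
    "SRR100022_20_1.fq.gz",
    "SRR100022_20_2.fq.gz",
    "notes.txt",  # hypothetical file that the Reads input would reject
]
print(matching_files(project_files, "*.fq.gz"))
# ['SRR100022_20_1.fq.gz', 'SRR100022_20_2.fq.gz']
```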
Since the SRR100022 exome was sequenced using paired-end sequencing, we also need to provide the right mates of the first set of reads. Click on the box for Reads (right mates) (*.fq.gz). Add the SRR100022_20_2.fq.gz file. If you are using your own data from a single-end sequencing experiment, this input is optional.
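Paired-end runs usually produce two files per sample that differ only in a mate number. The sketch below groups files into left and right mates, assuming the `_1`/`_2` suffix convention used by the SRR100022 demo files. `pair_reads` is a hypothetical helper, and other datasets may name their mates differently.

```python
import re
from collections import defaultdict

# Assumption: left mates end in "_1.fq.gz" and right mates in "_2.fq.gz",
# as in the SRR100022 demo files.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fq\.gz$")


def pair_reads(filenames):
    """Group FASTQ files into (left, right) tuples keyed by sample name.

    A missing mate is reported as None, e.g. when no right-mate file exists.
    Files that do not follow the naming convention are ignored.
    """
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            mates[match["sample"]][match["mate"]] = name
    return {
        sample: (files.get("1"), files.get("2"))
        for sample, files in mates.items()
    }


print(pair_reads(["SRR100022_20_1.fq.gz", "SRR100022_20_2.fq.gz"]))
# {'SRR100022_20': ('SRR100022_20_1.fq.gz', 'SRR100022_20_2.fq.gz')}
```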
Click on the box for BWA reference genome index. Note that a field opens up and displays Suggestions for input files. Select the Reference Genomes link from the Suggestions (the name may vary depending on your region) and navigate to the folder named H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I). Select the
Note that the BWA-MEM FASTQ Read Mapper app only accepts an index file with the extension *.bwa-index.tar.gz, which is a TAR archive containing all the index files previously produced by the BWA indexer. (Indexing is a one-time operation that must be performed on a reference genome sequence before BWA can use it.)
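The BWA indexer (`bwa index`) writes five files alongside the reference FASTA, with the suffixes .amb, .ann, .bwt, .pac, and .sa. As an illustration, the following sketch checks that a gzipped TAR archive contains all five using only Python's standard `tarfile` module. The helper names and the flat archive layout are assumptions, and the platform's own archives may be organized differently.

```python
import io
import tarfile

# The five files that `bwa index` writes for a reference FASTA.
BWA_INDEX_SUFFIXES = {".amb", ".ann", ".bwt", ".pac", ".sa"}


def missing_index_suffixes(archive_bytes: bytes) -> set:
    """Return the BWA index suffixes absent from a gzipped TAR archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        names = tar.getnames()
    found = {
        suffix for suffix in BWA_INDEX_SUFFIXES
        if any(name.endswith(suffix) for name in names)
    }
    return BWA_INDEX_SUFFIXES - found


def build_demo_archive(prefix: str) -> bytes:
    """Create a small in-memory .tar.gz with empty placeholder index files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for suffix in sorted(BWA_INDEX_SUFFIXES):
            info = tarfile.TarInfo(name=prefix + suffix)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    return buffer.getvalue()


archive = build_demo_archive("reference")
print(missing_index_suffixes(archive))  # set()
```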
NOTE: The data selector for app inputs can contain a Suggestions section. For inputs that may use public reference data (e.g., reference genomes, indices for DNAnexus-provided read mappers, gene annotations), apps provided by DNAnexus often suggest a folder containing this data for your convenience.
You can learn more about the BWA-MEM FASTQ Read Mapper app, or configure it further, by clicking on the black box with the app's icon, which opens its parameters. In this case, the parameters correspond to options you could supply if you ran the BWA-MEM program locally.
Now, let's set the input for the Vendor Human Exome GATK-Lite Pipeline step. Drag the output of the BWA-MEM FASTQ Read Mapper app to the Sorted Mappings (*.bam) input of the Vendor Human Exome GATK-Lite Pipeline app to indicate that the two apps should be connected in series.
The Vendor Human Exome GATK-Lite Pipeline app requires an additional setting that specifies which vendor exome kit was used to sequence the reads; you cannot run the analysis until it is set. Click on the black box with the app's icon to open its parameters. Note that the Vendor Exome field is in bold with a * on the right side, indicating a required input. The SRR100022 exome was originally sequenced using the Agilent SureSelect Human All Exon V2 kit, which is available as an option in the dropdown menu; select it.
Select Save to save your configurations and close the dialogue box.
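Conceptually, a stage cannot launch until every required input has a value. The hypothetical sketch below models that check for the GATK-Lite step. `StageConfig` is not a DNAnexus class, and treating Sorted Mappings as required alongside Vendor Exome is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class StageConfig:
    """Hypothetical model of a workflow stage's inputs (not a DNAnexus API)."""
    name: str
    required: set
    values: dict = field(default_factory=dict)

    def missing_inputs(self):
        """Return the required inputs that have not been set yet, sorted by name."""
        return sorted(key for key in self.required if self.values.get(key) is None)


gatk = StageConfig(
    name="Vendor Human Exome GATK-Lite Pipeline",
    required={"Sorted Mappings", "Vendor Exome"},  # assumption: both required
    values={"Sorted Mappings": "<output of BWA-MEM stage>"},
)
print(gatk.missing_inputs())  # ['Vendor Exome']

gatk.values["Vendor Exome"] = "Agilent SureSelect Human All Exon V2"
print(gatk.missing_inputs())  # []
```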
You are now ready to run the workflow by selecting Start Analysis. This will launch both stages of the workflow. In this example, the BWA-MEM FASTQ Read Mapper app will start executing immediately, and the Vendor Human Exome GATK-Lite Pipeline app will start after the first job has finished.
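Because the GATK-Lite stage consumes the BWA-MEM stage's output, the two stages form a dependency graph, and a stage can start only once its inputs exist. The following sketch derives that launch order with Python's standard `graphlib` module (Python 3.9+). It models the scheduling idea only and is not how the platform actually schedules jobs; `launch_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose outputs it consumes.
# Connecting BWA-MEM's output to the GATK-Lite "Sorted Mappings" input
# makes the second stage depend on the first.
workflow = {
    "BWA-MEM FASTQ Read Mapper": set(),
    "Vendor Human Exome GATK-Lite Pipeline": {"BWA-MEM FASTQ Read Mapper"},
}


def launch_batches(stages):
    """Yield groups of stages that can run at the same time, in dependency order."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        # In this sketch every stage "finishes" as soon as it is yielded;
        # a real scheduler would call done() only when the job completes.
        sorter.done(*ready)


for batch in launch_batches(workflow):
    print(batch)
# ['BWA-MEM FASTQ Read Mapper']
# ['Vendor Human Exome GATK-Lite Pipeline']
```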
After starting the analysis, you can monitor its progress by selecting the Monitor tab. This page lists all the jobs that have been launched in your project.
When the workflow completes, the outputs will be deposited into a new folder in your project with the same name as your workflow. (You can also select a different folder for the outputs to go into by clicking on the app in the workflow and setting the Output Folder field.)
If you wish, you can run this workflow on the full SRR100022 exome, available in the SRR100022 folder in the Demo Data project. Mapping the reads and calling variants on this larger dataset with the same workflow will take proportionately longer.