Before uploading or using any data, you must create a project. Each piece of data on the platform lives inside a project. You can create a project by clicking on New Project on the Project list page.
There are three options for uploading your data in the Add menu:
Upload Data from Computer - Use your web browser to upload data from your computer. You have to stay logged in and keep the browser open until the upload has completed.
Add Data from Server - Specify the URL of a globally accessible server from which the file will be uploaded to the platform.
Copy Data from Project - Copy data from another project on the platform.
NOTE: For very large files, the Upload Agent, available as a command-line tool, is the fastest and most reliable way to transfer data to the platform.
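For readers who prefer the command line, the project-creation and browser-upload steps above can be approximated with the dx client from the DNAnexus toolkit. The sketch below only builds the command lines as argument lists and prints them; it never contacts the platform. The project name my-quickstart and the file reads.fq.gz are placeholders, and the options shown are an assumption to verify against the help output of dx new project and dx upload for your installed version.

```python
import shlex


def new_project_cmd(name):
    """Argument list for creating a project with the dx client."""
    return ["dx", "new", "project", name]


def upload_cmd(local_path, project, folder="/"):
    """Argument list for uploading one local file into a project folder."""
    # The destination is written as "<project>:<folder>" (assumed path syntax).
    return ["dx", "upload", local_path, "--path", f"{project}:{folder}"]


def render(cmd):
    """Quote an argument list so it can be pasted into a POSIX shell."""
    return shlex.join(cmd)


if __name__ == "__main__":
    # Placeholder names; substitute your own project and file.
    print(render(new_project_cmd("my-quickstart")))
    print(render(upload_cmd("reads.fq.gz", "my-quickstart")))
```

Keeping the commands as lists rather than strings means they can be handed to subprocess.run unchanged once you are ready to execute them for real.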
From the Projects tab in the main toolbar, click All Projects and search for the project named Demo Data.
Select the Quickstart folder. This folder contains 2 files with the paired-end sequencing reads from chromosome 20 of exome SRR100022 (SRR100022_20_1.fq.gz and SRR100022_20_2.fq.gz) from the 1000 Genomes project. The full exome data are available in the SRR100022 folder, but for the sake of this demo, we use the smaller dataset.
Select all of the data in the Quickstart folder by checking the box next to the Name header.
Finally, click Add to add the data to your project.
You can collaborate within the platform by sharing projects with other DNAnexus users at various access levels. To share a project with a collaborator, click the Share icon in the upper right corner of the project page.
Type the username or the email address of an existing DNAnexus user.
Pick an access level.
Click Add User.
Repeat the above steps to add more users at the same time.
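The sharing steps above amount to attaching an access level to each collaborator. The sketch below models that as a small ordered enumeration. The level names VIEW, UPLOAD, CONTRIBUTE, and ADMINISTER are taken from DNAnexus's published project permission levels rather than from this text, and treating each level as including the ones below it is a simplifying assumption. The usernames are hypothetical.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Project access levels, ordered from least to most privileged."""
    VIEW = 1
    UPLOAD = 2
    CONTRIBUTE = 3
    ADMINISTER = 4


def allows(granted, required):
    """True if a user holding `granted` may do something needing `required`."""
    return granted >= required


def add_members(members, invitees, level):
    """Return a new membership dict with every invitee set to `level`."""
    updated = dict(members)  # copy so the caller's dict is left untouched
    for user in invitees:
        updated[user] = level
    return updated


if __name__ == "__main__":
    # Hypothetical usernames, added in one pass at the same level.
    members = add_members({}, ["alice", "bob"], AccessLevel.VIEW)
    print(members)
    print(allows(members["alice"], AccessLevel.UPLOAD))
```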
You can analyze the Quickstart data we imported earlier by running the provided DNAnexus apps. Select Tools Library from the Tools menu in the top navigation to see a list of available apps.
On the Apps page, you will see a list of apps. Installing an app bookmarks the app for your later reference and makes it easy to add that app to workflows. Install the following two apps: BWA-MEM FASTQ Read Mapper and FreeBayes Variant Caller.
Apps and workflows are run from a particular project and put any output objects in the same project. Let's return to your project page to run the apps you've just installed.
Select Projects in the top navigation.
Select your project.
In the Manage view for a project, you can run an app by clicking on one of the two buttons:
Start Analysis - This lets you pick a single app to run, or create a single-use disposable workflow.
Add/New Workflow - This creates a new workflow capable of running multiple stages of apps. This allows you to string together multiple analysis steps that depend on each other.
Let's build a workflow using the two apps we've installed.
Click Add/New Workflow.
Click Add a Step in the view that opens.
Click on BWA-MEM FASTQ Read Mapper and FreeBayes Variant Caller, in that order. Note: the workflow progresses from top to bottom. If you added apps in the wrong order, you can click and drag on steps in the workflow to change their order.
Set the inputs for the BWA-MEM FASTQ Read Mapper step.
Click on the box for Reads (*.fq.gz). Add the SRR100022_20_1.fq.gz file. Note that the workflow only allows you to add files that match the file extensions specified by the input.
Since the SRR100022 exome was sequenced using paired-end sequencing, we will need to provide the right mates of the first set of reads. Click on the box for Reads (right mates) (*.fq.gz). Add the SRR100022_20_2.fq.gz file. If you are using your own data and your data is from a single-end sequencing experiment, this input is optional.
Click on the box for BWA reference genome index. Note that a field opens up and displays Suggestions for input files. Select the Reference Genomes link from the Suggestions (the name may vary depending on your region) and navigate to the folder named H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I). Select the BWA index file in that folder.
Note that the BWA-MEM FASTQ Read Mapper app only takes as input a file with the extension *.bwa-index.tar.gz, which is a TAR archive containing all the sequence index files previously output by the BWA indexer. (Indexing is a one-time operation that must be performed on a reference genome sequence before BWA can use it.)
You can find more information about the app and/or configure the BWA-MEM FASTQ Read Mapper app further by clicking on the black box with the icon for the app, which opens the parameters. In this case, the parameters correspond to options that you could supply if you were to run the BWA-MEM program locally.
NOTE: The data selector for app inputs can contain a Suggestions section. For inputs that may use public reference data (e.g., reference genomes, indices for DNAnexus-provided read mappers, gene annotations), DNAnexus-provided apps often provide a path to a folder containing this data for your convenience.
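The extension filtering described in the BWA-MEM input steps can be pictured as glob matching of file names against the patterns each input declares. The following is a minimal sketch of that idea, not the platform's actual implementation. The patterns come from the text above, while the dictionary keys reuse the on-screen input labels and example.bwa-index.tar.gz is a made-up file name.

```python
from fnmatch import fnmatchcase

# Patterns per input, as shown next to each input box in the walkthrough.
INPUT_PATTERNS = {
    "Reads": ["*.fq.gz"],
    "Reads (right mates)": ["*.fq.gz"],
    "BWA reference genome index": ["*.bwa-index.tar.gz"],
}


def selectable(filename, patterns):
    """True if the file name matches at least one of the input's patterns."""
    # fnmatchcase is case-sensitive on every operating system.
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    candidates = [
        "SRR100022_20_1.fq.gz",
        "human_g1k_v37.fa.gz",
        "example.bwa-index.tar.gz",  # hypothetical index file name
    ]
    for label, patterns in INPUT_PATTERNS.items():
        accepted = [name for name in candidates if selectable(name, patterns)]
        print(label, "->", accepted)
```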
Now, let's set the input for the FreeBayes Variant Caller step. Drag the output of the BWA-MEM FASTQ Read Mapper app to the Sorted Mappings (*.bam) input of the FreeBayes Variant Caller app to indicate that the two apps should be connected in series.
The FreeBayes Variant Caller app has an additional required Genome input to specify a file, in gzipped FASTA format, with the reference genome that the reads were mapped against. Click on the Genome input field, select the DNAnexus Reference Genomes: AWS US-east project from the "Suggestions" area at the bottom of the file selector, then select the H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)/human_g1k_v37.fa.gz file.
Select Save to save your configurations and close the dialogue box.
You are now ready to run the workflow by selecting Start Analysis, followed by Run Analysis to launch both stages of the workflow. In this example, the BWA-MEM FASTQ Read Mapper app will start executing immediately, and the FreeBayes Variant Caller app will start after the BWA-MEM FASTQ Read Mapper job has finished.
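The run order described above follows from the link between the two stages: FreeBayes consumes the mapper's output, so it cannot start until the mapper has finished. A conceptual sketch of that dependency, using Python's standard graphlib module, is shown below. It models the idea only and makes no claim about how the platform schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages whose output it consumes.
WORKFLOW = {
    "BWA-MEM FASTQ Read Mapper": set(),
    "FreeBayes Variant Caller": {"BWA-MEM FASTQ Read Mapper"},
}


def execution_order(dependencies):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def ready_stages(dependencies, finished):
    """Stages not yet finished whose upstream stages have all finished."""
    return [
        stage
        for stage, upstream in dependencies.items()
        if stage not in finished and upstream <= finished
    ]


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    # Nothing has finished yet, so only the mapper can start.
    print(ready_stages(WORKFLOW, set()))
    # Once the mapper is done, the variant caller becomes runnable.
    print(ready_stages(WORKFLOW, {"BWA-MEM FASTQ Read Mapper"}))
```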
After starting the job, you can monitor its progress by selecting the Monitor tab. This page lists all the jobs that have been launched in your project.
When the workflow completes, the outputs will be deposited into a new folder in your project with the same name as your workflow. (You can also select a different folder for the outputs to go into by clicking on the app in the workflow and setting the Output Folder field.)
If you wish, you can run this workflow on the full SRR100022 exome, available in the SRR100022 folder in the Demo Data project. Mapping the reads and calling variants on this larger dataset with the same workflow will take proportionately longer.