Command Line Quickstart
Learn to use the dx client for command-line access to the full range of DNAnexus Platform features.
The dx command-line client is included in the DNAnexus SDK (dx-toolkit). You can use the dx client to log into the Platform; to upload, browse, and organize data; and to launch analyses.
All the projects and data referenced in this Quickstart are publicly available, so you can follow along step-by-step.
Before You Begin
If you haven't already done so, download and install the DNAnexus Platform toolkit, which includes the dx command-line client as well as a range of useful utilities.
Getting Help
As you work, use this index of dx commands as a reference.
On the command line, you can also enter dx help to see a list of commands, broken down by category. To see a list of commands from a particular category, enter dx help <category>.
To learn what a particular command does, enter dx help <command>, dx <command> -h, or dx <command> --help. For example, enter dx help ls to learn about the command dx ls.
Step 1: Log In
The first thing you'll need to do is to log in. If you haven't created a DNAnexus account yet, visit the website and sign up. User signup is not supported on the command line.
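A minimal sketch of the login step, assuming the dx client is installed. dx login prompts interactively for your credentials, and dx whoami then prints the user of the current session. The wrapper name login_and_confirm is made up for this example and is not part of dx.

```shell
# Hypothetical wrapper (not part of dx): log in, then confirm who is logged in.
login_and_confirm() {
  dx login &&   # interactive: prompts for username and password
  dx whoami     # prints the username of the current session
}
```

Call login_and_confirm from an interactive shell; nothing runs until you do.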
Once you have logged in, your authentication token and your current project settings are saved in a local configuration file, and you're ready to start accessing your project.
Step 2: Explore
Public Projects
Let's look inside some of the public projects that have already been set up. From the command line, enter the command:
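One way to do this is sketched here, on the assumption that dx select accepts --public (include public projects) and --name (filter by a wildcard name pattern). The helper name select_public_project is hypothetical.

```shell
# Hypothetical wrapper: pick a public project whose name matches a pattern.
# dx lists the matching projects and asks you to choose one.
select_public_project() {
  dx select --public --name "$1"
}
# Example call:
#   select_public_project "Reference Genome Files*"
```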
By running the dx select command and picking a project, you've now done the command-line equivalent of going to the project page for Reference Genome Files: AWS US (East) (platform login required to access this link) on the website. This is a DNAnexus-sponsored project containing popular genomes for you to use when running analyses with your own data.
For more information about the dx select command, please see the Changing Your Current Project page.
Now you can list all of the data in the top-level directory of the project you've just selected by running the command dx ls. You can also see the contents of a folder by running the command dx ls <folder_name>.
You can avoid typing out the full name of the folder by typing in dx ls C and then pressing <TAB>. The folder name will auto-complete from there.
You don't have to be in a project to inspect its contents. You can also look into another project, and a folder within the project, by giving the project name or ID, followed by a colon (:) and the folder path. Here, we list the contents of the publicly available project "Demo Data" using both its name and ID.
You can use the -l flag in conjunction with dx ls to list more details about files, such as the time a file was last modified, its size (if applicable), and its full DNAnexus ID.
Describing DNAnexus Objects
You can use the dx describe command to learn more about files and other objects on the platform. Given a DNAnexus object ID or name, dx describe will return detailed information about the object in question. dx describe will only return results for data objects to which you have access.
Besides describing data and projects (examples for which are shown below), you can also describe apps, jobs, and users.
Describing a File
Below, we describe the reference genome file for C. elegans located in the "Reference Genome Files: AWS US (East)" project that we've been using (which should be accessible from other regions as well). Note that you need to add a colon (:) after the project name to separate it from the file path. Because this project's name itself contains a colon, that colon must be escaped with a backslash, so the project prefix becomes Reference Genome Files\: AWS US (East):.
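The escaping rule can be sketched as two small helpers. Both names are hypothetical, and the file path in the example call is a placeholder, not the real location of the ce10 file.

```shell
# Hypothetical helper: backslash-escape colons so a project name containing
# ":" is not mistaken for the project/path separator.
escape_colons() {
  printf '%s' "$1" | sed 's/:/\\:/g'
}

# Describe a file given a project name and a path within that project.
describe_in_project() {
  dx describe "$(escape_colons "$1"):$2"
}
# Example call (the file path is a placeholder):
#   describe_in_project "Reference Genome Files: AWS US (East)" "/path/to/file"
```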
Describing a Project
Below, we describe the publicly available Reference Genome Files project that we've been using.
Step 3: Create Your Own Project
Now, we'll use the command dx new project to create a new project.
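For scripting, a sketch along these lines may be handy. It assumes --brief makes dx new project print only the new project ID; the helper name and the project name in the example call are made up.

```shell
# Hypothetical helper: create a project, then make it the current project.
create_and_select_project() {
  # --brief prints just the project ID, which is easy to capture.
  project_id=$(dx new project "$1" --brief) || return 1
  dx select "$project_id"
}
# Example call:
#   create_and_select_project "My First Project"
```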
The text project-xxxx denotes a placeholder for a unique, immutable project ID. For more information about object IDs, see the Entity IDs page.
You're now ready to start uploading your data and running your own analyses.
Step 4: Upload and Manage Your Data
If you have a sample you would like to analyze, you can use the dx upload command or the Upload Agent if you have installed it. For the purposes of this tutorial, you can also download the file small-celegans-sample.fastq, which represents the first 25,000 C. elegans reads from SRR070372. We will use this file again later to run through a sample analysis.
For uploading multiple or large files, we strongly recommend that you use the Upload Agent; it compresses your files, uploads them in parallel over multiple HTTP connections, and offers other features such as resumable uploads.
Running dx upload small-celegans-sample.fastq --wait uploads the small-celegans-sample.fastq file into the current directory of the current project. The --wait flag tells dx upload to wait until it has finished uploading the data before returning the prompt and describing the result.
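When scripting, you may prefer to capture the new file's ID instead of the full description. This sketch assumes --brief makes dx upload print only the file ID; upload_and_wait is a hypothetical name.

```shell
# Hypothetical helper: upload a local file into the current folder of the
# current project, block until the upload completes, and print the file ID.
upload_and_wait() {
  dx upload "$1" --wait --brief
}
# Example call:
#   file_id=$(upload_and_wait small-celegans-sample.fastq)
```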
Examining Data
To take a quick look at the first few lines of the file you just uploaded, use the dx head command. By default, it prints the first 10 lines of the given file.
Let's run it on the file we just uploaded and use the -n flag to ask for the first 12 lines (the first 3 reads) of the FASTQ file.
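The lines-to-reads arithmetic (4 lines per FASTQ read) can be folded into a small helper. The name head_reads is hypothetical.

```shell
# Hypothetical helper: print the first N reads of a FASTQ file on the platform.
# Each FASTQ read spans 4 lines, so N reads means 4*N lines.
# $1 = file name or ID, $2 = number of reads.
head_reads() {
  dx head -n "$(( $2 * 4 ))" "$1"
}
# Example call (first 3 reads, i.e. 12 lines):
#   head_reads small-celegans-sample.fastq 3
```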
Downloading Data
If you'd like to download a file from the platform, just use the dx download command. This command will use the name of the file for the filename unless you specify your own with the -o/--output flag. For example, dx download small-celegans-sample.fastq downloads the same C. elegans file that we uploaded previously.
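A sketch of downloading under a different local name with -o. The helper name and the local filename in the example call are hypothetical.

```shell
# Hypothetical helper: download a platform file to a chosen local filename.
# $1 = file name or ID on the platform, $2 = local output filename.
download_as() {
  dx download "$1" -o "$2"
}
# Example call:
#   download_as small-celegans-sample.fastq local-copy.fastq
```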
About Metadata
Files have different available fields for metadata, such as "properties" (key-value pairs) and "tags".
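As a rough illustration, tags and properties can be attached with dx tag and dx set_properties. The helper name, the tag, and the key=value pair in the example call are all invented.

```shell
# Hypothetical helper: add one tag and one property to a file.
# $1 = file name or ID, $2 = tag, $3 = property as key=value.
annotate_file() {
  dx tag "$1" "$2" &&
  dx set_properties "$1" "$3"
}
# Example call (tag and property values are made up):
#   annotate_file small-celegans-sample.fastq tutorial organism=celegans
```

Running dx describe on the file afterwards should list the new tag and property.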
Step 5: Analyze a Sample
For the next few steps, if you would like to follow along, you will need a C. elegans FASTQ file. We will map the reads against the ce10 genome. If you haven't already, you can download and use the following FASTQ file, which contains the first 25,000 reads from SRR070372: small-celegans-sample.fastq.
The following walkthrough is helpful if you would like to understand what all the commands do and take a look at what apps you're running. If you're just interested in converting a gzipped FASTQ file to a VCF file via BWA and the FreeBayes variant caller, you can skip ahead to the Automation section, where you can see all the commands necessary for running apps.
Uploading Reads
If you have not yet done so, you can upload a FASTQ file for analysis.
For more information about using the command dx upload, please see the dx upload page.
Mapping Reads
Next, use the BWA-MEM app (platform login required to access this link) to map the uploaded reads file to a reference genome.
Finding the App Name
If you don't know the command-line name of the app you would like to run, you have two options:
You can navigate to its web page from the Apps page (platform login required to access this link) on the platform. The app's page will tell you how to run it from the command line. You can find more information about the app we're running on the BWA-MEM FASTQ Read Mapper page (platform login required to access this link).
Alternatively, you can search for apps from the command line by running the command dx find apps. The name of the app to use on the command line appears in parentheses in the output.
Installing and Running the App
Now install the app using dx install and check that it has been installed. While you do not always need to install an app to run it, you may find it useful as a bookmarking tool.
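A sketch of the install-and-check step. It assumes the BWA-MEM app's command-line name is bwa_mem_fastq_read_mapper (confirm this with dx find apps) and that dx find apps accepts --installed; the helper name is hypothetical.

```shell
# Hypothetical helper: install an app, then list the apps you have installed.
install_and_verify() {
  dx install "$1" &&
  dx find apps --installed
}
# Example call (app name is an assumption; confirm with `dx find apps`):
#   install_and_verify bwa_mem_fastq_read_mapper
```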
We can now run the app using dx run. We will run it without any arguments; it will then prompt us for required and then optional arguments. Note that the reference file genomeindex_targz for the C. elegans sample we are using is in a .tar.gz format and can be found in the Reference Genome folder of the region your project is in.
Monitoring Your Job
You can use the command dx watch to monitor jobs. The command will print out the log file of the job, including the STDOUT, STDERR, and INFO printouts.
You can also use the command dx describe job-xxxx to learn more about your job. If you don't know the job's ID, you can use the command dx find jobs to list all the jobs run in the current project, along with the user who ran them, their status, and when they began.
There are also additional options that you can use to restrict your search of previous jobs, such as by their names or when they were run.
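A sketch of such a restricted search, assuming dx find jobs accepts --name (with wildcards) and --created-after (with relative values such as -1d). The value is attached with = so that a leading minus sign is not read as a separate option. The helper name is hypothetical.

```shell
# Hypothetical helper: list jobs by name pattern and creation time.
# $1 = job name pattern, $2 = time bound, e.g. -1d for "within the last day".
recent_jobs() {
  dx find jobs --name "$1" "--created-after=$2"
}
# Example call:
#   recent_jobs "BWA*" -1d
```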
Terminating Your Job
If for some reason you need to terminate your job before it completes, use the command dx terminate.
After Your Job Finishes
You should now see two new files in your project: the mapped reads in a BAM file, and an index of that BAM file with a .bai extension. You can refer to the output file by name or by the job that produced it using the syntax job-xxxx:<output field>. Try it yourself with the job ID you got from calling the BWA-MEM app!
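The job-xxxx:<output field> syntax can be sketched with dx describe. The output field name sorted_bam in the example call is an assumption about the BWA-MEM app's output spec; check dx describe job-xxxx for the actual field names.

```shell
# Hypothetical helper: describe the object a job produced in a given output field.
# $1 = job ID, $2 = output field name.
describe_job_output() {
  dx describe "$1:$2"
}
# Example call (job-xxxx is a placeholder; sorted_bam is an assumed field name):
#   describe_job_output job-xxxx sorted_bam
```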
Variant Calling
You can use the FreeBayes Variant Caller app (platform login required to access this link) to call variants on your BAM file.
This time, we won't rely on the interactive mode to enter our inputs. Instead, we will provide them directly. But first, let's look up the app's spec so we know what the inputs are called. For this, let's run the command dx run freebayes -h.
Optional inputs are shown using square brackets ([]) around the command-line syntax for each input. You'll notice that there are two required inputs that must be specified:
Sorted mappings (sorted_bams): A list of files with a .bam extension.
Genome (genome_fastagz): A reference genome in FASTA format that has been gzipped.
Running the App with a One-Liner Using a Job-Based Object Reference
It is sometimes more convenient to run apps using a single one-line command. You can do this by specifying all the necessary inputs either via the command line or in a prepared file. We will use the -i flag to specify inputs as suggested by the output of dx run freebayes -h:
sorted_bams: The output of the previous BWA step (see the Mapping Reads section for more information).
genome_fastagz: The ce10 genome in the Reference Genomes project.
To specify new job input using the output of a previous job, we'll use a job-based object reference via the job-xxxx:<output field> syntax we used earlier.
Replace the job ID below with that generated by the BWA app you ran earlier. The -y flag skips the input confirmation.
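A sketch of the one-liner, wrapped in a hypothetical helper so the job ID and genome path are passed in as arguments. The output field name sorted_bam is an assumption about the BWA-MEM app, and the genome path in the example call is a placeholder.

```shell
# Hypothetical helper: run FreeBayes on the BAM produced by an earlier job.
# $1 = ID of the BWA-MEM job, $2 = path or ID of the gzipped reference FASTA.
run_freebayes() {
  dx run freebayes \
    -isorted_bams="$1:sorted_bam" \
    -igenome_fastagz="$2" \
    -y
}
# Example call (both arguments are placeholders):
#   run_freebayes job-xxxx "Reference Genome Files\: AWS US (East):/path/to/ce10.fasta.gz"
```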
Automatically Running a Command After a Job Finishes
You can use the command dx wait to wait for a job to finish. For example, if you chain dx wait job-xxxx with dx find jobs right after launching the FreeBayes app, the recent jobs are listed only after the job has finished.
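The chaining idea can be sketched as a helper that waits for a job and then runs whatever command follows. The name after_job is hypothetical.

```shell
# Hypothetical helper: wait for a job, then run the remaining arguments
# as a command, but only if the wait succeeded.
after_job() {
  job_id="$1"; shift
  dx wait "$job_id" && "$@"
}
# Example call (job-xxxx is a placeholder):
#   after_job job-xxxx dx find jobs
```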
Congratulations! You have now called variants on a reads sample, and you did it all on the command line. Now let's look at how you can automate this process.
Automation
The beauty of the CLI is the ability to automate processes. In fact, we can automate everything we just did. A script for this can assume that you've already logged in, hardcode the ce10 genome, and take a local gzipped FASTQ file as its command-line argument.
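What follows is a sketch of such a pipeline, not the original script. Several things are assumptions: the app name bwa_mem_fastq_read_mapper, the input field reads_fastqgz, and the output fields sorted_bam and variants_vcfgz. Because the exact ce10 paths are not given here, the sketch reads them from two variables, BWA_INDEX and GENOME_FASTA, that you set yourself.

```shell
# Sketch of an upload -> BWA-MEM -> FreeBayes pipeline.
# Assumes you are logged in and that BWA_INDEX and GENOME_FASTA hold the
# platform paths or IDs of the ce10 BWA index (.tar.gz) and FASTA (.gz).
# $1 = local gzipped FASTQ file.
run_pipeline() {
  reads_local="$1"
  # Upload the reads and capture the new file ID.
  reads_id=$(dx upload "$reads_local" --brief --wait) || return 1
  # Map the reads; --brief prints only the job ID.
  bwa_job=$(dx run bwa_mem_fastq_read_mapper \
    -ireads_fastqgz="$reads_id" \
    -igenomeindex_targz="$BWA_INDEX" -y --brief) || return 1
  # Call variants on the BAM via a job-based object reference.
  fb_job=$(dx run freebayes \
    -isorted_bams="$bwa_job:sorted_bam" \
    -igenome_fastagz="$GENOME_FASTA" -y --brief) || return 1
  dx wait "$fb_job" || return 1
  # variants_vcfgz is assumed to be the FreeBayes app's VCF output field.
  dx download "$fb_job:variants_vcfgz" -o "$reads_local.vcf.gz"
}
```

Check the field names against dx run bwa_mem_fastq_read_mapper -h and dx run freebayes -h before relying on this.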
Learn More
You're now ready to start scripting using dx. As shown in some of the examples above, the --brief flag can come in handy for scripting. A list of all dx commands and flags is on the Index of dx Commands page.
For more detailed information about running apps and applets from the command line, see the Running Apps and Applets page.
For a comprehensive guide to the DNAnexus SDK, see the SDK documentation.
Want to start writing your own apps? Check out the Developer Portal for some useful tutorials.