Command Line Quickstart
Learn to use the dx client for command-line access to the full range of DNAnexus Platform features.
The dx command-line client is included in the DNAnexus SDK (dx-toolkit). You can use the dx client to log into the Platform, to upload, browse, and organize data, and to launch analyses.
All the projects and data referenced in this Quickstart are publicly available, so you can follow along step-by-step.
Before You Begin
If you haven't already done so, download and install the DNAnexus Platform toolkit, which includes the dx command-line client as well as a range of useful utilities.
Getting Help
As you work, use the index of dx commands as a reference.
On the command line, you can also enter dx help to see a list of commands, broken down by category. To see a list of commands from a particular category, enter dx help <category>.
To learn what a particular command does, enter dx help <command>, dx <command> -h, or dx <command> --help. For example, enter dx help ls to learn about the command dx ls:
$ dx help ls
usage: dx ls [-h] [--color {off,on,auto}] [--delimiter [DELIMITER]]
[--env-help] [--brief | --summary | --verbose] [-a] [-l] [--obj]
[--folders] [--full]
[path]
List folders and/or objects in a folder
... # output truncated for brevity
Step 1: Log In
The first step is to log in. If you have not created a DNAnexus account, open the DNAnexus Platform and sign up. User signup is not supported on the command line.
Your authentication token and your current project settings are saved in a local configuration file, and you can start accessing your project.
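The login step can be sketched as a small shell function. dx login prompts for credentials interactively, and dx whoami then confirms which user the saved token belongs to. The helper name dx_login_and_verify is illustrative and not part of the toolkit.

```shell
# Illustrative helper (not part of dx-toolkit): log in, then confirm the user.
dx_login_and_verify() {
  dx login || return 1   # interactive prompt for credentials
  dx whoami              # prints the username tied to the saved token
}

# Usage:
#   dx_login_and_verify
```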
Step 2: Explore
Public Projects
Look inside some public projects that have already been set up. From the command line, enter the command:
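A sketch of that command, assuming the public project can be matched by the wildcard pattern Reference Genome Files* (both the pattern and the helper name select_public_project are assumptions made for illustration):

```shell
# Illustrative helper: choose a public project whose name matches a pattern.
select_public_project() {
  dx select --public --name "$1"
}

# Usage (quote the pattern so the local shell does not expand the wildcard):
#   select_public_project "Reference Genome Files*"
```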
By running the dx select command and picking a project, you perform the command-line equivalent of going to the project page for Reference Genome Files: AWS US (East) (platform login required to access this link) on the website. This is a DNAnexus-sponsored project containing popular genomes for use in analyses with your own data.
For more information about the dx select command, see the Changing Your Current Project page.
List the data in the top-level directory of the project you've selected by running the command dx ls. View the contents of a folder by running the command dx ls <folder_name>.
You can avoid typing out the full name of the folder by typing in dx ls C and then pressing <TAB>. The folder name auto-completes from there.
You don't have to be in a project to inspect its contents. You can also look into another project, and a folder within the project, by giving the project name or ID, followed by a colon (:) and the folder path. Here, the contents of the publicly available project "Demo Data" are listed using both its name and ID.
As shown above, you can use the -l flag with dx ls to list more details about files, such as the time a file was last modified, its size (if applicable), and its full DNAnexus ID.
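The project:folder syntax described above can be wrapped in a short helper. This is a minimal sketch; the name list_project_folder is hypothetical, and project-xxxx stands in for a real project ID.

```shell
# Illustrative helper: list a folder in any project with details (-l).
# $1 is a project name or ID; $2 is the folder path (defaults to the root).
list_project_folder() {
  dx ls -l "${1}:${2:-/}"
}

# Usage:
#   list_project_folder "Demo Data"
#   list_project_folder project-xxxx /
```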
Describing DNAnexus Objects
You can use the dx describe command to learn more about files and other objects on the platform. Given a DNAnexus object ID or name, dx describe returns detailed information about the object. dx describe only returns results for data objects to which you have access.
Besides describing data and projects (examples for which are shown below), you can also describe apps, jobs, and users.
Describing a File
Below, the reference genome file for C. elegans is described. It is located in the "Reference Genome Files: AWS US (East)" project used above, which should be accessible from other regions as well. Because the project name itself contains a colon, escape that colon with a backslash, then add a colon (:) after the project name to separate it from the file path: Reference Genome Files\: AWS US (East):
Describing a Project
Below, the publicly available Reference Genome Files project used above is described.
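Both describe calls in this section can be sketched with one helper that escapes colons in the project name before appending the separating colon. The helper names are hypothetical, and the file path in the usage comment is an assumption about how the project is laid out.

```shell
# Illustrative helpers (not part of dx-toolkit).
# Backslash-escape every colon in a project name.
escape_colons() {
  printf '%s\n' "$1" | sed 's/:/\\:/g'
}

# Describe a path inside a named project; an empty path describes the project.
describe_in_project() {
  dx describe "$(escape_colons "$1"):$2"
}

# Usage (the file path is an assumed location):
#   describe_in_project "Reference Genome Files: AWS US (East)" "/C. Elegans - Ce10/ce10.fasta.gz"
#   describe_in_project "Reference Genome Files: AWS US (East)" ""
```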
Step 3: Create Your Own Project
Use the command dx new project to create a new project.
The text project-xxxx denotes a placeholder for a unique, immutable project ID. For more information about object IDs, see the Entity IDs page.
The project is ready for uploading data and running analyses.
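As a sketch, project creation can be scripted so that the new ID is captured for later commands. The --brief flag limits the output to the project ID; the helper name and the project name in the usage comment are placeholders.

```shell
# Illustrative helper: create a project and print only its ID (--brief).
create_project() {
  dx new project --brief "$1"
}

# Usage:
#   project_id=$(create_project "My First Project")
#   dx select "$project_id"
```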
Step 4: Upload and Manage Your Data
To analyze a sample, first upload it with the dx upload command or, if installed, the Upload Agent. For this tutorial, download the file small-celegans-sample.fastq, which represents the first 25,000 C. elegans reads from SRR070372. This file is used in the sample analysis below.
For uploading multiple or large files, use the Upload Agent. It compresses files and uploads them in parallel over multiple HTTP connections and supports resumable uploads.
The following command uploads the small-celegans-sample.fastq file into the current directory of the current project. The --wait flag tells dx upload to wait until uploading is complete before returning the prompt and describing the result.
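A minimal sketch of that upload, with a local existence check added for convenience (the check and the helper name upload_and_wait are additions for illustration, not toolkit behavior):

```shell
# Illustrative helper: upload one local file and block until it is closed.
upload_and_wait() {
  [ -f "$1" ] || { echo "no such local file: $1" >&2; return 1; }
  dx upload "$1" --wait
}

# Usage:
#   upload_and_wait small-celegans-sample.fastq
```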
Examining Data
To take a quick look at the first few lines of the file you uploaded, use the dx head command. By default, it prints the first 10 lines of the given file.
Run it on the file you uploaded and use the -n flag to ask for the first 12 lines (the first 3 reads) of the FASTQ file.
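Because each FASTQ read occupies four lines, the line count passed to -n can be derived from the number of reads wanted. The helper below is a sketch with a hypothetical name; it defaults to 3 reads, which gives the 12 lines mentioned above.

```shell
# Illustrative helper: print the first N reads of a FASTQ file on the Platform.
# $1 is the file name or ID; $2 is the number of reads (default 3).
show_first_reads() {
  dx head -n "$(( 4 * ${2:-3} ))" "$1"   # 4 lines per FASTQ read
}

# Usage:
#   show_first_reads small-celegans-sample.fastq 3
```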
Downloading Data
If you'd like to download a file from the platform, use the dx download command. This command uses the name of the file for the filename unless you specify your own with the -o or --output flag. The example below downloads the same C. elegans file that was uploaded previously.
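The download can be sketched as a helper that passes -o only when a local name is supplied. The helper name download_as and the local file name in the usage comment are hypothetical.

```shell
# Illustrative helper: download a Platform file, optionally under a new name.
# $1 is the file name or ID on the Platform; $2 is an optional local file name.
download_as() {
  if [ -n "$2" ]; then
    dx download "$1" -o "$2"
  else
    dx download "$1"          # keeps the Platform file name
  fi
}

# Usage:
#   download_as small-celegans-sample.fastq
#   download_as small-celegans-sample.fastq local-copy.fastq
```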
About Metadata
Files have different available fields for metadata, such as "properties" (key-value pairs) and "tags".
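As a sketch, properties and tags can be attached with the dx set_properties and dx tag commands. The helper name and the example key, value, and tag in the usage comment are made up for illustration.

```shell
# Illustrative helper: attach one property (key=value) and one tag to a file.
# $1 is the file name or ID; $2 is a key=value pair; $3 is a tag.
annotate_file() {
  dx set_properties "$1" "$2" || return 1
  dx tag "$1" "$3"
}

# Usage (hypothetical metadata values):
#   annotate_file small-celegans-sample.fastq sample_id=SRR070372 celegans
#   dx describe small-celegans-sample.fastq   # inspect the result
```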
Step 5: Analyze a Sample
For the next few steps, if you would like to follow along, you need a C. elegans FASTQ file. This tutorial maps the reads against the ce10 genome. If you haven't already, you can download and use the following FASTQ file, which contains the first 25,000 reads from SRR070372: small-celegans-sample.fastq.
The following walkthrough explains what each command does and shows which apps run. If you only want to convert a gzipped FASTQ file to a VCF via BWA and the FreeBayes Variant Caller, skip ahead to the Automation section to see the commands required to run the apps.
Uploading Reads
If you have not yet done so, you can upload a FASTQ file for analysis.
For more information about using the command dx upload, see the dx upload page.
Mapping Reads
Next, use the BWA-MEM app (platform login required to access this link) to map the uploaded reads file to a reference genome.
Finding the App Name
If you don't know the command-line name of the app to run, you have two options:
Navigate to its web page from the Apps page (platform login required to access this link). The app's page shows how to run it from the command line. See the BWA-MEM FASTQ Read Mapper page for details on the app used here (platform login required).
Alternatively, search for apps from the command line by running dx find apps. The command-line name appears in parentheses in the output.
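The command-line search can be narrowed by filtering the listing with grep. This is a minimal sketch; the helper name find_app_name is hypothetical.

```shell
# Illustrative helper: show only the app listings that mention a keyword.
find_app_name() {
  dx find apps | grep -i -- "$1"   # case-insensitive match on each line
}

# Usage:
#   find_app_name bwa
```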
Installing and Running the App
Install the app using dx install and check that it has been installed. While you do not always need to install an app to run it, you may find it useful as a bookmarking tool.
You can run the app using dx run. When you run it without any arguments, it prompts you for required and then optional arguments. The reference file genomeindex_targz for this C. elegans sample is in a .tar.gz format and can be found in the Reference Genome folder of the region your project is in.
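The install, check, and run steps can be sketched together. The app name bwa_mem_fastq_read_mapper in the usage comment is an assumption about the command-line name of the BWA-MEM app; confirm it with dx find apps before relying on it.

```shell
# Illustrative helper: install an app, list installed apps, then run it.
install_and_run() {
  dx install "$1" || return 1
  dx find apps --installed   # check that the app appears in the installed list
  dx run "$1"                # no inputs given, so dx run prompts for them
}

# Usage (assumed app name):
#   install_and_run bwa_mem_fastq_read_mapper
```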
Monitoring Your Job
You can use the command dx watch to monitor jobs. The command prints out the log file of the job, including the STDOUT, STDERR, and INFO printouts.
You can also use the command dx describe job-xxxx to learn more about your job. If you don't know the job's ID, you can use the command dx find jobs to list all the jobs run in the current project, along with the user who ran them, their status, and when they began.
Additional options are available to restrict your search of previous jobs, such as by their names or when they were run.
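As a sketch of a restricted search, the helper below filters jobs by name and creation time. The helper name is hypothetical, and the name pattern and the relative time value in the usage comment are assumptions; check dx find jobs -h for the accepted formats.

```shell
# Illustrative helper: list jobs matching a name pattern created after a time.
# $1 is a job name or pattern; $2 is a date or relative time.
recent_jobs() {
  dx find jobs --name "$1" --created-after="$2"
}

# Usage (assumed pattern and time value; job-xxxx is a placeholder):
#   recent_jobs "BWA*" -2d
#   dx watch job-xxxx
#   dx describe job-xxxx
```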
Terminating Your Job
If for some reason you need to terminate your job before it completes, use the command dx terminate.
After Your Job Finishes
You should see two new files in your project: the mapped reads in a BAM file, and an index of that BAM file with a .bai extension. You can refer to the output file by name or by the job that produced it using the syntax job-xxxx:<output field>. Try it yourself with the job ID you got from calling the BWA-MEM app!
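The job-xxxx:<output field> syntax can be sketched as follows. The output field name sorted_bam in the usage comment is an assumption about the BWA-MEM app's output spec; dx describe job-xxxx shows the actual field names.

```shell
# Illustrative helper: describe a file through the job that produced it.
# $1 is a job ID; $2 is the name of one of the job's output fields.
describe_job_output() {
  dx describe "${1}:${2}"
}

# Usage (job-xxxx is a placeholder; sorted_bam is an assumed field name):
#   describe_job_output job-xxxx sorted_bam
```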
Variant Calling
You can use the FreeBayes Variant Caller app (platform login required to access this link) to call variants on your BAM file.
This time, instead of relying on interactive mode to enter inputs, you provide them directly. First, look up the app's spec to determine the input names. Run the command dx run freebayes -h.
Optional inputs are shown using square brackets ([]) around the command-line syntax for each input. Notice that there are two required inputs that must be specified:
Sorted mappings (sorted_bams): A list of files with a .bam extension.
Genome (genome_fastagz): A reference genome in FASTA format that has been gzipped.
Running the App with a One-Liner Using a Job-Based Object Reference
It is sometimes more convenient to run apps using a single one-line command. You can do this by specifying all the necessary inputs either via the command line or in a prepared file. Use the -i flag to specify inputs as suggested by the output of dx run freebayes -h:
sorted_bams: The output of the previous BWA step (see the Mapping Reads section for more information).
genome_fastagz: The ce10 genome in the Reference Genome Files project.
To specify new job input using the output of a previous job, use a job-based object reference via the job-xxxx:<output field> syntax used earlier.
Replace the job ID below with that generated by the BWA app you ran earlier. The -y flag skips the input confirmation.
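A sketch of the one-liner, wrapped in a helper so the job ID and reference path are passed as arguments. The output field sorted_bam and the reference path in the usage comment are assumptions; the input names sorted_bams and genome_fastagz come from the app spec described above.

```shell
# Illustrative helper: run FreeBayes on the BAM produced by an earlier job.
# $1 is the BWA job ID; $2 is the path or ID of the gzipped reference FASTA.
run_freebayes() {
  dx run freebayes \
    -isorted_bams="${1}:sorted_bam" \
    -igenome_fastagz="$2" \
    -y   # skip the input confirmation
}

# Usage (job-xxxx is a placeholder; the reference path is an assumed location):
#   run_freebayes job-xxxx 'Reference Genome Files\: AWS US (East):/C. Elegans - Ce10/ce10.fasta.gz'
```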
Automatically Running a Command After a Job Finishes
Use the command dx wait to block until a job finishes. Chaining a second command after it, for example dx wait job-xxxx && dx find jobs, lets you launch the FreeBayes app and have the recent jobs listed only once that job has finished.
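The wait-then-run pattern can be generalized with a small helper that runs any follow-up command once the job completes. This is a sketch; the helper name after_job is hypothetical.

```shell
# Illustrative helper: wait for a job, then run whatever command follows.
# $1 is the job ID; the remaining arguments form the follow-up command.
after_job() {
  dx wait "$1" || return 1   # stop here if waiting fails
  shift
  "$@"
}

# Usage (job-xxxx is a placeholder):
#   after_job job-xxxx dx find jobs
```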
Congratulations! You have called variants on a reads sample using the command line. Next, see how to automate this process.
Automation
The CLI enables automation of these steps. The following script assumes that you are logged in. It is hardcoded to use the ce10 genome and takes a local gzipped FASTQ file as its command-line argument.
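A minimal sketch of such a script, written as a function, is shown here. Several names are assumptions rather than details confirmed by this page: the app name bwa_mem_fastq_read_mapper, its reads_fastqgz input and sorted_bam output, the FreeBayes variants_vcfgz output, and both reference paths. Verify them with dx run <app> -h and dx ls before use.

```shell
# Assumed locations of the ce10 reference and its BWA index.
project='Reference Genome Files\: AWS US (East)'
reference="${project}:/C. Elegans - Ce10/ce10.fasta.gz"
bwa_index="${project}:/C. Elegans - Ce10/ce10.bwa-index.tar.gz"

# Upload reads, map them, call variants, then download the gzipped VCF.
# $1 is a local gzipped FASTQ file.
fastq_to_vcf() {
  reads="$1"
  # --brief makes each command print only the new object or job ID.
  file_id=$(dx upload "$reads" --brief) || return 1
  bwa_job=$(dx run bwa_mem_fastq_read_mapper \
    -ireads_fastqgz="$file_id" \
    -igenomeindex_targz="$bwa_index" -y --brief) || return 1
  # Job-based object reference: FreeBayes starts once the BAM exists.
  fb_job=$(dx run freebayes \
    -isorted_bams="${bwa_job}:sorted_bam" \
    -igenome_fastagz="$reference" -y --brief) || return 1
  dx wait "$fb_job" || return 1
  dx download "${fb_job}:variants_vcfgz" -o "${reads}.vcf.gz"
}

# Usage:
#   fastq_to_vcf local_reads.fastq.gz
```

Capturing IDs with --brief keeps each step independent of file names, so the script does not break if another file with the same name exists in the project.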
Learn More
You can start scripting using dx. The --brief flag is useful for scripting. A list of all dx commands and flags is on the Index of dx Commands page.
For more detailed information about running apps and applets from the command line, see the Running Apps and Applets page.
For a comprehensive guide to the DNAnexus SDK, see the SDK documentation.
Want to start writing your own apps? Check out the Developer Portal for some useful tutorials.