Describing Data Objects
You can describe objects (files, app(let)s, and workflows) on the DNAnexus platform using the command dx describe.
Objects can be described using their DNAnexus platform name via the command line interface (CLI) using a path.
Objects can be described relative to the user's current directory on the DNAnexus platform. In the following example, we describe the indexed reference genome file human_g1k_v37.bwa-index.tar.gz.
$ dx describe "Original files/human_g1k_v37.bwa-index.tar.gz"
Result 1:
ID file-xxxx
Class file
Project project-xxxx
Folder /Original files
Name human_g1k_v37.bwa-index.tar.gz
State closed
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created ----
Created by Amy
via the job job-xxxx
Last modified ----
archivalState "live"
Size 3.21 GB
NOTE: The entire path is enclosed in quotes because the folder name Original files contains a space. Instead of quotes, you can escape special characters with the \ character:
$ dx describe Original\ files/human_g1k_v37.bwa-index.tar.gz
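When a dx describe command line is assembled by a script rather than typed by hand, the quoting can be delegated to a library. The following is a minimal Python sketch, assuming Python 3.8 or later; the helper name quoted_describe_command is hypothetical and not part of the dx toolkit.

```python
import shlex


def quoted_describe_command(path: str) -> str:
    """Return a shell-safe `dx describe` command line for a platform path.

    shlex.join quotes any argument containing spaces or other shell
    metacharacters, so the caller does not escape anything by hand.
    """
    return shlex.join(["dx", "describe", path])


if __name__ == "__main__":
    # Path taken from the example above; the folder name contains a space.
    print(quoted_describe_command("Original files/human_g1k_v37.bwa-index.tar.gz"))
```

The printed command uses single quotes instead of backslashes, which the shell treats the same way for this path.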
Objects can be described using an absolute path. This allows us to describe objects outside the current project context. In the following example, we dx select the project "My Research Project" and dx describe the file human_g1k_v37.fa.gz in the "Reference Genome Files" project. Note that the parentheses in the folder name must be escaped along with the spaces.
$ dx select "My Research Project"
$ dx describe Reference\ Genome\ Files:H.\ Sapiens\ -\ GRCh37\ -\ b37\ \(1000\ Genomes\ Phase\ I\)/human_g1k_v37.fa.gz
Result 1:
ID file-xxxx
Class file
Project project-xxxx
Folder /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)
Name human_g1k_v37.fa.gz
State closed
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created ----
Created by Amy
via the job job-xxxx
Last modified ----
archivalState "live"
Size 810.45 MB
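The project-qualified form used above (project name, a colon, then the folder and object name) can also be composed in a script. The sketch below is illustrative only: project_path and describe_argv are hypothetical helper names, and names that themselves contain a colon or slash are not handled. Passing the arguments as a list means no shell escaping of spaces or parentheses is needed.

```python
def project_path(project: str, folder: str, name: str) -> str:
    """Join project, folder, and object name into `project:folder/name`."""
    folder = folder.strip("/")
    inner = f"{folder}/{name}" if folder else name
    return f"{project}:{inner}"


def describe_argv(project: str, folder: str, name: str) -> list:
    """Build an argument list suitable for subprocess.run (no shell involved)."""
    return ["dx", "describe", project_path(project, folder, name)]


if __name__ == "__main__":
    # Values taken from the example above.
    print(describe_argv(
        "Reference Genome Files",
        "/H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)",
        "human_g1k_v37.fa.gz",
    ))
```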
Objects can be described using a unique object ID.
In this example, we describe the workflow object "Exome Analysis Workflow" using its ID. This workflow is publicly available in the "Exome Analysis Demo" project.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Result 1:
ID workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Class workflow
Project project-BQfgzV80bZ46kf6pBGy00J38
Folder /
Name Exome Analysis Workflow
....
Stage 0 bwa_mem_fastq_read_mapper
Executable app-bwa_mem_fastq_read_mapper/2.0.1
Stage 1 fastqc
Executable app-fastqc/3.0.1
Stage 2 gatk4_bqsr
Executable app-gatk4_bqsr_parallel/2.0.1
Stage 3 gatk4_haplotypecaller
Executable app-gatk4_haplotypecaller_parallel/2.0.1
Stage 4 gatk4_genotypegvcfs
Executable app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0
Due to the amount of information contained in a workflow (including multiple app(let)s, inputs/outputs, and default parameters), the dx describe output can seem overwhelming.
The output from a dx describe command can be used for various purposes. The optional argument --json will convert the output from dx describe into JSON format for advanced scripting and command line use.
In this example, we will describe the publicly available workflow object "Exome Analysis Workflow" and return the output in JSON format.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json
{
"project": "project-BQfgzV80bZ46kf6pBGy00J38",
"name": "Exome Analysis Workflow",
"inputSpec": [
{
"name": "bwa_mem_fastq_read_mapper.reads_fastqgzs",
"class": "array:file",
"help": "An array of files, in gzipped FASTQ format, with the first read mates to be mapped.",
"patterns": [ "*.fq.gz", "*.fastq.gz" ],
...
},
...
],
"stages": [
{
"id": "bwa_mem_fastq_read_mapper",
"executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
"input": {
"genomeindex_targz": {
"$dnanexus_link": {
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-FFJPKp0034KY8f20F6V9yYkk"
}
}
},
...
},
{
"id": "fastqc",
"executable": "app-fastqc/3.0.1",
...
}
...
]
}
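The JSON form is convenient to consume from a script as well as from the shell. The following Python sketch is one possible approach, not an official client: describe_json shells out to the dx CLI (and therefore needs the dx toolkit installed and a logged-in session), while input_names works on any parsed description. Both function names are hypothetical, and SAMPLE is an abridged copy of the output shown above.

```python
import json
import subprocess


def describe_json(target: str) -> dict:
    """Run `dx describe <target> --json` and return the parsed JSON."""
    result = subprocess.run(
        ["dx", "describe", target, "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def input_names(description: dict) -> list:
    """Return (name, class) pairs from the description's inputSpec, if any."""
    return [(spec["name"], spec["class"]) for spec in description.get("inputSpec", [])]


# Abridged from the example output above; elided fields are omitted.
SAMPLE = json.loads("""
{
  "project": "project-BQfgzV80bZ46kf6pBGy00J38",
  "name": "Exome Analysis Workflow",
  "inputSpec": [
    {
      "name": "bwa_mem_fastq_read_mapper.reads_fastqgzs",
      "class": "array:file"
    }
  ]
}
""")

if __name__ == "__main__":
    print(SAMPLE["name"])
    print(input_names(SAMPLE))
```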
We can parse, process, and query the JSON output using jq. Below, we process the dx describe --json output to generate a list of all stages in the aforementioned exome analysis pipeline.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq .stages
[{
"id": "bwa_mem_fastq_read_mapper",
"executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
...
}, {
"id": "fastqc",
"executable": "app-fastqc/3.0.1",
...
}, {
"id": "gatk4_bqsr",
"executable": "app-gatk4_bqsr_parallel/2.0.1",
...
}
...
}]
We can output the "executable" value of each stage present in the "stages" value of the dx describe output above using the command below.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq '.stages | map(.executable) | .[]'
"app-bwa_mem_fastq_read_mapper/2.0.1"
"app-fastqc/3.0.1"
"app-gatk4_bqsr_parallel/2.0.1"
"app-gatk4_haplotypecaller_parallel/2.0.1"
"app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0"
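For readers who prefer a script over jq, the same selection can be sketched in Python. The function stage_executables below is a hypothetical helper that mirrors the filter '.stages | map(.executable) | .[]' on an already-parsed description; SAMPLE_STAGES contains only the first three stages shown in the output above.

```python
def stage_executables(description: dict) -> list:
    """Return the "executable" value of every stage, in stage order."""
    return [stage["executable"] for stage in description.get("stages", [])]


# Abridged from the `jq .stages` output above (first three stages only).
SAMPLE_STAGES = {
    "stages": [
        {"id": "bwa_mem_fastq_read_mapper", "executable": "app-bwa_mem_fastq_read_mapper/2.0.1"},
        {"id": "fastqc", "executable": "app-fastqc/3.0.1"},
        {"id": "gatk4_bqsr", "executable": "app-gatk4_bqsr_parallel/2.0.1"},
    ]
}

if __name__ == "__main__":
    for executable in stage_executables(SAMPLE_STAGES):
        print(executable)
```

Unlike jq, print writes the bare strings without surrounding double quotes.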
Field name | Objects | Description |
ID | All | Unique ID assigned to a DNAnexus object. |
Class | All | DNAnexus object type. |
Project | All | Container where the object is stored. |
Folder | All | Objects inside a container (project) can be organized into folders. Objects can only exist in one path within a project. |
Name | All | Object name on the platform. |
State | All | Status of the object on the platform. |
Visibility | All | Whether or not the object is visible to the user through the platform web interface. |
Tags | All | Set of tags associated with an object. Tags are strings used to organize or annotate objects. |
Properties | All | Key/value pairs attached to the object. |
Outgoing links | All | JSON reference to another object on the platform. Linked objects will be copied along with the object if the object is cloned to another project. |
Created | All | Date and time the object was created. |
Created by | All | DNAnexus user who created the object. Contains the subfield "via the job" if the object was created as a result of running an app or applet. |
Last modified | All | Date and time the object was last modified. |
Input Spec | App(let)s and Workflows | App(let) or workflow input names and classes. With workflows, the corresponding applet stage ID is also provided. |
Output Spec | App(let)s and Workflows | App(let) or workflow output names and classes. With workflows, the corresponding applet stage ID is also provided. |