Describing Data Objects
You can describe objects (files, app(let)s, and workflows) on the DNAnexus Platform using the command dx describe.
Describing an Object by Name
Objects can be described by their DNAnexus Platform name from the command line interface (CLI) by giving a path.
Describe an Object With a Relative Path
Objects can be described relative to the user's current directory on the DNAnexus Platform. In the following example, the indexed reference genome file human_g1k_v37.bwa-index.tar.gz is described.
$ dx describe "Original files/human_g1k_v37.bwa-index.tar.gz"
Result 1:
ID file-xxxx
Class file
Project project-xxxx
Folder /Original files
Name human_g1k_v37.bwa-index.tar.gz
State closed
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created ----
Created by Amy
via the job job-xxxx
Last modified ----
archivalState "live"
Size 3.21 GB
The entire path is enclosed in quotes because the folder name Original files contains whitespace. Instead of quotes, escape special characters with \: dx describe Original\ files/human_g1k_v37.bwa-index.tar.gz.
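The same quoting concern applies when a script assembles the dx describe command instead of a person typing it. The following is a minimal Python sketch, assuming the command is built as a shell string; it uses the standard-library shlex.quote to quote the path. The helper name build_describe_command is hypothetical and is not part of the DNAnexus tooling.

```python
import shlex


def build_describe_command(path: str) -> str:
    """Return a shell-safe `dx describe` command line for the given path.

    Hypothetical helper for illustration; it only builds the string and
    does not call the DNAnexus CLI.
    """
    # shlex.quote wraps the path in single quotes when it contains
    # whitespace or other characters that the shell treats specially.
    return "dx describe " + shlex.quote(path)


command = build_describe_command("Original files/human_g1k_v37.bwa-index.tar.gz")
print(command)
# Splitting the command the way a POSIX shell would keeps the path as one argument.
print(shlex.split(command))
```

Quoting the whole path this way avoids having to escape each special character by hand.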
Describe an Object in a Different Project Using an Absolute Path
Objects can be described using an absolute path. This allows you to describe objects outside the current project context. In the following example, dx select selects the project "My Research Project" and dx describe describes the file human_g1k_v37.fa.gz in the "Reference Genome Files" project.
$ dx select "My Research Project"
$ dx describe Reference\ Genome\ Files:H.\ Sapiens\ -\ GRCh37\ -\ b37\ \(1000\ Genomes\ Phase\ I\)/human_g1k_v37.fa.gz
Result 1:
ID file-xxxx
Class file
Project project-xxxx
Folder /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)
Name human_g1k_v37.fa.gz
State closed
Visibility visible
Types -
Properties -
Tags -
Outgoing links -
Created ----
Created by Amy
via the job job-xxxx
Last modified ----
archivalState "live"
Size 810.45 MB
Describe an Object Using Object ID
Objects can be described using a unique object ID.
This example describes the workflow object "Exome Analysis Workflow" using its ID. This workflow is publicly available in the "Exome Analysis Demo" project.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Result 1:
ID workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Class workflow
Project project-BQfgzV80bZ46kf6pBGy00J38
Folder /
Name Exome Analysis Workflow
....
Stage 0 bwa_mem_fastq_read_mapper
Executable app-bwa_mem_fastq_read_mapper/2.0.1
Stage 1 fastqc
Executable app-fastqc/3.0.1
Stage 2 gatk4_bqsr
Executable app-gatk4_bqsr_parallel/2.0.1
Stage 3 gatk4_haplotypecaller
Executable app-gatk4_haplotypecaller_parallel/2.0.1
Stage 4 gatk4_genotypegvcfs
Executable app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0
Because workflows can include many app(let)s, inputs/outputs, and default parameters, the dx describe output can seem overwhelming.
Manipulating Outputs
The output from a dx describe command can be used for multiple purposes. The optional argument --json converts the output from dx describe into JSON format for advanced scripting and command line use.
In this example, the publicly available workflow object "Exome Analysis Workflow" is described and the output is returned in JSON format.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json
{
"project": "project-BQfgzV80bZ46kf6pBGy00J38",
"name": "Exome Analysis Workflow",
"inputSpec": [
{
"name": "bwa_mem_fastq_read_mapper.reads_fastqgzs",
"class": "array:file",
"help": "An array of files, in gzipped FASTQ format, with the first read mates to be mapped.",
"patterns": [ "*.fq.gz", "*.fastq.gz" ],
...
},
...
],
"stages": [
{
"id": "bwa_mem_fastq_read_mapper",
"executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
"input": {
"genomeindex_targz": {
"$dnanexus_link": {
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-FFJPKp0034KY8f20F6V9yYkk"
}
}
},
...
},
{
"id": "fastqc",
"executable": "app-fastqc/3.0.1",
...
}
...
]
}
Parse, process, and query the JSON output using jq. Below, the dx describe --json output is processed to generate a list of all stages in the exome analysis pipeline.
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq .stages
[{
"id": "bwa_mem_fastq_read_mapper",
"executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
...
}, {
"id": "fastqc",
"executable": "app-fastqc/3.0.1",
...
}, {
"id": "gatk4_bqsr",
"executable": "app-gatk4_bqsr_parallel/2.0.1",
...
}
...
}]
To get the "executable" value of each stage present in the "stages" array value of the dx describe output above, use the following command:
$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq '.stages | map(.executable) | .[]'
"app-bwa_mem_fastq_read_mapper/2.0.1"
"app-fastqc/3.0.1"
"app-gatk4_bqsr_parallel/2.0.1"
"app-gatk4_haplotypecaller_parallel/2.0.1"
"app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0"
General Response Fields Overview
| Field | Applies to | Description |
| --- | --- | --- |
| Class | All | DNAnexus object type. |
| Project | All | Container where the object is stored. |
| Folder | All | Objects inside a container (project) can be organized into folders. Objects can only exist in one path within a project. |
| Name | All | Object name on the platform. |
| Visibility | All | Whether the object is visible to the user through the platform web interface. |
| Tags | All | Set of tags associated with an object. Tags are strings used to organize or annotate objects. |
| Properties | All | Key/value pairs attached to the object. |
| Outgoing links | All | JSON reference to another object on the platform. Linked objects are copied along with the object if the object is cloned to another project. |
| Created | All | Date and time the object was created. |
| Created by | All | DNAnexus user who created the object. Contains subfield "via the job" if the object was created by an app or applet. |
| Last modified | All | Date and time the object was last modified. |
| Input Spec | App(let)s and Workflows | App(let) or workflow input names and classes. With workflows, the corresponding applet stage ID is also provided. |
| Output Spec | App(let)s and Workflows | App(let) or workflow output names and classes. With workflows, the corresponding applet stage ID is also provided. |
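Some of these fields can also be read programmatically from the dx describe --json output. The following is a minimal Python sketch, assuming the JSON has already been captured as a string. The sample is a trimmed copy restricted to keys visible in the example output earlier on this page (project, name, inputSpec, stages), and the helper names summarize_inputs and stage_executables are hypothetical.

```python
import json

# Trimmed sample limited to keys that appear in the `dx describe --json`
# example earlier on this page; real output contains many more keys.
SAMPLE_DESCRIBE_JSON = """
{
  "project": "project-BQfgzV80bZ46kf6pBGy00J38",
  "name": "Exome Analysis Workflow",
  "inputSpec": [
    {"name": "bwa_mem_fastq_read_mapper.reads_fastqgzs", "class": "array:file"}
  ],
  "stages": [
    {"id": "bwa_mem_fastq_read_mapper", "executable": "app-bwa_mem_fastq_read_mapper/2.0.1"},
    {"id": "fastqc", "executable": "app-fastqc/3.0.1"}
  ]
}
"""


def summarize_inputs(description: dict) -> list[tuple[str, str]]:
    """Return (name, class) pairs from the inputSpec array, if present."""
    return [(spec["name"], spec["class"]) for spec in description.get("inputSpec", [])]


def stage_executables(description: dict) -> list[str]:
    """Return each stage's executable, like jq '.stages | map(.executable)'."""
    return [stage["executable"] for stage in description.get("stages", [])]


description = json.loads(SAMPLE_DESCRIBE_JSON)
print(description["name"], "in", description["project"])
for name, cls in summarize_inputs(description):
    print(f"input {name}: {cls}")
for executable in stage_executables(description):
    print(executable)
```

In practice the JSON string would come from running dx describe with --json, for example by capturing the command's standard output with subprocess.run. Using .get with an empty default keeps the helpers usable on objects such as files, whose sample output on this page shows no stages.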