Describing Data Objects

You can describe objects (files, app(let)s, and workflows) on the DNAnexus platform using the command dx describe.

Describing an Object by Name

Using the command line interface (CLI), you can describe an object by giving its name on the DNAnexus platform as a path.

Describe an Object With a Relative Path

Objects can be described relative to the user's current directory on the DNAnexus platform. In the following example, we describe the indexed reference genome file human_g1k_v37.bwa-index.tar.gz.

$ dx describe "Original files/human_g1k_v37.bwa-index.tar.gz"
Result 1:
ID                file-xxxx
Class             file
Project           project-xxxx
Folder            /Original files
Name              human_g1k_v37.bwa-index.tar.gz
State             closed
Visibility        visible
Types             -
Properties        -
Tags              -
Outgoing links    -
Created           ----
Created by        Amy
 via the job      job-xxxx
Last modified     ----
archivalState     "live"
Size              3.21 GB

NOTE: The entire path is enclosed in quotes due to the space in the folder name Original files. Instead of quotes, you can escape special characters with the \ character: dx describe Original\ files/human_g1k_v37.bwa-index.tar.gz.
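The quotes or backslashes are needed because the shell splits unquoted arguments at spaces before dx ever sees them. The sketch below uses a hypothetical helper, show_args (not part of dx), that prints each argument it receives between brackets. You can use it to check how the shell will split a path before passing that path to dx describe.

```shell
# show_args: hypothetical helper (not part of dx) that prints each argument
# it receives on its own line, wrapped in brackets to show its boundaries.
show_args() {
  printf '[%s]\n' "$@"
}

# Quoted and backslash-escaped forms both arrive as ONE argument:
show_args "Original files/human_g1k_v37.bwa-index.tar.gz"
show_args Original\ files/human_g1k_v37.bwa-index.tar.gz

# Unquoted, the shell splits at the space and passes TWO arguments,
# "Original" and "files/human_g1k_v37.bwa-index.tar.gz":
show_args Original files/human_g1k_v37.bwa-index.tar.gz
```

The first two calls print the same single bracketed path. The third prints two bracketed pieces, which is why dx describe cannot find the file when the path is left unquoted.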

Describe an Object in a Different Project Using an Absolute Path

Objects can also be described using an absolute path of the form project:folder/name, which lets us describe objects outside the current project context. In the following example, we use dx select to switch to the project "My Research Project", then use dx describe to describe the file human_g1k_v37.fa.gz, which is stored in the "Reference Genome Files" project.

$ dx select "My Research Project"
$ dx describe Reference\ Genome\ Files:H.\ Sapiens\ -\ GRCh37\ -\ b37\ \(1000\ Genomes\ Phase\ I\)/human_g1k_v37.fa.gz
Result 1:
ID                file-xxxx
Class             file
Project           project-xxxx
Folder            /H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)
Name              human_g1k_v37.fa.gz
State             closed
Visibility        visible
Types             -
Properties        -
Tags              -
Outgoing links    -
Created           ----
Created by        Amy
 via the job      job-xxxx
Last modified     ----
archivalState     "live"
Size              810.45 MB
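When project and folder names contain spaces or parentheses, building the absolute path in a variable and quoting it once can be less error-prone than escaping each character by hand. A minimal sketch, assuming a hypothetical shell function describe_in_project (not part of the dx toolkit) and the project:/folder/name path form:

```shell
# describe_in_project: hypothetical helper, not part of the dx toolkit.
# Joins a project name, folder, and object name into the
# "project:/folder/name" form and passes it to dx describe as a single
# quoted argument, so spaces and parentheses need no escaping.
describe_in_project() {
  local project="$1"  # project name, e.g. "Reference Genome Files"
  local folder="$2"   # folder path with a leading slash; "" for the root
  local name="$3"     # object name
  dx describe "${project}:${folder}/${name}"
}

# Example (requires the dx CLI and access to the project):
# describe_in_project "Reference Genome Files" \
#   "/H. Sapiens - GRCh37 - b37 (1000 Genomes Phase I)" human_g1k_v37.fa.gz
```

Because the whole path is expanded inside double quotes, the shell hands dx exactly one argument regardless of which characters the names contain.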

Describe an Object Using Object ID

Objects can be described using a unique object ID.

In this example, we describe the workflow object "Exome Analysis Workflow" by its ID, prefixed with the name of the project that contains it. This workflow is publicly available in the "Exome Analysis Demo" project.

$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Result 1:
ID                  workflow-G409jQQ0bZ46x5GF4GXqKxZ0
Class               workflow
Project             project-BQfgzV80bZ46kf6pBGy00J38
Folder              /
Name                Exome Analysis Workflow
....
Stage 0             bwa_mem_fastq_read_mapper
  Executable        app-bwa_mem_fastq_read_mapper/2.0.1
Stage 1             fastqc
  Executable        app-fastqc/3.0.1
Stage 2             gatk4_bqsr
  Executable        app-gatk4_bqsr_parallel/2.0.1
Stage 3             gatk4_haplotypecaller
  Executable        app-gatk4_haplotypecaller_parallel/2.0.1
Stage 4             gatk4_genotypegvcfs
  Executable        app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0

Because a workflow bundles a lot of information (multiple app(let)s, their inputs and outputs, and default parameters), its dx describe output can be long and hard to scan.
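A common scripting pattern is to look up an object's ID once from its path and then refer to the object by ID, which stays the same if the object is renamed or moved to another folder. A minimal sketch, assuming jq is installed (jq is introduced under Manipulating Outputs) and that the path matches exactly one object; object_id is a hypothetical helper name:

```shell
# object_id: hypothetical helper. Describes the object at the given path
# in JSON and prints only its "id" field (for example, file-xxxx).
# Assumes the path resolves to exactly one object.
object_id() {
  dx describe --json "$1" | jq -r '.id'
}

# Example (requires the dx CLI and jq):
# file_id=$(object_id "Original files/human_g1k_v37.bwa-index.tar.gz")
# dx describe "$file_id"
```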

Manipulating Outputs

The output of dx describe can also be consumed by other tools. Adding the optional --json flag returns the description in JSON format, which is better suited to scripting and further processing on the command line.

In this example, we will describe the publicly available workflow object "Exome Analysis Workflow" and return the output in JSON format.

$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json
  {
    "project": "project-BQfgzV80bZ46kf6pBGy00J38",
    "name": "Exome Analysis Workflow",
    "inputSpec": [
      {
        "name": "bwa_mem_fastq_read_mapper.reads_fastqgzs",
        "class": "array:file",
        "help": "An array of files, in gzipped FASTQ format, with the first read mates to be mapped.",
        "patterns": [ "*.fq.gz", "*.fastq.gz" ],
        ...
      },
      ...
    ],
    "stages": [
      {
        "id": "bwa_mem_fastq_read_mapper",
        "executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
        "input": {
          "genomeindex_targz": {
            "$dnanexus_link": {
              "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
              "id": "file-FFJPKp0034KY8f20F6V9yYkk"
            }
          }
        },
        ...
      },
      {
        "id": "fastqc",
        "executable": "app-fastqc/3.0.1",
        ...
      }
      ...
    ]
  }

The JSON output can be parsed, filtered, and queried with jq, a command-line JSON processor. Below, we pipe the dx describe --json output to jq to extract the list of stages in the exome analysis workflow described above.

$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq .stages
[
  {
    "id": "bwa_mem_fastq_read_mapper",
    "executable": "app-bwa_mem_fastq_read_mapper/2.0.1",
    ...
  },
  {
    "id": "fastqc",
    "executable": "app-fastqc/3.0.1",
    ...
  },
  {
    "id": "gatk4_bqsr",
    "executable": "app-gatk4_bqsr_parallel/2.0.1",
    ...
  },
  ...
]

To print only the "executable" value of each entry in the "stages" array, use the following command:

$ dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | jq '.stages | map(.executable) | .[]'
  "app-bwa_mem_fastq_read_mapper/2.0.1"
  "app-fastqc/3.0.1"
  "app-gatk4_bqsr_parallel/2.0.1"
  "app-gatk4_haplotypecaller_parallel/2.0.1"
  "app-gatk4_genotypegvcfs_single_sample_parallel/2.0.0"

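jq's -r (raw output) flag prints strings without the surrounding quotes, which is convenient when the values are passed on to other commands, and the filter .stages[].executable yields the same values as .stages | map(.executable) | .[]. The sketch below wraps both ideas, plus the same pattern applied to the inputSpec field shown earlier, in two hypothetical shell functions that read dx describe --json output on standard input:

```shell
# Hypothetical helpers (not part of dx); both read the output of
# `dx describe --json` for a workflow on standard input.

# Print each stage's executable, one per line, without JSON quotes.
list_stage_executables() {
  jq -r '.stages[].executable'
}

# Print each workflow input as "<name><TAB><class>", one per line.
list_workflow_inputs() {
  jq -r '.inputSpec[] | "\(.name)\t\(.class)"'
}

# Example (requires the dx CLI and jq):
# dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | list_stage_executables
# dx describe "Exome Analysis Demo":workflow-G409jQQ0bZ46x5GF4GXqKxZ0 --json | list_workflow_inputs
```

For the workflow above, list_stage_executables prints the same five app identifiers as the previous command, without the quotes.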

Copyright 2024 DNAnexus