Monitoring Executions
This page covers how to monitor executions (jobs and analyses) on the DNAnexus platform. Monitoring includes the ability to see a job’s progress or a list of current or past executions.
The main command for monitoring the logs of current or past jobs is dx watch. The main command for listing executions is dx find executions. Running dx watch -h or dx find executions -h prints the full list of options, and the most useful ones are covered in more detail below.
By default, dx find executions returns up to ten of the most recent executions in your current project, ordered by creation time.
On the DNAnexus Platform, jobs are limited to a runtime of 30 days. Jobs running longer than 30 days are automatically terminated.
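Runtimes in dx output are printed as H:MM:SS (for example, "running for 0:10:28"). The Bash sketch below converts such a string to seconds and expresses it as a percentage of the 30-day limit. It assumes that runtimes of a day or more are printed with an "N day, " or "N days, " prefix (Python's timedelta style). This format is an assumption, and the function names are hypothetical, not part of the dx toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of the dx toolkit.

# 30-day runtime limit, in seconds (30 * 24 * 3600 = 2592000).
MAX_RUNTIME_SECONDS=$(( 30 * 24 * 3600 ))

# Convert "H:MM:SS" (optionally prefixed by "N day, " or "N days, ",
# an assumed format) to a number of seconds.
runtime_to_seconds() {
  local s="$1" days=0 h m sec
  local re='^([0-9]+) days?, (.*)$'
  if [[ "$s" =~ $re ]]; then
    days="${BASH_REMATCH[1]}"
    s="${BASH_REMATCH[2]}"
  fi
  IFS=: read -r h m sec <<< "$s"
  # 10# forces base 10 so fields such as "08" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$sec ))
}

# Print how much of the 30-day limit a runtime has used, in whole percent.
percent_of_limit() {
  local secs
  secs=$(runtime_to_seconds "$1")
  echo $(( secs * 100 / MAX_RUNTIME_SECONDS ))
}
```

For example, runtime_to_seconds "0:10:28" prints 628, and percent_of_limit "15 days, 0:00:00" prints 50.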
The Monitor page lets you monitor jobs that have been launched in the context of the project. Jobs may appear in a project through any of the following actions:
- Using the Run dialog to run apps, applets, or workflows.
- Using the Add Data dialog to upload anything other than an Uncategorized file.
- Using DNAnexus command-line tools or APIs to launch jobs.
The jobs table contains the following columns:
- Status: The status of the job.
  - When initially launched, the status of a job is "Waiting". The job remains in that status until any jobs it depends on have finished and DNAnexus has allocated the resources required to run it in the cloud.
  - Once that is done, the status changes to "Running" and the job starts running in the cloud.
  - If the job completes with no errors, its status changes to "Completed".
  - If an error is encountered, its status changes to "Failed". This can also happen while the job is still "Waiting", if a required dependency fails and the failure propagates. For more information on why errors may occur, see our Errors page.
  - If a user terminates the job before it completes, its status changes to "Terminated".
- Name: The name of the job. The default name for a job is the name of the app (or applet) of the job. This can be changed at launch time; for example, when running an app or applet using the Run dialog, you can set the job name by clicking and renaming the title in the dialog (after selecting an app or applet). This name is also used in notification emails. The contents of this column are clickable; clicking on a job name opens the job details page for that job.
- Executable: The name (title, object name, or object id) of the app or applet that is associated with the job. When launching workflows, each workflow step appears as a separate job that corresponds to the particular app or applet related to that step.
- Launched: How long ago the job was launched (such as "3 days ago"). NOTE: This time does not reflect when the job started running. After a job is launched, it may remain in the Waiting status for a while before it starts executing.
- Launched by: The name of the user that launched the job.
- Price: The cost of the job. Users can see prices in the Job Monitor in the following scenarios:
  - The project is billed to the user, and the user has confirmed billing information.
  - The project is billed to an org that has confirmed billing information, and the user is allowed to perform billable activities in that org.
  These users can see the estimated cost of a job while it is running. The final price is visible after the job completes.
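In the CLI, the same lifecycle appears as lowercase state names such as running, waiting_on_input, done, failed, and terminated, as in the dx find examples on this page. "done" appears to correspond to "Completed" in the UI. The polling sketch below assumes that done, failed, and terminated are the only final states, and that jq is installed to read the state from dx describe --json. The function names and the DX override variable are hypothetical. DX lets you substitute a stub or a wrapper for dx.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for waiting on a job from a script.
DX="${DX:-dx}"   # command used to call the platform; override to stub it

# Succeeds (exit 0) if the state is final: done, failed, or terminated.
is_terminal_state() {
  case "$1" in
    done|failed|terminated) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the current state of an execution (assumes jq is installed).
get_state() {
  "$DX" describe "$1" --json | jq -r '.state'
}

# Poll every INTERVAL seconds (default 30) until the job reaches a final
# state, print that state, and exit 0 only if it is "done".
wait_for_job() {
  local job="$1" interval="${2:-30}" state
  while true; do
    state=$(get_state "$job") || return 2
    if is_terminal_state "$state"; then
      echo "$state"
      [[ "$state" == done ]]   # becomes the function's exit status
      return
    fi
    sleep "$interval"
  done
}
```

For example, wait_for_job job-xxxx 60 && echo "job succeeded" checks the job once a minute. Polling rather than streaming keeps the script simple, at the cost of noticing completion up to one interval late.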
Each job name is a link that navigates to the "Job Details" page for the job.
The panel on the left allows you to filter the display for a subset of jobs based on a particular status.
Select a job on the Monitor page to view details about a job.
The panel at the top includes general information about the job:
- An icon denoting the status of the job.
- The job name and the job id.
- The executable (app or applet) name or id, the date and time the job was launched, and the name of the user who launched the job.
Actions you can perform:
- View Details: Shows the job's inputs and outputs, as well as its standard error and standard output logs.
- Terminate Job: Terminates the job. This action is available only if the job is in the Waiting or Running status, and only if you launched it or are an Admin of the project.
The page shows a visualization of the job, and any sub-jobs launched from it, along a time axis. The horizontal axis represents time, and the vertical axis lists the job and its sub-jobs. The job itself is the first row. Any sub-jobs are shown as additional rows underneath, sorted by the time they started running.
Some apps launch several parallel sub-jobs. Often, the number of sub-jobs reflects the size of the input data or of a reference genome. Apps may also call other apps, which likewise appear as sub-jobs. Since a sub-job can itself launch more sub-jobs, complex apps may generate an intricate set of sub-jobs with elaborate dependencies. This visualization gives you a quick overview of these jobs as a whole and of how long they took to run. The rest of this section uses the term job to refer to either the job (first row) or any of its sub-jobs (additional rows).
Each row includes the job name, an icon describing the job status, and a bar depicting the time span of the job. This bar starts at the horizontal point when the job started running and ends at the horizontal point when the job transitioned into the "Failed", "Terminated" or "Completed" status (or at the current time, if the job is still running). This running phase may be divided into two pieces, using different colors:
- The blue part, always present, represents the time span when the job was actually running.
- The gray part, optionally present, represents the time span when the job was in a finalization phase, either waiting for some other sub-jobs to finish, or waiting for the system to perform some object finalization (such as closing tables and files).
Clicking on a job name shows the particular job inputs and outputs as well as the standard error and standard output logs of the particular job.
If the job has failed, a red banner underneath the top panel will communicate the error reason. If a particular sub-job is responsible for the job failure, the name of that sub-job will be shown in red.
You can use dx watch to view the log of a running job or of any past job, whether it finished successfully, failed, or was terminated.
To view a job's log stream while it runs, use dx watch. The log stream includes stdout, stderr, and additional information the worker outputs as it executes the job.
$ dx watch job-xxxx
Watching job job-xxxx. Press Ctrl+C to stop.
* Sample Prints (sample_prints:main) (running) job-xxxx
amy 2017-01-01 09:00:00 (running for 0:00:37)
2017-01-01 09:06:00 Sample Prints INFO Logging initialized (priority)
2017-01-01 09:06:37 Sample Prints INFO CPU: 4% (4 cores) * Memory: 547/7479MB * Storage: 74GB free * Net: 0↓/0↑MBps
2017-01-01 09:06:37 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:06:37 Sample Prints INFO Installing apt packages dx-toolkit
2017-01-01 09:06:37 Sample Prints INFO Setting SSH public key
2017-01-01 09:06:37 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:06:37 Sample Prints STDOUT /usr/sbin/sshd already running.
2017-01-01 09:06:37 Sample Prints STDOUT Invoking main with {}
2017-01-01 09:06:37 Sample Prints STDOUT 0
...
To view the log of a job that has finished running, also use dx watch. The log stream includes stdout, stderr, and additional information the worker output as it executed the job.
$ dx watch job-xxxx
Watching job job-xxxx. Press Ctrl+C to stop.
* Sample Prints (sample_prints:main) (running) job-xxxx
amy 2017-01-01 09:00:00 (running for 0:00:37)
2017-01-01 09:06:00 Sample Prints INFO Logging initialized (priority)
2017-01-01 09:06:37 Sample Prints INFO CPU: 4% (4 cores) * Memory: 547/7479MB * Storage: 74GB free * Net: 0↓/0↑MBps
2017-01-01 09:06:37 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:06:37 Sample Prints INFO Installing apt packages dx-toolkit
2017-01-01 09:06:37 Sample Prints INFO Setting SSH public key
2017-01-01 09:06:37 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:06:37 Sample Prints STDOUT /usr/sbin/sshd already running.
2017-01-01 09:06:37 Sample Prints STDOUT Invoking main with {}
2017-01-01 09:06:37 Sample Prints STDOUT 0
2017-01-01 09:06:37 Sample Prints STDOUT 1
2017-01-01 09:06:37 Sample Prints STDOUT 2
2017-01-01 09:06:37 Sample Prints STDOUT 3
* Sample Prints (sample_prints:main) (done) job-xxxx
amy 2017-01-01 09:08:11 (runtime 0:02:11)
Output: -
You can use dx find executions to return the ten most recent executions in your current project. To view a different number of executions, run dx find executions -n <number>. The output of dx find executions is similar to the information shown in the "Monitor" tab of the DNAnexus web UI.
Below is an example of dx find executions. In this case, only two executions have been run in the current project: an individual job, LoFreq Variant Caller, and a workflow with two stages, Variant Calling Workflow. A stage is represented by either another analysis (if running a workflow) or a job (if running an app(let)).
The job running the LoFreq Variant Caller executable has been running for 10 minutes and 28 seconds. The analysis running the Variant Calling Workflow consists of two stages: FreeBayes Variant Caller, which is waiting on input, and BWA-MEM FASTQ Read Mapper, which has been running for 10 minutes and 18 seconds.
$ dx find executions
* LoFreq Variant Caller (lofreq:main) (running) job-xxxx
amy 2017-01-01 09:00:18 (running for 0:10:28)
* Variant Calling Workflow (in_progress) analysis-xxxx
│ amy 2017-01-01 09:00:18
├── * FreeBayes Variant Caller (freebayes:main) (waiting_on_input) job-yyyy
│ amy 2017-01-01 09:00:18
└── * BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (running) job-zzzz
amy 2017-01-01 09:00:18 (running for 0:10:18)
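In the text output of dx find, each execution sits on a line that ends in "(state) ID", so the output can be summarized with standard tools. The sketch below counts executions per state. It assumes that line pattern holds, which it does in the examples on this page, although the text layout is not a documented interface. For robust scripting, --brief and --delim are better suited. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count executions per state in the text output of
# `dx find executions` (or `dx find jobs` / `dx find analyses`).
# Assumes each execution line ends with "(state) job-..." or
# "(state) analysis-..."; tree-drawing characters are ignored.
count_states() {
  awk '
    $NF ~ /^(job|analysis)-/ && $(NF-1) ~ /^\(.*\)$/ {
      state = $(NF-1)
      gsub(/[()]/, "", state)   # "(running)" -> "running"
      count[state]++
    }
    END { for (s in count) print s, count[s] }
  ' | LC_ALL=C sort
}
```

For example, dx find executions -n 50 | count_states prints one "state count" line per state, sorted by state name. On the output above, it would report in_progress 1, running 2, and waiting_on_input 1.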
By default, dx find executions searches for the jobs and analyses created when a user runs an app, applet, or workflow. If a job is part of an analysis, the results are returned as a tree that links all of the jobs in the analysis together. You can also filter the returned executions by job type. Adding the --origin-jobs flag to dx find executions returns only origin jobs, whereas the --all-jobs flag also includes subjobs.
To monitor only analyses, run dx find analyses. Analyses are executions of workflows and consist of one or more app(let)s being run. dx find analyses returns only the top-level analyses, not any of the jobs they contain.
Below is an example of dx find analyses:
$ dx find analyses
* Variant Calling Workflow (in_progress) analysis-xxxx
amy 2017-01-01 09:00:18
Jobs are runs of an individual app(let), and analyses are composed of jobs. To monitor jobs, run dx find jobs, which returns a flat list of jobs. If a job is part of an analysis, all jobs within that analysis are also returned.
Below is an example of dx find jobs:
$ dx find jobs
* LoFreq Variant Caller (lofreq:main) (running) job-xxxx
amy 2017-01-01 09:10:00 (running for 0:00:28)
* FreeBayes Variant Caller (freebayes:main) (waiting_on_input) job-yyyy
amy 2017-01-01 09:00:18
* BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (running) job-zzzz
amy 2017-01-01 09:00:18 (running for 0:10:18)
Both dx watch output and dx find searches can be restricted to specific parameters. For dx watch, you can choose which log streams to show:
- To extract only stdout from a job, run dx watch job-xxxx --get-stdout
- To extract only stderr from a job, run dx watch job-xxxx --get-stderr
- To extract only stdout and stderr from a job, run dx watch job-xxxx --get-streams
Below is an example of viewing only the stdout and stderr lines of a job log:
$ dx watch job-xxxx --get-streams
Watching job job-xxxx. Press Ctrl+C to stop.
dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
/usr/sbin/sshd already running.
Invoking main with {}
0
1
2
3
4
5
6
7
8
9
10
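If you have saved a full log, for example by redirecting dx watch output to a file, you can filter it offline without contacting the platform again. The sketch below prints the messages of a single level from such a log. It assumes each log line has the form "DATE TIME JOB-NAME LEVEL MESSAGE" seen in the dx watch examples above, and that INFO, STDOUT, and STDERR are the level names of interest. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the message text of every saved `dx watch`
# log line whose level equals $1, reading the log from stdin. Lines look like
#   2017-01-01 09:06:37 Sample Prints STDOUT Invoking main with {}
# Header lines (starting with "*" or a user name) are skipped. Runs of
# spaces inside a message are collapsed to one space.
filter_log_level() {
  awk -v want="$1" '
    $1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^(INFO|STDOUT|STDERR)$/) {   # first level keyword
          if ($i == want) {
            msg = ""; sep = ""
            for (j = i + 1; j <= NF; j++) { msg = msg sep $j; sep = " " }
            print msg
          }
          next
        }
      }
    }
  '
}
```

For example, filter_log_level STDERR < job.log prints only the stderr messages from a saved log named job.log.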
To view the entire job tree, including both main jobs and subjobs, use dx watch job-xxxx --tree.
To view only the most recent log messages, use the -n option with the number of messages to show, for example dx watch job-xxxx -n 8. If the job has already finished, its output is displayed as well. In the example below, the app Sample Prints doesn’t have any output.
$ dx watch job-xxxx -n 8
Watching job job-xxxx. Press Ctrl+C to stop.
* Sample Prints (sample_prints:main) (done) job-xxxx
amy 2017-01-01 09:00:00 (runtime 0:02:11)
2017-01-01 09:06:00 Sample Prints INFO Logging initialized (priority)
2017-01-01 09:08:11 Sample Prints INFO CPU: 4% (4 cores) * Memory: 547/7479MB * Storage: 74GB free * Net: 0↓/0↑MBps
2017-01-01 09:08:11 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:08:11 Sample Prints INFO Installing apt packages dx-toolkit
2017-01-01 09:08:11 Sample Prints INFO Setting SSH public key
2017-01-01 09:08:11 Sample Prints STDOUT dxpy/0.227.1 (Linux-3.13.0-125-generic-x86_64-with-Ubuntu-14.04-trusty)
2017-01-01 09:08:11 Sample Prints STDOUT /usr/sbin/sshd already running.
* Sample Prints (sample_prints:main) (done) job-xxxx
amy 2017-01-01 09:00:00 (runtime 0:02:11)
Output: -
By default, dx find restricts your search to your current project context. To search across all projects you have access to, use the --all-projects flag.
$ dx find executions -n 3 --all-projects
* Sample Prints (sample_prints:main) (done) job-xxxx
amy 2017-01-01 09:15:00 (runtime 0:02:11)
* Sample Applet (sample_applet:main) (done) job-yyyy
ben 2017-01-01 09:10:00 (runtime 0:00:28)
* Sample Applet (sample_applet:main) (failed) job-zzzz
amy 2017-01-01 09:00:00 (runtime 0:19:02)
By default, dx find returns only up to ten of the most recently launched executions matching your search query. To change the number of executions returned, use the -n option.
# Find the 100 most recently launched executions in your project
$ dx find executions -n 100
# Find most recent executions running app-lofreq in the current project
$ dx find executions --executable app-lofreq
* LoFreq Variant Caller (lofreq:main) (running) job-xxxx
amy 2017-01-01 09:00:18 (running for 0:10:18)
You can also use the --created-before and --created-after options to search based on when the execution was created.
# Find executions run on January 2, 2017
$ dx find executions --created-after=2017-01-01 --created-before=2017-01-03
# Find executions created in the last 2 hours
$ dx find executions --created-after=-2h
# Find analyses created in the last 5 days
$ dx find analyses --created-after=-5d
You can also restrict the search to a specific state, such as "done", "failed", or "terminated", using the --state option.
# Find failed jobs in the current project
$ dx find jobs --state failed
The --delim flag tab-delimits the output, which makes it easy to pass the output to other shell commands.
$ dx find jobs --delim
* Cloud Workstation (cloud_workstation:main) done job-xxxx amy 2017-01-07 09:00:00 (runtime 1:00:00)
* GATK3 Human Exome Pipeline(gatk3_human_exome_pipeline:main) done job-yyyy amy 2017-01-07 09:00:00 (runtime 0:21:16)
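Because each field is separated by a tab, the output can be split reliably with tools such as awk or cut. The sketch below prints the IDs of executions in a given state. It locates the state and the ID by value rather than by column position, because the exact column order is an assumption here. The function name is hypothetical. For this particular task, --state combined with --brief is simpler. The sketch only illustrates processing --delim output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tab-delimited `dx find jobs --delim` output on
# stdin and print the ID of every execution whose state field equals $1.
ids_in_state() {
  awk -F'\t' -v want="$1" '
    {
      id = ""; found = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(job|analysis)-/) id = $i   # execution ID field
        if ($i == want) found = 1              # state field
      }
      if (found && id != "") print id
    }
  '
}
```

For example, dx find jobs --delim | ids_in_state failed prints one failed job ID per line.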
You can use the --brief flag to return only the object IDs of the objects matched by your search query. The --origin-jobs flag omits subjob information.
Below is an example usage of the --brief flag:
$ dx find jobs -n 3 --brief
job-xxxx
job-yyyy
job-zzzz
Below is an example of using the --origin-jobs and --brief flags together to describe the most recent origin job in the current default project:
$ dx describe $(dx find jobs -n 1 --origin-jobs --brief)
Result 1:
ID job-xxxx
Class job
Job name BWA-MEM FASTQ Read Mapper
Executable name bwa_mem_fastq_read_mapper
Project context project-xxxx
Billed to amy
Workspace container-xxxx
Cache workspace container-yyyy
Resources container-zzzz
App app-xxxx
Instance Type mem1_ssd1_x8
Priority high
State done
Root execution job-zzzz
Origin job job-zzzz
Parent job -
Function main
Input genomeindex_targz = file-xxxx
reads_fastqgz = file-xxxx
[read_group_library = "1"]
[mark_as_secondary = true]
[read_group_platform = "ILLUMINA"]
[read_group_sample = "1"]
[add_read_group = true]
[read_group_id = {"$dnanexus_link": {"input": "reads_fastqgz", "metadata": "name"}}]
[read_group_platform_unit = "None"]
Output -
Output folder /
Launched by amy
Created Sun Jan 1 09:00:17 2017
Started running Sun Jan 1 09:00:10 2017
Stopped running Sun Jan 1 09:00:27 2017 (Runtime: 0:00:16)
Last modified Sun Jan 1 09:00:28 2017
Depends on -
Sys Requirements {"main": {"instanceType": "mem1_ssd1_x8"}}
Tags -
Properties -
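In a script, the dx describe $(dx find jobs -n 1 --origin-jobs --brief) one-liner passes nothing to dx describe when the search finds no jobs. The sketch below is a guarded version that uses only the flags shown on this page. The function name and the DX override variable are hypothetical. DX lets you substitute a stub or a wrapper for dx.

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe the most recently launched origin job in
# the current project, or report that there is none.
DX="${DX:-dx}"   # command used to call the platform; override to stub it

describe_last_origin_job() {
  local id
  id=$("$DX" find jobs -n 1 --origin-jobs --brief) || return 2
  if [[ -z "$id" ]]; then
    echo "no jobs found in the current project" >&2
    return 1
  fi
  "$DX" describe "$id"
}
```

The function returns 1 when no job is found and 2 when the search itself fails, so calling scripts can tell the two cases apart.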
The following commands show one way to find failed jobs, update the app, and rerun a failed job.
# Find failed jobs in the current project from a time period
$ dx find jobs --state failed --created-after=2017-01-01 --created-before=2017-02-01
* BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (failed) job-xxxx
amy 2017-01-22 09:00:00 (runtime 0:02:12)
# Find all failed executions of specified executable
$ dx find executions --state failed --executable app-bwa_mem_fastq_read_mapper
* BWA-MEM FASTQ Read Mapper (bwa_mem_fastq_read_mapper:main) (failed) job-xxxx
amy 2017-01-01 09:00:00 (runtime 0:02:12)
# From within the app's source directory, rebuild and update the app
$ dx build -a
INFO:dxpy:Archived app app-xxxx to project-xxxx:"/.App_archive/bwa_mem_fastq_read_mapper (Sun Jan 1 09:00:00 2017)"
{"id": "app-yyyy"}
# Rerun job with updated app
$ dx run bwa_mem_fastq_read_mapper --clone job-xxxx
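To rerun several failed jobs after an update, apply the same --clone call to each ID returned by a --state failed --brief search. The sketch below assumes every listed job was launched from the same executable. It also adds dx run's -y flag, which to our understanding skips the interactive confirmation prompt. The function name and the DRY_RUN variable are hypothetical. With DRY_RUN=1, the function only prints the commands it would run, which is a sensible check before launching new, billable jobs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rerun each failed job ID read from stdin with the
# given executable, reusing the failed job's settings via --clone.
# Set DRY_RUN=1 to print the commands instead of running them.
DX="${DX:-dx}"   # command used to call the platform; override to stub it

rerun_jobs() {
  local executable="$1" job
  while read -r job; do
    [[ -z "$job" ]] && continue   # skip blank lines
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "$DX run $executable --clone $job -y"
    else
      # -y skips dx run's confirmation prompt.
      "$DX" run "$executable" --clone "$job" -y
    fi
  done
}
```

For example, piping dx find executions --state failed --executable app-bwa_mem_fastq_read_mapper --brief into DRY_RUN=1 rerun_jobs bwa_mem_fastq_read_mapper prints one dx run command per failed job without launching anything.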
$ dx find jobs --tag