Parallel xargs by Chr

This applet slices a BAM file by canonical chromosome, then runs samtools view -c on the slices in parallel using xargs. Run man xargs for general usage information.

View full source code on GitHub

How is the SAMtools dependency provided?

The compiled SAMtools binary is placed directly in the <applet dir>/resources directory, under usr/bin. Any files found in the resources/ directory are uploaded so that they are present, with the same relative paths, in the root directory of the worker. In this case:

├── Applet dir
│   ├── src
│   ├── dxapp.json
│   ├── resources
│       ├── usr
│           ├── bin
│               ├── <samtools binary>

When this applet is run on a worker, the contents of the resources/ folder are placed in the worker's root directory /:

/
├── usr
│   ├── bin
│       ├── <samtools binary>
├── home
│   ├── dnanexus

/usr/bin is part of the $PATH variable, so the script can reference the samtools command directly, for example, samtools view -c ....
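
Because the script relies on samtools being found through $PATH, it can help to fail early if the binary is missing. The following is a minimal sketch, not part of the applet's source; the helper name require_tool is hypothetical.

```shell
# Hypothetical helper: stop with a clear message if a command is not on $PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing required tool: $1" >&2
    return 1
  }
}

# Example use at the top of the applet script:
#   require_tool samtools || exit 1
```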

Parallel Run

Slice BAM

First, download the BAM file and slice it by canonical chromosome, writing the *.bam file names to a record file.

To split a BAM by regions, you need a *.bai index. You can either create an app(let) which takes the *.bai as an input or generate a *.bai in the applet. In this tutorial, you generate the *.bai in the applet, sorting the BAM if necessary.
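
The slicing step could be sketched as follows, assuming the BAM has already been downloaded to the working directory. The function names, the slice_ file prefix, and the pattern used to decide which sequence names count as canonical (a number, X, Y, M, or MT, with an optional chr prefix) are illustrative assumptions rather than the applet's actual code.

```shell
# Index the BAM; if indexing fails (for example, the BAM is unsorted),
# sort it first and index the sorted copy. Prints the name of the indexed BAM.
ensure_indexed() {
  local bam="$1" sorted
  if samtools index "$bam" 2>/dev/null; then
    echo "$bam"
  else
    sorted="${bam%.bam}.sorted.bam"
    samtools sort -o "$sorted" "$bam" && samtools index "$sorted" && echo "$sorted"
  fi
}

# Read SAM header text on stdin and print the SN value of every @SQ line
# whose name matches the assumed "canonical" pattern.
canonical_chromosomes() {
  awk -F '\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) {
        name = substr($i, 4)
        if (name ~ /^(chr)?([0-9]+|X|Y|M|MT)$/) print name
      }
    }
  }'
}

# Write one BAM per canonical chromosome and record each file name in $2.
slice_by_chromosome() {
  local bam="$1" list="$2" chr
  : > "$list"
  samtools view -H "$bam" | canonical_chromosomes | while IFS= read -r chr; do
    samtools view -b -o "slice_${chr}.bam" "$bam" "$chr"
    echo "slice_${chr}.bam" >> "$list"
  done
}

# Example use (input.bam and bamfiles.txt are placeholder names):
#   bam=$(ensure_indexed input.bam) && slice_by_chromosome "$bam" bamfiles.txt
```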

Xargs SAMtools view

In the previous section, you recorded the name of each sliced BAM file into a record file. Next, perform a samtools view -c on each slice using the record file as input.
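
One way to run the counts in parallel is sketched below. It assumes the record file holds one BAM file name per line; the function name count_reads_parallel and the choice of nproc for the default degree of parallelism are assumptions, not the applet's documented behavior. The -P option of xargs sets the maximum number of commands run at once, and -I {} runs one command per input line.

```shell
# Run "samtools view -c" on every BAM listed in $1, up to $2 at a time,
# and print the sum of the per-file read counts.
count_reads_parallel() {
  local list="$1"
  local jobs="${2:-$(nproc 2>/dev/null || echo 1)}"
  xargs -I {} -P "$jobs" samtools view -c {} < "$list" \
    | awk '{ total += $1 } END { print total + 0 }'
}

# Example use (bamfiles.txt and counts.txt are placeholder names):
#   echo "Total Count: $(count_reads_parallel bamfiles.txt)" > counts.txt
```

Because only the sum is kept, the order in which the parallel jobs finish does not affect the result.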

Upload results

The results file is uploaded using the standard bash process:

  1. Upload a file to the job execution's container.

  2. Provide the resulting DNAnexus link as the job's output using the script dx-jobutil-add-output <output name>.
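
The two steps above can be sketched as a small helper. The function name upload_result and the output name counts_txt in the usage comment are hypothetical; the output name must match an entry in the applet's dxapp.json output specification.

```shell
# Upload a local file to the job's container, then register the returned
# file ID as the job output named $1.
upload_result() {
  local output_name="$1" path="$2" file_id
  # --brief makes dx upload print only the ID of the new file.
  file_id=$(dx upload "$path" --brief) || return 1
  dx-jobutil-add-output "$output_name" "$file_id" --class=file
}

# Example use (counts_txt and counts.txt are placeholder names):
#   upload_result counts_txt counts.txt
```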
