# Parallel xargs by Chr

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_para_chr_xargs_sh)

## How is the SAMtools dependency provided?

The SAMtools compiled binary is placed directly in the `<applet dir>/resources` directory. Any files found in the `resources/` directory are uploaded so that they are present in the root directory of the worker. In this case:

```
├── Applet dir
│   ├── src
│   ├── dxapp.json
│   ├── resources
│       ├── usr
│           ├── bin
│               ├── <samtools binary>
```

When this applet is run on a worker, the `resources/` folder is placed in the worker's root directory `/`:

```
/
├── usr
│   ├── bin
│       ├── <samtools binary>
├── home
│   ├── dnanexus
```

`/usr/bin` is part of the `$PATH` variable, so the script can reference the `samtools` command directly, for example, `samtools view -c ...`.
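
Because the script relies on the binary being on `$PATH`, it can be useful to check for it before doing any work. The sketch below is one possible way to do that; `require_tool` is a hypothetical helper name and is not part of the applet source.

```shell
# require_tool is a hypothetical helper: it fails fast when a command
# cannot be resolved on $PATH.
require_tool() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "error: '$1' not found on PATH" >&2
    return 1
  fi
}

# In the applet this could be called once near the top of the script:
#   require_tool samtools
```

Failing early this way tends to give a clearer error than a "command not found" message partway through the job.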

## Parallel Run

### Splice BAM

First, download the BAM file and slice it by canonical chromosome, writing the name of each resulting `*.bam` file to a list file (`bamfiles.txt`).

To split a BAM by regions, you need a `*.bai` index. You can either create an app(let) which takes the `*.bai` as an input or generate a `*.bai` in the applet. In this tutorial, you generate the `*.bai` in the applet, sorting the BAM if necessary.

```shell
# Download BAM from DNAnexus
dx download "${mappings_bam}"

# Attempt to index the BAM file
indexsuccess=true
bam_filename="${mappings_bam_name}"
samtools index "${mappings_bam_name}" || indexsuccess=false

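# If indexing fails, the BAM is probably unsorted: sort it, then index the sorted copy
if [[ $indexsuccess == false ]]; then
  # Write to a separate file so the sort never overwrites its own input
  bam_filename="sorted_${mappings_bam_name}"
  samtools sort -o "${bam_filename}" "${mappings_bam_name}"
  samtools index "${bam_filename}"
fi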

# Extract chromosome names from header
chromosomes=$(
  samtools view -H "${bam_filename}" | \
  grep "@SQ" | \
  awk -F '\t' '{print $2}' | \
  awk -F ':' '{
    if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {
      print $2
    }
  }'
)

# Split BAM by chromosome and record filenames
for chr in $chromosomes; do
  samtools view -b -o "bam_${chr}.bam" "${bam_filename}" "${chr}"
  echo "bam_${chr}.bam"
done > bamfiles.txt
```
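
To see which sequence names the header filter keeps, the same `grep`/`awk` pipeline can be run against a small made-up header. This is only an illustration: the `@SQ` lines below are hypothetical stand-ins for `samtools view -H` output, and `filter_chromosomes` is a name introduced here for convenience.

```shell
# Hypothetical header lines standing in for `samtools view -H` output.
sample_header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrX\tLN:156040895\n@SQ\tSN:chrUn_GL000220v1\tLN:161802\n@SQ\tSN:MT\tLN:16569\n')

# Same filter as in the applet code, wrapped in a function for reuse.
filter_chromosomes() {
  grep "@SQ" |
    awk -F '\t' '{print $2}' |
    awk -F ':' '$2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/ {print $2}'
}

# Prints one kept sequence name per line.
printf '%s\n' "$sample_header" | filter_chromosomes
```

With this sample input, `chr1`, `chrX`, and `MT` are kept and `chrUn_GL000220v1` is dropped. Note that the second branch of the pattern is anchored only at the start, so any name beginning with a digit, `X`, `Y`, or `M` passes the filter.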

### Xargs SAMtools view

In [Splice BAM](#splice-bam), you recorded the name of each sliced BAM file in `bamfiles.txt`. Next, run `samtools view -c` on each slice, feeding that list to `xargs`.

In the example below, `$view_options` represents any optional additional flags you want to pass to `samtools view`. It is left unquoted so that the shell splits it into separate arguments.

```shell
counts_txt_name="${mappings_bam_prefix}_count.txt"

# Sum all read counts across split BAM files
sum_reads=$(
  < bamfiles.txt xargs -I {} \
  samtools view -c $view_options '{}' | \
  awk '{s += $1} END {print s}'
)

# Write the total read count to a file
echo "Total Count: ${sum_reads}" > "${counts_txt_name}"
```
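
By default, `xargs` starts one command at a time. To run several counts concurrently, `xargs` accepts `-P <max-procs>`. The sketch below wraps that pattern in a function so it can be tried without SAMtools installed; the name `sum_counts_parallel` and its argument order are hypothetical, and it assumes each invoked command prints a count as the first field of its output.

```shell
# Usage: sum_counts_parallel <list-file> <max-procs> <command> [args...]
# Runs <command> [args...] <file> for every file named in <list-file>,
# with up to <max-procs> processes at once, then sums the first field
# of each output line.
sum_counts_parallel() {
  list_file="$1"
  max_procs="$2"
  shift 2
  < "$list_file" xargs -I {} -P "$max_procs" "$@" '{}' |
    awk '{s += $1} END {print s + 0}'
}

# In the applet this might be called as follows (not run here):
#   sum_reads=$(sum_counts_parallel bamfiles.txt "$(nproc)" samtools view -c $view_options)
```

Because the counts are only summed, the order in which the parallel processes finish does not affect the total.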

## Upload results

The results file is uploaded using the standard bash process:

1. Upload the file to the job execution's container.
2. Provide the resulting DNAnexus link as a job output using the script `dx-jobutil-add-output <output name>`.

   ```shell
   counts_txt_id=$(dx upload "${counts_txt_name}" --brief)
   dx-jobutil-add-output counts_txt "${counts_txt_id}" --class=file
   ```

