# Parallel xargs by Chr

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_para_chr_xargs_sh)

## How is the SAMtools dependency provided?

The compiled SAMtools binary is placed directly in the `<applet dir>/resources` directory. Any files found in the `resources/` directory are uploaded so that they are present in the root directory of the worker. In this case:

```
├── Applet dir
│   ├── src
│   ├── dxapp.json
│   ├── resources
│       ├── usr
│           ├── bin
│               ├── <samtools binary>
```

When this applet runs on a worker, the contents of the `resources/` folder are placed in the worker's root directory `/`:

```
/
├── usr
│   ├── bin
│       ├── <samtools binary>
├── home
│   ├── dnanexus
```

`/usr/bin` is part of the `$PATH` variable, so the script can reference the `samtools` command directly, for example, `samtools view -c ...`.
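A quick way to confirm that a bundled binary is visible is to resolve it through `$PATH` before the script relies on it. The sketch below is not part of the applet source: `require_tool` is a hypothetical helper name, and it relies only on the shell builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the applet source):
# print where a tool resolves on $PATH, or fail with a message on stderr.
require_tool() {
  local tool="$1"
  local resolved
  if resolved=$(command -v "$tool"); then
    echo "${tool} found at ${resolved}"
  else
    echo "${tool} not found on PATH" >&2
    return 1
  fi
}

# In the applet you would call: require_tool samtools
# `sh` is used here only so the sketch runs on any machine.
require_tool sh
```

Calling `require_tool samtools` near the top of the script would make a packaging mistake (for example, a binary placed outside `resources/usr/bin`) fail early with a clear message.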

## Parallel Run

### Slice BAM

First, download the BAM file and slice it by canonical chromosome, recording the name of each resulting `*.bam` file in `bamfiles.txt`.

To split a BAM by regions, you need a `*.bai` index. You can either create an app(let) which takes the `*.bai` as an input or generate a `*.bai` in the applet. In this tutorial, you generate the `*.bai` in the applet, sorting the BAM if necessary.

```shell
# Download BAM from DNAnexus
dx download "${mappings_bam}"

# Attempt to index the BAM file as downloaded
indexsuccess=true
bam_filename="${mappings_bam_name}"
samtools index "${mappings_bam_name}" || indexsuccess=false

# If indexing fails, sort into a separate file (so the input is not
# overwritten while it is being read), then index the sorted copy
if [[ $indexsuccess == false ]]; then
  bam_filename="${mappings_bam_prefix}.sorted.bam"
  samtools sort -o "${bam_filename}" "${mappings_bam_name}"
  samtools index "${bam_filename}"
fi

# Extract chromosome names from header
chromosomes=$(
  samtools view -H "${bam_filename}" | \
  grep "@SQ" | \
  awk -F '\t' '{print $2}' | \
  awk -F ':' '{
    if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {
      print $2
    }
  }'
)

# Split BAM by chromosome and record filenames
for chr in $chromosomes; do
  samtools view -b "${bam_filename}" "${chr}" -o "bam_${chr}.bam"
  echo "bam_${chr}.bam"
done > bamfiles.txt
```
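To see which reference names the header filter keeps, the same pipeline can be run against a hand-written header. The `@SQ` lines below are illustrative sample data rather than output from a real BAM; in the applet, the input comes from `samtools view -H`.

```shell
# Hand-written header lines standing in for `samtools view -H` output (tab-separated).
sample_header=$(printf '%s\t%s\t%s\n' \
  '@HD' 'VN:1.6' 'SO:coordinate' \
  '@SQ' 'SN:chr1' 'LN:248956422' \
  '@SQ' 'SN:chrX' 'LN:156040895' \
  '@SQ' 'SN:chrM' 'LN:16569' \
  '@SQ' 'SN:chr1_KI270706v1_random' 'LN:175055' \
  '@SQ' 'SN:chrUn_KI270302v1' 'LN:2274')

# Same filter as in the tutorial: keep @SQ lines, take the SN field,
# and print the name only if it looks like a canonical chromosome.
chromosomes=$(
  printf '%s\n' "$sample_header" |
  grep "@SQ" |
  awk -F '\t' '{print $2}' |
  awk -F ':' '$2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/ {print $2}'
)

echo "$chromosomes"
```

With this sample, only `chr1`, `chrX`, and `chrM` are printed; the `_random` and `chrUn` contigs are dropped. Note that the second alternative in the pattern, `^[0-9XYM]`, is not anchored at the end, so for references without the `chr` prefix any name that begins with a digit, `X`, `Y`, or `M` passes the filter.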

### Xargs SAMtools view

In the previous section, you recorded the name of each sliced BAM file into a record file. Next, perform a `samtools view -c` on each slice using the record file as input.

```shell
counts_txt_name="${mappings_bam_prefix}_count.txt"

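# Count reads in each split BAM, running up to one samtools process per core
# (-P sets the maximum number of concurrent processes), then sum the counts.
# $view_options is left unquoted so that multiple options split into words.
sum_reads=$(
  < bamfiles.txt xargs -P "$(nproc)" -I {} \
  samtools view -c $view_options '{}' | \
  awk '{s += $1} END {print s}'
)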

# Write the total read count to a file
echo "Total Count: ${sum_reads}" > "${counts_txt_name}"
```
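The list-driven pattern can be tried without SAMtools. The sketch below uses `wc -l` as a stand-in for `samtools view -c` and passes `-P` (available in GNU and BSD `xargs`) to cap how many commands run at once. The file names, the temporary directory, and the value `4` are illustrative assumptions.

```shell
# Scratch directory so the sketch does not touch the working directory.
workdir=$(mktemp -d)

# Create three small files (10, 20, and 30 lines) and record their names,
# mirroring how bamfiles.txt lists the per-chromosome BAM files.
for i in 1 2 3; do
  seq 1 $((i * 10)) > "${workdir}/part_${i}.txt"
  echo "${workdir}/part_${i}.txt"
done > "${workdir}/parts.txt"

# Count lines in each file, running up to 4 counts at a time, then sum them.
# Each job prints a single number, so the order in which jobs finish does not matter.
total=$(
  < "${workdir}/parts.txt" xargs -P 4 -I {} sh -c 'wc -l < "$1"' _ {} |
  awk '{s += $1} END {print s}'
)

echo "Total lines: ${total}"
rm -rf "${workdir}"
```

Because the per-file results are only summed, no ordering or locking is needed; this is the property that makes per-chromosome read counting a good fit for `xargs`.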

## Upload results

The results file is uploaded using the standard bash process:

1. Upload the file to the job execution's container.
2. Register the resulting DNAnexus file link as a job output using the script `dx-jobutil-add-output <output name>`.

   ```shell
   counts_txt_id=$(dx upload "${counts_txt_name}" --brief)
   dx-jobutil-add-output counts_txt "${counts_txt_id}" --class=file
   ```
