# Parallel xargs by Chr

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_para_chr_xargs_sh)

## How is the SAMtools dependency provided?

The SAMtools compiled binary is placed directly in the `<applet dir>/resources` directory. Any files found in the `resources/` directory are uploaded so that they are present in the root directory of the worker. In this case:

```
├── Applet dir
│   ├── src
│   ├── dxapp.json
│   ├── resources
│       ├── usr
│           ├── bin
│               ├── < samtools binary >
```

When this applet is run on a worker, the `resources/` folder is placed in the worker's root directory `/`:

```
/
├── usr
│   ├── bin
│       ├── < samtools binary >
├── home
│   ├── dnanexus
```

`/usr/bin` is part of the `$PATH` variable, so in the script, you can reference the `samtools` command directly, as in `samtools view -c ...`.
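
As a quick sanity check, a script can confirm that the bundled binary actually resolves on `$PATH` before any real work starts. The sketch below is an illustration rather than part of the applet; `have_tool` is a hypothetical helper name introduced here.

```shell
# Hypothetical helper (not part of the applet): succeeds if the named
# command resolves on $PATH, fails otherwise.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool samtools; then
  echo "samtools found at: $(command -v samtools)"
else
  echo "samtools not found on PATH" >&2
fi
```

On a worker with the layout shown above, the first branch would be expected to report `/usr/bin/samtools`.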

## Parallel Run

### Slice BAM

First, download the BAM file and slice it by canonical chromosome, writing the names of the resulting `*.bam` files to `bamfiles.txt`.

To split a BAM by regions, you need to have a `*.bai` index. You can either create an app(let) which takes the `*.bai` as an input or generate a `*.bai` in the applet. In this tutorial, the `*.bai` is generated in the applet, sorting the BAM if necessary.

```shell
dx download "${mappings_bam}"

# Try to index the BAM as-is; indexing fails if it is not coordinate-sorted.
bam_filename="${mappings_bam_name}"
if ! samtools index "${bam_filename}"; then
  # Sort the BAM, then index the sorted result.
  samtools sort -o "${bam_filename}" "${mappings_bam_name}"
  samtools index "${bam_filename}"
fi

# Collect canonical chromosome names from the @SQ header lines.
chromosomes=$( \
  samtools view -H "${bam_filename}" \
  | grep "^@SQ" \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')

# Write one BAM per chromosome and record each file name in bamfiles.txt.
for chr in $chromosomes; do
  samtools view -b -o "bam_${chr}.bam" "${bam_filename}" "${chr}"
  echo "bam_${chr}.bam"
done > bamfiles.txt
```
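
To see what the chromosome filter keeps and drops, the same `grep`/`awk` pipeline can be run on a few header lines without SAMtools. The header below is made up for illustration (the sequence names and lengths are assumptions, not taken from the tutorial's input), and `sample_header` is a name introduced only for this sketch.

```shell
# Illustrative, made-up header lines; fields are tab-separated as in a SAM header.
sample_header=$(printf '%b' \
  '@HD\tVN:1.6\tSO:coordinate\n' \
  '@SQ\tSN:chr1\tLN:248956422\n' \
  '@SQ\tSN:chrX\tLN:156040895\n' \
  '@SQ\tSN:chrUn_KI270302v1\tLN:2274\n' \
  '@SQ\tSN:chrM\tLN:16569\n')

# Same filter as in the applet: keep the SN value of canonical chromosomes only.
chromosomes=$(printf '%s\n' "${sample_header}" \
  | grep "^@SQ" \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')

echo "${chromosomes}"
# prints:
# chr1
# chrX
# chrM
```

The unplaced contig `chrUn_KI270302v1` is dropped because it neither matches `^chr[0-9XYM]+$` nor starts with a digit, `X`, `Y`, or `M`.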

### Xargs SAMtools view

In the previous section, the name of each sliced BAM file was written to `bamfiles.txt`. Next, run `samtools view -c` on each slice, using that file as the input to `xargs`, and sum the per-slice counts with `awk`.

```shell
counts_txt_name="${mappings_bam_prefix}_count.txt"

sum_reads=$( \
  <bamfiles.txt xargs -I {} samtools view -c $view_options '{}' \
  | awk '{s+=$1} END {print s}')
echo "Total Count: ${sum_reads}" > "${counts_txt_name}"
```
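
Note that `xargs` runs one process at a time unless it is given `-P` (available in GNU and BSD `xargs`, though not required by POSIX). The sketch below shows the fan-out-and-sum pattern with `-P` in a form that runs without SAMtools: small text files stand in for the BAM slices and `wc -l` stands in for `samtools view -c`. The `-P` flag, the `nproc` fallback, and the names `workdir`, `jobs`, and `sum_lines` are assumptions introduced here, not part of the tutorial's snippet.

```shell
# Stand-in inputs: three small text files play the role of per-chromosome slices.
workdir=$(mktemp -d)
printf 'a\nb\n'    > "${workdir}/slice_1.txt"
printf 'c\n'       > "${workdir}/slice_2.txt"
printf 'd\ne\nf\n' > "${workdir}/slice_3.txt"
ls "${workdir}"/slice_*.txt > "${workdir}/slices.txt"

# Number of concurrent processes; falls back to 2 if nproc is unavailable.
jobs=$(nproc 2>/dev/null || echo 2)

# One counting process per input line, up to ${jobs} at a time.
# `wc -l < file` stands in for `samtools view -c file`. Output order is not
# guaranteed under -P, which is fine here because the counts are only summed.
sum_lines=$( \
  <"${workdir}/slices.txt" xargs -I {} -P "${jobs}" sh -c 'wc -l < "$1"' _ {} \
  | awk '{s+=$1} END {print s}')
echo "Total Count: ${sum_lines}"
# prints: Total Count: 6

rm -rf "${workdir}"
```

Under the same assumption, the analogous change in the applet would be to add `-P "$(nproc)"` to the `xargs` invocation shown above.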

### Upload results

The results file is uploaded using the standard bash process:

1. Upload a file to the job execution's container.
2. Provide the DNAnexus link as a job's output using the script `dx-jobutil-add-output <output name>`.

   ```shell
   counts_txt_id=$(dx upload "${counts_txt_name}" --brief)
   dx-jobutil-add-output counts_txt "${counts_txt_id}" --class=file
   ```


