# Distributed by Chr (sh)

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_distr_chr_slice_sh)

## How is the SAMtools dependency provided?

The SAMtools dependency is resolved by declaring an APT package in the `runSpec.execDepends` field of the `dxapp.json` file.

```json
{
...
    "runSpec": {
   ...
      "execDepends": [
        {"name": "samtools"}
      ]
    }
...
}
```

For additional information, see the `execDepends` documentation.

## Entry Points

Distributed bash-interpreter apps use bash functions to declare entry points. This app has the following entry points specified as bash functions:

* `main`
* `count_func`
* `sum_reads`

Each entry point runs on a new worker with its own system requirements. The instance type for each entry point can be set in the `runSpec.systemRequirements` field of the `dxapp.json` file:

```json
{
  "runSpec": {
    ...
    "systemRequirements": {
      "main": {
        "instanceType": "mem1_ssd1_x4"
      },
      "count_func": {
        "instanceType": "mem1_ssd1_x2"
      },
      "sum_reads": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    ...
  }
}
```

### `main`

The `main` function slices the input `*.bam` file into smaller `*.bam` files, each containing only the reads from one canonical chromosome, and generates a `*.bai` index if one is not provided. First, it downloads the BAM file and extracts the chromosome names from the `@SQ` lines of the header.

```shell
dx download "${mappings_sorted_bam}"
chromosomes=$( \
  samtools view -H "${mappings_sorted_bam_name}" \
  | grep "\@SQ" \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')
```
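
To see which reference names the filter keeps, the same filtering steps can be run on a hand-written header instead of real `samtools view -H` output. This is a minimal sketch: the `@SQ` lines below are made up for illustration, and the `@SQ` match is written in an anchored form.

```shell
# Hypothetical header lines standing in for `samtools view -H` output.
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrX\tLN:156040895\n@SQ\tSN:chrUn_KI270302v1\tLN:2274\n@SQ\tSN:chr1_KI270706v1_random\tLN:175055\n')

# Keep the @SQ lines, take the SN field, and print only the names
# accepted by the regular expression used in main.
chromosomes=$(printf '%s\n' "${header}" \
  | grep '^@SQ' \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')

# Prints chr1 and chrX, one per line; the unplaced and random contigs are dropped.
echo "${chromosomes}"
```

Note that the second alternative of the pattern, `^[0-9XYM]`, is anchored only at the start, so for references named without the `chr` prefix it accepts any name that begins with a digit, `X`, `Y`, or `M`.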

Next, the index is generated or downloaded. Each sliced `*.bam` file is then uploaded, and its file ID is passed to a new `count_func` job started with the [`dx-jobutil-new-job`](/user/helpstrings-of-sdk-command-line-utilities.md) command.

```shell
if [ -z "${mappings_sorted_bai}" ]; then
  samtools index "${mappings_sorted_bam_name}"
else
  dx download "${mappings_sorted_bai}" -o "${mappings_sorted_bam_name}".bai
fi

count_jobs=()
for chr in $chromosomes; do
  seg_name="${mappings_sorted_bam_prefix}_${chr}".bam
  samtools view -b "${mappings_sorted_bam_name}" "${chr}" > "${seg_name}"
  bam_seg_file=$(dx upload "${seg_name}" --brief)
  count_jobs+=($(dx-jobutil-new-job -isegmentedbam_file="${bam_seg_file}" -ichr="${chr}" count_func))
done
```

The outputs of the `count_func` jobs are referenced as Job-Based Object References (JBORs) and used as inputs for the `sum_reads` entry point.

```shell
for job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${job}:counts_txt")
done

sum_reads_job=$(dx-jobutil-new-job "${readfiles[@]}" -ifilename="${mappings_sorted_bam_prefix}" sum_reads)
```
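
A JBOR is written as `<job ID>:<output field>`. The sketch below shows the argument list that the loop builds, using made-up job IDs in place of the values returned by `dx-jobutil-new-job`.

```shell
# Hypothetical job IDs; in the app these come from dx-jobutil-new-job.
count_jobs=("job-AAAA" "job-BBBB")

readfiles=()
for job in "${count_jobs[@]}"; do
  # Each element references the counts_txt output of one count_func job.
  readfiles+=("-ireadfiles=${job}:counts_txt")
done

# Print one argument per line to show what is passed on to sum_reads.
printf '%s\n' "${readfiles[@]}"
```

Repeating `-ireadfiles=` once per job is what lets `sum_reads` receive `readfiles` as an array with one element per chromosome.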

The output of the `sum_reads` entry point becomes the output of the `main` entry point: `main` passes a JBOR for it to the `dx-jobutil-add-output` command.
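
A minimal sketch of that final call follows. It assumes that `main` declares an output field named `counts_txt` and uses a placeholder job ID; the `command -v` guard is only there so the snippet can also be run outside a DNAnexus worker.

```shell
# Placeholder for the job ID returned by dx-jobutil-new-job in main.
sum_reads_job="job-CCCC"

# Reference to the read_sum_file output of the sum_reads job.
jbor="${sum_reads_job}:read_sum_file"

# counts_txt is an assumed name for the output field of main.
if command -v dx-jobutil-add-output > /dev/null 2>&1; then
  dx-jobutil-add-output counts_txt "${jbor}" --class=jobref
else
  # Outside a worker, just show the command that would run.
  echo "dx-jobutil-add-output counts_txt ${jbor} --class=jobref"
fi
```

Because the output is a reference rather than a file ID, `main` does not need to wait for `sum_reads` to finish; the reference is resolved once that job completes.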

### `count_func`

This entry point downloads the sliced `*.bam` file and runs `samtools view -c` on it. The resulting count is written to a text file, which is uploaded and registered as the job output `counts_txt` using the `dx-jobutil-add-output` command.

```shell
count_func ()
{
    echo "Value of segmentedbam_file: '${segmentedbam_file}'";
    echo "Chromosome being counted '${chr}'";
    dx download "${segmentedbam_file}";
    readcount=$(samtools view -c "${segmentedbam_file_name}");
    printf "%s:\t%s\n" "${chr}" "${readcount}" > "${segmentedbam_file_prefix}.txt";
    readcount_file=$(dx upload "${segmentedbam_file_prefix}".txt --brief);
    dx-jobutil-add-output counts_txt "${readcount_file}" --class=file
}
```
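
Each `count_func` job produces a one-line file of the form `<chr>:<TAB><count>`. The snippet below reproduces that layout with made-up values, passing the chromosome name as a `printf` argument rather than embedding it in the format string.

```shell
# Made-up values standing in for the job input and the samtools result.
chr="chr1"
readcount=1234

# Same layout as the file written by count_func: "<chr>:<TAB><count>".
line=$(printf '%s:\t%s\n' "${chr}" "${readcount}")
echo "${line}"
```

The tab matters downstream: `sum_reads` reads the count as the second whitespace-separated field of each line.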

### `sum_reads`

The `main` entry point launches this subjob and passes it the outputs of the `count_func` jobs as input. This entry point downloads all the files generated by the `count_func` jobs, concatenates them, and sums the read counts.

The total is written to a file that is uploaded and returned as the entry point output `read_sum_file`.

```shell
sum_reads ()
{
    set -e -x -o pipefail;
    printf "Value of read file array: %s\n" "${readfiles[@]}";
    echo "Filename: ${filename}";
    echo "Summing values in files and creating output read file";
    for read_f in "${readfiles[@]}";
    do
        echo "${read_f}";
        dx download "${read_f}" -o - >> chromosome_result.txt;
    done;
    count_file="${filename}_chromosome_count.txt";
    total=$(awk '{s+=$2} END {print s}' chromosome_result.txt);
    echo "Total reads: ${total}" >> "${count_file}";
    readfile_name=$(dx upload "${count_file}" --brief);
    dx-jobutil-add-output read_sum_file "${readfile_name}" --class=file
}
```
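
The summation step can be tried in isolation. The sketch below uses made-up per-chromosome counts in the layout written by `count_func` and a temporary file in place of `chromosome_result.txt`.

```shell
# Made-up per-chromosome counts in the layout written by count_func.
result_file=$(mktemp)
printf 'chr1:\t100\nchr2:\t250\nchrX:\t50\n' > "${result_file}"

# Add up the second whitespace-separated field of every line.
total=$(awk '{s+=$2} END {print s}' "${result_file}")
echo "Total reads: ${total}"   # prints "Total reads: 400"

rm -f "${result_file}"
```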

