
# Distributed by Chr (sh)

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_distr_chr_slice_sh)

## How is the SAMtools dependency provided?

SAMtools is installed when the job starts. It is declared as an APT package in the `runSpec.execDepends` field of the `dxapp.json` file:

```json
{
  ...
  "runSpec": {
    ...
    "execDepends": [
      {
        "name": "samtools"
      }
    ]
  }
  ...
}
```

For additional information, see the [SAMtools dependency example](/getting-started/developer-tutorials/bash/git-dependency.md#how-is-samtools-called-in-the-src-script).

## Entry Points

Distributed bash-interpreter apps use bash functions to declare entry points. This app has the following entry points specified as bash functions:

* `main`
* `count_func`
* `sum_reads`
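Each entry point is an ordinary bash function, so the pattern can be tried locally without the platform. The sketch below is a simplified, hypothetical stand-in for the DNAnexus job runtime. It defines stub versions of the three functions and calls the one named by its first argument, defaulting to `main`. On the platform, the runtime decides which function each job runs; the script does not.

```shell
#!/usr/bin/env bash
# Hypothetical local stand-in for entry-point dispatch; the function bodies are stubs.
set -euo pipefail

main () { echo "main: slice BAM by chromosome and launch subjobs"; }
count_func () { echo "count_func: count reads for ${chr:-unset}"; }
sum_reads () { echo "sum_reads: add up per-chromosome counts"; }

entry_point="${1:-main}"

# Reject names that are not defined functions, then call the entry point by name.
if ! declare -F "${entry_point}" > /dev/null; then
    echo "Unknown entry point: ${entry_point}" >&2
    exit 1
fi
"${entry_point}"
```

For example, `chr=chr1 bash entry_points.sh count_func` (the file name is hypothetical) would print `count_func: count reads for chr1`.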

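Each entry point runs on its own worker, with its own system requirements. The instance type for each entry point can be set in the `runSpec.systemRequirements` field of the `dxapp.json` file. In this example, the `count_func` subjobs use a smaller instance type than `main` and `sum_reads`: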

```json
{
  "runSpec": {
    ...
    "systemRequirements": {
      "main": {
        "instanceType": "mem1_ssd1_x4"
      },
      "count_func": {
        "instanceType": "mem1_ssd1_x2"
      },
      "sum_reads": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    ...
  }
}
```

### `main`

The `main` entry point performs three steps:

* It downloads the input `*.bam` file.
* It generates a `*.bai` index if none was provided.
* It slices the BAM into smaller `*.bam` files, each containing the reads from one canonical chromosome.

First, `main` downloads the BAM file and reads the chromosome names from the `@SQ` lines of its header.

```shell
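dx download "${mappings_sorted_bam}"
# Keep SN names from the @SQ header lines that look like canonical chromosomes
# (for example chr1, chrX, chrM, or 1, X, MT).
# Assumes SN:<name> is the second tab-separated field of each @SQ line.
chromosomes=$( \
  samtools view -H "${mappings_sorted_bam_name}" \
  | grep "^@SQ" \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')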
```
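You can check which contigs the header filter keeps by running it locally on a small, hand-written header. The header lines below are hypothetical and stand in for `samtools view -H` output, so no BAM file or `dx` command is needed. Unplaced and alternate contigs such as `chrUn_*` and `*_random` are dropped.

```shell
#!/usr/bin/env bash
# Local sketch: run the header-parsing steps from main on a hypothetical
# SAM header instead of `samtools view -H` output.
set -euo pipefail

header=$(printf '%s\n' \
    $'@HD\tVN:1.6\tSO:coordinate' \
    $'@SQ\tSN:chr1\tLN:248956422' \
    $'@SQ\tSN:chr2\tLN:242193529' \
    $'@SQ\tSN:chrX\tLN:156040895' \
    $'@SQ\tSN:chrM\tLN:16569' \
    $'@SQ\tSN:chr1_KI270706v1_random\tLN:175055' \
    $'@SQ\tSN:chrUn_KI270302v1\tLN:2274' \
    $'@PG\tID:bwa\tPN:bwa')

# Keep @SQ lines, take the SN:<name> field, keep canonical-looking names.
chromosomes=$(
    printf '%s\n' "${header}" \
    | grep "^@SQ" \
    | awk -F '\t' '{print $2}' \
    | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')

printf '%s\n' "${chromosomes}"
```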

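If no `*.bai` index was supplied as input, `main` builds one with `samtools index`. Otherwise, it downloads the supplied index. The index lets `samtools view` extract each chromosome as a region. Each sliced `*.bam` file is then uploaded, and its file ID is passed to a new `count_func` subjob (one per chromosome) using the [`dx-jobutil-new-job`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-jobutil-new-job) command.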

```shell
if [ -z "${mappings_sorted_bai}" ]; then
    samtools index "${mappings_sorted_bam_name}"
else
    dx download "${mappings_sorted_bai}" -o "${mappings_sorted_bam_name}.bai"
fi

count_jobs=()

for chr in $chromosomes; do
    seg_name="${mappings_sorted_bam_prefix}_${chr}.bam"
    samtools view -b "${mappings_sorted_bam_name}" "${chr}" > "${seg_name}"
    bam_seg_file=$(dx upload "${seg_name}" --brief)
    count_jobs+=($(dx-jobutil-new-job \
        -isegmentedbam_file="${bam_seg_file}" \
        -ichr="${chr}" \
        count_func))
done
```
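Because each chromosome gets its own `count_func` subjob, the number of parallel jobs equals the number of canonical chromosomes in the header. The dry-run sketch below prints the launch command for each chromosome instead of running it, which makes the fan-out visible. The prefix, chromosome list, and file IDs are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch of main's fan-out: collect (not run) one dx-jobutil-new-job
# command per chromosome. All names and IDs here are hypothetical.
set -euo pipefail

mappings_sorted_bam_prefix="sample"
chromosomes=$'chr1\nchr2\nchrX'

commands=()
for chr in $chromosomes; do
    seg_name="${mappings_sorted_bam_prefix}_${chr}.bam"
    # On the platform, this ID would come from `dx upload "${seg_name}" --brief`.
    bam_seg_file="file-PLACEHOLDER_${chr}"
    commands+=("${seg_name} -> dx-jobutil-new-job -isegmentedbam_file=${bam_seg_file} -ichr=${chr} count_func")
done

printf '%s\n' "${commands[@]}"
echo "subjobs to launch: ${#commands[@]}"
```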

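The `counts_txt` outputs of the `count_func` jobs are passed to the `sum_reads` entry point as job-based object references (JBORs), written as `<job ID>:counts_txt`. The `sum_reads` job does not start until every referenced `count_func` job has finished.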

```shell
# Build one -ireadfiles=<job ID>:counts_txt argument per count_func job.
readfiles=()
for job in "${count_jobs[@]}"; do
    readfiles+=("-ireadfiles=${job}:counts_txt")
done

sum_reads_job=$(
    dx-jobutil-new-job \
        "${readfiles[@]}" \
        -ifilename="${mappings_sorted_bam_prefix}" \
        sum_reads
)
```

Finally, `main` uses `dx-jobutil-add-output` to set its own output to a JBOR that points at the output of the `sum_reads` job. The app's final output is therefore the summed count file produced by `sum_reads`.
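Both hand-offs above use JBORs. The `<job ID>:<output name>` strings are a command-line shorthand. In DNAnexus JSON, a JBOR is written as `{"$dnanexus_link": {"job": "<job ID>", "field": "<output name>"}}`.

The sketch below only prints that JSON form for two hypothetical job IDs and does not contact the platform. The `make_jbor` helper is illustrative and is not part of dx-toolkit.

```shell
#!/usr/bin/env bash
# Sketch: expand "<job ID>:<output field>" into the JSON JBOR it stands for.
# make_jbor and the job IDs below are hypothetical.
set -euo pipefail

make_jbor () {
    local job_id="$1" field="$2"
    # Single quotes keep "$dnanexus_link" literal (no shell expansion).
    printf '{"$dnanexus_link": {"job": "%s", "field": "%s"}}\n' "${job_id}" "${field}"
}

count_jobs=(job-AAAA job-BBBB)
for job in "${count_jobs[@]}"; do
    make_jbor "${job}" counts_txt
done
```

Similarly, `make_jbor "${sum_reads_job}" read_sum_file` would print the JSON form of the reference to the `sum_reads` output.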

### `count_func`

This entry point downloads the sliced `*.bam` file and counts its reads with `samtools view -c`. It writes the count to a text file, uploads that file, and registers it as the job's `counts_txt` output with the command `dx-jobutil-add-output`.

```shell
count_func () {
    echo "Value of segmentedbam_file: '${segmentedbam_file}'"
    echo "Chromosome being counted '${chr}'"

    dx download "${segmentedbam_file}"

    readcount=$(samtools view -c "${segmentedbam_file_name}")
    # Keep data out of the format string; the line looks like "chr1:<TAB>12345".
    printf '%s:\t%s\n' "${chr}" "${readcount}" > "${segmentedbam_file_prefix}.txt"

    readcount_file=$(dx upload "${segmentedbam_file_prefix}.txt" --brief)
    dx-jobutil-add-output counts_txt "${readcount_file}" --class=file
}
```
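Each `count_func` job produces a one-line, tab-separated file. `sum_reads` relies on the count being the second whitespace-separated field of that line. The local sketch below writes the same format, using a made-up read count in place of `samtools view -c` and a hypothetical file prefix.

```shell
#!/usr/bin/env bash
# Local sketch of the file count_func writes; chr, prefix and count are hypothetical.
set -euo pipefail

chr="chr2"
segmentedbam_file_prefix="sample_chr2"
readcount=12345   # on the platform this comes from samtools view -c

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT

printf '%s:\t%s\n' "${chr}" "${readcount}" > "${workdir}/${segmentedbam_file_prefix}.txt"
cat "${workdir}/${segmentedbam_file_prefix}.txt"
```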

### `sum_reads`

The `main` entry point launches this subjob and passes the `counts_txt` output of every `count_func` job as the `readfiles` array input. `sum_reads` downloads each count file, concatenates them into `chromosome_result.txt`, and adds up the per-chromosome counts.

It then writes the total to a file and uploads it as the entry point's `read_sum_file` output.

```shell
sum_reads () {
    set -e -x -o pipefail

    echo "Value of read file array: ${readfiles[*]}"
    echo "Filename: ${filename}"
    echo "Summing values in files and creating output read file"

    for read_f in "${readfiles[@]}"; do
        echo "${read_f}"
        dx download "${read_f}" -o - >> chromosome_result.txt
    done

    count_file="${filename}_chromosome_count.txt"
    total=$(awk '{s+=$2} END {print s}' chromosome_result.txt)
    echo "Total reads: ${total}" >> "${count_file}"

    readfile_name=$(dx upload "${count_file}" --brief)
    dx-jobutil-add-output read_sum_file "${readfile_name}" --class=file
}
```
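The aggregation step can be checked locally with a few hand-written count files in the `count_func` format. The file names and counts below are hypothetical. A plain `cat` stands in for the `dx download "${read_f}" -o -` call.

```shell
#!/usr/bin/env bash
# Local sketch of sum_reads' aggregation on hypothetical count files.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
cd "${workdir}"

printf 'chr1:\t100\n' > sample_chr1.txt
printf 'chr2:\t250\n' > sample_chr2.txt
printf 'chrX:\t50\n'  > sample_chrX.txt

# Stands in for appending each downloaded count file to chromosome_result.txt.
for read_f in sample_chr1.txt sample_chr2.txt sample_chrX.txt; do
    cat "${read_f}" >> chromosome_result.txt
done

# Sum the second whitespace-separated field, as sum_reads does.
total=$(awk '{s+=$2} END {print s}' chromosome_result.txt)
echo "Total reads: ${total}"
```

If no count lines arrive, `print s` prints an empty string. Writing `print s+0` instead would report `0` in that case.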


