# Distributed by Region (sh)

## Entry Points

Distributed apps that use the bash interpreter declare entry points as bash functions. Each entry point runs as a subjob on a new worker, with its own system requirements. This app defines the following entry points as bash functions:

* `main`
* `count_func`
* `sum_reads`
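Each entry point is a function in the app's script, and the entry point name given to a job selects which function runs. As a local analogy only (the `run_entry_point` dispatcher below is hypothetical and is not how the platform invokes the script), a script can define one function per entry point and call the one named on the command line:

```shell
#!/usr/bin/env bash
# Hypothetical local analogy for bash entry points: one function per entry
# point, selected by name. On the platform, the execution environment makes
# this choice; this dispatcher only imitates it for illustration.
set -euo pipefail

main()       { echo "main: would split regions and launch subjobs"; }
count_func() { echo "count_func: would count reads in its regions"; }
sum_reads()  { echo "sum_reads: would add up the per-batch counts"; }

run_entry_point() {
  local name=${1:-main}   # default to main, like an app's top-level job
  case "${name}" in
    main|count_func|sum_reads) "${name}" ;;
    *) echo "unknown entry point: ${name}" >&2; return 1 ;;
  esac
}

run_entry_point "$@"
```

For example, `run_entry_point count_func` runs only `count_func`, and calling it with no argument runs `main`.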

### `main`

The `main` function takes the initial `*.bam` file, generates a `*.bai` index if needed, and reads the list of regions from the `*.bam` header. It then splits the regions into batches of 10 and sends each batch as input to the `count_func` entry point using the [`dx-jobutil-new-job`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-jobutil-new-job) command.

```shell
# Region (reference sequence) names come from the @SQ lines of the BAM header.
regions=$(samtools view -H "${mappings_sorted_bam_name}" \
  | grep "^@SQ" | sed 's/.*SN:\(\S*\)\s.*/\1/')

echo "Segmenting into regions"
count_jobs=()
counter=0
temparray=()
for r in ${regions}; do
  if [[ "${counter}" -ge 10 ]]; then
    # Launch a count_func subjob for the current batch of 10 regions.
    echo "${temparray[@]}"
    count_jobs+=("$(dx-jobutil-new-job \
      -ibam_file="${mappings_sorted_bam}" \
      -ibambai_file="${mappings_sorted_bai}" "${temparray[@]}" count_func)")
    temparray=()
    counter=0
  fi
  temparray+=("-iregions=${r}") # Build an array of -i<parameter> arguments
  counter=$((counter+1))
done

if [[ "${counter}" -gt 0 ]]; then # Launch the final batch, which may have fewer than 10 regions
  echo "${temparray[@]}"
  count_jobs+=("$(dx-jobutil-new-job \
    -ibam_file="${mappings_sorted_bam}" \
    -ibambai_file="${mappings_sorted_bai}" "${temparray[@]}" count_func)")
fi
```
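To check the region parsing and batching logic without launching jobs, one option is to run it locally against made-up header data. This sketch assumes GNU `sed` (for `\S` and `\s`, as in the code above) and replaces `dx-jobutil-new-job` with a hypothetical `submit_batch` stub that only prints its arguments:

```shell
#!/usr/bin/env bash
# Local check of the region parsing and batching used in `main`.
# ASSUMPTIONS: the header is made-up sample data, and `submit_batch` is a
# hypothetical stub standing in for dx-jobutil-new-job.
set -euo pipefail

BATCH_SIZE=10

# Fake `samtools view -H` output: 23 @SQ lines (chr1..chr23).
header=$(for ((i = 1; i <= 23; i++)); do
  printf '@SQ\tSN:chr%d\tLN:1000\n' "${i}"
done)

# Same extraction as in `main` (GNU sed, for \S and \s).
regions=$(printf '%s\n' "${header}" | grep "^@SQ" | sed 's/.*SN:\(\S*\)\s.*/\1/')

# Stub: print what a count_func job would receive instead of launching it.
submit_batch() {
  echo "count_func $*"
}

# Group the regions given as arguments into batches of BATCH_SIZE.
submit_all() {
  local r
  local batch=()
  for r in "$@"; do
    batch+=("-iregions=${r}")
    if (( ${#batch[@]} == BATCH_SIZE )); then
      submit_batch "${batch[@]}"
      batch=()
    fi
  done
  # The final batch may hold fewer than BATCH_SIZE regions.
  if (( ${#batch[@]} > 0 )); then
    submit_batch "${batch[@]}"
  fi
}

# Word splitting of ${regions} on newlines is intended here.
# shellcheck disable=SC2086
submit_all ${regions}
```

With 23 regions, it prints three lines: two batches of 10 regions and a final batch of 3. Submitting as soon as a batch reaches 10 regions produces the same groups as the loop above, without a separate counter.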

The `counts_txt` outputs of the `count_func` jobs are referenced as job-based object references ([JBORs](/faqs/developing-apps-and-applets.md#what-are-job-based-object-references-jbors-and-how-can-i-use-them-when-running-apps)), written as `<job ID>:counts_txt`, and used as inputs for the `sum_reads` entry point.

```shell
echo "Merge count files, jobs:"
echo "${count_jobs[@]}"
readfiles=()
for count_job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${count_job}:counts_txt")
done
echo "file name: ${mappings_sorted_bam_prefix}"
echo "Set file, readfile variables:"
echo "${readfiles[@]}"
countsfile_job=$(dx-jobutil-new-job -ifilename="${mappings_sorted_bam_prefix}" "${readfiles[@]}" sum_reads)
```

The `read_sum` output of the `sum_reads` job becomes the output of the `main` entry point through a JBOR passed to the [`dx-jobutil-add-output`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-jobutil-add-output) command.

```shell
echo "Specifying output file"
dx-jobutil-add-output counts_txt "${countsfile_job}:read_sum" --class=jobref
```
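With `--class=jobref`, the value `${countsfile_job}:read_sum` is treated as a JBOR rather than as a file ID. On the platform, a JBOR is represented in JSON as a `$dnanexus_link` object with `job` and `field` keys. The hypothetical helper `jbor_json` below builds that JSON from a `<job ID>:<field>` string to show what the reference contains; the job ID is a placeholder:

```shell
#!/usr/bin/env bash
# Sketch: build the JSON form of a job-based object reference (JBOR).
# `jbor_json` is a hypothetical helper; the job ID used below is made up.
set -euo pipefail

jbor_json() {
  local ref=$1               # expected form: <job ID>:<output field>
  local job=${ref%%:*}       # text before the first colon
  local field=${ref#*:}      # text after the first colon
  printf '{"$dnanexus_link": {"job": "%s", "field": "%s"}}\n' "${job}" "${field}"
}

jbor_json "job-AAAA:read_sum"
```

This prints `{"$dnanexus_link": {"job": "job-AAAA", "field": "read_sum"}}`. Because the reference names a job and an output field rather than a file, it can be recorded before `sum_reads` has finished.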

### `count_func`

This entry point uses `samtools view -c` to count the reads in the regions it receives as input (at most 10 per job). It runs on a new worker, so variables defined in other functions, including `main`, are not accessible here; the function sees only its own job inputs.
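As a rough local analogy for this isolation, a new `bash` process started with `env -i` sees none of the caller's variables, only the values passed to it explicitly, much as `count_func` sees only its job inputs. This is an illustration, not how the platform starts workers; `main_only` and the `regions` value are made up:

```shell
#!/usr/bin/env bash
# Analogy only: a subjob starts on a fresh worker, somewhat like a new
# process with an empty environment. It sees only what is passed to it
# explicitly (here, an environment variable standing in for a job input).
set -euo pipefail

main_only="defined in main"   # never passed to the new process
echo "caller sees: ${main_only}"

# env -i starts from an empty environment; "$BASH" is the running bash's path.
env -i regions="chr1 chr2" "$BASH" -c \
  'echo "main_only=${main_only:-<unset>} regions=${regions}"'
```

The second line of output is `main_only=<unset> regions=chr1 chr2`.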

Once the output file with counts is created, it is uploaded to the platform and assigned as the entry point's job output `counts_txt` via the command [`dx-jobutil-add-output`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-jobutil-add-output).

```shell
count_func() {
  set -e -x -o pipefail

  echo "Value of bam_file: '${bam_file}'"
  echo "Value of bambai_file: '${bambai_file}'"
  echo "Regions being counted '${regions[*]}'"

  dx-download-all-inputs

  mkdir workspace
  cd workspace || exit
  mv "${bam_file_path}" .
  mv "${bambai_file_path}" .
  outputdir="./out/samtool/count"
  mkdir -p "${outputdir}"
  samtools view -c "${bam_file_name}" "${regions[@]}" >> "${outputdir}/readcounts.txt"

  counts_txt_id=$(dx upload "${outputdir}/readcounts.txt" --brief)
  dx-jobutil-add-output counts_txt "${counts_txt_id}" --class=file
}
```

### `sum_reads`

The `main` entry point launches this subjob, passing the `counts_txt` outputs of the `count_func` jobs as JBOR inputs. This entry point downloads all the `readcounts.txt` files generated by the `count_func` jobs and sums their values.

This entry point uploads the total as its file output `read_sum`; `main` references that output through a JBOR and exposes it as its own `counts_txt` output.

```shell
sum_reads() {

  set -e -x -o pipefail
  echo "$filename"

  echo "Value of read file array '${readfiles[*]}'"
  dx-download-all-inputs
  echo "Value of read file path array '${readfiles_path[*]}'"

  echo "Summing values in files"
  readsum=0
  for read_f in "${readfiles_path[@]}"; do
    temp=$(cat "$read_f")
    readsum=$((readsum + temp))
  done

  echo "Total reads: ${readsum}" > "${filename}_counts.txt"

  read_sum_id=$(dx upload "${filename}_counts.txt" --brief)
  dx-jobutil-add-output read_sum "${read_sum_id}" --class=file
}
```
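The summing loop can be tried locally with made-up count files. In this sketch, the file contents are invented sample values and `readfiles_path` is filled in by hand, whereas on the platform `dx-download-all-inputs` sets it. It uses `$(< file)`, a bash shortcut for `$(cat file)`:

```shell
#!/usr/bin/env bash
# Local sketch of the summing loop in sum_reads. The count files and their
# values are made up; on the platform, dx-download-all-inputs provides the
# paths in readfiles_path.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT

# Each count_func job writes a single number (output of samtools view -c).
printf '120\n' > "${workdir}/a.txt"
printf '75\n'  > "${workdir}/b.txt"
printf '5\n'   > "${workdir}/c.txt"
readfiles_path=("${workdir}/a.txt" "${workdir}/b.txt" "${workdir}/c.txt")

readsum=0
for read_f in "${readfiles_path[@]}"; do
  # $(< file) reads the file; trailing newlines are stripped.
  readsum=$((readsum + $(< "${read_f}")))
done

echo "Total reads: ${readsum}"
```

It prints `Total reads: 200`.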


