# Distributed by Chr (sh)

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_distr_chr_slice_sh)

## How is the SAMtools dependency provided?

The SAMtools dependency is resolved by declaring an APT package in the `runSpec.execDepends` field of the `dxapp.json` file.

```json
{
  ...
  "runSpec": {
    ...
    "execDepends": [
      {
        "name": "samtools"
      }
    ]
  }
  ...
}
```

For additional information, see `execDepends`.

## Entry Points

Distributed bash-interpreter apps use bash functions to declare entry points. This app has the following entry points specified as bash functions:

* `main`
* `count_func`
* `sum_reads`

Each entry point runs as a separate job on its own worker, so each can have its own system requirements. The instance type for each entry point can be set in the `runSpec.systemRequirements` field of the `dxapp.json` file:

```json
{
  "runSpec": {
    ...
    "systemRequirements": {
      "main": {
        "instanceType": "mem1_ssd1_x4"
      },
      "count_func": {
        "instanceType": "mem1_ssd1_x2"
      },
      "sum_reads": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    ...
  }
}
```

### `main`

The `main` function downloads the input `*.bam` file, generates a `*.bai` index if one was not provided, and slices the BAM into smaller `*.bam` files, each containing only the reads from one canonical chromosome. First, it downloads the BAM file and reads the `@SQ` header lines to get the chromosome names.

```shell
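# Download the input BAM into the working directory.
dx download "${mappings_sorted_bam}"

# Read the @SQ header lines and keep only canonical chromosome names.
chromosomes=$(
    samtools view -H "${mappings_sorted_bam_name}" \
    | grep "\@SQ" \
    | awk -F '\t' '{print $2}' \
    | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}'
)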
```
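
To see what the header filter keeps, the sketch below runs the same `grep` and `awk` stages on a few hypothetical `@SQ` lines instead of real `samtools view -H` output. The sample header and its sequence names are assumptions made up for illustration.

```shell
# Hypothetical SAM header lines standing in for `samtools view -H` output.
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrX\tLN:156040895\n@SQ\tSN:chrUn_KI270302v1\tLN:2274\n@SQ\tSN:2\tLN:242193529\n')

# Same pipeline as in main: take the SN field of each @SQ line,
# then keep names that look like canonical chromosomes.
chromosomes=$(
    printf '%s\n' "${header}" \
    | grep '^@SQ' \
    | awk -F '\t' '{print $2}' \
    | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}'
)

echo "${chromosomes}"
```

With this sample header the filter prints `chr1`, `chrX`, and `2`, and drops the unplaced contig `chrUn_KI270302v1`. The second alternative in the pattern only checks the first character, so it accepts any name that starts with a digit, `X`, `Y`, or `M`.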

If no index was provided, `main` creates one with `samtools index`. It then slices the BAM by chromosome, uploads each sliced `*.bam` file, and passes its file ID to the `count_func` entry point using the [`dx-jobutil-new-job`](https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities#dx-jobutil-new-job) command.

```shell
if [ -z "${mappings_sorted_bai}" ]; then
    samtools index "${mappings_sorted_bam_name}"
else
    dx download "${mappings_sorted_bai}" -o "${mappings_sorted_bam_name}.bai"
fi

count_jobs=()

for chr in $chromosomes; do
    seg_name="${mappings_sorted_bam_prefix}_${chr}.bam"
    samtools view -b "${mappings_sorted_bam_name}" "${chr}" > "${seg_name}"
    bam_seg_file=$(dx upload "${seg_name}" --brief)
    count_jobs+=($(dx-jobutil-new-job \
        -isegmentedbam_file="${bam_seg_file}" \
        -ichr="${chr}" \
        count_func))
done
```

Outputs from the `count_func` jobs are referenced as job-based object references (JBORs) and used as inputs for the `sum_reads` entry point.

```shell
for job in "${count_jobs[@]}"; do
    readfiles+=("-ireadfiles=${job}:counts_txt")
done

sum_reads_job=$(
    dx-jobutil-new-job \
        "${readfiles[@]}" \
        -ifilename="${mappings_sorted_bam_prefix}" \
        sum_reads
)
```
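
The argument list built above can be inspected without launching any jobs. This minimal sketch uses two hypothetical job IDs (`job-aaaa` and `job-bbbb`, placeholders rather than real IDs) to show the repeated `-ireadfiles=<job ID>:<output field>` arguments that `dx-jobutil-new-job` would receive.

```shell
# Hypothetical job IDs; in main these come from dx-jobutil-new-job.
count_jobs=(job-aaaa job-bbbb)

# Build one "-ireadfiles=<job ID>:counts_txt" argument per count_func job.
readfiles=()
for job in "${count_jobs[@]}"; do
    readfiles+=("-ireadfiles=${job}:counts_txt")
done

# Print one argument per line.
printf '%s\n' "${readfiles[@]}"
```

Repeating the same input name once per job is how the references are collected into the `readfiles` array input of `sum_reads`.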

The `main` entry point then returns the output of the `sum_reads` job as its own output by passing a JBOR to the `dx-jobutil-add-output` command.
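
A minimal sketch of that final step is shown here. The output field name `counts_txt` for `main` and the placeholder job ID are assumptions; use the output name declared in your own `dxapp.json`. The guard lets the snippet run outside a DNAnexus worker, where `dx-jobutil-add-output` is not installed.

```shell
# Hypothetical placeholder; inside main this holds the ID returned by dx-jobutil-new-job.
sum_reads_job="${sum_reads_job:-job-xxxx}"

# Reference the read_sum_file output field of the sum_reads job.
sum_reads_jbor="${sum_reads_job}:read_sum_file"

# Only call the DNAnexus helper where it exists (that is, on a worker).
if command -v dx-jobutil-add-output >/dev/null 2>&1; then
    # counts_txt is an assumed output name for main.
    dx-jobutil-add-output counts_txt "${sum_reads_jbor}" --class=jobref
else
    echo "Would register output counts_txt -> ${sum_reads_jbor}"
fi
```

Because the output is registered with `--class=jobref`, `main` can finish without waiting for `sum_reads`; the reference is resolved once that job completes.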

### `count_func`

This entry point downloads the sliced `*.bam` file and runs `samtools view -c` on it to count its reads. The resulting count file is uploaded and registered as the job's `counts_txt` output with the `dx-jobutil-add-output` command.

```shell
count_func () {
    echo "Value of segmentedbam_file: '${segmentedbam_file}'"
    echo "Chromosome being counted '${chr}'"

    dx download "${segmentedbam_file}"

    readcount=$(samtools view -c "${segmentedbam_file_name}")
    # Pass the chromosome name as an argument, not as part of the format string.
    printf '%s:\t%s\n' "${chr}" "${readcount}" > "${segmentedbam_file_prefix}.txt"

    readcount_file=$(dx upload "${segmentedbam_file_prefix}.txt" --brief)
    dx-jobutil-add-output counts_txt "${readcount_file}" --class=file
}
```

### `sum_reads`

The `main` entry point launches this subjob and passes it the outputs of the `count_func` jobs as input. This entry point downloads all the count files generated by the `count_func` jobs and sums the read counts.

This function returns `read_sum_file` as the entry point output.

```shell
sum_reads () {
    set -e -x -o pipefail

    # Print the whole array on one line; a bare %s format would repeat per element.
    echo "Value of read file array: ${readfiles[*]}"
    echo "Filename: ${filename}"
    echo "Summing values in files and creating output read file"

    for read_f in "${readfiles[@]}"; do
        echo "${read_f}"
        dx download "${read_f}" -o - >> chromosome_result.txt
    done

    count_file="${filename}_chromosome_count.txt"
    total=$(awk '{s+=$2} END {print s}' chromosome_result.txt)
    echo "Total reads: ${total}" >> "${count_file}"

    readfile_name=$(dx upload "${count_file}" --brief)
    dx-jobutil-add-output read_sum_file "${readfile_name}" --class=file
}
```
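
The summing step can be tried locally without `dx`. The sketch below writes three hypothetical count files in the `<chr>:<tab><count>` format produced by `count_func`, concatenates them, and applies the same `awk` expression. The file names and counts are made up for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical per-chromosome count files, as count_func would write them.
printf 'chr1:\t120\n' > "${workdir}/chr1.txt"
printf 'chr2:\t80\n' > "${workdir}/chr2.txt"
printf 'chrX:\t45\n' > "${workdir}/chrX.txt"

# Concatenate the files, then sum the second whitespace-separated column.
cat "${workdir}"/chr*.txt > "${workdir}/chromosome_result.txt"
total=$(awk '{s+=$2} END {print s}' "${workdir}/chromosome_result.txt")

echo "Total reads: ${total}"

rm -r "${workdir}"
```

With these sample files the script prints `Total reads: 245`. The default `awk` field separator splits on the tab, so `$2` is the count regardless of the chromosome name in the first column.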
