Distributed by Chr (sh)


How is the SAMtools dependency provided?

The SAMtools dependency is resolved by declaring an APT package in the runSpec.execDepends field of the dxapp.json file.

{

  ...
  "runSpec": {
  
    ...
    "execDepends": [
      {
        "name": "samtools"
      }
    ]
  }
  
  ...
}

For additional information, see execDepends.

Entry Points

Distributed bash-interpreter apps use bash functions to declare entry points. This app has the following entry points specified as bash functions:

  • main

  • count_func

  • sum_reads

Each entry point is executed on a new worker with its own system requirements. The instance type for each entry point can be set in the runSpec.systemRequirements field of the dxapp.json file:

{
  "runSpec": {
  
    ...
    "systemRequirements": {
      "main": {
        "instanceType": "mem1_ssd1_x4"
      },
      
      "count_func": {
        "instanceType": "mem1_ssd1_x2"
      },
      
      "sum_reads": {
        "instanceType": "mem1_ssd1_x4"
      }
    },
    
    ...
  }
}

main

The main function generates a *.bai index for the input *.bam file if one is not provided, then slices the input into smaller *.bam files, each containing only the reads from one canonical chromosome. First, the main function downloads the BAM file and extracts the canonical chromosome names from its header.

  dx download "${mappings_sorted_bam}"
  chromosomes=$(samtools view -H "${mappings_sorted_bam_name}" | grep "\@SQ" | awk -F '\t' '{print $2}' | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')
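To see what the header filter keeps, the same awk pipeline can be run on a few made-up header lines. This is a minimal sketch: the header text below is hypothetical sample data standing in for the output of samtools view -H, and the grep pattern is written as '^@SQ' here rather than copied from the app.

```shell
set -euo pipefail

# Hypothetical header lines standing in for `samtools view -H` output.
header=$(printf '%s\n' \
  $'@HD\tVN:1.6\tSO:coordinate' \
  $'@SQ\tSN:chr1\tLN:248956422' \
  $'@SQ\tSN:chrX\tLN:156040895' \
  $'@SQ\tSN:chrM\tLN:16569' \
  $'@SQ\tSN:chrUn_GL000195v1\tLN:182896' \
  $'@SQ\tSN:chr1_KI270706v1_random\tLN:175055')

# Keep @SQ lines, take the SN:<name> field, then keep only names that look
# like canonical chromosomes (chr1..chr22, chrX, chrY, chrM, or bare 1, X, ...).
chromosomes=$(printf '%s\n' "${header}" \
  | grep '^@SQ' \
  | awk -F '\t' '{print $2}' \
  | awk -F ':' '{if ($2 ~ /^chr[0-9XYM]+$|^[0-9XYM]/) {print $2}}')

echo "${chromosomes}"
```

With this sample input, the unplaced and random contigs are dropped and only chr1, chrX, and chrM remain, so only those three names would be sliced into separate *.bam files.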

Sliced *.bam files are uploaded and their file IDs are passed to the count_func entry point using the dx-jobutil-new-job command.

  if [ -z "${mappings_sorted_bai}" ]; then
    samtools index "${mappings_sorted_bam_name}"
  else
    dx download "${mappings_sorted_bai}" -o "${mappings_sorted_bam_name}".bai
  fi

  count_jobs=()
  for chr in $chromosomes; do
    seg_name="${mappings_sorted_bam_prefix}_${chr}".bam
    samtools view -b "${mappings_sorted_bam_name}" "${chr}" > "${seg_name}"
    bam_seg_file=$(dx upload "${seg_name}" --brief)
    count_jobs+=($(dx-jobutil-new-job -isegmentedbam_file="${bam_seg_file}" -ichr="${chr}" count_func))
  done

Outputs from the count_func entry points are referenced as Job Based Object References (JBOR) and used as inputs for the sum_reads entry point.

  for job in "${count_jobs[@]}"; do
    readfiles+=("-ireadfiles=${job}:counts_txt")
  done

  sum_reads_job=$(dx-jobutil-new-job "${readfiles[@]}" -ifilename="${mappings_sorted_bam_prefix}" sum_reads)
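The argument list built by this loop can be inspected without launching any jobs. The sketch below repeats the loop with placeholder job IDs (job-AAAA and so on are hypothetical, not real platform IDs) and prints the arguments that would be handed to dx-jobutil-new-job.

```shell
set -euo pipefail

# Hypothetical placeholder IDs standing in for the count_func job IDs.
count_jobs=(job-AAAA job-BBBB job-CCCC)

# One "-ireadfiles=<job ID>:<output field>" argument per count_func job.
# Repeating the same input name is how the array input is filled.
readfiles=()
for job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${job}:counts_txt")
done

printf '%s\n' "${readfiles[@]}"
```

Each printed argument pairs a job ID with the name of the output field to read from it, which is the form the JBOR takes on the command line in the code above.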

The output of the sum_reads entry point is then set as the output of the main entry point, again as a JBOR, using the dx-jobutil-add-output command.

count_func

This entry point downloads the sliced *.bam file and runs samtools view -c on it to count its reads. The resulting counts file is uploaded and set as the entry point’s counts_txt job output using the dx-jobutil-add-output command.

count_func () {
    echo "Value of segmentedbam_file: '${segmentedbam_file}'"
    echo "Chromosome being counted '${chr}'"
    dx download "${segmentedbam_file}"
    readcount=$(samtools view -c "${segmentedbam_file_name}")
    printf "${chr}:\t%s\n" "${readcount}" > "${segmentedbam_file_prefix}.txt"
    readcount_file=$(dx upload "${segmentedbam_file_prefix}".txt --brief)
    dx-jobutil-add-output counts_txt "${readcount_file}" --class=file
}
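The counts file written by count_func holds a single line: the chromosome name, a colon, a tab, and the read count. The sketch below reproduces that line with hypothetical values in place of the job input and the samtools result. As a small variation on the app's code, it passes the chromosome name as a printf argument rather than embedding it in the format string.

```shell
set -euo pipefail

# Hypothetical values standing in for the job input and the samtools count.
chr="chr1"
readcount=1234

# Passing the name as an argument means printf never interprets any
# characters in it as format directives.
counts_line=$(printf '%s:\t%s\n' "${chr}" "${readcount}")

# The count is the second whitespace-separated field of the line.
count_field=$(printf '%s\n' "${counts_line}" | awk '{print $2}')

echo "${counts_line}"
echo "count field: ${count_field}"
```

Because the count sits in the second field, a downstream step can total the counts with awk by summing that field.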

sum_reads

The main entry point launches this subjob, providing the outputs of the count_func jobs as its input. This entry point downloads all the files generated by the count_func jobs and sums the read counts they contain.

This function returns read_sum_file as the entry point output.

sum_reads () {
    set -e -x -o pipefail
    printf "Value of read file array %s" "${readfiles[@]}"
    echo "Filename: ${filename}"
    echo "Summing values in files and creating output read file"
    for read_f in "${readfiles[@]}"; do
        echo "${read_f}"
        dx download "${read_f}" -o - >> chromosome_result.txt
    done
    count_file="${filename}_chromosome_count.txt"
    total=$(awk '{s+=$2} END {print s}' chromosome_result.txt)
    echo "Total reads: ${total}" >> "${count_file}"
    readfile_name=$(dx upload "${count_file}" --brief)
    dx-jobutil-add-output read_sum_file "${readfile_name}" --class=file
}
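The gather step can be tried locally without the platform. In the sketch below, three hypothetical per-chromosome count files are created in a temporary directory, and cat stands in for dx download with -o - when concatenating them. The awk command is the same one sum_reads uses.

```shell
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT

# Hypothetical per-chromosome count files in the format written by count_func.
printf 'chr1:\t100\n' > "${workdir}/counts_chr1.txt"
printf 'chr2:\t250\n' > "${workdir}/counts_chr2.txt"
printf 'chrX:\t50\n'  > "${workdir}/counts_chrX.txt"

# Concatenate the files; cat stands in for `dx download ... -o -` here.
for read_f in "${workdir}"/counts_*.txt; do
  cat "${read_f}" >> "${workdir}/chromosome_result.txt"
done

# Sum the second field (the read count) across all lines.
total=$(awk '{s+=$2} END {print s}' "${workdir}/chromosome_result.txt")
echo "Total reads: ${total}"   # prints "Total reads: 400"
```

Appending each file to one combined file keeps the summing step to a single awk pass, regardless of how many count_func jobs were launched.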
