Distributed bash-interpreter apps use bash functions to declare entry points. This app has the following entry points specified as bash functions:
main
count_func
sum_reads
Each entry point is executed on a new worker with its own system requirements. The instance type for each entry point can be set in the dxapp.json file’s runSpec.systemRequirements field, keyed by entry point name.
main
The main function indexes the input *.bam file if no *.bai index is supplied, then slices it into smaller *.bam files, each containing only the reads from one canonical chromosome. It starts by downloading the BAM file and reading its header.
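The header step can be illustrated with a minimal sketch. It assumes the header text comes from `samtools view -H` and that "canonical" means chromosomes 1-22, X, Y, and M/MT with an optional `chr` prefix. The helper name `canonical_chromosomes` and the sample header are hypothetical and are not taken from the app's source.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a SAM header on stdin and print the names of
# the canonical chromosomes it declares, one per line.
canonical_chromosomes() {
  # @SQ lines carry the reference sequence name in their SN: field.
  grep '^@SQ' \
    | tr '\t' '\n' \
    | sed -n 's/^SN://p' \
    | grep -E '^(chr)?([0-9]{1,2}|X|Y|M|MT)$'
}

# Made-up header standing in for the output of `samtools view -H`.
sample_header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrX\tLN:156040895\n@SQ\tSN:chrUn_KI270302v1\tLN:2274\n')

chromosomes=$(printf '%s\n' "${sample_header}" | canonical_chromosomes)
echo "${chromosomes}"
```

On the sample header this prints `chr1` and `chrX` and drops the unplaced contig. Inside the app, the same filter could presumably be fed from `samtools view -H "${mappings_sorted_bam_name}"` instead of the sample text.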
Sliced *.bam files are uploaded and their file IDs are passed to the count_func entry point using the dx-jobutil-new-job command.
if [ -z "${mappings_sorted_bai}" ]; then
  samtools index "${mappings_sorted_bam_name}"
else
  dx download "${mappings_sorted_bai}" -o "${mappings_sorted_bam_name}".bai
fi

count_jobs=()
for chr in $chromosomes; do
  seg_name="${mappings_sorted_bam_prefix}_${chr}".bam
  samtools view -b "${mappings_sorted_bam_name}" "${chr}" > "${seg_name}"
  bam_seg_file=$(dx upload "${seg_name}" --brief)
  count_jobs+=($(dx-jobutil-new-job -isegmentedbam_file="${bam_seg_file}" -ichr="${chr}" count_func))
done
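To see how the loop accumulates one job ID per chromosome without running anything on the platform, here is a dry-run sketch. The function `fake_new_job` is a hypothetical stand-in for `dx-jobutil-new-job`, and the chromosome list and file prefix are made-up values.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for dx-jobutil-new-job: prints a placeholder ID
# instead of launching a subjob. $1 is the chromosome name.
fake_new_job() {
  echo "job-placeholder-$1"
}

# Made-up inputs for illustration only.
chromosomes="chr1 chr2 chrX"
mappings_sorted_bam_prefix="sample"

count_jobs=()
for chr in $chromosomes; do
  seg_name="${mappings_sorted_bam_prefix}_${chr}.bam"
  echo "would slice ${chr} into ${seg_name}"
  # Each launch contributes exactly one element to the array.
  count_jobs+=("$(fake_new_job "${chr}")")
done

echo "launched ${#count_jobs[@]} jobs: ${count_jobs[*]}"
```

The array ends up with one entry per chromosome, which is what the later JBOR-building step iterates over.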
The outputs of the count_func jobs are referenced as job-based object references (JBORs) and passed as inputs to the sum_reads entry point.
for job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${job}:counts_txt")
done

sum_reads_job=$(dx-jobutil-new-job "${readfiles[@]}" -ifilename="${mappings_sorted_bam_prefix}" sum_reads)
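The shape of the arguments this loop builds is easier to see with concrete values. The sketch below uses two made-up placeholder job IDs in place of the IDs returned by dx-jobutil-new-job and prints the resulting argument list.

```shell
#!/usr/bin/env bash

# Placeholder job IDs; real IDs are returned by dx-jobutil-new-job.
count_jobs=("job-AAAA" "job-BBBB")

readfiles=()
for job in "${count_jobs[@]}"; do
  # A JBOR has the form <job ID>:<output field name>.
  readfiles+=("-ireadfiles=${job}:counts_txt")
done

# One argument per line, exactly as they would be passed to the launcher.
printf '%s\n' "${readfiles[@]}"
# prints:
# -ireadfiles=job-AAAA:counts_txt
# -ireadfiles=job-BBBB:counts_txt
```

Repeating the same input name once per job is what lets sum_reads receive the files as a single array input.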
The output of the sum_reads entry point is used as the output of the main entry point. It is passed as a JBOR with the dx-jobutil-add-output command.
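A minimal sketch of that final step follows. It assumes that main's output field is named `read_sum` (a hypothetical name, not confirmed by this text) and that the JBOR is registered with `--class=jobref`. The job ID is a placeholder, and the platform call is guarded so the snippet is harmless outside a job.

```shell
#!/usr/bin/env bash

# Placeholder; in main this would be the ID captured in sum_reads_job.
sum_reads_job="job-PLACEHOLDER"

# Reference the read_sum_file output of the sum_reads job.
jbor="${sum_reads_job}:read_sum_file"
echo "main output will reference ${jbor}"

# Only call the platform helper when it is available (inside a job).
if command -v dx-jobutil-add-output >/dev/null 2>&1; then
  # "read_sum" is an assumed name for main's output field.
  dx-jobutil-add-output read_sum "${jbor}" --class=jobref
fi
```

Because the output is a reference, main can finish without waiting for sum_reads to complete. The platform resolves the reference once the subjob is done.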
count_func
This entry point downloads the sliced *.bam file and runs samtools view -c on it. The generated counts_txt output file is uploaded and registered as the entry point’s job output with the dx-jobutil-add-output command.
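The description above can be sketched as a function. This is an illustrative reconstruction, not the app's actual source. It assumes the platform sets `${segmentedbam_file_name}` and `${segmentedbam_file_prefix}` for the `segmentedbam_file` input, in the same way the main function uses `${mappings_sorted_bam_name}`. The two-column line layout is also an assumption, inferred from sum_reads summing the second column.

```shell
#!/usr/bin/env bash

# Sketch of the count_func entry point; it is defined here but not called,
# because it needs the dx toolkit and samtools to run.
count_func() {
  set -e -x -o pipefail

  # Fetch the slice that main uploaded for this chromosome.
  dx download "${segmentedbam_file}" -o "${segmentedbam_file_name}"

  # Count the alignment records in the slice.
  readcount=$(samtools view -c "${segmentedbam_file_name}")

  # Assumed layout: label in column 1, count in column 2.
  printf '%s:\t%s\n' "${chr}" "${readcount}" > "${segmentedbam_file_prefix}.txt"

  # Upload the result and register it as the counts_txt output.
  counts_file=$(dx upload "${segmentedbam_file_prefix}.txt" --brief)
  dx-jobutil-add-output counts_txt "${counts_file}" --class=file
}
```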
sum_reads
The main entry point triggers this subjob, passing the outputs of the count_func jobs as input. This entry point downloads all the files generated by the count_func jobs and sums the read counts they contain.
This function returns read_sum_file as the entry point output.
sum_reads () {
  set -e -x -o pipefail
  printf "Value of read file array %s" "${readfiles[@]}"
  echo "Filename: ${filename}"
  echo "Summing values in files and creating output read file"
  for read_f in "${readfiles[@]}"; do
    echo "${read_f}"
    dx download "${read_f}" -o - >> chromosome_result.txt
  done
  count_file="${filename}_chromosome_count.txt"
  total=$(awk '{s+=$2} END {print s}' chromosome_result.txt)
  echo "Total reads: ${total}" >> "${count_file}"
  readfile_name=$(dx upload "${count_file}" --brief)
  dx-jobutil-add-output read_sum_file "${readfile_name}" --class=file
}
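The summing step can be tried locally on sample data. The sketch below assumes each count file holds one line with a label in the first column and the read count in the second. That layout is inferred from the awk expression, and the numbers are made up.

```shell
#!/usr/bin/env bash

# Work in a temporary directory so nothing is left behind.
workdir=$(mktemp -d)

# Made-up per-chromosome counts, concatenated as sum_reads would do.
printf 'chr1:\t120\nchr2:\t80\nchrX:\t15\n' > "${workdir}/chromosome_result.txt"

# Add up the second column across all lines.
total=$(awk '{s+=$2} END {print s}' "${workdir}/chromosome_result.txt")
echo "Total reads: ${total}"   # prints "Total reads: 215"

rm -rf "${workdir}"
```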