Distributed by Region (sh)
Entry Points
Distributed bash-interpreter apps use bash functions to declare entry points. Entry points are executed as subjobs on new workers with their own respective system requirements. This app has the following entry points specified as bash functions:
main
count_func
sum_reads
main
The main function takes the initial *.bam file, generates a *.bai index if needed, and obtains the list of regions from the *.bam file header. Regions are sent in batches of 10 as input to the count_func entry point using the dx-jobutil-new-job command.
# Extract list of reference regions from BAM header
regions=$(
  samtools view -H "${mappings_sorted_bam_name}" | \
    grep "@SQ" | \
    sed 's/.*SN:\(\S*\)\s.*/\1/'
)

echo "Segmenting into regions"
count_jobs=()
counter=0
temparray=()

# Loop through each region
for r in ${regions}; do
  if [[ "${counter}" -ge 10 ]]; then
    echo "${temparray[@]}"
    count_jobs+=($(
      dx-jobutil-new-job \
        -ibam_file="${mappings_sorted_bam}" \
        -ibambai_file="${mappings_sorted_bai}" \
        "${temparray[@]}" \
        count_func
    ))
    temparray=()
    counter=0
  fi

  # Add region to temp array of -i<parameter>s
  temparray+=("-iregions=${r}")
  counter=$((counter + 1))
done

# Handle remaining regions (10 or fewer)
if [[ "${counter}" -gt 0 ]]; then
  echo "${temparray[@]}"
  count_jobs+=($(
    dx-jobutil-new-job \
      -ibam_file="${mappings_sorted_bam}" \
      -ibambai_file="${mappings_sorted_bai}" \
      "${temparray[@]}" \
      count_func
  ))
fi
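The batching logic above can be tried without the platform. The following is a minimal sketch, assuming a hypothetical launch_batch function in place of dx-jobutil-new-job and a made-up header in place of the samtools view -H output. It extracts the SN: field with awk rather than the sed expression used above, and it only illustrates how region names are grouped into -iregions= arguments, ten at a time.

```shell
#!/usr/bin/env bash
set -e -o pipefail

batch_size=10
batches=()   # one space-joined string of -iregions=... arguments per batch

# Hypothetical stand-in for dx-jobutil-new-job: it only records its arguments.
launch_batch() {
  batches+=("$*")
}

# Made-up header with 23 reference sequences (assumed data).
header=$(
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  for i in {1..23}; do
    printf '@SQ\tSN:chr%d\tLN:1000\n' "$i"
  done
)

# Keep the value of the SN: field from every @SQ line.
regions=$(awk -F'\t' '/^@SQ/ {
  for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
}' <<< "${header}")

args=()
for r in ${regions}; do
  # Flush a full batch before adding the next region.
  if (( ${#args[@]} >= batch_size )); then
    launch_batch "${args[@]}"
    args=()
  fi
  args+=("-iregions=${r}")
done

# Flush the final, possibly partial, batch.
if (( ${#args[@]} > 0 )); then
  launch_batch "${args[@]}"
fi

echo "batches launched: ${#batches[@]}"
```

With 23 sample regions the sketch records three batches: two of ten regions and one of three. The final flush after the loop is what keeps the last partial batch from being dropped.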
Job outputs from the count_func entry point are referenced as job-based object references (JBORs) and used as inputs for the sum_reads entry point.
echo "Merge count files, jobs:"
echo "${count_jobs[@]}"

# Build one JBOR input per count_func job
readfiles=()
for count_job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${count_job}:counts_txt")
done

echo "file name: ${mappings_sorted_bam_prefix}"
echo "Set file, readfile variables:"
echo "${readfiles[@]}"
countsfile_job=$(dx-jobutil-new-job -ifilename="${mappings_sorted_bam_prefix}" "${readfiles[@]}" sum_reads)
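Each argument built in the loop above has the form -ireadfiles=<job ID>:<output field>. A minimal sketch of just that string construction follows, using made-up job IDs (placeholders, not real platform IDs) so that it runs without the dx toolkit:

```shell
#!/usr/bin/env bash
set -e -o pipefail

# Made-up job IDs standing in for the values returned by dx-jobutil-new-job.
count_jobs=("job-AAAA" "job-BBBB" "job-CCCC")

# Build one -ireadfiles=<job>:<field> argument per subjob, as the source
# does for its readfiles input.
readfiles=()
for count_job in "${count_jobs[@]}"; do
  readfiles+=("-ireadfiles=${count_job}:counts_txt")
done

# Print one argument per line.
printf '%s\n' "${readfiles[@]}"
```

Keeping the arguments in an array and expanding it as "${readfiles[@]}" passes each reference as its own word, however many subjobs were launched.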
The output of the sum_reads entry point is used as the output of the main entry point through a JBOR passed to the dx-jobutil-add-output command.
echo "Specifying output file"
dx-jobutil-add-output counts_txt "${countsfile_job}:read_sum" --class=jobref
count_func
This entry point performs a SAMtools count of the regions passed as input (up to 10 per job). It runs on a new worker, so variables from other functions, including main(), are not accessible here.
Once the output file with counts is created, it is uploaded to the platform and assigned as the entry point's job output counts_txt using the dx-jobutil-add-output command.
count_func() {
  set -e -x -o pipefail

  echo "Value of bam_file: '${bam_file}'"
  echo "Value of bambai_file: '${bambai_file}'"
  echo "Regions being counted '${regions[*]}'"

  dx-download-all-inputs

  # Place the BAM and its index side by side so samtools can find the index
  mkdir workspace
  cd workspace || exit
  mv "${bam_file_path}" .
  mv "${bambai_file_path}" .

  outputdir="./out/samtool/count"
  mkdir -p "${outputdir}"
  samtools view -c "${bam_file_name}" "${regions[@]}" >> "${outputdir}/readcounts.txt"

  counts_txt_id=$(dx upload "${outputdir}/readcounts.txt" --brief)
  dx-jobutil-add-output counts_txt "${counts_txt_id}" --class=file
}
sum_reads
The main entry point triggers this subjob, providing the outputs of the count_func jobs as input JBORs. This entry point gathers all the readcounts.txt files generated by the count_func jobs and sums the totals.
This entry point outputs read_sum as a file, which the main entry point then references through a JBOR as its own job output.
sum_reads() {
  set -e -x -o pipefail

  echo "$filename"
  echo "Value of read file array '${readfiles[*]}'"

  dx-download-all-inputs
  echo "Value of read file path array '${readfiles_path[*]}'"

  echo "Summing values in files"
  readsum=0
  for read_f in "${readfiles_path[@]}"; do
    temp=$(cat "$read_f")
    readsum=$((readsum + temp))
  done

  echo "Total reads: ${readsum}" > "${filename}_counts.txt"
  read_sum_id=$(dx upload "${filename}_counts.txt" --brief)
  dx-jobutil-add-output read_sum "${read_sum_id}" --class=file
}
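The summing loop in sum_reads can be checked locally. The sketch below is an illustration under stated assumptions: the count files are sample data created in a temporary directory, each is assumed to hold a single integer, and the integer check is an addition that the original function does not have.

```shell
#!/usr/bin/env bash
set -e -o pipefail

# Sample count files (assumed data); each holds one integer.
workdir=$(mktemp -d)
trap 'rm -rf "${workdir}"' EXIT
echo 120 > "${workdir}/a.txt"
echo 35 > "${workdir}/b.txt"
echo 0 > "${workdir}/c.txt"

readfiles_path=("${workdir}/a.txt" "${workdir}/b.txt" "${workdir}/c.txt")

readsum=0
for read_f in "${readfiles_path[@]}"; do
  count=$(< "${read_f}")
  # Skip anything that is not a plain non-negative integer.
  if [[ ! "${count}" =~ ^[0-9]+$ ]]; then
    echo "Skipping ${read_f}: unexpected content" >&2
    continue
  fi
  # 10# forces base 10 so a value with leading zeros is not read as octal.
  readsum=$((readsum + 10#${count}))
done

echo "Total reads: ${readsum}"
```

Whether a malformed file should be skipped or should fail the job is a design choice. Under the set -e used in the original function, a non-numeric value would typically abort the arithmetic instead of being skipped.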
In the main function, the output is referenced as follows:
echo "Specifying output file"
dx-jobutil-add-output counts_txt "${countsfile_job}:read_sum" --class=jobref