# Pysam

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/python/pysam_count)

## How is Pysam provided?

Pysam is provided through a `pip3 install` using the pip3 package manager in the `dxapp.json`'s `runSpec.execDepends` property:

```json
{
 "runSpec": {
    ...
    "execDepends": [
      {"name": "pysam",
         "package_manager": "pip3",
         "version": "0.15.4"
      }
    ]
    ...
 }
```

The `execDepends` value is a JSON array of dependencies to resolve before the applet source code is run. In this applet, `pip3` is specified as the package manager and `pysam version 0.15.4` as the dependency to resolve.

## Downloading Input

The fields `mappings_sorted_bam` and `mappings_sorted_bai` are passed to the main function as parameters for the job. These parameters are dictionary objects with key-value pairs `{"$dnanexus_link": "<file>-<xxxx>"}`. File objects from the platform are handled through [`DXFile`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_dxfile.html?highlight=dxfile#module-dxpy.bindings.dxfile) handles. If an index file is not supplied, then a `*.bai` index is created.

```python
print(mappings_sorted_bai)
print(mappings_sorted_bam)

mappings_sorted_bam = dxpy.DXFile(mappings_sorted_bam)
sorted_bam_name = mappings_sorted_bam.name
dxpy.download_dxfile(mappings_sorted_bam.get_id(),
                        sorted_bam_name)
ascii_bam_name = unicodedata.normalize(  # Pysam requires ASCII not Unicode string.
    'NFKD', sorted_bam_name).encode('ascii', 'ignore')

if mappings_sorted_bai is not None:
    mappings_sorted_bai = dxpy.DXFile(mappings_sorted_bai)
    dxpy.download_dxfile(mappings_sorted_bai.get_id(),
                            mappings_sorted_bai.name)
else:
    pysam.index(ascii_bam_name)
```

## Working with Pysam

Pysam provides key methods that mimic SAMtools commands. In this applet example, the focus is only on canonical chromosomes. The Pysam object representation of a BAM file is `pysam.AlignmentFile`.

```python
mappings_obj = pysam.AlignmentFile(ascii_bam_name, "rb")
regions = get_chr(mappings_obj, canonical_chr)
```

The helper function `get_chr` is shown below.

```python
def get_chr(bam_alignment, canonical=False):
    """Helper function to return canonical chromosomes from SAM/BAM header

    Arguments:
        bam_alignment (pysam.AlignmentFile): SAM/BAM pysam object
        canonical (boolean): Return only canonical chromosomes
    Returns:
        regions (list[str]): Region strings
    """
    regions = []
    headers = bam_alignment.header
    seq_dict = headers['SQ']

    if canonical:
        re_canonical_chr = re.compile(r'^chr[0-9XYM]+$|^[0-9XYM]')
        for seq_elem in seq_dict:
            if re_canonical_chr.match(seq_elem['SN']):
                regions.append(seq_elem['SN'])
    else:
        regions = [''] * len(seq_dict)
        for i, seq_elem in enumerate(seq_dict):
            regions[i] = seq_elem['SN']

    return regions
```

Once a list of canonical chromosomes is established, you can iterate over them and perform the Pysam version of `samtools view -c`, `pysam.AlignmentFile.count`.

```python
total_count = 0
count_filename = "{bam_prefix}_counts.txt".format(
    bam_prefix=ascii_bam_name[:-4])

with open(count_filename, "w") as f:
    for region in regions:
        temp_count = mappings_obj.count(region=region)
        f.write("{region_name}: {counts}\n".format(
            region_name=region, counts=temp_count))
        total_count += temp_count

    f.write("Total reads: {sum_counts}".format(sum_counts=total_count))
```

## Uploading Outputs

The summarized counts are returned as the job output. The `dx-toolkit` Python SDK function [`dxpy.upload_local_file`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_dxfile.html#dxpy.bindings.dxfile_functions.upload_local_file) uploads and generates a `DXFile` corresponding to the tabulated result file.

```python
counts_txt = dxpy.upload_local_file(count_filename)
output = {}
output["counts_txt"] = dxpy.dxlink(counts_txt)

return output
```

Python job outputs have to be a dictionary of key-value pairs, with the keys being job output names as defined in the `dxapp.json` file and the values being the output values for corresponding output classes. For files, the output type is a `DXLink`. The [`dxpy.dxlink`](http://autodoc.dnanexus.com/bindings/python/current/dxpy_functions.html?highlight=dxlink#dxpy.bindings.dxdataobject_functions.dxlink) function generates the appropriate `DXLink` value.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://documentation.dnanexus.com/getting-started/developer-tutorials/python/pysam.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
