# Mkfifo and dx cat

[View full source code on GitHub](https://github.com/dnanexus/dnanexus-example-applets/tree/master/Tutorials/bash/samtools_count_catmkfifo_sh)

This applet performs a SAMtools count on an input file while minimizing disk usage: data streams through named pipes instead of being written to the worker's disk. For additional details on FIFO special files (named pipes), run the command `man fifo` in your shell.

{% hint style="warning" %}
Named pipes require **BOTH** a `stdin` (a writer) and a `stdout` (a reader): opening one end blocks until the other end is opened. The following examples run each incomplete named pipe in a background process so the foreground script does not block.
{% endhint %}
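
The blocking behavior can be seen without any platform access. The following is a minimal sketch using only standard tools; the scratch directory, the FIFO name, and the three sample lines are hypothetical and are not part of the applet.

```shell
set -eu

# Hypothetical scratch directory and FIFO name, for illustration only.
demo_dir="$(mktemp -d)"
demo_fifo="${demo_dir}/demo_fifo"
mkfifo "${demo_fifo}"

# Writer: opening the FIFO for writing blocks until a reader opens it,
# so the writer runs in the background.
printf 'read1\nread2\nread3\n' > "${demo_fifo}" &
writer_pid="$!"

# Reader: opening the other end unblocks the writer. The data passes
# through the kernel's pipe buffer instead of being stored on disk.
line_count="$(wc -l < "${demo_fifo}" | tr -d ' ')"

wait "${writer_pid}"
rm -r "${demo_dir}"
echo "lines read: ${line_count}"
```

If the writer were started in the foreground instead, the script would hang on the `printf` line because no reader would ever get a chance to open the pipe.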

The applet breaks this use case into four steps:

1. Stream the BAM file from the platform to a worker.
2. While the BAM streams, count the number of reads present.
3. Write the result to a file.
4. Stream the result file to the platform.

## Stream BAM file from the platform to a worker

First, create a named pipe on the worker. Then download the file from the platform as a stream using [`dx cat`](/user/helpstrings-of-sdk-command-line-utilities.md#cat), redirecting that stream into the `stdin` of the named pipe.

```shell
mkdir workspace
mappings_fifo_path="workspace/${mappings_bam_name}"
mkfifo "${mappings_fifo_path}" # FIFO file is created
dx cat "${mappings_bam}" > "${mappings_fifo_path}" &
input_pid="$!"
```

| FIFO     | `stdin` | `stdout` |
| -------- | ------- | -------- |
| BAM file | **YES** | **NO**   |

## Output BAM file read count

With the FIFO special file representing the streamed BAM in place, you can call the `samtools` command. Because `samtools` reads from the BAM FIFO, it provides that FIFO with a `stdout`. The count itself must also stream back to the platform, so create a second named pipe to represent the output file.

```shell
mkdir -p ./out/counts_txt/

counts_fifo_path="./out/counts_txt/${mappings_bam_prefix}_counts.txt"

mkfifo "${counts_fifo_path}" # FIFO file is created for the counts output
samtools view -c "${mappings_fifo_path}" > "${counts_fifo_path}" &
process_pid="$!"
```

| FIFO        | `stdin` | `stdout` |
| ----------- | ------- | -------- |
| BAM file    | **YES** | **YES**  |
| output file | **YES** | **NO**   |

The directory structure created here (`~/out/counts_txt`) is required to use the [`dx-upload-all-outputs`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-upload-all-outputs) command in the next step. All files found in the path `~/out/<output name>` are uploaded to the corresponding `<output name>` specified in the `dxapp.json`.

## Stream the result file to the platform

At this point, a stream from the platform feeds a `samtools` command, which writes its results to another named pipe. However, the background `samtools` process remains blocked until the output FIFO has a `stdout`. Creating an upload stream to the platform resolves this.

Upload as a stream to the platform using the commands [`dx-upload-all-outputs`](/user/helpstrings-of-sdk-command-line-utilities.md#dx-upload-all-outputs) or [`dx upload -`](/user/helpstrings-of-sdk-command-line-utilities.md#upload). Specify `--buffer-size` when needed.

```shell
dx-upload-all-outputs &
upload_pid="$!"
```

| FIFO        | `stdin` | `stdout` |
| ----------- | ------- | -------- |
| BAM file    | **YES** | **YES**  |
| output file | **YES** | **YES**  |

Alternatively, `dx upload -` can upload directly from `stdin`, removing the need for the directory structure required for `dx-upload-all-outputs`.

```shell
dx upload - --path "${mappings_bam_prefix}_counts.txt" < "${counts_fifo_path}" &
upload_pid="$!"
```

{% hint style="warning" %}
When uploading a file that exists on disk, `dx upload` knows the file size and automatically handles any cloud service provider upload chunk requirements. When uploading as a stream, the file size is not known automatically and `dx upload` uses default parameters. These defaults are fine for most use cases, but you may need to specify an upload part size with the `--buffer-size` option.
{% endhint %}
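
The three stages built so far can be rehearsed locally before running on the platform. The sketch below is an analogy under stated assumptions, not the applet's code: `printf` stands in for `dx cat`, `wc -l` stands in for `samtools view -c`, and `cat` into a regular file stands in for the upload. All paths and sample data are hypothetical.

```shell
set -eu

work_dir="$(mktemp -d)"              # hypothetical scratch directory
in_fifo="${work_dir}/input.fifo"     # plays the role of the BAM FIFO
out_fifo="${work_dir}/counts.fifo"   # plays the role of the counts FIFO
result_file="${work_dir}/result.txt"
mkfifo "${in_fifo}" "${out_fifo}"

# Stage 1 (stand-in for `dx cat`): write the input stream into the first FIFO.
printf 'a\nb\nc\nd\n' > "${in_fifo}" &
input_pid="$!"

# Stage 2 (stand-in for `samtools view -c`): read one FIFO, write the other.
wc -l < "${in_fifo}" > "${out_fifo}" &
process_pid="$!"

# Stage 3 (stand-in for the upload): drain the second FIFO.
cat "${out_fifo}" > "${result_file}" &
upload_pid="$!"

# Keep the foreground alive until every stage has finished.
wait "${input_pid}" "${process_pid}" "${upload_pid}"
result="$(tr -d ' ' < "${result_file}")"
rm -r "${work_dir}"
echo "count: ${result}"
```

Each stage blocks until its neighbor opens the other end of the shared FIFO, which is why all three are started in the background before the script waits on them.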

## Wait for background processes

With background processes running, `wait` in the foreground for those processes to finish.

```shell
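# `wait -n` returns when any one background job exits, so three calls
# cover input_pid, process_pid, and upload_pid in whatever order they finish.
wait -n
wait -n
wait -n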
```

Without waiting, the app script running in the foreground would finish and terminate the job prematurely.
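
When it matters which stage failed, waiting on a specific PID returns that job's exit status. The following is a minimal sketch with two stand-in subshells rather than the applet's real processes; the variable names are hypothetical.

```shell
# Two stand-in background jobs: one succeeds, one fails with status 3.
( sleep 1; exit 0 ) &
ok_pid="$!"
( exit 3 ) &
bad_pid="$!"

# `wait PID` returns the exit status of that specific job, so each
# status can be captured and inspected separately.
ok_status=0
wait "${ok_pid}" || ok_status="$?"
bad_status=0
wait "${bad_pid}" || bad_status="$?"

echo "ok job exited with ${ok_status}, failing job exited with ${bad_status}"
```

The `|| status="$?"` form keeps the script running long enough to record the failure even when `set -e` is in effect.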

## How is the SAMtools dependency provided?

The SAMtools compiled binary is placed directly in the `<applet dir>/resources` directory. Any files found in the `resources/` directory are uploaded so that they are present in the worker's root directory. In this case:

```
Applet dir
├── src
├── dxapp.json
└── resources
    └── usr
        └── bin
            └── < samtools binary >
```

When this applet is run on a worker, the `resources/` folder is placed in the worker's root directory `/`:

```
/
├── usr
│   └── bin
│       └── < samtools binary >
└── home
    └── dnanexus
```

`/usr/bin` is part of the `$PATH` variable, so the `samtools` command can be referenced directly in the script as `samtools view -c ...`.
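
A script can confirm that a bundled binary resolves through `$PATH` before relying on it. The helper below is a hypothetical sketch, not part of the applet; `require_tool` is an illustrative name, and `sh` is checked only because it exists on any POSIX system.

```shell
# Hypothetical helper: report where a command resolves from, or fail clearly.
require_tool() {
  tool_path="$(command -v "$1")" || {
    echo "error: $1 not found in PATH" >&2
    return 1
  }
  echo "$1 resolves to ${tool_path}"
}

# In the applet this call would be `require_tool samtools`.
require_tool sh
```

Running the check at the top of the script surfaces a missing dependency before any streams are opened.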

