Setting Up an Interactive Workstation

The cloud workstation app launches a job whose worker you can SSH into and use as a "workstation" to explore and manipulate data stored on DNAnexus, just as you would on a local Linux machine.

The benefit of using the workstation app rather than your local machine is that you can access data stored on DNAnexus without downloading the files locally, so you are not constrained by your local internet bandwidth. In addition, you can configure the app to launch on more powerful instance types (virtual machine configurations) available to DNAnexus users. Any files or results you want to keep from your workstation session can be uploaded back into the project from which you launched the app.

Run the Workstation App

The cloud workstation app provides basic functionality, such as access to all your data and network access for downloading public tools, and can be run as is.

If you would like to customize your cloud workstation experience, we also provide the source code of the app so you can build your own version of the workstation.

NOTE: SSH access can only be configured, and the interactive worker can only be reached, through the dx command-line client. Download and install it if you have not done so already.

Step 1: Configure SSH for your account

If you haven't already, configure your account to allow SSH connections by running dx ssh_config. For more information on configuring your account and connecting to jobs, see the DNAnexus documentation on connecting to jobs.

Step 2: Run the app

To run the workstation app and SSH into the worker, first select the project you would like to work in. You need CONTRIBUTE or ADMINISTER access to that project to run the app in it.

$ dx select "my-working-project"

Run the following dx command. The --ssh flag automatically configures the job to allow SSH access and connects to it after launch. The app accepts an optional maximum session length, given as a string with a unit suffix such as 30m or 3h (the default is 1h), and an optional Files input (fids).

$ dx run app-cloud_workstation --ssh
Select an optional parameter to set by its # (^D or <ENTER> to finish):
[0] Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length) [default="1h"]
[1] Files (fids)
Optional param #: 0
Input: Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length)
Class: string
Enter string value ('?' for more options)
max_session_length: 3h
Select an optional parameter to set by its # (^D or <ENTER> to finish):
[0] Maximum Session Length (suffixes allowed: s, m, h, d, w, M, y) (max_session_length) [="3h"]
[1] Files (fids)
Optional param #: <ENTER>

Upon confirmation of input, you will be connected to the worker running the cloud workstation app and shown the following message:

Calling app-cloud_workstation with output destination
project-xxxx:/
Job ID: job-xxxx
Waiting for job-xxxx to start......
Resolving job hostname and SSH host key...........................
Checking connectivity to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com...OK
Connecting to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com
Welcome to DNAnexus!
This is the DNAnexus Execution Environment, running job-xxxx.
Job: Cloud Workstation
App: cloud_workstation:main
Instance type: mem2_hdd2_x2
Project: Cloud Workstation Project (project-xxxx)
Workspace: container-xxxx
Running since: Fri Oct 31 17:45:26 UTC 2014
Running for: 0:00:30
The public address of this instance is ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com.
You are running byobu, a terminal session manager.
If you get disconnected from this instance, you can log in again and your work will be saved as long as the job is running.
For more information on byobu, press F1.
The job is running in terminal 1. To switch to it, use the F4 key (press F4 again
to switch back to this terminal).
Use sudo to run administrative commands.
From this window, you can:
- Use the DNAnexus API with dx
- Monitor processes on the worker with htop
- Install packages with apt-get install
- Use this instance as a general-purpose Linux workstation
OS version: Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-96-generic x86_64)
dnanexus@job-xxxx:~$
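If you would rather skip the interactive prompts, the inputs can be passed on the command line instead. The sketch below wraps dx run in a small bash function, launch_workstation (a hypothetical helper name, not part of the app), and assumes dx run's standard -i flag (set an input by name), -y flag (skip the confirmation prompt), and --instance-type flag. Run it on your local machine, where the dx client is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: launch the cloud workstation without interactive prompts.
# Usage: launch_workstation [SESSION_LENGTH] [INSTANCE_TYPE]
#   SESSION_LENGTH uses the app's suffixes (e.g. 30m, 3h, 1d); defaults to 1h.
#   INSTANCE_TYPE is optional; when omitted, the app's default is used.
launch_workstation() {
  local length="${1:-1h}"
  local instance="${2:-}"
  # -i sets an input by name, -y skips the confirmation prompt,
  # --ssh waits for the job to start and then opens the SSH session.
  local args=(app-cloud_workstation -imax_session_length="$length" -y --ssh)
  if [ -n "$instance" ]; then
    args+=(--instance-type "$instance")
  fi
  dx run "${args[@]}"
}
```

For example, launch_workstation 3h starts a three-hour session on the default instance type, and launch_workstation 30m mem1_ssd1_x32 also requests a larger instance.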

Step 3: Set up the workspace

Because of differences in the execution environment, you must run the following two commands on your workstation before you can upload files to or download files from your parent project:

$ unset DX_WORKSPACE_ID
$ dx cd $DX_PROJECT_CONTEXT_ID:

The first command unsets DX_WORKSPACE_ID, an environment variable that is set when the app is launched; unsetting it allows you to navigate into any of the projects you have access to. The second command uses dx cd to change the working directory of your workstation to the parent project (the only project to which your workstation has CONTRIBUTE access). For more information about the environment variables in the job container, see the Execution Environment Reference.
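The two commands can be wrapped in a small function that first checks it is running inside the job, where DX_PROJECT_CONTEXT_ID is defined. This is a minimal sketch; setup_workspace is a hypothetical name. Because unset only affects the shell that runs it, define the function in your interactive shell (for example by pasting it, or by adding it to ~/.bashrc) and then call setup_workspace, rather than running it as a separate script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepare the workstation shell for the parent project.
setup_workspace() {
  # DX_PROJECT_CONTEXT_ID is set in the job environment to the parent project's ID.
  if [ -z "${DX_PROJECT_CONTEXT_ID:-}" ]; then
    echo "DX_PROJECT_CONTEXT_ID is not set; run this on the workstation" >&2
    return 1
  fi
  # Unset the variable set at launch so dx can navigate to other projects.
  unset DX_WORKSPACE_ID
  # Make the parent project the current working directory for dx.
  dx cd "$DX_PROJECT_CONTEXT_ID:"
}
```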

The workstation should now be ready to use.

Downloading files saved on DNAnexus

This app is configured to have VIEW access to all projects that the user running the app can access. This means you can use the dx download command to download any file you have access to on DNAnexus.

To download a file named my-file.txt from the parent project:

$ dx download my-file.txt

To download one set of reads from the SRR100022 exome from the public Demo Data project:

$ dx download project-BQbJpBj0bvygyQxgQ1800Jkk:/SRR100022/SRR100022_1.filt.fastq.gz

To navigate to a project other than the parent project and download a file from it, do the following:

$ dx select --level=VIEW
Available projects (VIEW or higher):
0) Working Project (CONTRIBUTE)
1) Research Project (VIEW)
2) Production Project (VIEW)
[...]
Pick a numbered choice or "m" for more options [0]: 1
Setting current project to: Research Project
$ dx ls
my-file-1.txt
$ dx download my-file-1.txt
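When you need several files from the same project, looping over fully qualified paths (project ID, colon, path) avoids switching projects at all. The sketch below assumes dx download's -o option, which accepts a local directory; fetch_files and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download several files from one project into a local directory.
# Usage: fetch_files PROJECT_ID DEST_DIR PATH [PATH ...]
#   Each PATH is a path inside the project, e.g. /SRR100022/SRR100022_1.filt.fastq.gz
fetch_files() {
  local project="$1" dest="$2"
  shift 2
  mkdir -p "$dest"
  local path
  for path in "$@"; do
    # The "project-ID:" prefix makes the path independent of the current project.
    dx download "$project:$path" -o "$dest/" || return 1
  done
}
```

For example: fetch_files project-BQbJpBj0bvygyQxgQ1800Jkk ~/reads /SRR100022/SRR100022_1.filt.fastq.gz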

Downloading additional tools

This app has network access, so you will be able to download any tool you may need during your session as you would on a Linux workstation. After downloading your tools, you can use the worker as a general-purpose workstation to manipulate and explore your data as needed.
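Since the worker runs Ubuntu and you can use sudo, packages from the Ubuntu archive can be installed with apt-get, as the welcome message suggests. The helper below is a sketch; install_tools is a hypothetical name, and the package names in the usage example are only illustrations whose availability depends on the Ubuntu release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install packages from the Ubuntu archive on the worker.
# Usage: install_tools PACKAGE [PACKAGE ...]
install_tools() {
  if [ "$#" -eq 0 ]; then
    echo "usage: install_tools PACKAGE [PACKAGE ...]" >&2
    return 1
  fi
  # Refresh the package index, then install without interactive confirmation.
  sudo apt-get update && sudo apt-get install -y "$@"
}
```

For example: install_tools samtools tabix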

If you would like to have your tools packaged into your workstation as it is launched, you can customize your own version of the cloud workstation applet.

Uploading files back to the parent project

If you wish to save any files or results from your workstation session, you must upload the files back into the project from which the cloud workstation app was launched (the "parent project"). To allow you to do this, the cloud workstation app is given CONTRIBUTE access to the parent project.

If you have been navigating between projects to download files, use the --path option with dx upload to make sure the files you created are uploaded to the parent project.

$ dx upload --path "$DX_PROJECT_CONTEXT_ID:" <FILE>

To perform a test upload, do the following:

$ dx ls
$ echo "This is a test file" > file_from_workstation.txt
$ dx upload --path "$DX_PROJECT_CONTEXT_ID:" file_from_workstation.txt
$ dx ls

You should see the contents of your project change between the first and second invocations of dx ls.
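To save several results at once, a small helper can check that each file exists and then upload it to the root of the parent project, regardless of which project dx currently points at. This is a sketch; save_to_parent is a hypothetical name, and it uses only the dx upload --path form shown above, one file per call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload local files to the root of the parent project.
# Usage: save_to_parent FILE [FILE ...]
save_to_parent() {
  if [ -z "${DX_PROJECT_CONTEXT_ID:-}" ]; then
    echo "DX_PROJECT_CONTEXT_ID is not set" >&2
    return 1
  fi
  local f
  # Check every file first so nothing is uploaded if one path is wrong.
  for f in "$@"; do
    [ -f "$f" ] || { echo "no such file: $f" >&2; return 1; }
  done
  for f in "$@"; do
    dx upload --path "$DX_PROJECT_CONTEXT_ID:" "$f" || return 1
  done
}
```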

Terminating the session

By default, your workstation shuts down automatically after the maximum session length. To end the session earlier, use the dx terminate command with the job ID of this cloud workstation job, or terminate the job from the web platform.

$ dx terminate $DX_JOB_ID

NOTE: The contents of your workstation will be destroyed upon termination (either manual termination or after the workstation has run for the maximum session length). Remember to upload any files you wish to save before the end of your session.
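To avoid terminating before your results are saved, you can chain the uploads and the termination so the job is only terminated if every upload succeeds. This is a sketch; finish_session is a hypothetical name, and it relies on DX_PROJECT_CONTEXT_ID and DX_JOB_ID being set in the workstation environment, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload files to the parent project, then end this job.
# Usage: finish_session [FILE ...]
finish_session() {
  local f
  for f in "$@"; do
    # Stop before terminating if any upload fails.
    dx upload --path "$DX_PROJECT_CONTEXT_ID:" "$f" || return 1
  done
  dx terminate "$DX_JOB_ID"
}
```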

Cloud Workstation Execution Environment

Instance type

By default, the cloud workstation app will launch on a mem1_ssd1_x4 instance type which has 4 cores, 7.5 GB memory, and 80 GB storage. To run the app on a different instance type, use the --instance-type flag for dx run.

$ dx run --instance-type mem1_ssd1_x32 --ssh app-cloud_workstation

Operating system

The cloud workstation is set up to use Ubuntu 16.04.
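The sample welcome banner shown earlier reports Ubuntu 14.04.5 LTS, so it is worth checking which release your own worker is running. The sketch below reads PRETTY_NAME from /etc/os-release (or from another file passed as an argument); os_release is a hypothetical helper name. Running lsb_release -d on the worker gives similar information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OS release recorded in an os-release file.
# Usage: os_release [FILE]   (defaults to /etc/os-release)
os_release() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables don't leak into the session.
  ( . "$file" && printf '%s\n' "$PRETTY_NAME" )
}
```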

Job execution environment vs. local environment

When you connect to the execution environment, you interact with the DNAnexus API using the job's credentials, which carry only a subset of your user's permissions. By default, jobs running the cloud workstation app have VIEW permission to all projects in which you have VIEW permission or greater, plus CONTRIBUTE access to the parent project.

The dx select command by default hides projects to which you only have VIEW permissions, so you will want to run dx select --level=VIEW in the execution environment to see those projects.
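If you prefer not to use the interactive menu, the following sketch lists projects at VIEW level or higher with dx find projects and selects one by ID or name with dx select. It assumes both commands accept the --level option used above; list_view_projects and use_project are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list projects visible at VIEW level or higher, without a menu.
list_view_projects() {
  dx find projects --level VIEW
}

# Hypothetical helper: make a project current by ID or name, including VIEW-only projects.
# Usage: use_project PROJECT
use_project() {
  if [ -z "${1:-}" ]; then
    echo "usage: use_project PROJECT" >&2
    return 1
  fi
  dx select --level VIEW "$1"
}
```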

Customizing the workstation

The cloud workstation app provides only the minimum functionality needed for an interactive workstation.

To make your own version of the applet, you can use dx-app-wizard to set up a source code template for your applet. To find the original source code for the app, run dx get app-cloud_workstation.

Some example customizations:

  • Specify different inputs

  • Prepackage external utilities for use within the worker

  • Change the instance type of the worker

  • Change the access permissions
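A typical customization loop is to fetch the app's source with dx get, edit it, and build your own applet from it with dx build. The sketch below assumes dx get places the source in a directory named cloud_workstation (check the directory it actually creates), and that the customizations listed above are made in that directory's dxapp.json, which holds the inputs, instance type, and access settings. build_custom_workstation is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the app source on the first run, build an applet on later runs.
# Usage: build_custom_workstation [SOURCE_DIR]   (assumed default: cloud_workstation)
build_custom_workstation() {
  local src="${1:-cloud_workstation}"
  if [ ! -d "$src" ]; then
    # First run: fetch the source, then stop so you can edit it.
    dx get app-cloud_workstation || return 1
    echo "Source fetched; edit $src/dxapp.json, then rerun to build." >&2
    return 0
  fi
  # Later runs: build an applet from the edited source into the current project.
  dx build "$src"
}
```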