Using JupyterLab

Jupyter notebooks are a popular way to track the work performed in computational experiments, much as a lab notebook tracks the work done in a wet lab. DXJupyterLab is an application provided by DNAnexus that allows you to perform computational experiments in the DNAnexus cloud using Jupyter notebooks. DXJupyterLab allows users on the DNAnexus platform to collaborate on notebooks and extends JupyterLab with options for directly accessing a DNAnexus project from the JupyterLab environment.

This page provides an overview of JupyterLab as it runs on the DNAnexus platform and of its DNAnexus-specific features.

NOTE: A license is required to access DXJupyterLab products. Please contact sales@dnanexus.com for more information.

Use cases for DXJupyterLab

JupyterLab can be run on the DNAnexus platform in two different ways: a general-purpose JupyterLab application and a Spark cluster-enabled one, which can be used within the DNAnexus Apollo framework.

Both apps instantiate a JupyterLab server that allows for data analyses to be interactively performed in Jupyter notebooks on a DNAnexus worker.

The DXJupyterLab Spark Cluster app contains all the features found in the general-purpose DXJupyterLab along with access to a fully-managed, on-demand Spark cluster for big data processing and translational informatics.

DXJupyterLab is a versatile application that can be used to:

  • collaborate on exploratory analysis of data

  • reproduce and fork work performed in computational analyses

  • visualize and gain insights into data generated from biological experiments

  • create figures and tables for scientific publications

  • build and test algorithms directly in the cloud before creating DNAnexus apps and workflows

  • test and train machine/deep learning models

  • interactively run commands on a terminal

Creating Interactive Notebooks

A step-by-step guide on how to start with DXJupyterLab and create and edit Jupyter notebooks can be found in the Quickstart.

DXJupyterLab Environments

There are two environments you will use while working in a DXJupyterLab session:

  1. The project on the DNAnexus platform

  2. Worker execution environment

The project on the DNAnexus platform

From the JupyterLab session, you have direct access to the project in which the application is run. The project file browser, which lists folders, notebooks, and other files in the project, can be accessed from the DNAnexus tab in the left sidebar or from the terminal.

You can create, edit, and save notebooks directly in the project as well as duplicate, delete, or download them to your local machine. The project is selected when the DXJupyterLab app is started and cannot be subsequently changed.

The project file browser displays all subfolders, Jupyter notebooks, and other files in the project (limited to the 1,000 most recently modified files). In the Spark-enabled app, databases in the project are also visible (limited to the 1,000 most recently modified databases). A list of all the objects in the project can be obtained programmatically or by using dx ls. The file listing is refreshed every 10 seconds, and you can force a refresh by clicking the circular arrow icon in the top right corner of the file browser.
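As a rough illustration of the programmatic route, the sketch below lists the files in a project with dxpy.find_data_objects from dx-toolkit. The helper name list_project_files and its injectable find parameter are illustrative choices for this example, not DXJupyterLab features, and the project ID is a placeholder you would supply yourself.

```python
def list_project_files(project_id, folder="/", find=None):
    """Return (folder, name, id) tuples for files in a project, sorted by path.

    `find` defaults to dxpy.find_data_objects. It is injectable only so the
    helper can be tried without a platform connection (an assumption of this
    sketch, not part of DXJupyterLab).
    """
    if find is None:
        import dxpy  # ships with dx-toolkit
        find = dxpy.find_data_objects

    results = find(
        project=project_id,
        folder=folder,
        recurse=True,        # include subfolders
        classname="file",    # notebooks are stored as files
        describe=True,       # attach name and folder to each result
    )
    rows = [
        (r["describe"]["folder"], r["describe"]["name"], r["id"])
        for r in results
    ]
    return sorted(rows)
```

Unlike the file browser, a query of this kind is not limited to the 1,000 most recently modified files.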

Worker execution environment

Notebooks are executed in a temporary execution container, just like any other app code on the DNAnexus platform. When you open and run a notebook from the DNAnexus file browser, the kernel for that notebook is started in this environment and the notebook code is executed there.

Any input data used by the notebook should first be downloaded to the execution environment. Any outputs from the notebook will also be generated here and should be uploaded to the project before the session ends.
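The download and upload steps can be sketched from a notebook cell by calling the dx command-line client. The helper names and the file paths below are placeholders for illustration; only the dx download and dx upload commands themselves come from dx-toolkit.

```python
import subprocess


def download_command(source, local_path):
    """Build a `dx download` call that copies a project file to the worker."""
    return ["dx", "download", source, "-o", local_path]


def upload_command(local_path, destination):
    """Build a `dx upload` call that copies a local file to a platform path."""
    return ["dx", "upload", local_path, "--path", destination]


def run(command):
    """Run a dx command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)


# Example with placeholder paths (uncomment inside a session):
# run(download_command("/data/samples.csv", "samples.csv"))
# ... analysis that writes results.csv ...
# run(upload_command("results.csv", "/results/"))
```

Both paths can be qualified with a project ID (for example project-xxxx:/results/) if you prefer to state the target explicitly.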

The execution environment file browser can be accessed from the left sidebar (notice the folder icon at the top) or from the terminal.

Collaboration in the Cloud

Collaborators can work on notebooks in the project without the risk of overwriting each other's changes.

Notebook locking during editing

If a user has opened a specific notebook in a JupyterLab session, other users cannot open or edit the notebook. This is indicated by a red lock icon next to the notebook's name.

It is still possible to create a duplicate to see what changes are being saved in the locked notebook or to continue work on this "forked" version of the notebook. To copy a notebook, right-click on its name and select Duplicate; after a few seconds, a notebook with the same name and a "copy" suffix should appear in the project.

Once the editing user closes the notebook, the lock will be released and anybody else with access to the project will be able to open it.

Notebook versioning

Whenever a notebook is saved in the project, it is uploaded to the platform as a new file that will replace the previous version, i.e. the file of the same name. The previous version is moved to the .Notebook_archive folder with a timestamp suffix added to its name and its ID is saved in the properties of the new file. Saving notebooks directly in the project ensures that your analyses won't be lost when the DXJupyterLab session ends.
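The save behavior described above can be modeled in a few lines. This is an in-memory simulation only, not the DXJupyterLab implementation: the timestamp format and the previous_version property key are assumptions made for the example, since the actual values are not specified here.

```python
from datetime import datetime

ARCHIVE_FOLDER = ".Notebook_archive"  # folder name taken from the text above


def save_notebook(project, name, new_id, saved_at):
    """Simulate one save: archive the old file, then store the new one.

    `project` maps file paths to {"id": ..., "properties": {...}} records.
    """
    properties = {}
    previous = project.pop(name, None)
    if previous is not None:
        stamp = saved_at.strftime("%Y%m%d_%H%M%S")        # assumed suffix format
        project[f"{ARCHIVE_FOLDER}/{name}_{stamp}"] = previous
        properties["previous_version"] = previous["id"]   # assumed property key
    project[name] = {"id": new_id, "properties": properties}
    return project
```

Saving the same notebook twice leaves one current file that points back to its predecessor and one archived copy of the earlier version.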

Session Timeout Control

DXJupyterLab sessions begin with a set duration, after which they shut down automatically. The timeout clock is displayed on the right side of the footer and can be adjusted there using the Update duration button. The session is terminated at the set time even if the DXJupyterLab webpage is closed. Job lengths have an upper limit of 48 hours, which cannot be extended.
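The arithmetic behind the 48-hour ceiling is simple enough to sketch. The function below is purely illustrative and assumes the limit is measured from the start of the job; the duration itself is still changed with the Update duration button, not from code.

```python
from datetime import datetime, timedelta

MAX_JOB_LENGTH = timedelta(hours=48)  # upper limit stated in the text


def shutdown_time(started_at, requested_duration):
    """Return when the session ends, never later than 48 hours after start."""
    return started_at + min(requested_duration, MAX_JOB_LENGTH)
```

For a session started at 09:00, a 4-hour duration ends at 13:00 the same day, while a 72-hour request would still end 48 hours after the start.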

A session can be terminated immediately from the top menu (DNAnexus > End Session).

Environment Snapshots

It is possible to save the current session environment and data and reload it later by creating a session snapshot (DNAnexus > Create Snapshot).

A DXJupyterLab session is run in a Docker container, and a session snapshot file is a tarball generated by saving the Docker container state (with the docker commit and docker save commands). Any installed packages and files created locally are saved to a snapshot file. This file is then uploaded to the project to .Notebook_snapshots and can be passed as input the next time the app is started.

NOTE: If many large files are created locally, the resulting snapshots will take a long time to save and load. In general, it is recommended not to snapshot more than 1 GB of locally saved data and packages, and to download larger files as needed instead.
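Before creating a snapshot, it can help to check how much data sits in a local directory. The sketch below sums file sizes under a directory you choose and compares the total with the 1 GB guideline, read here as 1 GiB. Which directories end up in a snapshot is not specified above, so the directory argument is left to you.

```python
import os

SNAPSHOT_GUIDELINE_BYTES = 1024**3  # "1 GB" from the note, read as 1 GiB


def directory_size(root):
    """Return the total size in bytes of regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total


def exceeds_snapshot_guideline(root, limit=SNAPSHOT_GUIDELINE_BYTES):
    """Return True if the files under `root` add up to more than `limit`."""
    return directory_size(root) > limit
```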

Viewing Other Files in the Project

Other file types in your project, such as CSV, JSON, and PDF files, images, and scripts, are convenient to view because JupyterLab renders each of them appropriately. For example, JSON files are collapsible and easy to navigate, and CSV files are presented in tabular format.

However, editing and saving any open files from the project other than IPython notebooks will result in an error.

Permissions in the JupyterLab Session

The JupyterLab apps are run in a specific project, defined at start time, and this project cannot be subsequently changed. The job associated with the JupyterLab app has CONTRIBUTE access to the project in which it is run.

When running the DXJupyterLab app, it is not possible to view any other projects, while the [DXJupyterLab Spark Cluster](/developer/jupyter-notebooks/dxjupyterlab-spark-cluster.md) app has VIEW access to all the projects the user has access to. This broader scope is required to read databases, which may be located in different projects and cannot be cloned.

Running Jobs in the JupyterLab Session

It is possible to start new jobs with dx run from a notebook or the terminal. The started jobs are subjobs of the JupyterLab job associated with the running app. If dx run is executed with the --project argument, the argument is ignored and the job runs in the JupyterLab job's workspace, not in the given project. If a subjob fails or is terminated on the DNAnexus platform, the whole job tree fails or is terminated, so the JupyterLab session stops running as well.
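A minimal sketch of launching a subjob from a notebook cell is shown below. The helper names, the app ID, and the input name are placeholders for illustration; the sketch relies only on the dx run options -y (skip confirmation), --brief (print just the job ID), and -i (set an input).

```python
import subprocess


def dx_run_command(executable, inputs=None):
    """Build a non-interactive `dx run` call.

    `--project` is deliberately not offered: as described above, it is
    ignored inside a JupyterLab session.
    """
    command = ["dx", "run", executable, "-y", "--brief"]
    for name, value in (inputs or {}).items():
        command += ["-i", f"{name}={value}"]
    return command


def launch(executable, inputs=None):
    """Start the subjob and return the job ID printed by `dx run --brief`."""
    result = subprocess.run(
        dx_run_command(executable, inputs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Example with placeholder IDs (uncomment inside a session):
# job_id = launch("app-xxxx", {"reads": "file-yyyy"})
```

Because a failed subjob takes the whole job tree down with it, it is worth saving notebooks to the project before launching anything that might fail.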

Environment and Pre-Installed Packages

The DXJupyterLab app is a Docker-based app that runs the JupyterLab server instance in a Docker container. The server runs on port 443. Because the app is an HTTPS app, you can bring up the JupyterLab environment in a web browser using the URL https://job-xxxx.dnanexus.cloud, where job-xxxx is the ID of the job that runs the app. Only the user who launched the JupyterLab job has access to the JupyterLab environment. Other users will see a “403 Permission Forbidden” message under the JupyterLab session's URL.
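The URL pattern above is easy to build from a job ID. The helper name session_url is illustrative, and job-xxxx remains a placeholder for the ID of the job running the app.

```python
def session_url(job_id):
    """Return the HTTPS address of the JupyterLab session for a job ID."""
    if not job_id.startswith("job-"):
        raise ValueError(f"not a job ID: {job_id!r}")
    return f"https://{job_id}.dnanexus.cloud"
```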

NOTE: The JupyterLab server runs in the Miniconda Python 3.6 environment, and Debian is the container's operating system. This base operating system may change in future releases of DXJupyterLab.

In addition to dx-toolkit, the JupyterLab server comes pre-installed with a number of popular data packages:

  • numpy

  • scipy

  • pandas

  • seaborn

  • matplotlib

Kernels available for the notebooks:

  • Python 3

  • R 3.5

Additional packages can easily be installed during a JupyterLab session. By creating a snapshot, users can then start subsequent sessions with the new packages pre-installed by providing the snapshot as input.
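One way to install a package from a notebook cell is to call pip through the interpreter that runs the kernel, so the package lands in the same Python environment. This sketch assumes pip is available in the session's environment; the helper names and the package name in the example are arbitrary.

```python
import subprocess
import sys


def pip_install_command(*packages):
    """Build a pip call bound to the interpreter running this code."""
    if not packages:
        raise ValueError("name at least one package")
    return [sys.executable, "-m", "pip", "install", *packages]


def install(*packages):
    """Install the packages and raise CalledProcessError on failure."""
    subprocess.run(pip_install_command(*packages), check=True)


# Example (uncomment inside a session):
# install("scikit-learn")
```

Packages installed this way exist only in the current session unless a snapshot is created afterwards.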

JupyterLab Documentation

You can find more information on the features and benefits of JupyterLab in the official JupyterLab documentation.

Next Steps

  • Create your first notebooks by following the Quickstart.

  • References has tips and info on the most useful DXJupyterLab features.