JupyterLab Quickstart

In this tutorial, you will learn how to create and run a notebook in JupyterLab on the platform, download data from the notebook, and upload results to the platform.

Note: JupyterLab is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment. A license is required to access JupyterLab on the DNAnexus Platform; contact DNAnexus Sales for more information.

Run a JupyterLab Session and Create Notebooks

1. Launch JupyterLab and View the Project

First, launch JupyterLab in the project of your choice, as described in the Running JupyterLab guide.

After starting your JupyterLab session, click on the DNAnexus tab on the left sidebar to see all the files and folders in the project.

2. Create an Empty Notebook

To create a new empty notebook in the DNAnexus project, select DNAnexus > New Notebook from the top menu.

This creates an untitled .ipynb file, viewable in the DNAnexus project browser, which refreshes every few seconds.

To rename your file, right-click on its name and select Rename.

3. Edit and Save the Notebook in the Project

You can open and edit the newly created notebook directly from the project (accessible from the DNAnexus tab in the left sidebar). To save your changes, press Ctrl+S (or Command+S on macOS), or click on the save icon in the Toolbar (an area below the tab bar at the top). A new notebook version lands in the project, and you should see in the "Last modified" column that the file was created recently.

Since DNAnexus files are immutable, each notebook save creates a new version in the project, replacing the file of the same name. The previous version moves to the .Notebook_archive with a timestamp suffix added to its name. Saving notebooks directly in the project as new files preserves your analyses beyond the JupyterLab session's end.

4. Download the Data to the Execution Environment

To process your data in the notebook, the data must be available in the execution environment (as is the case with any DNAnexus app).

You can download input data from a project for your notebook using dx download in a notebook cell:

%%bash
dx download input_data/reads.fastq

You can also use the terminal to execute the dx command.

5. Upload Data to the Project

For any data generated by your notebook that needs to be preserved, upload it to the project before the session ends and the JupyterLab worker terminates. Upload data directly in the notebook by running dx upload from a notebook cell or from the terminal:

%%bash
dx upload results.csv

Note: If you create a notebook from the Launcher or from the top menu (File > New > Notebook), the notebook is not created in the project but in the local execution environment. To move it to the project, you must upload it manually. Make sure you upload your local notebooks to the project before the session expires, or work on your notebooks directly from the project, so as not to lose your work.

Next Steps

  • Check the References guide for tips on the most useful operations and features in JupyterLab.

FreeSurfer in JupyterLab

Learn how to use FreeSurfer in JupyterLab.


About FreeSurfer

FreeSurfer is a software package for the analysis and visualization of structural and functional neuroimaging data from cross-sectional or longitudinal studies.

The FreeSurfer package comes pre-installed with the IMAGE_PROCESSING feature of JupyterLab.

FreeSurfer License Registration

To use FreeSurfer on the DNAnexus Platform, you need a valid FreeSurfer license. You can register for the FreeSurfer license at the FreeSurfer registration page.

Using the FreeSurfer License on DNAnexus

To use the FreeSurfer license, complete the following steps:

  1. Upload the license text file to your project on the DNAnexus Platform.

  2. Launch the JupyterLab app and specify the IMAGE_PROCESSING feature.

  3. Once JupyterLab is running, open your existing notebook (or a new notebook) and download the license file into the FREESURFER_HOME directory.

The commands to download the license file are as follows:

  • Python kernel: !dx download license.txt -o $FREESURFER_HOME

  • Bash kernel: dx download license.txt -o $FREESURFER_HOME
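After downloading, you can sanity-check that the license file landed where FreeSurfer expects it. A minimal sketch, assuming only that FREESURFER_HOME is set by the IMAGE_PROCESSING environment (the helper function itself is ours, not part of the platform):

```python
import os

def freesurfer_license_path(env=None):
    """Return the expected location of the FreeSurfer license file.

    Relies on the FREESURFER_HOME variable set by the IMAGE_PROCESSING
    feature environment; this helper is a hypothetical convenience.
    """
    env = os.environ if env is None else env
    home = env.get("FREESURFER_HOME")
    if home is None:
        raise RuntimeError("FREESURFER_HOME is not set; is the IMAGE_PROCESSING feature active?")
    return os.path.join(home, "license.txt")
```

Checking that the returned path exists (for example with os.path.exists) confirms the dx download step succeeded.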

MONAI in JupyterLab

Using MONAI Core, MONAI Label/3D Slicer (SlicerJupyter) via JupyterLab

Medical Open Network for AI (MONAI) is a framework built for deep learning in healthcare imaging. To use MONAI on the DNAnexus Platform, run JupyterLab with the MONAI_ML feature, which includes:

  • MONAI Core: PyTorch-based framework for deep learning in healthcare imaging.

  • MONAI Label: An intelligent image labeling and learning tool designed to create training datasets and build AI annotation models. It provides a server-client framework that integrates with imaging viewers.

  • 3D Slicer: An open-source software application designed for the visualization, processing, and analysis of medical, biomedical, and other 3D images. In a Jupyter environment, 3D Slicer is accessible through the SlicerJupyter kernel and acts as a client for the MONAI Label server.

MONAI Core, MONAI Label, and 3D Slicer (SlicerJupyter) come pre-installed with the JupyterLab MONAI_ML feature option.

Tip: For the full list of pre-installed packages, see the JupyterLab in-product documentation.

Using MONAI Core

For sample Jupyter notebooks and tutorials, see the official project MONAI tutorials. Technical documentation for MONAI Core is also available.

    Using MONAI Label with 3D Slicer

For examples showing how to use 3D Slicer with MONAI Label, see the following sample Jupyter notebooks in the DNAnexus OpenBio repository:

  • Radiology Auto-Segmentation and Training with MONAI Label and 3D Slicer (NIfTI/CT): Demonstrates auto-segmentation and model training on NIfTI CT spleen data using MONAI Label and 3D Slicer (SlicerJupyter).

  • Whole Brain Segmentation with MONAI Label and 3D Slicer (DICOM/MRI): Shows auto-segmentation and model training on DICOM MRI brain data, including DICOM-to-NIfTI conversion and interactive annotation in 3D Slicer.

For general examples and tutorials on using MONAI Label and 3D Slicer (SlicerJupyter), explore the following GitHub repositories:

  • MONAI Label tutorials: Project-MONAI/tutorials/monailabel

  • 3D Slicer (SlicerJupyter) example notebooks: Slicer/SlicerNotebooks


    Running JupyterLab

    Learn to launch a JupyterLab session on the DNAnexus Platform, via the JupyterLab app.


Running from the UI

1. In the main menu, navigate to Tools > JupyterLab. If you have used JupyterLab before, the page shows your previous sessions across different projects.

2. Click New JupyterLab.

3. Configure your JupyterLab session:

  • Specify the session name and select an instance type.

  • Choose the project where JupyterLab should run.

  • Set the session duration after which the environment automatically shuts down.

  • Optionally, provide a snapshot file to load a previously saved environment.

  • If needed, enable Spark Cluster and set the number of nodes.

  • Select a feature option based on your analysis needs:

    • PYTHON_R (default): Python3 and R kernel and interpreter

    • ML: Python3 with machine learning packages (TensorFlow, PyTorch, CNTK) and image processing (Nipype), but no R

    • IMAGE_PROCESSING: Python3 with image processing packages (Nipype, FreeSurfer, FSL), but no R. FreeSurfer requires a license. GUI viewers such as fsleyes and freeview cannot be launched in the headless environment.

    • STATA: Stata requires a license to run.

    • MONAI_ML: Extends the ML feature with specialized medical imaging frameworks, such as MONAI Core, MONAI Label, and 3D Slicer.

  • Review the pricing estimate (if you have billing access) based on your selected duration and instance type.

4. Click Start Environment to launch your session. The JupyterLab session shows an "Initializing" state while the worker spins up and the server starts.

5. Open your JupyterLab environment by clicking the session name link once the state changes to "Ready". You can also access it directly via https://job-xxxx.dnanexus.cloud, where job-xxxx is your job ID.

Note: Snapshots created using older versions of JupyterLab are incompatible with the current version. If you need to use an older JupyterLab snapshot, see the environment snapshot guidelines.

For a detailed list of libraries included in each feature option, see the in-product documentation.

Running JupyterLab from the CLI

You can start the JupyterLab environment directly from the command line by running the app:

dx run app-dxjupyterlab

Once the app starts, you can check whether the JupyterLab server is ready to serve connections, which is indicated by the job's property httpsAppState being set to running. Once it is running, you can open your browser and go to https://job-xxxx.dnanexus.cloud, where job-xxxx is the ID of the job running the app.

To run the Spark version of the app, use the command:

dx run app-dxjupyterlab_spark_cluster

You can check the optional input parameters for the JupyterLab App and the JupyterLab Spark Cluster App on the DNAnexus Platform (platform login required). From the CLI, you can learn more about dx run with the following command:

dx run -h APP_NAME

where APP_NAME is either app-dxjupyterlab or app-dxjupyterlab_spark_cluster.

Next Steps

See the Quickstart and References pages for more details on how to use JupyterLab.
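The httpsAppState readiness check described in the CLI instructions can be scripted as a simple poll. The sketch below takes the state-fetching step as a caller-supplied callable; on the platform you would read the job property (for example via dx describe), and the function name and defaults here are illustrative:

```python
import time

def wait_until_running(fetch_state, timeout_s=600, interval_s=5, sleep=time.sleep):
    """Poll fetch_state() until it returns 'running' or the timeout elapses.

    fetch_state is a caller-supplied callable returning the job's
    httpsAppState property (in practice, read via `dx describe`).
    """
    waited = 0
    while True:
        if fetch_state() == "running":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Once the function returns True, the session URL https://job-xxxx.dnanexus.cloud should accept connections.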

    Using JupyterLab

    Use Jupyter notebooks on the DNAnexus Platform to craft sophisticated custom analyses in your preferred coding language.


Jupyter notebooks are a popular way to track the work performed in computational experiments, much as a lab notebook tracks the work done in a wet lab setting. JupyterLab is an application provided by DNAnexus that allows you to perform computational experiments on the DNAnexus Platform using Jupyter notebooks. It allows users on the DNAnexus Platform to collaborate on notebooks and extends standard JupyterLab with options for directly accessing a DNAnexus project from the JupyterLab environment.

    Why Use JupyterLab?

    JupyterLab supports the use of Bioconductor and Bioconda, useful tools for bioinformatics analysis.

JupyterLab is a versatile application that can be used to:

  • Collaborate on exploratory analysis of data

  • Reproduce and fork work performed in computational analyses

  • Visualize and gain insights into data generated from biological experiments

  • Create figures and tables for scientific publications

  • Build and test algorithms directly in the cloud before creating DNAnexus apps and workflows

  • Test and train machine/deep learning models

  • Interactively run commands on a terminal

The DNAnexus Platform offers two different JupyterLab apps. One is a general-purpose JupyterLab application. The other is Spark cluster-enabled and can be used within the DNAnexus Apollo framework.

    Both apps instantiate a JupyterLab server that allows for data analyses to be interactively performed in Jupyter notebooks on a DNAnexus worker.

The JupyterLab Spark Cluster app contains all the features found in the general-purpose JupyterLab app, along with access to a fully managed, on-demand Spark cluster for big data processing and translational informatics.

    Version Information

JupyterLab 2.2 is the default version on the DNAnexus Platform. Previous versions remain available.

    Creating Interactive Notebooks

A step-by-step guide on how to start with JupyterLab and create and edit Jupyter notebooks can be found in the Quickstart.

    JupyterLab Environments

    Creating a JupyterLab session requires the use of two different environments:

    1. The DNAnexus project (accessible through the web platform and the CLI).

    2. The worker execution environment.

    The Project on the DNAnexus Platform

You have direct access to the project in which the application is run from the JupyterLab session. The project file browser (which lists folders, notebooks, and other files in the project) can be accessed from the DNAnexus tab in the left sidebar or from the terminal.

    The project is selected when the JupyterLab app is started and cannot be subsequently changed.

    The DNAnexus file browser shows:

    • Up to 1,000 of your most recently modified files and folders

    • All Jupyter notebooks in the project

    • Databases (Spark-enabled app only, limited to 1,000 most recent)

    The file list refreshes automatically every 10 seconds. You can also refresh manually by clicking the circular arrow icon in the top right corner.

Tip: Need to see more files? Use dx ls in the terminal or access them programmatically through the API.

    Worker Execution Environment

When you open and run a notebook from the DNAnexus file browser, the kernel corresponding to this notebook is started in the worker execution environment and is used to execute the notebook code. DNAnexus notebooks have [DX] prepended to the notebook name in the tab bar of all opened notebooks.

    The execution environment file browser is accessible from the left sidebar (notice the folder icon at the top) or from the terminal:

To create Jupyter notebooks in the worker execution environment, use the File menu. These notebooks are stored on the local file system of the JupyterLab execution environment and must be persisted to a DNAnexus project. More information about saving appears in the following section.

    Local vs. DNAnexus Notebooks

    DNAnexus Notebooks

You can create, edit, and save notebooks directly in the DNAnexus project, as well as duplicate, delete, or download them to your local machine. Notebooks stored in your DNAnexus project, accessed through the DNAnexus tab on the left sidebar, are fetched from and saved to the project on the DNAnexus Platform without being stored in the JupyterLab execution environment file system. These are referred to as "DNAnexus notebooks", and they persist in the DNAnexus project after the JupyterLab instance is terminated.

DNAnexus notebooks can be recognized by the [DX] prepended to their names in the tab bar of all opened notebooks.

    DNAnexus notebooks can be created by clicking the DNAnexus Notebook icon from the Launcher tab that appears on starting the JupyterLab session, or by clicking the DNAnexus tab on the upper menu and then clicking "New notebook". The Launcher tab can also be opened by clicking File and then selecting "New Launcher" from the upper menu.

    Local Notebooks

    To create a new local notebook, click the File tab in the upper menu and then select "New" and then "Notebook". These non-DNAnexus notebooks can be saved to DNAnexus by dragging and dropping them in the DNAnexus file viewer in the left panel.

    Accessing Data

In JupyterLab, you can access input data located in a DNAnexus project in one of the following ways:

  • For reading the input file multiple times or for reading a large fraction of the file in random order: download the file from the DNAnexus project to the execution environment with dx download and access the downloaded local file from the Jupyter notebook.

  • For scanning the content of the input file once or for reading only a small fraction of the file's content: the project in which the app is running is mounted read-only at the /mnt/project folder. Reading the content of files in /mnt/project dynamically fetches the content from the DNAnexus Platform, so this method uses minimal disk space in the JupyterLab execution environment but uses more API calls to fetch the content.

    Uploading Data

    Files, such as local notebooks, can be persisted in the DNAnexus project by using one of these options:

  • dx upload in the bash console.

    • Drag the file onto the DNAnexus tab that is in the column of icons on the left side of the screen. This uploads the file into the selected DNAnexus folder.

    Exporting DNAnexus Notebooks

Exporting DNAnexus notebooks to formats such as HTML or PDF is not supported directly. However, you can dx download the DNAnexus notebook from the current DNAnexus project to the JupyterLab environment and export the downloaded notebook. To export a local notebook to certain formats, the following commands might be needed beforehand: apt-get update && apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic.
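If all you need out of a downloaded notebook is its code, the notebook's JSON can also be processed directly. The sketch below is not a replacement for nbconvert; it just illustrates the standard notebook JSON layout by pulling the code cells into one plain script:

```python
import json

def notebook_to_script(nb_json: str) -> str:
    """Concatenate the code cells of a notebook's JSON text into one script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", [])
            # Cell source may be a list of lines or a single string
            chunks.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(chunks)
```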

    Non-Interactive Execution of Notebooks

A command can be executed in the JupyterLab worker execution environment without starting an interactive JupyterLab server. To do that, provide the cmd input, and any additional input files using the in input file array, to the JupyterLab app. The provided command runs in the /opt/notebooks directory, and any output files generated in this directory are uploaded to the project and returned in the out output field of the job that ran the JupyterLab app.

The cmd input makes it possible to use the papermill command that is pre-installed in the JupyterLab environment to execute notebooks non-interactively. For example, to execute all the cells in a notebook and produce an output notebook:

my_cmd="papermill notebook.ipynb output_notebook.ipynb"
dx run dxjupyterlab -icmd="$my_cmd" -iin="notebook.ipynb"

Here notebook.ipynb is the input notebook to the papermill command, which is passed to the dxjupyterlab app using the in input, and output_notebook.ipynb is the name of the output notebook, which contains the result of executing the input notebook and is uploaded to the project at the end of the app's execution. See the JupyterLab app page for details.

    Collaboration in the Cloud

    Collaborators can work on notebooks in the project without the risk of overwriting each other's changes.

    Notebook Locking During Editing

    If a user has opened a specific notebook in a JupyterLab session, other users cannot open or edit the notebook. This is indicated by a red lock icon next to the notebook's name.

    It is still possible to create a duplicate to see what changes are being saved in the locked notebook or to continue work on this "forked" version of the notebook. To copy a notebook, right-click on its name and select Duplicate. After a few seconds, a notebook with the same name and a "copy" suffix should appear in the project.

    Once the editing user closes the notebook, the lock is released and anybody else with access to the project can open it.

    Notebook Versioning

    Whenever a notebook is saved in the project, it is uploaded to the platform as a new file that replaces the previous version, that is, the file of the same name. The previous version is moved to the .Notebook_archive folder with a timestamp suffix added to its name and its ID is saved in the properties of the new file. Saving notebooks directly in the project ensures that your analyses are not lost when the JupyterLab session ends.

Warning: If a notebook saved to the project exceeds 20 MB, it may no longer open in JupyterLab and could trigger a "JSON Parse Error." To recover your code, open an earlier version from the .Notebook_archive folder, or download the notebook to your local machine and clear the notebook's outputs using a local Jupyter editor before re-uploading.
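Bulky cell outputs are usually what pushes a notebook past the size limit, and since a notebook is plain JSON, stripping them can be scripted. A minimal sketch assuming the standard notebook JSON layout (a local Jupyter editor or jupyter nbconvert --clear-output does the same; this just shows the mechanics):

```python
import json

def clear_outputs(nb_json: str) -> str:
    """Return notebook JSON text with outputs and execution counts stripped."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb)
```

Running this over the downloaded notebook and re-uploading the result typically shrinks the file well below the 20 MB threshold.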

    Session Timeout Control

    JupyterLab sessions begin with a set duration and shut down automatically at the end of this period. The timeout clock appears in the footer on the right side and can be adjusted using the Update duration button. The session terminates at the set timestamp even if the JupyterLab webpage is closed. Job lengths have an upper limit of 30 days, which cannot be extended.

    A session can be terminated immediately from the top menu (DNAnexus > End Session).

    Environment Snapshots

    It is possible to save the current session environment and data and reload it later by creating a session snapshot (DNAnexus > Create Snapshot).

A JupyterLab session is run in a Docker container, and a session snapshot file is a tarball generated by saving the Docker container state (with the docker commit and docker save commands). Any installed packages and files created locally are saved to the snapshot file, except for the directories /home/dnanexus and /mnt/, which are not included. This file is then uploaded to the project folder .Notebook_snapshots and can be passed as input the next time the app is started.

Note: If many large files are created locally, the resulting snapshots take a long time to save and load. In general, it is recommended not to snapshot more than 1 GB of locally saved data/packages, and to rely on downloading larger files as needed.
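A quick way to check against that guideline is to measure how much locally created data a snapshot would capture before creating it. A sketch (the directory you measure and the threshold you apply are your choice):

```python
import os

def tree_size_bytes(root: str) -> int:
    """Total size in bytes of regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

# Example: warn before snapshotting more than ~1 GB of local data
# if tree_size_bytes("/opt/notebooks") > 1_000_000_000:
#     print("Consider dx download-ing large files on demand instead of snapshotting them")
```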

    Snapshots Created in Older Versions of JupyterLab

    Snapshots created with JupyterLab versions older than 2.0.0 (released mid-2023) are not compatible with the current version. These previous snapshots contain tool versions that may conflict with the newer environment, potentially causing problems.

    Using Previous Snapshots in the Current Version of JupyterLab

To use a snapshot from a previous version in the current version of JupyterLab, recreate the snapshot as follows:

1. Create a tarball incorporating all the necessary data files and packages.

2. Save the tarball in a project.

3. Launch the current version of JupyterLab.

4. Import and unpack the tarball file.

5. Create a snapshot of the JupyterLab environment.

    Accessing an Older Snapshot in an Older Version of JupyterLab

If you don't want to recreate your older snapshot, you can run an older version of JupyterLab and access the snapshot there.

    Viewing Other Files in the Project

Viewing other file types from your project, such as CSV, JSON, PDF files, images, or scripts, is convenient because JupyterLab displays them accordingly. For example, JSON files are collapsible and navigable, and CSV files are presented in tabular format.

    However, editing and saving any open files from the project other than IPython notebooks results in an error.

Warning: Files larger than 20 MB display only their metadata in the JupyterLab file viewer. To access the full contents of a large file, download it using dx download or the dxpy client library, or use the DNAnexus file browser on the platform.

    Permissions in the JupyterLab Session

    The JupyterLab apps are run in a specific project, defined at start time, and this project cannot be subsequently changed. The job associated with the JupyterLab app has CONTRIBUTE access to the project in which it is run.

    When running the JupyterLab app, it is possible to view, but not update, other projects the user has access to. This enhanced scope is required to be able to read databases which may be located in different projects and cannot be cloned.

    Running Jobs in the JupyterLab Session

    Use dx run to start new jobs from within a notebook or the terminal. If the billTo for the project where your JupyterLab session runs does not have a license for detached executions, any started jobs run as subjobs of your interactive JupyterLab session. In this situation, the --project argument for dx run is ignored, and the job uses the JupyterLab session's workspace instead of the specified project. If a subjob fails or terminates on the DNAnexus Platform, the entire job tree—including your interactive JupyterLab session—terminates as well.

Warning: Jobs are limited to a runtime of 30 days. The system automatically terminates jobs running longer than 30 days.

    Environment and Feature Options

    The JupyterLab app is a Docker-based app that runs the JupyterLab server instance in a Docker container. The server runs on port 443. Because it's an HTTPS app, you can bring up the JupyterLab environment in a web browser using the URL https://job-xxxx.dnanexus.cloud, where job-xxxx is the ID of the job that runs the app. Only the user who launched the JupyterLab job has access to the JupyterLab environment. Other users see a "403 Permission Forbidden" message under the JupyterLab session's URL.
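Because the URL is derived mechanically from the job ID, it can be constructed in code. A trivial sketch (the helper name is ours; only the URL scheme comes from the platform):

```python
def jupyterlab_url(job_id: str) -> str:
    """Build the session URL for a job, per the https://job-xxxx.dnanexus.cloud scheme."""
    if not job_id.startswith("job-"):
        raise ValueError("expected a DNAnexus job ID like 'job-xxxx'")
    return f"https://{job_id}.dnanexus.cloud"
```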

Note: On the DNAnexus Platform, the JupyterLab server runs in a Python 3.9.16 environment, in a container running Ubuntu 20.04 as its operating system.

    Feature Options

    When launching JupyterLab, the feature options available are PYTHON_R, ML, IMAGE_PROCESSING, STATA, and MONAI_ML.

    • PYTHON_R (default option): Loads the environment with Python3 and R kernel and interpreter.

    • ML: Loads the environment with Python3 and machine learning packages, such as TensorFlow, PyTorch, CNTK as well as the image processing package Nipype, but it does not contain R.

• IMAGE_PROCESSING: Loads the environment with Python3 and image processing packages such as Nipype, FreeSurfer, and FSL, but it does not contain R. The FreeSurfer package requires a license to run. Details about license creation and usage can be found in the FreeSurfer in JupyterLab guide.

Note: The JupyterLab environment is headless and command-line only. While FSL and FreeSurfer command-line tools are available for batch processing, GUI viewers such as fsleyes and freeview cannot be launched. To visualize results interactively, download the output files to your local machine.

• STATA: Requires a license to run. See Stata in JupyterLab for more information about running Stata in JupyterLab.

• MONAI_ML: Loads the environment with Python3 and extends the ML feature. This feature is ideal for medical imaging research involving machine learning model development and testing. It includes medical imaging frameworks designed for AI-powered analysis. For details, see MONAI in JupyterLab.

Tip: For the full list of pre-installed packages, see the JupyterLab in-product documentation. This list includes details on feature-specific packages available when running the PYTHON_R, ML, IMAGE_PROCESSING, STATA, and MONAI_ML features.

    Installing Additional Packages

Additional packages can be installed during a JupyterLab session. By creating a Docker container snapshot, users can then start subsequent sessions with the new packages pre-installed by providing the snapshot as input.

    JupyterLab Documentation

For more information on the features and benefits of JupyterLab, see the official JupyterLab documentation.

    Next Steps

• Create your first notebooks by following the instructions in the Quickstart guide.

• See the JupyterLab Reference guide for tips and info on the most useful JupyterLab features.

    Running Older Versions of JupyterLab

    Learn how to run an older version of JupyterLab via the user interface or command-line interface.

    Why Run an Older Version of JupyterLab?

    The primary reason to run an older version of JupyterLab is to access snapshots containing tools that cannot be run in the current version's execution environment.

    Launching an Older Version via the User Interface (UI)
    1. From the main Platform menu, select Tools, then Tools Library.

    2. Find and select, from the list of tools, either JupyterLab with Python, R, Stata, ML, Image Processing or JupyterLab with Spark Cluster.

    3. From the tool detail page, click on the Versions tab.

    4. Select the version you'd like to run. Click the Run button.

    Launching an Older Version via the Command-Line Interface (CLI)

    1. Select the project in which you want to run JupyterLab.

    2. Launch the version of JupyterLab you want to run, substituting the version number for x.y.z in the following commands:

      • For JupyterLab without the Spark cluster capability, run the command dx run app-dxjupyterlab/x.y.z --priority high.

      • For JupyterLab with the Spark cluster capability, run the command dx run app-dxjupyterlab_spark_cluster/x.y.z --priority high

Note: Running JupyterLab at "high" priority is not required. However, doing so ensures that your interactive session is not interrupted by spot instance termination.

    Accessing JupyterLab

    After launching JupyterLab, access the JupyterLab environment using your browser. To do this:

    1. Get the job ID for the job created when you launched JupyterLab. See the Monitoring Executions page for details on how to get the job ID, via either the UI or the CLI.

    2. Open the URL https://job-xxxx.dnanexus.cloud, substituting the job's ID for job-xxxx.

    3. You may see an error message "502 Bad Gateway" if JupyterLab is not yet accessible. If this happens, wait a few minutes, then try again.


    Spark Cluster-Enabled JupyterLab

    Learn to use the JupyterLab Spark Cluster app.


    Overview

The JupyterLab Spark Cluster app is a Spark-enabled DNAnexus app that runs a fully managed standalone Spark/Hadoop cluster. This cluster enables distributed data processing and analysis directly within the JupyterLab application. In the JupyterLab session, you can interactively create and query DNAnexus databases or run any analysis on the Spark cluster.

    Besides the core JupyterLab features, the Spark cluster-enabled JupyterLab app allows you to:

    • Explore the available databases and get an overview of the available datasets

    • Perform analyses and visualizations directly on data available in the database

    • Create databases

Check the general Using JupyterLab documentation for an introduction to JupyterLab.

    Running and Using JupyterLab Spark Cluster

    The Quickstart page contains information on how to start a JupyterLab session and create notebooks on the DNAnexus Platform. The JupyterLab Reference page has additional useful tips for using the environment.

    hashtag
    Instantiating the Spark Context

    Having created your notebook in the project, you can populate its first cells as shown below. It is good practice to instantiate the Spark context at the beginning of your analysis.

    hashtag
    Basic Operations on DNAnexus Databases

    hashtag
    Exploring Existing Databases

    To view the databases to which you have access in your current region and project context, run a cell with the following code:

    A sample output should be:

    You can inspect one of the returned databases by running:

    which should return an output similar to:

    To find a database in your current region that may be in a different project than your current context, run the following code:

    A sample output should be:

    To inspect one of the databases listed in the output, use the unique database name. If you use only the database name, results are limited to the current project. For example:

    hashtag
    Creating Databases

    Here's an example of how to create and populate your own database:

    You can separate each line of code into different cells to view the outputs iteratively.

    hashtag
    Using Hail

    Hail is an open-source, scalable framework for exploring and analyzing genomic data. It is designed to run primarily on a Spark cluster and is included in the JupyterLab Spark Cluster app, available when the app is run with the feature input set to HAIL (the default).

    Initialize the Hail context at the start of your work, passing the previously started Spark context sc as an argument:

    We recommend continuing your exploration of Hail with the GWAS using Hail tutorialarrow-up-right. For example:

    hashtag
    Using VEP with Hail

    To use VEParrow-up-right (Ensembl Variant Effect Predictor) with Hail, select the HAIL-VEP feature when launching Spark Cluster-Enabled JupyterLab.

    VEP can predict the functional effects of genomic variants on genes, transcripts, protein sequences, and regulatory regions. This includes the LOFTEE pluginarrow-up-right, which is activated when using the configuration file below.

    Add the following JSON configuration file to your DNAnexus project:

    Once the vep-GRCh38.json file is in your project, you can annotate the Hail MatrixTable (mt) using the following command:

    hashtag
    Behind the Scenes

    The Spark cluster app is a Docker-based app which runs the JupyterLab server in a Docker container.

    The JupyterLab instance runs on port 443. Because it is an HTTPS app, you can bring up the JupyterLab environment in a web browser using the URL https://job-xxxx.dnanexus.cloud, where job-xxxx is the ID of the job that runs the app.

    The script run at the instantiation of the container, /opt/start_jupyterlab.sh, configures the environment and starts the server needed to connect to the Spark cluster. The environment variables needed are set by sourcing two scripts, bind-mounted into the container:

    The default user in the container is root.

    The option --network host is used when starting Docker to remove the network isolation between the host and the Docker container, which allows the container to bind to the host's network and access Spark's master port directly.

    hashtag
    Accessing AWS S3 Buckets

    S3 buckets can have private or public access. Either the s3 or the s3a scheme can be used to access S3 buckets. The s3 scheme is automatically aliased to s3a in all Apollo Spark Clusters.

    hashtag
    Public Bucket Access

    To access public S3 buckets, you do not need S3 credentials. The example below shows how to access the public 1000 Genomes bucket in a JupyterLab notebook:

    When the above is run in a notebook, the following is displayed:

    hashtag
    Private Bucket Access

    To access private buckets, see the example code below. The example assumes that a Spark session has been created as shown above.

    Stata in JupyterLab

    Using Stata via JupyterLab, working with project files, and creating datasets with Spark.

    Stataarrow-up-right is a powerful statistics package for data science. On the DNAnexus Platform, Stata commands and functionality can be accessed in Jupyter notebooks via stata_kernelarrow-up-right.

    hashtag
    Before You Begin

    hashtag
    Project License Requirement

    On the DNAnexus Platform, use the JupyterLab app to create and edit Jupyter notebooks.

    circle-info

    You can only run this app within a project that's billed to an account with a license that allows the use of both JupyterLab and HTTPS apps. Contact DNAnexus Salesenvelope if you need to upgrade your license.

    JupyterLab is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment. A license is required to access JupyterLab on the DNAnexus Platform. Contact DNAnexus Salesenvelope for more information.

    hashtag
    Stata License Requirement

    To use Stata on the DNAnexus Platform, you need a valid Stata license. Before launching Stata in a project, you must save your license details in a plain text file with the extension .json, per the instructions below, then upload this file to the project's root directory. You only need to do this once per project.

    hashtag
    Creating a Stata License Details File

    Start by creating the file in a text editor, including all the fields shown here, where <user> is your DNAnexus username and <organization> is the org of which you're a member:

    Save the file according to the following format, where <username> is your DNAnexus username: .stataSettings.user-<username>.json

    circle-info

    Some operating systems may not support the naming of files with a "." as the first character. If this is the case, you can rename the .json file after uploading it to your project by hovering over the name of your file and clicking the pencil icon that appears.
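For convenience, the file can also be generated from Python. Below is a minimal sketch that assumes a placeholder username; replace the bracketed values with your actual Stata license details:

```python
# Sketch: write a Stata license details file named
# .stataSettings.user-<username>.json (all values are placeholders).
import json

username = "jsmith"  # your DNAnexus username (placeholder)
license_details = {
    "license": {
        "serialNumber": "<Serial number from Stata>",
        "code": "<Code from Stata>",
        "authorization": "<Authorization from Stata>",
        "user": "<Registered user line 1>",
        "organization": "<Registered user line 2>",
    }
}

filename = f".stataSettings.user-{username}.json"
with open(filename, "w") as f:
    json.dump(license_details, f, indent=2)

print(filename)  # → .stataSettings.user-jsmith.json
```

The resulting file can then be uploaded to the project's root directory as described below.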

    hashtag
    Uploading the Stata License Details File

    Open the project in which you want to use Stata. Upload the Stata license details file to the project's root directory by going to your project's Manage tab, clicking on the Add button on the upper right, and then selecting the Upload data option.

    hashtag
    Secure Indirect Format Option for Shared Projects

    When working in a shared project, you can take an additional step to avoid exposing your Stata license details to project collaborators.

    Create a private project. Then create and save a Stata license details file in that project's root directory, per the instructions above.

    Within the shared project, create and save a Stata license details file in this format, where project-yyyy is the name of the private project, and file-xxxx is the license details file ID, in that private project:

    circle-info

    When working on the Research Analysis Platform, you can only create a private credentials project from the Research Analysis Platform Projects pagearrow-up-right.

    hashtag
    Launching JupyterLab

    1. Open the project in which you want to use Stata. From within the project's Manage tab, click the Start Analysis button.

    2. Select the app JupyterLab with Python, R, Stata, ML, Image Processing.

    3. Click the Run Selected button. If you haven't run this app before, you are prompted to install it. Next, you are taken to the Run Analysis screen.

    circle-info

    The app can take some time to load and start running.

    Once the analysis starts, you see the notification "Running" appear under the name of the app.

    hashtag
    Opening JupyterLab

    Click the Monitor tab heading. This opens a list of running and past jobs. Jobs are shown in reverse chronological order, with the most recently launched at the top. The topmost row should show the job you launched. To open the job and enter the JupyterLab interface, click on the URL shown under Worker URL.

    circle-info

    If you do not see the worker URL, click on the name of the job in the Monitor page.

    hashtag
    Using Stata Within JupyterLab

    Within the JupyterLab interface, open the DNAnexus tab shown at the left edge of the screen.

    Open a new Stata notebook by clicking the Stata tile in the Notebooks section.

    hashtag
    Working with Project Files

    You can download DNAnexus data files to the JupyterLab container from a Stata notebook with:

    Data files in the current project can also be accessed through the /mnt/project folder from a Stata notebook. To load a DTA file:

    To load a CSV file:

    To write a DTA file to the JupyterLab container:

    To write a CSV file to the JupyterLab container:

    To upload a data file from the JupyterLab container to the project, use the following command in a Stata notebook:

    Alternatively, open a new Launcher tab, open Terminal, and run:

    The /mnt/project directory is read-only, so trying to write to it results in an error.

    hashtag
    Creating a Stata Dataset with Spark

    The JupyterLab Spark Cluster app can be used to query and filter DNAnexus datasets, returning a PySpark DataFrame. A PySpark DataFrame can be converted to a pandas DataFrame with:

    The pandas DataFrame can be exported to CSV or Stata DTA files in the JupyterLab container with:
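The conversion and export steps can be sketched end to end without Spark. This is a minimal example assuming pandas is available and using illustrative column names; in a Spark session you would start from spark_df.toPandas() instead of constructing the DataFrame by hand:

```python
# Sketch of the conversion step without Spark: a small pandas DataFrame
# stands in for spark_df.toPandas() (column names are illustrative).
import pandas as pd

pandas_df = pd.DataFrame({"sample_id": ["s1", "s2"], "value": [1.0, 2.0]})
pandas_df.to_stata("data_out.dta")             # Stata DTA format
pandas_df.to_csv("data_out.csv", index=False)  # CSV format
```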

    To upload a data file from the JupyterLab container to the DNAnexus project in the JupyterLab Spark Cluster app, use:

    Once saved to the project, data files can be used in a JupyterLab Stata session using the instructions above.

    JupyterLab Reference

    This page is a reference for most useful operations and features in the JupyterLab environment.

    circle-info

    JupyterLab is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment.

    A license is required to access JupyterLab on the DNAnexus Platform. Contact DNAnexus Salesenvelope for more information.

  • On the Run Analysis screen, open the Analysis Inputs tab and click the Stata settings file button.

  • Add your Stata settings file as an input. This is the .json file you created, containing your Stata license details.

  • In the Common section at the bottom of the Analysis Inputs pane, open the Feature dropdown menu and select Stata.

  • Click the Start Analysis button at the top right corner of the screen. This launches the JupyterLab app and takes you to the project's Monitor tab, where you can monitor the app's status as it loads.

    The location of the DNAnexus tab within the JupyterLab interface.
    import pyspark
    sc = pyspark.SparkContext()
    spark = pyspark.sql.SparkSession(sc)
    spark.sql("show databases").show(truncate=False)
    +------------------------------------------------------------+
    |namespace                                                   |
    +------------------------------------------------------------+
    |database_xxxx__brca_pheno                                   |
    |database_yyyy__gwas_vitamind_chr1                           |
    |database_zzzz__meta_data                                    |
    |database_tttt__genomics_180820                              |
    +------------------------------------------------------------+
    db = "database_xxxx__brca_pheno"
    spark.sql(f"SHOW TABLES FROM {db}").show(truncate=False)
    +------------------------------------+-----------+-----------+
    |namespace                           |tableName  |isTemporary|
    +------------------------------------+-----------+-----------+
    |database_xxxx__brca_pheno           |cna        |false      |
    |database_xxxx__brca_pheno           |methylation|false      |
    |database_xxxx__brca_pheno           |mrna       |false      |
    |database_xxxx__brca_pheno           |mutations  |false      |
    |database_xxxx__brca_pheno           |patient    |false      |
    |database_xxxx__brca_pheno           |sample     |false      |
    +------------------------------------+-----------+-----------+
    show databases like "<project_id_pattern>:<database_name_pattern>";
    show databases like "project-*:<database_name>";
    +------------------------------------------------------------+
    |namespace                                                   |
    +------------------------------------------------------------+
    |database_xxxx__brca_pheno                                   |
    |database_yyyy__gwas_vitamind_chr1                           |
    |database_zzzz__meta_data                                    |
    |database_tttt__genomics_180820                              |
    +------------------------------------------------------------+
    db = "database_xxxx__brca_pheno"
    spark.sql(f"SHOW TABLES FROM {db}").show(truncate=False)
    # Create a database
    my_database = "my_database"
    spark.sql("create database " + my_database + " location 'dnax://'")
    spark.sql("create table " + my_database + ".foo (k string, v string) using parquet")
    spark.sql("insert into table " + my_database + ".foo values ('1', '2')")
    spark.sql("select * from " + my_database + ".foo").show()
    import hail as hl
    hl.init(sc=sc)
    # Download example data from 1k genomes project and inspect the matrix table
    hl.utils.get_1kg('data/')
    hl.import_vcf('data/1kg.vcf.bgz').write('data/1kg.mt', overwrite=True)
    mt = hl.read_matrix_table('data/1kg.mt')
    mt.rows().select().show(5)
    vep-GRCh38.json
    {
      "command": [
        "docker",
        "run",
        "-i",
        "-v",
        "/cluster/vep:/root/.vep",
        "dnanexus/dxjupyterlab-vep",
        "./vep",
        "--format",
        "vcf",
        "__OUTPUT_FORMAT_FLAG__",
        "--everything",
        "--allele_number",
        "--no_stats",
        "--cache",
        "--offline",
        "--minimal",
        "--assembly",
        "GRCh38",
        "-o",
        "STDOUT",
        "--check_existing",
        "--dir_cache",
        "/root/.vep/",
        "--dir_plugins",
        "/root/.vep/Plugins/loftee",
        "--fasta",
        "/root/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
        "--plugin",
        "LoF,loftee_path:/root/.vep/Plugins/loftee,human_ancestor_fa:/root/.vep/human_ancestor.fa,conservation_file:/root/.vep/loftee.sql,gerp_bigwig:/root/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw"
      ],
      "env": {
        "PERL5LIB": "/root/.vep/Plugins"
      },
      "vep_json_schema": "Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64,afr_allele:String,afr_maf:Float64,allele_string:String,amr_allele: String,amr_maf:Float64,clin_sig:Array[String],end:Int32,eas_allele:String,eas_maf:Float64,ea_allele:String,ea_maf:Float64,eur_allele:String,eur_maf:Float64,exac_adj_allele:String,exac_adj_maf:Float64,exac_allele:      String,exac_afr_allele:String,exac_afr_maf:Float64,exac_amr_allele:String,exac_amr_maf:Float64,exac_eas_allele:String,exac_eas_maf:Float64,exac_fin_allele:String,exac_fin_maf:Float64,exac_maf:Float64,exac_nfe_allele:  String,exac_nfe_maf:Float64,exac_oth_allele:String,exac_oth_maf:Float64,exac_sas_allele:String,exac_sas_maf:Float64,id:String,minor_allele:String,minor_allele_freq:Float64,phenotype_or_disease:Int32,pubmed:            Array[Int32],sas_allele:String,sas_maf:Float64,somatic:Int32,start:Int32,strand:Int32}],context:String,end:Int32,id:String,input:String,intergenic_consequences:Array[Struct{allele_num:Int32,consequence_terms:          Array[String],impact:String,minimised:Int32,variant_allele:String}],most_severe_consequence:String,motif_feature_consequences:Array[Struct{allele_num:Int32,consequence_terms:Array[String],high_inf_pos:String,impact:   String,minimised:Int32,motif_feature_id:String,motif_name:String,motif_pos:Int32,motif_score_change:Float64,strand:Int32,variant_allele:String}],regulatory_feature_consequences:Array[Struct{allele_num:Int32,biotype:   String,consequence_terms:Array[String],impact:String,minimised:Int32,regulatory_feature_id:String,variant_allele:String}],seq_region_name:String,start:Int32,strand:Int32,transcript_consequences:                        Array[Struct{allele_num:Int32,amino_acids:String,appris:String,biotype:String,canonical:Int32,ccds:String,cdna_start:Int32,cdna_end:Int32,cds_end:Int32,cds_start:Int32,codons:String,consequence_terms:Array[String],    
distance:Int32,domains:Array[Struct{db:String,name:String}],exon:String,gene_id:String,gene_pheno:Int32,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int32,impact:   String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int32,polyphen_prediction:String,polyphen_score:Float64,protein_end:Int32,protein_start:Int32,protein_id:String,             sift_prediction:String,sift_score:Float64,strand:Int32,swissprot:String,transcript_id:String,trembl:String,tsl:Int32,uniparc:String,variant_allele:String}],variant_class:String}"
    }
    # Annotation process relies on "dnanexus/dxjupyterlab-vep" docker container
    # as well as VEP and LoF resources that are pre-installed on every Spark node when
    # HAIL-VEP feature is selected.
    annotated_mt = hl.vep(mt, "file:///mnt/project/vep-GRCh38.json")
    source /home/dnanexus/environment
    source /cluster/dx-cluster.environment
    # read CSV from a public bucket
    df = spark.read.options(delimiter='\t', header='True', inferSchema='True').csv("s3://1000genomes/20131219.populations.tsv")
    df.select(df.columns[:4]).show(10, False)
    # access private data in S3 by first unsetting the default credentials provider
    sc._jsc.hadoopConfiguration().set('fs.s3a.aws.credentials.provider', '')
    
    # replace "redacted" with your keys
    sc._jsc.hadoopConfiguration().set('fs.s3a.access.key', 'redacted')
    sc._jsc.hadoopConfiguration().set('fs.s3a.secret.key', 'redacted')
    df=spark.read.csv("s3a://your_private_bucket/your_path_to_csv")
    df.select(df.columns[:5]).show(10, False)
    {
      "license": {
        "serialNumber": "<Serial number from Stata>",
        "code": "<Code from Stata>",
        "authorization": "<Authorization from Stata>",
        "user": "<Registered user line 1>",
        "organization": "<Registered user line 2>"
      }
    }
    {
      "licenseFile": {
        "$dnanexus_link": {
          "id": "file-xxxx",
          "project": "project-yyyy"
        }
      }
    }
    !dx download project-xxxx:file-yyy
    use /mnt/project/<path>/data_in.dta
    import delimited /mnt/project/<path>/data_in.csv
    save data_out
    export delimited data_out.csv
    !dx upload <file> --destination=<destination>
    dx upload <file> --destination=<destination>
    pandas_df = spark_df.toPandas()
    pandas_df.to_stata("data_out.dta")
    pandas_df.to_csv("data_out.csv")
    %%bash
    dx upload <file>
    hashtag
    Download Files from the Project to the Local Execution Environment

    hashtag
    Bash

    You can download input data from a project using dx download in a notebook cell:

    The %%bash keyword converts the whole cell to a magic cell, which allows you to run bash code in that cell without exiting the Python kernel. See examples of magic commands in the IPython documentationarrow-up-right. The ! prefix achieves the same result:

    Alternatively, the dx command can be executed from the terminal.

    hashtag
    Python

    To download data with Python in the notebook, you can use the download_dxfilearrow-up-right function:

    Check the dxpy helper functionsarrow-up-right for details on how to download files and folders.

    hashtag
    Upload Data from the Session to the Project

    hashtag
    Bash

    Any files from the execution environment can be uploaded to the project using dx upload:

    hashtag
    Python

    To upload data using Python in the notebook, you can use the upload_local_filearrow-up-right function:

    Check the dxpy helper functionsarrow-up-right for details on how to upload files and folders.

    hashtag
    Download and Upload Data to Your Local Machine

    By selecting a notebook or any other file on your computer and dragging it into the DNAnexus project file browser, you can upload the files directly to the project. To download a file, right-click on it and click Download (to local computer).

    You may upload and download data to the local execution environment in a similar way, that is, by dragging and dropping files to the execution file browser or by right-clicking on the files there and clicking Download.

    hashtag
    Use the Terminal

    JupyterLab also provides a terminal, which uses the bash shell by default and lets you execute shell scripts or interact with the platform via the dx toolkit. For example, the following command confirms the current project context:

    Running pwd shows you that the working directory of the execution environment is /opt/notebooks. The JupyterLab server is launched from this directory, which is also the default location of the output files generated in the notebooks.

    To open a terminal window, go to File > New > Terminal or open it from the Launcher (using the "Terminal" box at the bottom). To open a Launcher, select File > New Launcher.

    hashtag
    Install Custom Packages in the Session Environment

    You can install packages with pip, conda, apt-get, and other package managers in the execution environment from the notebook:

    By creating a snapshot, you can start subsequent sessions with these packages pre-installed by providing the snapshot as input.

    hashtag
    Access Public and Private GitHub Repositories from the JupyterLab Terminal

    You can clone public GitHub repositories from the JupyterLab terminal using the git clone command. By placing a private ssh key that's registered with your GitHub account in /root/.ssh/id_rsa, you can also clone private GitHub repositories with git clone and push changes back to GitHub with git push from the JupyterLab terminal.

    Below is a screenshot of a JupyterLab session with a terminal displaying a script that:

    • sets up ssh key to access a private GitHub repository and clones it,

    • clones a public repository,

    • downloads a JSON file from the DNAnexus project,

    • modifies an open-source notebook to convert the JSON file to CSV format,

    • saves the modified notebook to the private GitHub repository,

    • and uploads the results of JSON to CSV conversion back to the DNAnexus project.

    This animation shows the first part of the script in action:
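The key-setup step of such a script can be sketched as follows. The paths and key source are illustrative (in the JupyterLab container the home directory is /root, so the default resolves to /root/.ssh/id_rsa):

```shell
# Sketch: prepare the ssh directory so git can authenticate to GitHub.
# The key itself must come from your own GitHub account.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
# Place your private key here, e.g. after fetching it with `dx download`:
# cp my_github_key "$SSH_DIR/id_rsa"
touch "$SSH_DIR/id_rsa"
chmod 600 "$SSH_DIR/id_rsa"
# Then clone and push as usual:
# git clone git@github.com:<org>/<private-repo>.git
echo "key file ready at $SSH_DIR/id_rsa"
```

ssh refuses keys with permissive modes, hence the chmod 700/600 steps before the first clone.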

    hashtag
    Run Notebooks Non-Interactively

    A command can be run in the JupyterLab Docker container without starting an interactive JupyterLab server. To do that, provide the cmd input and additional input files using the in input file array. The command runs in the directory where the JupyterLab server is started and notebooks are run, that is, /opt/notebooks/. Any output files generated in this directory are uploaded to the project and returned in the out output.

    The cmd input makes it possible to use the papermill tool, pre-installed in the JupyterLab environment, which executes notebooks non-interactively. For example, to execute all the cells in a notebook and produce an output notebook:

    where notebook.ipynb is the input notebook to papermill, which needs to be passed in the in input, and output_notebook.ipynb is the name of the output notebook, which stores the result of the cells' execution. The output is uploaded to the project at the end of the app execution.

    If the snapshot parameter is specified, execution of cmd takes place in the specified Docker container. The duration argument is ignored when running the app with cmd. To limit the runtime, run the app from the command line with the --extra-args flag, for example, dx run dxjupyterlab --extra-args '{"timeoutPolicyByExecutable": {"app-xxxx": {"*": {"hours": 1}}}}'.

    If cmd is not specified, the in parameter is ignored and the app's output is an empty array.

    hashtag
    Use newer NVIDIA GPU-accelerated software

    If you are trying to use newer NVIDIA GPU-accelerated software, you may find that the kernel-mode NVIDIA GPU driver (nvidia.ko) installed outside the JupyterLab environment does not support the newer CUDA version required by your application. In that case, you can install NVIDIA Forward Compatibilityarrow-up-right packages to use the newer CUDA version by following the steps below in a JupyterLab terminal.

    hashtag
    Session Inactivity

    After 15 to 30 minutes of inactivity in the JupyterLab browser tabs, the system logs you out automatically from the JupyterLab session and displays a "Server Connection Error" message. To re-enter the JupyterLab session, reload the JupyterLab webpage and log into the platform to be redirected to the JupyterLab session.


    Exploring and Querying Datasets

    circle-info

    A license is required to access Spark functionality on the DNAnexus Platform. Contact DNAnexus Salesenvelope for more information.

    hashtag
    Extracting Data From a Dataset With Spark

    %%bash
    dx download input_data/reads.fastq
    ! dx download input_data/reads.fastq
    import dxpy
    dxpy.download_dxfile(dxid='file-xxxx',
                         filename='unique_name.txt')
    %%bash
    dx upload Readme.ipynb
    import dxpy
    dxpy.upload_local_file('variants.vcf')
    $ dx pwd
    MyProject:/
    %%bash
    pip install torch
    pip install torchvision
    conda install -c conda-forge opencv
    my_cmd="papermill notebook.ipynb output_notebook.ipynb"
    dx run dxjupyterlab -icmd="$my_cmd" -iin="notebook.ipynb"
    # nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    // Let's upgrade CUDA 11.4 to 12.5
    # apt-get update
    # apt-get -y install cuda-toolkit-12-5 cuda-compat-12-5
    # echo /usr/local/cuda/compat > /etc/ld.so.conf.d/NVIDIA-compat.conf
    # ldconfig
    # nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 12.5     |
    |-------------------------------+----------------------+----------------------+
    // CUDA 12.5 is now usable from terminal and notebooks
    The dx commands extract_dataset and extract_assay germline let you either retrieve the data dictionary of a dataset or extract the underlying data described by that dictionary. You can also use these commands to get dataset metadata, such as the names and titles of entities and fields, or to list all relevant assays in a dataset.

    Often, you can retrieve data without using Spark, so extra compute resources are not required (see the example OpenBio notebooksarrow-up-right). However, if you need more compute power, such as when working with complex data models, large datasets, or large volumes of extracted data, you can use a private Spark resource and scale it as needed to avoid timeouts.

    If you use the --sql flag, the command returns a SQL statement (as a string) that you can use in a standalone Spark-enabled application, such as JupyterLab.

    hashtag
    Initiating a Spark Session

    The most common way to use Spark on the DNAnexus Platform is via a Spark-enabled JupyterLab notebook.

    After creating a Jupyter notebook within a project, enter the commands shown below to start a Spark session.

    Python:

    R:

    hashtag
    Executing SQL Queries

    Once you've initiated a Spark session, you can run SQL queries on the database within your notebook, with the results written to a Spark DataFrame:

    Python:

    R:

    hashtag
    Query to Extract Data From Database Using extract_dataset

    Python:

    Where dataset is the record-id or the path to the dataset or cohort, for example, "record-abc123" or "/mydirectory/mydataset.dataset."

    R:

    Where dataset is the record-id or the path to the dataset or cohort.

    hashtag
    Query to Filter and Extract Data from Database Using extract_assay germline

    Python:

    R:

    In the examples above, dataset is the record-id or the path to the dataset or cohort, for example, record-abc123 or /mydirectory/mydataset.dataset. allele_filter.json is a file containing a JSON object with filters for the --retrieve-allele command. For more information, refer to the notebooks in the DNAnexus OpenBio dx-toolkit examplesarrow-up-right.

    hashtag
    Run SQL Query to Extract Data

    Python:

    R:

    hashtag
    Best Practices

    • When querying large datasets, such as those containing genomic data, ensure that your Spark cluster is scaled up appropriately, with enough nodes to parallelize across.

    • Initialize your Spark session only once per Jupyter session. If you initialize the Spark session in multiple notebooks in the same Jupyter job (for example, running notebook 1 and then notebook 2, or running a notebook from start to finish multiple times), the Spark session becomes corrupted and you need to restart the affected notebook's kernel. As a best practice, shut down the kernel of any notebook you are not using before running a second notebook in the same session.

    • If you want to use a database outside your project's scope, you must refer to it using its unique database name (typically this looks something like database_fjf3y28066y5jxj2b0gz4g85__metabric_data) as opposed to the database name (metabric_data in this case).
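For illustration, referencing a table through a unique database name might be assembled like this. The database and table names below are hypothetical, and spark.sql would run the query inside a Spark-enabled notebook:

```python
# Hypothetical unique database name (the <uid>__<name> form) vs. the
# short name; only the unique form resolves outside the current project.
unique_db = "database_fjf3y28066y5jxj2b0gz4g85__metabric_data"
query = f"SELECT * FROM {unique_db}.patient LIMIT 10"
# df = spark.sql(query)  # inside a Spark-enabled JupyterLab session
print(query)
```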

    import pyspark
    sc = pyspark.SparkContext()
    spark = pyspark.sql.SparkSession(sc)
    install.packages("sparklyr")
    library(sparklyr)
    port <- Sys.getenv("SPARK_MASTER_PORT")
    master <- paste("spark://master:", port, sep = '')
    sc = spark_connect(master)
    retrieve_sql = 'select .... from .... '
    df = spark.sql(retrieve_sql)
    library(DBI)
    retrieve_sql <- 'select .... from .... '
    df = dbGetQuery(sc, retrieve_sql)
    import subprocess
    cmd = ["dx", "extract_dataset", dataset, "--fields", "entity1.field1, entity1.field2, entity2.field4", "--sql", "-o", "extracted_data.sql"]
    subprocess.check_call(cmd)
    cmd <- paste("dx extract_dataset", dataset, " --fields", "entity1.field1, entity1.field2, entity2.field4", "--sql", "-o extracted_data.sql")
    system(cmd)
    import subprocess
    cmd = ["dx", "extract_assay", "germline", dataset, "--retrieve-allele", "allele_filter.json", "--sql", "-o", "extract_allele.sql"]
    subprocess.check_call(cmd)
    cmd <- paste("dx extract_assay", "germline", dataset, "--retrieve-allele", "allele_filter.json", "--sql", "-o extracted_allele.sql")
    system(cmd)
    with open("extracted_data.sql", "r") as file:
        retrieve_sql = file.read().strip()
    df = spark.sql(retrieve_sql.strip(";"))
    install.packages("tidyverse")
    library(readr)
    retrieve_sql <-read_file("extracted_data.sql")
    retrieve_sql <- gsub("[;\n]", "", retrieve_sql)
    df <- dbGetQuery(sc, retrieve_sql)