
Server configuration

This document describes the different options for configuring a datalab instance. It is primarily intended for those who are deploying datalab on persistent hardware, but may also be useful for developers. Deployment instructions can be found under "Deploying datalab and server administration".

datalab has 3 main configuration sources.

  1. The Python ServerConfig (described below) that allows for datalab-specific configuration, such as database connection info, filestore locations and remote filesystem configuration.
    • This can be provided via a JSON or YAML config file at the location provided by the PYDATALAB_CONFIG_FILE environment variable, or as environment variables themselves, prefixed with PYDATALAB_. The available configuration variables and their default values are listed below.
  2. Additional server configuration provided as environment variables, such as secrets like the Flask server's SECRET_KEY, API keys for external services (e.g., SMTP MAIL_PASSWORD) and OAuth client credentials (for logging in via GitHub, ORCID, etc.). These can be provided as either:
    • environment variables with the appropriate FLASK_ or PYDATALAB_ prefix (for options that are also in the config model from option 1.)
    • a .env file in the directory from which pydatalab is launched (NB: here, the FLASK_ prefix is not required, but any options present in the pydatalab config must still have the PYDATALAB_ prefix).
  3. Web app configuration, such as the URL of the relevant datalab API and branding (logo URLs, external homepage links).
    • These are typically provided as a .env file in the directory from which the webapp is built/served.
    • The main options include (a full list can be found in the docker-compose.yml file):
      • VUE_APP_API_URL: the URL of the datalab API, which is used by the web app to communicate with the server.
      • VUE_APP_LOGO_URL: the URL of an image to use as the logo header in the web app.
      • VUE_APP_HOMEPAGE_URL: a URL to provide as a link from the web app header.
      • VUE_APP_EDITABLE_INVENTORY: whether the inventory can be edited by non-admin users in the web app.
      • VUE_APP_WEBSITE_TITLE: the title of the web app, which is displayed in the browser tab and header.
      • VUE_APP_QR_CODE_RESOLVER_URL: the URL of a service that can resolve QR codes to datalab entries, which is used by the web app to display QR codes for entries (see datalab-org/datalab-purl for more information).
      • VUE_APP_AUTOMATICALLY_GENERATE_ID_DEFAULT: whether to automatically generate IDs for new entries in the web app by default, or require a checkbox to be ticked at item creation.
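As a rough illustration of the first configuration source, the sketch below writes a small JSON config file and points PYDATALAB_CONFIG_FILE at it. The file name, the temporary directory and the chosen values are assumptions for illustration only; the option names are taken from the ServerConfig reference at the end of this page.

```python
import json
import os
import tempfile
from pathlib import Path

# Illustrative values only; the option names follow the ServerConfig reference.
config = {
    "IDENTIFIER_PREFIX": "grey",
    "MONGO_URI": "mongodb://localhost:27017/datalabvue",
    "REMOTE_FILESYSTEMS": [],
}

# Hypothetical location for the config file (a temporary directory here).
config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(config, indent=2))

# The location of the config file is read from this environment variable.
os.environ["PYDATALAB_CONFIG_FILE"] = str(config_path)

# Alternatively, a single option can be given as a PYDATALAB_-prefixed variable.
os.environ["PYDATALAB_IDENTIFIER_PREFIX"] = "grey"
```

In a real deployment these variables would normally be exported in the shell or service definition that launches the server, rather than set from Python.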

Note

The possible ways to set configuration options can be inconsistent with each other, e.g., values required to be None in Python should be set to null in the JSON config file and as .env values. Similarly, boolean values may be set to true or false in the JSON config file, but can be set to {1, yes, true} or {0, no, false} in a .env file.
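The None/null and boolean mapping for JSON config files can be checked with Python's standard json module. This is only a small sketch; the two option names are taken from the reference below and the values are arbitrary.

```python
import json

# In Python, "unset" is None and booleans are True/False ...
python_values = {"GITHUB_ORG_ALLOW_LIST": None, "TESTING": False}

# ... which serialise to null and false in a JSON config file.
as_json = json.dumps(python_values)
print(as_json)  # {"GITHUB_ORG_ALLOW_LIST": null, "TESTING": false}
```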

Mandatory settings

There is only one mandatory setting when creating a deployment. This is the IDENTIFIER_PREFIX, which shall be prepended to every entry's refcode to enable global uniqueness of datalab entries. For now, the prefixes themselves are not checked for uniqueness across the fledgling datalab federation, but they will be in the future.

This prefix should be set to something relatively short (max 10 chars.) that describes your group or your deployment, e.g., the PI's surname, project ID or department.

This can be set either via a config file, or as an environment variable (e.g., PYDATALAB_IDENTIFIER_PREFIX='grey'). Be warned: if the prefix changes between server launches, all entries will have to be migrated manually to the desired prefix, or maintained at the old prefix.
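Before committing to a prefix, it can be worth sanity-checking it against the recommendation above. The helpers below are hypothetical and are not the server's own validator; the 10-character limit comes from the guidance in this section, and the `prefix:suffix` form follows the `grey:AAAAAA` example in the reference below.

```python
MAX_PREFIX_LENGTH = 10  # from the recommendation above, not from the server code


def check_prefix(prefix: str) -> str:
    """Return the prefix unchanged if it is non-empty and at most 10 characters."""
    if not prefix:
        raise ValueError("IDENTIFIER_PREFIX must be set")
    if len(prefix) > MAX_PREFIX_LENGTH:
        raise ValueError(
            f"prefix {prefix!r} is longer than {MAX_PREFIX_LENGTH} characters"
        )
    return prefix


def example_refcode(prefix: str, suffix: str = "AAAAAA") -> str:
    """Format an illustrative refcode in the prefix:suffix form."""
    return f"{check_prefix(prefix)}:{suffix}"


print(example_refcode("grey"))  # grey:AAAAAA
```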

User registration & authentication

datalab has two supported user registration/authentication mechanisms:

  1. OAuth2 via an OAuth provider such as GitHub, ORCID, Google or Microsoft.
  2. Magic links sent to email addresses.

Each is configured differently. If left unconfigured, then the corresponding registration mechanism will not be available to the user.

To support sign-in via email magic links, you must currently provide additional configuration for an authorized SMTP server. The SMTP server must be configured via the EMAIL_AUTH_SMTP_SETTINGS setting, with expected values MAIL_SERVER, MAIL_USERNAME, MAIL_DEFAULT_SENDER, MAIL_PORT and MAIL_USE_TLS, following the environment variables described in the Flask-Mail documentation. The MAIL_PASSWORD setting should then be provided via a .env file.

Third-party options with a free tier include resend, which can be configured to use an appropriate API key, after verifying ownership of the MAIL_DEFAULT_SENDER address via DNS (see resend for an example configuration).

The email addresses that are allowed to sign up can be restricted by domain/subdomain using the EMAIL_DOMAIN_ALLOW_LIST setting.
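A config fragment for email sign-in might look like the sketch below, which builds the settings in Python and prints them as JSON. The key names follow the SMTPSettings reference at the end of this page; the host, username, sender address and allowed domain are placeholders, not recommended values.

```python
import json

smtp_config = {
    "EMAIL_AUTH_SMTP_SETTINGS": {
        "MAIL_SERVER": "smtp.example.org",  # placeholder host
        "MAIL_PORT": 587,
        "MAIL_USERNAME": "datalab",  # placeholder username
        "MAIL_USE_TLS": True,
        "MAIL_DEFAULT_SENDER": "datalab@example.org",  # placeholder address
    },
    # Only addresses at these domains may register (placeholder domain).
    "EMAIL_DOMAIN_ALLOW_LIST": ["example.org"],
}

# MAIL_PASSWORD is deliberately absent: it belongs in the .env file instead.
print(json.dumps(smtp_config, indent=2))
```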

OAuth2

OAuth2 allows users to log in using their existing accounts with third-party providers, without the need for a password. Generally, you register an application with the provider, which gives you a client ID and secret that you can use to configure the OAuth2 settings in datalab.

Each provider then has bespoke settings to control the permissions that accounts registered via the external provider will have.

For developers, if you are testing locally without HTTPS, you must also set OAUTHLIB_INSECURE_TRANSPORT=1 and OAUTHLIB_RELAX_TOKEN_SCOPE=1 in your environment to circumvent security requirements; this should not be used in production.

Note

A common source of confusion is when a user registers an account via one OAuth provider, but then tries to log in via an email magic link or a different OAuth provider (or vice versa). These accounts will not be associated with each other, so the user will end up with multiple accounts that they have to log in to separately. Admins can merge the accounts manually, or simply delete the duplicates and ask the user to log in via the appropriate method before trying to connect the other external account.

GitHub OAuth2

For GitHub, you must register a GitHub OAuth application for your instance, providing the client ID and secret in the .env for the API, using the variable names GITHUB_OAUTH_CLIENT_ID and GITHUB_OAUTH_CLIENT_SECRET. These should be provided in a .env file local to your app and not added to your main config file.

The authorization callback URL in the GitHub app settings should be set to <YOUR_API_URL>/login/github/authorized. A user's first login may direct them to this page rather than the web app, depending on their browser. The user will then simply have to navigate back to the URL of the web app, where they should find themselves to be logged in.

Then, you can configure GITHUB_ORG_ALLOW_LIST with a list of string IDs of GitHub organizations that users must be a public member of to register an account. If this value is set to None, then any GitHub account will be able to register, and if it is set to an empty list, then no accounts will be able to register. You can find the relevant organization IDs using the GitHub API, for example at https://api.github.com/orgs/<org_name>.
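The organization ID lookup can be scripted. The sketch below uses only the Python standard library and assumes the public GitHub API response for an organization contains a numeric `id` field, which is converted to a string for use in GITHUB_ORG_ALLOW_LIST. The function names and the User-Agent value are hypothetical.

```python
import json
import urllib.request


def org_id_from_payload(payload: dict) -> str:
    """Extract the organization ID, as a string, from an /orgs/<org_name> response."""
    return str(payload["id"])


def fetch_org_id(org_name: str) -> str:
    """Query the public GitHub API for an organization's ID (needs network access)."""
    request = urllib.request.Request(
        f"https://api.github.com/orgs/{org_name}",
        headers={"User-Agent": "datalab-config-example"},  # arbitrary identifier
    )
    with urllib.request.urlopen(request) as response:
        return org_id_from_payload(json.load(response))
```

Calling `fetch_org_id("datalab-org")`, for instance, would return the ID string to place in the allow list; unauthenticated requests to the GitHub API are rate limited.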

Google OAuth2

For Google, you must register a Google OAuth 2.0 Client in the Google Cloud Console for your instance. You need to provide the client ID and secret in the .env file for the API using the variable names GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET.

The Authorized redirect URI in the Google Cloud settings must be set to <YOUR_API_URL>/login/google/authorized.

You can restrict registration to specific email domains using the EMAIL_DOMAIN_ALLOW_LIST setting. If this is set to None or null, any user with a Google account can register.

Microsoft OAuth2

For Microsoft (Azure AD), you must register an Azure App Registration for your instance, providing the client ID and secret in the .env for the API, using the variable names MICROSOFT_OAUTH_CLIENT_ID and MICROSOFT_OAUTH_CLIENT_SECRET. These should be provided in a .env file local to your app and not added to your main config file.

The redirect URI in the Azure app settings should be set to <YOUR_API_URL>/login/microsoft/authorized. When creating the credentials in the Azure Portal, ensure you navigate to "Certificates & secrets" and copy the Value of the client secret, as the "Secret ID" cannot be used for authentication.

By default, any Microsoft account (work, school, or personal) can be used to register if the application is configured as "Multi-tenant" in Azure.
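The callback (redirect) URLs for GitHub, Google and Microsoft follow the same pattern, so they can be generated from the API URL. The helper below is a hypothetical convenience, not part of pydatalab, and it covers only the providers whose callback paths are stated on this page.

```python
# Providers whose callback paths are stated in this document.
PROVIDERS = ("github", "google", "microsoft")


def callback_url(api_url: str, provider: str) -> str:
    """Build <YOUR_API_URL>/login/<provider>/authorized for a known provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"no documented callback path for provider {provider!r}")
    # Strip any trailing slash so the result has exactly one separator.
    return f"{api_url.rstrip('/')}/login/{provider}/authorized"


for name in PROVIDERS:
    print(callback_url("https://api.example.org", name))  # placeholder API URL
```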

ORCID OAuth2

For ORCID integration, each datalab instance must currently register for the ORCID developer program and request new credentials for their public API. These credentials can then be provided via the ORCID_OAUTH_CLIENT_ID and ORCID_OAUTH_CLIENT_SECRET environment variables, in the same way as the GitHub settings above.

Remote filesystems

This package allows you to attach files from remote filesystems to samples and other entries. These filesystems can be configured in the config file with the REMOTE_FILESYSTEMS option. In practice, these options should be set in a centralised deployment.

Currently, there are two mechanisms for accessing remote files:

  1. You can mount the filesystem locally and provide the path in your datalab config file. For example, for Cambridge Chemistry users, you will have to (connect to the ChemNet VPN and) mount the Grey Group backup servers on your local machine, then define these folders in your config.
  2. Access over SSH: alternatively, you can set up passwordless SSH access to a machine (e.g., using citadel as a proxy jump), and paths on that remote machine can be configured as separate filesystems. The filesystem metadata will be synced periodically, and any files attached in datalab will be downloaded and stored locally on the pydatalab server (on each access, the local copy is refreshed if it is more than 1 hour old).
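The two mechanisms above correspond to entries with and without a hostname. The sketch below builds an illustrative REMOTE_FILESYSTEMS value and prints it as JSON; the field names come from the RemoteFilesystem reference below, while the filesystem names, host and paths are placeholders.

```python
import json

remote_filesystems = {
    "REMOTE_FILESYSTEMS": [
        # Mechanism 1: already mounted locally, so hostname is None (null in JSON).
        {"name": "local-backup", "hostname": None, "path": "/mnt/backup"},
        # Mechanism 2: a path on a machine reached over passwordless SSH.
        {"name": "remote-data", "hostname": "ssh.example.org", "path": "/data"},
    ]
}

print(json.dumps(remote_filesystems, indent=2))
```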

Customisation and branding

Deployments can customise the look and feel of their datalab instance by providing files under a public/custom/ directory in the web app. This directory can be a symlink to a folder in your deployment repository, or mounted as a volume in Docker. It should be made available at webapp/public/custom/ relative to the datalab source tree.

The following customisations are supported:

CSS overrides (public/custom/override.css)

A CSS file that is loaded globally before the app styles, allowing you to override any default styling. Common uses include setting a custom font, accent colours, or other branding tweaks. If this file does not exist, it is silently ignored.

For example, to set a custom font globally:

@font-face {
  font-family: "My Custom Font";
  src: url("fonts/MyCustomFont-Regular.woff2") format("woff2");
  font-weight: 400;
}

:root {
  --custom-font-family: "My Custom Font";
}

#app {
  font-family: var(--custom-font-family), sans-serif;
}

Custom fonts (public/custom/fonts/)

Place font files (.woff2, .ttf, etc.) in this directory and reference them from override.css using relative paths (e.g., url("fonts/MyFont.woff2")).

Custom logos (public/custom/logos/)

Place logo images in this directory. They can be referenced from custom components or CSS using absolute paths (e.g., /custom/logos/mylogo.png).

The main instance logo can also be customised via the VUE_APP_LOGO_URL environment variable, which accepts a path relative to the public/ directory.

Custom about page (public/custom/components/CustomAbout.vue)

Deployments can provide a custom Vue component that will be displayed in a collapsible panel on the About page. Place a CustomAbout.vue file in public/custom/components/ and it will automatically replace the default empty skeleton at build time (via webpack's NormalModuleReplacementPlugin).

The component can contain any valid Vue template, script and scoped styles. No special configuration or flags are needed — if the file exists, it will be used.

Directory structure

A typical deployment customisation directory looks like:

public/custom/
├── override.css
├── fonts/
│   ├── MyFont-Regular.woff2
│   └── MyFont-Bold.woff2
├── logos/
│   └── mylogo.png
└── components/
    └── CustomAbout.vue

Config API Reference

pydatalab.config.ServerConfig

Bases: BaseSettings

A model that provides settings for deploying the API.

Attributes:

Name Type Description
APP_URL str | None
ASYNC_BLOCK_TYPES list[str]
AUTO_ACTIVATE_ACCOUNTS bool
BACKUP_STRATEGIES dict[str, BackupStrategy] | None
BEHIND_REVERSE_PROXY bool
DEBUG bool
DEPLOYMENT_METADATA DeploymentMetadata | None
EMAIL_AUTH_SMTP_SETTINGS SMTPSettings | None
EMAIL_AUTO_ACTIVATE_ACCOUNTS bool
EMAIL_DOMAIN_ALLOW_LIST list[str] | None
FILE_DIRECTORY str | Path
GITHUB_ORG_ALLOW_LIST list[str] | None
IDENTIFIER_PREFIX str
LOG_FILE str | Path | None
MAX_BATCH_CREATE_SIZE int
MAX_CONTENT_LENGTH int
MONGO_URI str
REFCODE_GENERATOR type[RefCodeFactory]
REMOTE_CACHE_MAX_AGE int
REMOTE_CACHE_MIN_AGE int
REMOTE_FILESYSTEMS list[RemoteFilesystem]
ROOT_PATH str
SECRET_KEY str
SESSION_LIFETIME int
TESTING bool

APP_URL

APP_URL: str | None = Field(None, description='The canonical URL for any UI associated with this instance; will be used for redirects on user login/registration.')

ASYNC_BLOCK_TYPES

ASYNC_BLOCK_TYPES: list[str] = Field([], description="A list of block type slugs (e.g. ['cycle', 'xrd']) that should be processed asynchronously via the task queue. Defaults to no blocks.")

AUTO_ACTIVATE_ACCOUNTS

AUTO_ACTIVATE_ACCOUNTS: bool = Field(False, description='Whether to automatically activate accounts created via any registration method.')

BACKUP_STRATEGIES

BACKUP_STRATEGIES: dict[str, BackupStrategy] | None = Field({'daily-snapshots': BackupStrategy(hostname=None, location='/tmp/datalab-backups/daily-snapshots/', frequency='5 4 * * *', retention=7), 'weekly-snapshots': BackupStrategy(hostname=None, location='/tmp/datalab-backups/weekly-snapshots/', frequency='5 3 * * 1', retention=5), 'quarterly-snapshots': BackupStrategy(hostname=None, location='/tmp/datalab-backups/quarterly-snapshots/', frequency='5 2 1 1,4,7,10 *', retention=4)}, description='The desired backup configuration.')

BEHIND_REVERSE_PROXY

BEHIND_REVERSE_PROXY: bool = Field(False, description='Whether the Flask app is being deployed behind a reverse proxy. If `True`, the reverse proxy middleware described in the [Flask docs](https://flask.palletsprojects.com/en/2.2.x/deploying/proxy_fix/) will be attached to the app.')

DEBUG

DEBUG: bool = Field(True, description='Whether to enable debug-level logging in the server.')

DEPLOYMENT_METADATA

DEPLOYMENT_METADATA: DeploymentMetadata | None = Field(None, description='A dictionary containing metadata to serve at `/info`.')

EMAIL_AUTH_SMTP_SETTINGS

EMAIL_AUTH_SMTP_SETTINGS: SMTPSettings | None = Field(None, description='A dictionary containing SMTP settings for sending emails for account registration.')

EMAIL_AUTO_ACTIVATE_ACCOUNTS

EMAIL_AUTO_ACTIVATE_ACCOUNTS: bool = Field(False, description='Whether to automatically activate accounts created via email registration.')

EMAIL_DOMAIN_ALLOW_LIST

EMAIL_DOMAIN_ALLOW_LIST: list[str] | None = Field([], description='A list of domains for which users will be able to register accounts if they have a matching verified email address, which still need to be verified by an admin. Setting the value to `None` will allow any email addresses at any domain to register *and activate* an account, otherwise the default `[]` will not allow any email addresses registration.')

FILE_DIRECTORY

FILE_DIRECTORY: str | Path = Field(resolve(), description='The path under which to place stored files uploaded to the server.')

GITHUB_ORG_ALLOW_LIST

GITHUB_ORG_ALLOW_LIST: list[str] | None = Field([], description='A list of GitHub organization IDs (available from `https://api.github.com/orgs/<org_name>`, and are immutable) or organisation names (which can change, so be warned), that the membership of which will be required to register a new datalab account. Setting the value to `None` will allow any GitHub user to register an account.')

IDENTIFIER_PREFIX

IDENTIFIER_PREFIX: str = Field(None, description="The prefix to use for identifiers in this deployment, e.g., 'grey' in `grey:AAAAAA`")

LOG_FILE

LOG_FILE: str | Path | None = Field(None, description='The path to the log file to use for the server and all associated processes (e.g., invoke tasks)')

MAX_BATCH_CREATE_SIZE

MAX_BATCH_CREATE_SIZE: int = Field(10000, description='Maximum number of items that can be created in a single batch operation.')

MAX_CONTENT_LENGTH

MAX_CONTENT_LENGTH: int = Field(10 * 1000 ** 3, description='Direct mapping to the equivalent Flask setting. In practice, limits the file size that can be uploaded.\nDefaults to 10 GB to avoid filling the tmp directory of a server.\n\nWarning: this value will overwrite any other values passed to `FLASK_MAX_CONTENT_LENGTH` but is included here to clarify\nits importance when deploying a datalab instance.')

MONGO_URI

MONGO_URI: str = Field('mongodb://localhost:27017/datalabvue', description='The URI for the underlying MongoDB.')

REFCODE_GENERATOR

REFCODE_GENERATOR: type[RefCodeFactory] = Field(RandomAlphabeticalRefcodeFactory, description='The class to use to generate refcodes.')

REMOTE_CACHE_MAX_AGE

REMOTE_CACHE_MAX_AGE: int = Field(60, description='The maximum age, in minutes, of the remote filesystem cache after which it should be invalidated.')

REMOTE_CACHE_MIN_AGE

REMOTE_CACHE_MIN_AGE: int = Field(1, description='The minimum age, in minutes, of the remote filesystem cache, below which the cache will not be invalidated if an update is manually requested.')

REMOTE_FILESYSTEMS

REMOTE_FILESYSTEMS: list[RemoteFilesystem] = Field([], description='A list of dictionaries describing remote filesystems to be accessible from the server.')

ROOT_PATH

ROOT_PATH: str = Field('/', description='The root path of the application, e.g., `/api` if hosting from a subpath.')

SECRET_KEY

SECRET_KEY: str = Field(None, description='The secret key to use for Flask. This value should be changed and/or loaded from an environment variable for production deployments.')

SESSION_LIFETIME

SESSION_LIFETIME: int = Field(7 * 24, description='The lifetime of each authenticated session, in hours.')

TESTING

TESTING: bool = Field(False, description='Whether to run the server in testing mode, i.e., without user auth.')

deactivate_backup_strategies_during_testing

deactivate_backup_strategies_during_testing(values)

make_missing_log_directory

make_missing_log_directory(v)

Make sure that the log directory exists and is writable.

update

update(mapping)

validate_cache_ages

validate_cache_ages(values)

validate_identifier_prefix

validate_identifier_prefix(v, values)

Make sure that the identifier prefix is set and is valid, raising clear error messages if not.

If in testing mode, then set the prefix to 'test' too. The app startup will test for this value and should also warn aggressively that this is unset.

validate_root_path

validate_root_path(v)

validate_secret_key

validate_secret_key(v, values)

pydatalab.config.RemoteFilesystem

Bases: BaseModel

Configuration for specifying a single remote filesystem accessible from the server.

Attributes:

Name Type Description
hostname str | None
name str
path Path

hostname

hostname: str | None = Field(None, description='The hostname for the filesystem. `None` indicates the filesystem is already mounted locally.')

name

name: str = Field(description='The name of the filesystem to use in the UI.')

path

path: Path = Field(description='The path to the base of the filesystem to include.')

pydatalab.config.SMTPSettings

Bases: BaseModel

Configuration for specifying SMTP settings for sending emails.

Attributes:

Name Type Description
MAIL_DEFAULT_SENDER str
MAIL_PORT int
MAIL_SERVER str
MAIL_USERNAME str
MAIL_USE_TLS bool

MAIL_DEFAULT_SENDER

MAIL_DEFAULT_SENDER: str = Field('', description='The email address to use as the sender for emails.')

MAIL_PORT

MAIL_PORT: int = Field(587, description='The port to use for the SMTP server.')

MAIL_SERVER

MAIL_SERVER: str = Field('127.0.0.1', description='The SMTP server to use for sending emails.')

MAIL_USERNAME

MAIL_USERNAME: str = Field('', description='The username to use for the SMTP server. Will use the externally provided `MAIL_PASSWORD` environment variable for authentication.')

MAIL_USE_TLS

MAIL_USE_TLS: bool = Field(True, description='Whether to use TLS for the SMTP connection.')

pydatalab.config.DeploymentMetadata

Bases: BaseModel

A model for specifying metadata about a datalab deployment.

Attributes:

Name Type Description
homepage AnyUrl | None
issue_tracker AnyUrl | None
maintainer Person | None
source_repository AnyUrl | None

homepage

homepage: AnyUrl | None

issue_tracker

issue_tracker: AnyUrl | None = Field('https://github.com/datalab-org/datalab/issues')

maintainer

maintainer: Person | None

source_repository

source_repository: AnyUrl | None = Field('https://github.com/datalab-org/datalab')

strip_fields_from_person

strip_fields_from_person(v)