Using datasets with qDrive
Each qDrive dataset is uniquely identified by a UUID (Universally Unique Identifier). A dataset carries several metadata fields that help you sort and filter your data; some are user defined (the Data Identifiers) and some are standard (e.g. acquisition time, ranking, …).
A dataset can hold multiple files in which the actual data is stored. While files can be in any format, the DataQruiser app supports rendering the following file types: netCDF4 files, JSON files, code/text files, and images.
Logging in
You can log into qDrive using one of two methods: through a graphical user interface (GUI) or via a Python command.
Through the Graphical User Interface: from the environment where you installed qdrive, run the following command to launch the synchronization GUI.
python -c "import qdrive; qdrive.launch_GUI()"
Via a Python Command: execute the following command in the Python kernel/console/Jupyter notebook from the environment where qdrive is installed:
import qdrive
qdrive.authenticate_with_console()
Note
Both methods establish a persistent login session, so you won't need to log in again unless you log out.
To log out, either run the command
qdrive.logout()
in Python, or click the log-out button (symbol) in the GUI.
Creating new datasets
An empty dataset can be created using the following commands:
from qdrive import dataset
# Minimal example - creates dataset with just a name
ds_1 = dataset.create('my_dataset_name')
# Complete example - creates dataset with extended metadata
ds_2 = dataset.create(
    'Qubit 2 T2*',
    description='T2* measurement of qubit 2, RF power is also applied to qubit 1',
    scope_name='2Q SC processor A14',
    keywords=['calibration'],
    attributes={
        'set_up': 'Fridge B256',
        'sample_id': 'Q7-R3'
    },
    alt_uid='exp20240115-124501'
)
Every dataset is automatically assigned a UUID (Universally Unique Identifier) that can be accessed after creation:
# Print the UUID of the newly created dataset
print(f"Dataset UUID: {ds_2.uuid}")
# Print the alternative identifier (if provided)
if hasattr(ds_2, 'alt_uid') and ds_2.alt_uid:
    print(f"Alternative identifier: {ds_2.alt_uid}")
Parameters Explained:
name (required): A descriptive name for your dataset.
description (optional): A longer description of the dataset contents.
scope_name (optional): The scope where the dataset will be stored.
keywords (optional): A list of tags associated with the dataset (e.g. calibration, tuning, …).
attributes (optional): A dictionary of key-value pairs providing structured metadata.
alt_uid (optional): An alternative identifier that can be used instead of the UUID for accessing the dataset. In most cases this is not needed.
Note
If no scope_name is provided, the dataset will be created in the default scope (see here).
The scope_name parameter accepts any of the following:
A scope name as a string (e.g., 'quantum_project')
A scope object, e.g. as returned by the get_scopes() function in scopes.
A scope UUID (e.g., uuid.UUID('12345678-1234-5678-1234-567812345678'))
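For illustration, each of the following passes the scope in one of the accepted forms (a minimal sketch; the scope name and UUID below are placeholders):
import uuid
from qdrive import dataset
# by scope name (string)
ds_a = dataset.create('my_dataset', scope_name='quantum_project')
# by scope UUID (placeholder value)
ds_b = dataset.create('my_dataset', scope_name=uuid.UUID('12345678-1234-5678-1234-567812345678'))
# by scope object, e.g. one returned by get_scopes() (see the scopes documentation)
# ds_c = dataset.create('my_dataset', scope_name=get_scopes()[0])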
Tip
Use descriptive names and consistent attributes across related datasets to make them easier to find and filter later. The attributes and keywords are fully searchable in both the Python API and the dataQruiser application.
Loading existing datasets
Load an existing dataset using the following commands (you can copy your dataset UUID with one click from the dataQruiser app):
from qdrive import dataset
dsq = dataset('d30eec8071014f99b09cc3dfce60187d') # the dataset UUID or the alternative identifier
print(dsq)
When printed, the contents of the dataset can be inspected. A typical dataset will look like this:
Contents of dataset :: single shot - sensor tuning
==================================================
uuid :: 59c40af3-cef3-49aa-8747-64707a9b080a
Alternative identifier :: 1695914164228175126
Scope :: my_scope
Ranking :: 0
Attributes ::
set-up : my_setup
sample : my_sample
Files ::
name type selected version number (version_id) Maximal version number
---------------- -------------------- -------------------------------------- ------------------------
measurement FileType.HDF5_NETCDF 0 (1719573749579) 0
snapshot FileType.JSON 0 (1719573749579) 0
analysis FileType.HDF5_NETCDF 1 (1719573831702) 1
fit_params FileType.JSON 1 (1719573831822) 1
uuid: the universally unique identifier of the dataset. Each dataset that is created gets its own unique identifier, which can also be retrieved in the dataQruiser app.
Alternative identifier: the identifier assigned to the dataset by your data-acquisition software, e.g. the core-tools uid or qcodes GUID.
Scope: the scope this dataset is part of. A scope is usually the name of a long-standing project, to which data from several users can belong.
Ranking: an integer value indicating how much you like your dataset, useful for filtering in the dataQruiser app.
Attributes: searchable key-value fields, which give further structure to the scope. In this case the set-up and sample are set as attributes.
Files: a dataset can contain several files; these could be files representing a measurement, raw text, a Python file, etc. Each file can have its own versions (see below).
Modifying Dataset Metadata
After creating a dataset, you can modify its metadata properties as needed:
# Load an existing dataset
ds = dataset("59c40af3-cef3-49aa-8747-64707a9b080a")
# Update the description
ds.description = 'T2* measurement of qubit 2 with RF power applied to qubit 1'
# Update keywords - replacing all existing keywords
ds.keywords = ['calibration']
# Add a new keyword without replacing existing ones
ds.keywords.append('tuning')
# Update attributes - replacing all existing attributes
ds.attributes = {
    'set_up': 'Fridge B256',
    'sample_id': 'Q7-R3'
}
# Add or update a single attribute
ds.attributes['sample_id'] = 'Q7-R4'
# Set the ranking (useful for filtering in dataQruiser or the search_datasets function)
ds.ranking = 1 # 1 = like, 0 = neutral, -1 = dislike/hidden
Note
All changes made to the dataset are applied locally first and then synchronized with the server (it can take a few seconds before the changes become available on other devices).
Searching for datasets
The following command can be used to search for datasets:
from qdrive.dataset.search import search_datasets
search_result = search_datasets(search_query='my_coolest_dataset')
# iterate over the search result
for ds in search_result:
    print(ds)
# get the first dataset
ds = search_result.first
The search_datasets function returns a list of datasets that match the search query. It accepts the following arguments (a combined example follows the list):
search_query : A string used to search for datasets. The search query can be a dataset name, a dataset UUID, or a dataset alternative identifier.
attributes : Additional attributes to filter datasets. For example, {'set_up': 'my_setup'} will return only datasets whose set_up attribute equals 'my_setup'. It is also possible to get results from multiple set-ups by using a list of values, for example {'set_up': ['my_setup1', 'my_setup2']}.
ranking : The ranking score to filter datasets. Defaults to 0. Hidden datasets have a ranking of -1. The search queries for datasets with a ranking greater than or equal to the specified value.
start_date : The start date to filter datasets. Only datasets collected after this date will be included in the results (e.g. datetime.datetime(2024, 12, 1)).
end_date : The end date to filter datasets. Only datasets collected before this date will be included in the results.
scopes : A list of scopes to filter datasets. A scope can be represented by its name (str), its UUID (uuid.UUID), or the scope object. More information on scopes can be found here.
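Putting these arguments together, a filtered search could look like this (a sketch; the attribute values, dates, and scope name are placeholders):
import datetime
from qdrive.dataset.search import search_datasets
search_result = search_datasets(
    search_query='T2*',
    attributes={'set_up': ['my_setup1', 'my_setup2']},
    ranking=1,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    scopes=['my_scope'],
)
for ds in search_result:
    print(ds)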
Warning
The search query can take considerable time when iterating over a large number of datasets (e.g., when iterating over every dataset in a whole scope). Datasets are loaded lazily, so only the datasets that are actually accessed are fetched.
Working with Files
The dataset object allows you to manage the files within a dataset. In this section, we will cover:
Adding new files
Inspecting and selecting different file versions
Using file type-specific methods to easily access data within the files
Adding New Files
You can add files to a dataset by assigning them directly. The following example demonstrates how to add files from various sources to a dataset:
from qdrive import dataset
from pathlib import Path
import numpy as np
import xarray as xr
new_dataset = dataset.create('my_dataset_name')
# add a file from a path
new_dataset['my_file'] = Path('C:/location/of/file.extension')
# add a file from a python object (list, dict, numpy array, xarray)
new_dataset['my_array'] = np.linspace(0,10,100)
new_dataset['my_json'] = {'a': 1, 'b':[1,2,3]}
new_dataset['my_xarray'] = xr.Dataset({'a': (['x'], np.arange(10))})
# add the current script file
new_dataset['my_script'] = __file__
Note
Assigning a file with the same key multiple times will create a new version of that file.
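For instance, assigning to the same key twice creates a second version of the file (a minimal sketch continuing the example above):
new_dataset['my_array'] = np.linspace(0, 10, 100)  # creates the file (version 0)
new_dataset['my_array'] = np.linspace(0, 20, 100)  # assigning again creates version 1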
Inspecting and Selecting Different File Versions
Each file within a dataset can have multiple versions, which can be inspected with the following command:
print(my_dataset['my_file'].version_info)
This will display information like the following:
File object information
=======================
Name : my_file
Selected File version : 1720711563075
File versions (3) :
1720711517406 (created on 11/07/2024 17:25:17)
1720711551649 (created on 11/07/2024 17:25:51)
* 1720711563075 (created on 11/07/2024 17:26:03)
By default, accessing a file returns its latest version. The selected version is marked by an asterisk (*).
To access a specific version of a file by its unique version ID, use:
my_dataset['my_file'].version_id(1720711517406)
Tip
You can also access file versions by their position in the version history:
my_dataset['my_file'].version(0) # first version
my_dataset['my_file'].version(1) # second version
my_dataset['my_file'].version(2) # third version
Using file type-specific methods
Numerical data
For storing numerical data, we recommend using .hdf5 files formatted according to the NETCDF4 standard.
While it’s possible to assign an HDF5 file directly, a more user-friendly approach is to work with xarray datasets.
When assigning an xarray dataset, the dataset is automatically converted to an HDF5 file and uploaded to the cloud.
Tip
Xarray is a library that allows for easy labeling of NumPy arrays and supports defining relationships between different dimensions. Assigning data in this way also enables automatic plotting of datasets in the DataQruiser app.
Example:
import xarray as xr
import numpy as np
# Create an xarray dataset with two variables, y1 and y2, and a shared coordinate, x
x_data = np.linspace(0,30,100)
y1_data = np.sin(x_data)
y2_data = np.cos(x_data)
xr_ds = xr.Dataset(
    {
        "y1": (["x"], y1_data, {"units": "mV"}),
        "y2": (["x"], y2_data, {"units": "mV"}),
    },
    coords={
        "x": ("x", x_data, {"units": "s"})
    }
)
# Optionally, to link y1 and y2 for joint plotting, you can use a temporary solution:
xr_ds["y1"].attrs.update({"__join_plot": "y2"})
xr_ds["y2"].attrs.update({"__join_plot": "y1"})
# Note: This is a workaround and will be improved in future releases.
In this example, we create a dataset xr_ds with two variables, y1 and y2, each associated with the coordinate x. The units of each variable and coordinate can be specified for clarity.
HDF5 files saved in NETCDF4 format can be accessed in multiple ways, allowing for flexibility depending on your analysis needs:
from qdrive import dataset
dsq = dataset("my_dataset_uuid_or_alt_id")
xarray_ds = dsq["measurement"].xarray # load as an xarray Dataset (recommended option)
pandas_ds = dsq["measurement"].pandas # load as a pandas DataFrame
hdf5_handle = dsq["measurement"].hdf5 # load as an h5py File
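Once loaded, these behave like ordinary xarray, pandas, and h5py objects (a short sketch continuing the snippet above):
print(xarray_ds)          # inspect variables and coordinates
print(pandas_ds.head())   # show the first rows of the DataFrame
hdf5_handle.close()       # close the h5py file handle when you are done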
Adding metadata
Metadata can either be added directly to a dataset or stored within files. A common approach is to save metadata as JSON files, which can then be accessed as dictionaries in Python.
Example of adding a JSON file to a dataset:
dsq['my_json_data'] = {'item1': "value1", 'item2': "value2"}
# New keys can be added dynamically, though it's recommended to assign all keys at once for performance.
dsq['my_json_data']['item3'] = "value3"
In addition to dictionaries, lists can also be assigned for metadata storage.
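For example, a list is stored as a JSON file in the same way (the key name is a placeholder):
dsq['my_list_data'] = ['step_1', 'step_2', 'step_3']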
Note
If you have metadata directly associated with numerical data, it can be embedded within an xarray dataset. This approach allows the metadata to be visible in the DataQruiser app.
Example of adding metadata to an xarray dataset:
import xarray as xr
import numpy as np
xr_ds = xr.Dataset(
{"y": (["x"], np.linspace(0, 100), {"units": "mV"})},
coords={"x": ("x", np.linspace(0, 100), {"units": "s"})}
)
# Add additional metadata as attributes
xr_ds.attrs['my_fit_params'] = {'param1': 1, 'param2': 2, 'param3': 3}
# Store the xarray dataset in the dataset
dsq['my_xarray'] = xr_ds
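To read the metadata back later, load the file as an xarray dataset and inspect its attributes (a sketch, assuming the attributes round-trip as in the example above):
xr_loaded = dsq['my_xarray'].xarray
print(xr_loaded.attrs['my_fit_params'])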
Adding source files
To automatically upload a Python script to a qDrive dataset each time it's run, add the following to your script:
from qdrive import dataset
from pathlib import Path
# Create a new dataset or load an existing one using its UUID
dsq = dataset.create('my_dataset') # or use: dsq = dataset('uuid')
# Upload the current script
dsq['my_script'] = Path(__file__)
If you’re working in a Jupyter notebook, you can upload the notebook file by running the following in a cell, specifying the notebook’s name or path:
dsq['my_notebook'] = Path('my_notebook_name.ipynb')
Note
Be sure to save your file before uploading to ensure all changes are included.