Using datasets with qDrive
Each qDrive dataset is uniquely identified by a UUID (Universally Unique Identifier). A dataset carries several metadata fields that help you sort and filter your data; some are user defined (the Data Identifiers) and some are standard (e.g. acquisition time, ranking, …).
A dataset can hold multiple files in which the actual data is stored. While files can be in any format, the DataQruiser app supports rendering the following file types: netCDF4 files, JSON files, code/text files, and images.
Logging in
You can log into qDrive using one of two methods: through a graphical user interface (GUI) or via a Python command.
Through the Graphical User Interface: from the environment where you installed qdrive, run the following command to launch the synchronization GUI.
python -c "import qdrive; qdrive.launch_GUI()"
Via a Python Command: execute the following command in the Python kernel/console/Jupyter notebook from the environment where qdrive is installed:
import qdrive
qdrive.authenticate_with_console()
Note
Both methods establish a persistent login session, so you won't need to log in again unless you log out.
To log out, either run the command
qdrive.logout()
in Python, or click the log-out button (symbol) in the GUI.
Creating new datasets
An empty dataset can be created using the following commands:
from qdrive import dataset
# Minimal example - creates dataset with just a name
ds_1 = dataset.create('my_dataset_name')
# Complete example - creates dataset with extended metadata
ds_2 = dataset.create(
    'Qubit 2 T2*',
    description='T2* measurement of qubit 2, RF power is also applied to qubit 1',
    scope_name='2Q SC processor A14',
    keywords=['calibration'],
    attributes={
        'set_up': 'Fridge B256',
        'sample_id': 'Q7-R3'
    },
    alt_uid='exp20240115-124501'
)
Every dataset is automatically assigned a UUID (Universally Unique Identifier) that can be accessed after creation:
# Print the UUID of the newly created dataset
print(f"Dataset UUID: {ds_2.uuid}")
# Print the alternative identifier (if provided)
if hasattr(ds_2, 'alt_uid') and ds_2.alt_uid:
    print(f"Alternative identifier: {ds_2.alt_uid}")
Parameters Explained:
name (required): A descriptive name for your dataset.
description (optional): A longer description of the dataset contents.
scope_name (optional): The scope where the dataset will be stored.
keywords (optional): A list of tags associated with the dataset (e.g. calibration, tuning, …).
attributes (optional): A dictionary of key-value pairs providing structured metadata.
alt_uid (optional): An alternative identifier that can be used instead of the UUID for accessing the dataset. In most cases this is not needed.
Note
If no scope_name is provided, the dataset will be created in the default scope (see here).
The scope_name parameter accepts any of the following:
A scope name as a string (e.g., 'quantum_project')
A scope object, e.g. as returned by the get_scopes() function in scopes.
A scope UUID (e.g., uuid.UUID('12345678-1234-5678-1234-567812345678'))
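For illustration, each of the following passes the scope in one of the accepted forms (a minimal sketch; the scope name and UUID below are placeholders):
import uuid
from qdrive import dataset
# by scope name (string)
ds_a = dataset.create('my_dataset', scope_name='quantum_project')
# by scope UUID (placeholder value)
ds_b = dataset.create('my_dataset', scope_name=uuid.UUID('12345678-1234-5678-1234-567812345678'))
# by scope object, e.g. one returned by get_scopes() (see the scopes documentation)
# ds_c = dataset.create('my_dataset', scope_name=get_scopes()[0])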
Tip
Use descriptive names and consistent attributes across related datasets to make them easier to find and filter later. The attributes and keywords are fully searchable in both the Python API and the dataQruiser application.
Loading existing datasets
Load an existing dataset using the following commands (you can copy your dataset UUID with one click from the dataQruiser app):
from qdrive import dataset
dsq = dataset('d30eec8071014f99b09cc3dfce60187d') # the dataset UUID or the alternative identifier
print(dsq)
When printed, the contents of the dataset can be inspected. A typical dataset will look like this:
Contents of dataset :: single shot - sensor tuning
==================================================
uuid :: 59c40af3-cef3-49aa-8747-64707a9b080a
Alternative identifier :: 1695914164228175126
Scope :: my_scope
Ranking :: 0
Attributes ::
set-up : my_setup
sample : my_sample
Files ::
name type selected version number (version_id) Maximal version number
---------------- -------------------- -------------------------------------- ------------------------
measurement FileType.HDF5_NETCDF 0 (1719573749579) 0
snapshot FileType.JSON 0 (1719573749579) 0
analysis FileType.HDF5_NETCDF 1 (1719573831702) 1
fit_params FileType.JSON 1 (1719573831822) 1
uuid: the universally unique identifier of the dataset. Each dataset that is created gets its own unique identifier, which can also be retrieved in the dataQruiser app.
Alternative identifier: the identifier assigned to the dataset by your data-acquisition software, e.g. the core-tools uid or qcodes GUID.
Scope: the scope this dataset is part of. A scope is usually the name of a long-standing project, to which data from several users can belong.
Ranking: an integer value indicating how much you like your dataset, useful for filtering in the dataQruiser app.
Attributes: searchable key-value fields, which give further structure to the scope. In this case the set-up and sample are set as attributes.
Files: a dataset can contain several files; these could be files representing a measurement, raw text, a Python file, etc. Each file can have its own versions (see below).
Modifying Dataset Metadata
After creating a dataset, you can modify its metadata properties as needed:
# Load an existing dataset
ds = dataset("59c40af3-cef3-49aa-8747-64707a9b080a")
# Update the description
ds.description = 'T2* measurement of qubit 2 with RF power applied to qubit 1'
# Update keywords - replacing all existing keywords
ds.keywords = ['calibration']
# Add a new keyword without replacing existing ones
ds.keywords.append('tuning')
# Update attributes - replacing all existing attributes
ds.attributes = {
    'set_up': 'Fridge B256',
    'sample_id': 'Q7-R3'
}
# Add or update a single attribute
ds.attributes['sample_id'] = 'Q7-R4'
# Set the ranking (useful for filtering in dataQruiser or the search_datasets function)
ds.ranking = 1 # 1 = like, 0 = neutral, -1 = dislike/hidden
Note
All changes made to the dataset are applied locally first and then synchronized with the server (it can take a few seconds before the changes become available on other devices).
Searching for datasets
The following command can be used to search for datasets:
from qdrive.dataset.search import search_datasets
search_result = search_datasets(search_query='my_coolest_dataset')
# iterate over the search result
for ds in search_result:
    print(ds)
# get the first dataset
ds = search_result.first
The search_datasets function returns a list of datasets that match the search query. It accepts the following arguments (a combined example follows the list):
search_query : A string used to search for datasets. The search query can be a dataset name, a dataset UUID, or a dataset alternative identifier.
attributes : Additional attributes to filter datasets. For example, {'set_up': 'my_setup'} will return only datasets whose set_up attribute equals 'my_setup'. It is also possible to get results from multiple set-ups by using a list of values, for example {'set_up': ['my_setup1', 'my_setup2']}.
ranking : The ranking score to filter datasets. Defaults to 0. Hidden datasets have a ranking of -1. The search queries for datasets with a ranking greater than or equal to the specified value.
start_date : The start date to filter datasets. Only datasets collected after this date will be included in the results (e.g. datetime.datetime(2024, 12, 1)).
end_date : The end date to filter datasets. Only datasets collected before this date will be included in the results.
scopes : A list of scopes to filter datasets. A scope can be represented by its name (str), its UUID (uuid.UUID), or the scope object. More information on scopes can be found here.
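Putting these arguments together, a filtered search could look like this (a sketch; the attribute values, dates, and scope name are placeholders):
import datetime
from qdrive.dataset.search import search_datasets
search_result = search_datasets(
    search_query='T2*',
    attributes={'set_up': ['my_setup1', 'my_setup2']},
    ranking=1,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 12, 31),
    scopes=['my_scope'],
)
for ds in search_result:
    print(ds)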
Warning
The search query can take considerable time when iterating over a large number of datasets (e.g., when iterating over every dataset in a whole scope). Datasets are loaded lazily, so only the datasets that are actually accessed are fetched.
Working with Files
The dataset object allows you to manage the files within a dataset. In this section, we will cover:
Adding new files
Inspecting and selecting different file versions
Using file type-specific methods to easily access data within the files
Adding New Files
You can add files to a dataset by assigning them directly. The following example demonstrates how to add files from various sources to a dataset:
from qdrive import dataset
from pathlib import Path
import numpy as np
import xarray as xr
new_dataset = dataset.create('my_dataset_name')
# add a file from a path
new_dataset['my_file'] = Path('C:/location/of/file.extension')
# add a file from a python object (list, dict, numpy array, xarray)
new_dataset['my_array'] = np.linspace(0,10,100)
new_dataset['my_json'] = {'a': 1, 'b':[1,2,3]}
new_dataset['my_xarray'] = xr.Dataset({'a': (['x'], np.arange(10))})
# add the current script file
new_dataset['my_script'] = __file__
Note
Assigning a file with the same key multiple times will create a new version of that file.
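For instance, assigning to the same key twice creates a second version of the file (a minimal sketch continuing the example above):
new_dataset['my_array'] = np.linspace(0, 10, 100)  # creates the file (version 0)
new_dataset['my_array'] = np.linspace(0, 20, 100)  # assigning again creates version 1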
Inspecting and Selecting Different File Versions
Each file within a dataset can have multiple versions, which can be inspected with the following command:
print(my_dataset['my_file'].version_info)
This will display information like the following:
File object information
=======================
Name : my_file
Selected File version : 1720711563075
File versions (3) :
1720711517406 (created on 11/07/2024 17:25:17)
1720711551649 (created on 11/07/2024 17:25:51)
* 1720711563075 (created on 11/07/2024 17:26:03)
By default, accessing a file returns its latest version. The selected version is marked by an asterisk (*).
To access a specific version of a file by its unique version ID, use:
my_dataset['my_file'].version_id(1720711517406)
Tip
You can also access file versions by their position in the version history:
my_dataset['my_file'].version(0) # first version
my_dataset['my_file'].version(1) # second version
my_dataset['my_file'].version(2) # third version
Using file type-specific methods
Numerical data
For storing numerical data, we recommend using .hdf5 files formatted according to the NETCDF4 standard.
While it’s possible to assign an HDF5 file directly, a more user-friendly approach is to work with xarray datasets.
When assigning an xarray dataset, the dataset is automatically converted to an HDF5 file and uploaded to the cloud.
Tip
Xarray is a library that allows for easy labeling of NumPy arrays and supports defining relationships between different dimensions. Assigning data in this way also enables automatic plotting of datasets in the DataQruiser app.
Example:
import xarray as xr
import numpy as np
# Create an xarray dataset with two variables, y1 and y2, and a shared coordinate, x
x_data = np.linspace(0,30,100)
y1_data = np.sin(x_data)
y2_data = np.cos(x_data)
xr_ds = xr.Dataset(
    {
        "y1": (["x"], y1_data, {"units": "mV"}),
        "y2": (["x"], y2_data, {"units": "mV"}),
    },
    coords={
        "x": ("x", x_data, {"units": "s"})
    }
)
# Optionally, to link y1 and y2 for joint plotting, you can use a temporary solution:
xr_ds["y1"].attrs.update({"__join_plot": "y2"})
xr_ds["y2"].attrs.update({"__join_plot": "y1"})
# Note: This is a workaround and will be improved in future releases.
In this example, we create a dataset xr_ds with two variables, y1 and y2, each associated with the coordinate x. The units of each variable and coordinate can be specified for clarity.
HDF5 files saved in NETCDF4 format can be accessed in multiple ways, allowing for flexibility depending on your analysis needs:
from qdrive import dataset
dsq = dataset("my_dataset_uuid_or_alt_id")
xarray_ds = dsq["measurement"].xarray # load as an xarray Dataset (recommended option)
pandas_ds = dsq["measurement"].pandas # load as a pandas DataFrame
hdf5_handle = dsq["measurement"].hdf5 # load as an h5py File
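Once loaded, these behave like ordinary xarray, pandas, and h5py objects (a short sketch continuing the snippet above):
print(xarray_ds)          # inspect variables and coordinates
print(pandas_ds.head())   # show the first rows of the DataFrame
hdf5_handle.close()       # close the h5py file handle when you are done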
Adding metadata
Metadata can either be added directly to a dataset or stored within files. A common approach is to save metadata as JSON files, which can then be accessed as dictionaries in Python.
Example of adding a JSON file to a dataset:
dsq['my_json_data'] = {'item1': "value1", 'item2': "value2"}
# New keys can be added dynamically, though it's recommended to assign all keys at once for performance.
dsq['my_json_data']['item3'] = "value3"
In addition to dictionaries, lists can also be assigned for metadata storage.
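For example, a list is stored as a JSON file in the same way (the key name is a placeholder):
dsq['my_list_data'] = ['step_1', 'step_2', 'step_3']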
Note
If you have metadata directly associated with numerical data, it can be embedded within an xarray dataset. This approach allows the metadata to be visible in the DataQruiser app.
Example of adding metadata to an xarray dataset:
import xarray as xr
import numpy as np
xr_ds = xr.Dataset(
{"y": (["x"], np.linspace(0, 100), {"units": "mV"})},
coords={"x": ("x", np.linspace(0, 100), {"units": "s"})}
)
# Add additional metadata as attributes
xr_ds.attrs['my_fit_params'] = {'param1': 1, 'param2': 2, 'param3': 3}
# Store the xarray dataset in the dataset
dsq['my_xarray'] = xr_ds
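To read the metadata back later, load the file as an xarray dataset and inspect its attributes (a sketch, assuming the attributes round-trip as in the example above):
xr_loaded = dsq['my_xarray'].xarray
print(xr_loaded.attrs['my_fit_params'])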
Adding source files
To automatically upload a Python script to a qDrive dataset each time it's run, add the following to your script:
from qdrive import dataset
from pathlib import Path
# Create a new dataset or load an existing one using its UUID
dsq = dataset.create('my_dataset') # or use: dsq = dataset('uuid')
# Upload the current script
dsq['my_script'] = Path(__file__)
If you’re working in a Jupyter notebook, you can upload the notebook file by running the following in a cell, specifying the notebook’s name or path:
dsq['my_notebook'] = Path('my_notebook_name.ipynb')
Note
Be sure to save your file before uploading to ensure all changes are included.