Using datasets with qDrive
==========================

Each qDrive dataset is uniquely identified by a UUID (Universally Unique Identifier).
A dataset also carries metadata that helps you sort and filter your data; some of it is user defined (the `Data Identifiers`) and some is standard (e.g. acquisition time, ranking, ...).
A dataset can hold multiple files where the actual data is stored.
While files can be in any format, the dataQruiser app supports rendering the following file types: netCDF4 files, JSON files, code/text files and pictures.

Log in with Python
------------------

You can log into qDrive using one of two methods: through a graphical user interface (GUI) or via a Python command.

**Through the Graphical User Interface**: from the environment where you installed qdrive, run the following command to launch the synchronization GUI.

.. code-block:: console

    python -c "import qdrive; qdrive.launch_GUI()"

**Via a Python Command**: execute the following command in the Python kernel/console/jupyter-notebook from the environment where ``qdrive`` is installed:

.. code-block:: python

    import qdrive
    qdrive.authenticate_with_console()

.. note::

    Both methods establish a persistent login session, so you won't need to log in again unless you log out. To log out:

    * Use the command ``qdrive.logout()`` in Python, or
    * Click the log-out button (symbol) in the GUI.

Creating new datasets
---------------------

An empty dataset can be created using the following commands:

.. code-block:: python

    from qdrive import dataset

    # Minimal example - creates dataset with just a name
    ds_1 = dataset.create('my_dataset_name')

    # Complete example - creates dataset with extended metadata
    ds_2 = dataset.create(
        'Qubit 2 T2*',
        description='T2* measurement of qubit 2, RF power is also applied to qubit 1',
        scope_name='2Q SC processor A14',
        keywords=['calibration'],
        attributes={
            'set_up': 'Fridge B256',
            'sample_id': 'Q7-R3'
        },
        alt_uid='exp20240115-124501'
    )

Every dataset is automatically assigned a UUID (Universally Unique Identifier) that can be accessed after creation:

.. code-block:: python

    # Print the UUID of the newly created dataset
    print(f"Dataset UUID: {ds_2.uuid}")

    # Print the alternative identifier (if provided)
    if hasattr(ds_2, 'alt_uid') and ds_2.alt_uid:
        print(f"Alternative identifier: {ds_2.alt_uid}")

**Parameters Explained:**

* **name** (required): A descriptive name for your dataset.
* **description** (optional): A longer description of the dataset contents.
* **scope_name** (optional): The scope where the dataset will be stored.
* **keywords** (optional): A list of tags associated with the dataset (e.g. calibration, tuning, ...).
* **attributes** (optional): A dictionary of key-value pairs providing structured metadata.
* **alt_uid** (optional): An alternative identifier that can be used instead of the UUID for accessing the dataset. In most cases this is not needed.

.. note::

    * If no ``scope_name`` is provided, the dataset will be created in the default scope (see :doc:`here `).
    * The ``scope_name`` parameter accepts any of the following (all three forms are illustrated in the sketch at the end of this section):

      * A scope name as a string (e.g., ``'quantum_project'``)
      * A scope object, e.g. returned by the ``get_scopes()`` function in :doc:`scopes `.
      * A scope UUID (e.g., ``uuid.UUID('12345678-1234-5678-1234-567812345678')``)

.. tip::

    Use descriptive names and consistent attributes across related datasets to make them easier to find and filter later. The attributes and keywords are fully searchable in both the Python API and the dataQruiser application.
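As a quick illustration, here is a minimal sketch of the three accepted ``scope_name`` forms; the scope name and UUID below are the placeholder values from the note above, not real scopes:

.. code-block:: python

    import uuid
    from qdrive import dataset

    # by scope name (placeholder name)
    ds_a = dataset.create('sweep 1', scope_name='quantum_project')

    # by scope UUID (placeholder value)
    ds_b = dataset.create(
        'sweep 2',
        scope_name=uuid.UUID('12345678-1234-5678-1234-567812345678')
    )

    # by scope object: pass an object returned by get_scopes()
    # (see the scopes page for how to obtain one)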
Loading existing datasets
-------------------------

Load the data from a dataset using the following commands (you can copy your dataset uuid with one click from the dataQruiser app):

.. code-block:: python

    from qdrive import dataset

    dsq = dataset('d30eec8071014f99b09cc3dfce60187d')  # this is the dataset uuid or the alternative identifier
    print(dsq)

Printing the dataset lets you inspect its contents. A typical dataset will look like this:

.. code-block:: text

    Contents of dataset :: single shot - sensor tuning
    ==================================================
    uuid :: 59c40af3-cef3-49aa-8747-64707a9b080a
    Alternative identifier :: 1695914164228175126
    Scope :: my_scope
    Ranking :: 0

    Attributes ::
        set-up : my_setup
        sample : my_sample

    Files ::
        name            type                  selected version number (version_id)   Maximal version number
        --------------- --------------------  --------------------------------------  ------------------------
        measurement     FileType.HDF5_NETCDF  0 (1719573749579)                       0
        snapshot        FileType.JSON         0 (1719573749579)                       0
        analysis        FileType.HDF5_NETCDF  1 (1719573831702)                       1
        fit_params      FileType.JSON         1 (1719573831822)                       1

Several fields can be observed:

* **uuid**: a universally unique identifier of the dataset. Each dataset that is created has its own unique identifier, which can also be retrieved in the dataQruiser app.
* **Alternative identifier**: the identifier assigned to the dataset by your data-acquisition software, e.g. the core-tools uid or qcodes GUID.
* **Scope**: the scope this dataset is part of. A scope is usually the name of a long-standing project, to which data from several users can belong.
* **Ranking**: an integer value indicating how much you like your dataset, useful for filtering in the dataQruiser app.
* **Attributes**: searchable key-value fields which give further structure to the scope. In this case the set-up and sample are set as attributes.
* **Files**: a dataset can contain several files, e.g. a measurement, raw text, or a Python script. Each file can have several versions (see further below).

Modifying Dataset Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~

After creating a dataset, you can modify its metadata properties as needed:

.. code-block:: python

    # Load an existing dataset
    ds = dataset("59c40af3-cef3-49aa-8747-64707a9b080a")

    # Update the description
    ds.description = 'T2* measurement of qubit 2 with RF power applied to qubit 1'

    # Update keywords - replacing all existing keywords
    ds.keywords = ['calibration']

    # Add a new keyword without replacing existing ones
    ds.keywords.append('tuning')

    # Update attributes - replacing all existing attributes
    ds.attributes = {
        'set_up': 'Fridge B256',
        'sample_id': 'Q7-R3'
    }

    # Add or update a single attribute
    ds.attributes['sample_id'] = 'Q7-R4'

    # Set the ranking (useful for filtering in dataQruiser or the search_datasets function)
    ds.ranking = 1  # 1 = like, 0 = neutral, -1 = dislike/hidden

.. note::

    All changes made to the dataset are first made locally and then synchronized with the server (it can take a few seconds before the changes become available on other devices).
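As mentioned above, ``dataset()`` accepts either the UUID or the alternative identifier. A quick sketch reusing the identifiers from the example output above:

.. code-block:: python

    from qdrive import dataset

    # both calls load the same dataset shown in the example output above
    ds_by_uuid = dataset('59c40af3-cef3-49aa-8747-64707a9b080a')
    ds_by_alt_id = dataset('1695914164228175126')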
Searching for datasets
----------------------

The following command can be used to search for datasets:

.. code-block:: python

    from qdrive.dataset.search import search_datasets

    search_result = search_datasets(search_query='my_coolest_dataset')

    # iterate over the search result
    for ds in search_result:
        print(ds)

    # get the first dataset
    ds = search_result.first

The ``search_datasets`` function returns a list of datasets that match the search query.
This function can handle the following arguments:

* **search_query** : A string that is used to search for datasets. The search query can be a dataset name, a dataset UUID, or a dataset alternative identifier.
* **attributes** : Additional attributes to filter datasets, for example ``{'set_up' : 'my_setup'}`` will return only datasets with the attribute set-up equal to 'my_setup'. It is also possible to get results from multiple set-ups by using a list of values, for example ``{'set_up' : ['my_setup1', 'my_setup2']}``.
* **ranking** : The ranking score to filter datasets. Defaults to 0. Hidden datasets have a ranking of -1. The search queries for datasets with a ranking greater than or equal to the specified value.
* **start_date** : The start date to filter datasets. Only datasets collected after this date will be included in the results (e.g. ``datetime.datetime(2024, 12, 1)``).
* **end_date** : The end date to filter datasets. Only datasets collected before this date will be included in the results.
* **scopes** : A list of scopes to filter datasets. A scope can be represented by its name (`str`), its UUID (`uuid.UUID`), or the `scope` object. More information on scopes can be found :doc:`here `.

A sketch combining several of these arguments is shown after the warning below.

.. warning::

    The search query can take considerable time when iterating over a large number of datasets (e.g., when iterating over every dataset in the whole scope). Note that only the datasets that are actually needed are loaded.
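Putting several of these arguments together, a minimal sketch; the attribute, scope, and date values are placeholders you would replace with your own:

.. code-block:: python

    import datetime
    from qdrive.dataset.search import search_datasets

    # placeholder filter values - substitute your own set-up, scope and dates
    results = search_datasets(
        search_query='T2*',
        attributes={'set_up': 'my_setup'},
        ranking=1,  # only datasets ranked 1 or higher
        start_date=datetime.datetime(2024, 12, 1),
        end_date=datetime.datetime(2025, 1, 1),
        scopes=['my_scope'],
    )

    for ds in results:
        print(ds)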
Working with Files
------------------

The dataset object allows you to manage files within a dataset. In this section, we will cover:

* Adding new files
* Inspecting and selecting different file versions
* Using file type-specific methods to easily access data within the files

Adding New Files
~~~~~~~~~~~~~~~~

You can add files to a dataset by assigning them directly. The following example demonstrates how to add files from various sources to a dataset:

.. code-block:: python

    from qdrive import dataset
    from pathlib import Path
    import numpy as np
    import xarray as xr

    new_dataset = dataset.create('my_dataset_name')

    # add a file from a path
    new_dataset['my_file'] = Path('C:/location/of/file.extension')

    # add a file from a python object (list, dict, numpy array, xarray)
    new_dataset['my_array'] = np.linspace(0, 10, 100)
    new_dataset['my_json'] = {'a': 1, 'b': [1, 2, 3]}
    new_dataset['my_xarray'] = xr.Dataset({'a': (['x'], np.arange(10))})

    # add the current script file
    new_dataset['my_script'] = __file__

.. note::

    Assigning a file with the same key multiple times will create a new version of that file.

Inspecting and Selecting Different File Versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each file within a dataset can have multiple versions, which can be inspected with the following command:

.. code-block:: python

    print(my_dataset['my_file'].version_info)

This will display information like the following:

.. code-block:: text

    File object information
    =======================
    Name : my_file
    Selected File version : 1720711563075

    File versions (3) :
        1720711517406    (created on 11/07/2024 17:25:17)
        1720711551649    (created on 11/07/2024 17:25:51)
      * 1720711563075    (created on 11/07/2024 17:26:03)

By default, accessing a file returns its latest version. The selected version is marked by an asterisk (*).

To access a specific version of a file by its unique version ID, use:

.. code-block:: python

    my_dataset['my_file'].version_id(1720711517406)

.. tip::

    You can also access file versions by their position in the version history:

    .. code-block:: python

        my_dataset['my_file'].version(0)  # first version
        my_dataset['my_file'].version(1)  # second version
        my_dataset['my_file'].version(2)  # third version
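Tying the note and the versioning commands together, a minimal sketch; ``new_dataset`` is the dataset created above and ``fit_params`` is a hypothetical file key:

.. code-block:: python

    # each assignment to the same key creates a new version (see the note above)
    new_dataset['fit_params'] = {'f0_GHz': 5.123}
    new_dataset['fit_params'] = {'f0_GHz': 5.127}  # creates a second version

    # inspect the version history and select the first version
    print(new_dataset['fit_params'].version_info)
    first_version = new_dataset['fit_params'].version(0)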
Using file type-specific methods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Numerical data**

For storing numerical data, we recommend using ``.hdf5`` files formatted according to the ``NETCDF4`` standard. While it's possible to assign an HDF5 file directly, a more user-friendly approach is to work with xarray datasets. When assigning an xarray dataset, it is automatically converted to an HDF5 file and uploaded to the cloud.

.. tip::

    Xarray is a library that allows for easy labeling of NumPy arrays and supports defining relationships between different dimensions. Assigning data in this way also enables automatic plotting of datasets in the dataQruiser app.

Example:

.. code-block:: python

    import xarray as xr
    import numpy as np

    # Create an xarray dataset with two variables, y1 and y2, and a shared coordinate, x
    x_data = np.linspace(0, 30, 100)
    y1_data = np.sin(x_data)
    y2_data = np.cos(x_data)

    xr_ds = xr.Dataset(
        {
            "y1": (["x"], y1_data, {"units": "mV"}),
            "y2": (["x"], y2_data, {"units": "mV"}),
        },
        coords={
            "x": ("x", x_data, {"units": "s"})
        }
    )

    # Optionally, to link y1 and y2 for joint plotting, you can use a temporary solution:
    xr_ds["y1"].attrs["__join_plot"] = "y2"
    xr_ds["y2"].attrs["__join_plot"] = "y1"
    # Note: This is a workaround and will be improved in future releases.

In this example, we create a dataset ``xr_ds`` with two variables, ``y1`` and ``y2``, each associated with the coordinate ``x``. The units of each variable and coordinate can be specified for clarity.

HDF5 files saved in NETCDF4 format can be accessed in multiple ways, allowing for flexibility depending on your analysis needs:

.. code-block:: python

    from qdrive import dataset

    dsq = dataset("my_dataset_uuid_or_alt_id")

    xarray_ds = dsq["measurement"].xarray    # load as an xarray Dataset (recommended option)
    pandas_ds = dsq["measurement"].pandas    # load as a pandas DataFrame
    hdf5_handle = dsq["measurement"].hdf5    # load as an h5py File
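To go from a loaded xarray dataset to plain NumPy arrays (for fitting, plotting, etc.), you can use ``to_numpy()``; a short sketch, assuming the ``measurement`` file contains the ``y1`` variable from the earlier example:

.. code-block:: python

    # 'xarray_ds' is the object loaded via dsq["measurement"].xarray above
    y1 = xarray_ds['y1'].to_numpy()
    print(y1.shape, y1.mean())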
"y2": (["x"], y2_data, {"units": "unit_y2"}), .. }, .. coords={ .. "x": ("x", x_data, {"units": "unit_x"}) .. } .. ) .. # Add the xarray dataset to the qdrive dataset as file called 'my_data' .. dsq['my_data'] = xr_ds .. # A new version of the file `my_data` can be added to the qdrive dataset by assigning a new xarray dataset .. # dsq['my_data'] = xr_ds_new .. This command adds the xarray dataset as an HDF5_NETCDF file to the qdrive dataset. .. The new file is automatically synchronized to the cloud and will be immediately available in the dataQruiser app and through python from all logged in devices. .. JSON files .. ~~~~~~~~~~ .. **Access a JSON file** with the following commands: .. .. code-block:: python .. dsq['fit_params'].json .. JSON files are represented in Python as dictionaries or lists. .. **Add a new JSON file** to a qdrive dataset as follows: .. .. code-block:: python .. dsq['my_json_data'] = {'item1' : "value1"} .. # reassigning a new object, creates a new version of the 'my_json_data' file. .. dsq['my_json_data'] = ['item1', 'item2'] .. # adding items to an existing json file creates a new version of the 'my_json_data' file. .. dsq['my_json_data']['item2'] = 'value2' .. NUMPY raw files .. ~~~~~~~~~~~~~~~ .. While it is recommended to store numerical data in the HDF5 format, .. raw NumPy arrays can also be stored easily in the dataset. For example: .. .. code-block:: python .. my_dataset['new_data'] = np.zeros([100,100]) .. numpy_array = my_dataset['new_data'].raw #access raw numpy array .. # more options of providing numpy arrays .. my_dataset['new_data'] = [np.ones([100,100]), np.zeros([100,100])] .. numpy_array_list = my_dataset['new_data'].raw .. my_dataset['new_data'] = { "x" : np.ones([100,100]), .. "y" : np.zeros([100,100]), .. "z" : np.zeros([100,100])} .. numpy_array_dict = my_dataset['new_data'].raw["x"] .. Script Files .. ~~~~~~~~~~~~ .. To automatically upload a python script file to a qdrive dataset each time the script is run you can add to your script the following: .. .. code-block:: python .. from qdrive import dataset .. dsq = dataset.create('my_dataset') # you can also load an existing dataset with its uuid using: dsq = dataset('uuid') .. dsq['my_script'] = __file__ .. If you want to upload a currently used Jupyter-notebook you can run this in a cell, specifying the notebook name (or path): .. .. code-block:: python .. dsq['my_notebook'] = 'my_notebook_name.ipynb' .. Make sure your file is saved before you upload it, such that all changes are reflected. .. .. _file-versioning: .. File versioning .. ~~~~~~~~~~~~~~~ .. Several version of the same file can be created by: .. .. code-block:: python .. # from file path .. dsq['my_file'] = file_path_1 .. dsq['my_file'] = file_path_2 # creates a new version of my_file .. # from json object (list or dict) .. dsq['my_json'] = {'a': 1, 'b':[1,2]} .. dsq['my_json'] = {'a': 1, 'b':[1,2,3]} # creates a new version of my_json .. # from xarray .. dsq['my_xr'] = xr.DataArray(np.arange(10)) .. dsq['my_xr'] = xr.DataArray(np.arange(20)) # creates a new version of my_xr .. In this cases, two versions of the same files will be created. .. The different versions of a file can be inspected using the following command: .. .. code-block:: python .. print(my_dataset['my_file'].version_info) .. .. TODO remove print statement fom above once this is solved https://github.com/qEncoder/eTiKeT-testing/issues/23 .. which return the following information : .. .. code-block:: .. File object information .. ======================= .. 
**Adding source files**

To automatically upload a Python script to a qDrive dataset each time it's run, add the following to your script:

.. code-block:: python

    from qdrive import dataset
    from pathlib import Path

    # Create a new dataset or load an existing one using its UUID
    dsq = dataset.create('my_dataset')  # or use: dsq = dataset('uuid')

    # Upload the current script
    dsq['my_script'] = Path(__file__)

If you're working in a Jupyter notebook, you can upload the notebook file by running the following in a cell, specifying the notebook's name or path:

.. code-block:: python

    dsq['my_notebook'] = Path('my_notebook_name.ipynb')

.. note::

    Be sure to save your file before uploading to ensure all changes are included.