provenaclient.modules.datastore

Created Date: Thursday June 6th 2024 +1000

Author: Peter Baker

Last Modified: Thursday June 6th 2024 1:39:55 pm +1000

Modified By: Peter Baker

Description: Datastore L3 module. Includes the Data store review sub module.

HISTORY:

Date | By | Comments

29-08-2024 | Parth Kulkarni | Added Downloading Specific file/directory functionality to interactive class.

22-08-2024 | Parth Kulkarni | Completed Interactive Dataset class + Doc Strings.

15-08-2024 | Parth Kulkarni | Added a prototype/draft of the Interactive Dataset Class.

Attributes

DEFAULT_SEARCH_LIMIT

DATASTORE_DEFAULT_SEARCH_LIMIT

Classes

ReviewSubModule

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

InteractiveDataset

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

Datastore

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

Module Contents

provenaclient.modules.datastore.DEFAULT_SEARCH_LIMIT = 25
provenaclient.modules.datastore.DATASTORE_DEFAULT_SEARCH_LIMIT = 20
class provenaclient.modules.datastore.ReviewSubModule(auth: provenaclient.modules.module_helpers.AuthManager, config: provenaclient.modules.module_helpers.Config, datastore_client: provenaclient.clients.DatastoreClient)

Bases: provenaclient.modules.module_helpers.ModuleService

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

_datastore_client: provenaclient.clients.DatastoreClient
_auth
_config
async delete_dataset_reviewer(reviewer_id: str) → None

Delete a reviewer.

Parameters:

reviewer_id (str) – Id of an existing reviewer within the system.

async add_dataset_reviewer(reviewer_id: str) → None

Add a reviewer.

Parameters:

reviewer_id (str) – Id of a reviewer.

async dataset_approval_request(approval_request: ProvenaInterfaces.DataStoreAPI.ReleaseApprovalRequest) → ProvenaInterfaces.DataStoreAPI.ReleaseApprovalRequestResponse

Submit a request for approval of a dataset.

Parameters:

approval_request (ReleaseApprovalRequest) – An object that requires the dataset id, approver id and notes.

Returns:

Contains details of the approval request.

Return type:

ReleaseApprovalRequestResponse

async action_approval_request(action_approval_request: ProvenaInterfaces.DataStoreAPI.ActionApprovalRequest) → ProvenaInterfaces.DataStoreAPI.ActionApprovalRequestResponse

Actions an existing dataset approval request via the datastore.

Parameters:

action_approval_request (ActionApprovalRequest) – The dataset id, your approval decision and any extra information you want to add (notes).

Returns:

The details of the approval action and the relevant dataset details.

Return type:

ActionApprovalRequestResponse

class provenaclient.modules.datastore.InteractiveDataset(dataset_id: str, auth: provenaclient.modules.module_helpers.AuthManager, datastore_client: provenaclient.clients.DatastoreClient, io: provenaclient.modules.submodules.IOSubModule)

Bases: provenaclient.modules.module_helpers.ModuleService

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

dataset_id: str
auth: provenaclient.modules.module_helpers.AuthManager
datastore_client: provenaclient.clients.DatastoreClient
io: provenaclient.modules.submodules.IOSubModule
_auth
_datastore_client
async fetch_dataset() → ProvenaInterfaces.DataStoreAPI.RegistryFetchResponse

Fetches the current dataset from the datastore.

Returns:

An interactive Python datatype of type RegistryFetchResponse containing the dataset details.

Return type:

RegistryFetchResponse

async download_all_files(destination_directory: str) → None

Downloads all files to the destination path for your current dataset.

  • Fetches info

  • Fetches creds

  • Uses s3 cloud path lib to download all files to specified location

Parameters:

destination_directory (str) – The destination path to save files to - use a directory
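The three steps listed for download_all_files (fetch info, fetch creds, download) can be pictured with a small stand-in. This is only a sketch under stated assumptions: `download_all_sketch`, the injected callables, and the `s3_uri` key are all hypothetical names invented for illustration, and the final copy step is faked in memory rather than performed with an S3 cloud path library.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def download_all_sketch(
    dataset_id: str,
    destination_directory: str,
    fetch_info: Callable[[str], Awaitable[Dict[str, Any]]],
    fetch_credentials: Callable[[str], Awaitable[Dict[str, str]]],
    copy_tree: Callable[[str, Dict[str, str], str], List[str]],
) -> List[str]:
    # Step 1: fetch the dataset record to learn where its files are stored.
    info = await fetch_info(dataset_id)
    # Step 2: fetch credentials for that storage location.
    credentials = await fetch_credentials(dataset_id)
    # Step 3: copy everything under the storage location into the directory.
    return copy_tree(info["s3_uri"], credentials, destination_directory)


# Hypothetical in-memory stand-ins so the sketch runs without network access.
async def fake_fetch_info(dataset_id: str) -> Dict[str, Any]:
    return {"id": dataset_id, "s3_uri": f"s3://example-bucket/{dataset_id}/"}


async def fake_fetch_credentials(dataset_id: str) -> Dict[str, str]:
    return {"access_key": "EXAMPLE", "secret": "EXAMPLE"}


def fake_copy_tree(s3_uri: str, credentials: Dict[str, str], destination: str) -> List[str]:
    # A real implementation would download files here; this one only reports a path.
    return [f"{destination}/data.csv"]


downloaded = asyncio.run(
    download_all_sketch(
        "example-id", "./out", fake_fetch_info, fake_fetch_credentials, fake_copy_tree
    )
)
```

Injecting the three steps as callables keeps the ordering visible and lets each step be swapped for a real client call.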

async upload_all_files(source_directory: str) → None

Uploads all files in the source path to the current dataset’s storage location.

  • Fetches info

  • Fetches creds

  • Uses s3 cloud path lib to upload all files to specified location

Parameters:

source_directory (str) – The source path to upload files from - use a directory

async version(reason: str) → ProvenaInterfaces.RegistryAPI.VersionResponse

Versioning operation which creates a new version from the current dataset.

Parameters:

reason (str) – The reason for versioning this dataset.

Returns:

Response of the versioning of the dataset, containing new version ID and job session ID.

Return type:

VersionResponse

async revert_dataset_metadata(history_id: int, reason: str) → ProvenaInterfaces.DataStoreAPI.StatusResponse

Reverts the metadata for the current dataset to a previously identified historical version.

Parameters:
  • history_id (int) – The identifier of the historical version to revert to.

  • reason (str) – The reason for reverting the dataset’s metadata.

Returns:

Response indicating whether your dataset metadata revert request was successful.

Return type:

StatusResponse

async generate_read_access_credentials(console_session_required: bool) → ProvenaInterfaces.DataStoreAPI.CredentialResponse

Given an S3 location, attempts to generate programmatic access keys for the storage bucket at this particular subdirectory.

Parameters:

console_session_required (bool) – Specifies whether a console session URL is required.

Returns:

The AWS credentials granting read-level access to the subset of the bucket requested in the S3 location object.

Return type:

CredentialResponse

async generate_write_access_credentials(console_session_required: bool) → ProvenaInterfaces.DataStoreAPI.CredentialResponse

Given an S3 location, attempts to generate programmatic access keys for the storage bucket at this particular subdirectory.

Parameters:

console_session_required (bool) – Specifies whether a console session URL is required.

Returns:

The AWS credentials granting write-level access to the subset of the bucket requested in the S3 location object.

Return type:

CredentialResponse

async download_specific_file(s3_path: str, destination_directory: str) → None

Downloads a specific file or folder for the current dataset from an S3 bucket to a provided destination path.

This method handles various cases:

  • If s3_path is a specific file, it downloads that file directly to destination_directory.

  • If s3_path is a folder (without a trailing slash), it downloads the entire folder and its contents, preserving the folder structure in destination_directory.

  • If s3_path is a folder (with a trailing slash), it downloads all contents (including subfolders) within that folder but not the folder itself to destination_directory.

Parameters:
  • s3_path (str) – The S3 path of the file or folder to download. A specific file downloads just that file. A folder without a trailing slash (e.g., ‘nested’) downloads the entire folder and all its contents, preserving the structure. A folder with a trailing slash (e.g., ‘nested/’) downloads all contents within that folder, including any subfolders, but not the folder itself.

  • destination_directory (str) – The destination path to save files to - use a directory.
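The trailing-slash rules for s3_path are easier to see as a mapping from object keys to local paths. The sketch below is an illustration of the documented behaviour only, not the library's implementation: `plan_downloads` and its key list are hypothetical, and no S3 access takes place.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def plan_downloads(keys: List[str], s3_path: str, destination_directory: str) -> Dict[str, str]:
    """Map each matching object key to the local path it would be saved at."""
    dest = PurePosixPath(destination_directory)

    # Case 1: s3_path names a specific file, which lands directly in the destination.
    if s3_path in keys:
        return {s3_path: str(dest / PurePosixPath(s3_path).name)}

    stripped = s3_path.rstrip("/")
    prefix = stripped + "/"
    # Case 2 (no trailing slash) keeps the folder itself; case 3 (trailing slash) does not.
    keep_folder = not s3_path.endswith("/")
    folder_name = PurePosixPath(stripped).name

    plan: Dict[str, str] = {}
    for key in keys:
        if not key.startswith(prefix):
            continue
        relative = key[len(prefix):]
        target = dest / folder_name / relative if keep_folder else dest / relative
        plan[key] = str(target)
    return plan


example_keys = ["readme.txt", "nested/a.csv", "nested/deep/b.csv"]
file_plan = plan_downloads(example_keys, "readme.txt", "out")
folder_plan = plan_downloads(example_keys, "nested", "out")
contents_plan = plan_downloads(example_keys, "nested/", "out")
```

With these inputs, ‘nested’ places files under out/nested/, while ‘nested/’ places the same files directly under out/ with subfolders preserved.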

class provenaclient.modules.datastore.Datastore(auth: provenaclient.modules.module_helpers.AuthManager, config: provenaclient.modules.module_helpers.Config, datastore_client: provenaclient.clients.DatastoreClient, search_client: provenaclient.clients.SearchClient)

Bases: provenaclient.modules.module_helpers.ModuleService

This class interface just captures that the client has an instantiated auth manager which allows for helper functions abstracted for L3 clients.

_datastore_client: provenaclient.clients.DatastoreClient
_search_client: provenaclient.clients.SearchClient
review: ReviewSubModule
io: provenaclient.modules.submodules.IOSubModule
_auth
_config
async get_health_check() → provenaclient.models.HealthCheckResponse

Health checks the API.

Returns:

The health check response.

Return type:

HealthCheckResponse

async fetch_dataset(id: str) → ProvenaInterfaces.DataStoreAPI.RegistryFetchResponse

Fetches a dataset from the datastore based on the provided ID.

Parameters:

id (str) – The unique identifier of the dataset to be retrieved. For example: “10378.1/1451860”

Returns:

An interactive Python datatype of type RegistryFetchResponse containing the dataset details.

Return type:

RegistryFetchResponse

async mint_dataset(dataset_mint_info: ProvenaInterfaces.RegistryModels.CollectionFormat) → ProvenaInterfaces.DataStoreAPI.MintResponse

Creates a new dataset in the datastore with the provided dataset information.

Parameters:

dataset_mint_info (CollectionFormat) – A structured format containing all necessary information to register a new dataset, including associations, approvals, and dataset-specific information.

Returns:

An interactive Python datatype of type MintResponse containing the newly created dataset details.

Return type:

MintResponse

async validate_dataset_metadata(metadata_payload: ProvenaInterfaces.RegistryModels.CollectionFormat) → ProvenaInterfaces.DataStoreAPI.StatusResponse

Validates dataset metadata for testing purposes, without publishing it.

Parameters:

metadata_payload (CollectionFormat) – A structured format containing all necessary information to register a new dataset, including associations, approvals, and dataset-specific information.

Returns:

Response indicating whether your dataset metadata setup is valid and correct.

Return type:

StatusResponse

async update_dataset_metadata(handle_id: str, reason: str, metadata_payload: ProvenaInterfaces.RegistryModels.CollectionFormat) → ProvenaInterfaces.DataStoreAPI.UpdateMetadataResponse

Updates an existing dataset’s metadata.

Parameters:
  • handle_id (str) – The id of the dataset.

  • reason (str) – The reason for changing metadata of the dataset.

  • metadata_payload (CollectionFormat) – A structured format containing all necessary information to register a new dataset, including associations, approvals, and dataset-specific information.

Returns:

The updated metadata response

Return type:

UpdateMetadataResponse

async revert_dataset_metadata(metadata_payload: provenaclient.models.RevertMetadata) → ProvenaInterfaces.DataStoreAPI.StatusResponse

Reverts the metadata for a dataset to a previously identified historical version.

Parameters:

metadata_payload (RevertMetadata) – The revert request, passed through to the registry API. It requires the dataset id, history id and reason for reverting.

Returns:

Response indicating whether your dataset metadata revert request was successful.

Return type:

StatusResponse

async version_dataset(version_request: ProvenaInterfaces.RegistryAPI.VersionRequest) → ProvenaInterfaces.RegistryAPI.VersionResponse

Versioning operation which creates a new version from the specified ID.

Parameters:

version_request (VersionRequest) – The request which includes the item ID and reason for versioning.

Returns:

Response of the versioning of the dataset, containing new version ID and job session ID.

Return type:

VersionResponse

async for_all_datasets(list_dataset_request: ProvenaInterfaces.RegistryAPI.NoFilterSubtypeListRequest, total_limit: Optional[int] = None) → AsyncGenerator[ProvenaInterfaces.DataStoreAPI.ItemDataset, None]

Fetches all datasets in the datastore, based on the provided sorting criteria, pagination key and page size.

Parameters:
  • list_dataset_request (NoFilterSubtypeListRequest) – A request object configured with sorting options, pagination keys, and page size that defines how datasets are queried from the datastore.

  • total_limit (Optional[int], optional) – A maximum number of datasets to fetch. If specified, the generator will stop yielding datasets once this limit is reached. If None, it will fetch datasets until there are no more to fetch.

Returns:

An asynchronous generator yielding ItemDataset objects, each of which is an individual dataset from the datastore.

Return type:

AsyncGenerator[ItemDataset, None]

Yields:

ItemDataset – Each yield provides an ItemDataset containing an individual dataset.
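The interaction between page size, pagination key and total_limit can be sketched with an in-memory stand-in. This is a hedged illustration of the general pattern, not the library's code: `FAKE_ITEMS`, `fake_list_page`, `for_all` and `collect` are hypothetical names, and an integer offset plays the role of the pagination key.

```python
import asyncio
from typing import AsyncGenerator, List, Optional, Tuple

# Hypothetical data standing in for datasets held by a remote service.
FAKE_ITEMS: List[str] = [f"dataset-{i}" for i in range(7)]


async def fake_list_page(
    page_size: int, pagination_key: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """Return one page of items plus the key for the next page (None when done)."""
    start = pagination_key or 0
    end = start + page_size
    next_key = end if end < len(FAKE_ITEMS) else None
    return FAKE_ITEMS[start:end], next_key


async def for_all(
    page_size: int, total_limit: Optional[int] = None
) -> AsyncGenerator[str, None]:
    """Yield items page by page, stopping early once total_limit is reached."""
    yielded = 0
    key: Optional[int] = None
    while True:
        items, key = await fake_list_page(page_size, key)
        for item in items:
            if total_limit is not None and yielded >= total_limit:
                return
            yield item
            yielded += 1
        if key is None:
            return  # no further pages to request


async def _collect(page_size: int, total_limit: Optional[int]) -> List[str]:
    return [item async for item in for_all(page_size, total_limit)]


def collect(page_size: int, total_limit: Optional[int] = None) -> List[str]:
    # Synchronous helper so the generator can be exercised from plain code.
    return asyncio.run(_collect(page_size, total_limit))
```

Calling `collect(3)` walks every page, while `collect(3, 4)` stops partway through the second page.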

async list_datasets(list_dataset_request: ProvenaInterfaces.RegistryAPI.NoFilterSubtypeListRequest) → ProvenaInterfaces.RegistryAPI.DatasetListResponse

Takes a specific dataset list request and returns the response.

Parameters:

list_dataset_request (NoFilterSubtypeListRequest) – A request object configured with sorting options, pagination keys, and page size that defines how datasets are queried from the datastore.

Returns:

Response containing the requested datasets from the datastore, selected according to the sort criteria and page size, along with other attributes such as total_item_counts and an optional pagination key.

Return type:

DatasetListResponse

async list_all_datasets(sort_criteria: Optional[ProvenaInterfaces.RegistryAPI.SortOptions] = None) → List[ProvenaInterfaces.DataStoreAPI.ItemDataset]

Fetches all datasets from the datastore, optionally using your own sort criteria. By default, sorts by display name.

Parameters:

sort_criteria (Optional[SortOptions]) – An object configured with sorting options that you want when displaying all datasets within the datastore.

Returns:

A list of all datasets in the datastore, sorted as requested.

Return type:

List[ItemDataset]

async generate_dataset_presigned_url(dataset_presigned_request: ProvenaInterfaces.DataStoreAPI.PresignedURLRequest) → ProvenaInterfaces.DataStoreAPI.PresignedURLResponse

Generates a presigned url for an existing dataset.

Parameters:

dataset_presigned_request (PresignedURLRequest) – Contains the dataset id, the file path and the length of expiry of the URL.

Returns:

A response with the presigned url.

Return type:

PresignedURLResponse

async generate_read_access_credentials(credentials: ProvenaInterfaces.DataStoreAPI.CredentialsRequest) → ProvenaInterfaces.DataStoreAPI.CredentialResponse

Given an S3 location, attempts to generate programmatic access keys for the storage bucket at this particular subdirectory.

Parameters:

credentials (CredentialsRequest) – Contains the dataset id and a boolean flag indicating whether a console session URL is required.

Returns:

The AWS credentials granting read-level access to the subset of the bucket requested in the S3 location object.

Return type:

CredentialResponse

async generate_write_access_credentials(credentials: ProvenaInterfaces.DataStoreAPI.CredentialsRequest) → ProvenaInterfaces.DataStoreAPI.CredentialResponse

Given an S3 location, attempts to generate programmatic access keys for the storage bucket at this particular subdirectory.

Parameters:

credentials (CredentialsRequest) – Contains the dataset id and a boolean flag indicating whether a console session URL is required.

Returns:

The AWS credentials granting write-level access to the subset of the bucket requested in the S3 location object.

Return type:

CredentialResponse

async search_datasets(query: str, limit: int = DEFAULT_SEARCH_LIMIT) → provenaclient.models.LoadedSearchResponse

Utilises the L2 search client to search for datasets with the specified query.

Loads all datasets in the result payload from the data store and sorts them into successfully loaded items, auth errors, or other exceptions if loading was not successful.

Parameters:
  • query (str) – The query to make.

  • limit (int, optional) – The result count limit. Defaults to DEFAULT_SEARCH_LIMIT.

Returns:

The loaded items incl errors.

Return type:

LoadedSearchResponse
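The load-and-sort step described for search_datasets can be sketched as a concurrent fetch followed by a partition on the outcome. Everything here is an assumption made for illustration: `LoadedResults`, its three field names, `fake_fetch`, and the use of `PermissionError` as the auth failure are hypothetical and are not taken from provenaclient.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadedResults:
    """Hypothetical container: loaded records plus the ids that failed."""
    items: List[str] = field(default_factory=list)
    auth_errors: List[str] = field(default_factory=list)
    misc_errors: List[str] = field(default_factory=list)


async def fake_fetch(dataset_id: str) -> str:
    # Stand-in for a per-dataset fetch that may fail in different ways.
    if dataset_id.startswith("private"):
        raise PermissionError(f"not authorised for {dataset_id}")
    if dataset_id.startswith("broken"):
        raise RuntimeError(f"could not load {dataset_id}")
    return f"record for {dataset_id}"


async def load_search_results(ids: List[str]) -> LoadedResults:
    # Fetch every hit concurrently; exceptions are returned instead of raised.
    outcomes = await asyncio.gather(
        *(fake_fetch(dataset_id) for dataset_id in ids), return_exceptions=True
    )
    results = LoadedResults()
    for dataset_id, outcome in zip(ids, outcomes):
        if isinstance(outcome, PermissionError):
            results.auth_errors.append(dataset_id)
        elif isinstance(outcome, BaseException):
            results.misc_errors.append(dataset_id)
        else:
            results.items.append(outcome)
    return results


loaded = asyncio.run(load_search_results(["open-1", "private-2", "broken-3", "open-4"]))
```

Returning failures alongside successes lets a caller report partial results instead of aborting on the first dataset it cannot read.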

async interactive_dataset(dataset_id: str) → InteractiveDataset

Creates an interactive “session” with a dataset that allows you to perform further operations without re-supplying dataset id and creating objects required for other methods.

Parameters:

dataset_id (str) – The unique identifier of the dataset to be retrieved. For example: “10378.1/1451860”

Returns:

An instance that allows you to perform various operations on the provided dataset.

Return type:

InteractiveDataset
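The idea of a "session" that remembers the dataset id can be shown with two small stand-in classes. This is a sketch of the pattern only: `FakeDatastore` and `FakeInteractiveDataset` are hypothetical, return plain dictionaries, and make no network calls. Only the method names and the example id mirror this page.

```python
import asyncio
from typing import Any, Dict


class FakeDatastore:
    """Hypothetical stand-in for a datastore module; not provenaclient itself."""

    async def fetch_dataset(self, id: str) -> Dict[str, Any]:
        return {"id": id, "display_name": f"Dataset {id}"}

    async def interactive_dataset(self, dataset_id: str) -> "FakeInteractiveDataset":
        # Bind the id once so later calls do not need to supply it again.
        return FakeInteractiveDataset(dataset_id=dataset_id, datastore=self)


class FakeInteractiveDataset:
    """Hypothetical session object that carries the dataset id for the caller."""

    def __init__(self, dataset_id: str, datastore: FakeDatastore) -> None:
        self.dataset_id = dataset_id
        self._datastore = datastore

    async def fetch_dataset(self) -> Dict[str, Any]:
        # Delegates to the datastore using the id captured at creation time.
        return await self._datastore.fetch_dataset(self.dataset_id)


async def demo() -> Dict[str, Any]:
    datastore = FakeDatastore()
    session = await datastore.interactive_dataset("10378.1/1451860")
    return await session.fetch_dataset()


record = asyncio.run(demo())
```

The session object trades a little indirection for shorter call sites when many operations target the same dataset.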