provenaclient.modules.datastore
Created Date: Thursday June 6th 2024 +1000
Author: Peter Baker
Last Modified: Thursday June 6th 2024 1:39:55 pm +1000
Modified By: Peter Baker
Description: Datastore L3 module. Includes the Data store review sub module.
HISTORY:
Date | By | Comments
29-08-2024 | Parth Kulkarni | Added Downloading Specific file/directory functionality to interactive class.
22-08-2024 | Parth Kulkarni | Completed Interactive Dataset class + Doc Strings.
15-08-2024 | Parth Kulkarni | Added a prototype/draft of the Interactive Dataset Class.
Attributes
- DEFAULT_SEARCH_LIMIT
- DATASTORE_DEFAULT_SEARCH_LIMIT
Classes
- ReviewSubModule | This class interface captures that the client has an instantiated auth manager.
- InteractiveDataset | This class interface captures that the client has an instantiated auth manager.
- Datastore | This class interface captures that the client has an instantiated auth manager.
Module Contents
- provenaclient.modules.datastore.DEFAULT_SEARCH_LIMIT = 25
- provenaclient.modules.datastore.DATASTORE_DEFAULT_SEARCH_LIMIT = 20
- class provenaclient.modules.datastore.ReviewSubModule(auth: provenaclient.modules.module_helpers.AuthManager, config: provenaclient.modules.module_helpers.Config, datastore_client: provenaclient.clients.DatastoreClient)[source]
Bases: provenaclient.modules.module_helpers.ModuleService
This class interface captures that the client has an instantiated auth manager, which provides helper functions abstracted for L3 clients.
- _datastore_client: provenaclient.clients.DatastoreClient
- _auth
- _config
- async delete_dataset_reviewer(reviewer_id: str) None [source]
Delete a reviewer.
- Parameters:
reviewer_id (str) – ID of an existing reviewer in the system.
- async add_dataset_reviewer(reviewer_id: str) None [source]
Add a reviewer.
- Parameters:
reviewer_id (str) – ID of the reviewer to add.
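To register several reviewers at once, the `add_dataset_reviewer` calls can be issued concurrently. This is a minimal sketch that relies only on the documented coroutine; the helper `add_reviewers` and the `SupportsAddReviewer` protocol are hypothetical names introduced here so the example runs without importing provenaclient.

```python
import asyncio
from typing import Iterable, Protocol


class SupportsAddReviewer(Protocol):
    # Hypothetical structural type matching ReviewSubModule.add_dataset_reviewer.
    async def add_dataset_reviewer(self, reviewer_id: str) -> None: ...


async def add_reviewers(review: SupportsAddReviewer, reviewer_ids: Iterable[str]) -> None:
    """Hypothetical helper: add each distinct reviewer ID concurrently."""
    # Deduplicate while preserving order so no reviewer is added twice.
    unique_ids = list(dict.fromkeys(reviewer_ids))
    # gather() runs the calls concurrently and re-raises the first failure.
    await asyncio.gather(*(review.add_dataset_reviewer(rid) for rid in unique_ids))
```

With a real client this would presumably be called as `await add_reviewers(datastore.review, ["reviewer-1", "reviewer-2"])`, where `datastore` is a `Datastore` instance.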
- async dataset_approval_request(approval_request: ProvenaInterfaces.DataStoreAPI.ReleaseApprovalRequest) ProvenaInterfaces.DataStoreAPI.ReleaseApprovalRequestResponse [source]
Submits a request for approval of a dataset.
- Parameters:
approval_request (ReleaseApprovalRequest) – The request object, which requires the dataset ID, the approver ID and notes.
- Returns:
Contains details of the approval request.
- Return type:
ReleaseApprovalRequestResponse
- async action_approval_request(action_approval_request: ProvenaInterfaces.DataStoreAPI.ActionApprovalRequest) ProvenaInterfaces.DataStoreAPI.ActionApprovalRequestResponse [source]
Actions a pending dataset approval request via the datastore.
- Parameters:
action_approval_request (ActionApprovalRequest) – The dataset ID, your approval decision, and any additional notes.
- Returns:
The details of the approval action and the relevant dataset details.
- Return type:
ActionApprovalRequestResponse
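Submitting a release approval request amounts to building the request object and passing it to `dataset_approval_request`. The sketch below assumes the request model accepts `dataset_id`, `approver_id` and `notes` as keyword arguments (the documented required fields; the exact field names are an assumption). The model class is passed in as `request_model` so the example runs without ProvenaInterfaces installed; `request_release` and `SupportsApproval` are hypothetical names.

```python
from typing import Any, Callable, Protocol


class SupportsApproval(Protocol):
    # Hypothetical structural type matching ReviewSubModule.dataset_approval_request.
    async def dataset_approval_request(self, approval_request: Any) -> Any: ...


async def request_release(
    review: SupportsApproval,
    request_model: Callable[..., Any],
    dataset_id: str,
    approver_id: str,
    notes: str = "",
) -> Any:
    """Hypothetical helper: build and submit a release approval request."""
    # Assumption: the model takes these three fields as keyword arguments.
    request = request_model(dataset_id=dataset_id, approver_id=approver_id, notes=notes)
    return await review.dataset_approval_request(request)
```

In practice `request_model` would be `ReleaseApprovalRequest` from `ProvenaInterfaces.DataStoreAPI`; injecting it keeps the helper free of a hard import.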
- class provenaclient.modules.datastore.InteractiveDataset(dataset_id: str, auth: provenaclient.modules.module_helpers.AuthManager, datastore_client: provenaclient.clients.DatastoreClient, io: provenaclient.modules.submodules.IOSubModule)[source]
Bases: provenaclient.modules.module_helpers.ModuleService
This class interface captures that the client has an instantiated auth manager, which provides helper functions abstracted for L3 clients.
- dataset_id: str
- auth: provenaclient.modules.module_helpers.AuthManager
- datastore_client: provenaclient.clients.DatastoreClient
- _auth
- _datastore_client
- async fetch_dataset() ProvenaInterfaces.DataStoreAPI.RegistryFetchResponse [source]
Fetches the current dataset from the datastore.
- Returns:
A RegistryFetchResponse object containing the dataset details.
- Return type:
RegistryFetchResponse
- async download_all_files(destination_directory: str) None [source]
Downloads all files for the current dataset to the destination directory.
This method fetches the dataset information, fetches storage credentials, and then uses the S3 cloud path library to download all files to the specified location.
- Parameters:
destination_directory (str) – The destination path to save files to; use a directory.
- async upload_all_files(source_directory: str) None [source]
Uploads all files in the source directory to the current dataset’s storage location.
This method fetches the dataset information, fetches storage credentials, and then uses the S3 cloud path library to upload all files to that location.
- Parameters:
source_directory (str) – The source path to upload files from; use a directory.
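Both bulk-transfer methods expect a directory path. A minimal sketch of thin wrappers that check the source directory before uploading and create the destination before downloading is shown here; whether the real methods already do either check is not stated above, so treat these guards as an assumption. `upload_directory`, `download_directory` and `SupportsBulkTransfer` are hypothetical names.

```python
from pathlib import Path
from typing import Protocol


class SupportsBulkTransfer(Protocol):
    # Hypothetical structural type matching the InteractiveDataset methods.
    async def upload_all_files(self, source_directory: str) -> None: ...
    async def download_all_files(self, destination_directory: str) -> None: ...


async def upload_directory(dataset: SupportsBulkTransfer, source_directory: str) -> None:
    """Hypothetical helper: fail fast if the source is not a directory."""
    source = Path(source_directory)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    await dataset.upload_all_files(str(source))


async def download_directory(dataset: SupportsBulkTransfer, destination_directory: str) -> Path:
    """Hypothetical helper: create the destination directory, then download."""
    destination = Path(destination_directory)
    destination.mkdir(parents=True, exist_ok=True)
    await dataset.download_all_files(str(destination))
    return destination
```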
- async version(reason: str) ProvenaInterfaces.RegistryAPI.VersionResponse [source]
Creates a new version of the current dataset.
- Parameters:
reason (str) – The reason for versioning this dataset.
- Returns:
Response of the versioning of the dataset, containing new version ID and job session ID.
- Return type:
VersionResponse
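Because the session already knows its dataset ID, versioning needs only a reason. The sketch below rejects blank reasons before calling the API and returns the new version ID; it assumes `VersionResponse` exposes that ID as `new_version_id`, which is an assumption rather than something stated above. `create_version` and `SupportsVersion` are hypothetical names.

```python
from typing import Any, Protocol


class SupportsVersion(Protocol):
    # Hypothetical structural type matching InteractiveDataset.version.
    async def version(self, reason: str) -> Any: ...


async def create_version(dataset: SupportsVersion, reason: str) -> str:
    """Hypothetical helper: require a reason, version, return the new ID."""
    if not reason.strip():
        raise ValueError("a non-empty reason is required for versioning")
    response = await dataset.version(reason=reason)
    # Assumption: VersionResponse stores the new version ID as `new_version_id`.
    return response.new_version_id
```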
- async revert_dataset_metadata(history_id: int, reason: str) ProvenaInterfaces.DataStoreAPI.StatusResponse [source]
Reverts the metadata for the current dataset to a specific historical version.
- Parameters:
history_id (int) – The identifier of the historical version to revert to.
reason (str) – The reason for reverting the dataset’s metadata.
- Returns:
Response indicating whether your dataset metadata revert request was successful.
- Return type:
StatusResponse
- async generate_read_access_credentials(console_session_required: bool) ProvenaInterfaces.DataStoreAPI.CredentialResponse [source]
Attempts to generate programmatic access keys for the storage bucket, scoped to the current dataset’s storage subdirectory.
- Parameters:
console_session_required (bool) – Specifies whether a console session URL is required.
- Returns:
AWS credentials granting read-level access to the subset of the bucket given by the dataset’s S3 location.
- Return type:
CredentialResponse
- async generate_write_access_credentials(console_session_required: bool) ProvenaInterfaces.DataStoreAPI.CredentialResponse [source]
Attempts to generate programmatic access keys for the storage bucket, scoped to the current dataset’s storage subdirectory.
- Parameters:
console_session_required (bool) – Specifies whether a console session URL is required.
- Returns:
AWS credentials granting write-level access to the subset of the bucket given by the dataset’s S3 location.
- Return type:
CredentialResponse
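Read and write credentials come from separate methods, so a caller can request only the access level a task needs. A minimal sketch of a dispatcher over the two documented coroutines; `credentials_for`, its `mode` strings and `SupportsCredentials` are hypothetical.

```python
from typing import Any, Protocol


class SupportsCredentials(Protocol):
    # Hypothetical structural type matching the two InteractiveDataset methods.
    async def generate_read_access_credentials(self, console_session_required: bool) -> Any: ...
    async def generate_write_access_credentials(self, console_session_required: bool) -> Any: ...


async def credentials_for(dataset: SupportsCredentials, mode: str, console: bool = False) -> Any:
    """Hypothetical helper: request the narrowest credentials for the task."""
    if mode == "read":
        return await dataset.generate_read_access_credentials(console_session_required=console)
    if mode == "write":
        return await dataset.generate_write_access_credentials(console_session_required=console)
    raise ValueError(f"unknown mode: {mode!r} (expected 'read' or 'write')")
```

Defaulting `console` to `False` reflects that programmatic keys are usually enough for scripted transfers; pass `True` only when a console session URL is actually needed.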
- async download_specific_file(s3_path: str, destination_directory: str) None [source]
Downloads a specific file or folder for the current dataset from an S3 bucket to the provided destination directory.
This method handles three cases:
- If s3_path is a specific file, it downloads that file directly to destination_directory.
- If s3_path is a folder without a trailing slash (e.g., ‘nested’), it downloads the folder itself and all of its contents, preserving the folder structure in destination_directory.
- If s3_path is a folder with a trailing slash (e.g., ‘nested/’), it downloads everything inside that folder, including subfolders with their structure preserved, but not the folder itself.
- Parameters:
s3_path (str) – The S3 path of the file or folder to download. Whether it names a file, a folder, or a folder with a trailing slash determines which of the cases above applies.
destination_directory (str) – The destination path to save files to; use a directory.
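The three path cases can be modelled as a pure function that maps S3 object keys to paths relative to `destination_directory`. This is a hypothetical sketch of the documented rules, not the library’s implementation; `plan_download` and the flat `keys` list are assumptions for illustration.

```python
import posixpath
from typing import Dict, List


def plan_download(keys: List[str], s3_path: str) -> Dict[str, str]:
    """Map each matching object key to a destination-relative path."""
    if s3_path in keys:
        # Case 1: a specific file lands directly in the destination directory.
        return {s3_path: posixpath.basename(s3_path)}
    if s3_path.endswith("/"):
        # Case 3: trailing slash, so copy the folder's contents but not the folder.
        return {k: k[len(s3_path):] for k in keys if k.startswith(s3_path)}
    # Case 2: no trailing slash, so keep the folder itself as the top level.
    prefix = s3_path + "/"
    folder = posixpath.basename(s3_path)
    return {k: posixpath.join(folder, k[len(prefix):]) for k in keys if k.startswith(prefix)}
```

For keys `["data/a.csv", "nested/b.txt", "nested/sub/c.txt"]`, `"nested"` keeps the `nested/` prefix locally while `"nested/"` yields `b.txt` and `sub/c.txt`.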
- class provenaclient.modules.datastore.Datastore(auth: provenaclient.modules.module_helpers.AuthManager, config: provenaclient.modules.module_helpers.Config, datastore_client: provenaclient.clients.DatastoreClient, search_client: provenaclient.clients.SearchClient)[source]
Bases: provenaclient.modules.module_helpers.ModuleService
This class interface captures that the client has an instantiated auth manager, which provides helper functions abstracted for L3 clients.
- _datastore_client: provenaclient.clients.DatastoreClient
- _search_client: provenaclient.clients.SearchClient
- review: ReviewSubModule
- _auth
- _config
- async get_health_check() provenaclient.models.HealthCheckResponse [source]
Performs a health check on the API.
- Returns:
The health check response.
- Return type:
HealthCheckResponse
- async fetch_dataset(id: str) ProvenaInterfaces.DataStoreAPI.RegistryFetchResponse [source]
Fetches a dataset from the datastore based on the provided ID.
- Parameters:
id (str) – The unique identifier of the dataset to be retrieved. For example: “10378.1/1451860”
- Returns:
A RegistryFetchResponse object containing the dataset details.
- Return type:
RegistryFetchResponse
- async mint_dataset(dataset_mint_info: ProvenaInterfaces.RegistryModels.CollectionFormat) ProvenaInterfaces.DataStoreAPI.MintResponse [source]
Creates a new dataset in the datastore with the provided dataset information.
- Parameters:
dataset_mint_info (CollectionFormat) – A structured format containing all necessary information to register a new dataset, including associations, approvals, and dataset-specific information.
- Returns:
A MintResponse object containing the details of the newly created dataset.
- Return type:
MintResponse
- async validate_dataset_metadata(metadata_payload: ProvenaInterfaces.RegistryModels.CollectionFormat) ProvenaInterfaces.DataStoreAPI.StatusResponse [source]
Validates dataset metadata without minting or publishing a dataset, which is useful for testing a metadata payload.
- Parameters:
metadata_payload (CollectionFormat) – A structured format containing all necessary information to register a new dataset, including associations, approvals, and dataset-specific information.
- Returns:
Response indicating whether your dataset metadata setup is valid and correct.
- Return type:
StatusResponse
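Since validation does not publish anything, a natural pattern is to validate a payload and only mint when it passes. The sketch assumes `StatusResponse` carries `status.success` and `status.details`; those field names are an assumption, as is the choice to raise `ValueError` on failure. `validate_then_mint` and `SupportsMint` are hypothetical names.

```python
from typing import Any, Protocol


class SupportsMint(Protocol):
    # Hypothetical structural type matching the two Datastore methods.
    async def validate_dataset_metadata(self, metadata_payload: Any) -> Any: ...
    async def mint_dataset(self, dataset_mint_info: Any) -> Any: ...


async def validate_then_mint(datastore: SupportsMint, metadata: Any) -> Any:
    """Hypothetical helper: mint only when validation succeeds."""
    result = await datastore.validate_dataset_metadata(metadata_payload=metadata)
    # Assumption: StatusResponse exposes `status.success` and `status.details`.
    if not result.status.success:
        raise ValueError(f"metadata failed validation: {result.status.details}")
    return await datastore.mint_dataset(dataset_mint_info=metadata)
```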
- async update_dataset_metadata(handle_id: str, reason: str, metadata_payload: ProvenaInterfaces.RegistryModels.CollectionFormat) ProvenaInterfaces.DataStoreAPI.UpdateMetadataResponse [source]
Updates an existing dataset’s metadata.
- Parameters:
handle_id (str) – The id of the dataset.
reason (str) – The reason for changing metadata of the dataset.
metadata_payload (CollectionFormat) – A structured format containing the complete updated dataset metadata, including associations, approvals, and dataset-specific information.
- Returns:
The updated metadata response.
- Return type:
UpdateMetadataResponse
- async revert_dataset_metadata(metadata_payload: provenaclient.models.RevertMetadata) ProvenaInterfaces.DataStoreAPI.StatusResponse [source]
Reverts the metadata for a dataset to a specific historical version.
- Parameters:
metadata_payload (RevertMetadata) – The revert request, which is passed through to the registry API and requires the dataset ID, the history ID and a reason for reverting.
- Returns:
Response indicating whether your dataset metadata revert request was successful.
- Return type:
StatusResponse
- async version_dataset(version_request: ProvenaInterfaces.RegistryAPI.VersionRequest) ProvenaInterfaces.RegistryAPI.VersionResponse [source]
Creates a new version of the dataset with the specified ID.
- Parameters:
version_request (VersionRequest) – The request which includes the item ID and reason for versioning.
- Returns:
Response of the versioning of the dataset, containing new version ID and job session ID.
- Return type:
VersionResponse
- async for_all_datasets(list_dataset_request: ProvenaInterfaces.RegistryAPI.NoFilterSubtypeListRequest, total_limit: provenaclient.utils.exceptions.Optional[int] = None) AsyncGenerator[ProvenaInterfaces.DataStoreAPI.ItemDataset, None] [source]
Fetches all datasets in the datastore, following the sorting criteria, pagination key and page size in the provided request.
- Parameters:
list_dataset_request (NoFilterSubtypeListRequest) – A request object configured with sorting options, pagination keys, and page size that defines how datasets are queried from the datastore.
total_limit (Optional[int], optional) – A maximum number of datasets to fetch. If specified, the generator will stop yielding datasets once this limit is reached. If None, it will fetch datasets until there are no more to fetch.
- Returns:
An asynchronous generator yielding ItemDataset objects, each representing an individual dataset from the datastore.
- Return type:
AsyncGenerator[ItemDataset, None]
- Yields:
ItemDataset – An individual dataset from the datastore.
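The paging behaviour described above (follow the pagination key until it runs out, stopping early once `total_limit` items have been yielded) can be sketched as a generic async generator. `fetch_page` is a hypothetical callable returning one page of items plus the next pagination key (or `None`); it stands in for the underlying list request and is not part of the documented API.

```python
from typing import AsyncGenerator, Awaitable, Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical page fetcher: takes a pagination key, returns (items, next_key).
PageFetcher = Callable[[Optional[str]], Awaitable[Tuple[List[T], Optional[str]]]]


async def paginate(
    fetch_page: PageFetcher[T], total_limit: Optional[int] = None
) -> AsyncGenerator[T, None]:
    """Yield items page by page, stopping at total_limit if it is given."""
    pagination_key: Optional[str] = None
    yielded = 0
    # Checking the limit before each fetch avoids requesting an unneeded page.
    while total_limit is None or yielded < total_limit:
        items, pagination_key = await fetch_page(pagination_key)
        for item in items:
            if total_limit is not None and yielded >= total_limit:
                return
            yield item
            yielded += 1
        if pagination_key is None:
            return
```

With the real client, the equivalent call would presumably be `async for item in datastore.for_all_datasets(list_request, total_limit=50): ...`, where `datastore` and `list_request` are placeholders.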
- async list_datasets(list_dataset_request: ProvenaInterfaces.RegistryAPI.NoFilterSubtypeListRequest) ProvenaInterfaces.RegistryAPI.DatasetListResponse [source]
Fetches a single page of datasets for the given list request and returns the response.
- Parameters:
list_dataset_request (NoFilterSubtypeListRequest) – A request object configured with sorting options, pagination keys, and page size that defines how datasets are queried from the datastore.
- Returns:
Response containing the requested datasets, sorted and paged as requested, along with other attributes such as total_item_counts and an optional pagination key.
- Return type:
DatasetListResponse
- async list_all_datasets(sort_criteria: provenaclient.utils.exceptions.Optional[ProvenaInterfaces.RegistryAPI.SortOptions] = None) List[ProvenaInterfaces.DataStoreAPI.ItemDataset] [source]
Fetches all datasets from the datastore, optionally sorted by the provided criteria. By default, datasets are sorted by display name.
- Parameters:
sort_criteria (Optional[SortOptions]) – An object configured with sorting options that you want when displaying all datasets within the datastore.
- Returns:
A list of all datasets in the datastore, sorted as requested.
- Return type:
List[ItemDataset]
- async generate_dataset_presigned_url(dataset_presigned_request: ProvenaInterfaces.DataStoreAPI.PresignedURLRequest) ProvenaInterfaces.DataStoreAPI.PresignedURLResponse [source]
Generates a presigned URL for a file in an existing dataset.
- Parameters:
dataset_presigned_request (PresignedURLRequest) – Contains the dataset ID, the file path, and the expiry duration of the URL.
- Returns:
A response containing the presigned URL.
- Return type:
PresignedURLResponse
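A presigned URL can be fetched with any plain HTTP client, without AWS credentials. The sketch below streams it to disk with the standard library, running the blocking download in a worker thread via `asyncio.to_thread` (Python 3.9+). It assumes `PresignedURLResponse` exposes the URL as `presigned_url`, which is an assumption; `download_presigned`, `_copy_url_to_file` and `SupportsPresign` are hypothetical names.

```python
import asyncio
import shutil
import urllib.request
from typing import Any, Protocol


class SupportsPresign(Protocol):
    # Hypothetical structural type matching Datastore.generate_dataset_presigned_url.
    async def generate_dataset_presigned_url(self, dataset_presigned_request: Any) -> Any: ...


def _copy_url_to_file(url: str, destination: str) -> None:
    # Blocking download; streamed in chunks rather than read into memory.
    with urllib.request.urlopen(url) as source, open(destination, "wb") as target:
        shutil.copyfileobj(source, target)


async def download_presigned(datastore: SupportsPresign, request: Any, destination: str) -> str:
    """Hypothetical helper: presign a file URL, then stream it to disk."""
    response = await datastore.generate_dataset_presigned_url(dataset_presigned_request=request)
    # Assumption: PresignedURLResponse stores the URL as `presigned_url`.
    await asyncio.to_thread(_copy_url_to_file, response.presigned_url, destination)
    return destination
```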
- async generate_read_access_credentials(credentials: ProvenaInterfaces.DataStoreAPI.CredentialsRequest) ProvenaInterfaces.DataStoreAPI.CredentialResponse [source]
Given an S3 location, attempts to generate programmatic access keys for the storage bucket at that subdirectory.
- Parameters:
credentials (CredentialsRequest) – Contains the dataset ID and a boolean flag indicating whether a console session URL is required.
- Returns:
AWS credentials granting read-level access to the subset of the bucket given by the requested S3 location.
- Return type:
CredentialResponse
- async generate_write_access_credentials(credentials: ProvenaInterfaces.DataStoreAPI.CredentialsRequest) ProvenaInterfaces.DataStoreAPI.CredentialResponse [source]
Given an S3 location, attempts to generate programmatic access keys for the storage bucket at that subdirectory.
- Parameters:
credentials (CredentialsRequest) – Contains the dataset ID and a boolean flag indicating whether a console session URL is required.
- Returns:
AWS credentials granting write-level access to the subset of the bucket given by the requested S3 location.
- Return type:
CredentialResponse
- async search_datasets(query: str, limit: int = DEFAULT_SEARCH_LIMIT) provenaclient.models.LoadedSearchResponse [source]
Utilises the L2 search client to search for datasets matching the specified query.
Loads each dataset in the result payload from the data store and sorts the results into successfully loaded items, authorisation failures, and other exceptions.
- Parameters:
query (str) – The query to make.
limit (int, optional) – The result count limit. Defaults to DEFAULT_SEARCH_LIMIT.
- Returns:
The loaded items, including any errors.
- Return type:
LoadedSearchResponse
- async interactive_dataset(dataset_id: str) InteractiveDataset [source]
Creates an interactive “session” with a dataset, allowing you to perform further operations without re-supplying the dataset ID or constructing the objects that other methods require.
- Parameters:
dataset_id (str) – The unique identifier of the dataset to be retrieved. For example: “10378.1/1451860”
- Returns:
An InteractiveDataset instance for performing various operations on the provided dataset.
- Return type:
InteractiveDataset
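A typical session opens the dataset once and then reuses it, so later calls need no dataset ID. This minimal sketch uses only documented methods (`interactive_dataset`, `fetch_dataset`, `download_all_files`); `fetch_and_download` and `SupportsInteractive` are hypothetical names so the example runs without provenaclient.

```python
from typing import Any, Protocol


class SupportsInteractive(Protocol):
    # Hypothetical structural type matching Datastore.interactive_dataset.
    async def interactive_dataset(self, dataset_id: str) -> Any: ...


async def fetch_and_download(datastore: SupportsInteractive, dataset_id: str, destination: str) -> Any:
    """Hypothetical helper: open a session once, then reuse it."""
    # The session remembers dataset_id, so the calls below take no ID argument.
    dataset = await datastore.interactive_dataset(dataset_id=dataset_id)
    details = await dataset.fetch_dataset()
    await dataset.download_all_files(destination_directory=destination)
    return details
```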