provenaclient.modules.submodules.datastore_io_submodule

Created Date: Tuesday June 18th 2024 +1000
Author: Peter Baker
Last Modified: Tuesday June 18th 2024 12:56:41 pm +1000
Modified By: Peter Baker
Description: Datastore file IO submodule, including file upload and download helpers.

HISTORY:
Date       | By             | Comments
22-08-2024 | Parth Kulkarni | Implemented methods to download specific files/directories and a helper function to create an S3 path.
18-06-2024 | Peter Baker    | First implementation, including the download_all_files and upload_all_files methods.

Classes

AccessEnum

Access type (read or write) used when creating an S3 path for a dataset.

IOSubModule

Captures that the client has an instantiated auth manager, enabling helper functions abstracted for L3 clients.

Functions

setup_s3_client(→ cloudpathlib.s3.S3Client)

Uses the datastore credentials response to generate a cloudpathlib S3 client.

print_file_info(→ None)

Pretty-prints a path, indicating whether it is a file or a directory.

Module Contents

class provenaclient.modules.submodules.datastore_io_submodule.AccessEnum

Bases: str, ProvenaInterfaces.DataStoreAPI.Enum

Enumerates the access types (read or write) used when creating an S3 path for a dataset (see _create_s3_path).

READ = 'read'
WRITE = 'write'
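Because AccessEnum derives from both str and Enum, its members can be passed wherever a plain string is expected. A minimal stand-alone equivalent might look like the following; it uses the standard library Enum directly rather than the ProvenaInterfaces re-export, so treat it as a sketch rather than the module's actual definition.

```python
from enum import Enum


# Mixing in str makes each member compare equal to its string value,
# so it serialises cleanly in JSON payloads and query parameters.
class AccessEnum(str, Enum):
    READ = "read"
    WRITE = "write"


# Members can be looked up by their string value...
requested = AccessEnum("write")
# ...and still behave like ordinary strings.
is_write = requested == "write"
```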
provenaclient.modules.submodules.datastore_io_submodule.setup_s3_client(creds: ProvenaInterfaces.DataStoreAPI.CredentialResponse) → cloudpathlib.s3.S3Client

Uses the datastore credentials response to generate a cloudpathlib S3 client authenticated with those credentials.

Parameters:

creds (CredentialResponse) – The data store credentials response

Returns:

The S3 client, ready to use

Return type:

s3.S3Client
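A sketch of how credentials could be mapped onto a cloudpathlib client. The AWSCredentials dataclass and its field names are hypothetical stand-ins for the contents of CredentialResponse; the S3Client keyword arguments (aws_access_key_id, aws_secret_access_key, aws_session_token) are part of cloudpathlib's public constructor. The cloudpathlib import is deferred so the mapping logic can be used without the library installed.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING, Dict

if TYPE_CHECKING:  # only needed by type checkers
    from cloudpathlib import S3Client


# Hypothetical stand-in for the temporary AWS credentials carried by a
# CredentialResponse; the real model's field names may differ.
@dataclass
class AWSCredentials:
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_session_token: str


def creds_to_client_kwargs(creds: AWSCredentials) -> Dict[str, str]:
    """Map credentials onto cloudpathlib S3Client keyword arguments."""
    return {
        "aws_access_key_id": creds.aws_access_key_id,
        "aws_secret_access_key": creds.aws_secret_access_key,
        "aws_session_token": creds.aws_session_token,
    }


def setup_s3_client(creds: AWSCredentials) -> "S3Client":
    """Build an authenticated S3 client (requires the cloudpathlib[s3] extra)."""
    from cloudpathlib import S3Client

    return S3Client(**creds_to_client_kwargs(creds))
```

If the returned credentials are temporary (STS) credentials, the session token must be passed alongside the key pair or S3 requests will be rejected.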

provenaclient.modules.submodules.datastore_io_submodule.print_file_info(file: cloudpathlib.s3.S3Path) → None

Pretty-prints a path, indicating whether it is a file or a directory. The path is a cloudpathlib s3.S3Path.

Parameters:

file (s3.S3Path) – The file to print
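A minimal sketch of this kind of pretty-printer. The "Directory:"/"File:" output format is an assumption, not the module's actual output. Because cloudpathlib's S3Path mirrors pathlib's is_dir(), the same function works on local pathlib.Path objects, which makes it easy to try out.

```python
from typing import Protocol


class HasIsDir(Protocol):
    """Anything with a pathlib-style is_dir(), e.g. pathlib.Path or S3Path."""

    def is_dir(self) -> bool: ...


def format_file_info(path: HasIsDir) -> str:
    # Label each entry so mixed file/directory listings are easy to scan.
    kind = "Directory" if path.is_dir() else "File"
    return f"{kind}: {path}"


def print_file_info(path: HasIsDir) -> None:
    print(format_file_info(path))
```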

class provenaclient.modules.submodules.datastore_io_submodule.IOSubModule(auth: provenaclient.modules.module_helpers.AuthManager, config: provenaclient.modules.module_helpers.Config, datastore_client: provenaclient.clients.DatastoreClient)

Bases: provenaclient.modules.module_helpers.ModuleService

This class interface captures that the client has an instantiated auth manager, which enables helper functions abstracted for L3 clients.

_datastore_client: provenaclient.clients.DatastoreClient
_auth
_config
async _create_s3_path(dataset_id: str, access_type: AccessEnum) → cloudpathlib.S3Path

Creates an S3Path pointing at the dataset's storage location, using the dataset ID and the access type (read or write).

Parameters:
  • dataset_id (str) – The ID of the dataset - ensure you have the access level given by access_type.

  • access_type (AccessEnum) – The access type required (Read or Write)

Returns:

An S3Path instance that represents a path in S3 with filesystem path semantics.

Return type:

S3Path
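A sketch of the path-building step, assuming the storage location is available as a bucket name and key prefix (hypothetical inputs; the real method derives the location from the dataset ID and binds the path to a client created from read or write credentials). S3Path(uri, client=...) is cloudpathlib's standard constructor.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed by type checkers
    from cloudpathlib import S3Client, S3Path


def build_s3_uri(bucket_name: str, prefix: str) -> str:
    """Join a bucket name and key prefix into an s3:// URI ending in '/'."""
    cleaned = prefix.strip("/")
    return f"s3://{bucket_name}/{cleaned}/" if cleaned else f"s3://{bucket_name}/"


def create_s3_path(bucket_name: str, prefix: str, client: "S3Client") -> "S3Path":
    """Wrap the URI in an S3Path bound to an already-authenticated client."""
    from cloudpathlib import S3Path

    return S3Path(build_s3_uri(bucket_name, prefix), client=client)
```

Binding the client explicitly, rather than relying on a default client, keeps read-scoped and write-scoped credentials from being mixed up across datasets.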

async download_all_files(destination_directory: str, dataset_id: str) → None

Downloads all files to the destination path for a given dataset id.

  • Fetches the dataset info

  • Fetches the credentials

  • Uses cloudpathlib to download all files to the specified location

Parameters:
  • destination_directory (str) – The destination path to save files to - use a directory

  • dataset_id (str) – The ID of the dataset to download files for - ensure you have read access
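The three steps above can be sketched as follows. The resolve_dataset_path callable is a hypothetical hook standing in for "fetch info" and "fetch creds"; it is assumed to return the dataset's root S3Path bound to a read-access client. CloudPath.download_to is cloudpathlib's method for copying a file or directory to local disk.

```python
from pathlib import Path
from typing import Awaitable, Callable, Protocol


class Downloadable(Protocol):
    """Matches cloudpathlib's CloudPath.download_to(destination) -> Path."""

    def download_to(self, destination: Path) -> Path: ...


async def download_all_files(
    destination_directory: str,
    dataset_id: str,
    resolve_dataset_path: Callable[[str], Awaitable[Downloadable]],
) -> Path:
    # Hypothetical hook: fetch dataset info + read creds, return the root path.
    dataset_root = await resolve_dataset_path(dataset_id)

    # Make sure the local target directory exists before copying into it.
    destination = Path(destination_directory)
    destination.mkdir(parents=True, exist_ok=True)

    # For a directory, download_to copies its contents recursively.
    return dataset_root.download_to(destination)
```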

async list_all_files(dataset_id: str, print_list: bool = False) → ProvenaInterfaces.DataStoreAPI.List[cloudpathlib.s3.S3Path]

Lists all files stored in the given dataset by ID.

  • Fetches the dataset info

  • Fetches the credentials

  • Uses cloudpathlib to list all files in the dataset's storage location

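Parameters:
  • dataset_id (str) – The ID of the dataset to list files for - ensure you have read access

  • print_list (bool) – If True, pretty-prints each listed file (defaults to False)

Returns:

The S3Path objects found in the dataset's storage location.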
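A sketch of the listing step once the dataset root path is known (the info and credential fetches are omitted, and taking a root path instead of a dataset ID is a simplification). It relies only on rglob() and is_dir(), which cloudpathlib's S3Path shares with pathlib.Path, so it runs unchanged on local directories. Whether the real method also returns directory entries is an assumption here.

```python
from typing import Any, List


def list_all_files(dataset_root: Any, print_list: bool = False) -> List[Any]:
    """Recursively collect every entry under dataset_root.

    dataset_root can be any pathlib-style path supporting rglob(),
    such as pathlib.Path or a cloudpathlib S3Path.
    """
    entries = list(dataset_root.rglob("*"))
    if print_list:
        for entry in entries:
            # Same file/directory labelling idea as print_file_info.
            kind = "Directory" if entry.is_dir() else "File"
            print(f"{kind}: {entry}")
    return entries
```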

async upload_all_files(source_directory: str, dataset_id: str) → None

Uploads all files in the source path to the specified dataset id’s storage location.

  • Fetches the dataset info

  • Fetches the credentials

  • Uses cloudpathlib to upload all files to the dataset's storage location

Parameters:
  • source_directory (str) – The source path to upload files from - use a directory

  • dataset_id (str) – The ID of the dataset to upload files to - ensure you have write access
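The upload path mirrors the download sketch. Here resolve_dataset_path is again a hypothetical hook, assumed to return the dataset root bound to a write-access client; CloudPath.upload_from is cloudpathlib's method for uploading a local file or directory. Validating the source first is a design choice of this sketch, so a bad path fails before any credentials are requested.

```python
from pathlib import Path
from typing import Awaitable, Callable, Protocol


class Uploadable(Protocol):
    """Matches cloudpathlib's CloudPath.upload_from(source)."""

    def upload_from(self, source: Path) -> object: ...


async def upload_all_files(
    source_directory: str,
    dataset_id: str,
    resolve_dataset_path: Callable[[str], Awaitable[Uploadable]],
) -> None:
    source = Path(source_directory)
    # Fail early with a clear message rather than partway through an upload.
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    # Hypothetical hook: fetch dataset info + write creds, return the root path.
    dataset_root = await resolve_dataset_path(dataset_id)

    # For a directory source, upload_from copies its contents recursively.
    dataset_root.upload_from(source)
```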

async download_specific_file(dataset_id: str, s3_path: str, destination_directory: str) → None

Downloads a specific file or folder from an S3 bucket to a provided destination path.

This method handles the following cases:

  • If s3_path is a specific file, it downloads that file directly to destination_directory.

  • If s3_path is a folder without a trailing slash, it downloads the entire folder and its contents, preserving the folder structure in destination_directory.

  • If s3_path is a folder with a trailing slash, it downloads all contents (including subfolders) within that folder, but not the folder itself, to destination_directory.

Parameters:
  • dataset_id (str) – The ID of the dataset that contains the files or folders to download from S3.

  • s3_path (str) – The S3 path of the file or folder to download. A specific file downloads just that file. A folder without a trailing slash (e.g., 'nested') downloads the entire folder and all its contents, preserving the structure. A folder with a trailing slash (e.g., 'nested/') downloads all contents within that folder, including subfolders, but not the folder itself.

  • destination_directory (str) – The destination path to save files to - use a directory.
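The three s3_path cases reduce to one decision about where the download lands locally. A sketch of that decision, assuming the caller already knows whether s3_path refers to a folder (for example via S3Path.is_dir()). The helper name resolve_local_target is hypothetical, and recreating only the last path component for an un-slashed folder is an assumption about how "preserving the folder structure" is applied.

```python
from pathlib import Path, PurePosixPath


def resolve_local_target(s3_path: str, destination_directory: str, is_dir: bool) -> Path:
    """Decide where the file or folder at s3_path should land locally.

    - file: placed directly in destination_directory
    - folder without trailing slash ('nested'): recreated as destination/nested
    - folder with trailing slash ('nested/'): contents copied into destination
    """
    destination = Path(destination_directory)
    if is_dir and s3_path.endswith("/"):
        # Trailing slash: copy the folder's contents, not the folder itself.
        return destination
    # Otherwise keep the last key component ('nested', 'data.csv').
    return destination / PurePosixPath(s3_path).name
```

The resulting path could then be handed to CloudPath.download_to on the S3Path for s3_path.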