# Quickstart

## Initialising the Provena client

To use the Provena client, you need to provide both an `auth` object, which selects the authentication method, and a `config` object, which holds the configuration details of the Provena instance you want to interface with. Replace `my-provena.cloud` with the domain of your Provena instance and change the `realm_name` value to the appropriate Keycloak realm. The snippets in this guide use top-level `await`, so run them in a notebook or inside an `async` function.

```
# Import paths assumed from the provenaclient package layout
from provenaclient import ProvenaClient, Config
from provenaclient.auth import DeviceFlow

client_config = Config(
    domain="my-provena.cloud",
    realm_name="provena"
)
auth = DeviceFlow(config=client_config, client_id="client-tools")
client = ProvenaClient(auth=auth, config=client_config)

# Quick check that the client can reach the instance
await client.datastore.get_health_check()
```

## Finding and searching items from the Registry

### Fetching items from the Registry

You can fetch any item from the Registry using `general_fetch_item()` and the ID of the item.

```
# item_id holds the ID of the item to fetch
res = await client.registry.general_fetch_item(id=item_id)
print(f"{res.item['display_name']} {res.item['item_subtype']}")
```

Each sub-type also has an associated `fetch()`.

```
res = await client.registry.model_run.fetch(id=item_id)
```

You can use the ID of an item if you know it ahead of time, or you can list items to find it.

### Listing items in the Registry

#### General list function

You can list items from the Registry using `list_general_registry_items()`. This function expects an instance of `GeneralListRequest` and returns a generic result list.

```
from ProvenaInterfaces.RegistryAPI import GeneralListRequest

general_list_request = GeneralListRequest(
    filter_by=None,
    sort_by=None,
    pagination_key=None
)
res = await client.registry.list_general_registry_items(general_list_request)
```

#### Sub-type list function

You can also list items from the sub-type modules via `list_items()`. In this case, only registry items of that sub-type are returned. You may need to cast the returned object into the matching response model using pydantic's `parse_obj_as()`.
```
from pydantic import parse_obj_as
# Import locations below are assumed; adjust to your installed ProvenaInterfaces version
from ProvenaInterfaces.RegistryAPI import ModelRunWorkflowTemplateListResponse
from ProvenaInterfaces.RegistryModels import ItemModelRunWorkflowTemplate

# general_list_request is the GeneralListRequest built for the general list function
list_model_run_workflow = await client.registry.model_run_workflow.list_items(list_items_payload=general_list_request)
model_run_workflow_templates = parse_obj_as(ModelRunWorkflowTemplateListResponse, list_model_run_workflow)
print(f"Found {model_run_workflow_templates.total_item_count} model_run_workflow_templates")
```

Iterate over the list response like so.

```
for item in model_run_workflow_templates.items:
    m = parse_obj_as(ItemModelRunWorkflowTemplate, item)
    print(m)
```

#### Listing dataset items

Datasets are a special case. List them from the client library's `datastore` module instead.

```
from ProvenaInterfaces.RegistryModels import ItemDataset  # assumed import location

list_dataset = await client.datastore.list_all_datasets()
print(f"Found {len(list_dataset)} datasets")
for item in list_dataset:
    d = parse_obj_as(ItemDataset, item)
    print(d)
```

### Search for items

Provena provides a general search engine that searches all items in the Registry. Currently a search returns a `QueryResult` object with:

* `status` (search success or failure, with `success` and `details` attributes)
* `results` (list of results, each with `id` and `score` attributes)
* `warnings`

```
from ProvenaInterfaces.RegistryModels import ItemSubType  # assumed import location

qres = await client.search.search_registry(query="fire", limit=10, subtype_filter=ItemSubType.DATASET)
assert qres.status.success
for r in qres.results:
    print(f"{r.id} {r.score}")
```

* `subtype_filter` can be `None` if you want to search over all subtypes
* `limit` can also be `None` to remove the cap on the number of returned items

Using the returned `id` of a result, you can issue a fetch request (see above).

## Registering

Registering an item creates an object in the Registry with the required metadata/payload values and issues a unique ID (via Handle).
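Because the issued ID is a Handle, it can be turned into a resolvable link by appending it to the `https://hdl.handle.net/` resolver, which is what the print statements in this guide do. A minimal sketch of that step follows; `handle_url` is a hypothetical helper, not part of the client library, and the example ID is made up for illustration.

```python
# Base URL of the public Handle resolver used in this guide's print statements
HANDLE_RESOLVER = "https://hdl.handle.net"


def handle_url(handle_id: str) -> str:
    """Build a resolvable link for a Handle ID (hypothetical helper)."""
    return f"{HANDLE_RESOLVER}/{handle_id.strip()}"


if __name__ == "__main__":
    # "10378.1/0000000" is a made-up ID, not a real registry item
    print(handle_url("10378.1/0000000"))
```
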
See the following Provena docs section for more info on the Registry: [https://docs.provena.io/registry/](https://docs.provena.io/registry/)

### Registering datasets

Background information about registering a dataset in Provena is found here: [https://docs.provena.io/data-store/registering-a-dataset.html](https://docs.provena.io/data-store/registering-a-dataset.html)

Mint a dataset, where `metadata` is the dataset metadata payload.

```
register_response = await client.datastore.mint_dataset(dataset_mint_info=metadata)
assert register_response.status.success
print(f"Created Dataset Handle: {register_response.handle}. Access entity link: https://hdl.handle.net/{register_response.handle}")
```

The dataset may already exist, in which case you may want to version it instead of minting a new one. Find out more about versioning here: [https://docs.provena.io/versioning/versioning-overview.html](https://docs.provena.io/versioning/versioning-overview.html)

First we must find the latest version of the dataset. One way to do this is to follow the `next_version` links recursively with the following function.

```
async def latest_version_of_dataset(dataset_id):
    fetch_resp = await client.datastore.fetch_dataset(id=dataset_id)
    next_version = fetch_resp.item.versioning_info.next_version
    if next_version is None or next_version == "":
        # No newer version is linked, so assume this is the latest version
        return dataset_id
    # Otherwise keep following the chain
    return await latest_version_of_dataset(next_version)
```

Then apply that function to create a new version.

```
from ProvenaInterfaces.RegistryAPI import VersionRequest

print(f"Find the latest version of this dataset {dataset_item.id}")
latest_version = await latest_version_of_dataset(dataset_item.id)

# Use the latest_version id for the version request
version_request = VersionRequest(
    id=latest_version,
    reason="Updating dataset"
)
register_response = await client.datastore.version_dataset(version_request=version_request)
print(f"New Version of Dataset Handle: {register_response.new_version_id}. Access entity link: https://hdl.handle.net/{register_response.new_version_id}")
```

This creates a new dataset with its metadata copied over. We then update the metadata of the new version.

```
register_response = await client.datastore.update_dataset_metadata(
    handle_id=register_response.new_version_id,
    reason="Updating metadata for new version",
    metadata_payload=metadata
)
```

### Registering a model run

Build the payload.

```
from ProvenaInterfaces.ProvenanceAPI import ModelRunRecord, TemplatedDataset, DatasetType, AssociationInfo
from ProvenaInterfaces.AsyncJobAPI import JobStatus

# Building the Model Run Payload.
model_run_payload = ModelRunRecord(
    workflow_template_id=model_run_workflow_template_item.id,
    model_version=None,
    inputs=[
        TemplatedDataset(
            dataset_template_id=input_dataset_template_item.id,
            dataset_id=input_dataset_item.id,
            dataset_type=DatasetType.DATA_STORE
        )
    ],
    outputs=[
        TemplatedDataset(
            dataset_template_id=output_dataset_template_item.id,
            dataset_id=output_dataset_item.id,
            dataset_type=DatasetType.DATA_STORE
        )
    ],
    annotations=None,
    display_name="Notebook Model Run Testing",
    description="Standard Provena Model Run Example",
    study_id=None,
    associations=AssociationInfo(
        modeller_id=person_item.id,
        requesting_organisation_id=organisation_item.id
    ),
    start_time=start_time,
    end_time=end_time
)
```

Register the payload.

```
model_run_register_result = await client.prov_api.register_model_run(model_run_payload=model_run_payload)
```

```
# Check the response of the model run registration
print("Status of registration", model_run_register_result.status)
print("Job Session ID", model_run_register_result.session_id)
```

```
from pprint import pprint

# Check the job to see if it's complete. We will do this by polling the job_api
job_result = await client.job_api.await_successful_job_completion(session_id=model_run_register_result.session_id)

# Keep polling on this cell till this turns to "SUCCEEDED"
while job_result.status != JobStatus.SUCCEEDED:
    job_result = await client.job_api.await_successful_job_completion(session_id=model_run_register_result.session_id)

pprint(job_result.result)
pprint(job_result.job_type)
print()
print("Current job status:", job_result.status)
```

Inspect the result of a successful model run record registration.

```
model_run_record = job_result.result["record"]
pprint(model_run_record)
```

To understand more about what it means to register a model run, see [https://docs.provena.io/provenance/registering-model-runs/registration-process/overview.html](https://docs.provena.io/provenance/registering-model-runs/registration-process/overview.html).

### Registering other items via the Registry

Apart from Dataset and Model Run, registering registry sub-types is straightforward: instantiate the relevant DomainInfo payload, then use the sub-type module in the registry client module to create the item. The workflow below uses the "Model" sub-type, but you can substitute other sub-types such as Dataset Template, Model Run Workflow Template, Organisation, Person or Study.

First build the DomainInfo payload for the `Model` subtype using `ModelDomainInfo`.

```
from ProvenaInterfaces.RegistryModels import ModelDomainInfo

# model_item_payload is a dict holding the values for the new Model
model_payload = ModelDomainInfo(
    display_name=model_item_payload['display_name'],
    name=model_item_payload['name'],
    description=model_item_payload['description'],
    documentation_url=model_item_payload['documentation_url'],
    source_url=model_item_payload['source_url'],
    user_metadata=model_item_payload['user_metadata']
)
```

Register the `Model` subtype instance via `create_item()`.
```
model_register_result = await client.registry.model.create_item(create_item_request=model_payload)
assert model_register_result.status.success
```

Just like a `Dataset`, the item may already exist, in which case you may want to version it instead of creating a new one. Find out more about versioning here: [https://docs.provena.io/versioning/versioning-overview.html](https://docs.provena.io/versioning/versioning-overview.html)

First we must find the latest version of the item. Continuing with the `Model` example, one way to do this is:

```
async def latest_version_of_model(item_id):
    fetch_resp = await client.registry.model.fetch(id=item_id)
    next_version = fetch_resp.item.versioning_info.next_version
    if next_version is None or next_version == "":
        # No newer version is linked, so assume this is the latest version
        return item_id
    # Otherwise keep following the chain
    return await latest_version_of_model(next_version)
```

We can then apply `latest_version_of_model()` in the steps to register a new version of the `Model`.

```
from ProvenaInterfaces.RegistryAPI import VersionRequest

print(f"Find the latest version of this {model_item.id}")
latest_version = await latest_version_of_model(model_item.id)
print(f"Latest version: {latest_version}")

# Use the latest_version id for the version request
version_request = VersionRequest(
    id=latest_version,
    reason="Updating model"
)
register_response = await client.registry.model.version_item(version_request=version_request)
print(f"New Version of Model Handle: {register_response.new_version_id}. Access entity link: https://hdl.handle.net/{register_response.new_version_id}")

# Update the metadata of the new version
register_response = await client.registry.model.update(
    id=register_response.new_version_id,
    reason="Updating metadata for new version",
    domain_info=model_payload
)
```