bkbit.data_translators.library_generation_translator module

Module for parsing and processing specimen data using BICAN models and NIMP API endpoints. This module provides functionality to:

  1. Parse nhash IDs for specimens from the NIMP API, either in a top-down (descendants) or bottom-up (ancestors) fashion.

  2. Generate BICAN objects based on the parsed specimen data.

  3. Serialize the extracted information into JSON-LD format for further use.

  4. Check if values belong to a specific enumeration set.

Classes:

SpecimenPortal: A class responsible for handling the parsing and generation of BICAN objects for specimen data.

Functions:

get_field_type: Determines if an annotation is multivalued and returns the field type.

generate_bican_object: Generates a BICAN object based on the provided data and parent relationships.

parse_nhash_id_bottom_up: Parses ancestors of the provided nhash ID, generating BICAN objects.

parse_nhash_id_top_down: Parses descendants of the provided nhash ID, generating BICAN objects.

serialize_to_jsonld: Serializes the generated BICAN objects into JSON-LD format.

specimen2jsonld: Command-line function for parsing nhash IDs and serializing the result into JSON-LD format.

Usage:

The module can be run as a standalone script using the command-line interface with the appropriate arguments and options:

`python specimen_portal.py <nhash_id> [-d]`

This script will parse the nhash ID and serialize the generated data into JSON-LD format, with the option to parse descendants or ancestors.

Example:

`python specimen_portal.py "DO-GICE7463" -d`

This will parse the descendants of the specimen identified by the nhash ID and save the result as a JSON-LD file.

Dependencies:
  • json

  • os

  • click

  • tqdm

  • multiprocessing.Pool

  • bkbit.models.library_generation

  • bkbit.utils.nimp_api_endpoints (get_data, get_ancestors, get_descendants)

class bkbit.data_translators.library_generation_translator.SpecimenPortal(jwt_token)[source]

Bases: object

The SpecimenPortal class is responsible for parsing and generating BICAN objects for specimen data by traversing through nodes (ancestors or descendants) based on nhash IDs. It provides utilities to recursively parse node relationships and convert the data into a JSON-LD format.

jwt_token

The authentication token used to access the specimen data.

Type:

str

generated_objects

A dictionary that stores generated BICAN objects, keyed by nhash IDs.

Type:

dict

get_field_type(annotation, collected_annotations=None)[source]

Static method that determines whether a field is multivalued and returns the type of the field.

parse_nhash_id_bottom_up(nhash_id)[source]

Parses ancestors of the provided nhash_id, starting from the node and moving upwards to the root (Donor).

parse_nhash_id_top_down(nhash_id)[source]

Parses descendants of the provided nhash_id, starting from the node and moving downwards to the leaves (Library Pool).

generate_bican_object(data, was_derived_from=None)[source]

Generates a BICAN object based on the provided data and parent relationships.

serialize_to_jsonld(exclude_none=True, exclude_unset=False)[source]

Serializes the generated objects into JSON-LD format for further use or storage.

parse_single_nashid(jwt_token, nhash_id, descendants, save_to_file=False)

Parses a single nhash ID and optionally saves the result to a JSON-LD file.

parse_multiple_nashids(jwt_token, file_path, descendants)

Parses multiple nhash IDs from a file and saves the results to JSON-LD files.

Static Methods:
__check_valueset_membership(enum_type, nimp_value):

Checks if a given value belongs to a specified enum.
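A value-set check of this kind can be sketched with the standard enum module. The enum AgeUnit and the helper check_valueset_membership here are hypothetical stand-ins, not the class's private method; the sketch assumes a NIMP value matches when it equals the value of one of the enum's members.

```python
from enum import Enum
from typing import Optional


class AgeUnit(Enum):
    """Hypothetical value set; not one of the BICAN model enums."""

    DAYS = "days"
    YEARS = "years"


def check_valueset_membership(enum_type, nimp_value) -> Optional[Enum]:
    """Return the enum member whose value equals nimp_value, or None."""
    for member in enum_type:
        if member.value == nimp_value:
            return member
    return None
```

Returning the matching member rather than a bare boolean lets the caller store the enum value directly; whether the real method does this is not stated in this documentation.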

classmethod generate_bican_object(data, was_derived_from: list[str] | None = None)[source]

Generates a BICAN object based on the provided data.

Parameters:
  • data (dict) – The data retrieved from the NIMP portal.

  • was_derived_from (list) – A list of parent nhash IDs.

Returns:

The generated BICAN object.

Raises:

None.
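The relationship between data and was_derived_from can be illustrated with a small sketch. SpecimenRecord and generate_object are hypothetical names, and the "id" and "name" keys are assumed for illustration only; the real method builds classes from bkbit.models.library_generation, whose fields are not listed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """Hypothetical stand-in for a BICAN model class."""

    id: str
    name: Optional[str] = None
    was_derived_from: List[str] = field(default_factory=list)


def generate_object(data: dict, was_derived_from: Optional[List[str]] = None) -> SpecimenRecord:
    """Build one record from a data dict and an optional list of parent IDs."""
    return SpecimenRecord(
        id=data["id"],  # assumed key, for illustration only
        name=data.get("name"),
        # Copy the list so later changes by the caller do not leak into the record.
        was_derived_from=list(was_derived_from or []),
    )
```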

static get_field_type(annotation, collected_annotations=None)[source]

Determines the field type based on the provided annotation.

Parameters:
  • annotation – The annotation to determine the field type for.

  • collected_annotations – A dictionary to collect annotations encountered during recursion.

Returns:

A tuple containing a boolean indicating if the field is multivalued and the selected field type.
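One way to derive such a tuple is to unwrap the annotation with typing.get_origin and typing.get_args. This is a minimal sketch under stated assumptions, not the method's actual code: it treats list, set and tuple as multivalued containers, picks the first non-None inner type, and records bare types in collected_annotations under their name.

```python
from typing import List, Optional, get_args, get_origin


def get_field_type(annotation, collected_annotations=None):
    """Return (is_multivalued, selected_type) for a type annotation."""
    if collected_annotations is None:
        collected_annotations = {}
    origin = get_origin(annotation)
    if origin is None:
        # A bare type such as str or int: record it and stop recursing.
        name = getattr(annotation, "__name__", str(annotation))
        collected_annotations[name] = annotation
        return False, annotation
    multivalued = origin in (list, set, tuple)
    selected = None
    for arg in get_args(annotation):
        if arg is type(None):
            continue  # Optional[...] contributes no field type
        inner_multivalued, inner_type = get_field_type(arg, collected_annotations)
        multivalued = multivalued or inner_multivalued
        if selected is None:
            selected = inner_type
    return multivalued, selected
```

With this sketch, Optional[List[str]] yields (True, str) and Optional[int] yields (False, int).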

parse_nhash_id_bottom_up(nhash_id: str)[source]

Parses the given nhash_id from bottom to top, retrieving ancestors and generating respective BICAN objects.

Parameters:

nhash_id (str) – The nhash_id to parse.

Returns:

None

Raises:

ValueError – If there is an error retrieving ancestors or generating objects.
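The bottom-up walk can be sketched without the NIMP API by using an in-memory parent map. PARENTS, parse_bottom_up and every ID here are hypothetical; the real method fetches ancestors through bkbit.utils.nimp_api_endpoints and builds BICAN objects rather than plain dicts.

```python
from typing import Dict, List

# Hypothetical parent links standing in for a NIMP API response.
PARENTS: Dict[str, List[str]] = {
    "LP-1": ["LI-1", "LI-2"],
    "LI-1": ["TI-1"],
    "LI-2": ["TI-1"],
    "TI-1": ["DO-1"],
    "DO-1": [],
}


def parse_bottom_up(nhash_id: str, parents: Dict[str, List[str]]) -> Dict[str, dict]:
    """Walk from nhash_id up to the root, creating one record per node."""
    if nhash_id not in parents:
        raise ValueError(f"Unknown nhash ID: {nhash_id}")
    generated: Dict[str, dict] = {}
    stack = [nhash_id]
    while stack:
        current = stack.pop()
        if current in generated:
            continue  # already reached through another branch
        node_parents = parents.get(current, [])
        generated[current] = {"id": current, "was_derived_from": list(node_parents)}
        stack.extend(node_parents)
    return generated
```

Keying the results by ID mirrors the generated_objects dictionary described for the class, so a shared ancestor is generated only once.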

parse_nhash_id_top_down(nhash_id: str)[source]

Parses the given nhash_id in a top-down manner, traversing the nodes all the way to the leaves (Library Pool).

Parameters:

nhash_id (str) – The nhash_id to be parsed.

Returns:

None

Raises:
  • ValueError – If an error occurs while retrieving descendants.

  • Exception – If an unexpected error occurs while retrieving descendants or generating objects.
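A top-down pass differs in that a node's parents are only discovered as the traversal reaches it. The sketch here uses a breadth-first walk over a hypothetical child map; CHILDREN, parse_top_down and the IDs are illustrative assumptions, and the real method queries the NIMP API for descendants.

```python
from collections import deque
from typing import Dict, List

# Hypothetical child links standing in for a NIMP API response.
CHILDREN: Dict[str, List[str]] = {
    "DO-1": ["TI-1"],
    "TI-1": ["LI-1", "LI-2"],
    "LI-1": ["LP-1"],
    "LI-2": ["LP-1"],
    "LP-1": [],
}


def parse_top_down(nhash_id: str, children: Dict[str, List[str]]) -> Dict[str, dict]:
    """Walk from nhash_id down to the leaves, linking each child to its parents."""
    if nhash_id not in children:
        raise ValueError(f"Unknown nhash ID: {nhash_id}")
    generated = {nhash_id: {"id": nhash_id, "was_derived_from": []}}
    queue = deque([nhash_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in generated:
                generated[child] = {"id": child, "was_derived_from": []}
                queue.append(child)  # enqueue each node only once
            # A child reached from several parents accumulates all of them.
            generated[child]["was_derived_from"].append(current)
    return generated
```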

serialize_to_jsonld(exclude_none: bool = True, exclude_unset: bool = False)[source]

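Serializes the generated objects into JSON-LD format.

Parameters:
  • exclude_none (bool) – Whether to omit fields whose value is None. Defaults to True.

  • exclude_unset (bool) – Whether to omit fields that were never explicitly set. Defaults to False.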

Returns:

None
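For orientation, a JSON-LD document of this kind usually pairs an "@context" with an "@graph" list of objects. The sketch here is an assumption-laden illustration, not the method's code: CONTEXT_URL is a placeholder, the objects are plain dicts, and the function returns a string so that it can be inspected directly.

```python
import json
from typing import Dict

# Placeholder context URL; the context used by the real module is not given here.
CONTEXT_URL = "https://example.org/library_generation.context.jsonld"


def serialize_to_jsonld(generated_objects: Dict[str, dict], exclude_none: bool = True) -> str:
    """Wrap the generated objects in a JSON-LD envelope and return it as text."""
    graph = []
    for obj in generated_objects.values():
        if exclude_none:
            # Drop fields with no value so the output stays compact.
            obj = {key: value for key, value in obj.items() if value is not None}
        graph.append(obj)
    return json.dumps({"@context": CONTEXT_URL, "@graph": graph}, indent=2)
```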

bkbit.data_translators.library_generation_translator.parse_multiple_nashids(jwt_token, file_path, descendants)[source]

Parses multiple nhash IDs from a file.

Parameters:
  • jwt_token (str) – The JWT token.

  • file_path (str) – The path to the file containing the nhash IDs.

  • descendants (bool) – The direction of parsing. True for descendants, False for ancestors.

Returns:

A list of results from parsing each nhash ID.

Return type:

list
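The batch pattern can be sketched as follows, assuming the input file holds one nhash ID per line (the file format is not specified in this documentation). The module lists multiprocessing.Pool among its dependencies; this sketch swaps in multiprocessing.pool.ThreadPool, which has the same map interface, so that it runs anywhere without pickling concerns. read_nhash_ids, parse_one and parse_many are hypothetical names.

```python
from multiprocessing.pool import ThreadPool
from typing import List


def read_nhash_ids(file_path: str) -> List[str]:
    """Read one nhash ID per line, skipping blank lines (assumed file format)."""
    with open(file_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def parse_one(nhash_id: str) -> str:
    """Hypothetical placeholder for the per-ID parsing step."""
    return f"parsed {nhash_id}"


def parse_many(file_path: str, workers: int = 2) -> List[str]:
    """Parse every ID in the file concurrently; results keep the input order."""
    ids = read_nhash_ids(file_path)
    with ThreadPool(workers) as pool:
        return pool.map(parse_one, ids)
```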

bkbit.data_translators.library_generation_translator.parse_single_nashid(jwt_token, nhash_id, descendants, save_to_file=False)[source]

Parses a single nhash ID using the SpecimenPortal class.

Parameters:
  • jwt_token (str) – The JWT token for authentication.

  • nhash_id (str) – The nhash ID to parse.

  • descendants (bool) – The direction of parsing. True for descendants, False for ancestors.

  • save_to_file (bool) – Whether to save the parsed data to a file. Defaults to False.

Returns:

None

Raises:

None.