bkbit.utils.get_ncbi_taxonomy module

This module downloads a zip archive of taxonomic data from a given URL, extracts and parses the content of the ‘names.dmp’ file in memory, and saves the parsed data as JSON files. It has three main functions:

download_and_extract_zip_in_memory(url):
Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.
parse_dmp_content(dmp_content):
Parses the content of a DMP file and extracts taxonomic information into dictionaries.
process_and_save_taxdmp_in_memory(url, output_dir):
Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.

Usage: The module can be run as a standalone program. Modify the URL and output directory as needed.

bkbit.utils.get_ncbi_taxonomy.download_and_extract_zip_in_memory(url)

Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.

Parameters: url (str) – The URL of the zip file to download.
Returns: The content of the ‘names.dmp’ file as a string.
Return type: str
Raises: requests.exceptions.HTTPError – If the file download fails with a non-200 status code.
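
The in-memory extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual source: the helper names `download_zip_bytes` and `extract_member_from_zip_bytes` are hypothetical, and the use of `requests.get` with `raise_for_status()` is inferred from the documented `HTTPError`.

```python
import io
import zipfile


def download_zip_bytes(url):
    """Fetch the raw archive bytes (hypothetical helper; assumes `requests` is installed)."""
    import requests  # imported lazily so the extraction step has no third-party dependency

    response = requests.get(url, timeout=60)
    response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx responses
    return response.content


def extract_member_from_zip_bytes(zip_bytes, member="names.dmp"):
    """Return the text of one archive member without writing anything to disk."""
    # BytesIO lets zipfile treat the downloaded bytes as a seekable file object.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            return handle.read().decode("utf-8")
```

Splitting the download from the extraction keeps the second step usable on any bytes object, for example an archive that is already cached locally.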

bkbit.utils.get_ncbi_taxonomy.load_json(file_path)

Load JSON data from a file.

Parameters: file_path (str) – The path to the JSON file.
Returns: The loaded JSON data.
Return type: dict
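
A function with this contract could look like the sketch below. The name `load_json_sketch` and the UTF-8 encoding are assumptions made for illustration; the module's own implementation may differ.

```python
import json


def load_json_sketch(file_path):
    """Read a JSON file and return the decoded object (hypothetical stand-in)."""
    # Encoding is assumed to be UTF-8.
    with open(file_path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```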

bkbit.utils.get_ncbi_taxonomy.parse_dmp_content(dmp_content)

Parses the content of a DMP file and extracts taxonomic information.

Parameters:

dmp_content (str) – The content of the DMP file.

Returns:

A tuple containing three dictionaries:

taxid_to_scientific_name: A dictionary mapping taxonomic IDs to scientific names.
taxid_to_common_name: A dictionary mapping taxonomic IDs to common names.
scientific_name_to_taxid: A dictionary mapping scientific names to taxonomic IDs.

Return type:

tuple
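
The parsing step can be sketched as shown below. In the NCBI taxdump format, each row of ‘names.dmp’ holds four fields (tax ID, name, unique name, name class) separated by a tab, a pipe, and a tab, and each row ends with a tab and a pipe. The function name `parse_names_dmp` is hypothetical, and two choices here are assumptions rather than documented behavior: both the "common name" and "genbank common name" classes count as common names, and the common names for each taxon are collected in a list.

```python
from collections import defaultdict

FIELD_SEPARATOR = "\t|\t"
ROW_TERMINATOR = "\t|"
# Assumption: both of these name classes are treated as common names.
COMMON_NAME_CLASSES = ("common name", "genbank common name")


def parse_names_dmp(dmp_content):
    """Build three lookup dictionaries from the text of a names.dmp file."""
    taxid_to_scientific_name = {}
    taxid_to_common_name = defaultdict(list)
    scientific_name_to_taxid = {}

    for raw_line in dmp_content.splitlines():
        if not raw_line.strip():
            continue  # skip blank lines
        line = raw_line
        if line.endswith(ROW_TERMINATOR):
            line = line[: -len(ROW_TERMINATOR)]  # drop the trailing tab and pipe
        fields = line.split(FIELD_SEPARATOR)
        if len(fields) < 4:
            continue  # skip malformed rows
        tax_id, name, _unique_name, name_class = fields[:4]

        if name_class == "scientific name":
            taxid_to_scientific_name[tax_id] = name
            scientific_name_to_taxid[name] = tax_id
        elif name_class in COMMON_NAME_CLASSES:
            taxid_to_common_name[tax_id].append(name)

    return (
        taxid_to_scientific_name,
        dict(taxid_to_common_name),
        scientific_name_to_taxid,
    )
```

Tax IDs stay as strings in this sketch, so they can be used as JSON object keys without conversion.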

bkbit.utils.get_ncbi_taxonomy.process_and_save_taxdmp_in_memory(url, output_dir)

Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.

Parameters:

url (str) – The URL of the taxdump file to download and process.
output_dir (str) – The directory where the parsed data will be saved.

Returns:

None
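
The final save step, one JSON file per dictionary, can be sketched as follows. The helper name `save_dicts_as_json` and the way file names are derived from dictionary keys are hypothetical; the file names the module actually writes are not stated here.

```python
import json
import os


def save_dicts_as_json(named_dicts, output_dir):
    """Write each dictionary in `named_dicts` to `<output_dir>/<name>.json`."""
    os.makedirs(output_dir, exist_ok=True)  # create the directory if it is missing
    written_paths = []
    for name, data in named_dicts.items():
        path = os.path.join(output_dir, f"{name}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        written_paths.append(path)
    return written_paths
```

A caller would pass the three parsed dictionaries under names of its choosing, for example `{"taxid_to_scientific_name": ..., "taxid_to_common_name": ..., "scientific_name_to_taxid": ...}`.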