bkbit.utils.get_ncbi_taxonomy module

This script downloads a zip file of taxonomic data from a given URL, extracts the ‘names.dmp’ file and parses its content in memory, and saves the parsed data as JSON files. The script is built around three main functions:

  1. download_and_extract_zip_in_memory(url):

    Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.

  2. parse_dmp_content(dmp_content):

    Parses the content of a DMP file and extracts taxonomic information into dictionaries.

  3. process_and_save_taxdmp_in_memory(url, output_dir):

    Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.

Usage:

The script can be executed as a standalone program. Modify the URL and output directory as needed.

bkbit.utils.get_ncbi_taxonomy.download_and_extract_zip_in_memory(url)

Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.

Parameters:

url (str) – The URL of the zip file to download.

Returns:

The content of the ‘names.dmp’ file as a string.

Return type:

str

Raises:

requests.exceptions.HTTPError – If the file download fails with a non-200 status code.
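
The in-memory extraction described above can be sketched roughly as follows. This is an illustration only, not the module's actual source: the names extract_names_dmp and download_names_dmp, the 60-second timeout, and the UTF-8 decoding are all assumptions. Note that raise_for_status() raises for 4xx and 5xx responses, which may be narrower than the "non-200" check documented above.

```python
import io
import zipfile


def extract_names_dmp(zip_bytes, member="names.dmp"):
    """Return the text of `member` from a zip archive held in memory."""
    # BytesIO lets zipfile read the archive without writing it to disk.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            # Assumption: the dump file is decoded as UTF-8.
            return handle.read().decode("utf-8")


def download_names_dmp(url):
    """Download a zip archive and return the text of its names.dmp member."""
    # Imported here so extract_names_dmp stays usable without requests installed.
    import requests

    response = requests.get(url, timeout=60)  # timeout value is an assumption
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return extract_names_dmp(response.content)
```

Keeping the extraction step separate from the download makes it possible to exercise the zip handling with an archive built in memory, without any network access.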

bkbit.utils.get_ncbi_taxonomy.load_json(file_path)

Load JSON data from a file.

Parameters:

file_path (str) – The path to the JSON file.

Returns:

The loaded JSON data.

Return type:

dict
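
A helper of this kind is typically a thin wrapper around the standard json module. The sketch below uses a hypothetical name, load_json_sketch, and assumes the file is UTF-8 encoded; the module's own implementation may differ.

```python
import json


def load_json_sketch(file_path):
    """Read a JSON file and return the decoded object."""
    # Assumption: the file is UTF-8 encoded.
    with open(file_path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```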

bkbit.utils.get_ncbi_taxonomy.parse_dmp_content(dmp_content)

Parses the content of a DMP file and extracts taxonomic information.

Parameters:

dmp_content (str) – The content of the DMP file.

Returns:

A tuple containing three dictionaries:
  • taxid_to_scientific_name: A dictionary mapping taxonomic IDs to scientific names.

  • taxid_to_common_name: A dictionary mapping taxonomic IDs to common names.

  • scientific_name_to_taxid: A dictionary mapping scientific names to taxonomic IDs.

Return type:

tuple
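
To make the three returned mappings concrete, here is a minimal parsing sketch. It relies on the NCBI names.dmp layout, in which each row holds tax_id, name, unique name and name class, separated by a tab, a pipe and a tab, with a trailing tab and pipe. The function name parse_names_dmp is hypothetical, and two choices are assumptions rather than documented behaviour: treating both "common name" and "genbank common name" rows as common names, and storing common names as a list per taxonomic ID.

```python
def parse_names_dmp(dmp_content):
    """Build the three lookup dictionaries from names.dmp text."""
    taxid_to_scientific_name = {}
    taxid_to_common_name = {}
    scientific_name_to_taxid = {}

    for line in dmp_content.splitlines():
        if not line.strip():
            continue
        # Each row ends with "\t|"; drop it before splitting the fields.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        taxid, name, _unique_name, name_class = fields[:4]

        if name_class == "scientific name":
            taxid_to_scientific_name[taxid] = name
            scientific_name_to_taxid[name] = taxid
        elif name_class in ("common name", "genbank common name"):
            # Assumption: keep every common name, as a list per taxonomic ID.
            taxid_to_common_name.setdefault(taxid, []).append(name)

    return taxid_to_scientific_name, taxid_to_common_name, scientific_name_to_taxid
```

Rows with other name classes, such as synonyms or authorities, are ignored in this sketch.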

bkbit.utils.get_ncbi_taxonomy.process_and_save_taxdmp_in_memory(url, output_dir)

Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.

Parameters:
  • url (str) – The URL of the taxdump file to download and process.

  • output_dir (str) – The directory where the parsed data will be saved.

Returns:

None
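
The saving step can be sketched as writing one JSON file per mapping. The helper name save_mappings and the file naming scheme (the mapping name plus ".json") are assumptions, as the output file names are not documented here. Unlike the documented function, which returns None, this sketch returns the written paths so the result is easy to inspect.

```python
import json
import os


def save_mappings(mappings, output_dir):
    """Write each mapping to <output_dir>/<name>.json and return the paths."""
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for name, mapping in mappings.items():
        path = os.path.join(output_dir, f"{name}.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(mapping, handle, indent=2)
        paths.append(path)
    return paths
```

A caller would pass a dictionary keyed by the desired file stem, for example {"taxid_to_scientific_name": {...}}, together with the output directory.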