bkbit.utils.get_ncbi_taxonomy module
This script downloads a zip file containing taxonomic data from a given URL, extracts and processes the content of the ‘names.dmp’ file in memory, and saves the parsed data into JSON files. The script includes three main functions, plus a load_json utility for reading a JSON file:
- download_and_extract_zip_in_memory(url):
Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.
- parse_dmp_content(dmp_content):
Parses the content of a DMP file and extracts taxonomic information into dictionaries.
- process_and_save_taxdmp_in_memory(url, output_dir):
Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.
- Usage:
The script can be executed as a standalone program. Modify the URL and output directory as needed.
- bkbit.utils.get_ncbi_taxonomy.download_and_extract_zip_in_memory(url)
Downloads a zip file from the given URL and extracts the content of the ‘names.dmp’ file in memory.
- Parameters:
url (str) – The URL of the zip file to download.
- Returns:
The content of the ‘names.dmp’ file as a string.
- Return type:
str
- Raises:
requests.exceptions.HTTPError – If the file download fails with a non-200 status code.
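A minimal sketch of the download-and-extract step, assuming the archive has a top-level member named exactly ‘names.dmp’ and that it is UTF-8 text. It swaps in the standard library's urllib.request for requests, so on an HTTP error status it raises urllib.error.HTTPError rather than requests.exceptions.HTTPError. The extract_names_dmp helper is hypothetical; it is split out only so the in-memory zip handling can be exercised without a network call.

```python
import io
import urllib.request
import zipfile


def extract_names_dmp(zip_bytes: bytes) -> str:
    """Hypothetical helper: return the text of 'names.dmp' from zip bytes."""
    # Wrap the raw bytes in a file-like object so zipfile reads them
    # entirely in memory, without writing anything to disk.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # ZipFile.read raises KeyError if 'names.dmp' is not in the archive.
        # Assumption: the file is UTF-8 encoded.
        return archive.read("names.dmp").decode("utf-8")


def download_and_extract_zip_in_memory(url: str) -> str:
    """Sketch: download a zip archive and return the contents of 'names.dmp'."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    # (the documented function raises requests.exceptions.HTTPError instead).
    with urllib.request.urlopen(url) as response:
        zip_bytes = response.read()
    return extract_names_dmp(zip_bytes)
```

Because extract_names_dmp takes raw bytes, it can be tried on an archive built in memory with zipfile.ZipFile(io.BytesIO(), "w") and ZipFile.writestr, with no download involved.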
- bkbit.utils.get_ncbi_taxonomy.load_json(file_path)
Load JSON data from a file.
- Parameters:
file_path (str) – The path to the JSON file.
- Returns:
The loaded JSON data.
- Return type:
dict
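A minimal sketch of what load_json plausibly does, assuming a UTF-8 file whose top-level value is a JSON object. One property of the format matters when reading back the taxonomy files: JSON object keys are always strings, so any integer taxid keys written with json.dump come back as strings such as "9606".

```python
import json


def load_json(file_path: str) -> dict:
    """Sketch: load JSON data from a file and return the parsed object."""
    # Assumption: the file is UTF-8 and its top-level value is an object.
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

If integer keys are needed after loading, convert them explicitly, e.g. {int(k): v for k, v in data.items()}.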
- bkbit.utils.get_ncbi_taxonomy.parse_dmp_content(dmp_content)
Parses the content of a DMP file and extracts taxonomic information.
- Parameters:
dmp_content (str) – The content of the DMP file.
- Returns:
- A tuple containing three dictionaries:
taxid_to_scientific_name: A dictionary mapping taxonomic IDs to scientific names.
taxid_to_common_name: A dictionary mapping taxonomic IDs to common names.
scientific_name_to_taxid: A dictionary mapping scientific names to taxonomic IDs.
- Return type:
tuple
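In NCBI's taxdump, each row of names.dmp holds four fields (tax_id, name_txt, unique name, name class), separated by "\t|\t" and terminated by "\t|". The sketch below parses that layout into the three dictionaries listed above. Its specific rules are assumptions, not necessarily what bkbit does: taxids become int, the "genbank common name" class is preferred over a plain "common name", and when several taxa share a scientific name the last taxid seen wins.

```python
# Name classes treated as common names (assumption). Other NCBI name
# classes, such as "synonym" or "authority", are ignored by this sketch.
GENBANK_COMMON = "genbank common name"
COMMON = "common name"


def parse_dmp_content(
    dmp_content: str,
) -> tuple[dict[int, str], dict[int, str], dict[str, int]]:
    """Sketch: parse names.dmp text into the three taxonomy mappings."""
    taxid_to_scientific_name: dict[int, str] = {}
    taxid_to_common_name: dict[int, str] = {}
    scientific_name_to_taxid: dict[str, int] = {}

    for line_no, line in enumerate(dmp_content.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # Remove the "\t|" row terminator, then split on the field separator.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        tax_id, name_txt, _unique_name, name_class = fields
        taxid = int(tax_id)  # assumption: integer taxid keys

        if name_class == "scientific name":
            taxid_to_scientific_name[taxid] = name_txt
            # Scientific names can be shared by several taxa (the unique-name
            # column disambiguates them); here the last taxid seen wins.
            scientific_name_to_taxid[name_txt] = taxid
        elif name_class == GENBANK_COMMON:
            # Prefer the GenBank common name, overwriting any plain one.
            taxid_to_common_name[taxid] = name_txt
        elif name_class == COMMON:
            # Keep the first plain common name unless a GenBank one appears.
            taxid_to_common_name.setdefault(taxid, name_txt)

    return taxid_to_scientific_name, taxid_to_common_name, scientific_name_to_taxid
```

Checking the exact field count, rather than silently skipping short rows, makes a truncated or differently formatted file fail loudly instead of producing incomplete dictionaries.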
- bkbit.utils.get_ncbi_taxonomy.process_and_save_taxdmp_in_memory(url, output_dir)
Downloads and processes the taxdump file from the given URL, and saves the parsed data into separate JSON files in the specified output directory.
- Parameters:
url (str) – The URL of the taxdump file to download and process.
output_dir (str) – The directory where the parsed data will be saved.
- Returns:
None
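The documented function chains the download, parse, and save steps and returns None. The save step might look like the sketch below. The helper name save_taxonomy_dicts and the output file names (one <name>.json per mapping) are hypothetical, since the documentation does not say what the real files are called.

```python
import json
import os


def save_taxonomy_dicts(dicts: dict[str, dict], output_dir: str) -> list[str]:
    """Hypothetical helper: write each mapping to <output_dir>/<name>.json."""
    # Create the output directory if needed; no error if it already exists.
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for name, mapping in dicts.items():
        # Assumption: one JSON file per dictionary, named after its key.
        path = os.path.join(output_dir, f"{name}.json")
        with open(path, "w", encoding="utf-8") as f:
            # ensure_ascii=False keeps non-ASCII names readable in the file;
            # integer keys are written as JSON strings.
            json.dump(mapping, f, ensure_ascii=False, indent=2)
        paths.append(path)
    return paths
```

In a full pipeline it would be called with the three dictionaries returned by parse_dmp_content, e.g. save_taxonomy_dicts({"taxid_to_scientific_name": sci, "taxid_to_common_name": common, "scientific_name_to_taxid": reverse}, output_dir).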