PubMed Mapper: A Python library that maps PubMed XML to Python objects

Chinese documentation

1. Philosophy

View UML

Programmatically accessing PubMed articles is a common task for me. Luckily, with the help of eutils (the NCBI E-utilities), we can access full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.
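For context, here is a minimal sketch of what fetching article XML from eutils looks like without any library. It uses the public E-utilities `efetch` endpoint. The helper names `build_efetch_url` and `fetch_pubmed_xml` are hypothetical, and this is not necessarily how pubmed-mapper retrieves data internally.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching records
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Build the efetch URL that returns one PubMed record as XML."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def fetch_pubmed_xml(pmid, timeout=30):
    """Download the raw XML bytes for one article (needs network access)."""
    with urlopen(build_efetch_url(pmid), timeout=timeout) as response:
        return response.read()


# Only the URL is built here; call fetch_pubmed_xml('32329900') to download.
print(build_efetch_url("32329900"))
```

The result is still just an XML string, which is the gap pubmed-mapper is meant to close.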

2. Installation

pip install pubmed-mapper

3. Usage

3.1 Use as a library

3.1.1 Parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Loftfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01
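Once an article is parsed, a common next step is to flatten a few fields into a plain dict, for example before writing JSON or a CSV row. The sketch below relies only on the attribute names shown above. `article_summary` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in so the example runs without network access; with the real library you would pass the result of `Article.parse_pmid(...)` instead.

```python
from datetime import date
from types import SimpleNamespace


def article_summary(article):
    """Collect a few fields of a parsed article into a plain dict."""
    return {
        "pmid": article.pmid,
        "title": article.title,
        "journal": article.journal.title,
        "pubdate": str(article.pubdate),
        "first_author": article.authors[0].last_name if article.authors else None,
        "ids": {i.id_type: i.id_value for i in article.ids},
    }


# Stand-in object mimicking the attributes shown above (not a real Article)
demo = SimpleNamespace(
    pmid="32329900",
    title="Associations of Coffee...",
    journal=SimpleNamespace(title="Journal of the American Geriatrics Society"),
    pubdate=date(2020, 9, 1),
    authors=[SimpleNamespace(last_name="Shadyab")],
    ids=[
        SimpleNamespace(id_type="pubmed", id_value="32329900"),
        SimpleNamespace(id_type="doi", id_value="10.1111/jgs.16467"),
    ],
)

summary = article_summary(demo)
print(summary["ids"]["doi"])  # 10.1111/jgs.16467
```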

3.1.2 Parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


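infile = 'xxx.xml'
# Open in binary mode so lxml reads the encoding from the XML declaration
with open(infile, 'rb') as fp:
    root = etree.parse(fp)


# Map every <PubmedArticle> element in the set to an Article object
articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)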
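To make the XPath above more concrete, the sketch below shows the nesting it walks: a `PubmedArticleSet` root holding one `PubmedArticle` element per record, each with its `PMID` under `MedlineCitation`. It uses the standard library `xml.etree.ElementTree` rather than lxml so it runs without extra dependencies. The sample XML is heavily trimmed, the second PMID is a placeholder, and `list_pmids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed sample; real records contain many more elements.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><PMID Version="1">32329900</PMID></MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><PMID Version="1">00000001</PMID></MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def list_pmids(xml_text):
    """Return the PMID of every PubmedArticle element in the set."""
    root = ET.fromstring(xml_text)  # root is <PubmedArticleSet>
    return [
        element.findtext("MedlineCitation/PMID")
        for element in root.findall("PubmedArticle")
    ]


pmids = list_pmids(SAMPLE)
print(pmids)  # ['32329900', '00000001']
```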

3.2 Use as a command-line tool

3.2.1 Parse a PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 Parse a single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 Parse a directory that contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
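The `.jl` extension suggests the output is JSON Lines, that is, one JSON object per line. Assuming that is the case, the output can be read back lazily as sketched below. `read_jsonl` is a hypothetical helper, and the `pmid` field in the demo file is illustrative only; check a real output file for the actual field names.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with a temporary file; the "pmid" field name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jl")
    with open(demo_path, "w", encoding="utf-8") as fp:
        fp.write('{"pmid": "32329900"}\n\n{"pmid": "00000001"}\n')
    records = list(read_jsonl(demo_path))

print(len(records))  # 2
```

Reading line by line keeps memory use flat even when a whole directory of PubMed files has been written to a single output file.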

4. FAQs

4.1 There are many types of PubMed article publication dates. How do you convert them to datetime.date objects?

Parsing publication dates is hard work, and pubmed-mapper cannot yet parse every format. The formats it can parse, and the values they are parsed to, are:

| type | value |
| --- | --- |
| 2021-03-13 | 2021-03-13 |
| 2021-03 | 2021-03-01 |
| 2021 Spring | 2021-04-01 |
| 2021 | 2021-01-01 |
| 2021 Jan-Feb | 2021-01-01 |
| 2021 Mar 13-15 | 2021-03-13 |
| 2021 Mar-2022 Jan | 2021-03-01 |
| 2021-2022 | 2021-01-01 |
| 2021 Mar 13-Dec 15 | 2021-03-13 |
| 1976-1977 Winter | 1976-01-01 |
| 1977-1978 Fall-Winter | 1977-10-01 |
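The pattern in the list above is that a range collapses to its start: the first year, the first month or season, and the first day are kept, and anything missing defaults to 1. The sketch below reproduces those rows under that assumption. It is not pubmed-mapper's actual implementation, and `parse_pubdate_sketch` is a hypothetical name. The season-to-month mapping covers only the seasons that appear in the list.

```python
import re
from datetime import date

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}
# Season starts inferred from the list above (Summer is not listed there)
SEASONS = {"winter": 1, "spring": 4, "fall": 10}


def parse_pubdate_sketch(text):
    """Map a PubMed-style date string to the date at the start of its range."""
    text = text.strip()
    # Numeric forms: 2021-03-13 or 2021-03
    iso = re.fullmatch(r"(\d{4})-(\d{2})(?:-(\d{2}))?", text)
    if iso:
        year, month, day = iso.groups()
        return date(int(year), int(month), int(day or 1))
    # Leading year or year range (2021, 2021-2022), then the remainder
    head = re.match(r"(\d{4})(?:-\d{4})?\s*(.*)$", text)
    if not head:
        return None
    year, rest = int(head.group(1)), head.group(2)
    if not rest:
        return date(year, 1, 1)
    # First month or season word, optionally followed by a day number
    token = re.match(r"([A-Za-z]+)(?:\s+(\d{1,2}))?", rest)
    if not token:
        return None
    word = token.group(1).lower()
    month = SEASONS.get(word) or MONTHS.get(word[:3])
    if month is None:
        return None
    return date(year, month, int(token.group(2) or 1))


print(parse_pubdate_sketch("2021 Mar 13-Dec 15"))  # 2021-03-13
print(parse_pubdate_sketch("1977-1978 Fall-Winter"))  # 1977-10-01
```

Strings the sketch does not recognize return `None`, so a caller can log them and move on.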

4.2 What is the pubmed-mapper.log file generated by pubmed-mapper?

pubmed-mapper.log is the default log file generated by pubmed-mapper. You can write to a different file with the --log-file option:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

You can consult this log file for more details about parsing.

4.3 How do I log more detailed messages to my log file?

Use the --log-level option to log more detailed messages:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
Owner
灵魂工具人
Hello everyone, I am 灵魂工具人. I share bioinformatics tools that I have built, and I hope you like them.