There's more...

There are many more databases at NCBI. You will probably want to check the Sequence Read Archive (SRA) database (previously known as Short Read Archive) if you are working with NGS data. The SNP database contains information on single-nucleotide polymorphisms (SNPs), whereas the protein database has protein sequences, and so on. A full list of databases in Entrez is linked in the See also section of this recipe.
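If you want to see which databases Entrez exposes without leaving Python, the following is a minimal sketch using only the standard library. It assumes the public NCBI E-utilities `einfo.fcgi` endpoint and assumes that its JSON response lists database names under `einforesult` and `dblist`. The helper names `einfo_url` and `list_entrez_databases` are made up for this illustration; Biopython's `Entrez.einfo()` wraps the same service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities web service
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def einfo_url(db=None):
    """Build an einfo URL; without db, the service lists all databases."""
    params = {"retmode": "json"}
    if db is not None:
        params["db"] = db
    return EUTILS_BASE + "einfo.fcgi?" + urlencode(params)


def list_entrez_databases():
    """Fetch the database names (requires network access).

    Assumption: the JSON payload is shaped like
    {"einforesult": {"dblist": [...]}}.
    """
    with urlopen(einfo_url()) as response:
        payload = json.load(response)
    return payload["einforesult"]["dblist"]


# Building the URL needs no network access
print(einfo_url())
print(einfo_url("sra"))
```

Calling `list_entrez_databases()` performs the actual request, so it is left out of the example run; NCBI asks you to keep request rates low when you use it.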

Another NCBI database that you probably already know about is PubMed, which includes a list of scientific and medical citations, abstracts, and even full texts. You can also access it via Biopython. Furthermore, GenBank records often contain links to PubMed. For example, we can retrieve the PubMed entries referenced by our previous record, as shown here:

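from Bio import Medline

# Entrez (already imported and configured) and rec come from earlier in this recipe
refs = rec.annotations['references']
for ref in refs:
    # Only references that carry a PubMed identifier can be looked up
    if ref.pubmed_id != '':
        print(ref.pubmed_id)
        handle = Entrez.efetch(db='pubmed', id=[ref.pubmed_id], rettype='medline', retmode='text')
        records = Medline.parse(handle)
        # Each parsed record behaves like a dictionary of MEDLINE fields
        for med_rec in records:
            for k, v in med_rec.items():
                print('%s: %s' % (k, v))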

This will take all reference annotations, check whether they have a PubMed identifier, and then access the PubMed database to retrieve the records, parse them, and then print them.

The output per record is a Python dictionary. Note that there are many references to external databases on a typical GenBank record.
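Because each record is a dictionary, you can pick out individual fields instead of printing everything. The sketch below is one possible way to do that, assuming the usual MEDLINE tags (`PMID`, `TI` for title, `AU` for the author list, `TA` for the abbreviated journal name, and `DP` for the publication date). The function name `summarize_medline_record` and the sample record are placeholders invented for illustration, not real PubMed data.

```python
def summarize_medline_record(med_rec):
    """Return a one-line citation from a MEDLINE-style dictionary.

    Missing fields fall back to placeholder text, since not every
    record is guaranteed to carry every tag.
    """
    authors = med_rec.get("AU", [])
    if len(authors) > 2:
        author_text = authors[0] + " et al."
    else:
        author_text = " and ".join(authors) or "Unknown authors"
    title = med_rec.get("TI", "Untitled")
    journal = med_rec.get("TA", "unknown journal")
    date = med_rec.get("DP", "n.d.")
    pmid = med_rec.get("PMID", "?")
    return f"{author_text}: {title} {journal} ({date}). PMID {pmid}"


# Made-up record that mimics the shape of a parsed MEDLINE entry
example = {
    "PMID": "00000000",
    "TI": "A placeholder title.",
    "AU": ["Doe J", "Roe R", "Poe E"],
    "TA": "J Placeholder",
    "DP": "2000",
}
print(summarize_medline_record(example))
```

In the loop shown earlier, you could call `summarize_medline_record(med_rec)` in place of the inner `for k, v` loop to get a compact listing.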

Of course, there are many other biological databases outside NCBI, such as Ensembl (http://www.ensembl.org) and UCSC Genome Bioinformatics (http://genome.ucsc.edu/). Python support for these databases varies considerably.

An introductory recipe on biological databases would not be complete without at least a passing reference to BLAST. The Basic Local Alignment Search Tool (BLAST) is an algorithm that assesses the similarity of sequences. NCBI provides a service that allows you to compare your sequence of interest against its own database. Of course, you can use your own local BLAST database instead of NCBI's service. Biopython provides extensive support for this, but as this recipe is introductory, I will just refer you to the Biopython tutorial.
