
Getting ready

As discussed in the previous recipe, we will use data from the 1000 Genomes Project. We will use the exome alignment for chromosome 20 of female NA18489. This file is just 312 MB, whereas the whole exome alignment for this individual is 14.2 GB and the whole genome alignment (at a low coverage of 4x) is 40.1 GB. The data is paired-end, with reads of 76 bp. This is common nowadays, but it is slightly more complex to process, and we will take this into account. If your data is not paired, simply drop the pairing-specific steps from the following recipe.
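Because the pairing and the read length matter for the rest of the recipe, it can help to confirm them on the actual file before going further. The following is a minimal sketch, not the recipe's own code: it assumes pysam's `AlignedSegment` attributes (`is_paired`, `is_read1`, `is_read2`, `is_proper_pair`, `query_length`), and the contig name `"20"` and the 10,000-read sample size are assumptions you may need to adjust (check `bam.references` for the names your file uses).

```python
from collections import Counter


def summarize_pairing(reads, limit=None):
    """Tally pairing flags and read lengths over an iterable of reads.

    `reads` may be any iterable of objects exposing the pysam AlignedSegment
    attributes used below. If `limit` is given, only that many reads are read.
    """
    stats = Counter()
    lengths = Counter()
    for i, read in enumerate(reads):
        if limit is not None and i >= limit:
            break
        stats["total"] += 1
        if read.is_paired:
            stats["paired"] += 1
            if read.is_read1:
                stats["read1"] += 1
            if read.is_read2:
                stats["read2"] += 1
            if read.is_proper_pair:
                stats["proper_pair"] += 1
        # query_length is the length of the stored read sequence
        lengths[read.query_length] += 1
    return stats, lengths


def summarize_bam(path, contig="20", limit=10000):
    """Sample reads from one contig of an indexed BAM file (contig name assumed)."""
    import pysam  # imported lazily so summarize_pairing works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        # fetch() on a contig requires the .bai index next to the BAM file
        return summarize_pairing(bam.fetch(contig), limit=limit)
```

For example, `summarize_bam("NA18490_20_exome.bam")` returns two counters. For paired-end data you would expect `stats["paired"]` to equal `stats["total"]`, and the most common entry in `lengths` to match the read length stated above.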

As usual, if you use Jupyter Notebook, the cell at the top of Chapter02/Working_with_BAM.ipynb will download the data for you. If you don't use Notebooks, get the data from our dataset list at https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Second-Edition/blob/master/Datasets.ipynb. The files you need are NA18490_20_exome.bam and its index, NA18490_20_exome.bam.bai.
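If you download the files by hand, a quick check that both are in place can save a confusing error later, since region queries on a BAM file need the `.bai` index alongside it. This is a small sketch: the file names come from the text, while the working directory (`"."` by default) and the helper name `missing_inputs` are hypothetical.

```python
from pathlib import Path

# File names as listed in the recipe; the directory they live in is an assumption.
REQUIRED_FILES = ("NA18490_20_exome.bam", "NA18490_20_exome.bam.bai")


def missing_inputs(directory="."):
    """Return the required files that are not present in `directory`, in order."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list from `missing_inputs()` means both the alignment and its index were found.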

We will use pysam, a Python wrapper around the SAMtools C API. It was installed in Chapter 1, Python and the Surrounding Software Ecology.
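As a first contact with pysam, the sketch below opens the BAM file, reads the contig names and lengths from its header, and pulls a few alignments from one contig. It relies on `pysam.AlignmentFile` with its `references`, `lengths`, and `fetch()` members, and on the `AlignedSegment` fields `query_name`, `reference_start` (0-based), `cigarstring`, and `mapping_quality`. The helper names and the default contig `"20"` are assumptions for illustration only.

```python
from itertools import islice


def contig_lengths(bam):
    """Map contig name to length using the header of an opened AlignmentFile."""
    # references and lengths are parallel sequences in pysam.AlignmentFile
    return dict(zip(bam.references, bam.lengths))


def first_reads(bam, contig, n=5):
    """Return (name, 0-based start, CIGAR, MAPQ) for the first n reads on contig."""
    return [
        (read.query_name, read.reference_start, read.cigarstring, read.mapping_quality)
        for read in islice(bam.fetch(contig), n)
    ]


def inspect_bam(path, contig="20", n=5):
    """Open an indexed BAM file and return its contig lengths and a few reads."""
    import pysam  # imported lazily so the helpers above can be used on any object

    with pysam.AlignmentFile(path, "rb") as bam:
        return contig_lengths(bam), first_reads(bam, contig, n)
```

Calling `inspect_bam("NA18490_20_exome.bam")` should show every contig in the header, even though this file only holds alignments for chromosome 20.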
