
There's more...

The purpose of this recipe is to get you up to speed with the PyVCF module. At this stage, you should be comfortable with the API. We will not spend much time on usage details here, because that is the main purpose of the next recipe: using the VCF module to study the quality of your variant calls.

It will probably not be a shocking revelation that PyVCF is not the fastest module on earth. Because VCF is a heavily text-based format, every record has to be parsed from strings, which makes processing time-consuming. There are two main strategies for dealing with this problem. The first is parallel processing, which we will discuss in the last chapter, Chapter 9, Python for Big Genomics Datasets. The second is to convert the data to a more efficient format; we will provide an example of this in Chapter 4, Population Genetics. Note that VCF developers are working on a binary (BCF) version to deal with parts of these problems (http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2).
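To make the second strategy concrete, here is a minimal sketch using only the standard library. It is not the approach taken in Chapter 4 and it does not use PyVCF. It only illustrates the general idea of parsing the text once and keeping the fields you need in a compact, typed form. The VCF snippet, the function name vcf_to_columns, and the quality threshold of 20 are all assumptions made for illustration.

```python
import io
from array import array

# Tiny made-up VCF snippet, used only for illustration (not real data).
VCF_TEXT = """##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14
20\t17330\t.\tT\tA\t3\tq10\tDP=11
20\t1110696\trs6040355\tA\tG,T\t67\tPASS\tDP=10
"""


def vcf_to_columns(handle):
    """Hypothetical helper: parse the text once, keep POS and QUAL as typed arrays."""
    positions = array("l")  # signed integers
    quals = array("d")      # double-precision floats
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        positions.append(int(fields[1]))
        # A missing QUAL is written as "." in VCF; store it as NaN.
        quals.append(float(fields[5]) if fields[5] != "." else float("nan"))
    return positions, quals


positions, quals = vcf_to_columns(io.StringIO(VCF_TEXT))

# The typed arrays serialize to raw bytes, so they can be saved and
# reloaded later without parsing the text again.
raw = quals.tobytes()
restored = array("d")
restored.frombytes(raw)

# Queries now run on numbers instead of strings (threshold is arbitrary).
high_quality = [p for p, q in zip(positions, restored) if q >= 20]
print(high_quality)
```

The trade-off is that everything not copied into the arrays (genotypes, INFO fields, and so on) is lost, so a conversion like this only pays off when you repeatedly query a small, known set of fields.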
