
There's more...

Although it's impossible to discuss all the variations of output coming from sequencers, paired-end reads are worth mentioning because they are common and require a different processing approach. With paired-end sequencing, both ends of a DNA fragment are sequenced with a gap in the middle (called the insert). In this case, two files will be produced: X_1.FASTQ and X_2.FASTQ. Both files contain exactly the same number of sequences, in the same order. The first sequence in X_1 pairs with the first sequence in X_2, and so on. With regard to the programming technique, if you want to keep the pairing information, you might do something like this:

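import gzip
from Bio import SeqIO

# Open both mate files in text mode
f1 = gzip.open('X_1.filt.fastq.gz', 'rt', encoding='utf-8')
f2 = gzip.open('X_2.filt.fastq.gz', 'rt', encoding='utf-8')
recs1 = SeqIO.parse(f1, 'fastq')
recs2 = SeqIO.parse(f2, 'fastq')
cnt = 0
# zip yields one record from each file per iteration, keeping mates together
for rec1, rec2 in zip(recs1, recs2):
    cnt += 1
print('Number of pairs: %d' % cnt)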

The preceding code reads all pairs in order and simply counts them. You will probably want to do something more, but this shows an idiom based on Python's zip function, which lets you iterate through both files simultaneously. Remember to replace X with your FASTQ prefix.
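As a rough illustration of doing "something more" with each pair, the following sketch checks that the two mates really belong together while counting them. It deliberately avoids Biopython so that it runs on its own: read_fastq is a hypothetical, minimal stand-in for SeqIO.parse that assumes strict four-line FASTQ records, and base_name assumes the mates are labeled with /1 and /2 suffixes, which is only one of several naming conventions in use. The in-memory records are made up for the example.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream.

    Hypothetical helper: a simplified stand-in for SeqIO.parse(handle, 'fastq').
    """
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def base_name(read_id):
    """Drop any description and a trailing /1 or /2 (assumed mate suffixes)."""
    read_id = read_id.split(' ')[0]
    if read_id.endswith(('/1', '/2')):
        return read_id[:-2]
    return read_id

def count_pairs(handle1, handle2):
    """Walk both files in lockstep and fail loudly if the mates go out of sync."""
    cnt = 0
    for (id1, _, _), (id2, _, _) in zip(read_fastq(handle1), read_fastq(handle2)):
        if base_name(id1) != base_name(id2):
            raise ValueError('Mismatched pair: %s vs %s' % (id1, id2))
        cnt += 1
    return cnt

# Tiny made-up example data standing in for X_1 and X_2
fq1 = io.StringIO('@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n')
fq2 = io.StringIO('@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n')
n_pairs = count_pairs(fq1, fq2)
print('Number of pairs: %d' % n_pairs)
```

Note that zip stops silently at the end of the shorter input, so a truncated file would not be detected by this check alone.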

Note that the preceding code will most probably crash under Python 2, because the zip function is eager there (that is, it reads all records from both files into memory before the loop starts). Indeed, the lazy behavior of iterators in Python 3 is one of the many features that make it more suitable for big data analysis. If you really need to use Python 2, consider the itertools module, which provides lazy implementations of common iterators (such as izip).
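The lazy behavior of zip in Python 3 can be observed directly with a small experiment. The sketch below uses a hypothetical counting_source generator (not part of any library) that records how many items have been pulled from it; the sizes are arbitrary.

```python
from itertools import islice

# Tracks how many items each source has actually produced
consumed = {'left': 0, 'right': 0}

def counting_source(name, n):
    """Hypothetical generator that counts every item it hands out."""
    for i in range(n):
        consumed[name] += 1
        yield i

# Building the zip object does not pull anything from either source
pairs = zip(counting_source('left', 1000000), counting_source('right', 1000000))
consumed_before = dict(consumed)

# Take only the first two pairs
first_two = list(islice(pairs, 2))

print(consumed_before)  # {'left': 0, 'right': 0}
print(first_two)        # [(0, 0), (1, 1)]
print(consumed)         # {'left': 2, 'right': 2}
```

Only two items are drawn from each source, even though each could produce a million. An eager zip would have drained both sources completely before returning the first pair.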

Finally, if you are sequencing human genomes, you may want to use sequencing data from Complete Genomics. In this case, read the There's more... section in the next recipe, where we briefly discuss Complete Genomics data.
