ADAM for large-scale genomics data processing

Analyzing DNA and RNA sequencing data requires large-scale data processing to interpret the data in its biological context. Many excellent tools have been developed in academic labs, but they often fall short on scalability and interoperability. ADAM addresses these gaps: it is a genomics analysis platform with specialized file formats, built on Apache Avro, Apache Spark, and Apache Parquet.

Large-scale data processing solutions such as ADAM-Spark can be applied directly to the output of a sequencing pipeline, that is, to data that has already gone through quality control, mapping, read preprocessing, and variant quantification on single-sample data. Examples include DNA variants from DNA sequencing and read counts from RNA sequencing.

For more information, see http://bdgenomics.org/ and the related publication: Massie, Matt, Nothaft, Frank, et al., ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing, Technical Report UCB/EECS-2013-207, EECS Department, University of California, Berkeley, 2013.

In our study, we use ADAM as a scalable genomics data analytics platform. Its support for the VCF file format lets us transform a genotype-based RDD into a Spark DataFrame.
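ADAM itself is a Scala/Spark library, and its loading API is not reproduced here. The following is a stdlib-only Python sketch (not ADAM's API) of the reshaping step described above: turning the per-variant sample columns of a VCF file into one flat record per (variant, sample) pair, which is roughly the tabular layout one would expect in a DataFrame of genotypes. The names `GenotypeRow`, `parse_vcf_genotypes`, `to_columns`, and the derived `alt_allele_count` column are hypothetical, and the parser assumes a well-formed, uncompressed VCF whose FORMAT column contains a `GT` key.

```python
from dataclasses import dataclass, fields as dc_fields
from typing import Dict, List


@dataclass
class GenotypeRow:
    # One row per (variant, sample) pair; hypothetical schema for illustration.
    contig: str
    position: int          # 1-based POS as written in the VCF
    ref: str
    alt: str               # raw ALT field (may be comma-separated)
    sample_id: str
    genotype: str          # raw GT value, e.g. "0/1" or "1|0"
    alt_allele_count: int  # hypothetical derived column: non-REF, non-missing alleles


def parse_vcf_genotypes(lines: List[str]) -> List[GenotypeRow]:
    """Flatten VCF genotype calls into row-per-sample records (sketch only)."""
    samples: List[str] = []
    rows: List[GenotypeRow] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        if len(cols) < 10:
            continue  # sites-only record: no genotypes to flatten
        contig, pos, _vid, ref, alt = cols[:5]
        gt_index = cols[8].split(":").index("GT")
        for sample_id, sample_field in zip(samples, cols[9:]):
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            alt_count = sum(1 for a in alleles if a not in ("0", "."))
            rows.append(GenotypeRow(contig, int(pos), ref, alt,
                                    sample_id, gt, alt_count))
    return rows


def to_columns(rows: List[GenotypeRow]) -> Dict[str, list]:
    """Pivot row records into column lists, mimicking a DataFrame's columns."""
    names = [f.name for f in dc_fields(GenotypeRow)]
    return {name: [getattr(r, name) for r in rows] for name in names}


EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002
20\t14370\trs6054257\tG\tA\t29\tPASS\t.\tGT:GQ\t0|0:48\t1|0:48
"""

if __name__ == "__main__":
    for row in parse_vcf_genotypes(EXAMPLE_VCF.splitlines()):
        print(row)
```

In a real ADAM-Spark job this flattening and the conversion to a DataFrame would be handled by the library on a distributed dataset; the sketch only makes the shape of the resulting table concrete.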
