Book title: Scala Machine Learning Projects
Author: Md. Rezaul Karim
Chapter word count: 163
Updated: 2021-06-30 19:05:43

Population scale clustering and geographic ethnicity

Next-generation sequencing (NGS) reduces the overhead and time required for genome sequencing, producing data on an unprecedented scale. Analyzing this large-scale data, however, is computationally expensive and has increasingly become the key bottleneck. The growth of NGS data, both in the overall number of samples and in the number of features per sample, demands massively parallel data processing, which poses extraordinary challenges for machine learning and bioinformatics approaches. Using genomic information in medical practice requires efficient analytical methods that can cope with data from thousands of individuals and millions of their variants.

One of the most important tasks is the analysis of genomic profiles to attribute individuals to specific ethnic populations, or the analysis of nucleotide haplotypes for disease susceptibility. Data from the 1000 Genomes Project serves as the prime source for analyzing genome-wide single nucleotide polymorphisms (SNPs) at scale to predict an individual's ancestry with regard to continental and regional origins.
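To make the task of grouping individuals by their SNPs concrete, here is a minimal sketch in Scala. It assumes hypothetical toy data in which each individual is a vector of SNP genotypes encoded as alternate-allele counts (0, 1 or 2), and it groups individuals with plain k-means (Lloyd's algorithm). The object name, the data and the initial centroids are illustrative only. This is not the chapter's own pipeline, which has to scale to thousands of individuals and millions of variants.

```scala
// A minimal sketch, assuming hypothetical toy data: each genotype is the
// number of alternate alleles an individual carries at a SNP (0, 1 or 2).
object SnpKMeansSketch {

  type Genotypes = Vector[Double]

  // Squared Euclidean distance between two genotype vectors.
  def sqDist(a: Genotypes, b: Genotypes): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to point p (first one wins on ties).
  def nearest(p: Genotypes, centroids: Vector[Genotypes]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Component-wise mean of a non-empty collection of vectors.
  def mean(ps: Seq[Genotypes]): Genotypes =
    ps.transpose.map(_.sum / ps.size).toVector

  // Plain Lloyd's k-means: alternate the assignment and update steps for a
  // fixed number of iterations, then return each individual's cluster index.
  def kMeans(data: Vector[Genotypes], init: Vector[Genotypes], iters: Int): Vector[Int] = {
    var centroids = init
    for (_ <- 1 to iters) {
      val assign = data.map(nearest(_, centroids))
      centroids = centroids.indices.map { k =>
        val members = data.zip(assign).collect { case (p, a) if a == k => p }
        // An empty cluster keeps its previous centroid.
        if (members.isEmpty) centroids(k) else mean(members)
      }.toVector
    }
    data.map(nearest(_, centroids))
  }

  // Hypothetical toy genotypes: 6 individuals x 5 SNPs.
  val genotypes: Vector[Genotypes] = Vector(
    Vector(0, 0, 1, 2, 2),
    Vector(0, 1, 0, 2, 2),
    Vector(0, 0, 0, 2, 1),
    Vector(2, 2, 1, 0, 0),
    Vector(2, 1, 2, 0, 0),
    Vector(1, 2, 2, 0, 1)
  ).map(_.map(_.toDouble))

  def main(args: Array[String]): Unit = {
    // Seed with the first and last individuals so the toy run is deterministic.
    val labels = kMeans(genotypes, Vector(genotypes(0), genotypes(5)), iters = 10)
    println(labels.mkString(" ")) // prints "0 0 0 1 1 1"
  }
}
```

Fixed initial centroids keep this toy run reproducible. A real analysis would more likely use a randomized or k-means++ initialization and a distributed implementation to handle genome-wide data.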
