
Population-scale clustering and geographic ethnicity

Next-generation sequencing (NGS) reduces the overhead and time of genome sequencing, producing data at an unprecedented scale. Analyzing this data, however, is computationally expensive and is increasingly the key bottleneck. The growth of NGS data, in both the number of samples and the number of features per sample, demands massively parallel data processing, which poses extraordinary challenges for machine learning and bioinformatics approaches. Using genomic information in medical practice requires efficient analytical methods that can cope with data from thousands of individuals and millions of their variants.

One of the most important tasks is the analysis of genomic profiles to attribute individuals to specific ethnic populations, or the analysis of nucleotide haplotypes for disease susceptibility. Data from the 1000 Genomes Project serves as the prime source for analyzing genome-wide single nucleotide polymorphisms (SNPs) at scale to predict an individual's ancestry with regard to continental and regional origins.
