
Sampling data with Spark SQL APIs

Often, we need to visualize individual data points to understand the nature of our data. Statisticians use sampling techniques extensively for data analysis. Spark supports both approximate and exact sample generation. Approximate sampling is faster, because it does not guarantee that the sample has exactly the requested size, and it is good enough in most cases.
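To make the distinction concrete, here is a minimal sketch in plain Python; it does not call Spark. The function names `approximate_sample` and `exact_sample` are hypothetical and exist only for this illustration. The approximate version keeps each row independently with a fixed probability, so the sample size varies around the target; the exact version always returns the requested number of rows.

```python
import random

def approximate_sample(rows, fraction, seed):
    """Bernoulli sampling: keep each row independently with probability `fraction`.

    One pass over the data, but the sample size is only close to fraction * n.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

def exact_sample(rows, fraction, seed):
    """Exact-size sampling without replacement: returns round(fraction * n) rows."""
    rng = random.Random(seed)
    return rng.sample(rows, round(fraction * len(rows)))

rows = list(range(1000))
approx = approximate_sample(rows, 0.1, seed=42)
exact = exact_sample(rows, 0.1, seed=42)

print(len(approx))  # near 100; the exact count depends on the seed
print(len(exact))   # always 100
```

The trade-off in this sketch is that the exact version needs to know the total number of rows before it can draw, whereas the approximate version can decide row by row.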

In this section, we will explore Spark SQL APIs used for generating samples. We will work through some examples of generating approximate and exact stratified samples, with and without replacement, using the DataFrame/Dataset API and RDD-based methods.
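Before turning to the Spark calls themselves, the idea of stratified sampling can be sketched in plain Python. This is an illustration under stated assumptions, not Spark's implementation: the helpers `approx_stratified` and `exact_stratified` are hypothetical names, and the data is made up. In Spark, the corresponding entry points are `DataFrame.stat.sampleBy` for DataFrames and `sampleByKey` / `sampleByKeyExact` for pair RDDs.

```python
import random
from collections import defaultdict

def group_by_key(pairs):
    """Group (key, value) pairs into one list of values per stratum."""
    strata = defaultdict(list)
    for key, value in pairs:
        strata[key].append(value)
    return strata

def approx_stratified(pairs, fractions, seed):
    """Approximate, without replacement: keep each row with its stratum's probability."""
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions.get(k, 0.0)]

def exact_stratified(pairs, fractions, seed, with_replacement=False):
    """Exact: draw round(fraction * stratum_size) rows from every stratum."""
    rng = random.Random(seed)
    sample = []
    for key, values in sorted(group_by_key(pairs).items()):
        size = round(fractions.get(key, 0.0) * len(values))
        if with_replacement:
            picked = rng.choices(values, k=size)  # a row may appear more than once
        else:
            picked = rng.sample(values, size)     # every row appears at most once
        sample.extend((key, v) for v in picked)
    return sample

# Hypothetical data: stratum "a" has 200 rows, stratum "b" has 50 rows.
pairs = [("a", i) for i in range(200)] + [("b", i) for i in range(50)]
fractions = {"a": 0.1, "b": 0.5}

approx = approx_stratified(pairs, fractions, seed=7)
exact = exact_stratified(pairs, fractions, seed=7)
exact_wr = exact_stratified(pairs, fractions, seed=7, with_replacement=True)

for name, sample in [("approx", approx), ("exact", exact), ("exact_wr", exact_wr)]:
    counts = {k: sum(1 for key, _ in sample if key == k) for k in fractions}
    print(name, counts)
```

The exact variants always yield 20 rows for stratum "a" and 25 for stratum "b"; the approximate variant only lands near those counts.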
