- Learning Spark SQL
- Aurobindo Sarkar
- 2021-07-02 18:23:47
Sampling data with Spark SQL APIs
Often, we need to visualize individual data points to understand the nature of our data, and on a large Dataset it is more practical to plot a sample than every row. Statisticians use sampling techniques extensively for data analysis. Spark supports both approximate and exact sample generation. Approximate sampling is faster and is good enough in most cases.
In this section, we will explore Spark SQL APIs used for generating samples. We will work through some examples of generating approximate and exact stratified samples, with and without replacement, using the DataFrame/Dataset API and RDD-based methods.
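As a minimal sketch of the APIs mentioned above, the following Scala program draws approximate samples with `Dataset.sample` and `DataFrameStatFunctions.sampleBy`, and then approximate and exact stratified samples with the RDD methods `sampleByKey` and `sampleByKeyExact`. The toy data, the column names `id` and `label`, the per-label fractions, the seed, and the object name `SamplingSketch` are assumptions made for illustration and are not taken from the book.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object SamplingSketch {
  // Hypothetical toy data: 1,000 rows, 100 with label 1 and 900 with label 0.
  def toyData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    (0 until 1000).map(i => (i, if (i % 10 == 0) 1 else 0)).toDF("id", "label")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sampling-sketch")
      .master("local[*]")
      .getOrCreate()

    val df = toyData(spark)
    val seed = 42L // assumed seed, fixed only to make runs repeatable

    // Approximate simple random samples. `fraction` is an expected
    // proportion, so the number of rows returned is not guaranteed.
    val srsNoRepl = df.sample(withReplacement = false, fraction = 0.1, seed = seed)
    val srsRepl = df.sample(withReplacement = true, fraction = 0.1, seed = seed)
    println(s"without replacement: ${srsNoRepl.count()} rows")
    println(s"with replacement:    ${srsRepl.count()} rows")

    // Approximate stratified sample (without replacement) using the
    // DataFrame API: one sampling fraction per value of `label`.
    val fractions = Map(0 -> 0.1, 1 -> 0.5)
    val stratified = df.stat.sampleBy("label", fractions, seed)
    stratified.groupBy("label").count().show()

    // RDD-based stratified sampling needs (key, value) pairs.
    val byLabel: RDD[(Int, Int)] =
      df.rdd.map(row => (row.getInt(1), row.getInt(0)))

    // Approximate: a single pass, stratum sizes only close to the target.
    val approxByKey =
      byLabel.sampleByKey(withReplacement = false, fractions = fractions, seed = seed)

    // Exact: extra passes over the data so that each stratum matches
    // its requested fraction of that stratum's rows.
    val exactByKey =
      byLabel.sampleByKeyExact(withReplacement = false, fractions = fractions, seed = seed)

    println(s"approximate per-label counts: ${approxByKey.countByKey()}")
    println(s"exact per-label counts:       ${exactByKey.countByKey()}")

    spark.stop()
  }
}
```

Passing `withReplacement = true` to `sampleByKey` or `sampleByKeyExact` gives the with-replacement variants. The exact method costs additional passes over the RDD, which is the trade-off behind preferring approximate sampling when a precise stratum size is not required.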