- Hands-On Big Data Analytics with PySpark
- Rudy Lai, Bartłomiej Potaczek
- 2021-06-24 15:52:33
Getting Your Big Data into the Spark Environment Using RDDs
This chapter gives a brief overview of how to get your big data into the Spark environment using resilient distributed datasets (RDDs). We will use a range of tools to interact with and modify this data so that useful insights can be extracted. We will first load the data into Spark RDDs, then parallelize work with them, and finally look at the basics of RDD operation.
In this chapter, we will cover the following topics:
- Loading data onto Spark RDDs
- Parallelization with Spark RDDs
- Basics of RDD operation