- PySpark Cookbook
- Denny Lee, Tomasz Drabas
- 2021-06-18 19:06:35
How to do it...
To quickly create an RDD, run PySpark on your machine via the bash terminal, or run the same query in a Jupyter notebook. There are two ways to create an RDD in PySpark: you can either use the parallelize() method with a collection (a list or an array of some elements), or reference a file (or files) located either locally or through an external source, as noted in subsequent recipes.
The following code snippet creates your RDD (myRDD) using the sc.parallelize() method:
myRDD = sc.parallelize([('Mike', 19), ('June', 18), ('Rachel', 16), ('Rob', 18), ('Scott', 17)])
To view what is inside your RDD, you can run the following code snippet:
myRDD.take(5)
The output is as follows:
Out[10]: [('Mike', 19), ('June', 18), ('Rachel', 16), ('Rob', 18), ('Scott', 17)]