Book: PySpark Cookbook. Authors: Denny Lee, Tomasz Drabas.
Getting ready
As in the previous sections, let's make use of the flights dataset and create an RDD and a DataFrame against this dataset:
# Create flights RDD
# Note: tuple-unpacking lambdas (lambda (row, idx): ...) are Python 2 only;
# index into the (element, index) pair instead so this runs on Python 3.
flights = sc.textFile('/databricks-datasets/flights/departuredelays.csv')\
    .map(lambda line: line.split(","))\
    .zipWithIndex()\
    .filter(lambda row_idx: row_idx[1] > 0)\
    .map(lambda row_idx: row_idx[0])
# Create flightsDF DataFrame
flightsDF = spark.read\
    .options(header='true', inferSchema='true')\
    .csv('~/data/flights/departuredelays.csv')
flightsDF.createOrReplaceTempView("flightsDF")
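The `zipWithIndex()`/`filter` pair in the RDD listing is simply a way to drop the CSV header row, since `sc.textFile` has no header option. A minimal plain-Python sketch of that pattern (the sample rows below are hypothetical, not taken from the actual dataset; note that Spark's `zipWithIndex` pairs each element as `(element, index)`, whereas Python's `enumerate` yields `(index, element)`):

```python
# Illustrative input: a header line followed by data lines,
# mimicking the shape of departuredelays.csv.
lines = [
    "date,delay,distance,origin,destination",  # header row
    "01011245,6,602,ABE,ATL",                  # hypothetical data row
    "01020600,-8,369,ABE,DTW",                 # hypothetical data row
]

# .map(lambda line: line.split(","))
rows = [line.split(",") for line in lines]

# .zipWithIndex() then .filter(index > 0) then .map(element):
# keep every row except the one at index 0 (the header).
data = [row for idx, row in enumerate(rows) if idx > 0]

print(data[0])  # first data row, header removed
```

The same effect is what `options(header='true')` achieves on the DataFrame side, which is why the DataFrame listing needs no index bookkeeping.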