Book title: PySpark Cookbook
Authors: Denny Lee, Tomasz Drabas
Chapter word count: 56
Last updated: 2021-06-18 19:06:41

Getting ready

As in the previous sections, let's use the flights dataset and create both an RDD and a DataFrame from it:

# Create flights RDD
# zipWithIndex yields (row, idx) pairs; idx 0 is the header row, which we drop
flights = sc.textFile('/databricks-datasets/flights/departuredelays.csv')\
  .map(lambda line: line.split(","))\
  .zipWithIndex()\
  .filter(lambda row_idx: row_idx[1] > 0)\
  .map(lambda row_idx: row_idx[0])

# Create flightsDF DataFrame
# Adjust the path to wherever departuredelays.csv lives in your environment
flightsDF = spark.read\
  .options(header='true', inferSchema='true')\
  .csv('~/data/flights/departuredelays.csv')
flightsDF.createOrReplaceTempView("flightsDF")
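
To see what the RDD pipeline above does to the header row, here is a minimal plain-Python sketch of the same split, index, filter, and map steps, runnable without a Spark cluster. The sample lines and the column layout are made up for illustration and are assumptions, not rows taken from the actual file.

```python
# Plain-Python sketch of the header-skipping pipeline (no Spark required).
# The sample lines below are hypothetical, not real rows from the dataset.
sample_lines = [
    "date,delay,distance,origin,destination",  # header row (assumed layout)
    "01011245,6,602,ABE,ATL",
    "01020600,-8,369,ABE,DTW",
]

# Counterpart of .map(lambda line: line.split(","))
split_rows = [line.split(",") for line in sample_lines]

# Counterpart of .zipWithIndex(): pair each row with its position
indexed_rows = [(row, idx) for idx, row in enumerate(split_rows)]

# Counterpart of the .filter(...) and .map(...) steps:
# drop the pair at index 0 (the header) and keep only the row itself
data_rows = [row for row, idx in indexed_rows if idx > 0]

print(data_rows)
```

Note that every field in the resulting rows is still a string, because splitting text does no type conversion. The DataFrame route differs here: with inferSchema enabled, Spark attempts to infer column types for you.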
