
  • PySpark Cookbook
  • Denny Lee, Tomasz Drabas

There's more...

Now that we have everything in place, let's see what this can do. 

First, start Jupyter (note that we do not use the pyspark command):

jupyter notebook

You should now be able to see the following options if you want to add a new notebook:

If you click on PySpark, it will open a notebook and connect to a kernel. 

There are a number of available magics to interact with the notebooks; type %%help to list them all. Here's the list of the most important ones:
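As a sketch (the exact set depends on your Sparkmagic version), the `%%help` output typically includes magics such as:

```
%%help       - list all available magics with usage examples
%%info       - show the Livy endpoint and current sessions
%%configure  - change session settings (e.g. driver/executor memory)
%%sql        - run a SQL query against the current session
%%local      - run the cell locally in the IPython kernel
%%cleanup    - delete all sessions for the current Livy endpoint
%%delete     - delete a specific session by ID
%%logs       - show the logs for the current session
```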

Once you have configured your session, you will get information back from Livy about the active sessions that are currently running:
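For example, a session can be configured with a `%%configure` cell before it starts; the `-f` flag forces a restart if a session already exists, and the resource values below are purely illustrative:

```
%%configure -f
{"driverMemory": "1g", "executorMemory": "2g", "executorCores": 2}
```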

Let's try to create a simple DataFrame using the following code:

from pyspark.sql.types import *

# Generate our data
ListRDD = sc.parallelize([
    (123, 'Skye', 19, 'brown'),
    (223, 'Rachel', 22, 'green'),
    (333, 'Albert', 23, 'blue')
])

# The schema is encoded using StructType
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])

# Apply the schema to the RDD and create a DataFrame
drivers = spark.createDataFrame(ListRDD, schema)

# Create a temporary view from the DataFrame
drivers.createOrReplaceTempView("drivers")

Note that the SparkSession is only created once you execute the preceding code in a notebook cell:

If you execute the %%sql magic, you will get the following:
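For instance, a cell like the following queries the `drivers` temporary view created above and renders the result as a table in the notebook (the specific filter here is illustrative):

```
%%sql
SELECT name, eyeColor FROM drivers WHERE age < 23
```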
