
Reading the training dataset

The Cryotherapy.xlsx Excel file contains the data along with data-usage agreement text. So, I simply copied the data and saved it in a CSV file named Cryotherapy.csv. Let's start by creating a SparkSession, the gateway for accessing Spark:

val spark = SparkSession
  .builder
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/temp")
  .appName("CryotherapyPrediction")
  .getOrCreate()

import spark.implicits._

Then let's read the training set and take a glimpse at it:

val CryotherapyDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data/Cryotherapy.csv")

Let's check whether the preceding CSV reader managed to read the data properly, including the header and the column types:

CryotherapyDF.printSchema()

As the schema output shows, the Spark DataFrame's column names and types have been correctly identified. Also, as expected, all the features for our ML algorithm are numeric (in other words, in integer or double format):
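As a rough sketch of what to expect (assuming the standard UCI Cryotherapy column names; verify against your own copy of the data), printSchema() prints something like:

```
root
 |-- sex: integer (nullable = true)
 |-- age: integer (nullable = true)
 |-- Time: double (nullable = true)
 |-- Number_of_Warts: integer (nullable = true)
 |-- Type: integer (nullable = true)
 |-- Area: integer (nullable = true)
 |-- Result_of_Treatment: integer (nullable = true)
```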

A snapshot of the dataset can be seen using the show() method. We can limit the number of rows; here, let's say 5:

CryotherapyDF.show(5)

The output of the preceding line of code shows the first five samples of the DataFrame:
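Beyond show(), a couple of quick sanity checks are handy before training, for instance counting the rows and computing per-column summary statistics. A minimal sketch, assuming the CryotherapyDF DataFrame created above (not a step from the original text):

```scala
// Number of samples in the training set; the original UCI
// Cryotherapy dataset should contain 90 rows.
println(s"Number of samples: ${CryotherapyDF.count()}")

// Summary statistics (count, mean, stddev, min, max)
// for every numeric column, useful for spotting bad values.
CryotherapyDF.describe().show()
```

The describe() call is a cheap way to confirm that inferSchema parsed the numbers correctly: a column that was read as a string would show null mean and stddev.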
