Spark shell

We will go back into our Spark folder, which is spark-2.3.2-bin-hadoop2.7, and start our PySpark binary by typing .\bin\pyspark.

We can see that we've started a shell session with Spark in the following screenshot:

Spark is now available to us through the spark variable. Let's try something simple. The first step is to load a file. Every Spark installation includes a README.md Markdown file in its root folder, so let's load that into memory as follows:

text_file = spark.read.text("README.md")

Calling spark.read.text with README.md may print a few warnings, but we shouldn't be too concerned about them at the moment; we will see later how to fix them. The main thing here is that we can use Python syntax to access Spark.

What we have done here is read README.md into Spark as text data, with one row per line of the file. We can now call text_file.count() to get Spark to count how many lines (rows) are in our text file, as follows:

text_file.count()

From this, we get the following output:

103
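Calling count() on a DataFrame returns its number of rows, and spark.read.text produces one row per line, so 103 is the number of lines in README.md. The following plain-Python sketch does not use Spark and relies on a made-up three-line sample; it only illustrates how a line count differs from a character count.

```python
# Hypothetical three-line sample standing in for README.md.
sample = "# Apache Spark\nSpark is a fast engine.\nIt runs on clusters.\n"

# One DataFrame row per line, so count() corresponds to the number of lines.
line_count = len(sample.splitlines())

# The character total is a different, larger number.
char_count = len(sample)

print(line_count, char_count)
```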

We can also see what the first line is with the following:

text_file.first()

We will get the following output:

Row(value='# Apache Spark')

To count the number of lines that contain the word Spark, we first filter for those lines as follows:

lines_with_spark = text_file.filter(text_file.value.contains("Spark"))

Here, we have filtered for lines using the filter() function. Within filter(), the condition text_file.value.contains("Spark") keeps only the rows whose value column includes the string "Spark", and we have stored the result in the lines_with_spark variable.

We can modify the preceding command and simply add .count(), as follows: 

text_file.filter(text_file.value.contains("Spark")).count()

We will now get the following output:

20

We can see that 20 lines in the text file contain the word Spark. This is just a simple example of how we can use the Spark shell.
