- Apache Spark 2.x for Java Developers
- Sourav Gulati Sumit Kumar
Storage
If, during the execution of a job, the user persists/caches an RDD, information about that RDD can be retrieved on this tab. It can be accessed at http://localhost:4040/storage/.
Let's launch Spark shell again, read a file, and run an action on it. However, this time we will cache the file before running an action on it.
Initially, when you launch Spark shell, the Storage tab appears blank.

Let's read the file using SparkContext, as follows:
scala> val file = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
This time we will cache this RDD. By default, it will be cached in memory:
scala> file.cache
res0: file.type = /usr/local/spark/examples/src/main/resources/people.txt MapPartitionsRDD[1] at textFile at <console>:24
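If caching purely in memory is not desired, an explicit storage level can be requested with persist instead of cache, before the first action runs. A minimal sketch in the Spark shell (the choice of MEMORY_AND_DISK here is illustrative, not from the original example):

```scala
// Sketch: persist with an explicit storage level instead of the default
// memory-only cache. Must be set before the first action materializes the RDD.
import org.apache.spark.storage.StorageLevel

val file = sc.textFile("/usr/local/spark/examples/src/main/resources/people.txt")
// Spill partitions that don't fit in memory to disk instead of recomputing them
file.persist(StorageLevel.MEMORY_AND_DISK)
```

The storage level chosen here is also what the Storage tab reports for the RDD once it has been materialized.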
As explained earlier, the DAG of transformations is only executed when an action is performed, so the cache step will also run only when we execute an action on the RDD. So let's run a collect on it:
scala> file.collect
res1: Array[String] = Array(Michael, 29, Andy, 30, Justin, 19)
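Besides the web UI, the set of cached RDDs can also be inspected programmatically from the shell. A hedged sketch (the printed format is illustrative; getPersistentRDDs returns a map of RDD id to RDD):

```scala
// Sketch: list the RDDs currently cached, i.e. the same entries the
// Storage tab displays, keyed by RDD id.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"RDD $id: ${rdd.getStorageLevel.description}")
}
```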
Now, you can find information about an RDD being cached on the Storage tab.

If you click on the RDD name, it provides information about the partitions of the RDD along with the address of the host on which each partition is stored.
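When the cached data is no longer needed, it can be released explicitly; the corresponding entry then disappears from the Storage tab. A minimal sketch, continuing the example above:

```scala
// Sketch: remove the RDD's blocks from memory/disk. In Spark 2.x,
// unpersist blocks until the data is removed by default.
file.unpersist()
```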
