
Building Spark applications

Using Spark in interactive mode with the Spark shell is very good for quick prototyping; however, for developing applications we need an IDE. The choices for Spark IDEs have come a long way since the days of Spark 1.0. One can use an array of IDEs for developing algorithms, data wrangling (that is, exploring data), and modeling analytics applications. As a general rule of thumb, iPython and Zeppelin are used as data exploration IDEs. The language of choice for iPython is Python, and Scala/Java for Zeppelin. This is a general observation; all of them can handle the major languages: Scala, Java, Python, and SQL. For developing in Scala and Java, the preferred IDEs are Eclipse and IntelliJ. We will mostly use the Spark shell (and occasionally iPython) in this book, as our focus is data wrangling and understanding the Spark APIs. Of course, deploying Spark applications requires compiling the Java and Scala code.

Building Spark jobs is a bit trickier than building a normal application, as all dependencies have to be available on every machine in your cluster.
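One common way to satisfy this requirement is to package the application and its dependencies into a single assembly (uber) JAR, which `spark-submit` then ships to the executors. The following is a minimal sketch of a `build.sbt` for such a project; the project name and version numbers are illustrative assumptions, not values from the text:

```scala
// build.sbt -- illustrative sketch; adjust names and versions to your cluster
name := "spark-example"        // hypothetical project name
version := "0.1.0"
scalaVersion := "2.12.18"

// Mark Spark itself as "provided": the cluster already supplies the Spark
// runtime, so it should not be bundled into the assembly JAR. Only the
// application's own third-party dependencies get packaged.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.0" % "provided"
```

With the sbt-assembly plugin enabled in `project/plugins.sbt`, running `sbt assembly` produces one self-contained JAR that can be distributed to the whole cluster in a single `spark-submit` invocation.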

In this chapter, we will first look at iPython and Eclipse, then cover the process of building Java and Scala Spark jobs with Maven, and learn to build Spark jobs with a non-Maven-aware build system. A reference for building Spark is available at http://spark.apache.org/docs/latest/building-spark.html.
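As a preview of the Maven route, the dependency section of a `pom.xml` for a Spark job might look like the following sketch. The Spark group and artifact IDs are the published ones; the version shown is an illustrative assumption, and the `provided` scope reflects the common convention that the cluster supplies Spark at runtime:

```xml
<!-- Fragment of an illustrative pom.xml for a Spark job -->
<dependencies>
  <!-- provided scope: the cluster supplies Spark at runtime,
       so it is excluded from the packaged artifact -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.5.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Running `mvn package` then builds the job's JAR, which is what gets deployed to the cluster.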
