
SparkContext and SparkConf

The starting point of writing any Spark program is SparkContext (or JavaSparkContext in Java). SparkContext is initialized with an instance of a SparkConf object, which contains various Spark cluster-configuration settings (for example, the URL of the master node).

It is the main entry point for Spark functionality. A SparkContext represents a connection to a Spark cluster, and can be used to create RDDs, accumulators, and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must call stop() on the active SparkContext before creating a new one.

Once initialized, we will use the various methods found in the SparkContext object to create and manipulate distributed datasets and shared variables. The Spark shell (available in Scala and Python, but unfortunately not in Java) takes care of this context initialization for us, but the following lines of code show an example of creating a context running in local mode in Scala:

val conf = new SparkConf()
  .setAppName("Test Spark App")
  .setMaster("local[4]")
val sc = new SparkContext(conf)

This creates a context running in local mode with four threads, with the name of the application set to Test Spark App. If we wish to use the default configuration values, we could also call the following simple constructor for our SparkContext object, which works in exactly the same way:

val sc = new SparkContext("local[4]", "Test Spark App")
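Once we have a context, we can use it to create the RDDs, broadcast variables, and accumulators mentioned above. The following is a minimal sketch (not from the book's own examples), assuming Spark 2.x and its `longAccumulator` API; the object and method names are illustrative:

```scala
import org.apache.spark.SparkContext

object SparkContextExample {
  // Returns (sum of the doubled values, count of even inputs)
  def run(sc: SparkContext): (Int, Long) = {
    val rdd = sc.parallelize(1 to 10)           // distributed dataset from a local collection
    val factor = sc.broadcast(2)                // read-only value shipped to each executor
    val evens = sc.longAccumulator("evenCount") // shared counter, written from tasks

    val doubled = rdd.map { x =>
      if (x % 2 == 0) evens.add(1)
      x * factor.value
    }

    val sum = doubled.reduce(_ + _) // an action: triggers the lazy computation
    (sum, evens.value)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[4]", "Test Spark App")
    val (sum, evens) = run(sc)
    println(s"sum = $sum, evens = $evens") // sum = 110, evens = 5
    sc.stop() // stop the context before creating a new one
  }
}
```

Note that the accumulator is only updated when the `reduce` action forces the `map` to run; reading it before any action would return zero.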

Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book from any other source, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.