
The cluster structure

The size and structure of your big data cluster will affect performance. A cloud-based cluster will generally have lower I/O throughput and higher latency than a cluster on unshared hardware, because you share the underlying hardware with other customers and the cluster hardware may be remote. There are some exceptions to this. The IBM cloud, for instance, offers dedicated bare-metal, high-performance cluster nodes with an InfiniBand network connection, which can be rented on an hourly basis.

Additionally, the placement of cluster components on servers may cause resource contention. For instance, think carefully about where you locate Hadoop NameNodes, Spark servers, ZooKeeper, Flume, and Kafka servers in large clusters. With high workloads, you might consider giving each of these services its own system. You might also consider using a cluster manager such as Apache Mesos, which provides better distribution and assignment of resources to the individual processes.
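As a rough illustration of the placement concern, the following Python sketch flags hosts that carry more than one resource-hungry service. The service names, host names, and the `max_heavy` threshold are all hypothetical and chosen only for this example; a real cluster would take its layout from its own inventory or management tool.

```python
from typing import Dict, List

# Hypothetical labels for services that tend to compete for memory, disk, or network.
HEAVY_SERVICES = {"namenode", "spark-master", "zookeeper", "flume", "kafka"}


def find_contended_hosts(
    placement: Dict[str, List[str]], max_heavy: int = 1
) -> Dict[str, List[str]]:
    """Return hosts running more than max_heavy heavy services, with those services."""
    contended = {}
    for host, services in placement.items():
        heavy = sorted(s for s in services if s in HEAVY_SERVICES)
        if len(heavy) > max_heavy:
            contended[host] = heavy
    return contended


# Hypothetical layout: node1 and node3 each co-locate two heavy services.
placement = {
    "node1": ["namenode", "zookeeper"],
    "node2": ["kafka"],
    "node3": ["spark-master", "flume", "datanode"],
}

print(find_contended_hosts(placement))
```

A host that appears in the result is a candidate for moving one of its services to a separate system, as suggested above.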

Consider potential parallelism as well. For large Datasets, the more workers you have in your Spark cluster, the greater the opportunity for parallelism. One rule of thumb is one worker per hyper-thread or per virtual core.
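The rule of thumb above reduces to simple arithmetic, sketched here in Python. The function names and the `reserved_per_node` parameter (logical cores held back for the operating system or other services) are assumptions made for this example and are not part of Spark.

```python
import os


def suggested_worker_count(
    nodes: int, logical_cores_per_node: int, reserved_per_node: int = 0
) -> int:
    """One worker per hyper-thread or virtual core, minus any reserved cores."""
    if nodes < 0 or logical_cores_per_node < 0 or reserved_per_node < 0:
        raise ValueError("all arguments must be non-negative")
    usable = max(logical_cores_per_node - reserved_per_node, 0)
    return nodes * usable


def local_logical_cores() -> int:
    """Logical cores (hyper-threads or virtual cores) visible on this machine."""
    return os.cpu_count() or 1


# Hypothetical cluster: 4 nodes, each exposing 16 logical cores.
print(suggested_worker_count(nodes=4, logical_cores_per_node=16))  # 4 * 16 = 64
print(suggested_worker_count(4, 16, reserved_per_node=1))          # 4 * 15 = 60
```

Treat the result as a starting point for tuning rather than a fixed target, since the best worker count also depends on memory per worker and on the workload.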
