
  • Hadoop Beginner's Guide
  • Garry Turkington
  • 2021-07-29 16:51:41

Chapter 4. Developing MapReduce Programs

Now that we have explored the technology of MapReduce, we will spend this chapter looking at how to put it to use. In particular, we will take a more substantial dataset and look at ways to approach its analysis by using the tools provided by MapReduce.

In this chapter we will cover the following topics:

  • Hadoop Streaming and its uses
  • The UFO sighting dataset
  • Using Streaming as a development/debugging tool
  • Using multiple mappers in a single job
  • Efficiently sharing utility files and data across the cluster
  • Reporting job and task status and log information useful for debugging

Throughout this chapter, the goal is to introduce both concrete tools and ideas about how to approach the analysis of a new dataset. We shall start by looking at how to use scripting languages to aid MapReduce prototyping and initial analysis. Though it may seem strange to learn the Java API in the previous chapter and then immediately move to different languages, our goal here is to make you aware of the different ways to approach the problems you face. Just as many jobs make little sense implemented in anything but the Java API, there are other situations where another approach is a better fit. Consider these techniques as new additions to your tool belt; with that experience, you will more easily recognize which is the best fit for a given scenario.
