
  • Elasticsearch for Hadoop
  • Vishal Shukla
  • 2021-07-09 21:34:30

Chapter 2. Getting Started with ES-Hadoop

Hadoop provides batch-oriented distributed storage and a computing engine. Elasticsearch is a full-text search engine with rich aggregation capabilities. Moving data from Hadoop into Elasticsearch lets you run data discovery tools over it to find interesting patterns, and perform full-text search or geospatial analytics. ES-Hadoop is the library that bridges Hadoop and Elasticsearch. The goal of this book is to get you up and running with ES-Hadoop and enable you to solve real-world analytics problems.

Our goal in this chapter is to develop MapReduce jobs that write data to and read data from Elasticsearch. You probably already know how to write basic Hadoop MapReduce jobs that write their output to HDFS. ES-Hadoop is a connector library that provides a dedicated InputFormat and OutputFormat, which you can use to read data from and write data to Elasticsearch in Hadoop jobs. As a first step, we will set up Hadoop, Elasticsearch, and the related toolsets that you will use throughout the rest of the book.
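To make the connector idea more concrete, here is a minimal sketch of the settings a MapReduce driver would typically hand to ES-Hadoop. It assumes the connector's `es.nodes` and `es.resource` configuration keys; the class name `EsJobSettings`, the method `forResource`, and the `eshadoop/wordcount` index/type are hypothetical and chosen only for illustration. The sketch uses only `java.util`, so it compiles without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: collects the ES-Hadoop settings that a MapReduce driver
// would copy onto its Hadoop Configuration. EsJobSettings is a
// hypothetical helper, not part of ES-Hadoop.
final class EsJobSettings {

    private EsJobSettings() {}

    // nodes:    Elasticsearch host[:port] list, for example "localhost:9200"
    // resource: target index/type the job reads from or writes to
    static Map<String, String> forResource(String nodes, String resource) {
        if (nodes == null || nodes.isEmpty()) {
            throw new IllegalArgumentException("nodes must not be empty");
        }
        if (resource == null || resource.isEmpty()) {
            throw new IllegalArgumentException("resource must not be empty");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);       // where Elasticsearch is running
        settings.put("es.resource", resource); // which index/type to use
        return settings;
    }

    public static void main(String[] args) {
        // "eshadoop/wordcount" is an assumed index/type used for illustration.
        forResource("localhost:9200", "eshadoop/wordcount")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In an actual driver, each entry would be applied with Hadoop's `Configuration.set(key, value)`, and the job would select the connector's output format with `job.setOutputFormatClass(EsOutputFormat.class)` in place of an HDFS output format.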

We encourage you to try the examples in the book to speed up the learning process.

We will cover the following topics in this chapter:

  • Understanding the WordCount program
  • Going real—network monitoring data
  • Writing a network logs mapper job
  • Getting data from Elasticsearch to HDFS