- Hadoop 2.x Administration Cookbook
- Gurmukh Singh
- 2021-07-09 20:10:31
Configuring YARN history server
Whenever a MapReduce job runs, it launches containers on multiple nodes, and the logs for each container are written only on the node where that container ran. A user who needs details of the job must visit every node to fetch the logs, which can be very tedious in large clusters.
A better approach is to aggregate the logs at a common location once the job finishes, so that they can be accessed through a web server or other means. To address this, the History Server was introduced in Hadoop to aggregate logs and provide a Web UI where users can see the logs for all the containers of a job in one place.
Getting ready
You need a running cluster with YARN set up, and you should have completed the previous recipe to make sure the cluster is working correctly in terms of HDFS and YARN.
The following steps will guide you through the process of setting up the Job History Server.
How to do it...
1. Connect to the ResourceManager node, which is the YARN master, and switch to the user hadoop.
2. Navigate to the directory /opt/cluster/hadoop/etc/hadoop.
3. Edit the yarn-site.xml file to add the configurations described in the following steps.
4. First, enable YARN log aggregation using the following parameter:
   <property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
   </property>
5. Add the jobhistory server address; this is the RPC configuration parameter.
6. Add the jobhistory web server address.
7. Configure a location to store logs on HDFS.
8. Copy the yarn-site.xml file to all nodes in the cluster.
9. Start the history server on the master using the following command:
   $ mr-jobhistory-daemon.sh start historyserver
10. Restart the YARN daemons for the changes to take effect:
   $ stop-yarn.sh
   $ start-yarn.sh
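The parameter listings for the RPC address, the web server address, and the HDFS log location are not reproduced in this text. The fragment below is a minimal sketch of what those three entries might look like, assuming the standard Hadoop 2.x property names. The host name master1.cluster.com is hypothetical and should be replaced with your own master node; ports 10020 and 19888 and the path /tmp/logs are the Hadoop defaults, not values confirmed by this recipe.

```xml
<!-- Sketch only: the host name is a placeholder, ports and path are Hadoop defaults -->

<!-- RPC address of the Job History Server -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master1.cluster.com:10020</value>
</property>

<!-- Web UI address of the Job History Server -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master1.cluster.com:19888</value>
</property>

<!-- HDFS directory to which NodeManagers upload aggregated container logs -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
```

The two mapreduce.jobhistory.* properties are often kept in mapred-site.xml instead; this recipe places all of the settings in yarn-site.xml.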
How it works...
Let's take a look at what we did throughout this recipe. In steps 1 through 7, we enabled YARN log aggregation, which is disabled by default. Then, we configured the RPC and web server ports and also the location where logs will be stored.
Whenever a container is cleaned up, a log collection thread wakes up and uploads its logs to the configured location. The log location works like a web hosting directory: the history server publishes its contents and makes them accessible through the Web UI. The yarn.log-aggregation.retain-seconds parameter sets the retention period, that is, how long the aggregated logs are kept.
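As an illustration of the retention setting, the following fragment is a minimal sketch of how it could be added to yarn-site.xml. The value 604800 (seven days) is an assumption chosen for the example and does not come from this recipe; in Hadoop 2.x the default is -1, which means aggregated logs are never deleted.

```xml
<!-- Sketch only: 604800 seconds (7 days) is an illustrative value -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```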