Apache Hadoop 3 Quick Start Guide
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how the parallel programming paradigm, such as MapReduce, can solve many complex data processing problems. The book also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem, and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm, and data analytics using Apache Spark. By the end of the book, you will be well versed with different configurations of the Hadoop 3 cluster.
Contents (176 chapters)
- coverpage
- Title Page
- Dedication
- Packt Upsell
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Code in action
- Conventions used
- Get in touch
- Reviews
- Hadoop 3.0 - Background and Introduction
- How it all started
- What Hadoop is and why it is important
- How Apache Hadoop works
- Resource Manager
- Node Manager
- YARN Timeline Service version 2
- NameNode
- DataNode
- Hadoop 3.0 releases and new features
- Choosing the right Hadoop distribution
- Cloudera Hadoop distribution
- Hortonworks Hadoop distribution
- MapR Hadoop distribution
- Summary
- Planning and Setting Up Hadoop Clusters
- Technical requirements
- Prerequisites for Hadoop setup
- Preparing hardware for Hadoop
- Readying your system
- Installing the prerequisites
- Working across nodes without passwords (SSH in keyless)
- Downloading Hadoop
- Running Hadoop in standalone mode
- Setting up a pseudo Hadoop cluster
- Planning and sizing clusters
- Initial load of data
- Organizational data growth
- Workload and computational requirements
- High availability and fault tolerance
- Velocity of data and other factors
- Setting up Hadoop in cluster mode
- Installing and configuring HDFS in cluster mode
- Setting up YARN in cluster mode
- Diagnosing the Hadoop cluster
- Working with log files
- Cluster debugging and tuning tools
- JPS (Java Virtual Machine Process Status)
- JStack
- Summary
- Deep Dive into the Hadoop Distributed File System
- Technical requirements
- How HDFS works
- Key features of HDFS
- Achieving multi tenancy in HDFS
- Snapshots of HDFS
- Safe mode
- Hot swapping
- Federation
- Intra-DataNode balancer
- Data flow patterns of HDFS
- HDFS as primary storage with cache
- HDFS as archival storage
- HDFS as historical storage
- HDFS as a backbone
- HDFS configuration files
- Hadoop filesystem CLIs
- Working with HDFS user commands
- Working with Hadoop shell commands
- Working with data structures in HDFS
- Understanding SequenceFile
- MapFile and its variants
- Summary
- Developing MapReduce Applications
- Technical requirements
- How MapReduce works
- What is MapReduce?
- An example of MapReduce
- Configuring a MapReduce environment
- Working with mapred-site.xml
- Working with Job history server
- RESTful APIs for Job history server
- Understanding Hadoop APIs and packages
- Setting up a MapReduce project
- Setting up an Eclipse project
- Deep diving into MapReduce APIs
- Configuring MapReduce jobs
- Understanding input formats
- Understanding output formats
- Working with Mapper APIs
- Working with the Reducer API
- Compiling and running MapReduce jobs
- Triggering the job remotely
- Using Tool and ToolRunner
- Unit testing of MapReduce jobs
- Failure handling in MapReduce
- Streaming in MapReduce programming
- Summary
- Building Rich YARN Applications
- Technical requirements
- Understanding YARN architecture
- Key features of YARN
- Resource models in YARN
- YARN federation
- RESTful APIs
- Configuring the YARN environment in a cluster
- Working with YARN distributed CLI
- Deep dive with YARN application framework
- Setting up YARN projects
- Writing your YARN application with YarnClient
- Writing a custom application master
- Building and monitoring a YARN application on a cluster
- Building a YARN application
- Monitoring your application
- Summary
- Monitoring and Administration of a Hadoop Cluster
- Roles and responsibilities of Hadoop administrators
- Planning your distributed cluster
- Hadoop applications ports and URLs
- Resource management in Hadoop
- Fair Scheduler
- Capacity Scheduler
- High availability of Hadoop
- High availability for NameNode
- High availability for Resource Manager
- Securing Hadoop clusters
- Securing your Hadoop application
- Securing your data in HDFS
- Performing routine tasks
- Working with safe mode
- Archiving in Hadoop
- Commissioning and decommissioning of nodes
- Working with Hadoop Metric
- Summary
- Demystifying Hadoop Ecosystem Components
- Technical requirements
- Understanding Hadoop's Ecosystem
- Working with Apache Kafka
- Writing Apache Pig scripts
- Pig Latin
- User-defined functions (UDFs)
- Transferring data with Sqoop
- Writing Flume jobs
- Understanding Hive
- Interacting with Hive – CLI beeline and web interface
- Hive as a transactional system
- Using HBase for NoSQL storage
- Summary
- Advanced Topics in Apache Hadoop
- Technical requirements
- Hadoop use cases in industries
- Healthcare
- Oil and Gas
- Finance
- Government Institutions
- Telecommunications
- Retail
- Insurance
- Advanced Hadoop data storage file formats
- Parquet
- Apache ORC
- Avro
- Real-time streaming with Apache Storm
- Data analytics with Apache Spark
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-10 19:19:10