What this book covers

Chapter 1, The Era of "Big Data", gently introduces the concept of Big Data, the growing landscape of large-scale analytics tools, and the origins of the R programming language and statistical environment.

Chapter 2, Introduction to R Programming Language and Statistical Environment, explains the most essential data management and processing functions available to R users. This chapter also guides you through various methods of Exploratory Data Analysis and hypothesis testing in R, such as correlations, tests of differences, ANOVAs, and Generalized Linear Models.

Chapter 3, Unleashing the Power of R From Within, explores the possibilities of using the R language for large-scale analytics and out-of-memory data on a single machine. It presents a number of third-party packages and core R methods that address the traditional limitations of Big Data processing in R.

Chapter 4, Hadoop and MapReduce Framework for R, explains how to create a cloud-hosted virtual machine with Hadoop and how to integrate its HDFS and MapReduce frameworks with the R programming language. In the second part of the chapter, you will carry out a large-scale analysis of electricity meter data on a multinode Hadoop cluster directly from the R console.

Chapter 5, R with Relational Database Management Systems (RDBMSs), guides you through the process of setting up and deploying traditional SQL databases, for example, SQLite, PostgreSQL, and MariaDB/MySQL, which can be easily integrated with your current R-based data analytics workflows. The chapter also provides detailed information on how to build and benefit from a highly scalable Amazon Relational Database Service instance and query its records directly from R.

Chapter 6, R with Non-Relational (NoSQL) Databases, builds on the skills acquired in the previous chapters and shows you how to connect R with two popular non-relational databases: (a) a fast and user-friendly MongoDB installed on a Linux-run virtual machine, and (b) an HBase database operated on a Hadoop cluster run as part of the Azure HDInsight service.

Chapter 7, Faster than Hadoop: Spark with R, presents a practical example and a detailed explanation of R integration with the Apache Spark framework for faster Big Data manipulation and analysis. Additionally, the chapter shows how to use a Hive database as a data source for Spark on a multinode cluster with Hadoop and Spark installed.

Chapter 8, Machine Learning Methods for Big Data in R, takes you on a journey through the most cutting-edge predictive analytics available in R. First, you will fit fast and highly optimized Generalized Linear Models using the Spark MLlib library on a multinode Spark HDInsight cluster. In the second part of the chapter, you will implement Naïve Bayes and multilayered Neural Network algorithms using R's connectivity with H2O, an award-winning, open source, distributed machine learning platform for Big Data.

Chapter 9, The Future of R: Big, Fast and Smart Data, wraps up the contents of the earlier chapters by discussing potential areas of development for the R language and its opportunities in the landscape of emerging Big Data tools.

Online Chapter, Pushing R Further, available at https://www.packtpub.com/sites/default/files/downloads/5396_6457OS_PushingRFurther.pdf, enables you to configure and deploy your own scaled-up, cloud-based virtual machine with a fully operational R and RStudio Server installed and ready to use.
