
Summary

In the first chapter we explained the ambiguity of Big Data definitions and highlighted the major features of Big Data. We also talked about the deluge of Big Data sources, and mentioned that even a single event, such as Messi's goal, can trigger an avalanche of data created almost instantaneously.

You were then introduced to some of the most commonly used Big Data tools that we will be working with later, such as Hadoop, its Distributed File System and the parallel MapReduce framework, traditional SQL and NoSQL databases, and the Apache Spark project, which allows faster (and in many cases easier) data processing than Hadoop.

We ended the chapter by presenting the origins of the R programming language, its gradual evolution into the most widely used statistical computing environment, and the current position of R within the spectrum of Big Data analytics tools.

In the next chapter you will finally have a chance to get your hands dirty and learn, or revise, a number of frequently used functions in R for data management, transformations, and analysis.
