Learning from Big Data

In the first two chapters, we set the context for intelligent machines by looking at the big data revolution and how big data is fueling rapid advances in artificial intelligence. We emphasized the need for a global vocabulary for universal knowledge representation, and we saw how ontologies fulfill that need and help construct a semantic view of the world.

The quest is for knowledge, which is derived from information, which is in turn derived from the vast amounts of data that we are generating. Knowledge facilitates a rational decision-making process for machines that complements and augments human capabilities. We have seen how the Resource Description Framework (RDF) provides the schematic backbone for knowledge assets, along with the fundamentals of the Web Ontology Language (OWL) and the query language for RDF (SPARQL).

In this chapter, we are going to look at some of the basic concepts of machine learning and take a deep dive into some of the algorithms. We will use Spark's machine learning libraries. Spark is one of the most popular compute frameworks, both for implementing algorithms and as a generic computation engine on big data. Spark fits into the big data ecosystem well, offers a simple programming interface, and very effectively leverages the power of distributed and resilient computing frameworks. Although this chapter does not assume any background in statistics or mathematics, it will greatly help if the reader has some programming background, in order to understand the code snippets and to experiment with the examples.

In this chapter, we will see the broad categories of machine learning, supervised and unsupervised learning, before taking a deep dive, with examples, into:

  • Regression analysis
  • Data clustering
  • K-means
  • Data dimensionality reduction
  • Singular value decomposition
  • Principal component analysis (PCA)

Toward the end, we will give an overview of the Spark programming model and Spark's machine learning library (Spark MLlib). With all this background knowledge at our disposal, we will implement a recommendation system to conclude this chapter.
