
SQL Server and big data

Let's face reality. SQL Server is not a big-data system. However, SQL Server has a feature that lets us interact with the big-data systems deployed in the enterprise. This is huge!

This lets us combine the traditional relational data in SQL Server with results from the big-data systems, or even run queries against the big-data systems directly from SQL Server. The technology that makes this possible is called PolyBase.

PolyBase is a bridge between SQL Server and big-data systems such as Hadoop, which can run in numerous different configurations. You can have your own Hadoop deployment, or use Azure services such as HDInsight or Azure Data Lake, which provide Hadoop and the HDFS filesystem from the Hadoop framework as cloud services. We'll get deeper into PolyBase in Chapter 4, Data Sources for Analytics.

If you would like to test drive Hadoop with SQL Server, there are several distributions ready for testing and evaluation, such as Hortonworks Data Platform or Cloudera.

You can download prebuilt virtual machines and connect to them from SQL Server with the PolyBase feature to evaluate how the big-data integration works. For Hortonworks, you can check out https://hortonworks.com/products/data-platforms/hdp/
For Cloudera Quickstart VMs, you can check out https://www.cloudera.com/downloads/quickstart_vms/5-13.html

Hadoop itself is external to SQL Server and is described as a collection of software tools for the distributed storage and processing of big data. The base Apache Hadoop framework is composed of the following modules:

  • Hadoop Common: Contains libraries and utilities needed by other Hadoop modules
  • Hadoop Distributed File System (HDFS): A distributed filesystem that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
  • Hadoop YARN: Introduced in 2012 as a platform responsible for managing computing resources in clusters and using them for scheduling users' applications
  • Hadoop MapReduce: An implementation of the MapReduce programming model for large-scale data processing
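
To make the MapReduce programming model from the last item more concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions chosen only for illustration. It shows the three conceptual steps of a word count: map each record to key/value pairs, group the pairs by key, and reduce each group to one result. On a real cluster, Hadoop would distribute these steps across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence within one process."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    docs = ["big data on Hadoop", "SQL Server and big data"]
    print(word_count(docs))
```

The map and reduce functions work on one record or one key at a time and share no state, which is the property that would let a framework such as Hadoop run them in parallel on different nodes.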
