
Book: Effective Business Intelligence with QuickSight
Author: Rajesh Nadipalli
Chapter word count: 527
Last updated: 2021-07-09 19:28:09

AWS big data ecosystem

Amazon's big data ecosystem has several software services that enable business insights from data. These services can be broadly classified into four major categories: Collect, Store, Analyze, and Orchestrate, as shown in the following diagram:

Figure 2.1: AWS big data ecosystem
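
The four categories, together with the services described in the rest of this section, can be sketched as a small lookup table. The dictionary name and the helper function below are illustrative only and are not part of any AWS SDK.

```python
# Category -> services, as listed in this section.
# AWS_BIG_DATA_ECOSYSTEM and category_of are hypothetical names for illustration.
AWS_BIG_DATA_ECOSYSTEM = {
    "Collect": ["Direct Connect", "Snowball", "Kinesis", "Kinesis Firehose"],
    "Store": ["S3", "Glacier", "RDS", "Aurora", "Redshift"],
    "Analyze": ["EMR", "Machine Learning", "QuickSight", "Athena"],
    "Orchestrate": ["Data Pipeline", "Glue"],
}


def category_of(service: str) -> str:
    """Return the category a service belongs to (case-insensitive match)."""
    wanted = service.strip().lower()
    for category, services in AWS_BIG_DATA_ECOSYSTEM.items():
        if wanted in (name.lower() for name in services):
            return category
    raise KeyError(f"unknown service: {service!r}")


print(category_of("QuickSight"))
```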

Let's look at each category in detail.

Collect

The first step in any BI initiative is to collect data from external systems into AWS. The following services support this step:

Direct Connect: With Direct Connect, you can establish private connectivity between AWS and your enterprise data center, which provides an easy way to move data files from your applications to AWS for analysis.
Snowball: Snowball (also known as Import/Export) lets you import hundreds of terabytes of data quickly into AWS using secure, Amazon-provided appliances for transport.
Kinesis and Kinesis Firehose: Kinesis services let you build custom applications that process or analyze streaming data.
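
To see why a physical appliance can be attractive for hundreds of terabytes, a rough back-of-the-envelope estimate of network transfer time helps. The sketch below assumes decimal units (1 TB = 10^12 bytes), a link that is fully and continuously used, and no protocol overhead. These are simplifying assumptions, and the function name is hypothetical.

```python
def transfer_days(terabytes: float, link_gbps: float) -> float:
    """Estimate days needed to move `terabytes` over a `link_gbps` link.

    Assumes decimal units and a fully utilized link with no overhead,
    so real transfers will take longer.
    """
    bits = terabytes * 1e12 * 8          # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9)   # Gbps -> bits per second
    return seconds / 86_400              # seconds per day


for gbps in (0.1, 1, 10):
    print(f"100 TB over {gbps} Gbps: {transfer_days(100, gbps):.1f} days")
```

Under these assumptions, 100 TB needs roughly nine days on a 1 Gbps link, which is the kind of gap that a shipped appliance or a dedicated connection is meant to close.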

Store

The collected data needs to be stored, and Amazon offers several options that you can choose from based on latency and budget requirements. The following is a summary:

S3: Amazon Simple Storage Service (S3) can be used to store and retrieve any amount of data. It is a highly reliable object store.
Glacier: Glacier is an extremely low-cost storage service (1 cent per GB per month) that provides secure, durable, and flexible storage for data backup and archival.
RDS and Aurora: RDS enables easy setup of the most commonly used relational databases in AWS, including Oracle, MySQL, SQL Server, and PostgreSQL, and manages time-consuming administration tasks such as backups. Aurora is a MySQL-compatible database engine within RDS that AWS positions as costing a fraction of commercial databases.
Redshift: The Redshift service provides a fast, fully managed data warehouse at a low cost ($1,000 per TB per year).
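
The two prices quoted above can be turned into a quick annual estimate. This is a minimal sketch that uses only the figures stated in the text (1 cent per GB per month for Glacier, $1,000 per TB per year for Redshift); current AWS pricing may differ, and the function names are hypothetical.

```python
# Prices as quoted in this section; treat them as illustrative, not current.
GLACIER_USD_PER_GB_MONTH = 0.01
REDSHIFT_USD_PER_TB_YEAR = 1000.0


def glacier_annual_cost(gigabytes: float) -> float:
    """Yearly Glacier storage cost in USD for a constant data volume."""
    return gigabytes * GLACIER_USD_PER_GB_MONTH * 12


def redshift_annual_cost(terabytes: float) -> float:
    """Yearly Redshift cost in USD at the quoted per-TB rate."""
    return terabytes * REDSHIFT_USD_PER_TB_YEAR


# Compare one (decimal) terabyte held for a year in each service.
print(f"Glacier, 1 TB:  ${glacier_annual_cost(1000):,.2f} per year")
print(f"Redshift, 1 TB: ${redshift_annual_cost(1):,.2f} per year")
```

The comparison is only about storage price: Glacier is an archive with slow retrieval, while Redshift is a queryable warehouse, so the two are not interchangeable.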

Analyze

Once data is in Amazon, there are several options for analyzing it. The following is a summary:

EMR: Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data at scale and on demand.
Machine Learning: Amazon Machine Learning provides visualization tools and wizards for creating machine learning models and executing them on your big data.
QuickSight: QuickSight is a fast, cloud-powered BI service and the subject of this book.
Athena: Athena is a query service that makes it easy to analyze data directly from files in S3 using standard SQL statements. Athena is serverless, which makes it stand out, since there is no infrastructure to provision.
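
As an illustration of the kind of standard SQL statement mentioned above, the sketch below runs a simple aggregate query. It does not call Athena: an in-memory SQLite database from the Python standard library is used as a local stand-in so the query can be tried without an AWS account. The table name, columns, and sample rows are invented for the example.

```python
import sqlite3

# A plain aggregate query in standard SQL. The `sales` table and its
# columns are hypothetical; with Athena the table would map to files in S3.
QUERY = """
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region
ORDER BY region
"""


def run_locally(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


sample = [("east", 120.0), ("west", 80.0), ("east", 30.0)]
print(run_locally(sample))
```

SQL dialects differ in their details, so a query that works in SQLite is not guaranteed to run unchanged in Athena; simple aggregates such as this one are the portable case.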

Orchestrate

To move, orchestrate, and integrate data between the various AWS stores, Amazon has two key products: Data Pipeline and Glue. The following is a summary of these products:

Data Pipeline: Amazon Data Pipeline allows reliable data movement between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.
Glue: Glue is a fully managed ETL service (announced in December 2016) with a data catalog. It crawls data sources, identifies data formats, lets you build transformations using an IDE, and schedules the resulting jobs.
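
To give a feel for what identifying a data format involves, here is a purely conceptual sketch that infers column types from a small CSV sample. It is not Glue's API or algorithm; the function names and the type labels (`bigint`, `double`, `string`) are choices made for this illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type label that every value can be parsed as."""
    for caster, label in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue  # at least one value did not parse; try a wider type
    return "string"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred type label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


SAMPLE = "order_id,amount,region\n1,19.99,east\n2,5,west\n"
print(infer_schema(SAMPLE))
```

A real catalog also has to handle missing values, dates, nested formats, and sampling across many files, which is the work a managed service takes over.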

This completes the AWS big data ecosystem overview. Next, let's look at how to onboard data to QuickSight in detail.
