官术网_书友最值得收藏!

Enterprise Data Warehouse architecture

Over the last 3 decades, organizations have built EDW that consolidates data from various sources across the organization to enable business decisions, typically, related to current operational metrics and future what-if analysis for strategy decisions.

The following figure shows you a typical EDW architecture and also shows how information flows from the various source systems to the hands of business users:

Enterprise Data Warehouse architecture

Let's take a look at the stack from bottom to top.

Source systems

Typical data sources for an EDW are as follows:

  • OLTP databases: These databases store data for transactional systems such as customer relationship management (CRM), Enterprise resource planning (ERP), including manufacturing, inventory, shipping, and others.
  • XML and Text Files: Data is also received in the form of text files, which are generally delimited, or XML, or some other fixed format known within the organization.

Data warehouse

A data warehouse has two key subcomponents: storage and processing. Let's review these in detail.

Storage

The following are the key data stores for EDW:

  • EDW: This is the heart of the complete architecture and is a relational database that hosts data from disparate sources in a consistent format such as base facts and dimensions. It is organized by the subject area/domain and preserves history for several years to enable analytics, trends, and ad hoc queries. An EDW infrastructure needs to be robust and scalable to meet the business continuity and growth requirements.
  • Data marts: Each data mart is a relational database and is a subset of EDW typically, focusing on one subject area such as finance. It queries base facts from EDW and builds summarized facts and stores them as star or snowflake dimensional models.
  • MDM: Master data management or MDM is a relational database that stores reference data to ensure consistent reporting across various business units of an organization. Common MDM datasets include products, customers, and accounts. MDM systems require governance to ensure reporting from various data marts that can be correlated and are consistent.

Processing

The following are the key processing mechanisms for EDW:

  • ETL: Extract, Transform, and Load is a standard data warehouse design pattern that has three key steps: extract from various sources, transform to cleanse, and convert data to the information that is then loaded to various data marts for reporting. There are several tools in the marketplace such as Microsoft SQL Server Integration Services, Informatica, Pentaho, and others. ETL workflows are scheduled typically at daily frequency to update EDW facts and dimensions.
  • SQL-based stored procedures: This is an alternative to using ETL tools and transform data natively using database features. Most relational databases provide custom stored procedure capabilities such as SQL Server, Oracle, and IBM.

User access

The following are typically used access mechanisms:

  • BI dashboard: Business intelligence tools access the data from data marts to provide key performance indicators (KPI), scorecards, and dashboards. They allow the business to look at historical trends, current operational performance, perform what-if analysis, and predict future trends. There are several tools in the marketplace, including Microsoft SQL Server Reporting Services, Microsoft Power BI, Oracle Business Intelligence Enterprise Edition (OBIEE), SAP BusinessObjects, Tableau, and so on.
  • Ad hoc analysis: IT organizations have realized the need to provide business with direct access to certain data for discovery and ad hoc analysis. Excel and several reporting tools fit this need.
  • Operational reports: These are the day-to-day canned reports required to run the businesses such as daily sales collections, and customer support tickets opened and closed for the current day. These reports are generally required to be near real time and are based on one source system such as customer help desk system. There are several reporting systems such as SQL Server Reporting Services, IBM Cognos, and others that fit this need.
  • Analytics: Analytical reports look for trends such as how is my customer satisfaction trending over time and the average operational overhead in dollars for support. These reports typically have a one day/week refresh cycle and collect information from multiple sources such as help desk system and PPM. Business intelligence tools typically fit this need.

Provisioning and monitoring

This area of the architecture is responsible for the following functions:

  • Managing deployments of ETL code across various environments: development, test, and production
  • Monitoring ingestions and jobs to ensure service-level agreements are met
  • Operating procedures to recover in case of a failure

Data governance and security

Effective data governance in an enterprise ensures that data is consistent across departments and requires data stewards that identify and enforce the system of records. This requires tools, processes, and people to have high-quality data.

Security on the enterprise data warehouse affects all layers of the architecture and ensures that the right people have the right access to data. There are several technologies that enable fine-grained access at database level, table level, and row level. The filesystem can be encrypted to provide additional level of security.

Pain points of EDW

Based on an IDC research, by 2020, the digital universe will reach 40 ZB (zettabyte), which is a 50-fold growth from the beginning of 2010. (Reference: http://www.emc.com/about/news/press/2012/20121211-01.htm). Current IT data architectures based on EDW were not designed to handle this amount of data and are being stretched. The following are the key pain points of EDW:

  • Scale: A data warehouse was built to handle data in terabytes (TBs) and currently business needs are reaching petabytes (PB). Typically, data warehouse DBAs archive data older than a certain date window like 5 years to address this issue. This data can be a useful information for long-term trends.
  • Cost: EDW is typically an appliance-based model with proprietary engineered hardware and software supported by vendors such as Teradata, Oracle, IBM, and others. Upgrades are expensive and typically require all servers to be identical.
  • Unstructured data: Data warehouses struggle to handle unstructured data such as logfiles, social media, and machine data.
  • Timeliness: To produce business insights, data has to go through several transformations. Current EDW architectures are unable to meet growth and business demands for new insights on their data.
主站蜘蛛池模板: 郓城县| 徐州市| 深州市| 大新县| 腾冲县| 长宁区| 横山县| 商丘市| 克拉玛依市| 松潘县| 亳州市| 濮阳市| 玉溪市| 凤庆县| 榆树市| 陆河县| 炉霍县| 永宁县| 榆树市| 且末县| 海原县| 临武县| 肥东县| 澄城县| 营口市| 伊宁市| 余江县| 江油市| 万年县| 建宁县| 苍梧县| 花莲市| 阳城县| 建湖县| 姜堰市| 明光市| 皋兰县| 三都| 玉田县| 综艺| 龙口市|