  • Mastering Spark for Data Science
  • Andrew Morgan Antoine Amend David George Matthew Hallett
  • 2021-07-09 18:49:31

Summary

In this chapter, we introduced the idea of data architecture and explained how to group responsibilities into capabilities that help manage data throughout its lifecycle. We explained that all data handling requires a level of due diligence, whether this is enforced by corporate rules or otherwise, and without this, analytics and their results can quickly become invalid.

Having scoped our data architecture, we walked through the individual components and their respective advantages and disadvantages, explaining that our choices are based on collective experience. Indeed, there are always options when choosing components, and their individual features should be carefully considered before any commitment.

In the next chapter, we will dive deeper into how to source and capture data. We will advise on how to bring data onto the platform and discuss aspects related to processing and handling data through a pipeline.
