Data Transforming and Cleaning with T-SQL

Data comes from a wide range of sources. It can be relational or non-relational, connectivity can be unstable, and many other issues can arise when data is extracted from its sources. This is why developers, statisticians, and data scientists should never fully trust the quality of source data. This chapter explains techniques for data transformation and cleansing using the Transact-SQL (T-SQL) language.

The following topics will be covered in this chapter:

  • The need for data transformation: This section presents the main goal of data transformation for data science purposes and, using examples, also provides several cases of what could happen to incoming data.
  • Database architectures for data transformations: Data transformations can vary from very simple to very complex. That's why it's necessary to choose an architecture that supports a reliable set of transformation tasks.
  • Transforming data: This includes accuracy checks, deduplication, high-watermark tracking for incremental loads, and so on. Many other actions can also be seen as transformations.
  • Denormalizing data: As a lot of data comes from relational databases, it is often strongly normalized. Denormalization is a part of data transformation that makes data better suited to analytical purposes.
  • Using views and stored procedures: Views and stored procedures are very common database objects, and they are just as useful when applied to data transformations.
  • Performance considerations: It is not practical for data transformation to take longer than the analysis itself. Another aspect of performance is the impact on source systems. That's why it's very important to be aware of data transformation performance.