
Data ingestion

Data ingestion is the process of getting data into the data system that we are building or using. This can be done via manual, semi-automatic, or automatic methods.

In a smaller system, users often prefer some kind of web form or visual interface that takes input and puts the data into the system. In a larger system, however, such as a hospital management system, an airline management system, a government and public record management system, or a social media site, users often prefer to automate the data ingestion process as much as possible. So, when it comes to data ingestion, we need to explore a number of questions, such as the following:

  • How many data sources are there?
  • How many large data items are available?
  • Will the number of data sources grow over time?
  • What is the rate at which data will be consumed?

It is important to note that in such systems the size of an individual record is often small, while the overall volume of data is enormous. Developers typically define a set of rules, called ingestion policies, that govern how errors, incomplete data, and similar problems are handled during ingestion. Data ingestion, along with its policies, is an integral part of a big data system.
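To make the idea of an ingestion policy more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `IngestionPolicy`, `ingest`, `required_fields`, `defaults`, and `on_incomplete` are hypothetical and do not come from any particular library or from the text above. The policy decides whether an incomplete record is rejected or filled with default values, and malformed input is set aside with a reason instead of stopping the whole run.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class IngestionPolicy:
    """Hypothetical policy describing how incomplete records are handled."""
    required_fields: Tuple[str, ...]
    defaults: Dict[str, Any] = field(default_factory=dict)
    on_incomplete: str = "reject"  # either "reject" or "fill_default"


def ingest(
    records: Iterable[Any], policy: IngestionPolicy
) -> Tuple[List[Dict[str, Any]], List[Tuple[Any, str]]]:
    """Split incoming records into accepted records and (record, reason) rejections."""
    accepted: List[Dict[str, Any]] = []
    rejected: List[Tuple[Any, str]] = []

    for record in records:
        # Error handling: anything that is not a dict is treated as malformed.
        if not isinstance(record, dict):
            rejected.append((record, "malformed record"))
            continue

        missing = [f for f in policy.required_fields if record.get(f) is None]

        # Incompleteness handling: optionally fill gaps from the policy defaults.
        if missing and policy.on_incomplete == "fill_default":
            fills = {f: policy.defaults[f] for f in missing if f in policy.defaults}
            record = {**record, **fills}
            missing = [f for f in policy.required_fields if record.get(f) is None]

        if missing:
            rejected.append((record, "missing fields: " + ", ".join(missing)))
        else:
            accepted.append(record)

    return accepted, rejected


if __name__ == "__main__":
    policy = IngestionPolicy(
        required_fields=("id", "name"),
        defaults={"name": "unknown"},
        on_incomplete="fill_default",
    )
    incoming = [{"id": 1, "name": "a"}, {"id": 2}, {"name": "b"}, "garbage"]
    ok, bad = ingest(incoming, policy)
    print(ok)
    print(bad)
```

Keeping the rejected records together with a reason, rather than silently dropping them, is one possible design choice; it lets an operator review or replay them later. A real system would likely also log, retry, or route such records to a separate store.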
