What is big data?

Big data is a relatively new term that has been gathering steam over the past few years. It refers to datasets that are too large to be stored in a traditional database system or processed by traditional data-processing pipelines. The data may be structured, semi-structured, or unstructured, and datasets in this category usually scale to terabytes or petabytes. Big data usually involves one or more of the following:

  • Velocity: Data arrives at an unprecedented speed and must be dealt with in a timely manner. Examples include data from online systems, sensors, social media, and web clickstreams.

  • Volume: Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. This can amount to terabytes or petabytes of data. In the past, storing that much data would have been a problem, but new technologies have eased the burden.
  • Variety: Data comes in all sorts of formats, ranging from structured data that fits in traditional databases to unstructured data (blobs) such as images, audio files, and text files.

These are known as the 3Vs of big data.
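
As a rough illustration of the variety dimension, the following Python sketch handles one small sample of each kind of data: structured (CSV with fixed columns), semi-structured (JSON records whose fields differ), and unstructured (raw bytes). The sample values and field names are invented for illustration and do not come from any real system.

```python
import csv
import io
import json

# Structured: every row has the same columns, so it maps onto a table.
csv_text = "order_id,amount\n1001,25.50\n1002,13.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: records describe themselves, and fields may vary.
json_lines = [
    '{"user": "a", "action": "click"}',
    '{"user": "b", "action": "view", "referrer": "search"}',
]
events = [json.loads(line) for line in json_lines]
fields = sorted({key for event in events for key in event})

# Unstructured: an opaque blob with no schema; only its size is known
# without a format-specific decoder. These are the first four bytes of
# a PNG file signature, used here as a stand-in for an image.
blob = bytes([0x89, 0x50, 0x4E, 0x47])
blob_size = len(blob)

print(total)
print(fields)
print(blob_size)
```

Each kind of data needs a different reader before it can be analyzed, which is one reason a single traditional pipeline struggles once all three arrive together.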

In addition to these, we tend to associate another term with big data:

  • Complexity: Today's data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems. However, it's necessary to connect and correlate relationships, hierarchies, and multiple data linkages, or your data can quickly spiral out of control. The data must also be able to move across multiple data centers, clouds, and geographical zones.