
Sources of big data

Technology today allows us to collect data at an astounding rate--both in terms of volume and variety. There are various sources that generate data, but in the context of big data, the primary sources are as follows:

  • Social networks: Arguably, the primary source of all big data that we know of today is the social networks that have proliferated over the past 5-10 years. This is by and large unstructured data, represented by millions of social media postings and other data generated on a second-by-second basis through user interactions on the web across the world. Growing access to the internet across the world has been self-reinforcing for the growth of data in social networks.
  • Media: Largely a result of the growth of social networks, media represents the millions, if not billions, of audio and visual uploads that take place on a daily basis. Videos uploaded on YouTube, music recordings on SoundCloud, and pictures posted on Instagram are prime examples of media, whose volume continues to grow in an unrestrained manner.
  • Data warehouses: Companies have long invested in specialized data storage facilities commonly known as data warehouses. A data warehouse (DW) is essentially a collection of historical data that a company wishes to maintain and catalog for easy retrieval, whether for internal use or regulatory purposes. As industries gradually shift toward storing data in platforms such as Hadoop and NoSQL, more and more companies are moving data from their pre-existing data warehouses to some of the newer technologies. Company emails, accounting records, databases, and internal documents are some examples of DW data that is now being offloaded onto Hadoop or Hadoop-like platforms that leverage multiple nodes to provide a highly available and fault-tolerant platform.
  • Sensors: A more recent phenomenon in the space of big data has been the collection of data from sensor devices. Sensors have always existed, and industries such as oil and gas have used drilling sensors for measurements at oil rigs for many decades. However, the advent of wearable devices such as Fitbit and Apple Watch, part of what is known as the Internet of Things, means that each individual can now stream data at the same rate as a few oil rigs did just 10 years ago.

Wearable devices can collect hundreds of measurements from an individual at any given point in time. While this is not yet a big data problem as such, sensor-related data is likely to become more akin to the kind of spontaneous data that is generated on the web through social network activities as the industry keeps evolving.
