
Introduction to big data

Twitter, Facebook, Amazon, Verizon, Macy's, and Whole Foods all run their businesses using data analytics and base many of their decisions on it. Think about what kind of data they collect, how much of it they might be collecting, and how they might be using it.

Let's return to the grocery store example seen earlier. What if the store expands its business to hundreds of stores? Naturally, sales transactions will have to be collected and stored at a scale hundreds of times larger than for a single store. But then, no business works in isolation any more. There is a lot of information out there: local news, tweets, Yelp reviews, customer complaints, survey activities, competition from other stores, the changing demographics or economy of the local area, and so on. All such additional data can help in better understanding customer behavior and revenue models.

For example, if we see increasing negative sentiment regarding the store's parking facility, we could analyze this and take corrective action, such as offering validated parking or negotiating with the city's public transportation department to provide more frequent trains or buses for better reach. Such an increasing quantity and variety of data provides better analytics, but it also poses challenges to the business IT organization that has to store, process, and analyze all of it. It is, in fact, not uncommon to see terabytes (TB) of data.

Every day, we create more than two quintillion bytes of data (roughly 2 EB), and it is estimated that more than 90% of the data has been generated in the last few years alone. For reference, the storage units scale as follows:

1 KB = 1024 Bytes
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB ~ 1,000,000 MB
1 PB = 1024 TB ~ 1,000,000 GB ~ 1,000,000,000 MB
1 EB = 1024 PB ~ 1,000,000 TB ~ 1,000,000,000 GB ~ 1,000,000,000,000 MB
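
The unit ladder above can be checked with a few lines of Python. This is only an illustrative sketch: the names `UNITS`, `to_unit`, and `humanize` are made up for this example and do not come from any library. It also shows that two quintillion bytes is exactly 2 EB when counting in powers of 1000, but closer to 1.73 EB under the 1024-based definitions listed above.

```python
# Unit names in ascending order; each step is a factor of `base`.
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_unit(num_bytes, unit, base=1024):
    """Convert a byte count into the given unit (1024-based by default)."""
    exponent = UNITS.index(unit)
    return num_bytes / base ** exponent


def humanize(num_bytes, base=1024):
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < base or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= base


if __name__ == "__main__":
    two_quintillion = 2 * 10 ** 18
    print(to_unit(1024 ** 4, "MB"))                    # 1 TB expressed in MB
    print(to_unit(two_quintillion, "EB", base=1000))   # decimal exabytes
    print(to_unit(two_quintillion, "EB"))              # 1024-based exabytes
    print(humanize(two_quintillion))
```

The `base` parameter is there because the two conventions (1000 versus 1024) diverge by about 15% at the exabyte scale, which matters when comparing storage estimates from different sources.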

The sheer volume of data accumulated since the 1990s, and the need to understand and make sense of it, gave rise to the term big data.

In 2001, Doug Laney, then an analyst at the consultancy Meta Group Inc. (later acquired by Gartner), introduced the idea of the three Vs: Variety, Velocity, and Volume. Today, we usually refer to four Vs, with Veracity of data added to the original three.

The following are the four Vs used to describe the properties of big data.
