Analytics challenges

Analytics often requires deciding whether to fill in or ignore missing values. Either choice may lead to a dataset that is not representative of reality.
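To make the trade-off concrete, here is a minimal sketch of the two choices using only the Python standard library. The readings and the helper names (`drop_missing`, `fill_with_mean`) are hypothetical and chosen for illustration. Mean-filling is just one of many possible fill strategies.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings; None marks a missing value.
readings = [20.0, 21.0, None, None, 30.0, 31.0]

def drop_missing(values):
    """Option 1: ignore missing values entirely."""
    return [v for v in values if v is not None]

def fill_with_mean(values):
    """Option 2: replace each missing value with the mean of the observed ones."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]

dropped = drop_missing(readings)
filled = fill_with_mean(readings)

# Compare sample size, average, and spread under each choice.
print("dropped:", len(dropped), mean(dropped), round(pstdev(dropped), 2))
print("filled: ", len(filled), mean(filled), round(pstdev(filled), 2))
```

In this sketch both versions have the same mean, but the dropped version has fewer rows and the filled version has a smaller spread than the observed values. Neither is guaranteed to match what the missing readings actually were.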

As an example of how this can affect results, consider the inaccurate political poll results of recent years. Many experts believe polling is now in near crisis because much of the world has shifted to using a mobile number as the only phone number. For pollsters, it is cheaper and easier to reach people on landline numbers. This can lead to the overrepresentation of people with landlines, who tend to be both older and wealthier than mobile-only respondents.

The response rate has also dropped from nearly 80% in the 1970s to about 8% (if you are lucky) today. This makes it more difficult (and expensive) to obtain a representative sample, leading to many embarrassingly wrong poll predictions.

There can also be outside influences, such as environmental conditions, that are not captured in the data. Winter storms can lead to power failures that prevent some devices from reporting data back. You may end up drawing conclusions based on a non-representative sample of data without realizing it. This can affect the results of IoT analytics, and it will not be clear why.
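One way to catch this kind of silent gap is to compare the devices that actually reported against the devices you expected to hear from. The sketch below assumes a hypothetical registry that maps device IDs to regions. The function name `reporting_coverage`, the region labels, and the 50% threshold are all illustrative assumptions.

```python
from collections import Counter

def reporting_coverage(registered, reported_ids):
    """Return {region: fraction of registered devices that reported}."""
    total = Counter(registered.values())
    seen = Counter(
        region for device, region in registered.items() if device in reported_ids
    )
    return {region: seen[region] / count for region, count in total.items()}

# Hypothetical registry: device ID -> region.
registered = {
    "d1": "north", "d2": "north", "d3": "north", "d4": "north",
    "d5": "south", "d6": "south",
}
# Devices that sent data during the window being analyzed.
reported = {"d1", "d5", "d6"}

coverage = reporting_coverage(registered, reported)
# Flag regions where fewer than half of the devices reported (arbitrary cutoff).
low_coverage = [region for region, frac in coverage.items() if frac < 0.5]

print(coverage)
print(low_coverage)
```

A region with low coverage does not prove the sample is biased, but it is a prompt to look for an outside cause, such as a storm, before trusting conclusions drawn from that window.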

Since connectivity is new for many devices, there is often little historical data on which to base predictive models. This can limit the type of analytics that can be done with the data.

It can also lead to a recency bias in datasets, as newer products are overrepresented in the data simply because a higher percentage of them are now part of the IoT.

This leads us to the author's number one rule in IoT analytics:

Never trust data you don't know.

Treat it like a stranger offering you candy.
