
Developing a churn analytics pipeline

In ML, we observe an algorithm's performance in two stages: learning and inference. The goal of the learning stage is to prepare and describe the available data as feature vectors, which are then used to train the model.

The learning stage is one of the most important stages, but it is also time-consuming. It involves transforming the training data into a list of feature vectors (vectors of numbers representing the value of each feature) that can be fed to the learning algorithm. Training data also sometimes contains noisy or invalid records, so it needs some preprocessing, such as cleaning, before this transformation.
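The cleaning and transformation steps can be illustrated with a minimal Python sketch. The records, field names (`total_day_minutes`, `customer_service_calls`, `international_plan`, `churn`), and helper functions below are hypothetical and chosen only for illustration; a real pipeline would read a much larger dataset and apply richer cleaning rules.

```python
# Hypothetical raw customer records; field names and values are illustrative only.
raw_records = [
    {"total_day_minutes": "265.1", "customer_service_calls": "1",
     "international_plan": "no", "churn": "False"},
    {"total_day_minutes": "", "customer_service_calls": "4",
     "international_plan": "yes", "churn": "True"},  # missing value
    {"total_day_minutes": "129.1", "customer_service_calls": "4",
     "international_plan": "yes", "churn": "True"},
]


def is_clean(record):
    """Treat a record as usable only if every field is non-empty."""
    return all(value.strip() != "" for value in record.values())


def to_feature_vector(record):
    """Map one cleaned record to a list of numbers, one per feature."""
    return [
        float(record["total_day_minutes"]),
        float(record["customer_service_calls"]),
        1.0 if record["international_plan"] == "yes" else 0.0,
    ]


def to_label(record):
    """Encode the target as 1.0 for Churn = True and 0.0 otherwise."""
    return 1.0 if record["churn"] == "True" else 0.0


# Cleaning: drop records with missing values (one simple strategy among many).
clean_records = [r for r in raw_records if is_clean(r)]

# Transformation: build the feature vectors and labels for the learner.
features = [to_feature_vector(r) for r in clean_records]
labels = [to_label(r) for r in clean_records]

print(features)
print(labels)
```

Dropping incomplete records is the simplest possible cleaning rule; imputing missing values is a common alternative when data is scarce.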

Once we have the feature vectors, the next step in this stage is preparing (writing or reusing) the learning algorithm, followed by training it to produce the predictive model. Depending on the data size, training may take hours (or even days) before the model parameters converge into a useful model, as shown in the following figure:

Figure 2: Learning and training a predictive model - it shows how to generate the feature vectors from the training data to train the learning algorithm that produces a predictive model
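As a rough illustration of the training step, the following Python sketch fits a logistic regression model to a handful of feature vectors using batch gradient descent. Logistic regression is used here only as an example learner, not necessarily the algorithm applied later; the toy data, the `train_logistic_regression` function, and the hyperparameter values are all assumptions made for this sketch.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # derivative of the log loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Move the parameters against the average gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical, already scaled feature vectors:
# [normalized customer service calls, has international plan]
features = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
            [0.7, 1.0], [0.8, 1.0], [0.9, 0.0]]
labels = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # 1.0 means Churn = True

weights, bias = train_logistic_regression(features, labels)

# Check how well the trained model reproduces the training labels.
predictions = [
    1.0 if sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0.0
    for x in features
]
print(predictions)
```

On six samples the loop finishes almost instantly; the hours or days mentioned above come from running the same kind of iterative update over millions of records and many more features.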

The second stage is inference, which puts the model to use: predicting from never-before-seen data, making recommendations, deducing future rules, and so on. It typically takes less time than the learning stage and can sometimes run in real time. Inference is therefore about applying the model to new (that is, unobserved) data and evaluating the performance of the model itself, as shown in the following figure:

Figure 3: Inferencing from an existing model towards predictive analytics (feature vectors are generated from unknown data for making predictions)
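The inference stage can be sketched in a few lines of Python. The weights and bias below are hypothetical stand-ins for parameters produced by an earlier learning stage, and the unseen samples are made up for illustration; the point is only that prediction is a cheap computation compared with training.

```python
import math

# Hypothetical parameters standing in for a model produced by the learning stage.
WEIGHTS = [0.8, 1.5]  # [customer_service_calls, has_international_plan]
BIAS = -3.0


def predict_churn_probability(x):
    """Score one feature vector and convert the score to a probability."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict_churn(x, threshold=0.5):
    """Predict Churn = True when the probability reaches the threshold."""
    return predict_churn_probability(x) >= threshold


# Feature vectors generated from data the model has never seen,
# together with the outcomes observed later (illustrative values).
unseen = [[1.0, 0.0], [5.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
actual = [False, True, False, True]

predicted = [predict_churn(x) for x in unseen]
accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

print(predicted)  # [False, True, True, True]
print(accuracy)   # 0.75
```

Each prediction is a single weighted sum, which is why inference can often run in real time even when training cannot.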

Throughout the whole process, data is the first-class citizen: the success of the predictive model depends on it in every ML task. With this in mind, the following figure shows an analytics pipeline that can be used by telecommunication companies:

Figure 4: Churn analytics pipeline

With this kind of analysis, telecom companies can learn how to predict and enhance the customer experience, which can, in turn, help prevent churn and tailor marketing campaigns. In practice, these business assessments are often used to focus retention efforts on the customers most likely to leave, rather than on those likely to stay.

Thus, we need to develop a predictive model that is sensitive to the Churn = True samples. Since each customer either churns or does not, this is a binary classification problem. We will see more details in upcoming sections.
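One common way to quantify sensitivity to the Churn = True class is recall (the true positive rate). The Python sketch below, which uses made-up labels and hypothetical helper names, compares two imaginary models on an imbalanced sample and suggests why accuracy alone can be misleading for churn prediction.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose prediction matches the actual label."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


def recall_for_churn(actual, predicted):
    """Fraction of actual churners (Churn = True) that the model catches."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    churners = true_positives + false_negatives
    return true_positives / churners if churners else 0.0


# Illustrative, imbalanced sample: 2 churners out of 10 customers.
actual = [True, True] + [False] * 8

# Hypothetical model A predicts that nobody churns.
always_stay = [False] * 10

# Hypothetical model B catches both churners at the cost of one false alarm.
sensitive = [True, True, True] + [False] * 7

print(accuracy(actual, always_stay), recall_for_churn(actual, always_stay))  # 0.8 0.0
print(accuracy(actual, sensitive), recall_for_churn(actual, sensitive))      # 0.9 1.0
```

Model A looks acceptable by accuracy yet misses every churner, which is exactly the failure a churn model needs to avoid.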
