Description of the dataset

We will develop our predictive model on the Orange Telecom Churn Dataset, which consists of cleaned customer activity data (the features) along with a churn label specifying whether a customer canceled the subscription. The churn-80 and churn-20 datasets can be downloaded from the following links, respectively:

Because more data is generally desirable when developing ML models, let's use the larger set (that is, churn-80) for training and cross-validation, and the smaller set (that is, churn-20) for final testing and model performance evaluation.
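
To make the cross-validation part of this plan concrete, here is a minimal sketch of how row indices of the training set could be partitioned into k folds. It is written in plain Python for illustration only; the function name `k_fold_indices`, the default seed, and the round-robin fold assignment are assumptions of this sketch, not something prescribed by the dataset or by a particular library.

```python
import random


def k_fold_indices(n_rows, k, seed=42):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the row indices once, deals them into
    k folds, and uses each fold exactly once as the validation set.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    # Example with 10 rows and 5 folds.
    for train, validation in k_fold_indices(10, 5):
        print(len(train), len(validation))
```

Every row of the training set is used for validation exactly once and for training in the remaining k - 1 rounds. The smaller churn-20 set stays out of this loop entirely.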

Note that the churn-20 set is used only to evaluate the model (that is, for demonstration purposes). In a production-ready environment, telecommunication companies can use their own datasets with the necessary preprocessing and feature engineering. The dataset has the following schema:

  • State: String
  • Account length: Integer
  • Area code: Integer
  • International plan: String
  • Voicemail plan: String
  • Number email messages: Integer
  • Total day minutes: Double
  • Total day calls: Integer
  • Total day charge: Double
  • Total eve minutes: Double
  • Total eve calls: Integer
  • Total eve charge: Double
  • Total night minutes: Double
  • Total night calls: Integer
  • Total night charge: Double
  • Total intl minutes: Double
  • Total intl calls: Integer
  • Total intl charge: Double
  • Customer service calls: Integer
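
The schema above can be written down as a mapping from column name to type and used to cast the raw CSV strings. The following is a minimal sketch using only the Python standard library. The names `SCHEMA`, `parse_row`, and `load_rows` are hypothetical, the column names are copied verbatim from the list above (the header of the actual CSV files may spell some of them differently, and the churn label column is not included), and the sample row in the usage example is made up purely for illustration.

```python
import csv
import io

# Column name -> Python type, mirroring the schema listed above.
SCHEMA = {
    "State": str,
    "Account length": int,
    "Area code": int,
    "International plan": str,
    "Voicemail plan": str,
    "Number email messages": int,
    "Total day minutes": float,
    "Total day calls": int,
    "Total day charge": float,
    "Total eve minutes": float,
    "Total eve calls": int,
    "Total eve charge": float,
    "Total night minutes": float,
    "Total night calls": int,
    "Total night charge": float,
    "Total intl minutes": float,
    "Total intl calls": int,
    "Total intl charge": float,
    "Customer service calls": int,
}


def parse_row(raw):
    """Cast each raw string field to the type declared in SCHEMA."""
    missing = [name for name in SCHEMA if raw.get(name) is None]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return {name: cast(raw[name]) for name, cast in SCHEMA.items()}


def load_rows(handle):
    """Read CSV text with a header line from an open handle and type every row."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


if __name__ == "__main__":
    # Synthetic one-row sample (not real data), built from the schema itself.
    placeholder = {str: "XX", int: "1", float: "1.0"}
    header = ",".join(SCHEMA)
    row = ",".join(placeholder[cast] for cast in SCHEMA.values())
    print(load_rows(io.StringIO(header + "\n" + row + "\n"))[0])
```

Declaring the types up front means that a file with a missing or misspelled column fails immediately with a clear error, rather than producing silently mistyped features later in the pipeline.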