官术网_书友最值得收藏!

Typical machine learning workflow

A typical ML application involves several processing steps, from the input to the output, forming a scientific workflow as shown in Figure 1, ML workflow. The following steps are involved in a typical ML application:

  1. Load the data
  2. Parse the data into the input format for the algorithm
  3. Pre-process the data and handle the missing values
  4. Split the data into three sets, for training, testing, and validation (train set and validation set respectively) and one for testing the model (test dataset)
  5. Run the algorithm to build and train your ML model
  6. Make predictions with the training data and observe the results
  7. Test and evaluate the model with the test data or alternatively validate the model using some cross-validator technique using the third dataset called a validation dataset
  8. Tune the model for better performance and accuracy
  9. Scale up the model so that it can handle massive datasets in future
  10. Deploy the ML model in production:
Figure 1: ML workflow

The preceding workflow is represent a few steps to solve ML problems. Where, ML tasks can be broadly categorized into supervised, unsupervised, semi-supervised, reinforcement, and recommendation systems. The following Figure 2, Supervised learning in action, shows the schematic diagram of supervised learning. After the algorithm has found the required patterns, those patterns can be used to make predictions for unlabeled test data:

Figure 2: Supervised learning in action

Examples include classification and regression for solving supervised learning problems so that predictive models can be built for predictive analytics based on them. Throughout the upcoming chapters, we will provide several examples of supervised learning, such as LR, logistic regression, random forest, decision trees, Naive Bayes, multilayer perceptron, and so on.

A regression algorithm is meant to produce continuous output. The input is allowed to be either discrete or continuous:

Figure 3: A regression algorithm is meant to produce continuous output

A classification algorithm, on the other hand, is meant to produce discrete output from an input of a set of discrete or continuous values. This distinction is important to know because discrete-valued outputs are handled better by classification, which will be discussed in upcoming chapters:

Figure 4: A classification algorithm is meant to produce discrete output

In this chapter, we will mainly focus on the supervised regression algorithms. We will start with describing the problem statement and then we move on to the very simple LR algorithm. Often, performance of these ML models is optimized using hyperparameter tuning and cross-validation techniques. So knowing them, in brief, is mandatory so that we can easily use them in future chapters.

主站蜘蛛池模板: 泸溪县| 日照市| 灵武市| 麻栗坡县| 淮阳县| 大荔县| 泰和县| 台中市| 府谷县| 德州市| 鄂尔多斯市| 林周县| 富民县| 军事| 崇信县| 蓝山县| 宝兴县| 临朐县| 沂南县| 泾源县| 衡水市| 满城县| 横峰县| 金乡县| 武山县| 葵青区| 赞皇县| 墨江| 调兵山市| 烟台市| 遂平县| 金乡县| 呈贡县| 河西区| 奈曼旗| 阜城县| 博野县| 吉安市| 精河县| 阳曲县| 怀化市|