Predicting prices and evaluating the model

ShortTermPredictionServiceImpl is the class that actually performs the prediction with the given model and data. First, it converts PriceData into a Spark DataFrame whose schema matches the one used for training by calling transformPriceData(priceData: PriceData). Then it calls model.transform(dataframe), extracts the columns we need, writes them to the debug log, and returns them to the caller:

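import org.apache.spark.sql.Row

override def predictPriceDeltaLabel(priceData: PriceData, mlModel: org.apache.spark.ml.Transformer): (String, Row) = {
  // Convert the incoming price data to a DataFrame with the training schema
  val df = transformPriceData(priceData)
  // Apply the model to the DataFrame
  val prediction = mlModel.transform(df)
  // Keep only the columns of interest; head() takes the first row
  val predictionData = prediction.select("probability", "prediction", "rawPrediction").head()
  // The prediction is stored as a Double; return it as a string label plus the full row
  (predictionData.getAs[Double]("prediction").toInt.toString, predictionData)
}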
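
The first element of the returned tuple is a string label rather than a number. As a minimal sketch of that conversion in isolation (PredictionLabel and fromPrediction are hypothetical names used only for illustration, not part of the application):

```scala
// Hypothetical helper, shown only to illustrate the conversion.
object PredictionLabel {
  // The predicted class arrives as a Double (for example 1.0);
  // truncating to Int and converting to String yields "1".
  def fromPrediction(prediction: Double): String =
    prediction.toInt.toString
}
```

With this helper, PredictionLabel.fromPrediction(1.0) gives the string "1", which is the form in which the label is later compared in SQL (PREDICTED_LABEL='1').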

While running, the application collects data about the prediction output: predicted label and actual price delta. This information is used to build the root web page, displaying statistics such as TPR (true positive rate), FPR (false positive rate), TNR (true negative rate), and FNR (false negative rate), which were described earlier.

These statistics are computed on the fly from the SHORT_TERM_PREDICTION_BINARY table. Using the CASE-WHEN construction, we add four new columns, TPR, FPR, TNR, and FNR, each holding a 0/1 flag for a single record. They are defined as follows:

  • TPR with value 1 if the predicted label was 1 and price delta was > 0, and value 0 otherwise
  • FPR with value 1 if the predicted label was 1 and price delta was <= 0, and value 0 otherwise
  • TNR with value 1 if the predicted label was 0 and price delta was <= 0, and value 0 otherwise
  • FNR with value 1 if the predicted label was 0 and price delta was > 0, and value 0 otherwise
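
The four definitions above can be sketched in plain Scala as a function over one record. This is only an illustration of the rules, assuming the label is stored as the string "0" or "1"; Outcome and indicators are hypothetical names, not part of the application:

```scala
// Hypothetical names for illustration; not part of the application's code.
object Outcome {
  // Returns the (TPR, FPR, TNR, FNR) flags for one prediction record.
  def indicators(predictedLabel: String, actualPriceDelta: Double): (Int, Int, Int, Int) = {
    val rose = actualPriceDelta > 0
    (predictedLabel, rose) match {
      case ("1", true)  => (1, 0, 0, 0) // label 1, price delta > 0
      case ("1", false) => (0, 1, 0, 0) // label 1, price delta <= 0
      case ("0", false) => (0, 0, 1, 0) // label 0, price delta <= 0
      case ("0", true)  => (0, 0, 0, 1) // label 0, price delta > 0
      case _            => (0, 0, 0, 0) // any other label sets no flag
    }
  }
}
```

Exactly one flag is set for every record whose label is "0" or "1", so the four flags partition the records.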

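Then, all records are grouped by model name, and the TPR, FPR, TNR, and FNR columns are summed up. Despite the column names, the sums are counts (the number of true positives, false positives, true negatives, and false negatives for each model) rather than rates. Here is the SQL code responsible for this: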

SELECT MODEL,
       SUM(TPR) AS TPR,
       SUM(FPR) AS FPR,
       SUM(TNR) AS TNR,
       SUM(FNR) AS FNR,
       COUNT(*) AS TOTAL
FROM (SELECT *,
             CASE WHEN PREDICTED_LABEL = '1' AND ACTUAL_PRICE_DELTA > 0 THEN 1 ELSE 0 END AS TPR,
             CASE WHEN PREDICTED_LABEL = '1' AND ACTUAL_PRICE_DELTA <= 0 THEN 1 ELSE 0 END AS FPR,
             CASE WHEN PREDICTED_LABEL = '0' AND ACTUAL_PRICE_DELTA <= 0 THEN 1 ELSE 0 END AS TNR,
             CASE WHEN PREDICTED_LABEL = '0' AND ACTUAL_PRICE_DELTA > 0 THEN 1 ELSE 0 END AS FNR
      FROM SHORT_TERM_PREDICTION_BINARY)
GROUP BY MODEL
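
The query returns per-model counts. To turn them into the rates the column names refer to, each count is divided by the number of records in the same actual class. The following is a minimal sketch of the same grouping and of that last step using plain Scala collections instead of SQL. PredictionRecord, ModelStats, and byModel are hypothetical names chosen for illustration, and the rate formulas are the standard confusion-matrix definitions, not code taken from the application:

```scala
// Hypothetical record type mirroring the columns used by the query.
final case class PredictionRecord(model: String, predictedLabel: String, actualPriceDelta: Double)

// Per-model totals, named as in the query (these are counts, not rates).
final case class ModelStats(tpr: Int, fpr: Int, tnr: Int, fnr: Int, total: Int) {
  // Returns None instead of dividing by zero when a class has no records.
  private def ratio(num: Int, den: Int): Option[Double] =
    if (den == 0) None else Some(num.toDouble / den)

  def truePositiveRate: Option[Double]  = ratio(tpr, tpr + fnr)
  def falsePositiveRate: Option[Double] = ratio(fpr, fpr + tnr)
  def trueNegativeRate: Option[Double]  = ratio(tnr, tnr + fpr)
  def falseNegativeRate: Option[Double] = ratio(fnr, fnr + tpr)
}

object ModelStats {
  // Equivalent of GROUP BY MODEL with SUM over the four flag columns.
  def byModel(records: Seq[PredictionRecord]): Map[String, ModelStats] =
    records.groupBy(_.model).map { case (model, rs) =>
      // Counts records with the given label whose price delta was (or was not) > 0.
      def count(label: String, rose: Boolean): Int =
        rs.count(r => r.predictedLabel == label && (r.actualPriceDelta > 0) == rose)
      model -> ModelStats(
        count("1", true), count("1", false), count("0", false), count("0", true), rs.size)
    }
}
```

Calling ModelStats.byModel on a sequence of records gives one ModelStats per model name. The rates are returned as Option values so that a model with no records in a given class yields None rather than a division by zero.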