- Machine Learning with Scala Quick Start Guide
- Md. Rezaul Karim
- 2021-06-24 14:32:02
Description of the dataset
We will use a recently added cryotherapy dataset from the UCI machine learning repository. The dataset can be downloaded from http://archive.ics.uci.edu/ml/datasets/Cryotherapy+Dataset+#.
This dataset contains the wart treatment results of 90 patients treated with cryotherapy. In case you are not familiar with the condition, a wart is a skin growth caused by infection with a type of human papillomavirus (HPV). Warts are typically small, rough, hard growths that are similar in color to the rest of the skin.
There are two available treatments for this problem:
- Salicylic acid: A type of gel containing salicylic acid used in medicated band-aids.
- Cryotherapy: A freezing liquid (usually liquid nitrogen) is sprayed onto the wart, destroying the cells in the affected area. A blister usually develops after treatment; it eventually turns into a scab and falls off after about a week.
The 90 samples (instances) describe patients who were either recommended to undergo cryotherapy or discharged without it. The dataset has seven attributes:
- sex: Patient gender, characterized by 1 (male) or 0 (female).
- age: Patient age.
- Time: Observation and treatment time in hours.
- Number_of_Warts: Number of warts.
- Type: Type of wart.
- Area: The size of the affected area.
- Result_of_Treatment: The recommended result of the treatment, characterized by either 1 (yes) or 0 (no). It is also the target column.
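To make this schema concrete, the following minimal sketch models one row of the dataset as a Scala case class. It assumes the data has been exported to a comma-separated file whose columns appear in the order listed above (possibly with a header row). The names `CryoRecord`, `parse`, and `features` are hypothetical, chosen for illustration only.

```scala
import scala.util.Try

// One row of the cryotherapy dataset (sketch; field names are our own choice).
final case class CryoRecord(
    sex: Int,               // 1 (male) or 0 (female), as described above
    age: Int,               // patient age
    time: Double,           // observation and treatment time
    numberOfWarts: Int,     // number of warts
    wartType: Int,          // the "Type" column; `type` is a reserved word in Scala
    area: Double,           // size of the affected area
    resultOfTreatment: Int  // target label: 1 (yes) or 0 (no)
)

object CryoRecord {
  // Hypothetical helper: parses one comma-separated line, assuming the columns
  // appear in the order listed above. Returns None for a header row or any
  // malformed line (wrong column count or non-numeric values).
  def parse(line: String): Option[CryoRecord] = {
    val cols = line.split(",").map(_.trim)
    if (cols.length != 7) None
    else
      Try(
        CryoRecord(
          sex = cols(0).toInt,
          age = cols(1).toInt,
          time = cols(2).toDouble,
          numberOfWarts = cols(3).toInt,
          wartType = cols(4).toInt,
          area = cols(5).toDouble,
          resultOfTreatment = cols(6).toInt
        )
      ).toOption
  }

  // The six input features: every column except the target.
  def features(r: CryoRecord): Array[Double] =
    Array(
      r.sex.toDouble,
      r.age.toDouble,
      r.time,
      r.numberOfWarts.toDouble,
      r.wartType.toDouble,
      r.area
    )
}
```

Separating `features` from `resultOfTreatment` mirrors the split between the six predictors and the target column described above.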
Since we need to predict discrete labels, this is a classification problem; more specifically, it is a binary classification problem. Because the dataset is small and has only six features, we can start with a very basic classification algorithm called logistic regression, which applies the logistic (sigmoid) function to a linear combination of the features to estimate the probability that an instance belongs to each class. We will cover logistic regression and other classification algorithms in more detail in Chapter 3, Scala for Learning Classification. Here, we use the Spark ML implementation of logistic regression in Scala.
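The core idea of logistic regression can be sketched in plain Scala, independent of Spark: compute a linear score from the features, pass it through the logistic function to get the probability of class 1, and threshold that probability to get a label. This is a minimal illustration only; the object name `LogisticSketch` is hypothetical, and the weights and intercept are placeholders supplied by the caller, not values learned from the cryotherapy data.

```scala
// Minimal sketch of logistic regression prediction (not Spark ML's implementation).
// Weights and intercept are hypothetical inputs; a real model would learn them.
object LogisticSketch {
  // Logistic (sigmoid) function: maps any real number into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // P(y = 1 | x) = sigmoid(w . x + b); P(y = 0 | x) is 1 minus this value.
  def probabilityOfClass1(weights: Array[Double], intercept: Double, x: Array[Double]): Double = {
    require(weights.length == x.length, "weights and features must have the same length")
    val z = weights.zip(x).map { case (w, xi) => w * xi }.sum + intercept
    sigmoid(z)
  }

  // Binary decision: label 1 when P(y = 1 | x) reaches the threshold (0.5 by default).
  def predict(
      weights: Array[Double],
      intercept: Double,
      x: Array[Double],
      threshold: Double = 0.5
  ): Int =
    if (probabilityOfClass1(weights, intercept, x) >= threshold) 1 else 0
}
```

In practice you would not set the weights by hand: Spark ML's `LogisticRegression` estimator learns the weights and intercept from the training data, and the fitted model performs this probability-and-threshold step for you.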