Dropping features that are redundant

In the dataset we saw earlier, a few columns contribute nothing to the machine learning process:

  • nameOrig: This column is a unique identifier that belongs to each customer. Since the identifier is unique to each row of the dataset, the machine learning algorithm cannot discern any patterns from this feature.
  • nameDest: This column is also a unique identifier that belongs to each customer and, as such, provides no value to the machine learning algorithm.
  • isFlaggedFraud: This column flags a transaction as fraudulent if a person tries to transfer more than 200,000 in a single transaction. Since we already have a feature called isFraud that flags a transaction as fraudulent, this feature is redundant.
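
Before dropping columns on these grounds, it can help to check them against the data. The following is a minimal sketch, not the book's code: the toy DataFrame named sample and all of its values are made up for illustration, and only the column names come from the dataset described above. It computes the share of distinct values in each identifier column and cross-tabulates the two fraud flags.

```python
import pandas as pd

# Hypothetical toy data; the values are invented and are not the chapter's dataset.
sample = pd.DataFrame({
    "nameOrig": ["C1", "C2", "C3", "C4"],
    "nameDest": ["M1", "M2", "C9", "M2"],
    "amount": [100.0, 250000.0, 50.0, 900.0],
    "isFraud": [0, 1, 0, 0],
    "isFlaggedFraud": [0, 1, 0, 0],
})

# Share of distinct values per column: 1.0 means every row has its own value.
unique_ratio = sample[["nameOrig", "nameDest"]].nunique() / len(sample)

# Cross-tabulate the two fraud flags to see how much they overlap.
flag_overlap = pd.crosstab(sample["isFraud"], sample["isFlaggedFraud"])

print(unique_ratio)
print(flag_overlap)
```

On real data, a ratio close to 1.0 suggests the column behaves like a row identifier, and the cross-tabulation shows how often isFlaggedFraud agrees with isFraud.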

We can drop these features from the dataset by using the following code: 

# Drop the redundant features
df = df.drop(['nameOrig', 'nameDest', 'isFlaggedFraud'], axis=1)
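
If this step is run more than once (for example, by re-running a notebook cell), the plain drop call raises a KeyError because the columns are already gone. A possible variant, sketched here on a made-up stand-in for df, uses the columns and errors parameters of DataFrame.drop so that repeated runs are harmless. The list name redundant is an illustrative choice, not something from the original code.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's df; the values are invented.
df = pd.DataFrame({
    "amount": [100.0, 250000.0],
    "nameOrig": ["C1", "C2"],
    "nameDest": ["M1", "M2"],
    "isFraud": [0, 1],
    "isFlaggedFraud": [0, 1],
})

redundant = ["nameOrig", "nameDest", "isFlaggedFraud"]

# errors="ignore" skips labels that are no longer present instead of raising.
df = df.drop(columns=redundant, errors="ignore")
df = df.drop(columns=redundant, errors="ignore")  # second run changes nothing

remaining = list(df.columns)
print(remaining)
```

The trade-off is that errors="ignore" also hides a misspelled column name, so it is worth checking the remaining columns afterwards.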