How to do it...

In the following steps, we demonstrate how scikit-learn's K-means clustering algorithm performs on a toy PE malware classification dataset:

  1. Start by importing and plotting the dataset:
import pandas as pd
import plotly.express as px

df = pd.read_csv("file_pe_headers.csv", sep=",")
fig = px.scatter_3d(
    df,
    x="SuspiciousImportFunctions",
    y="SectionsLength",
    z="SuspiciousNameSection",
    color="Malware",
)
fig.show()

The following screenshot shows the output:

  2. Extract the features and target labels:
y = df["Malware"]
X = df.drop(["Name", "Malware"], axis=1).to_numpy()
  3. Next, import scikit-learn's clustering module and fit a K-means model with two clusters, one per class label, to the data:
from sklearn.cluster import KMeans

estimator = KMeans(n_clusters=len(set(y)))
estimator.fit(X)
  4. Predict the cluster of each sample using the fitted model:
y_pred = estimator.predict(X)
df["pred"] = y_pred
df["pred"] = df["pred"].astype("category")
  5. To see how the algorithm did, plot the clusters it found:
fig = px.scatter_3d(
    df,
    x="SuspiciousImportFunctions",
    y="SectionsLength",
    z="SuspiciousNameSection",
    color="pred",
)
fig.show()

The following screenshot shows the output:

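The clusters do not match the malware labels perfectly, but the clustering algorithm captured much of the structure in the dataset. Keep in mind that K-means numbers its clusters arbitrarily, so cluster 0 need not correspond to the benign class.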
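
To put a number on "not perfect", one option is cluster purity: the fraction of samples whose true label matches the majority label of the cluster they fall in. The following is a minimal sketch using only the standard library. The function name cluster_purity and the toy lists are hypothetical and are not part of the recipe; they stand in for df["Malware"] and df["pred"].

```python
from collections import Counter, defaultdict


def cluster_purity(labels, clusters):
    """Fraction of samples whose label equals the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        by_cluster[cluster][label] += 1
    # For each cluster, count the samples that carry its most common label.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical toy values standing in for df["Malware"] and df["pred"].
# The cluster IDs are deliberately flipped relative to the labels.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(cluster_purity(y_true, y_pred))  # 0.875
```

With the recipe's DataFrame, the call would be cluster_purity(df["Malware"], df["pred"]). Purity can never fall below the share of the most common class, so it is worth comparing the result against that baseline rather than reading it as an accuracy score.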