- Statistics for Machine Learning
- Pratap Dangeti
- 2021-07-02 19:05:57
Grid search
Grid search is a popular way to tune the hyperparameters of a machine learning model: every combination of the candidate values is tried, and the combination that gives the best fit is kept.

The following code predicts whether a particular user will click on an ad. The grid search is run over a decision tree classifier, and the tuned hyperparameters are the maximum depth of the tree, the minimum number of observations in a terminal node, and the minimum number of observations required to split a node:
# Grid search
>>> import pandas as pd
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
>>> from sklearn.pipeline import Pipeline
>>> # GridSearchCV lives in sklearn.model_selection; the old sklearn.grid_search module was removed in scikit-learn 0.20
>>> from sklearn.model_selection import GridSearchCV
>>> input_data = pd.read_csv("ad.csv", header=None)
>>> # The last column is the target; all other columns are features
>>> X_columns = set(input_data.columns.values)
>>> y = input_data[len(input_data.columns.values)-1]
>>> X_columns.remove(len(input_data.columns.values)-1)
>>> X = input_data[list(X_columns)]
Split the data into training and testing sets:
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=33)
Create a pipeline that wraps the classifier as a named step, clf, so that its hyperparameters can be addressed by the grid search:
>>> pipeline = Pipeline([
...     ('clf', DecisionTreeClassifier(criterion='entropy'))])
The values to explore are given as a Python dictionary. Each key is the pipeline step name, a double underscore, and the parameter name:
>>> parameters = {
...     'clf__max_depth': (50, 100, 150),
...     'clf__min_samples_split': (2, 3),
...     'clf__min_samples_leaf': (1, 2, 3)}
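To make the size of this search concrete, the grid can be expanded by hand. The following is a minimal sketch using only the standard library; the helper name expand_grid is hypothetical and is not part of scikit-learn, which performs this expansion internally.

```python
from itertools import product

# Same grid as in the text: 3 depths x 2 split sizes x 3 leaf sizes.
parameters = {
    'clf__max_depth': (50, 100, 150),
    'clf__min_samples_split': (2, 3),
    'clf__min_samples_leaf': (1, 2, 3),
}

def expand_grid(grid):
    """Return every combination of the grid as a list of dicts (hypothetical helper)."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]

combinations = expand_grid(parameters)
print(len(combinations))   # 18 candidate settings
print(combinations[0])     # the first candidate
```

Each of the 18 candidates is fitted once per cross-validation fold, so the cost of a grid search grows multiplicatively with every parameter added.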
The n_jobs parameter sets the number of CPU cores used in parallel; -1 uses all available cores. The scoring metric here is accuracy; other options, such as precision, recall, and f1, can be chosen instead:
>>> grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy')
>>> grid_search.fit(X_train, y_train)
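The effect of the scoring argument can be seen by running the same search under two metrics. This is a minimal sketch under stated assumptions: it uses a small synthetic, imbalanced dataset from make_classification as a stand-in for ad.csv, a reduced grid, and cv=3, none of which come from the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the ad data (an assumption, not ad.csv).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2],
                           random_state=33)

# Reduced grid so the sketch runs quickly; keys address the tree directly
# because no pipeline is used here.
param_grid = {'max_depth': (2, 4), 'min_samples_leaf': (1, 3)}

results = {}
for metric in ('accuracy', 'f1'):
    search = GridSearchCV(
        DecisionTreeClassifier(criterion='entropy', random_state=33),
        param_grid, scoring=metric, cv=3)
    search.fit(X, y)
    # best_score_ is the mean cross-validated value of the chosen metric.
    results[metric] = (search.best_params_, search.best_score_)
    print(metric, search.best_params_, round(search.best_score_, 3))
```

On imbalanced data the two metrics may select different parameter sets, which is one reason to choose the scoring option deliberately rather than rely on accuracy.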
Predict on the test data using the model fitted with the best parameters found by the grid search:
>>> y_pred = grid_search.predict(X_test)
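Calling predict on the search object works because GridSearchCV, with its default refit=True, refits the best parameter combination on the whole training set and delegates predict to that model. The sketch below checks this on synthetic data; the dataset and the small grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the ad dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=33)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=33)

# refit=True is the default, so the best model is refitted on X_train.
search = GridSearchCV(DecisionTreeClassifier(random_state=33),
                      {'max_depth': (2, 3, 4)}, cv=3)
search.fit(X_train, y_train)

# Predictions from the search object and from best_estimator_ coincide.
same = np.array_equal(search.predict(X_test),
                      search.best_estimator_.predict(X_test))
print(same)
```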
Print the best cross-validation score, the best parameter set, and the performance on the test data:
>>> print('\n Best score: \n', grid_search.best_score_)
>>> print('\n Best parameters set: \n')
>>> best_parameters = grid_search.best_estimator_.get_params()
>>> for param_name in sorted(parameters.keys()):
...     print('\t%s: %r' % (param_name, best_parameters[param_name]))
>>> print("\n Confusion Matrix on Test data \n", confusion_matrix(y_test, y_pred))
>>> print("\n Test Accuracy \n", accuracy_score(y_test, y_pred))
>>> print("\nPrecision Recall f1 table \n", classification_report(y_test, y_pred))

The R code for grid searches on decision trees is as follows:
# Grid Search on Decision Trees
library(rpart)
input_data = read.csv("ad.csv", header = FALSE)
input_data$V1559 = as.factor(input_data$V1559)
set.seed(123)
numrow = nrow(input_data)
trnind = sample(1:numrow, size = as.integer(0.7 * numrow))
train_data = input_data[trnind, ]
test_data = input_data[-trnind, ]
minspset = c(2, 3)
minobset = c(1, 2, 3)
initacc = 0
for (minsp in minspset) {
  for (minob in minobset) {
    tr_fit = rpart(V1559 ~ ., data = train_data, method = "class",
                   minsplit = minsp, minbucket = minob)
    tr_predt = predict(tr_fit, newdata = train_data, type = "class")
    tble = table(tr_predt, train_data$V1559)
    acc = (tble[1,1] + tble[2,2]) / sum(tble)
    # Report only when training accuracy improves on the best so far
    if (acc > initacc) {
      tr_predtst = predict(tr_fit, newdata = test_data, type = "class")
      # Rows are actual classes, columns are predicted classes
      tblet = table(test_data$V1559, tr_predtst)
      acct = (tblet[1,1] + tblet[2,2]) / sum(tblet)
      print(paste("Best Score"))
      print(paste("Train Accuracy ", round(acc, 3), "Test Accuracy", round(acct, 3)))
      print(paste(" Min split ", minsp, " Min obs per node ", minob))
      print(paste("Confusion matrix on test data"))
      print(tblet)
      precsn_0 = (tblet[1,1]) / (tblet[1,1] + tblet[2,1])
      precsn_1 = (tblet[2,2]) / (tblet[1,2] + tblet[2,2])
      print(paste("Precision_0: ", round(precsn_0, 3), "Precision_1: ", round(precsn_1, 3)))
      rcall_0 = (tblet[1,1]) / (tblet[1,1] + tblet[1,2])
      rcall_1 = (tblet[2,2]) / (tblet[2,1] + tblet[2,2])
      print(paste("Recall_0: ", round(rcall_0, 3), "Recall_1: ", round(rcall_1, 3)))
      initacc = acc
    }
  }
}
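The precision and recall arithmetic in the R code can be checked by hand. The following is a minimal Python sketch with rows as actual classes and columns as predicted classes, matching the layout of tblet; the function name per_class_metrics is hypothetical and the counts are made up for illustration, not results from ad.csv.

```python
def per_class_metrics(matrix):
    """Return {class: (precision, recall)} for a square confusion matrix.

    matrix[i][j] is the number of samples of actual class i predicted as class j.
    """
    n = len(matrix)
    metrics = {}
    for k in range(n):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        precision = matrix[k][k] / predicted_k if predicted_k else 0.0
        recall = matrix[k][k] / actual_k if actual_k else 0.0
        metrics[k] = (precision, recall)
    return metrics

# Made-up counts, for illustration only.
example = [[90, 10],
           [5, 45]]

# Accuracy is the diagonal total divided by the grand total.
accuracy = (example[0][0] + example[1][1]) / sum(map(sum, example))
print(round(accuracy, 3))
for label, (precision, recall) in per_class_metrics(example).items():
    print(label, round(precision, 3), round(recall, 3))
```

Precision divides the diagonal cell by its column total, and recall divides it by its row total, which is the same arithmetic as precsn_0, precsn_1, rcall_0, and rcall_1 in the R code.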