基于sk_learn的k近邻算法实现-mnist手写数字识别且要求97%以上精确率

软件发布|下载排行|最新软件

当前位置:首页IT学院IT技术

基于sk_learn的k近邻算法实现-mnist手写数字识别且要求97%以上精确率

虎头虎脑小黑狗   2021-03-26 我要评论

1. 导入需要的库

from sklearn.datasets import fetch_openml
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

 

2. 设置随机种子,以获得可复现的结果。

np.random.seed(42)

 

3. 获取mnist数据集,并将数据集标签 由字符型转换为整数型

1 np.random.seed(42)
2 mnist = fetch_openml("mnist_784", version = 1, as_frame=False)
3 X, y = mnist['data'], mnist['target']
4 y = y.astype(np.uint8)

 

4. 划分训练集和测试集

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

 

5. 训练模型并测试

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

y_test_pred = knn_clf.predict(X_test)
print(accuracy_score(y_test, y_test_pred))

如图我们得到了模型的准确率 0.9688

 

 

 

6. 训练模型中的超参数weights(默认值为'uniform')和n_neighbors(默认值为5)。由于超参数的连续性,所以n_neighbors的备选值可以为 3, 4,  6

from sklearn.datasets import fetch_openml
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

np.random.seed(42)
mnist = fetch_openml("mnist_784", version = 1, as_frame=False)
X, y = mnist['data'], mnist['target']
y = y.astype(np.uint8)

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
param_grid = [{'weights': ["uniform", "distance"], 'n_neighbors': [3, 4, 6]}]

knn_clf = KNeighborsClassifier()
grid_search = GridSearchCV(knn_clf, param_grid, cv=5, verbose=3)
grid_search.fit(X_train, y_train)
y_pred = grid_search.predict(X_test)
print(accuracy_score(y_test, y_pred))

如图所示,在测试集上得到的准确率达到97.14%

 

 

 

通过如下命令可以获得选取的最合适的超参数以及在验证集上达到的最好结果

 

 

 

 

Copyright 2022 版权所有 软件发布 访问手机版

声明:所有软件和文章来自软件开发商或者作者 如有异议 请与本站联系 联系我们