sklearn K-Nearest Neighbors: KNeighborsClassifier Parameters Explained

2023-11-18

[Original URL] https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)

Classifier implementing the k-nearest neighbors vote.

Parameters:

n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for `kneighbors` queries.

weights : str or callable, optional (default = 'uniform')

Weight function used in prediction. Possible values:

'uniform' : uniform weights. All points in each neighborhood are weighted equally.
'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional

Algorithm used to compute the nearest neighbors:

'ball_tree' will use `BallTree`
'kd_tree' will use `KDTree`
'brute' will use a brute-force search.
'auto' will attempt to decide the most appropriate algorithm based on the values passed to `fit` method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : string or callable, default 'minkowski'

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.

metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

n_jobs : int or None, optional (default=None)

The number of parallel jobs to run for neighbors search. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See Glossary for more details. Doesn't affect `fit` method.
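
To make the effect of `weights` concrete, here is a minimal sketch on made-up one-dimensional data (the data and variable names are assumptions for illustration, not part of the reference text). With three neighbors, uniform voting follows the majority label, while inverse-distance weighting lets the single very close neighbor outweigh the two distant ones.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed for illustration): one point of class 0 near the query,
# two points of class 1 farther away.
X = [[0.0], [2.0], [2.1]]
y = [0, 1, 1]
query = [[0.5]]

# Uniform weights: each of the 3 neighbors casts one vote (1 vs. 2 votes).
uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)

# Distance weights: each vote is scaled by 1 / distance, so the neighbor at
# distance 0.5 (weight 2.0) outweighs those at 1.5 and 1.6 (about 1.29 total).
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)

print(uniform.predict(query))   # majority vote picks class 1
print(distance.predict(query))  # inverse-distance vote picks class 0
```

The same constructor accepts `p=1` to switch the default Minkowski metric to Manhattan distance; on one-dimensional data like this it would not change the result.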

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
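
The warning above can be explored with a small experiment. This is only a sketch on assumed toy data: the query sits exactly halfway between two training points with different labels, and the same data is fitted in two row orders. Which label wins is an implementation detail, so no particular output is claimed here.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two training points equidistant from the query, with different labels.
X_a, y_a = [[0], [2]], [0, 1]
# The same two points in the opposite row order.
X_b, y_b = [[2], [0]], [1, 0]

query = [[1]]
pred_a = KNeighborsClassifier(n_neighbors=1).fit(X_a, y_a).predict(query)
pred_b = KNeighborsClassifier(n_neighbors=1).fit(X_b, y_b).predict(query)

# The two predictions may differ, because the tie is broken by data order.
print(pred_a, pred_b)
```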

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y) 
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]

Methods

`fit`(X, y)	Fit the model using X as training data and y as target values
`get_params`([deep])	Get parameters for this estimator.
`kneighbors`([X, n_neighbors, return_distance])	Finds the K-neighbors of a point.
`kneighbors_graph`([X, n_neighbors, mode])	Computes the (weighted) graph of k-Neighbors for points in X
`predict`(X)	Predict the class labels for the provided data
`predict_proba`(X)	Return probability estimates for the test data X.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.

__init__(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)

fit(X, y)

Fit the model using X as training data and y as target values

Parameters:

X : {array-like, sparse matrix, BallTree, KDTree}

Training data. If array or matrix, shape [n_samples, n_features], or [n_samples, n_samples] if metric='precomputed'.

y : {array-like, sparse matrix}

Target values of shape = [n_samples] or [n_samples, n_outputs]
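
The `metric='precomputed'` case changes what `fit` and `predict` expect, so a short sketch may help. It assumes the distances are built with `sklearn.metrics.pairwise_distances` (any other way of producing the matrices would do) and reuses the toy data from the example above.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
X_query = np.array([[1.1]])

# Training input is a square matrix of shape (n_samples, n_samples).
D_train = pairwise_distances(X_train)
# Query input has shape (n_query, n_indexed): distances to the training points.
D_query = pairwise_distances(X_query, X_train)

clf = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
clf.fit(D_train, y_train)
pred = clf.predict(D_query)
print(pred)  # same label as fitting on the raw features: [0]
```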

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
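
A brief sketch of how `get_params` pairs with `set_params` from the Methods table; the chosen parameter values are arbitrary.

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")

# get_params returns a dict of constructor arguments and their current values.
params = clf.get_params()
print(params["n_neighbors"], params["weights"])

# set_params updates them in place and returns the estimator itself.
clf.set_params(n_neighbors=7)
print(clf.get_params()["n_neighbors"])
```

This pair is what tools such as grid search rely on to read and change hyperparameters generically.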

kneighbors(X=None, n_neighbors=None, return_distance=True)

Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors to get (default is the value passed to the constructor).

return_distance : boolean, optional. Defaults to True.

If False, distances will not be returned

Returns:

dist : array

Array representing the lengths to points, only present if return_distance=True

ind : array

Indices of the nearest points in the population matrix.

Examples

In the following example, we construct a NearestNeighbors object from an array representing our data set and ask which point is closest to [1, 1, 1]:

>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples) 
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]])) 
(array([[0.5]]), array([[2]]))

As you can see, it returns [[0.5]], and [[2]], which means that the element is at distance 0.5 and is the third element of samples (indexes start at 0). You can also query for multiple points:

>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False) 
array([[1],
       [2]]...)
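
The example above uses `NearestNeighbors`, but the same method is available on a fitted `KNeighborsClassifier`. The sketch below reuses the same samples with made-up labels (an assumption for illustration) and also shows the `X=None` case, where each training point is queried without counting itself as a neighbor.

```python
from sklearn.neighbors import KNeighborsClassifier

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
labels = [0, 0, 1]  # hypothetical labels, needed only so the classifier can be fitted

clf = KNeighborsClassifier(n_neighbors=1).fit(samples, labels)

# Explicit query point: nearest sample is index 2 at distance 0.5.
dist, ind = clf.kneighbors([[1., 1., 1.]])
print(dist, ind)

# No X given: neighbors of each training point, excluding the point itself.
self_ind = clf.kneighbors(return_distance=False)
print(self_ind)
```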

kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')

Computes the (weighted) graph of k-Neighbors for points in X

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'

The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.

n_neighbors : int

Number of neighbors for each sample. (default is value passed to the constructor).

mode : {'connectivity', 'distance'}, optional

Type of returned matrix: 'connectivity' will return the connectivity matrix with ones and zeros; in 'distance' the edges are the Euclidean distances between points.

Returns:

A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

n_samples_fit is the number of samples in the fitted data. A[i, j] is assigned the weight of the edge that connects i to j.
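
As a rough illustration of the two `mode` values, the sketch below builds both graphs for three one-dimensional points (toy data and labels assumed for illustration). Because the training points are passed explicitly as `X`, each point counts as its own nearest neighbor.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [3], [1]]
y = [0, 1, 0]  # hypothetical labels

clf = KNeighborsClassifier(n_neighbors=2).fit(X, y)

# Connectivity: A[i, j] = 1 if j is among the 2 nearest neighbors of i.
A = clf.kneighbors_graph(X)
print(A.toarray())

# Distance: the same edges, weighted by the distance between the points.
D = clf.kneighbors_graph(X, mode="distance")
print(D.toarray())
```

Both results are sparse CSR matrices, and `toarray()` is used here only to make the small example readable.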

predict(X)

Predict the class labels for the provided data

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'

Test samples.

Returns:

y : array of shape [n_samples] or [n_samples, n_outputs]

Class labels for each data sample.

predict_proba(X)

Return probability estimates for the test data X.

Parameters:

X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'

Test samples.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs of such arrays if n_outputs > 1.

The class probabilities of the input samples. Classes are ordered by lexicographic order.

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) with respect to y.

set_params(**params)

Set the parameters of this estimator.

Returns:

self
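
To tie `predict`, `predict_proba` and `score` together, here is a minimal sketch on the toy data from the first example; the two test points and their labels are assumptions for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

X_test = [[0.2], [2.8]]
y_test = [0, 1]

labels = clf.predict(X_test)        # one class label per test sample
proba = clf.predict_proba(X_test)   # one row per sample, one column per class
acc = clf.score(X_test, y_test)     # fraction of test labels predicted correctly

print(labels)
print(proba)
print(acc)
```

With uniform weights each probability is simply the share of the three neighbors that carry that class, so the rows here are multiples of one third.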
