How to use dynamic time warping with kNN in Python

2024-01-04

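I have a time-series dataset with two labels (0 and 1). I am using dynamic time warping (DTW) as the similarity measure for k-nearest-neighbor (kNN) classification, as described in these two excellent blog posts: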

https://nbviewer.jupyter.org/github/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping/blob/master/K_Nearest_Neighbor_Dynamic_Time_Warping.ipynb

http://alexminnaar.com/2014/04/16/Time-Series-Classification-and-Clustering-with-Python.html

import numpy as np
from scipy.spatial.distance import squareform
from scipy.stats import mode


class KnnDtw(object):
    """K-nearest neighbor classifier using dynamic time warping
    as the distance measure between pairs of time series arrays

    Arguments
    ---------
    n_neighbors : int, optional (default = 5)
        Number of neighbors to use by default for KNN

    max_warping_window : int, optional (default = 10000)
        Maximum warping window allowed by the DTW dynamic
        programming function

    subsample_step : int, optional (default = 1)
        Step size for the timeseries array. By setting subsample_step = 2,
        the timeseries length will be reduced by 50% because every second
        item is skipped. Implemented by x[:, ::subsample_step]
    """

    def __init__(self, n_neighbors=5, max_warping_window=10000, subsample_step=1):
        self.n_neighbors = n_neighbors
        self.max_warping_window = max_warping_window
        self.subsample_step = subsample_step

    def fit(self, x, l):
        """Fit the model using x as training data and l as class labels

        Arguments
        ---------
        x : array of shape [n_samples, n_timepoints]
            Training data set for input into KNN classifier

        l : array of shape [n_samples]
            Training labels for input into KNN classifier
        """

        # Store as arrays so that predict() can index labels with an index array
        self.x = np.asarray(x)
        self.l = np.asarray(l)

    def _dtw_distance(self, ts_a, ts_b, d=lambda x, y: abs(x - y)):
        """Returns the DTW distance between two 1-D timeseries numpy arrays.

        Arguments
        ---------
        ts_a, ts_b : arrays of shape [n_timepoints]
            The two timeseries whose DTW distance will be computed

        d : callable (default = abs(x-y))
            the distance measure used for A_i - B_j in the
            DTW dynamic programming function

        Returns
        -------
        DTW distance between A and B
        """

        # Create cost matrix filled with infinity; cells outside the
        # warping window are never updated and stay unreachable
        ts_a, ts_b = np.array(ts_a), np.array(ts_b)
        M, N = len(ts_a), len(ts_b)
        cost = np.full((M, N), np.inf)

        # Initialize the first row and column
        cost[0, 0] = d(ts_a[0], ts_b[0])
        for i in range(1, M):
            cost[i, 0] = cost[i - 1, 0] + d(ts_a[i], ts_b[0])

        for j in range(1, N):
            cost[0, j] = cost[0, j - 1] + d(ts_a[0], ts_b[j])

        # Populate rest of cost matrix within window
        for i in range(1, M):
            for j in range(max(1, i - self.max_warping_window),
                           min(N, i + self.max_warping_window)):
                choices = cost[i - 1, j - 1], cost[i, j - 1], cost[i - 1, j]
                cost[i, j] = min(choices) + d(ts_a[i], ts_b[j])

        # Return DTW distance given window
        return cost[-1, -1]

    def _dist_matrix(self, x, y):
        """Computes the M x N distance matrix between the training
        dataset and testing dataset (y) using the DTW distance measure

        Arguments
        ---------
        x : array of shape [n_samples, n_timepoints]

        y : array of shape [n_samples, n_timepoints]

        Returns
        -------
        Distance matrix between each item of x and y with
            shape [x_n_samples, y_n_samples]
        """

        x, y = np.asarray(x), np.asarray(y)
        dm_count = 0

        # The ProgressBar updates of the excerpt are left out here because
        # the ProgressBar helper is not defined in this excerpt.

        # Compute condensed distance matrix (upper triangle) of pairwise
        # dtw distances when x and y are the same array
        if np.array_equal(x, y):
            x_s = np.shape(x)
            dm = np.zeros((x_s[0] * (x_s[0] - 1)) // 2, dtype=np.double)

            for i in range(0, x_s[0] - 1):
                for j in range(i + 1, x_s[0]):
                    dm[dm_count] = self._dtw_distance(x[i, ::self.subsample_step],
                                                      y[j, ::self.subsample_step])
                    dm_count += 1

            # Convert to squareform
            dm = squareform(dm)
            return dm

        # Compute full distance matrix of dtw distances between x and y
        else:
            x_s = np.shape(x)
            y_s = np.shape(y)
            dm = np.zeros((x_s[0], y_s[0]))

            for i in range(0, x_s[0]):
                for j in range(0, y_s[0]):
                    dm[i, j] = self._dtw_distance(x[i, ::self.subsample_step],
                                                  y[j, ::self.subsample_step])

            return dm

    def predict(self, x):
        """Predict the class labels or probability estimates for
        the provided data

        Arguments
        ---------
          x : array of shape [n_samples, n_timepoints]
              Array containing the testing data set to be classified

        Returns
        -------
          2 arrays representing:
              (1) the predicted class labels
              (2) the knn label count probability
        """

        # Rows are test samples, columns are training samples
        dm = self._dist_matrix(x, self.x)

        # Identify the k nearest neighbors
        knn_idx = dm.argsort()[:, :self.n_neighbors]

        # Identify k nearest labels
        knn_labels = self.l[knn_idx]

        # Most frequent label among the neighbors and its share of the votes
        mode_data = mode(knn_labels, axis=1)
        mode_label = mode_data.mode
        mode_proba = mode_data.count / self.n_neighbors

        return np.ravel(mode_label), np.ravel(mode_proba)
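
As a rough illustration of what a warping window such as `max_warping_window` controls, here is a minimal, self-contained sketch of windowed DTW on two toy series. It is not the code from either blog post: the function name `dtw_distance`, the `window` parameter, and the example series are assumptions made for this illustration, and the band is taken to be |i - j| <= window.

```python
import numpy as np


def dtw_distance(a, b, window=None):
    """DTW distance between two 1-D sequences (hypothetical helper).

    window=None means unconstrained warping; window=w only allows
    cells with |i - j| <= w (widened to |len(a) - len(b)| if needed).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = len(a), len(b)
    # A band narrower than the length difference could never reach the last cell
    w = max(m, n) if window is None else max(window, abs(m - n))

    # cost[i, j] = best cumulative cost aligning a[:i] with b[:j]
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            step = min(cost[i - 1, j - 1], cost[i, j - 1], cost[i - 1, j])
            cost[i, j] = abs(a[i - 1] - b[j - 1]) + step
    return cost[m, n]


# The same bump, shifted by one time step
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]

print(dtw_distance(a, b))            # 0.0: warping absorbs the shift
print(dtw_distance(a, b, window=1))  # 0.0: a one-step band is enough here
print(dtw_distance(a, b, window=0))  # 4.0: no warping, plain sum of |a_i - b_i|
```

A narrow window makes each distance cheaper to compute and rules out extreme alignments, at the price of missing shifts larger than the band.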

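For the kNN classification itself, however, both posts use their own kNN implementation.

I would like to use sklearn's tools, such as GridSearchCV, in my classification. I therefore want to know how to use dynamic time warping (DTW) with sklearn's kNN.

Note: I am not limited to sklearn and would be happy to receive answers that use other libraries.

I am happy to provide more details if needed.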

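You can use a custom metric with KNN. So you only need to implement DTW yourself (or use or adapt any existing DTW implementation in Python). The code below is also available as a gist: https://gist.github.com/nikolasrieble/8bd3a83e14c0b2fa66bfa2ddd8828717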

import numpy as np
from scipy.spatial import distance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

#toy dataset 
X = np.random.random((100,10))
y = np.random.randint(0,2, (100))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

#custom metric: DTW distance between two 1-D series
def DTW(a, b):
    an = a.size
    bn = b.size
    # pairwise |a_i - b_j| costs, shape (an, bn)
    pointwise_distance = distance.cdist(a.reshape(-1,1),b.reshape(-1,1))
    # cumulative cost with an extra border row and column of infinity
    # (a plain array replaces the deprecated np.matrix)
    cumdist = np.full((an+1, bn+1), np.inf)
    cumdist[0,0] = 0

    for ai in range(an):
        for bi in range(bn):
            minimum_cost = np.min([cumdist[ai, bi+1],
                                   cumdist[ai+1, bi],
                                   cumdist[ai, bi]])
            cumdist[ai+1, bi+1] = pointwise_distance[ai,bi] + minimum_cost

    return cumdist[an, bn]

#train
parameters = {'n_neighbors':[2, 4, 8]}
clf = GridSearchCV(KNeighborsClassifier(metric=DTW), parameters, cv=3, verbose=1)
clf.fit(X_train, y_train)



#evaluate
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

which yields

Fitting 3 folds for each of 3 candidates, totalling 9 fits        

[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   29.0s finished

                         precision    recall  f1-score   support

                      0       0.57      0.89      0.70        18
                      1       0.60      0.20      0.30        15

            avg / total       0.58      0.58      0.52        33
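
A callable metric is invoked from Python for every pair of samples in every fit, which is why even this small grid search takes a while. One possible way around that, sketched below under stated assumptions, is to compute the DTW distances once and hand them to scikit-learn with `metric="precomputed"`. The helper names `dtw` and `dtw_matrix` and the toy data are made up for this illustration and are not part of the answer above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def dtw(a, b):
    """Unconstrained DTW distance between two 1-D arrays (hypothetical helper)."""
    cum = np.full((a.size + 1, b.size + 1), np.inf)
    cum[0, 0] = 0.0
    for i in range(a.size):
        for j in range(b.size):
            best = min(cum[i, j + 1], cum[i + 1, j], cum[i, j])
            cum[i + 1, j + 1] = abs(a[i] - b[j]) + best
    return cum[a.size, b.size]


def dtw_matrix(A, B):
    """DTW distances between every row of A and every row of B."""
    return np.array([[dtw(a, b) for b in B] for a in A])


# Toy data (assumption): 40 random series of length 10, alternating labels
rng = np.random.default_rng(0)
X = rng.random((40, 10))
y = np.arange(40) % 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Distances are computed once, outside the grid search
D_train = dtw_matrix(X_train, X_train)  # shape (n_train, n_train)
D_test = dtw_matrix(X_test, X_train)    # shape (n_test, n_train)

search = GridSearchCV(KNeighborsClassifier(metric="precomputed"),
                      {"n_neighbors": [1, 3, 5]}, cv=3)
search.fit(D_train, y_train)
y_pred = search.predict(D_test)
print(search.best_params_, y_pred.shape)
```

With a precomputed metric, the matrix passed to `fit` must be square (train against train), and the one passed to `predict` must have one row per test sample and one column per training sample.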


python

scikitlearn

TimeSeries

Classification

KNN
