sklearn pipeline fit: AttributeError: lower not found

2024-01-07

I want to use a Pipeline in sklearn, like this:

corpus = load_files('corpus/train')

stop_words = [x for x in open('stopwords.txt', 'r').read().split('\n')]  # Uppercase!

countvec = CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))

X_train, X_test, y_train, y_test = train_test_split(corpus.data, corpus.target, test_size=0.9,
                                                    random_state=0)
x_train_counts = countvec.fit_transform(X_train)
x_test_counts = countvec.transform(X_test)

k_fold = KFold(n=len(corpus.data), n_folds=6)
confusion = np.array([[0, 0], [0, 0]])

pipeline = Pipeline([
    ('vectorizer',  CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))),
    ('classifier',  MultinomialNB()) ])

for train_indices, test_indices in k_fold:

    pipeline.fit(x_train_counts, y_train)
    predictions = pipeline.predict(x_test_counts)

However, I get this error:

AttributeError: lower not found

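I have looked at this post:

AttributeError: lower not found; using a Pipeline with a CountVectorizer in scikit-learn https://stackoverflow.com/questions/33605946/attributeerror-lower-not-found-using-a-pipeline-with-a-countvectorizer-in-scik

But I am passing a list of bytes to the vectorizer, so that shouldn't be the problem.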

EDIT

corpus = load_files('corpus')

stop_words = [x for x in open('stopwords.txt', 'r').read().split('\n')]

X_train, X_test, y_train, y_test = train_test_split(corpus.data, corpus.target, test_size=0.5,
                                                    random_state=0)

k_fold = KFold(n=len(corpus.data), n_folds=6)
confusion = np.array([[0, 0], [0, 0]])

pipeline = Pipeline([
    ('vectorizer', CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))),
    ('classifier', MultinomialNB())])

for train_indices, test_indices in k_fold:
    pipeline.fit(X_train[train_indices], y_train[train_indices])
    predictions = pipeline.predict(X_test[test_indices])

Now I get this error:

TypeError: only integer arrays with one element can be converted to an index

2ND EDIT

corpus = load_files('corpus')

stop_words = [y for x in open('stopwords.txt', 'r').read().split('\n') for y in (x, x.title())]

k_fold = KFold(n=len(corpus.data), n_folds=6)
confusion = np.array([[0, 0], [0, 0]])

pipeline = Pipeline([
    ('vectorizer', CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))),
    ('classifier', MultinomialNB())])

for train_indices, test_indices in k_fold:
    pipeline.fit(corpus.data, corpus.target)

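You are not using the pipeline correctly. You don't need to pass in data that has already been vectorized; the whole idea is that the pipeline vectorizes the raw data for you. In your first snippet the CountVectorizer inside the pipeline receives the sparse count matrix instead of raw documents and tries to call lower() on each row, which is what raises the AttributeError.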

# This is done by the pipeline
# x_train_counts = countvec.fit_transform(X_train)
# x_test_counts = countvec.transform(X_test)

k_fold = KFold(n=len(corpus.data), n_folds=6)
confusion = np.array([[0, 0], [0, 0]])

pipeline = Pipeline([
    ('vectorizer',  CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))),
    ('classifier',  MultinomialNB()) ])

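# Also, you were not using the fold indices.
# corpus.data is a plain Python list, so wrap it in an object array first;
# indexing a list with an index array raises the TypeError shown in the EDIT.
data = np.asarray(corpus.data, dtype=object)
for train_indices, test_indices in k_fold:

    pipeline.fit(data[train_indices], corpus.target[train_indices])
    predictions = pipeline.predict(data[test_indices])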
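For readers on a newer scikit-learn, here is a minimal sketch of the same idea, assuming the `sklearn.model_selection.KFold` API (`n_splits` plus `.split()`), which replaced the `KFold(n=..., n_folds=...)` signature used above. The tiny in-memory `docs` and `labels` arrays are made up to stand in for `load_files('corpus')`, and the stop-word list is left out to keep the example self-contained.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for corpus.data / corpus.target (not from the question).
# Object arrays allow indexing with the index arrays that KFold produces.
docs = np.array([
    "cheap pills buy now", "meeting at noon tomorrow",
    "win money now", "lunch with the team",
    "buy cheap watches", "project status report",
], dtype=object)
labels = np.array([1, 0, 1, 0, 1, 0])

# The pipeline receives raw text; the vectorizer step does the counting.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(ngram_range=(1, 2))),
    ("classifier", MultinomialNB()),
])

k_fold = KFold(n_splits=3)
confusion = np.zeros((2, 2), dtype=int)
all_predictions = np.empty_like(labels)

for train_indices, test_indices in k_fold.split(docs):
    # Fit on the raw training documents of this fold only.
    pipeline.fit(docs[train_indices], labels[train_indices])
    predictions = pipeline.predict(docs[test_indices])
    all_predictions[test_indices] = predictions
    # Accumulate a 2x2 confusion matrix across folds.
    confusion += confusion_matrix(labels[test_indices], predictions, labels=[0, 1])

print(confusion)
```

Because the vectorizer is refit inside each fold, vocabulary from the held-out documents never leaks into training, which is the main reason to put it in the pipeline rather than vectorizing up front.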

python

scikitlearn

pipeline
