Computing FLOPs in PyTorch

2023-05-16

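I recently read through many approaches to computing FLOPs and tried several of them myself. The most convenient one I found is the thop library for PyTorch (code below):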

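import torch
from thop import profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ModelName().to(device)   # replace ModelName with your own nn.Module class
inputs = torch.randn(1, 3, 512, 1024)   # (N, C, H, W); use (1, 3, 360, 640) for a 360x640 input
inputs = inputs.to(device)
macs, params = profile(model, inputs=(inputs,))   # pass verbose=False to silence the per-layer messages
print('The number of MACs is %s' % (macs / 1e9))   # in units of 1e9 operations (G)
print('The number of params is %s' % (params / 1e6))   # in millions of parameters (M)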

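It is indeed simple to use. That raises a question, though: is the macs value returned here a count of MACs or of FLOPs? My own conclusion first: the macs value computed here is effectively the FLOPs (the number of floating-point operations, not operations per second, which would be FLOPS), provided that the bias of the convolution layers is not counted. The reasoning is as follows: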

I computed the FLOPs of ResNet18 by hand for an input of size 512×1024×3.

(1) The ResNet18 code:

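import torch.nn as nn


# conv3x3 and conv1x1 are not shown in the original post. The definitions below are
# assumed: bias-free convolutions, with padding equal to dilation so that a 3x3 conv
# with stride 1 keeps the spatial size.
def conv3x3(in_planes, out_planes, stride=1, dilation=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, dilation=dilation, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):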
    expansion = 1

    def __init__(self, inplanes, planes, r=1, stride=1, downsample=None, norm_layer=nn.BatchNorm2d):
        super(BasicBlock, self).__init__()
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, dilation=r)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet18(nn.Module):

    def __init__(self, block=BasicBlock, layers=(2, 2, 2, 2), zero_init_residual=False, norm_layer=nn.BatchNorm2d):
        super(ResNet18, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0], r=2, norm_layer=norm_layer)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, r=2, norm_layer=norm_layer)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, r=2, norm_layer=norm_layer)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, r=2, norm_layer=norm_layer)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                # Only BasicBlock is defined in this file; the branch for the undefined
                # Bottleneck class is dropped because it would raise a NameError.
                if isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, r=1, norm_layer=nn.BatchNorm2d):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, r, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x_downsampling_8 = x       ###(h/8,128)
        x = self.layer3(x)
        x_downsampling_16 = x      ###(h/16,256)
        x = self.layer4(x)         ###(h/32,512)

        return x, x_downsampling_8, x_downsampling_16

(2) The macs value computed by the profile function is 19.0 G (about 19.0 × 10^9 operations).

(3) The FLOPs I computed by hand ≈ 20 G.
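The hand calculation itself is not shown in the post. As a rough sketch of how such a tally could go, the snippet below sums the multiply-adds of every convolution in the ResNet18 backbone listed above for a 512×1024 input. It rests on several assumptions that are not stated in the original: all convolutions are bias-free, padding keeps the spatial size at stride 1 (so dilation does not change the count), and BatchNorm, ReLU, max pooling and the residual additions are ignored. The function names `conv_macs` and `resnet18_backbone_conv_macs` are made up for illustration.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-adds of one bias-free 2D convolution:
    each output element needs c_in * k * k multiply-adds."""
    return c_in * c_out * k * k * h_out * w_out


def resnet18_backbone_conv_macs(h, w):
    """Sum the conv multiply-adds of the stem and the four stages
    (two BasicBlocks per stage), assuming h and w are divisible by 32."""
    # Stem: 7x7 convolution, stride 2, 3 -> 64 channels.
    h, w = h // 2, w // 2
    total = conv_macs(3, 64, 7, h, w)
    # Max pooling with stride 2 halves the resolution and has no multiply-adds.
    h, w = h // 2, w // 2

    c_in = 64
    for c_out, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        h, w = h // stride, w // stride
        # First block: conv1 changes channels and stride, conv2 keeps them.
        total += conv_macs(c_in, c_out, 3, h, w)
        total += conv_macs(c_out, c_out, 3, h, w)
        if stride != 1 or c_in != c_out:
            # 1x1 projection convolution on the shortcut branch.
            total += conv_macs(c_in, c_out, 1, h, w)
        # Second block: two 3x3 convolutions at constant width.
        total += 2 * conv_macs(c_out, c_out, 3, h, w)
        c_in = c_out
    return total


total = resnet18_backbone_conv_macs(512, 1024)
print(f"conv multiply-adds: {total / 1e9:.2f} G")  # prints "conv multiply-adds: 18.95 G"
```

Under these assumptions the convolutions alone account for about 18.95 G multiply-adds, which is in line with the 19.0 G reported by profile and with the rough hand estimate of 20 G.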

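Therefore, provided the bias computations of the convolution layers are not counted, the macs value returned by the profile function is effectively the FLOPs. (This holds when one multiply-add is counted as a single operation; if the multiply and the add are counted separately, the FLOPs figure is roughly twice the macs value.)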
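To make the role of the bias and of the counting convention concrete, here is a minimal sketch for a single convolution layer, using the first 7×7 convolution of ResNet18 at a 512×1024 input as the example. The helper `conv_op_counts` and its return keys are hypothetical names chosen for illustration, and the bias term is counted as one extra addition per output element, which is one common way to count it.

```python
def conv_op_counts(c_in, c_out, k, h_out, w_out):
    """Operation counts for one 2D convolution layer."""
    outputs = c_out * h_out * w_out      # number of output elements
    macs = outputs * c_in * k * k        # one multiply-add per weight per output
    return {
        "macs": macs,                    # multiply-add counted as ONE operation
        "flops_2x": 2 * macs,            # multiply and add counted separately
        "bias_adds": outputs,            # extra additions if the layer had a bias
    }


# First layer of ResNet18: 3 -> 64 channels, 7x7 kernel, stride 2,
# so a 512x1024 input gives a 256x512 output.
counts = conv_op_counts(c_in=3, c_out=64, k=7, h_out=256, w_out=512)
print(counts["macs"])       # prints 1233125376
print(counts["flops_2x"])   # prints 2466250752
print(counts["bias_adds"])  # prints 8388608
print(f"{counts['bias_adds'] / counts['macs']:.2%}")  # prints 0.68%
```

For this layer a bias would add 1 operation for every 147 multiply-adds, so leaving it out changes the total very little, while the choice between the two counting conventions changes the result by a factor of two.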

(PS: This is my personal understanding; corrections are welcome.)
