Preface
I previously wrote a summary post on the VAE; this time, let's look at its discrete counterpart.
VAE (Variational Autoencoder): brief notes
Paper: Neural Discrete Representation Learning
Code: https://gitee.com/mirrors_ritheshkumar95/pytorch-vqvae
v2: Generating Diverse High-Fidelity Images with VQ-VAE-2
Principle
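In short, VQ-VAE replaces the VAE's continuous Gaussian latent with a discrete codebook of num_embeddings vectors. The encoder produces a continuous feature map z_e(x); each spatial vector in it is snapped to its nearest codebook entry (a nearest-neighbor lookup), and the decoder reconstructs the input from the quantized map z_q(x). Since argmin is not differentiable, the gradient is copied from the decoder's input straight to the encoder's output via a straight-through estimator. The training objective is a reconstruction loss (here an MSE) plus two quantization terms:

L = || x - x_recon ||² + || sg[z_e(x)] - e ||² + β || z_e(x) - sg[e] ||²

where sg[·] is the stop-gradient operator and β is the commitment_cost in the code below. The second term pulls the codebook vectors toward the encoder outputs, while the third keeps the encoder committed to the codes it gets assigned.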
Code
The code here is taken from 生成模型之VQ-VAE. I'm pasting it over because I want to annotate it with some notes of my own; it is for learning purposes only…
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
"""
VQ-VAE layer: Input any tensor to be quantized.
Args:
embedding_dim (int): the dimensionality of the tensors in the
quantized space. Inputs to the modules must be in this format as well.
num_embeddings (int): the number of vectors in the quantized space.
commitment_cost (float): scalar which controls the weighting of the loss terms (see
equation 4 in the paper - this variable is Beta).
"""
def __init__(self, embedding_dim, num_embeddings, commitment_cost):
super().__init__()
self.embedding_dim = embedding_dim
self.num_embeddings = num_embeddings
self.commitment_cost = commitment_cost
# initialize embeddings
self.embeddings = nn.Embedding(self.num_embeddings, self.embedding_dim)
def forward(self, x):
# [B, C, H, W] -> [B, H, W, C]
x = x.permute(0, 2, 3, 1).contiguous()
# [B, H, W, C] -> [BHW, C]
flat_x = x.reshape(-1, self.embedding_dim)
encoding_indices = self.get_code_indices(flat_x)
quantized = self.quantize(encoding_indices)
quantized = quantized.view_as(x) # [B, H, W, C]
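        # at inference time, skip the loss terms and return the quantized map directly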
if not self.training:
quantized = quantized.permute(0, 3, 1, 2).contiguous()
return quantized
# embedding loss: move the embeddings towards the encoder's output
q_latent_loss = F.mse_loss(quantized, x.detach())
# commitment loss
e_latent_loss = F.mse_loss(x, quantized.detach())
loss = q_latent_loss + self.commitment_cost * e_latent_loss
# Straight Through Estimator
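        # forward pass outputs the quantized values; backward pass copies gradients
        # through to x unchanged, since (quantized - x).detach() carries no gradient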
quantized = x + (quantized - x).detach()
quantized = quantized.permute(0, 3, 1, 2).contiguous()
return quantized, loss
def get_code_indices(self, flat_x):
# compute L2 distance
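        # ||x - e||^2 = ||x||^2 - 2 x·e + ||e||^2, expanded so that all
        # [BHW, M] pairwise distances are computed in a single matmul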
distances = (
torch.sum(flat_x ** 2, dim=1, keepdim=True) +
torch.sum(self.embeddings.weight ** 2, dim=1) -
2. * torch.matmul(flat_x, self.embeddings.weight.t())
) # [N, M]
encoding_indices = torch.argmin(distances, dim=1) # [N,]
return encoding_indices
def quantize(self, encoding_indices):
"""Returns embedding tensor for a batch of indices."""
return self.embeddings(encoding_indices)
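As a quick sanity check (my own addition, not part of the original post), the layer can be exercised with a random feature map; in training mode it returns the quantized tensor plus the VQ loss, in eval mode only the quantized tensor:

# Smoke test for VectorQuantizer (illustrative, not from the original post)
vq = VectorQuantizer(embedding_dim=16, num_embeddings=128, commitment_cost=0.25)
x = torch.randn(2, 16, 8, 8)  # [B, C, H, W], C must equal embedding_dim

vq.train()
quantized, vq_loss = vq(x)
print(quantized.shape, vq_loss.item())  # torch.Size([2, 16, 8, 8]), scalar loss

vq.eval()
print(vq(x).shape)  # torch.Size([2, 16, 8, 8]), no loss returned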
class Encoder(nn.Module):
"""Encoder of VQ-VAE"""
def __init__(self, in_dim=3, latent_dim=16):
super().__init__()
self.in_dim = in_dim
self.latent_dim = latent_dim
self.convs = nn.Sequential(
nn.Conv2d(in_dim, 32, 3, stride=2, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(32, 64, 3, stride=2, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, latent_dim, 1),
)
def forward(self, x):
return self.convs(x)
class Decoder(nn.Module):
"""Decoder of VQ-VAE"""
def __init__(self, out_dim=1, latent_dim=16):
super().__init__()
self.out_dim = out_dim
self.latent_dim = latent_dim
self.convs = nn.Sequential(
nn.ConvTranspose2d(latent_dim, 64, 3, stride=2, padding=1, output_padding=1),
nn.ReLU(inplace=True),
nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
nn.ReLU(inplace=True),
nn.ConvTranspose2d(32, out_dim, 3, padding=1),
)
def forward(self, x):
return self.convs(x)
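Because the encoder halves the spatial size twice (two stride-2 convolutions) and the decoder mirrors it with two stride-2 transposed convolutions, an input comes back at its original resolution. A quick shape check (my own addition):

# Shape round trip (illustrative): 32x32 -> 8x8 latents -> 32x32
enc = Encoder(in_dim=1, latent_dim=16)
dec = Decoder(out_dim=1, latent_dim=16)
x = torch.randn(4, 1, 32, 32)
z = enc(x)
print(z.shape)       # torch.Size([4, 16, 8, 8])
print(dec(z).shape)  # torch.Size([4, 1, 32, 32])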
class VQVAE(nn.Module):
"""VQ-VAE"""
def __init__(self, in_dim, embedding_dim, num_embeddings, data_variance,
commitment_cost=0.25):
super().__init__()
self.in_dim = in_dim
self.embedding_dim = embedding_dim
self.num_embeddings = num_embeddings
self.data_variance = data_variance
self.encoder = Encoder(in_dim, embedding_dim)
self.vq_layer = VectorQuantizer(embedding_dim, num_embeddings, commitment_cost)
self.decoder = Decoder(in_dim, embedding_dim)
def forward(self, x):
z = self.encoder(x)
if not self.training:
e = self.vq_layer(z)
x_recon = self.decoder(e)
return e, x_recon
e, e_q_loss = self.vq_layer(z)
x_recon = self.decoder(e)
recon_loss = F.mse_loss(x_recon, x) / self.data_variance
return e_q_loss + recon_loss
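Finally, a minimal training sketch to show how the pieces fit together. This is my own addition, assuming MNIST scaled to [0, 1]; data_variance is the pixel variance of the training set, which VQVAE.forward divides the reconstruction loss by:

# Hypothetical training sketch (not from the original post)
import numpy as np
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST('data', train=True, download=True,
                           transform=transforms.ToTensor())
# pixel variance of the training images, used to scale the reconstruction loss
data_variance = np.var(train_set.data.numpy() / 255.0)

model = VQVAE(in_dim=1, embedding_dim=16, num_embeddings=128,
              data_variance=data_variance)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(10):
    for images, _ in DataLoader(train_set, batch_size=256, shuffle=True):
        loss = model(images)  # reconstruction loss + VQ losses, one scalar
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()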