The AlexNet Model

2023-11-08

Contents

1. Abstract: background, the AlexNet model, and its ILSVRC-2012 win

2. Introduction: the paper's main contributions; the results owe largely to large amounts of data and high-performance GPUs

3. The Dataset: overview of the ILSVRC-2012 dataset; image preprocessing details

4. The Architecture: the AlexNet network structure and its internals: ReLU, dual GPUs, LRN, overlapping pooling

5. Reducing Overfitting: preventing overfitting with data augmentation and dropout

6. Details of Learning: hyperparameter settings and weight initialization

7. Experimental results and analysis

8. Paper summary

(1) Key points

(2) Innovations

(3) Takeaways

9. Code reproduction


1. Abstract: background, the AlexNet model, and its ILSVRC-2012 win


ILSVRC: the ImageNet Large Scale Visual Recognition Challenge selects 1,000 classes, about 1.2 million training images, from the full ImageNet dataset of 21,841 categories and 14,197,122 images. AlexNet achieved the best results: "top-1 and top-5 error rates of 37.5% and 17.0%" (Krizhevsky et al., 2017, p. 84).


“The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.” (Krizhevsky et al., 2017, p. 84)


“To make training faster, we used nonsaturating neurons and a very efficient GPU implementation of the convolution operation.” (Krizhevsky et al., 2017, p. 84)


“To reduce overfitting in the fully connected layers we employed a recently developed regularization method called “dropout” that proved to be very effective.” (Krizhevsky et al., 2017, p. 84)

2. Introduction: the paper's main contributions; the results owe largely to large amounts of data and high-performance GPUs


“But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Ref.25), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include LabelMe,28 which consists of hundreds of thousands of fully segmented images, and ImageNet,7 which consists of over 15 million labeled high-resolution images in over 22,000 categories.” (Krizhevsky et al., 2017, p. 85)


“To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we do not have.” (Krizhevsky et al., 2017, p. 85)


“In the end, the network’s size is limited mainly by the amount of memory available on current GPUs and by the amount of training time that we are willing to tolerate.” (Krizhevsky et al., 2017, p. 85)

3. The Dataset: overview of the ILSVRC-2012 dataset; image preprocessing details


“ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images.” (Krizhevsky et al., 2017, p. 85)


“ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256 × 256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256 × 256 patch from the resulting image. We did not preprocess the images in any other way, except for subtracting the mean activity over the training set from each pixel. So we trained our network on the (centered) raw RGB values of the pixels.” (Krizhevsky et al., 2017, p. 85)
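A minimal preprocessing sketch using torchvision (an assumption; not the authors' original pipeline). `channel_mean` is a hypothetical precomputed statistic, a per-channel simplification of the paper's per-pixel mean:

```python
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),      # rescale so the shorter side is 256, keeping aspect ratio
    transforms.CenterCrop(256),  # crop out the central 256x256 patch
    transforms.ToTensor(),       # HWC uint8 -> CHW float in [0, 1]
])

def center_raw_rgb(img, channel_mean: torch.Tensor) -> torch.Tensor:
    """Subtract a (hypothetical) precomputed channel mean from the raw RGB values."""
    x = preprocess(img)
    return x - channel_mean.view(3, 1, 1)
```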

4. The Architecture: the AlexNet network structure and its internals: ReLU, dual GPUs, LRN, overlapping pooling

(1) tanh and sigmoid are saturating activation functions, while ReLU and its variants are non-saturating. Non-saturating activations offer two main advantages (see the sketch after this list):

Non-saturating activations mitigate the vanishing-gradient problem.

Non-saturating activations speed up convergence.
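A small sketch (not from the paper) contrasting the two behaviors: tanh saturates for large |x|, so its gradient vanishes, while ReLU keeps a gradient of 1 for any positive input:

```python
import torch

x = torch.tensor([-4.0, -1.0, 0.5, 4.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)  # ~[0.0013, 0.4200, 0.7864, 0.0013] -> nearly zero gradient at |x| = 4

x = torch.tensor([-4.0, -1.0, 0.5, 4.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # [0., 0., 1., 1.] -> constant gradient for any x > 0
```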


“This is demonstrated in Figure 1, which shows the number of iterations required to reach 25% training error on the CIFAR-10 dataset for a particular four-layer convolutional network.” (Krizhevsky et al., 2017, p. 85)


“A four-layer convolutional neural network with ReLUs (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons (dashed line).” (Krizhevsky et al., 2017, p. 86)


(2) Why two GPUs were used

“A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of the networks that can be trained on it. It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU.” (Krizhevsky et al., 2017, p. 86)

“Current GPUs are particularly well-suited to cross-GPU parallelization, as they are able to read from and write to one another’s memory directly, without going through host machine memory.” (Krizhevsky et al., 2017, p. 86)


How the two GPUs are used: “The parallelization scheme that we employ essentially puts half of the kernels (or neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers. This means that, for example, the kernels of layer 3 take input from all kernel maps in layer 2. However, kernels in layer 4 take input only from those kernel maps in layer 3 which reside on the same GPU.” (Krizhevsky et al., 2017, p. 86)
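On a modern framework, this restricted-communication scheme can be approximated with grouped convolutions; a sketch (an assumption, not the authors' CUDA code): `groups=2` gives each half of the output channels access to only half of the input channels, mirroring layers whose kernels stay on one GPU, while `groups=1` lets every kernel see all input maps, mirroring the layers where the GPUs communicate:

```python
import torch.nn as nn

# Layer-4 style: no cross-talk, each half of the kernels sees only its own half of the maps.
conv4_split = nn.Conv2d(384, 384, kernel_size=3, padding=1, groups=2)

# Layer-3 style: every kernel takes input from all kernel maps of the previous layer.
conv3_full = nn.Conv2d(256, 384, kernel_size=3, padding=1, groups=1)
```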


(3) Local response normalization (LRN):

“local normalization scheme aids generalization” (Krizhevsky et al., 2017, p. 86)

Generalization refers to how well a model performs on data it has never seen before, i.e., its ability to adapt to new data; it is a standard evaluation criterion for machine learning algorithms.


“We also verified the effectiveness of this scheme on the CIFAR-10 dataset: a four-layer CNN achieved a 13% test error rate without normalization and 11% with normalization.” (Krizhevsky et al., 2017, p. 86)
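A minimal sketch of LRN with the paper's hyperparameters (k = 2, n = 5, α = 10⁻⁴, β = 0.75), using PyTorch's built-in layer:

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 96, 55, 55)  # e.g., activations after conv1 of AlexNet
y = lrn(x)                      # each channel normalized by its neighboring channels
```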

(4) Overlapping Pooling: using overlapping pooling to reduce overfitting


“To be more precise, a pooling layer can be thought of as consisting of a grid of pooling units spaced s pixels apart, each summarizing a neighborhood of size z × z centered at the location of the pooling unit. If we set s = z, we obtain traditional local pooling as commonly employed in CNNs. If we set s < z, we obtain overlapping pooling. This is what we use throughout our network, with s = 2 and z = 3. This scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, as compared with the non-overlapping scheme s = 2, z = 2, which produces output of equivalent dimensions. We generally observe during training that models with overlapping pooling find it slightly more difficult to overfit.” (Krizhevsky et al., 2017, p. 87)
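A sketch of the two pooling variants; note that on a 55 × 55 input both produce 27 × 27 outputs, matching the paper's claim of "equivalent dimensions":

```python
import torch
import torch.nn as nn

overlap_pool = nn.MaxPool2d(kernel_size=3, stride=2)  # s < z: overlapping pooling
classic_pool = nn.MaxPool2d(kernel_size=2, stride=2)  # s = z: traditional local pooling

x = torch.randn(1, 96, 55, 55)
print(overlap_pool(x).shape)  # torch.Size([1, 96, 27, 27])
print(classic_pool(x).shape)  # torch.Size([1, 96, 27, 27]) -> same output dimensions
```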

(5) The AlexNet network structure:


“the net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.” (Krizhevsky et al., 2017, p. 87)

5. Reducing Overfitting: preventing overfitting with data augmentation and dropout

(1) Two distinct forms of data augmentation:

“We employ two distinct forms of data augmentation, both of which allow transformed images to be produced from the original images with very little computation, so the transformed images do not need to be stored on disk” (Krizhevsky et al., 2017, p. 87)


“The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 × 224 patches (and their horizontal reflections) from the 256 × 256 images and training our network on these extracted patches. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent. Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks. At test time, the network makes a prediction by extracting five 224 × 224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence 10 patches in all), and averaging the predictions made by the network’s softmax layer on the ten patches.” (Krizhevsky et al., 2017, p. 87)

During training (see the ten-crop sketch after this list for the test-time counterpart):

- Rescale every image to 256 × 256

- Crop a 224 × 224 patch at a random position

- Apply a random horizontal flip

During testing:

- Rescale every image to 256 × 256

- Crop five 224 × 224 patches (the four corners and the center)

- Horizontally flip each patch, yielding 10 patches of 224 × 224 in total
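A sketch of this ten-crop test-time scheme using torchvision (an assumption, not the authors' code); `model` stands for any classifier over 224 × 224 inputs:

```python
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),  # 4 corner crops + center crop, each with its mirror image
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),
])

# Usage sketch:
#   crops = ten_crop(img)                    # (10, 3, 224, 224)
#   logits = model(crops)                    # (10, 1000)
#   prediction = logits.softmax(-1).mean(0)  # average predictions over the ten patches
```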


“The second form of data augmentation consists of altering the intensities of the RGB channels in training images.” (Krizhevsky et al., 2017, p. 88)

PCA is performed on the RGB pixel values, and the resulting principal components are then perturbed by small random amounts; these perturbations shift the image's colors slightly, augmenting the data and increasing its diversity and richness. The effect, however, is limited. A sketch follows.
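A sketch of this PCA color augmentation, sometimes called "fancy PCA" (an assumption: the paper computes the PCA once over the entire training set, while this self-contained version computes it per image):

```python
import numpy as np

def fancy_pca(img: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """img: float array (H, W, 3) in [0, 1]. Returns a color-jittered copy."""
    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)            # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)      # principal components of RGB space
    alpha = np.random.normal(0.0, sigma, 3)     # one random draw per component
    shift = eigvecs @ (alpha * eigvals)         # add multiples of eigenvectors,
    return np.clip(img + shift, 0.0, 1.0)       # scaled by eigenvalue * alpha
```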

(2) Dropout

“The recently introduced technique, called “dropout”,12 consists of setting to zero the output of each hidden neuron with probability 0.5. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights.” (Krizhevsky et al., 2017, p. 88)


“This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. At test time, we use all the neurons but multiply their outputs by 0.5, which is a reasonable approximation to taking the geometric mean of the predictive distributions produced by the exponentially-many dropout networks.” (Krizhevsky et al., 2017, p. 88)


“We use dropout in the first two fully connected layers of Figure 2. Without dropout, our network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge.” (Krizhevsky et al., 2017, p. 88)
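A sketch of dropout in the first two fully connected layers. One deviation from the paper: PyTorch's `nn.Dropout` implements "inverted dropout", scaling surviving activations by 1/(1 − p) during training, so there is no need to multiply outputs by 0.5 at test time; `model.eval()` simply disables it.

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Dropout(p=0.5),                    # dropout before the first FC layer
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),                    # dropout before the second FC layer
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                # no dropout on the output layer
)
```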

6. Details of Learning: hyperparameter settings and weight initialization

(1) Hyperparameter settings

“We trained our models using stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. We found that this small amount of weight decay was important for the model to learn. In other words, weight decay here is not merely a regularizer: it reduces the model’s training error. The update rule for weight w was

$$v_{i+1} := 0.9\,v_i - 0.0005\,\varepsilon\,w_i - \varepsilon\,\left\langle \frac{\partial L}{\partial w}\bigg|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1}$$

where i is the iteration index, v is the momentum variable, ε is the learning rate, and $\langle \partial L/\partial w |_{w_i} \rangle_{D_i}$ is the average over the ith batch $D_i$ of the derivative of the objective with respect to w, evaluated at $w_i$.” (Krizhevsky et al., 2017, p. 88)
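A minimal optimizer sketch matching these settings (an assumption: `torch.optim.SGD` couples weight decay and momentum slightly differently from the paper's hand-written update rule, but uses the same hyperparameters):

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 1000)  # stand-in for the full network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # the paper's initial rate, divided by 10 when validation error plateaus
    momentum=0.9,
    weight_decay=5e-4,  # weight decay of 0.0005
)
```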


(2) Weight initialization

“We initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01. We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully connected hidden layers, with the constant 1. This initialization accelerates the early stages of learning by providing the ReLUs with positive inputs. We initialized the neuron biases in the remaining layers with the constant 0.” (Krizhevsky et al., 2017, p. 88)
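A sketch of this initialization applied per module (the `bias_one` flag is a hypothetical knob; per the paper, the caller would set it for conv2, conv4, conv5, and the fully connected hidden layers):

```python
import torch.nn as nn

def init_alexnet_layer(module: nn.Module, bias_one: bool = False) -> None:
    """Weights ~ N(0, 0.01); bias 1 feeds ReLUs positive inputs early on, else bias 0."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.constant_(module.bias, 1.0 if bias_one else 0.0)
```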

7. Experimental results and analysis

(1) For similar images, the feature vectors output by the second fully connected layer are close in Euclidean distance

(2) AlexNet can be used to extract high-level features for image retrieval, image clustering, and image encoding


“Computing similarity by using Euclidean distance between two 4096-dimensional, real-valued vectors is inefficient, but it could be made efficient by training an autoencoder to compress these vectors to short binary codes. This should produce a much better image retrieval method than applying autoencoders to the raw pixels,16 which does not make use of image labels and hence has a tendency to retrieve images with similar patterns of edges, whether or not they are semantically similar.” (Krizhevsky et al., 2017, p. 90)
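A sketch (with hypothetical helper names) of treating the 4096-d activations of the second fully connected layer as image descriptors and comparing them by Euclidean distance; it assumes torchvision's pretrained single-branch AlexNet:

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

model = alexnet(weights="IMAGENET1K_V1").eval()
feature_extractor = nn.Sequential(
    model.features, model.avgpool, nn.Flatten(),
    *list(model.classifier.children())[:6],  # stop after the second FC layer's ReLU
)

@torch.no_grad()
def descriptor(batch: torch.Tensor) -> torch.Tensor:
    """batch: (N, 3, 224, 224) preprocessed images -> (N, 4096) feature vectors."""
    return feature_extractor(batch)

# distance = torch.dist(descriptor(img_a), descriptor(img_b))  # small distance -> similar
```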

8. Paper summary

(1) Key points

- Large amounts of labeled data: ImageNet (the data)

- High-performance computing resources: GPUs (the compute)

- A well-designed model: a deep convolutional neural network (the algorithm)

(2) Innovations

- ReLU to speed up the training of large networks

- LRN to improve the generalization of large networks

- Random cropping, flipping, and color perturbation to increase data diversity

- Dropout to mitigate overfitting

(3) Takeaways

- Depth and breadth determine a network's capacity


“Their capacity can be controlled by varying their depth and breadth” (Krizhevsky et al., 2017, p. 85)

- Larger datasets and faster GPUs can further improve model performance


“All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.” (Krizhevsky et al., 2017, p. 85)

- When rescaling an image, scale the shorter side first


“Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256 × 256 patch from the resulting image” (Krizhevsky et al., 2017, p. 85)

- ReLU does not require input normalization to prevent saturation, which implies that sigmoid and tanh do need such normalization


“ReLUs have the desirable property that they do not require input normalization to prevent them from saturating.” (Krizhevsky et al., 2017, p. 86)

- Convolution kernels learn frequency-, orientation-, and color-selective features


“The network has learned a variety of frequency- and orientation-selective kernels, as well as various colored blobs.” (Krizhevsky et al., 2017, p. 89)

- Similar images have "similar" high-level features


“If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar.” (Krizhevsky et al., 2017, p. 89)

- Image retrieval can be based on high-level features, which should work better than retrieval based on raw pixels


“This should produce a much better image retrieval method than applying autoencoders to the raw pixels,16 which does not make use of image labels and hence has a tendency to retrieve images with similar patterns of edges, whether or not they are semantically similar.” (Krizhevsky et al., 2017, p. 90)

- The network's layers are interdependent; a single layer cannot simply be removed


“It is notable that our network’s performance degrades if a single convolutional layer is removed.” (Krizhevsky et al., 2017, p. 90)

- Using video data may lead to new breakthroughs


“Ultimately we would like to use very large and deep convolutional nets on video sequences where the temporal structure provides very helpful information, that is, missing or far less obvious in static images.” (Krizhevsky et al., 2017, p. 90)

9. Code reproduction
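As a starting point, here is a sketch of the single-branch AlexNet variant commonly used today (an assumption: the two-GPU split is dropped, so every kernel sees all input maps; LRN after conv1 and conv2 and overlapping pooling follow the paper):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),           # conv1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),                # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),         # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),        # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),        # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),        # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # fed to a 1000-way softmax via the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

if __name__ == "__main__":
    model = AlexNet()
    out = model(torch.randn(1, 3, 227, 227))
    print(out.shape)  # torch.Size([1, 1000])
```

One detail worth noting: the paper reports 224 × 224 inputs, but the conv1 arithmetic ((W − 11)/4 + 1 = 55) only works out with 227 × 227 inputs or equivalent padding, which is the convention followed here.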
