From LeNet to MobileNet: A Hands-On Guide to Reproducing 6 Classic CNN Models That Changed Computer Vision, in PyTorch

张开发
2026/4/19 14:30:25 · 15 min read


In the history of computer vision, the evolution of convolutional neural networks (CNNs) reads like a condensed technical history of the field. From LeNet, proposed by Yann LeCun in 1998, to Google's MobileNet in 2017, each breakthrough model marks a shift in design philosophy. This article walks you through implementing these milestone architectures in PyTorch, using the code to trace how CNN design wisdom evolved.

## 1. Environment Setup and Basic Toolchain

Before reproducing these classic models, we need a suitable development environment. Python 3.8 with PyTorch 1.10 is recommended; this combination offers a good balance of stability and feature support. Basic setup:

```bash
conda create -n torch-cv python=3.8
conda activate torch-cv
pip install torch torchvision torchaudio
pip install matplotlib tqdm numpy pandas
```

For GPU acceleration, make sure to install the PyTorch build matching your CUDA version. Verify that the GPU is available:

```python
import torch
print(torch.cuda.is_available())  # should print True
print(torch.__version__)          # confirm the version
```

For data, we will use CIFAR-10 as a common benchmark. Although most of the original papers used ImageNet, CIFAR-10's smaller images make it better suited for quick experiments:

```python
from torchvision import datasets, transforms

# Normalization statistics computed from the CIFAR-10 training set
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
])
train_set = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
```

Tip: all models use the same training configuration to keep the comparison fair — batch size 128, initial learning rate 0.1 with cosine annealing, cross-entropy loss, and 50 epochs.

## 2. LeNet-5: The Dawn of Convolutional Networks

LeNet-5, born in 1998, was the first CNN successfully applied to digit recognition, and its design still influences modern networks. Let's implement this pioneering model in PyTorch:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),         # C1
            nn.Tanh(),                              # the original paper used tanh-style activations
            nn.AvgPool2d(kernel_size=2, stride=2),  # S2
            nn.Conv2d(6, 16, kernel_size=5),        # C3
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),  # S4
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),  # C5 (a convolutional layer in the original paper)
            nn.Tanh(),
            nn.Linear(120, 84),          # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),  # output layer
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
```

Key implementation details:

- Local receptive fields: 5×5 kernels mimic the local connectivity of biological vision.
- Weight sharing: reusing kernels drastically reduces parameters (about 60K, roughly 1/400 of a fully connected equivalent).
- Spatial downsampling: average pooling lowers dimensionality while preserving feature positions.

The training curve shows that even on CIFAR-10, LeNet-5 reaches about 65% accuracy. Its performance is no match for modern models, but its design ideas remain instructive:

```
Epoch 50/50 | Train Acc: 64.72% | Test Acc: 63.85%
```
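All models in this article share the training configuration from Section 1 (batch size 128, initial learning rate 0.1 with cosine annealing, cross-entropy loss, 50 epochs). A minimal sketch of that loop follows; the momentum and weight-decay values are assumptions, since the article does not specify them:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=50, lr=0.1, device='cpu'):
    """Train a model under the article's shared configuration (sketch)."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # momentum=0.9 and weight_decay=5e-4 are assumed values, not stated in the article
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for epoch in range(epochs):
        model.train()
        correct, total = 0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                correct += (model(images).argmax(1) == labels).sum().item()
                total += labels.size(0)
        scheduler.step()  # cosine annealing: one step per epoch
        print(f'Epoch {epoch + 1}/{epochs} | Train Acc: {100 * correct / total:.2f}%')
    return model
```

Usage would look like `train(LeNet5(), torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True))`; evaluation on `test_set` is left out of the sketch.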
## 3. AlexNet: The Revival of Deep Learning

In 2012 AlexNet won the ImageNet competition by a landslide, ushering in the deep learning era. Its innovations include ReLU activations, Dropout, and data augmentation:

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 1 * 1, 4096),  # the feature map is 1x1 for 64x64 inputs
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
```

Key techniques in the implementation:

- ReLU nonlinearity: alleviates vanishing gradients compared with Sigmoid.
- Overlapping pooling: 3×3 windows with stride 2 produce richer pooled features.
- Local response normalization: mimics lateral inhibition between neurons (later superseded by BatchNorm).

Note: the original AlexNet used a two-GPU parallel design; modern reproductions usually simplify it to a single GPU. Here the inputs are resized to 64×64 to suit CIFAR-10.

## 4. VGGNet: The Beauty of Depth and Regularity

VGGNet, proposed at Oxford, demonstrated the importance of network depth. Its uniform stacks of 3×3 convolutions became the standard template for later designs:

```python
import torch
import torch.nn as nn

def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

class VGG16(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = make_layers([64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
                                     512, 512, 512, 'M', 512, 512, 512, 'M'])
        self.classifier = nn.Sequential(
            nn.Linear(512, 4096), nn.ReLU(True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(True), nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
```

Core strengths of VGG:

- Stacked small kernels: several 3×3 convolutions match the receptive field of a larger kernel with fewer parameters.
- Progressive deepening: channels double and spatial size halves at each stage.
- Regular structure: a clean, modular design that is easy to extend.

Despite its large parameter count (about 138M), VGG's feature extraction power keeps it in use as a backbone to this day.
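The "stacked small kernels" claim can be checked with quick arithmetic: three stacked 3×3 convolutions cover the same 7×7 receptive field as a single 7×7 convolution, but cost fewer parameters (biases ignored for simplicity):

```python
def conv_params(kernel, channels, layers=1):
    # Parameters of `layers` stacked square convolutions with the same number
    # of input and output channels, ignoring bias terms.
    return layers * kernel * kernel * channels * channels

c = 512  # a typical VGG channel width
stacked = conv_params(3, c, layers=3)  # three 3x3 convs: 27 * c^2
single = conv_params(7, c, layers=1)   # one 7x7 conv:    49 * c^2
print(stacked, single)  # the stack uses ~45% fewer parameters
```

The stack also inserts a ReLU after each 3×3 layer, adding nonlinearity that the single large kernel lacks.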
## 5. ResNet: The Breakthrough for Deep Networks

Microsoft Research's ResNet used residual connections to solve the degradation problem of deep networks, making it possible to train networks hundreds of layers deep:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super().__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = torch.flatten(out, 1)
        out = self.linear(out)
        return out
```

Core innovations of residual learning:

- Identity shortcuts: let signals propagate directly, addressing vanishing gradients.
- Bottleneck design (in deeper variants): 1×1 convolutions reduce then restore dimensions to cut computation.
- Pre-activation (in later variants): placing BN and ReLU before the convolution improves training dynamics.

ResNet-18 easily reaches about 90% accuracy on CIFAR-10, demonstrating the power of residual learning.
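The claim that identity shortcuts keep gradients flowing can be made concrete with a tiny autograd experiment: for a residual mapping y = x + f(x), the derivative is 1 + f′(x), so the gradient through the identity path never vanishes even when the residual branch contributes almost nothing:

```python
import torch

w = torch.tensor(1e-6)  # a "nearly dead" residual branch weight

# Residual form: y = x + w*x, so dy/dx = 1 + w
x = torch.tensor(2.0, requires_grad=True)
y = x + w * x
y.backward()
print(x.grad)  # ~1.0: the identity path preserves the gradient

# Plain (non-residual) form: y = w*x, so dy/dx = w
x2 = torch.tensor(2.0, requires_grad=True)
y2 = w * x2
y2.backward()
print(x2.grad)  # ~1e-6: the gradient has effectively vanished
```

The same effect compounds across many stacked blocks, which is why plain deep networks degrade while residual ones keep training.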
## 6. MobileNet: Designed for Mobile Efficiency

Google's MobileNet family targets efficient inference on mobile devices; its core building block is the depthwise separable convolution:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                      padding=1, groups=in_channels, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU6(inplace=True)
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x

class MobileNetV1(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU6(inplace=True),
            DepthwiseSeparableConv(32, 64, stride=1),
            DepthwiseSeparableConv(64, 128, stride=2),
            DepthwiseSeparableConv(128, 128, stride=1),
            DepthwiseSeparableConv(128, 256, stride=2),
            DepthwiseSeparableConv(256, 256, stride=1),
            DepthwiseSeparableConv(256, 512, stride=2),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 512, stride=1),
            DepthwiseSeparableConv(512, 1024, stride=2),
            DepthwiseSeparableConv(1024, 1024, stride=1),
            nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.model(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
```

MobileNet's design innovations:

- Depthwise separable convolution: factorizes a standard convolution into a depthwise step followed by a pointwise (1×1) step.
- Width multiplier: a coefficient α trades model size against computation.
- Linear bottlenecks (introduced in MobileNetV2): a linear activation on the final projection avoids destroying information.

Compared with VGG16, MobileNet reaches comparable accuracy with roughly 1/30 of the computation, making it well suited for mobile deployment.
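The cost saving of the factorization can be derived directly. A standard K×K convolution with C_in input and C_out output channels performs K²·C_in·C_out multiply-accumulates per output position, while the depthwise + pointwise pair performs K²·C_in + C_in·C_out, a ratio of 1/C_out + 1/K². A small sketch:

```python
def standard_cost(k, c_in, c_out):
    # multiply-accumulates per output spatial position, standard convolution
    return k * k * c_in * c_out

def separable_cost(k, c_in, c_out):
    # depthwise (k*k per input channel) plus pointwise (1x1) convolution
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 512, 512
ratio = separable_cost(k, c_in, c_out) / standard_cost(k, c_in, c_out)
print(f'{ratio:.3f}')  # ~0.113, matching 1/c_out + 1/k**2
```

For 3×3 kernels the pointwise term dominates, so the saving is close to the 1/9 factor often quoted for depthwise separable convolutions.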
## 7. Model Comparison and Evolution Trends

Reproducing these models side by side makes the trajectory of CNN architecture design easy to see:

| Model | Params | FLOPs | Top-1 Acc | Key innovation |
|-------|--------|-------|-----------|----------------|
| LeNet-5 | 60K | 0.3M | 63.8% | convolution + pooling structure |
| AlexNet | 60M | 720M | 75.3% | ReLU / Dropout |
| VGG16 | 138M | 15.5G | 85.2% | stacked small kernels |
| ResNet-18 | 11M | 1.8G | 91.5% | residual connections |
| MobileNetV1 | 4.2M | 0.6G | 83.7% | depthwise separable convolution |

From a technical standpoint, CNN design shows several clear trends:

- From hand-crafted design to architecture search: early networks were designed manually; later ones such as MobileNetV3 incorporate NAS.
- Efficiency first: parameter counts and FLOPs keep shrinking while accuracy is maintained.
- Increasing modularity: from AlexNet's sequence of individual layers to ResNet's block structure.
- Simpler nonlinearities: ReLU replaced Sigmoid, and later designs even drop some activations entirely.

These classic models are not only technical milestones but also a rich library of design patterns. In real projects you can combine these patterns as needed, for example using depthwise separable convolutions inside ResNet, or adding residual connections to MobileNet.
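The closing suggestion of mixing patterns can be sketched, for instance, as a residual block whose main branch uses depthwise separable convolutions, similar in spirit to MobileNetV2's inverted residuals. This is an illustrative hybrid, not a block taken from any of the papers above:

```python
import torch
import torch.nn as nn

class SeparableResidualBlock(nn.Module):
    """Illustrative hybrid: a ResNet-style residual block whose main branch
    is a depthwise + pointwise convolution pair (shapes kept constant so
    the identity shortcut needs no projection)."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels, bias=False),              # depthwise
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),  # pointwise
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # identity shortcut, as in ResNet's BasicBlock
        return torch.relu(x + self.branch(x))

block = SeparableResidualBlock(64)
out = block(torch.randn(1, 64, 8, 8))
print(out.shape)
```

Because stride and channel count are fixed, the block adds roughly C·(9 + C) weights per layer instead of 9·C² for a plain residual block, at the cost of some representational capacity.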
