The Complete Guide to VJEPA2 Pretraining: Building World Models from a Million Hours of Video

张开发
2026/4/16 4:31:26 · 15 min read


> **vjepa2** — PyTorch code and models for VJEPA2 self-supervised learning from video. Project address: https://gitcode.com/gh_mirrors/vj/vjepa2

VJEPA2 is a PyTorch-based framework for self-supervised learning from video. By training video encoders on internet-scale video data, it reaches state-of-the-art performance on action understanding and human action anticipation tasks. This article walks through how to run VJEPA2 pretraining, so that you can build powerful world models from a million hours of video data.

## VJEPA2: Self-Supervised Video Learning at Scale

VJEPA2 (Video Joint-Embedding Predictive Architecture) is a self-supervised learning method that bootstraps understanding and prediction of the physical world from massive amounts of natural video, using a masked latent feature prediction objective. Unlike traditional supervised learning, VJEPA2 needs no manual annotation: it learns useful feature representations directly from raw video data.

Core strengths of VJEPA2:

- **Unsupervised learning**: learns directly from raw video, with no human labels
- **Spatiotemporal understanding**: captures the temporal dynamics and spatial relationships in video
- **Transferability**: pretrained models move easily to a wide range of downstream tasks
- **Efficient training**: an optimized architecture supports training on large-scale video data

*Figure: the VJEPA2 workflow, from internet videos and images to various downstream tasks.*

## VJEPA2.1: Stronger Video Feature Learning

Released in March 2026, VJEPA2.1 brings a new family of models that learn high-quality, temporally consistent dense features through a novel training recipe. The main improvements in VJEPA2.1 include:

- **Dense prediction loss**: a mask-based self-supervised objective in which all tokens, both visible/context and masked, contribute to the training loss
- **Deep self-supervision**: the self-supervised loss is applied to multiple intermediate representations of the encoder
- **Multimodal tokenizers**: dedicated tokenizers for images and for video
- **Model and data scaling**: performance improves by scaling up model size and training data

*Figure: the VJEPA2.1 architecture, showing how the encoder and predictor work together.*

### Visualizing the Features

The improvement VJEPA2.1 brings to feature learning shows up clearly in PCA visualizations, contrasting original frames, VJEPA2 features, and VJEPA2.1 features.

*Figure: VJEPA2 vs. VJEPA2.1 features. Top row: original frames; middle row: VJEPA2 feature visualizations; bottom row: VJEPA2.1 feature visualizations.*

## Quick Start: Environment Setup

To get started with VJEPA2 pretraining, first set up the environment. A conda virtual environment is recommended:

```bash
conda create -n vjepa2-312 python=3.12
conda activate vjepa2-312
git clone https://gitcode.com/gh_mirrors/vj/vjepa2
cd vjepa2
pip install .  # or pip install -e . for an editable development install
```

A note for macOS users: VJEPA2 depends on the decord library, which does not support macOS. Try a community-maintained replacement such as eva-decord or decord2.

## Choosing a Pretrained Model

VJEPA2 offers several pretrained models to match different applications and compute budgets.

VJEPA2 pretrained models:

| Model | Parameters | Resolution (px) | Config path |
| --- | --- | --- | --- |
| ViT-L/16 | 300M | 256 | configs/train/vitl16 |
| ViT-H/16 | 600M | 256 | configs/train/vith16 |
| ViT-g/16 | 1B | 256 | configs/train/vitg16 |
| ViT-g/16_384 | 1B | 384 | configs/train/vitg16 |

VJEPA2.1 pretrained models:

| Model | Parameters | Resolution (px) | Config path |
| --- | --- | --- | --- |
| ViT-B/16 | 80M | 384 | configs/train_2_1/vitb16 |
| ViT-L/16 | 300M | 384 | configs/train_2_1/vitl16 |
| ViT-g/16 | 1B | 384 | configs/train_2_1/vitg16 |
| ViT-G/16 | 2B | 384 | configs/train_2_1/vitG16 |

## Pretraining Steps

VJEPA2 pretraining can run locally or distributed. The pretraining and cooldown (anneal) phases use different config files, but the command format is the same.

Local pretraining: the following command launches initial training of the ViT-L model:

```bash
python -m app.main --fname configs/train/vitl16/pretrain-256px-16f.yaml \
  --devices cuda:0
```

For VJEPA2.1, use the corresponding config file:

```bash
python -m app.main --fname configs/train_2_1/vitl16/pretrain-256px-16f.yaml \
  --devices cuda:0
```

Distributed pretraining on a SLURM cluster:

```bash
python -m app.main_distributed \
  --fname configs/train/vitl16/pretrain-256px-16f.yaml \
  --time 6000 \
  --account my_account --qos my_qos
```
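To make the masked latent prediction objective behind these commands concrete, here is a toy, self-contained sketch of a JEPA-style training step. This is a minimal illustration, not the repository's implementation: the real V-JEPA 2 encoder is a ViT over 3D video patches with multi-block spatiotemporal masking, and `TinyEncoder`, `TinyPredictor`, and `ema_update` below are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Toy stand-in for the ViT video encoder."""
    def __init__(self, dim=128, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, tokens):                # tokens: (B, N, D)
        return self.blocks(tokens)

class TinyPredictor(nn.Module):
    """Predicts target features at masked positions from context features."""
    def __init__(self, dim=128, depth=1, heads=4, max_tokens=1024):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.randn(1, max_tokens, dim) * 0.02)

    def forward(self, ctx, ctx_idx, tgt_idx):
        B, M = ctx.shape[0], tgt_idx.numel()
        queries = self.mask_token.expand(B, M, -1) + self.pos[:, tgt_idx]
        x = torch.cat([ctx + self.pos[:, ctx_idx], queries], dim=1)
        return self.blocks(x)[:, -M:]         # outputs at the masked slots

@torch.no_grad()
def ema_update(target, online, momentum=0.999):
    """The target encoder is an exponential moving average of the online one."""
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(momentum).add_(po, alpha=1.0 - momentum)

encoder, predictor = TinyEncoder(), TinyPredictor()
target_encoder = TinyEncoder()
target_encoder.load_state_dict(encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

tokens = torch.randn(2, 196, 128)             # dummy patch tokens for one clip
perm = torch.randperm(tokens.shape[1])
tgt_idx, ctx_idx = perm[:98], perm[98:]       # mask half of the tokens

ctx = encoder(tokens[:, ctx_idx])                       # encode context only
with torch.no_grad():
    targets = target_encoder(tokens)[:, tgt_idx]        # full-clip EMA targets
loss = F.l1_loss(predictor(ctx, ctx_idx, tgt_idx), targets)

loss.backward()
opt.step(); opt.zero_grad()
ema_update(target_encoder, encoder)           # slow-moving target update
print(f"toy JEPA loss: {loss.item():.4f}")
```

The structural idea is the one described above: predict the representation of hidden video regions rather than their pixels, with targets produced by a slowly updated copy of the encoder, which helps avoid trivial collapse.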
## Using the Pretrained Models

VJEPA2 models can be loaded in several ways, including PyTorch Hub and HuggingFace.

Via PyTorch Hub:

```python
import torch

# Load the preprocessor
processor = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_preprocessor')

# Load VJEPA2 models
vjepa2_vit_large = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_vit_large')
vjepa2_vit_huge = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_vit_huge')
vjepa2_vit_giant = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_vit_giant')

# Load VJEPA2.1 models
vjepa2_1_vit_base_384 = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_1_vit_base_384')
vjepa2_1_vit_large_384 = torch.hub.load('facebookresearch/vjepa2', 'vjepa2_1_vit_large_384')
```

Via HuggingFace:

```python
from transformers import AutoVideoProcessor, AutoModel

hf_repo = "facebook/vjepa2-vitg-fpc64-256"
model = AutoModel.from_pretrained(hf_repo)
processor = AutoVideoProcessor.from_pretrained(hf_repo)
```
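Once loaded, the HuggingFace model can serve as a frozen video feature extractor. Below is a minimal sketch that reuses `model` and `processor` from the snippet above; the random uint8 array is a stand-in for 64 real decoded frames (this checkpoint processes 64 frames per clip), and mean pooling is just one simple way to get a clip-level embedding:

```python
import numpy as np
import torch

# Stand-in for 64 decoded RGB frames, shaped (T, H, W, C); use real frames in practice.
video = np.random.randint(0, 256, size=(64, 256, 256, 3), dtype=np.uint8)

inputs = processor(video, return_tensors="pt")   # resize + normalize into pixel values
with torch.no_grad():
    outputs = model(**inputs)

tokens = outputs.last_hidden_state               # (1, num_tokens, hidden_dim)
clip_embedding = tokens.mean(dim=1)              # naive mean-pooled clip embedding
print(tokens.shape, clip_embedding.shape)
```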
## Evaluation and Fine-tuning

VJEPA2 ships a complete evaluation and fine-tuning toolchain, making it easy to evaluate and fine-tune models on your own datasets.

### Probe Evaluation

Probe evaluation trains an attentive probe on top of frozen VJEPA2 features. You can train your own probe with the provided training scripts, or run inference directly with a pretrained probe.

```bash
# Train a probe locally
python -m evals.main --fname configs/eval/vitl/ssv2.yaml \
  --devices cuda:0 cuda:1

# Train a probe on a SLURM cluster
python -m evals.main_distributed \
  --fname configs/eval/vitl/ssv2.yaml \
  --time 8600 \
  --account my_account --qos my_qos
```
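Conceptually, an attentive probe is a small cross-attention pooling module plus a linear classifier, trained while the backbone stays frozen. The sketch below illustrates the idea only; `AttentiveProbe` is a hypothetical class, not the implementation in `evals`, and 174 classes matches Something-Something v2 (the `ssv2.yaml` config above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveProbe(nn.Module):
    """Toy attentive probe: one learned query cross-attends over frozen
    backbone tokens, then a linear head classifies the pooled vector."""
    def __init__(self, dim=1024, num_classes=174, heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                   # tokens: (B, N, dim), frozen
        q = self.query.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.attn(q, tokens, tokens) # cross-attention pooling
        return self.head(pooled.squeeze(1))      # logits: (B, num_classes)

probe = AttentiveProbe()
opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)

# Dummy frozen features and labels; in practice the features come from the
# VJEPA2 encoder run under torch.no_grad() over your labeled clips.
feats = torch.randn(8, 196, 1024)
labels = torch.randint(0, 174, (8,))

loss = F.cross_entropy(probe(feats), labels)
loss.backward()
opt.step(); opt.zero_grad()
```

Because only the probe's parameters receive gradients, this protocol measures the quality of the frozen representation rather than the capacity of the classifier.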
## Inference Example

notebooks/vjepa2_demo.ipynb provides a complete example of loading a model and running inference on a sample video. Before using it, download the model weights and update the corresponding paths in the script:

```bash
wget https://dl.fbaipublicfiles.com/vjepa2/vitg-384.pt -P YOUR_DIR
wget https://dl.fbaipublicfiles.com/vjepa2/evals/ssv2-vitg-384-64x2x3.pt -P YOUR_DIR
python -m notebooks.vjepa2_demo
```

## Code Structure

The VJEPA2 project layout is clean and easy to extend:

```
.
├── app                       # Training loops
│   ├── vjepa                 # V-JEPA 2 pretraining
│   ├── vjepa_2_1             # V-JEPA 2.1 pretraining
│   ├── vjepa_droid           # Action-conditioned model training
│   ├── main_distributed.py   # Distributed training entry point
│   └── main.py               # Local training entry point
├── configs                   # Config files for training and evaluation
├── evals                     # Evaluation loops
├── src                       # Core packages
│   ├── datasets              # Datasets and data loaders
│   ├── models                # Model definitions
│   ├── masks                 # Masking utilities
│   └── utils                 # Shared utility functions
└── tests                     # Unit tests
```

## Closing Thoughts

VJEPA2 provides powerful tools and models for self-supervised video learning. By drawing on a million hours of video data, it makes it possible to build world models that can understand, predict, and plan. Whether for academic research or industrial applications, VJEPA2 opens new possibilities for video understanding tasks.

We hope this guide helps you get the VJEPA2 pretraining pipeline up and running quickly. If you hit any problems, consult the project documentation or open an issue. Good luck with your VJEPA2 pretraining journey!

*Disclosure: parts of this article were produced with AI assistance (AIGC) and are provided for reference only.*