Meta | 172 comments

Introducing DINOv3: a state-of-the-art computer vision model trained with self-supervised learning (SSL) that produces powerful, high-resolution image features. For the first time, a single frozen vision backbone outperforms specialized solutions on multiple long-standing dense prediction tasks. Learn more here: https://lnkd.in/giU_-6_M

A few highlights of DINOv3:

1️⃣ SSL enables training a 7B-parameter model on 1.7B images without labels, supporting annotation-scarce scenarios including satellite imagery
2️⃣ Produces excellent high-resolution features and state-of-the-art performance on dense prediction tasks
3️⃣ Versatile application across vision tasks and domains, all with a frozen backbone (no fine-tuning required)
4️⃣ Includes distilled smaller models (ViT-B, ViT-L) and ConvNeXt variants for deployment flexibility

To help foster innovation and collaboration in the computer vision community, we’re releasing DINOv3 under a commercial license with a full suite of pre-trained models, adapters, training and evaluation code, and (much!) more. Find them here: https://lnkd.in/gEptEtVR

Transcript

Meta FAIR presents DINOv3: self-supervised learning for vision at unprecedented scale. DINOv3 scales self-supervised learning for images to produce our strongest universal vision backbones. Rich, dense image features show impressive self-similarities with exceptional consistency over time. Dense features are aligned across objects, and even across dramatic style changes. Building on these features takes just a few manual annotations, enabling applications like zero-shot segmentation tracking or building a few-shot fine-grained segmentation model. These features power exceptional performance across a broad range of vision tasks. Researchers and developers use DINO in breakthrough applications, and with DINOv3 we release a specialized backbone for satellite imagery tasks. This open source release includes our training code and model weights, efficient models and alternative architectures, and tutorials to get you started. Learn more about DINOv3.
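For readers wondering what the "self-similarities" of dense features look like in practice, here is a minimal sketch of one common way to visualize them: pick one image patch and compute the cosine similarity between its feature vector and every other patch's. This is an illustration, not the DINOv3 release code. The helper names (`cosine`, `similarity_map`, `make_toy_features`) are hypothetical, and the features are random stand-ins rather than real backbone output.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_map(features, grid_h, grid_w, query_row, query_col):
    """Return a grid_h x grid_w map of cosine similarities to one query patch.

    `features` is a flat, row-major list holding one vector per patch.
    """
    query = features[query_row * grid_w + query_col]
    return [
        [cosine(query, features[r * grid_w + c]) for c in range(grid_w)]
        for r in range(grid_h)
    ]


def make_toy_features(grid_h, grid_w, dim, seed=0):
    """Assumption: random vectors standing in for per-patch backbone features."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(grid_h * grid_w)
    ]


# A 4x4 grid of patches with 16-dimensional toy features.
features = make_toy_features(grid_h=4, grid_w=4, dim=16)
sim = similarity_map(features, 4, 4, query_row=1, query_col=2)
```

With real features from a frozen backbone in place of the toy vectors, patches belonging to the same object as the query would be expected to score high in such a map, which is the kind of behavior the video describes.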

Praveen Kamsetti

It's more than a technical upgrade. With 7B parameters and 1.7B training images, DINOv3's pivot to visual in-context prompting marks a conceptual shift combining scale and simplicity to advance intuitive, open-set segmentation and multimodal alignment. Excited to see real-world business cases emerge, especially in healthcare and medical imaging, where semantic segmentation and resource-constrained diagnostics could benefit immensely.