How to use Meta’s DINOv2 to perform image object detection

Read Time 3 mins | Written by: Cole

Training AI models to perform complex tasks is extremely expensive, and only the largest companies have the millions or billions of dollars to do it. But many open source models (vision models, large language models, or multimodal models) can do things they weren’t trained to do.

Zachary, one of our senior software engineers who researches AI, wanted to see whether he could get a vision model (Meta’s open source DINOv2) to perform a task it wasn’t trained to do, which could save on training investments and open up new product possibilities.

Zachary extended and improved on ideas proposed in Meta's research. From there, he got DINOv2 to perform foreground image object detection with PCA (principal component analysis), a foundational function of Adobe’s generative AI features and a task Meta didn’t train the model to do.
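To give a feel for the PCA step, here is a minimal sketch in the spirit of the patch-feature PCA shown in Meta's DINOv2 research. It is not Zachary's implementation. It assumes you already have one embedding vector per image patch (for example, DINOv2 patch tokens; extracting them is not shown), projects those vectors onto their first principal component with NumPy's SVD, and thresholds at zero. Because the sign of a principal component is arbitrary, it uses a hypothetical "the foreground is the smaller region" heuristic to decide which side is foreground. The function name `foreground_mask` and the synthetic test data are illustrative only.

```python
import numpy as np


def foreground_mask(patch_features: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Split image patches into foreground/background using the first principal component.

    patch_features: (grid_h * grid_w, dim) array with one embedding per patch
    (assumed input, e.g. DINOv2 patch tokens; feature extraction is not shown).
    Returns a boolean (grid_h, grid_w) mask where True marks foreground patches.
    """
    if patch_features.shape[0] != grid_h * grid_w:
        raise ValueError("expected one feature vector per patch")

    # Center the features so PCA captures variation around the mean.
    centered = patch_features - patch_features.mean(axis=0, keepdims=True)

    # The first right-singular vector of the centered data is the first principal component.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]

    # Threshold at zero: centering puts the mean score at zero.
    mask = scores > 0

    # Assumption: the foreground covers fewer patches than the background,
    # so flip the mask if the positive side turned out to be the majority.
    if mask.sum() > mask.size / 2:
        mask = ~mask

    return mask.reshape(grid_h, grid_w)


# Synthetic example: an 8x8 patch grid with a hypothetical 3x3 "object".
rng = np.random.default_rng(0)
grid_h, grid_w, dim = 8, 8, 32
features = rng.normal(0.0, 0.1, size=(grid_h * grid_w, dim))
truth = np.zeros((grid_h, grid_w), dtype=bool)
truth[2:5, 3:6] = True
features[truth.ravel()] += 2.0  # object patches share a distinct feature offset

mask = foreground_mask(features, grid_h, grid_w)
print(mask.astype(int))
```

On real photos the first component will not always line up with the object, so the zero threshold and the sign heuristic are starting points to tune rather than fixed rules.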

What does foreground image object detection look like?

You can read Zachary’s article here for the technical details. Put simply, he used DINOv2 to pick out a dog playing with a soccer ball in a park (in an AI-generated image), then replaced the dog with an AI-generated tiger.

How does DINOv2 object detection apply to enterprise products?

Companies are scrambling to add AI technology to their products, and Adobe is a great example of success. They’ve added impressive AI innovations for photo, video, audio, and 3D to their suite of products. One of their biggest features is an object-aware editing engine called Project Stardust.

You can use the Adobe object-aware editing engine in their Creative Cloud to switch the clothing people are wearing in a photo, remove people from the foreground or background, and edit practically any identifiable objects on the fly.

Zachary’s research got DINOv2 to perform a vital function of Adobe’s advanced object editing engine. This shows that companies can use open source vision models to power major features in customer-facing products, even features the models weren’t trained for.

With the right expertise, that approach could radically lower the compute costs of building custom models that power world-class customer experiences.

How do I hire AI experts to build enterprise products?

Instead of waiting 6-18 months to recruit for expensive open source AI teams, you could engage Codingscape and start on your AI roadmap next quarter. We’re already building AI applications for our partners using open source technology and helping them plan their investments for 2024.

Zappos, Twilio, and Veho are just a few companies that trust us to build their technology ecosystems with a remote-first approach.

You can schedule a time to talk with us here. No hassle, no expectations, just answers.


Cole

Cole is Codingscape's Content Marketing Strategist & Copywriter.
