arxiv:2407.04538

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Published on Jul 5, 2024


Authors:

Ananthu Aniraj ,

Cassio F. Dantas ,

AI-generated summary

Using self-supervised transformer-based vision models with a total variation prior improves unsupervised part discovery and classification performance compared to previous methods.

Abstract

Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions about the geometric properties of the discovered parts: the parts should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks (CUB, PartImageNet and Oxford Flowers) and compare our results to previously published methods as well as to a re-implementation of the state-of-the-art method PDiscoNet with a transformer-based backbone. We consistently obtain substantial improvements across the board, both on part discovery metrics and on the downstream classification task, showing that the strong inductive biases in self-supervised ViT models call for rethinking the geometric priors that can be used for unsupervised part discovery.
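To make the total variation idea concrete, here is a minimal, dependency-free sketch. It is not the paper's loss: the abstract does not give the exact formulation, so this assumes a generic anisotropic TV (sum of absolute differences between neighboring cells) averaged over the map, and the function name `total_variation` and the toy maps are hypothetical. It illustrates why such a prior tolerates several connected components of any size while still penalizing fragmented assignments.

```python
def total_variation(part_maps):
    """Anisotropic total variation of K part maps.

    part_maps: list of K grids, each an H x W nested list of part
    assignment probabilities. Returns the summed absolute differences
    between vertical and horizontal neighbors, divided by the number
    of cells. (Assumed formulation, for illustration only.)
    """
    total = 0.0
    count = 0
    for grid in part_maps:
        height, width = len(grid), len(grid[0])
        for i in range(height):
            for j in range(width):
                if i + 1 < height:  # vertical neighbor
                    total += abs(grid[i + 1][j] - grid[i][j])
                if j + 1 < width:  # horizontal neighbor
                    total += abs(grid[i][j + 1] - grid[i][j])
                count += 1
    return total / count


# Two toy 4x4 maps for a single part, each with four active cells.
# One part split into two separate blobs (two connected components).
two_blobs = [[
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]]
# The same active area scattered as isolated cells.
scattered = [[
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]]

print(total_variation(two_blobs))  # 0.375
print(total_variation(scattered))  # 0.6875
```

The penalty depends on the total boundary length rather than on the number or size of the regions, so a part made of two clean blobs scores lower than the same area scattered across the image. A compactness prior, by contrast, would penalize the two-blob map for spreading far from a single center.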



