Dataset: HaochenWang/Grasp-Any-Region-Dataset
How to use HaochenWang/GAR-8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="HaochenWang/GAR-8B", trust_remote_code=True)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("HaochenWang/GAR-8B", trust_remote_code=True, dtype="auto")

This repository contains the GAR-8B model, as presented in the paper Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs.
TL;DR: Our Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or video, where the region is given as points, boxes, scribbles, or a mask, and (2) understanding multiple regions, for example modeling their interactions and performing complex reasoning over them. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
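Because regions can arrive as points, boxes, scribbles, or masks, one common way to handle such mixed prompts is to rasterize each one into a binary mask of the image size. The sketch below does that in plain Python. It is not GAR's preprocessing code: the `(kind, data)` tuple format, the `(x, y)` coordinate order, the exclusive box end, the point `radius`, and every function name (for example `regions_to_masks`) are hypothetical assumptions for illustration. Refer to the GitHub repo for the model's actual input format.

```python
from typing import List, Sequence, Tuple

# Hypothetical conventions (not GAR's API): masks are row-major, mask[y][x],
# True = inside the region; points are (x, y) pixel coordinates.
Mask = List[List[bool]]
Point = Tuple[int, int]


def empty_mask(height: int, width: int) -> Mask:
    return [[False] * width for _ in range(height)]


def mask_area(mask: Mask) -> int:
    """Number of pixels inside the region."""
    return sum(sum(row) for row in mask)


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Fill an (x0, y0, x1, y1) box; x1/y1 are exclusive and the box is clipped to the image."""
    x0, y0, x1, y1 = box
    mask = empty_mask(height, width)
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            mask[y][x] = True
    return mask


def points_to_mask(points: Sequence[Point], height: int, width: int, radius: int = 0) -> Mask:
    """Mark each clicked point plus a square neighbourhood of the given radius."""
    mask = empty_mask(height, width)
    for px, py in points:
        for y in range(py - radius, py + radius + 1):
            for x in range(px - radius, px + radius + 1):
                if 0 <= y < height and 0 <= x < width:
                    mask[y][x] = True
    return mask


def scribble_to_mask(polyline: Sequence[Point], height: int, width: int) -> Mask:
    """Rasterize a scribble (a polyline of (x, y) vertices) by sampling each segment."""
    mask = empty_mask(height, width)
    pts = list(polyline)
    # A single-vertex scribble degenerates to one point.
    segments = list(zip(pts, pts[1:])) if len(pts) > 1 else [(p, p) for p in pts]
    for (xa, ya), (xb, yb) in segments:
        steps = max(abs(xb - xa), abs(yb - ya), 1)
        for i in range(steps + 1):
            x = round(xa + (xb - xa) * i / steps)
            y = round(ya + (yb - ya) * i / steps)
            if 0 <= y < height and 0 <= x < width:
                mask[y][x] = True
    return mask


def regions_to_masks(regions, height: int, width: int) -> List[Mask]:
    """Convert hypothetical (kind, data) region prompts into one mask per region."""
    converters = {"box": box_to_mask, "points": points_to_mask, "scribble": scribble_to_mask}
    masks: List[Mask] = []
    for kind, data in regions:
        if kind == "mask":
            mask = [[bool(v) for v in row] for row in data]
            if len(mask) != height or any(len(row) != width for row in mask):
                raise ValueError("mask prompt must match the image size")
            masks.append(mask)
        else:
            masks.append(converters[kind](data, height, width))
    return masks


if __name__ == "__main__":
    regions = [("box", (1, 1, 3, 3)), ("points", [(0, 0)]), ("scribble", [(0, 4), (4, 4)])]
    masks = regions_to_masks(regions, height=5, width=5)
    print([mask_area(m) for m in masks])  # [4, 1, 5]
```

Keeping one mask per region, rather than merging them, preserves which pixels belong to which region, which matters for multi-region questions such as those in case (2).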
For detailed usage of this model, please refer to our GitHub repo.
Base model: facebook/Perception-LM-8B