markredito's Collections

Multimodal

updated Sep 7, 2024

  • Compositional Foundation Models for Hierarchical Planning

    Paper • 2309.08587 • Published Sep 15, 2023 • 11

  • DreamLLM: Synergistic Multimodal Comprehension and Creation

    Paper • 2309.11499 • Published Sep 20, 2023 • 60

  • VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

    Paper • 2309.15091 • Published Sep 26, 2023 • 35

  • Context-Aware Meta-Learning

    Paper • 2310.10971 • Published Oct 17, 2023 • 17

  • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

    Paper • 2310.11441 • Published Oct 17, 2023 • 29

  • MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

    Paper • 2310.09478 • Published Oct 14, 2023 • 21

  • VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

    Paper • 2403.00522 • Published Mar 1, 2024 • 46

  • Building and better understanding vision-language models: insights and future directions

    Paper • 2408.12637 • Published Aug 22, 2024 • 134