Multimodal - a markredito Collection

markredito 's Collections

Image Generation

Interpretability

Music Generation

Multimodal

updated Sep 7, 2024

Compositional Foundation Models for Hierarchical Planning

Paper • 2309.08587 • Published Sep 15, 2023 • 11
DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 60
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning

Paper • 2309.15091 • Published Sep 26, 2023 • 35
Context-Aware Meta-Learning

Paper • 2310.10971 • Published Oct 17, 2023 • 17
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

Paper • 2310.11441 • Published Oct 17, 2023 • 29
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Paper • 2310.09478 • Published Oct 14, 2023 • 21
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 134