slprl/SIMS-7B

@@ -12,7 +12,7 @@ pipeline_tag: audio-to-audio
 # Scaling Analysis of Interleaved Speech-Text Language Models
-The model was presented in the paper [Scaling Analysis of Interleaved Speech-Text Language Models](https://arxiv.org/abs/).
 # Paper abstract
 Existing Speech Language Model (SLM) scaling analysis paints a bleak picture. They predict that SLMs require much more compute and data
@@ -32,7 +32,7 @@ This is a Speech Language Model (SLM) trained for generating speech or text cont
 ## Model Details
 ### Model Description
-This Speech Language Model, introduced in ["Scaling Analysis of Interleaved Speech-Text Language Models"](https://arxiv.org/abs/), focuses on scaling analysis of interleaved speech-text SLMs.
 It was fine-tuned from [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) by extending its vocabulary with 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz).
@@ -44,7 +44,7 @@ the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz
 ### Model Sources
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
-- **Paper:** [https://arxiv.org/abs/](https://arxiv.org/abs/)
 - **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/sims/](https://pages.cs.huji.ac.il/adiyoss-lab/sims/)
 ## Uses
@@ -60,7 +60,7 @@ We refer users to the official repository for full usage explanations - [github]
 ## Training Details
-We highly encourage users to read the full [paper](https://arxiv.org/abs/2502.15814), for full training details.
 ### Compute Infrastructure
@@ -76,6 +76,12 @@ easy and efficient training of Speech Language Models.
 **BibTeX:**
 ```
 @misc{maimon2025scaling,
-      soon
 }
 ```

 # Scaling Analysis of Interleaved Speech-Text Language Models
+The model was presented in the paper [Scaling Analysis of Interleaved Speech-Text Language Models](https://arxiv.org/abs/2504.02398).
 # Paper abstract
 Existing Speech Language Model (SLM) scaling analysis paints a bleak picture. They predict that SLMs require much more compute and data
 ## Model Details
 ### Model Description
+This Speech Language Model, introduced in ["Scaling Analysis of Interleaved Speech-Text Language Models"](https://arxiv.org/abs/2504.02398), focuses on scaling analysis of interleaved speech-text SLMs.
 It was fine-tuned from [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) by extending its vocabulary with 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz).
 ### Model Sources
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
+- **Paper:** [https://arxiv.org/abs/2504.02398](https://arxiv.org/abs/2504.02398)
 - **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/sims/](https://pages.cs.huji.ac.il/adiyoss-lab/sims/)
 ## Uses
 ## Training Details
+We highly encourage users to read the full [paper](https://arxiv.org/abs/2504.02398), for full training details.
 ### Compute Infrastructure
 **BibTeX:**
 ```
 @misc{maimon2025scaling,
+      title={Scaling Analysis of Interleaved Speech-Text Language Models},
+      author={Gallil Maimon and Michael Hassid and Amit Roth and Yossi Adi},
+      year={2025},
+      eprint={2504.02398},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2504.02398},
 }
 ```
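
The Model Description in this card says the model extends the Qwen2.5-7B vocabulary with 500 speech tokens. As a rough illustration of what that can look like at the token-ID level, here is a minimal, dependency-free Python sketch. It assumes the speech units are appended directly after the text vocabulary. `TEXT_VOCAB_SIZE`, `speech_unit_to_token_id`, `extended_vocab_size`, and `interleave` are hypothetical names and values chosen for this example, not taken from slamkit or the paper, and the actual implementation may differ.

```python
from typing import Iterable, List

NUM_SPEECH_UNITS = 500  # from the model card: 500 speech tokens
TEXT_VOCAB_SIZE = 1000  # hypothetical placeholder, NOT the real Qwen2.5-7B vocabulary size


def speech_unit_to_token_id(unit: int, text_vocab_size: int = TEXT_VOCAB_SIZE) -> int:
    """Map a discrete speech unit to an ID placed after the text vocabulary.

    Assumption: speech tokens occupy the contiguous ID range
    [text_vocab_size, text_vocab_size + NUM_SPEECH_UNITS).
    """
    if not 0 <= unit < NUM_SPEECH_UNITS:
        raise ValueError(f"speech unit {unit} is outside 0..{NUM_SPEECH_UNITS - 1}")
    return text_vocab_size + unit


def extended_vocab_size(text_vocab_size: int = TEXT_VOCAB_SIZE) -> int:
    """Size of the vocabulary once the speech tokens are appended."""
    return text_vocab_size + NUM_SPEECH_UNITS


def interleave(
    text_ids: Iterable[int],
    speech_units: Iterable[int],
    text_vocab_size: int = TEXT_VOCAB_SIZE,
) -> List[int]:
    """Build one sequence: a text span followed by a speech span."""
    speech_ids = [speech_unit_to_token_id(u, text_vocab_size) for u in speech_units]
    return list(text_ids) + speech_ids


if __name__ == "__main__":
    # Two text token IDs followed by three speech units, all in one ID space.
    print(extended_vocab_size())
    print(interleave([17, 42], [0, 250, 499]))
```

Keeping text and speech tokens in a single ID space is what lets one language model consume and predict both modalities in the same sequence. For the real tokenizer and training setup, see the repository linked under Model Sources.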