Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: audio-to-audio

 # Scaling Analysis of Interleaved Speech-Text Language Models

-The model was presented in the paper [Scaling Analysis of Interleaved Speech-Text Language Models](https://arxiv.org/abs/).
+The model was presented in the paper [Scaling Analysis of Interleaved Speech-Text Language Models](https://arxiv.org/abs/2504.02398).

 # Paper abstract
 Existing Speech Language Model (SLM) scaling analysis paints a bleak picture. They predict that SLMs require much more compute and data
@@ -32,7 +32,7 @@ This is a Speech Language Model (SLM) trained for generating speech or text cont
 ## Model Details

 ### Model Description
-This Speech Language Model, introduced in ["Scaling Analysis of Interleaved Speech-Text Language Models"](https://arxiv.org/abs/), focuses on scaling analysis of interleaved speech-text SLMs.
+This Speech Language Model, introduced in ["Scaling Analysis of Interleaved Speech-Text Language Models"](https://arxiv.org/abs/2504.02398), focuses on scaling analysis of interleaved speech-text SLMs.
 It was fine-tuned from [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) by extending its vocabulary with 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz).

@@ -44,7 +44,7 @@ the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz
 ### Model Sources

 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
-- **Paper:** [https://arxiv.org/abs/](https://arxiv.org/abs/)
+- **Paper:** [https://arxiv.org/abs/](https://arxiv.org/abs/2504.02398)
 - **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/sims/](https://pages.cs.huji.ac.il/adiyoss-lab/sims/)

 ## Uses
@@ -60,7 +60,7 @@ We refer users to the official repository for full usage explanations - [github]


 ## Training Details
-We highly encourage users to read the full [paper](https://arxiv.org/abs/
+We highly encourage users to read the full [paper](https://arxiv.org/abs/2504.02398), for full training details.


 ### Compute Infrastructure
@@ -76,6 +76,12 @@ easy and efficient training of Speech Language Models.
 **BibTeX:**
 ```
 @misc{maimon2025scaling,
-
+title={Scaling Analysis of Interleaved Speech-Text Language Models},
+author={Gallil Maimon and Michael Hassid and Amit Roth and Yossi Adi},
+year={2025},
+eprint={2504.02398},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2504.02398},
 }
 ```
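
The Model Description lines in the diff above say the model extends the Qwen2.5-7B vocabulary with 500 speech tokens. As a rough illustration of what that means for token ids, here is a minimal pure-Python sketch. It is not the slamkit implementation: the `<unit_N>` token naming, the toy two-word text vocabulary, and the helper names `speech_token`, `extend_vocab`, and `encode_interleaved` are all assumptions made for this example; only the count of 500 speech tokens comes from the model card.

```python
# Hypothetical illustration only: token naming and helpers are assumptions,
# not the actual slamkit or SIMS-7B tokenizer.
NUM_SPEECH_UNITS = 500  # from the model card: 500 speech tokens


def speech_token(unit: int) -> str:
    """Render a discrete speech unit as a placeholder token string."""
    if not 0 <= unit < NUM_SPEECH_UNITS:
        raise ValueError(f"unit {unit} is outside [0, {NUM_SPEECH_UNITS})")
    return f"<unit_{unit}>"


def extend_vocab(text_vocab: dict) -> dict:
    """Return a copy of text_vocab with speech tokens appended after the last text id."""
    extended = dict(text_vocab)
    next_id = max(text_vocab.values()) + 1 if text_vocab else 0
    for unit in range(NUM_SPEECH_UNITS):
        extended[speech_token(unit)] = next_id + unit
    return extended


def encode_interleaved(segments, vocab):
    """Encode ("text", [words]) and ("speech", [units]) segments into one id sequence."""
    ids = []
    for kind, items in segments:
        if kind == "text":
            ids.extend(vocab[word] for word in items)
        elif kind == "speech":
            ids.extend(vocab[speech_token(unit)] for unit in items)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return ids


toy_vocab = {"hello": 0, "world": 1}  # stand-in for the text vocabulary
vocab = extend_vocab(toy_vocab)
ids = encode_interleaved(
    [("text", ["hello"]), ("speech", [0, 499]), ("text", ["world"])],
    vocab,
)
print(len(vocab), ids)  # 502 [0, 2, 501, 1]
```

Appending the speech tokens after the last text id leaves every existing text id unchanged, which is why a text-pretrained model can be extended this way; for real usage, see the official repository linked in the model card.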