# LLäMmlein 7B · GermEval 2014 NER
This model is a LoRA fine-tune of LSX-UniWue/LLaMmlein_7B for Named Entity Recognition (NER) on the original German GermEval 2014 dataset. It was trained using a bracketed inline output format and an Alpaca-style instruction-tuning setup, reproducing the experimental configuration from Zhan et al. (2026).
Training code, evaluation scripts, and full configuration are available at: https://github.com/stefan-it/llms-meet-ner
## Bracketed Inline Format
Instead of producing a label sequence, the model rewrites the input sentence by wrapping each named entity in [Entity Text | LABEL] brackets. Plain (non-entity) tokens are left unchanged.
Input:

```
Schartau sagte dem " Tagesspiegel " vom Freitag , Fischer sei " in einer Weise aufgetreten , die alles andere als überzeugend war " .
```

Output:

```
[Schartau | PER] sagte dem " [Tagesspiegel | ORG] " vom Freitag , [Fischer | PER] sei " in einer Weise aufgetreten , die alles andere als überzeugend war " .
```
Multi-token entities are handled naturally: the entire span appears inside a single bracket pair. Everything after the first newline of the model output is discarded, which guards against hallucinated continuations.
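The bracketed output can be converted back into entity spans with a simple pattern match. The following is an illustrative sketch (the helper `parse_bracketed` is hypothetical, not part of the released code); it also applies the first-newline truncation described above.

```python
import re

# Matches one "[Entity Text | LABEL]" bracket pair.
BRACKET_RE = re.compile(r"\[([^\[\]|]+)\|\s*([A-Za-z]+)\]")

def parse_bracketed(output: str) -> list[tuple[str, str]]:
    """Extract (entity_text, label) pairs from bracketed inline output."""
    # Discard everything after the first newline (hallucinated continuations).
    first_line = output.split("\n", 1)[0]
    return [(m.group(1).strip(), m.group(2).strip())
            for m in BRACKET_RE.finditer(first_line)]

pred = '[Schartau | PER] sagte dem " [Tagesspiegel | ORG] " vom Freitag ...'
print(parse_bracketed(pred))  # [('Schartau', 'PER'), ('Tagesspiegel', 'ORG')]
```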
## Instruction Format
The model was fine-tuned in Alpaca-style instruction format. Each training example consists of a system instruction defining the task and label set, followed by the input sentence as the user turn and the bracketed inline annotation as the expected response.
The instruction used for the GermEval 2014 dataset:
```
Your task is to identify all named entities in the input sentence and rewrite the sentence by enclosing each entity using the format [Entity Text | LABEL]. Use only the label tags defined in the Label Set below.
Label Set:
PER(person): A named individual, including real people, fictional characters, animal names, and aliases or nicknames.
PERderiv(person derivative): An adjective or other derivative form derived from a person name (e.g. "Kafkaesque" from "Kafka").
PERpart(person part): A token that contains a person name as a component part, joined with non-name material into a compound.
ORG(organization): A collective entity acting as a unit, including companies, institutions, governmental and political bodies, courts, military units, clubs, newspapers, museums, universities, hospitals, bands, and other organized groups.
ORGderiv(organization derivative): An adjective or other derivative form derived from an organization name (e.g. "googeln" from "Google").
ORGpart(organization part): A token that contains an organization name as a component part, joined with non-name material into a compound.
LOC(location): A geographical or spatial entity, including countries, cities, regions, districts, streets, squares, landmarks, natural features such as mountains, rivers, and lakes, as well as planets and other celestial bodies.
LOCderiv(location derivative): An adjective or other derivative form derived from a location name (e.g. "Berliner" from "Berlin", "europäisch" from "Europa").
LOCpart(location part): A token that contains a location name as a component part, joined with non-name material into a compound (e.g. "deutschlandweit" containing "Deutschland").
OTH(other): Named entities not covered by PER, ORG, or LOC, including languages, nationalities, religions, ideologies, artworks, books, films, events, projects, programmes, currencies, market indices, operating systems, websites, and other named concepts.
OTHderiv(other derivative): An adjective or other derivative form derived from an OTH entity (e.g. "englisch" from "Englisch").
OTHpart(other part): A token that contains an OTH entity as a component part, joined with non-name material into a compound.
Now process the input sentence:
```
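In Alpaca-style fine-tuning, each example is a triple of instruction, input, and expected output. A minimal sketch of how a training example might be assembled (the `make_example` helper and the truncated instruction string are illustrative, not part of the released code):

```python
# Placeholder: in practice this holds the full task instruction shown above.
INSTRUCTION = "Your task is to identify all named entities ..."

def make_example(sentence: str, bracketed: str) -> dict:
    """Assemble one Alpaca-style training record."""
    return {
        "instruction": INSTRUCTION,   # system instruction (task + label set)
        "input": sentence,            # user turn: the raw sentence
        "output": bracketed,          # expected response: bracketed annotation
    }

ex = make_example(
    'Schartau sagte dem " Tagesspiegel " vom Freitag , ...',
    '[Schartau | PER] sagte dem " [Tagesspiegel | ORG] " vom Freitag , ...',
)
```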
## Training
| Hyperparameter | Value |
|---|---|
| Base model | LSX-UniWue/LLaMmlein_7B |
| Fine-tuning method | LoRA (via LLaMA-Factory) |
| LoRA rank | 256 |
| LoRA alpha | 512 |
| LoRA target | all |
| Training dataset | GermEval 2014 train split |
| Epochs | 2 |
| Learning rate | 2.0e-5 |
| LR scheduler | cosine |
| Warmup ratio | 0.1 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 8 |
| Effective batch size | 8 |
| Max sequence length | 2048 |
| Precision | bfloat16 |
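The table above translates roughly into the following LLaMA-Factory SFT configuration. This is a sketch, not the exact file used for this run: key names can vary between LLaMA-Factory versions, and the dataset name `germeval2014_bracketed` is a placeholder that would need to be registered in LLaMA-Factory's `dataset_info.json`.

```yaml
### model
model_name_or_path: LSX-UniWue/LLaMmlein_7B

### method
stage: sft
finetuning_type: lora
lora_rank: 256
lora_alpha: 512
lora_target: all

### dataset (placeholder name)
dataset: germeval2014_bracketed
cutoff_len: 2048

### train
num_train_epochs: 2
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8   # effective batch size: 1 × 8 = 8
bf16: true
```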
## Evaluation Setup
Evaluation is performed in two complementary ways. Both start from the raw model output (bracketed inline predictions) and align it against gold labels taken directly from the original GermEval 2014 dataset, never from the converted training format, to avoid annotation artefacts.
- seqeval — token-level strict span matching: a span is correct only if both its boundaries and entity type match exactly.
- nervaluate — span-level evaluation reporting four scenarios (strict, exact, partial, ent_type) following the SemEval 2013 Task 9.1 metrics.
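The strict criterion can be illustrated with a small self-contained sketch (this is a toy re-implementation for clarity, not the seqeval library itself): a predicted span counts as correct only when both its boundaries and its entity type match a gold span exactly.

```python
def strict_f1(gold: set, pred: set) -> float:
    """F1 over (start, end, label) spans; only exact matches count."""
    tp = len(gold & pred)                       # exact boundary + type matches
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = {(0, 1, "PER"), (4, 6, "ORG")}           # gold spans
pred = {(0, 1, "PER"), (4, 5, "ORG")}           # second span boundary is off
print(strict_f1(gold, pred))  # 0.5: only one of two spans matches exactly
```

Under nervaluate's "partial" scenario, the second prediction would receive partial credit for overlapping the gold span; under "strict" it counts as an error.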
## Results
seqeval scores (strict span matching):

| Split | Sentences | Tokens | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Development | 2,200 | 41,653 | 0.8787 | 0.8964 | 0.8874 |
| Test | 5,100 | 96,499 | 0.8918 | 0.8815 | 0.8866 |
The model achieves a strong seqeval F1 of 88.66% on the test set.