LLäMmlein 7B · GermEval 2014 NER

This model is a LoRA fine-tune of LSX-UniWue/LLaMmlein_7B for Named Entity Recognition (NER) on the original German GermEval 2014 dataset. It was trained using a bracketed inline output format and an Alpaca-style instruction-tuning setup, reproducing the experimental configuration from Zhan et al. (2026).

Training code, evaluation scripts, and full configuration are available at: https://github.com/stefan-it/llms-meet-ner


Bracketed Inline Format

Instead of producing a label sequence, the model rewrites the input sentence by wrapping each named entity in [Entity Text | LABEL] brackets. Plain (non-entity) tokens are left unchanged.

Input:

Schartau sagte dem " Tagesspiegel " vom Freitag , Fischer sei " in einer Weise aufgetreten , die alles andere als überzeugend war " .

Output:

[Schartau | PER] sagte dem " [Tagesspiegel | ORG] " vom Freitag , [Fischer | PER] sei " in einer Weise aufgetreten , die alles andere als überzeugend war " .

Multi-token entities are supported naturally — the entire span appears inside a single bracket pair. Everything after the first newline in the model output is discarded to handle hallucinated continuations.
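Recovering entity mentions from this format is a small parsing step. The following is a minimal sketch (the function name `parse_bracketed` and the exact regex are illustrative, not taken from the linked repository) that extracts `(entity_text, label)` pairs and discards everything after the first newline, as described above:

```python
import re

# Pattern for "[Entity Text | LABEL]" spans in the model output.
BRACKET_RE = re.compile(r"\[([^\[\]|]+?)\s*\|\s*([A-Za-z]+)\]")

def parse_bracketed(output: str) -> list[tuple[str, str]]:
    """Extract (entity_text, label) pairs from a bracketed inline prediction.

    Everything after the first newline is discarded, mirroring the
    handling of hallucinated continuations described above.
    """
    first_line = output.split("\n", 1)[0]
    return [(m.group(1).strip(), m.group(2)) for m in BRACKET_RE.finditer(first_line)]

pred = '[Schartau | PER] sagte dem " [Tagesspiegel | ORG] " vom Freitag , [Fischer | PER] sei ...'
print(parse_bracketed(pred))
# → [('Schartau', 'PER'), ('Tagesspiegel', 'ORG'), ('Fischer', 'PER')]
```

Because the entire multi-token span sits inside one bracket pair, no extra merging logic is needed for entities like "New York Times".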


Instruction Format

The model was fine-tuned in Alpaca-style instruction format. Each training example consists of a system instruction defining the task and label set, followed by the input sentence as the user turn and the bracketed inline annotation as the expected response.

The instruction used for the GermEval 2014 dataset:

Your task is to identify all named entities in the input sentence and rewrite the sentence by enclosing each entity using the format [Entity Text | LABEL]. Use only the label tags defined in the Label Set below.
Label Set:
PER(person): A named individual, including real people, fictional characters, animal names, and aliases or nicknames.
PERderiv(person derivative): An adjective or other derivative form derived from a person name (e.g. "Kafkaesque" from "Kafka").
PERpart(person part): A token that contains a person name as a component part, joined with non-name material into a compound.
ORG(organization): A collective entity acting as a unit, including companies, institutions, governmental and political bodies, courts, military units, clubs, newspapers, museums, universities, hospitals, bands, and other organized groups.
ORGderiv(organization derivative): An adjective or other derivative form derived from an organization name (e.g. "googeln" from "Google").
ORGpart(organization part): A token that contains an organization name as a component part, joined with non-name material into a compound.
LOC(location): A geographical or spatial entity, including countries, cities, regions, districts, streets, squares, landmarks, natural features such as mountains, rivers, and lakes, as well as planets and other celestial bodies.
LOCderiv(location derivative): An adjective or other derivative form derived from a location name (e.g. "Berliner" from "Berlin", "europäisch" from "Europa").
LOCpart(location part): A token that contains a location name as a component part, joined with non-name material into a compound (e.g. "deutschlandweit" containing "Deutschland").
OTH(other): Named entities not covered by PER, ORG, or LOC, including languages, nationalities, religions, ideologies, artworks, books, films, events, projects, programmes, currencies, market indices, operating systems, websites, and other named concepts.
OTHderiv(other derivative): An adjective or other derivative form derived from an OTH entity (e.g. "englisch" from "Englisch").
OTHpart(other part): A token that contains an OTH entity as a component part, joined with non-name material into a compound.
Now process the input sentence:
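Assembling the full prompt from this instruction can be sketched as follows. The Alpaca template text below is a common default, not necessarily the exact template used in training (that is defined in the linked repository), and the truncated `INSTRUCTION` stands in for the full label set listed above:

```python
# Abbreviated stand-in for the full instruction shown above;
# the real instruction includes the complete Label Set.
INSTRUCTION = (
    "Your task is to identify all named entities in the input sentence and "
    "rewrite the sentence by enclosing each entity using the format "
    "[Entity Text | LABEL]. Use only the label tags defined in the Label Set below.\n"
    "Label Set:\n"
    "...\n"
    "Now process the input sentence:"
)

# Hypothetical Alpaca-style template; key markers (### Instruction / Input /
# Response) follow the common Alpaca convention.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_prompt(sentence: str) -> str:
    """Build one Alpaca-style prompt for a single input sentence."""
    return ALPACA_TEMPLATE.format(instruction=INSTRUCTION, input=sentence)
```

At training time, the bracketed inline annotation is appended after `### Response:` as the target; at inference time, generation starts from that point.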

Training

| Hyperparameter | Value |
|---|---|
| Base model | LSX-UniWue/LLaMmlein_7B |
| Fine-tuning method | LoRA (via LLaMA-Factory) |
| LoRA rank | 256 |
| LoRA alpha | 512 |
| LoRA target | all |
| Training dataset | GermEval 2014 train split |
| Epochs | 2 |
| Learning rate | 2.0e-5 |
| LR scheduler | cosine |
| Warmup ratio | 0.1 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 8 |
| Effective batch size | 8 |
| Max sequence length | 2048 |
| Precision | bfloat16 |
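These hyperparameters correspond roughly to a LLaMA-Factory YAML fragment like the one below. The key names are an assumption based on typical LLaMA-Factory configurations; the authoritative configuration file lives in the linked repository.

```yaml
# Sketch of a LLaMA-Factory LoRA config matching the hyperparameters above.
# Key names are assumptions; see the linked repository for the real config.
model_name_or_path: LSX-UniWue/LLaMmlein_7B
finetuning_type: lora
lora_rank: 256
lora_alpha: 512
lora_target: all
num_train_epochs: 2
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
cutoff_len: 2048
bf16: true
```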

Evaluation Setup

Evaluation is performed in two complementary ways, both working from the raw model output (bracketed inline predictions) aligned against gold labels taken directly from the original GermEval 2014 dataset — never from the converted training format, to avoid any annotation artefacts.

  • seqeval — entity-level strict span matching: a span is correct only if both its boundaries and entity type match exactly.
  • nervaluate — span-level evaluation reporting four scenarios (strict, exact, partial, ent_type) following the SemEval 2013 Task 9.1 metrics.
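To run either evaluator, the bracketed inline predictions first have to be converted back into token-level BIO tags. A minimal sketch of that conversion (the helper `to_bio` is illustrative and assumes whitespace-separated tokens, as in GermEval 2014; the resulting tag sequences can then be passed to `seqeval.metrics.f1_score` or to nervaluate):

```python
import re

# Pattern for "[Entity Text | LABEL]" spans in the model output.
BRACKET_RE = re.compile(r"\[([^\[\]|]+?)\s*\|\s*([A-Za-z]+)\]")

def to_bio(bracketed: str) -> list[str]:
    """Convert a bracketed inline sentence to token-level BIO tags.

    Assumes tokens are whitespace-separated, as in GermEval 2014.
    """
    tags: list[str] = []
    pos = 0
    for m in BRACKET_RE.finditer(bracketed):
        # Tokens before the bracket are outside any entity.
        tags += ["O"] * len(bracketed[pos:m.start()].split())
        entity_tokens = m.group(1).split()
        tags += [f"B-{m.group(2)}"] + [f"I-{m.group(2)}"] * (len(entity_tokens) - 1)
        pos = m.end()
    tags += ["O"] * len(bracketed[pos:].split())
    return tags

print(to_bio('[Schartau | PER] sagte dem " [Tagesspiegel | ORG] "'))
# → ['B-PER', 'O', 'O', 'O', 'B-ORG', 'O']
```

Gold tags are taken directly from the original CoNLL-style GermEval 2014 files, so only the prediction side needs this conversion.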

Results

Development Set — 2,200 sentences, 41,653 tokens

seqeval

  F1:        0.8874
  Precision: 0.8787
  Recall:    0.8964

Test Set — 5,100 sentences, 96,499 tokens

seqeval

  F1:        0.8866
  Precision: 0.8918
  Recall:    0.8815

The model reaches a strict seqeval F1 of 88.66 on the test set, a very strong result for this dataset.

