TREA 2.0 Pipeline

An audio question-answering dataset generator that works with source datasets such as ESC-50, UrbanSound8K, and GISE. It concatenates variable-length audio clips, trimming where needed, to reach exact target durations, and generates 15 task types across three reasoning families: simple singular, multihop singular, and multihop inter-task temporal reasoning.

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Preprocess the source dataset, e.g. ESC-50 or UrbanSound8K (required for duration-based tasks)
python preprocess_esc50.py --config config.yaml
# Or for UrbanSound8K: python preprocess_urbansound8k.py --config config_urbansound8k.yaml

# 3. Generate datasets
python main.py --config config.yaml
# Or use the helper scripts for specific datasets:
# ./run_pipeline_urbansound8k.sh
# ./run_pipeline_gise.sh

Configuration

Edit config.yaml (or config_urbansound8k.yaml / config_gise.yaml) to set:

  • Task duration: task_duration_size (hours) per task
  • Clip duration range: min_clip_duration to max_clip_duration (seconds)
  • Dataset paths: Point to your source dataset location (e.g., ESC-50, UrbanSound8K, GISE)
  • Variable-length handling: Source clips of differing native durations are automatically concatenated and trimmed to reach the target duration, and the metadata is kept consistent with the resulting audio.
  • Enable/disable tasks: Set enabled: true/false for each task
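
The variable-length handling described above can be pictured with a small sketch. This is not the pipeline's actual code: `plan_segments` and its arguments are hypothetical names, durations are assumed to be in milliseconds, and clips are assumed to be consumed in order with only the last one trimmed.

```python
from typing import List, Tuple


def plan_segments(clip_ms: List[int], target_ms: int) -> List[Tuple[int, int]]:
    """Return (clip_index, used_ms) pairs whose used lengths sum to target_ms.

    Hypothetical helper: clips are taken in order and the last one is trimmed.
    """
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    plan, total = [], 0
    for idx, length in enumerate(clip_ms):
        if total >= target_ms:
            break
        used = min(length, target_ms - total)  # trim the final clip if needed
        plan.append((idx, used))
        total += used
    if total < target_ms:
        raise ValueError("not enough source audio to reach the target duration")
    return plan


plan = plan_segments([5000, 3200, 4100], target_ms=10000)
print(plan)  # [(0, 5000), (1, 3200), (2, 1800)]
```

Recording how much of each clip was used is what would let the metadata stay consistent with the final audio.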

Key Files

  • config.yaml, config_urbansound8k.yaml, config_gise.yaml - Configuration parameters for different datasets
  • main.py - Pipeline entry point (runs all tasks)
  • preprocess_esc50.py, preprocess_urbansound8k.py - Preprocess datasets for duration tasks
  • tasks/task_*.py - Individual task generators
  • tasks/multihop_base.py - Shared base class for multihop tasks

Tasks

Simple Singular Temporal Reasoning

Direct one-step questions over one temporal/acoustic property.

| Task | Question | Example |
|---|---|---|
| COUNT | "How many unique sounds?" / "How many times does X occur?" | Audio with distinct sound types or repetitions |
| DURATION | "Which sound is longest/shortest?" / "Which is longer?" | Compare sound durations and pairwise comparisons |
| ORDER | "Which sound is first/last/after X?" | Temporal sequence questions |
| VOLUME | "Which sound is loudest/softest?" | Loudness comparison |
| SILENCE GAP | "Which sounds have the longest silence?" | Compare silence durations between sequential sounds |
| OVERLAP | "Which sound overlaps with X?" / "Do X and Y overlap?" | Identify partially overlapping sound events |
| DURING/CONTAINS | "Which sound occurs during X?" | One sound temporally contained entirely within another |
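
To make the one-step questions above concrete, here is a minimal sketch of how ground-truth answers could be derived from event metadata. The `(label, start_s, end_s)` tuple layout and all function names are assumptions for illustration; the actual generators in tasks/task_*.py may work differently.

```python
from collections import Counter

# Hypothetical event metadata: (label, start_s, end_s)
events = [
    ("dog", 0.0, 2.0),
    ("rain", 2.5, 7.5),
    ("siren", 8.0, 9.0),
    ("dog", 9.5, 11.0),
]


def count_unique(events):
    """COUNT: number of distinct sound labels."""
    return len({label for label, _, _ in events})


def count_occurrences(events, target):
    """COUNT: how many times the target label occurs."""
    return Counter(label for label, _, _ in events)[target]


def longest_event(events):
    """DURATION: label of the single longest event."""
    return max(events, key=lambda e: e[2] - e[1])[0]


def sound_after(events, target):
    """ORDER: label of the event that follows the first occurrence of target."""
    ordered = sorted(events, key=lambda e: e[1])
    for i, (label, _, _) in enumerate(ordered[:-1]):
        if label == target:
            return ordered[i + 1][0]
    return None


print(count_unique(events))              # 3
print(count_occurrences(events, "dog"))  # 2
print(longest_event(events))             # rain
print(sound_after(events, "rain"))       # siren
```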

Multihop Temporal Reasoning - Singular Task

Multi-hop questions within one task family. The model first applies a temporal condition (before, after, between), then answers a question of the same type.

| Task | Question | Reasoning Hops |
|---|---|---|
| CONDITIONAL COUNT | "How many X sounds occur after Y?" | temporal filter → count |
| CONDITIONAL DURATION | "Which sound after Y lasts the longest?" | temporal filter → duration comparison |
| BETWEEN EVENTS | "Which sound occurs between X and Y?" | temporal window → identification |
| EVENT DENSITY | "Which half of the audio has more events?" | region segmentation → count comparison |
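
The two-hop pattern (filter first, then answer) can be sketched as follows. The event layout, the function names, and the exact semantics are assumptions: "after Y" is taken to mean "starts at or after the earliest end of a Y event", and an event is assigned to a half of the audio by its start time.

```python
# Hypothetical event metadata: (label, start_s, end_s)
events = [
    ("dog", 0.0, 2.0),
    ("siren", 3.0, 4.0),
    ("dog", 5.0, 6.0),
    ("rain", 6.5, 9.0),
    ("dog", 10.0, 11.0),
]


def conditional_count(events, target, anchor):
    """Hop 1: keep events after the anchor. Hop 2: count the target label."""
    anchor_ends = [end for label, _, end in events if label == anchor]
    if not anchor_ends:
        return 0
    cutoff = min(anchor_ends)  # earliest end among anchor occurrences
    return sum(
        1 for label, start, _ in events if label == target and start >= cutoff
    )


def denser_half(events):
    """Hop 1: split the audio at its midpoint. Hop 2: compare event counts."""
    midpoint = max(end for _, _, end in events) / 2.0
    first = sum(1 for _, start, _ in events if start < midpoint)
    second = len(events) - first
    if first == second:
        return "equal"
    return "first" if first > second else "second"


print(conditional_count(events, "dog", "siren"))  # 2
print(denser_half(events))                        # first
```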

Multihop Temporal Reasoning - Inter-Task

Multi-hop questions combining two or more temporal/acoustic properties.

| Task | Question | Reasoning Hops |
|---|---|---|
| DURATION GAP | "Which is longer: X or the silence after X?" | duration ↔ silence comparison |
| TEMPORAL ARITHMETIC | "Which sound lasts longer in total: X or Y?" | aggregated duration across repetitions |
| TEMPORAL LOUDNESS | "Which sound after Y is the loudest?" | temporal filter → loudness comparison |
| MULTI HOP | "What sound occurs after the longest sound?" | property identification → order lookup |
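
As an illustration of combining two properties, the sketch below answers a MULTI HOP question (duration, then order) and a TEMPORAL ARITHMETIC question (duration summed over repetitions). The event layout and function names are hypothetical, not taken from the task generators.

```python
from collections import defaultdict

# Hypothetical event metadata: (label, start_s, end_s)
events = [
    ("dog", 0.0, 2.0),
    ("rain", 2.5, 7.5),
    ("siren", 8.0, 9.0),
    ("dog", 9.5, 11.0),
]


def sound_after_longest(events):
    """Hop 1: find the longest event. Hop 2: look up the event that follows it."""
    ordered = sorted(events, key=lambda e: e[1])
    idx = max(range(len(ordered)), key=lambda i: ordered[i][2] - ordered[i][1])
    return ordered[idx + 1][0] if idx + 1 < len(ordered) else None


def total_durations(events):
    """Sum each label's duration across all of its repetitions."""
    totals = defaultdict(float)
    for label, start, end in events:
        totals[label] += end - start
    return dict(totals)


print(sound_after_longest(events))  # siren
totals = total_durations(events)
print(max(totals, key=totals.get))  # rain
```

Here "dog" occurs twice (3.5 s in total) yet still loses to the single 5 s "rain" event, which is the kind of case the aggregation question is meant to probe.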

Output Structure

output/{task}/
├── audio/*.wav           # Generated audio files
├── {task}_mcq.csv        # Multiple choice questions
├── {task}_open_text.csv  # Open-ended questions
└── {task}_metadata.csv   # Detailed metadata
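
For downstream scripts, the layout above can be turned into paths with a small helper. `task_outputs` is a hypothetical name, and the default `output` root is an assumption based on the tree shown.

```python
from pathlib import Path


def task_outputs(output_root, task):
    """Build the expected output paths for one task (hypothetical helper)."""
    task_dir = Path(output_root) / task
    return {
        "audio_dir": task_dir / "audio",
        "mcq": task_dir / f"{task}_mcq.csv",
        "open_text": task_dir / f"{task}_open_text.csv",
        "metadata": task_dir / f"{task}_metadata.csv",
    }


paths = task_outputs("output", "count")
print(paths["mcq"].as_posix())  # output/count/count_mcq.csv
```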

Shell scripts (quick)

Use the provided shell helpers for simple runs.

Run full pipeline (uses python main.py under the hood):

# Make executable and run (from pipeline/)
chmod +x run_pipeline.sh
./run_pipeline.sh

# With custom config, tasks, and output
./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset

# Run only multihop tasks
./run_pipeline.sh --tasks conditional_count,conditional_duration,between_events,event_density,duration_gap,temporal_arithmetic,temporal_loudness,multi_hop

Run the LLM answer generation across splits (uses llm_answer_generator.py):

# Processes open_text CSVs across splits/tasks defined in the script
./run_llm_answers_all.sh

# Or run per-file with the helper script directly
python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count

Advanced Usage

# Run specific tasks only
python main.py --tasks count order conditional_count multi_hop

# Use custom config (e.g., for UrbanSound8K)
python main.py --config config_urbansound8k.yaml

# Custom output directory
python main.py --output /path/to/output

# Preprocess with custom parameters
python preprocess_esc50.py --config config.yaml \
    --threshold-strategy noise_floor \
    --noise-floor-percentile 2.0 \
    --noise-floor-delta-db 5.0
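
# One plausible reading of the noise_floor options above, shown as a Python sketch.
# This is an assumption, not the documented behavior of preprocess_esc50.py:
# the noise floor is taken as a low percentile of per-frame levels (dB), and a
# frame counts as active when it exceeds that floor by the given delta.
# All names below are hypothetical; see DOCS.md for the actual definition.

```python
def percentile(values, pct):
    """Linear-interpolation percentile over a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def noise_floor_threshold(frame_db, floor_percentile=2.0, delta_db=5.0):
    """Assumed rule: threshold = noise floor (low percentile) + delta in dB."""
    return percentile(frame_db, floor_percentile) + delta_db


# Hypothetical per-frame levels in dB: quiet background plus one loud event
frame_db = [-60.0, -58.0, -59.0, -20.0, -18.0, -22.0, -61.0]
threshold = noise_floor_threshold(frame_db)
active = [db > threshold for db in frame_db]
print(active)  # [False, False, False, True, True, True, False]
```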

Available Task Names

Valid task names for the --tasks argument:

Simple: count, duration, order, volume, silence_gap, overlap, during_contains

Multihop Singular: conditional_count, conditional_duration, between_events, event_density

Multihop Inter-Task: duration_gap, temporal_arithmetic, temporal_loudness, multi_hop
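
The names above can be validated before a run with a sketch like the following. `TASK_FAMILIES` and `parse_tasks` are hypothetical and not part of main.py; the parser accepts both the comma-separated form used with run_pipeline.sh and the space-separated form used with main.py.

```python
TASK_FAMILIES = {
    "simple": [
        "count", "duration", "order", "volume",
        "silence_gap", "overlap", "during_contains",
    ],
    "multihop_singular": [
        "conditional_count", "conditional_duration",
        "between_events", "event_density",
    ],
    "multihop_inter_task": [
        "duration_gap", "temporal_arithmetic",
        "temporal_loudness", "multi_hop",
    ],
}


def parse_tasks(arg):
    """Split a task list on commas or whitespace and map each name to its family."""
    known = {
        name: family
        for family, names in TASK_FAMILIES.items()
        for name in names
    }
    requested = arg.replace(",", " ").split()
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(f"unknown task name(s): {', '.join(unknown)}")
    return [(name, known[name]) for name in requested]


print(parse_tasks("count,order multi_hop"))
# [('count', 'simple'), ('order', 'simple'), ('multi_hop', 'multihop_inter_task')]
```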

Documentation

See DOCS.md for complete technical documentation including:

  • Mathematical formulations
  • Detailed algorithm explanations
  • Configuration parameter reference
  • Preprocessing pipeline details
  • Balancing mechanisms

Requirements

  • Python 3.8+
  • pydub
  • numpy
  • pandas
  • tqdm
  • pyyaml
  • pyloudnorm (for LUFS-based loudness measurement)