TREA 2.0 Pipeline
An audio question-answering dataset generator that supports source datasets such as ESC-50, UrbanSound8K, and GISE. It concatenates variable-length audio clips to reach exact target durations. It generates 15 task types across three reasoning families: simple singular, multihop singular, and multihop inter-task temporal reasoning.
Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Preprocess the source dataset (required for duration-based tasks)
python preprocess_esc50.py --config config.yaml
# Or, for UrbanSound8K:
# python preprocess_urbansound8k.py --config config_urbansound8k.yaml

# 3. Generate datasets
python main.py --config config.yaml
# Or use the helper scripts for specific datasets:
# ./run_pipeline_urbansound8k.sh
# ./run_pipeline_gise.sh
```
Configuration
Edit `config.yaml` (or `config_urbansound8k.yaml` / `config_gise.yaml`) to set:
- Task duration: `task_duration_size` (hours) per task
- Clip duration range: `min_clip_duration` to `max_clip_duration` (seconds)
- Dataset paths: point to your source dataset location (e.g., ESC-50, UrbanSound8K, GISE)
- Enable/disable tasks: set `enabled: true/false` for each task

Variable-length handling is automatic. Clips keep their native durations and are concatenated and trimmed to reach each target duration, and the metadata is kept consistent with the resulting audio.
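The concatenate-and-trim behavior can be sketched using durations alone. This is a minimal illustration, not the pipeline's code. `Segment` and `concatenate_to_target` are hypothetical names. The sketch places clips back to back with no gaps and raises an error if the clips run out. The real generator works on audio (pydub is listed in the requirements) and may insert silence or handle leftover time differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical metadata record for one clip inside a generated file."""
    label: str
    onset: float   # seconds from the start of the generated file
    offset: float  # seconds from the start of the generated file

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def concatenate_to_target(clips: List[Tuple[str, float]], target: float) -> List[Segment]:
    """Place (label, native_duration) clips back to back until `target`
    seconds are filled. The clip that crosses the boundary is trimmed, and
    onsets/offsets are recorded after trimming so the metadata describes
    the audio that would actually be written."""
    segments: List[Segment] = []
    cursor = 0.0
    for label, duration in clips:
        if cursor >= target:
            break
        end = min(cursor + duration, target)  # trim the boundary-crossing clip
        segments.append(Segment(label, cursor, end))
        cursor = end
    if cursor < target:
        raise ValueError(f"clips cover only {cursor:.2f}s of the {target:.2f}s target")
    return segments


if __name__ == "__main__":
    for seg in concatenate_to_target([("dog", 5.0), ("rain", 3.0), ("siren", 4.0)], 10.0):
        print(seg.label, seg.onset, seg.offset)
    # dog 0.0 5.0 / rain 5.0 8.0 / siren 8.0 10.0 (siren trimmed from 4.0s to 2.0s)
```

Recording onsets and offsets only after trimming keeps labels such as "longest sound" correct for the audio that is actually written.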
Key Files
- `config.yaml`, `config_urbansound8k.yaml`, `config_gise.yaml` - Configuration parameters for each dataset
- `main.py` - Pipeline entry point (runs all tasks)
- `preprocess_esc50.py`, `preprocess_urbansound8k.py` - Preprocess datasets for duration tasks
- `tasks/task_*.py` - Individual task generators
- `tasks/multihop_base.py` - Shared base class for multihop tasks
Tasks
Simple Singular Temporal Reasoning
Direct, single-step questions about one temporal or acoustic property.
| Task | Question | Description |
|---|---|---|
| COUNT | "How many unique sounds?" / "How many times does X occur?" | Audio with distinct sound types or repetitions |
| DURATION | "Which sound is longest/shortest?" / "Which is longer?" | Compare sound durations (superlative and pairwise) |
| ORDER | "Which sound is first/last/after X?" | Temporal sequence questions |
| VOLUME | "Which sound is loudest/softest?" | Loudness comparison |
| SILENCE GAP | "Which sounds have the longest silence?" | Compare silence durations between sequential sounds |
| OVERLAP | "Which sound overlaps with X?" / "Do X and Y overlap?" | Identify partially overlapping sound events |
| DURING/CONTAINS | "Which sound occurs during X?" | One sound temporally contained entirely within another |
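The sketch below answers each simple question type from a list of annotated events, which shows that every one of them needs only a single step. The `Event` fields and all function names are hypothetical. The actual generators in `tasks/task_*.py` may use different annotations, tie-breaking, and minimum-difference rules when deciding which questions are answerable.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    label: str
    onset: float     # seconds
    offset: float    # seconds
    loudness: float  # e.g. LUFS; higher is louder

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def count_unique(events: List[Event]) -> int:  # COUNT
    return len({e.label for e in events})


def longest(events: List[Event]) -> str:  # DURATION
    return max(events, key=lambda e: e.duration).label


def first_sound(events: List[Event]) -> str:  # ORDER
    return min(events, key=lambda e: e.onset).label


def loudest(events: List[Event]) -> str:  # VOLUME
    return max(events, key=lambda e: e.loudness).label


def longest_silence(events: List[Event]) -> Tuple[str, str]:  # SILENCE GAP
    """Consecutive pair with the widest silence between them.
    Assumes the events are sequential (non-overlapping)."""
    ordered = sorted(events, key=lambda e: e.onset)
    a, b = max(zip(ordered, ordered[1:]), key=lambda p: p[1].onset - p[0].offset)
    return a.label, b.label


def contains(outer: Event, inner: Event) -> bool:  # DURING/CONTAINS
    return outer.onset <= inner.onset and inner.offset <= outer.offset


def partially_overlaps(a: Event, b: Event) -> bool:  # OVERLAP
    """The intervals intersect, but neither contains the other."""
    intersect = a.onset < b.offset and b.onset < a.offset
    return intersect and not contains(a, b) and not contains(b, a)
```

OVERLAP excludes full containment here so that it stays distinct from DURING/CONTAINS, matching the descriptions in the table.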
Multihop Temporal Reasoning: Singular Task
Multi-hop questions within one task family. The model first applies a temporal condition (e.g., before, after, between, or a region of the clip), then answers a single-property question of that family, such as a count or a duration comparison.
| Task | Question | Reasoning Hops |
|---|---|---|
| CONDITIONAL COUNT | "How many X sounds occur after Y?" | temporal filter → count |
| CONDITIONAL DURATION | "Which sound after Y lasts the longest?" | temporal filter → duration comparison |
| BETWEEN EVENTS | "Which sound occurs between X and Y?" | temporal window → identification |
| EVENT DENSITY | "Which half of the audio has more events?" | region segmentation → count comparison |
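Each multihop question composes a temporal filter with one of the simple operations. The sketch makes several assumptions rather than following the pipeline's documented behavior:
- "After Y" means starting at or after the end of the first Y.
- "Between X and Y" means lying entirely between the end of the first X and the start of the first Y.
- EVENT DENSITY splits the clip into even halves and assigns each event by its onset.

All names are hypothetical.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    label: str
    onset: float   # seconds
    offset: float  # seconds

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def first_of(events: List[Event], label: str) -> Event:
    """Hop 1 helper: the earliest occurrence of `label`."""
    return min((e for e in events if e.label == label), key=lambda e: e.onset)


def conditional_count(events: List[Event], target: str, anchor: str) -> int:
    """CONDITIONAL COUNT: how many `target` events start after `anchor` ends?"""
    ref = first_of(events, anchor)
    return sum(1 for e in events if e.label == target and e.onset >= ref.offset)


def conditional_longest(events: List[Event], anchor: str) -> str:
    """CONDITIONAL DURATION: longest sound starting after `anchor` ends."""
    ref = first_of(events, anchor)
    after = [e for e in events if e.onset >= ref.offset]
    return max(after, key=lambda e: e.duration).label


def between_events(events: List[Event], x: str, y: str) -> List[str]:
    """BETWEEN EVENTS: sounds lying entirely between the end of x and the start of y."""
    start, end = first_of(events, x).offset, first_of(events, y).onset
    window = [e for e in events if e.onset >= start and e.offset <= end]
    return [e.label for e in sorted(window, key=lambda e: e.onset)]


def denser_half(events: List[Event], total_duration: float) -> str:
    """EVENT DENSITY: which half holds more event onsets ("first", "second", "equal")."""
    mid = total_duration / 2
    first = sum(1 for e in events if e.onset < mid)
    second = len(events) - first
    if first == second:
        return "equal"
    return "first" if first > second else "second"
```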
Multihop Temporal Reasoning: Inter-Task
Multi-hop questions combining two or more temporal/acoustic properties.
| Task | Question | Reasoning Hops |
|---|---|---|
| DURATION GAP | "Which is longer: X or the silence after X?" | duration → silence comparison |
| TEMPORAL ARITHMETIC | "Which sound lasts longer in total: X or Y?" | aggregated duration across repetitions |
| TEMPORAL LOUDNESS | "Which sound after Y is the loudest?" | temporal filter → loudness comparison |
| MULTI HOP | "What sound occurs after the longest sound?" | property identification → order lookup |
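The inter-task questions chain two different properties. The sketch uses the same kind of hypothetical event record and these illustrative interpretations, not the definitions in `tasks/`:
- "The silence after X" is the gap between the first X and the next onset.
- "In total" sums all occurrences of a label.
- MULTI HOP looks up the event that starts next after the longest one.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    label: str
    onset: float     # seconds
    offset: float    # seconds
    loudness: float  # e.g. LUFS; higher is louder

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def by_onset(events: List[Event]) -> List[Event]:
    return sorted(events, key=lambda e: e.onset)


def sound_or_gap_longer(events: List[Event], label: str) -> str:  # DURATION GAP
    """Compare the first `label` event with the silence before the next onset.
    Assumes another event follows it; ties count as "silence"."""
    ordered = by_onset(events)
    i = next(k for k, e in enumerate(ordered) if e.label == label)
    gap = ordered[i + 1].onset - ordered[i].offset
    return "sound" if ordered[i].duration > gap else "silence"


def longer_in_total(events: List[Event], x: str, y: str) -> str:  # TEMPORAL ARITHMETIC
    """Sum durations over all repetitions of each label; ties go to `y`."""
    def total(label: str) -> float:
        return sum(e.duration for e in events if e.label == label)
    return x if total(x) > total(y) else y


def loudest_after(events: List[Event], anchor: str) -> str:  # TEMPORAL LOUDNESS
    ref = min((e for e in events if e.label == anchor), key=lambda e: e.onset)
    after = [e for e in events if e.onset >= ref.offset]  # hop 1: temporal filter
    return max(after, key=lambda e: e.loudness).label    # hop 2: loudness


def sound_after_longest(events: List[Event]) -> str:  # MULTI HOP
    ordered = by_onset(events)
    i = max(range(len(ordered)), key=lambda k: ordered[k].duration)  # hop 1
    return ordered[i + 1].label  # hop 2: order lookup (assumes a next event exists)
```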
Output Structure
```text
output/{task}/
├── audio/*.wav            # Generated audio files
├── {task}_mcq.csv         # Multiple-choice questions
├── {task}_open_text.csv   # Open-ended questions
└── {task}_metadata.csv    # Detailed metadata
```
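A quick post-run check can confirm that each task produced the documented layout. The helpers `expected_outputs` and `missing_outputs` are hypothetical, and the paths follow the tree above. The CSV columns are not documented here, so the sketch only checks that files exist.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(output_dir: str, task: str) -> Dict[str, Path]:
    """Paths that the documented layout implies for one task."""
    root = Path(output_dir) / task
    return {
        "audio_dir": root / "audio",
        "mcq": root / f"{task}_mcq.csv",
        "open_text": root / f"{task}_open_text.csv",
        "metadata": root / f"{task}_metadata.csv",
    }


def missing_outputs(output_dir: str, task: str) -> List[str]:
    """Names of expected outputs that are absent (an empty list means complete)."""
    paths = expected_outputs(output_dir, task)
    missing = [name for name, path in paths.items() if not path.exists()]
    audio_dir = paths["audio_dir"]
    if audio_dir.is_dir() and not any(audio_dir.glob("*.wav")):
        missing.append("wav_files")
    return missing


if __name__ == "__main__":
    for task in ("count", "order"):
        print(task, missing_outputs("output", task) or "complete")
```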
Shell Scripts
Use the provided shell helpers for simple runs.
Run the full pipeline (wraps `python main.py`):
```bash
# Make executable and run (from pipeline/)
chmod +x run_pipeline.sh
./run_pipeline.sh

# With custom config, tasks, and output
./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset

# Run only multihop tasks
./run_pipeline.sh --tasks conditional_count,conditional_duration,between_events,event_density,duration_gap,temporal_arithmetic,temporal_loudness,multi_hop
```
Run LLM answer generation across splits (uses `llm_answer_generator.py`):
```bash
# Process the open_text CSVs for the splits/tasks defined in the script
./run_llm_answers_all.sh

# Or run the generator directly on a single file
python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count
```
Advanced Usage
```bash
# Run specific tasks only
python main.py --tasks count order conditional_count multi_hop

# Use a custom config (e.g., for UrbanSound8K)
python main.py --config config_urbansound8k.yaml

# Custom output directory
python main.py --output /path/to/output

# Preprocess with custom parameters
python preprocess_esc50.py --config config.yaml \
  --threshold-strategy noise_floor \
  --noise-floor-percentile 2.0 \
  --noise-floor-delta-db 5.0
```
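One plausible reading of the noise-floor options is:
- Estimate the noise floor as a low percentile of per-frame levels.
- Treat anything more than a fixed dB margin above that floor as sound.

The pure-Python sketch below implements that reading. It is an assumption about what `--threshold-strategy noise_floor` does, not code from `preprocess_esc50.py`. The frame size, the RMS measure, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_levels_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS level of each non-overlapping frame, in dBFS (samples in [-1, 1])."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return levels


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def active_span(samples: Sequence[float], sample_rate: int, frame_len: int = 1024,
                noise_floor_percentile: float = 2.0,
                noise_floor_delta_db: float = 5.0) -> Optional[Tuple[float, float]]:
    """(start_s, end_s) of the frames louder than noise floor + delta, or None."""
    levels = frame_levels_db(samples, frame_len)
    threshold = percentile(levels, noise_floor_percentile) + noise_floor_delta_db
    active = [i for i, db in enumerate(levels) if db > threshold]
    if not active:
        return None
    return active[0] * frame_len / sample_rate, (active[-1] + 1) * frame_len / sample_rate


if __name__ == "__main__":
    clip = [0.001] * 300 + [0.5] * 400 + [0.001] * 300  # quiet, loud, quiet
    print(active_span(clip, 1000, frame_len=100))  # (0.3, 0.7)
```

A percentile-based floor adapts to each recording's background level, which a single fixed dB threshold would not.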
Available Task Names
All task names for the `--tasks` argument:
- Simple: `count`, `duration`, `order`, `volume`, `silence_gap`, `overlap`, `during_contains`
- Multihop Singular: `conditional_count`, `conditional_duration`, `between_events`, `event_density`
- Multihop Inter-Task: `duration_gap`, `temporal_arithmetic`, `temporal_loudness`, `multi_hop`
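When scripting runs, checking task names against the lists above catches typos before a long generation job starts. `parse_tasks` is a hypothetical helper, not part of the pipeline. It accepts both the comma-separated form used with `run_pipeline.sh` and the space-separated form used with `main.py --tasks`.

```python
from typing import List

SIMPLE = ("count", "duration", "order", "volume", "silence_gap", "overlap", "during_contains")
MULTIHOP_SINGULAR = ("conditional_count", "conditional_duration", "between_events", "event_density")
MULTIHOP_INTER_TASK = ("duration_gap", "temporal_arithmetic", "temporal_loudness", "multi_hop")
ALL_TASKS = SIMPLE + MULTIHOP_SINGULAR + MULTIHOP_INTER_TASK  # 15 task types


def parse_tasks(arg: str) -> List[str]:
    """Split a task list written either way (commas or spaces) and reject
    unknown names, keeping the order given."""
    names = arg.replace(",", " ").split()
    unknown = [n for n in names if n not in ALL_TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {', '.join(unknown)}")
    return names
```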
Documentation
See `DOCS.md` for complete technical documentation, including:
- Mathematical formulations
- Detailed algorithm explanations
- Configuration parameter reference
- Preprocessing pipeline details
- Balancing mechanisms
Requirements
- Python 3.8+
- pydub
- numpy
- pandas
- tqdm
- pyyaml
- pyloudnorm (for LUFS-based loudness measurement)