Looking for cs.LG endorser — replication study of DARE-BENCH

Hi everyone,

I’m looking for an arXiv endorser for cs.LG. I have a replication study of DARE-BENCH (Shu et al., ICLR 2026) ready to submit.

I ran the benchmark’s reference solution generator on the 162 publicly released datasets and compared the deterministic sklearn baseline against the 8 LLM agents evaluated in the paper. The main finding: the deterministic baseline ranks 3rd of 9 systems on Classification-MM (58.78%), outperforming 6 of the 8 LLM agents, including GPT-5. This suggests the ML modeling evaluation primarily measures pipeline execution reliability rather than genuine modeling judgment.
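For clarity on what "ranks 3rd of 9" means here: each system gets a mean score across the datasets, and systems are sorted by that mean. A minimal sketch (the scores below are hypothetical placeholders, except the baseline's 58.78% from above; they are not the actual DARE-BENCH numbers):

```python
# Hypothetical mean Classification-MM scores per system (illustrative only).
# Only "sklearn-baseline" uses the figure quoted in the post.
scores = {
    "agent-a": 61.2,
    "agent-b": 60.1,
    "sklearn-baseline": 58.78,
    "agent-c": 55.0,
}

# Rank systems by mean score, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)

# 1-based rank of the deterministic baseline.
baseline_rank = ranking.index("sklearn-baseline") + 1
print(baseline_rank)  # → 3
```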

Happy to share the draft as well.

If you’re endorsed for cs.LG and willing to help, please let me know.

Thanks,

Shyam Sivakumar
