Looking for cs.LG endorser — replication study of DARE-BENCH

Hi everyone,

I’m looking for an arXiv endorser for cs.LG. I have a replication study of DARE-BENCH (Shu et al., ICLR 2026) ready to submit.

I ran the benchmark’s reference solution generator on the 162 publicly released datasets and compared the deterministic sklearn baseline against the 8 LLM agents evaluated in the paper. The main finding: the deterministic baseline ranks 3rd of 9 systems on Classification-MM (58.78%), outperforming 6 of the 8 LLM agents, including GPT-5. This suggests the ML modeling evaluation primarily measures pipeline execution reliability rather than genuine modeling judgment.
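For clarity on what "ranks 3rd of 9" means here: each system gets a mean score across the datasets, and systems are sorted by that mean. A minimal sketch (the scores below are hypothetical placeholders, except the baseline's 58.78% from above; they are not the actual DARE-BENCH numbers):

```python
# Hypothetical mean Classification-MM scores per system (illustrative only).
# Only "sklearn-baseline" uses the figure quoted in the post.
scores = {
    "agent-a": 61.2,
    "agent-b": 60.1,
    "sklearn-baseline": 58.78,
    "agent-c": 55.0,
}

# Rank systems by mean score, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)

# 1-based rank of the deterministic baseline.
baseline_rank = ranking.index("sklearn-baseline") + 1
print(baseline_rank)  # → 3
```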

Happy to share the draft as well.

If you’re endorsed for cs.LG and willing to help, please let me know.

Thanks,

Shyam Sivakumar
