SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 10 days ago • 27
nick11roberts/co-emerge-overtrained-rw-params37M_maxstep219586-flop_2_56e19_step_219586 56.5M • Updated 15 days ago • 121
nick11roberts/co-emerge-overtrained-rw-params84M_maxstep95981-flop_2_56e19_step_95981 0.1B • Updated 15 days ago • 121
nick11roberts/co-emerge-overtrained-rw-params149M_maxstep58415-flop_2_56e19_step_58415 0.2B • Updated 15 days ago • 125
nick11roberts/co-emerge-overtrained-rw-params9M_maxstep14128-flop_4_00e17_step_14128 17.9M • Updated 18 days ago • 136
nick11roberts/co-emerge-overtrained-rw-params7M_maxstep18165-flop_4_00e17_step_18165 14M • Updated 18 days ago • 136
nick11roberts/co-emerge-overtrained-rw-params22M_maxstep5779-flop_4_00e17_step_5779 37M • Updated 18 days ago • 135