How can I reproduce the GLM-5 benchmark results on the different datasets? Which evaluation tool do you use? Would OpenCompass, lm-evaluation-harness, or EvalScope work?