Projects with this topic
Most LLM reasoning benchmarks draw on Western math, English-language logic, or code. Bazi-Bench tests multi-step, rule-following inference in a different formal system: traditional Chinese Ba Zi. It provides frozen tables, a Python reference implementation, and gold chain-of-thought (CoT) cases, all mechanically verifiable.