llm-evaluation

Projects with this topic


Jiang Gu / Bazi Bech

Most LLM reasoning benchmarks draw on Western math, English logic, or code. Bazi-Bench tests multi-step, rule-following inference in a different formal system: traditional Chinese Ba Zi. It provides frozen tables, a Python reference implementation, and gold chain-of-thought (CoT) cases, all mechanically verifiable.

llm AI evaluation ai-reasoning reasoning-be... llm-evaluation Chain-of-Tho... multiple-ste... ba-zi Python symbolic-rea... rule-based-s...


Updated May 13, 2026
