q1-3B-PRIME, a tiny reasoning model trained with RL on top of SmallThinker-3B