The FinMMEval Hugging Face collection is the official public data release for Task 1. Participants may use the released datasets for training and may reorganize or re-split them as needed.
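As an illustration of re-splitting, here is a minimal sketch that carves a local held-out set out of the released training data with a seeded shuffle. The function name `resplit`, the 10% default fraction, and the seed are illustrative assumptions, not organizer requirements; the examples can be any list of records you have already loaded.

```python
import random


def resplit(examples, held_out_fraction=0.1, seed=13):
    """Return (train, held_out) from a reproducible shuffle of `examples`.

    Hypothetical helper for illustration; fraction and seed are arbitrary.
    """
    if not 0.0 <= held_out_fraction < 1.0:
        raise ValueError("held_out_fraction must be in [0, 1)")
    shuffled = list(examples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_held_out = int(len(shuffled) * held_out_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]
```

A fixed seed keeps the local split stable across runs, which makes scores on your own held-out portion comparable between experiments.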
The language-specific dev sets on this page are separate organizer-held evaluation data. They are intended for validation on the live leaderboard and should not be added back into training. The remaining hidden test sets are reserved for final evaluation.
The English, Chinese, and Arabic dev leaderboards are temporarily hidden while we complete the final data-pool review and split validation.
At present, we do not enforce a hard submission cap for Task 1 dev leaderboards. Participants may submit as many times as needed, but should avoid rapid, unnecessary resubmissions.
Rows marked as Baseline are organizer sanity checks: Random, Always A, Round Robin, and Qwen2.5-0.5B-Instruct zero-shot.
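For readers who want to reproduce the three trivial baselines locally, the sketch below shows one plausible reading of them. It assumes multiple-choice questions with four options labeled A to D and an accuracy metric; the label set, function names, and seed are assumptions for illustration and may differ from the organizers' actual scripts.

```python
import random

# Assumption: four answer options labeled A-D; adjust to the real option set.
LABELS = ["A", "B", "C", "D"]


def random_baseline(n_questions, labels=LABELS, seed=0):
    """Pick a label uniformly at random for each question (seeded)."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n_questions)]


def always_a_baseline(n_questions, labels=LABELS):
    """Answer every question with the first label."""
    return [labels[0]] * n_questions


def round_robin_baseline(n_questions, labels=LABELS):
    """Cycle through the labels in order: A, B, C, D, A, ..."""
    return [labels[i % len(labels)] for i in range(n_questions)]


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

A system that scores close to these rows is likely not using the question content, which is why they are useful as a sanity check.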