CLEF 2026 - Call for Participation
Task 3 leaderboard is ready · Updated submission deadline: 05 May 2026
Task 1 submission hub is live · Hindi is currently available

FinMMEval Lab 2026

Multilingual and Multimodal Evaluation of Financial AI Systems

Latest News

Task 1 Ready · Updated: 27 April 2026

Task 1 dev leaderboard ready

The Task 1 submission hub is live for Hindi. English, Chinese, and Arabic are temporarily hidden while we complete final data-pool review and split validation.

1. One Submission Hub: Hindi is currently available from the Task 1 submission hub.
2. Live Leaderboard: Leaderboard data is rendered directly in the hub after each successful submission.
3. Public Dev Access: Participants can download the released Hindi dev set and submit predictions from the main Task 1 interface.
Open Submission
Use the Task 1 submission hub to validate on the released dev leaderboards and keep training separate from those evaluation sets.
Open Task 1 Submission

Current language: Hindi. English, Chinese, and Arabic are temporarily hidden.

Leaderboard Ready · Updated: 4 February 2026 · Updated deadline: 05 May 2026

Endpoint Registration Open for Task 3 - Financial Decision Making

Submit or update your endpoint via the Google Form. The organizers will verify submitted endpoints before the official leaderboard is computed over a common evaluation window.

1. Daily Requests: One call per day at 00:00 UTC.
2. Actions: BUY / HOLD / SELL mapped to long / flat / short.
3. Common Evaluation Window: Official performance is compared over the same market period for all accepted endpoints.
Submit Now
Submit your endpoint via the Agent Market Arena Google Form.
Submit Endpoint

Dataset collection: MBZUAI/finmmeval-lab-clef2026
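For teams sketching an endpoint ahead of registration, the snippet below shows one plausible shape for the daily exchange: a minimal Flask service, assuming the runner POSTs a JSON market context and expects a JSON action back. The route, field names, and port are illustrative assumptions; the official request/response contract is specified by the organizers during endpoint verification.

```python
# Minimal Task 3 endpoint sketch. The route ("/decide") and the JSON fields
# ("action", "position") are assumptions for illustration only.
from flask import Flask, request, jsonify

app = Flask(__name__)

# BUY / HOLD / SELL map to long / flat / short positions.
POSITIONS = {"BUY": "long", "HOLD": "flat", "SELL": "short"}

def my_strategy(context):
    # Placeholder baseline: always stay flat. Replace with your model's
    # reasoning over the daily BTC/TSLA context.
    return "HOLD"

@app.route("/decide", methods=["POST"])
def decide():
    context = request.get_json()   # daily market context from the runner
    action = my_strategy(context)  # must be BUY, HOLD, or SELL
    return jsonify({"action": action, "position": POSITIONS[action]})

if __name__ == "__main__":
    # The runner calls once per day at 00:00 UTC, so the service should be
    # online around that time (see the Task 3 FAQ on availability).
    app.run(host="0.0.0.0", port=8000)
```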

Awards Announced · Updated: 4 February 2026

New awards for top submissions

We are recognizing the best systems with Best Paper, Outstanding Paper, and Merit / Encouragement awards.

🏆 Best Paper: USD 500
🥈 Outstanding Paper: USD 300 each (x3)
🌱 Merit / Encouragement: USD 200 each (x2)
Awards
Top submissions will be recognized with monetary awards.
View Awards

See details in the Awards section.

Training Data Released · Released: 15 December 2025

Official Public Data For All Tasks

Use the released Hugging Face collection as the official public data source for model development. Participants may reorganize or re-split the released datasets as needed. Each dataset card includes format, licensing, and citation guidance.

1. Task 1: Multilingual exam-style multiple choice.
2. Task 2: PolyFiQA (Easy + Expert) filings with multilingual news Q&A.
3. Task 3: BTC and TSLA daily contexts for Buy/Hold/Sell reasoning.
New
Browse the Hugging Face collection to download the official public data release and review licensing.
Open Collection

Participants may use these released datasets for training even if the original dataset cards label their splits as train, validation, or test.

About the Lab

FinMMEval Lab integrates financial reasoning, multilingual understanding, and decision-making into a unified evaluation suite designed to promote robust, transparent, and globally competent financial AI. The 2026 edition introduces three interconnected tasks spanning seven languages.

  • Multi-modal inputs: news, filings, macro indicators, and exam-style tests.
  • Multiple languages with low-resource representation: English, Chinese, Arabic, Hindi, Greek, Japanese, Spanish.
  • Tasks spanning Q&A and decision making.
  • Metrics centered on accuracy, ROUGE-1, BLEURT, and quantitative trading-performance metrics (e.g., CR, SR, MD); a sketch of the trading metrics follows this list.
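The exact Task 3 metric definitions are set by the organizers; as a local sanity check, the sketch below computes CR (cumulative return), SR (Sharpe ratio), and MD (maximum drawdown) from a daily return series using standard textbook formulas, which is an assumption about how these abbreviations are defined here.

```python
import numpy as np

def cumulative_return(daily_returns):
    """CR: total compounded return over the evaluation window."""
    r = np.asarray(daily_returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)

def sharpe_ratio(daily_returns, risk_free=0.0, periods_per_year=252):
    """SR: annualized mean excess return divided by return volatility."""
    r = np.asarray(daily_returns, dtype=float) - risk_free
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def max_drawdown(daily_returns):
    """MD: largest peak-to-trough decline of the compounded equity curve."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    peaks = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peaks))
```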

Financial AI Framework

"How can I tailor my setup to make an LLM exceptionally good at finance?"

Tasks

Choose one or more tasks. Follow the task-specific submission format; Task 1 and Task 2 do not require confidence scores. For Task 1 and Task 2, dataset expansion and/or real-time retrieval during inference are allowed, but these must be clearly disclosed in the submitted system paper.

Task 1 - Financial Exam Q&A

Given a stand-alone multiple-choice question Q with four candidate options { A1, A2, A3, A4 }, the system must select the correct answer A. Questions cover valuation, accounting, ethics, corporate finance, and regulatory knowledge.

Motivation

Professional financial qualification exams (e.g., CFA, EFPA) require the integration of theoretical and regulatory knowledge with applied reasoning. Existing LLMs often rely on factual recall without demonstrating the analytical rigor expected from human candidates.

Data

  • EFPA (Spanish): 50 exam-style financial questions on investment and regulation.
  • GRFinQA (Greek): 225 multiple-choice finance questions from university-level exams.
  • CFA (English): 600 exam-style multiple-choice questions covering nine core domains.
  • CPA (Chinese): 300 exam-style financial questions focusing on major modules.
  • BBF (Hindi): 500–1000 exam-style financial multiple-choice questions covering over 30 domains.

Official Data Usage

The FinMMEval Hugging Face collection is the official public data release for Task 1. Participants may use the released datasets for training and may reorganize or re-split them as needed. The original split names on dataset cards do not restrict participant usage.
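As a minimal sketch of this workflow with the Hugging Face datasets library: the repo id below is a placeholder (take the actual dataset ids from the collection page), and the re-split parameters are arbitrary.

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder id -- substitute an actual dataset from the
# MBZUAI/finmmeval-lab-clef2026 collection on Hugging Face.
ds = load_dataset("MBZUAI/<task1-dataset-id>")

# Original split names do not restrict usage: pool all splits, then re-split.
pooled = concatenate_datasets([ds[name] for name in ds.keys()])
resplit = pooled.train_test_split(test_size=0.1, seed=42)
train, dev = resplit["train"], resplit["test"]
```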

Evaluation

Models are required to output the correct answer label. Performance is measured by accuracy, defined as the proportion of test-set questions for which the predicted option matches the answer key.
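A minimal computation of this metric over predicted option labels (the "A2"-style labels here are illustrative):

```python
def accuracy(predictions, gold):
    """Proportion of questions whose predicted option label matches the key."""
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Example: two of three answers correct -> 0.666...
print(accuracy(["A2", "A1", "A4"], ["A2", "A3", "A4"]))
```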

Submission Policy

For Task 1, dataset expansion and/or real-time retrieval during inference are allowed. Please clearly disclose any such components, together with the overall inference setup, in your submitted system paper.

The released Task 1 dev leaderboards use separate organizer-held evaluation data. Those public dev sets are for validation only and should not be used for training. The remaining hidden test sets are reserved for final evaluation.

At present, we do not enforce a hard submission cap for Task 1. Participants may submit multiple times as needed during development, but should avoid unnecessary rapid resubmission.

Task 1 Submission

A unified Task 1 submission hub is available for released language-specific dev benchmarks. These public dev leaderboards are validation tracks only and are separate from the official public training collection.
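The hub defines the authoritative submission schema. As a rough sketch, a predictions file might be written as JSON Lines with one record per question; the field names ("id", "answer") and the id format below are hypothetical, so check the hub before submitting.

```python
import json

# Hypothetical question ids and option labels for illustration only.
predictions = {"bbf-hi-0001": "A3", "bbf-hi-0002": "A1"}

with open("task1_hindi_predictions.jsonl", "w", encoding="utf-8") as f:
    for qid, label in predictions.items():
        f.write(json.dumps({"id": qid, "answer": label}, ensure_ascii=False) + "\n")
```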

Important Dates

Specific dates will be announced once they are fixed.

  1. Lab registration opens
    17 November 2025
  2. Training data released
    15 December 2025 • Available now via the Hugging Face collection
  3. Lab registration closes
    23 April 2026
  4. Task 3 endpoint submission deadline
    05 May 2026
  5. Task 1/2 dev leaderboards and test questions release
    06 May 2026
  6. Task 1/2 final run submission deadline and leaderboard release
    15 May 2026, after submissions close
  7. Deadline for the submission of working notes [CEUR-WS]
    28 May 2026
  8. Review process of participant papers
    28 May – 30 June 2026
  9. Submission of Condensed Lab Overviews [LNCS]
    08 June 2026
  10. Notification of Acceptance for Condensed Lab Overviews [LNCS]
    15 June 2026
  11. Camera Ready Copy of Condensed Lab Overviews [LNCS] due
    22 June 2026
  12. Notification of Acceptance for Participant Papers [CEUR-WS]
    30 June 2026
  13. Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS] due
    06 July 2026
  14. CLEF 2026 Conference
    21–24 September 2026 • Jena, Germany

Awards

Awards will consider both leaderboard performance and the quality of the submitted system paper.

🏆 Best Paper Award: USD 500
Single award based on overall excellence, considering both leaderboard performance and paper quality.

🥈 Outstanding Paper Award: USD 300 each (x3)
Three awards recognizing strong systems, clear methodology, and high-quality papers.

🌱 Merit / Encouragement Award: USD 200 each (x2)
Two awards highlighting promising approaches and well-documented submissions.

How to Participate

Engage with the challenges in a way that suits you - from a quick, one-time experiment to a detailed research project. While we invite you to share your findings in our workshop notes, you are also free to develop promising results into a full paper for an archival journal.

The workshop itself is a perfect opportunity to refine your ideas through discussion with peers.

Ready to join?

Sign Up

Sign up via the CLEF registration form (FinMMEval section)

Task 1 Submission

The Task 1 submission hub consolidates released language-specific dev portals in one place. These public dev leaderboards are separate from the official public training collection. Hindi is currently available; English, Chinese, and Arabic are temporarily hidden while we complete final data-pool review and split validation.

Task 2 Submission

Task 2 uses an organizer-held additional test set. We will release the evaluation questions, participants will submit textual answers, and the organizers will score the runs. The submission format will stay aligned with the already released Task 2 data structure, and final instructions will be shared together with the test-set release.
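Since ROUGE-1 is among the lab's listed metrics, a self-contained ROUGE-1 F1 sketch is shown below for local sanity checks on generated answers; the official scorer may tokenize, normalize, and aggregate differently.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a system answer and a reference answer."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```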

Packaging Checklist


  • Results JSONL (per task)
  • System Card (architecture, data usage, risks)
  • For Task 1/2, clearly disclose any dataset expansion or real-time retrieval used during inference
  • Reproducibility (seed, versions, hardware); see the sketch after this list
  • License compliance acknowledgements (if applicable)
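To make the reproducibility item concrete, here is a minimal sketch that fixes a seed and records versions and hardware alongside a run; the file name and fields are illustrative, not a required format.

```python
import json
import platform
import random
import sys

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Snapshot the environment so the system card's reproducibility entry
# (seed, versions, hardware) can be filled in exactly.
run_info = {
    "seed": SEED,
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "platform": platform.platform(),
}
with open("run_info.json", "w") as f:
    json.dump(run_info, f, indent=2)
```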

Organizers

Organizing committee and partner institutions.

Zhuohan Xie, MBZUAI (UAE)
Rania Elbadry, MBZUAI (UAE)
Fan Zhang, The University of Tokyo (Japan)
Georgi Georgiev, Sofia University "St. Kliment Ohridski" (Bulgaria)
Xueqing Peng, The Fin AI (USA)
Lingfei Qian, The Fin AI (USA)
Jimin Huang, The Fin AI (USA)
Dimitar Dimitrov, Sofia University "St. Kliment Ohridski" (Bulgaria)
Vanshikaa Jani, University of Arizona (USA)
Yuyang Dai, INSAIT (Bulgaria)
Jiahui Geng, Linköping University (Sweden)
Yankai Chen, McGill University (Canada) & MBZUAI (UAE)
Yuan Ye, McGill University (Canada) & Mila - Quebec AI Institute (Canada)
Haolun Wu, McGill University (Canada) & Mila - Quebec AI Institute (Canada)
Yuxia Wang, INSAIT (Bulgaria)
Ivan Koychev, Sofia University "St. Kliment Ohridski" (Bulgaria)
Veselin Stoyanov, MBZUAI (UAE)
Mingzi Song, Nikkei Financial Technology Research Institute, Inc. (Japan)
Yu Chen, The University of Tokyo (Japan)
Steve Liu, McGill University (Canada) & MBZUAI (UAE)
Preslav Nakov, MBZUAI (UAE)

Frequently Asked Questions

Who can participate?

Researchers and practitioners from academia and industry. Student teams are particularly welcome.

How is data licensed?

Research-only license; redistribution of raw sources may be restricted.

Can we submit to multiple tasks?

Yes. Submit independent result bundles per task.

Is there a hard limit on the number of submissions per task?

At present, we do not enforce a hard submission cap per task. Participants may submit multiple times as needed, especially for Task 1 dev validation and Task 3 endpoint iteration, but should avoid unnecessary rapid resubmission.

Are ensembles allowed?

Yes, but disclose all components in the system card.

Can we use dataset expansion or real-time retrieval for Task 1 or Task 2?

Yes. For Task 1 and Task 2, dataset expansion and/or real-time retrieval during inference are allowed, but they must be clearly and fully disclosed in the submitted system paper.

How will Task 2 submissions work?

The organizers will release an additional held-out Task 2 test set. We will provide the questions, participants will submit their generated answers, and the organizers will run the evaluation. The submission format will follow the same general structure as the already released Task 2 data, with final instructions shared alongside the test-set release.

Can we train on the official public Hugging Face collection for Task 1?

Yes. The FinMMEval Hugging Face collection is the official public data release. Participants may use the released Task 1 datasets for training and may reorganize or re-split them as needed. The original split names on dataset cards do not restrict participant usage. However, the separate Task 1 dev leaderboard sets are organizer-held evaluation data and should not be used for training.

How does the Task 3 timeline work?

The updated Task 3 endpoint submission deadline is 05 May 2026. This is the deadline for submitting or updating an endpoint, not the end of evaluation. Submitting the Google Form registers or updates an endpoint, but it may not appear on the leaderboard immediately.

Because Task 3 evaluates market decisions across multiple trading days, organizers first verify submitted endpoints and then compute official performance over a common evaluation window for all accepted endpoints, rather than starting separately from each team’s individual submission date. We expect this window to run through late June or early July, aligned with the final lab reporting schedule.

The daily runner starts at 00:00 UTC. Teams do not need to keep endpoints online for the full day, but should start them shortly before 00:00 UTC and keep them available for several hours after the runner starts.

Do we need final Task 3 results before writing the paper?

No. Participants are encouraged to prepare working notes early. The paper should focus on the system architecture, methodology, and experimental setup, and results may be updated later as long as the evaluation status is stated clearly. Awards are decided primarily based on paper quality, with leaderboard performance used as supporting evidence.

Contact

Email

zhuohan.xie@mbzuai.ac.ae

Discord

Join the FinMMEval server

Ask questions, share progress, and get updates.
