FinMMEval Lab 2026

Latest News

Results Released • Updated: 27 May 2026 • Task 1 and Task 2

Final-test leaderboards are available

Task 1 and Task 2 final-test results are now released after organizer-side validation. Teams can use the official rankings and scores in their CLEF Working Notes papers.

Task 1

English, Chinese, Arabic, and Hindi final-test rankings are available.

Task 2

The official ROUGE-1 F1 leaderboard is available for completed submissions.

Working Notes

Participants may add the final result table/rank and a short discussion to their papers.

Final Test

View Task 1 and Task 2 official final-test rankings and download public ranking CSVs.

Only ranking-level metrics are published. Hidden labels and participant answers remain private.

Task 1 Results • Updated: 27 May 2026 • Final Leaderboard

Task 1 final results are released

Official final-test leaderboards are available for English, Chinese, Arabic, and Hindi after organizer-side validation.

Dev Leaderboards

Live development leaderboards are available through the unified Task 1 hub.

Final-Test Results

Official accuracy scores and ranks are available on the Results page.

Four Languages

English, Chinese, Arabic, and Hindi are all linked from the same interface.

Final Results

Use the released Task 1 results in CLEF Working Notes papers.

Submission status remains available from the Task 1 hub.

Task 3 Update • Updated: 10 May 2026 • Deadline: 10 May 2026 AoE

Task 3 endpoint checks are underway

Submit or update your Task 3 endpoint by the extended deadline. Organizers verify submitted endpoints before the common live evaluation window.

Organizer-Side Checks

We check endpoint reachability and valid BUY / HOLD / SELL responses from the organizer server.

Several Checks Per Day

Teams whose endpoints do not pass are contacted by email for fixes and re-testing.

Common Evaluation Window

Official trading performance is compared over the same market period for all accepted endpoints.

Submit Endpoint

Use the Agent Market Arena form for new submissions or endpoint updates.

Teams whose endpoints already pass do not need to resubmit unless their endpoint details have changed.

Awards Announced • Updated: 4 February 2026

New awards for top submissions

We are recognizing the best systems with Best Paper, Outstanding Paper, and Merit / Encouragement awards.

🏆

Best Paper

USD 500

🥈

Outstanding Paper

USD 300 each (x3)

🌱

Merit / Encouragement

USD 200 each (x2)

Awards

Awards are for FinMMEval participant submissions documented in CLEF Working Notes papers.

The separate CLEF main conference paper track is not used for FinMMEval awards.

About the Lab

FinMMEval Lab integrates financial reasoning, multilingual understanding, and decision-making into a unified evaluation suite designed to promote robust, transparent, and globally competent financial AI. The 2026 edition introduces three interconnected tasks spanning five languages.

✓ Multi-modal inputs: news, filings, macro indicators, tests.
✓ Multiple languages, including low-resource ones: English, Chinese, Arabic, Hindi, Greek, Japanese, Spanish.
✓ Tasks spanning Q&A and decision making.
✓ Metrics centered on Accuracy, ROUGE-1, BLEURT, and quantitative performance metrics (e.g., CR, SR, MD).

"How can I tailor my setup to make an LLM exceptionally good at finance?"

Tasks

Choose one or more tasks. Follow the task-specific submission format; Task 1 and Task 2 do not require confidence scores. For Task 1 and Task 2, dataset expansion and/or real-time retrieval during inference are allowed, but these must be clearly disclosed in the submitted system paper.

Task 1 - Financial Exam Q&A

Given a stand-alone multiple-choice question Q with four candidate options {A₁, A₂, A₃, A₄}, the system must select the correct answer A*. Questions cover valuation, accounting, ethics, corporate finance, and regulatory knowledge.

Motivation

Professional financial qualification exams (e.g., CFA, EFPA) require the integration of theoretical and regulatory knowledge with applied reasoning. Existing LLMs often rely on factual recall without demonstrating the analytical rigor expected from human candidates.

Data

EFPA (Spanish): 50 exam-style financial questions on investment and regulation.
GRFinQA (Greek): 225 multiple-choice finance questions from university-level exams.
CFA (English): 600 exam-style multiple-choice questions covering nine core domains.
CPA (Chinese): 300 exam-style financial questions focusing on major modules.
BBF (Hindi): 500-1000 exam-style financial multiple-choice questions covering over 30 domains.

Official Data Usage

The FinMMEval Hugging Face collection is the official public data release for Task 1. Participants may use the released datasets for training and may reorganize or re-split them as needed. The original split names on dataset cards do not restrict participant usage.

Evaluation

Models are required to output the correct answer label. Performance is measured by accuracy, defined as the proportion of correctly identified options in the test set.
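
The accuracy definition above can be reproduced locally when validating a system. The following is a minimal sketch, not the organizers' scorer: the question IDs, the dictionary layout, the case-insensitive comparison, and the choice to count a missing prediction as wrong are all assumptions made for illustration.

```python
from typing import Mapping


def accuracy(predictions: Mapping[str, str], gold: Mapping[str, str]) -> float:
    """Fraction of gold questions whose predicted label matches the gold label."""
    if not gold:
        raise ValueError("gold must contain at least one question")
    # Assumption: a missing prediction counts as an incorrect answer.
    correct = sum(
        1
        for question_id, label in gold.items()
        if predictions.get(question_id, "").strip().upper() == label.strip().upper()
    )
    return correct / len(gold)


# Hypothetical question IDs and labels, for illustration only.
gold = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
preds = {"q1": "A", "q2": "B", "q3": "b", "q4": "D"}
print(accuracy(preds, gold))  # 0.75 (three of four answers match)
```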

Submission Policy

For Task 1, dataset expansion and/or real-time retrieval during inference are allowed. Please clearly disclose any such components, together with the overall inference setup, in your submitted system paper.

The released Task 1 dev leaderboards use separate organizer-held evaluation data. Those public dev sets are for validation only and should not be used for training. The remaining hidden test sets are reserved for final evaluation.

At present, we do not enforce a hard submission cap for Task 1. Participants may submit multiple times as needed during development, but should avoid unnecessary rapid resubmission.

Task 1 Submission

A unified Task 1 submission hub is available for released language-specific dev benchmarks. These public dev leaderboards are validation tracks only and are separate from the official public training collection.

Task 3 - Financial Decision Making

Task 3 is a daily, news-driven trading workflow. We collect market news each day, call each submitted endpoint once, and execute positions from the returned action: BUY, HOLD, or SELL.

Data & Submission

Historical data for backtesting, validation, and model training: MBZUAI/finmmeval-lab-clef2026.

To participate, submit your endpoint via the Agent Market Arena page (Google Form).

At present, we do not enforce a hard submission cap for Task 3. Teams may update their submitted endpoint as needed before the endpoint submission deadline, but should avoid unnecessary rapid resubmission.

The endpoint submission deadline is the deadline for submitting or updating your endpoint. It is not the end of the Task 3 evaluation period.

Submitting the Google Form registers or updates your endpoint, but it may not appear on the leaderboard immediately. Organizers first verify submitted endpoints and confirm the final endpoint list.

Daily Workflow

Each endpoint receives one request per day and must return one action: BUY, HOLD, or SELL.
Position mapping: BUY -> long, HOLD -> flat, SELL -> short.
Execution rule: each new action fully replaces the previous day’s position.
Price convention: daily close price.
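
For local backtesting, the workflow above can be sketched as a mapping from actions to signed positions applied to close-to-close price changes. This is a hedged illustration only: the timing convention (the position chosen on day i earns the return from the close of day i to the close of day i+1) and the function name `daily_strategy_returns` are assumptions, and the organizers' execution details may differ.

```python
from typing import List, Sequence

# Position mapping from the workflow: BUY -> long, HOLD -> flat, SELL -> short.
POSITION = {"BUY": 1, "HOLD": 0, "SELL": -1}


def daily_strategy_returns(actions: Sequence[str], closes: Sequence[float]) -> List[float]:
    """Return one strategy return per pair of consecutive closes.

    Assumption: actions[i] is decided on day i and held until the close of
    day i+1, at which point the next action fully replaces it.
    """
    if len(actions) != len(closes):
        raise ValueError("actions and closes must have the same length")
    returns = []
    for i in range(len(closes) - 1):
        position = POSITION[actions[i]]
        price_change = closes[i + 1] / closes[i] - 1.0
        returns.append(position * price_change)
    return returns


# Hypothetical prices and actions, for illustration only.
closes = [100.0, 110.0, 99.0, 99.0]
actions = ["BUY", "SELL", "HOLD", "BUY"]
print(daily_strategy_returns(actions, closes))
```

In this toy run the long position gains on the first day, the short position gains on the second day when the price falls, and the flat position earns nothing on the third.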

Scheduling & Request Policy

Daily process starts at 00:00 UTC.
Requests are sent progressively after start time.
Official Task 3 performance is computed over a common evaluation window for all accepted endpoints, rather than starting separately from each team’s individual form submission date.
Submitted endpoints will continue to be called daily after the endpoint submission deadline for the official Task 3 evaluation window. We expect this window to run through late June or early July, aligned with the final lab reporting schedule.
Teams do not need to keep endpoints online for the full day, but should start them shortly before 00:00 UTC and keep them available for several hours to allow for queued requests, retries, and temporary network delays.
Per-request timeout is 3 minutes.
If a request fails (timeout, server error, or invalid response), the action defaults to HOLD.
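
To test how an endpoint behaves under this policy, a team can mimic the caller locally. The sketch below is an assumption about how such a caller could look, not the organizers' runner: the function names are hypothetical, and treating lowercase or otherwise non-exact action strings as invalid is also an assumption.

```python
import http.client
import json
import urllib.request

VALID_ACTIONS = {"BUY", "HOLD", "SELL"}
TIMEOUT_SECONDS = 180  # the 3-minute per-request timeout stated above


def parse_action(body: bytes) -> str:
    """Return the action in a response body, or HOLD if the body is invalid."""
    try:
        action = json.loads(body)["recommended_action"]
    except (ValueError, KeyError, TypeError):
        return "HOLD"
    if isinstance(action, str) and action in VALID_ACTIONS:
        return action
    return "HOLD"


def request_action(endpoint_url: str, payload: dict) -> str:
    """POST the payload and fall back to HOLD on timeout, error, or bad output."""
    try:
        request = urllib.request.Request(
            endpoint_url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return parse_action(response.read())
    except (OSError, ValueError, http.client.HTTPException):
        # Covers timeouts, connection failures, HTTP error statuses, bad URLs.
        return "HOLD"


print(parse_action(b'{"recommended_action": "SELL"}'))  # SELL
print(parse_action(b"not json"))  # HOLD
```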

Request / Response Format

Input is a JSON object containing date, price, news, symbol, momentum, history_price, and optional 10k/10q (object or null). The symbol varies by asset (e.g., TSLA, BTC).

The response is a JSON object with a recommended_action field:

{
  "recommended_action": "BUY"
}

Valid actions: BUY, HOLD, SELL.
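
A minimal endpoint that satisfies this format can be sketched with the Python standard library alone. This is an illustrative sketch, not the reference FastAPI example: the momentum-following rule, the "bearish" label, the handler and function names, and the default port are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decide(payload: dict) -> dict:
    """Toy rule (an assumption, not a recommended strategy): follow momentum."""
    symbol = payload["symbol"][0]
    momentum = payload.get("momentum", {}).get(symbol, "neutral")
    # "bearish" is an assumed label; the samples only show bullish and neutral.
    action = {"bullish": "BUY", "bearish": "SELL"}.get(momentum, "HOLD")
    return {"recommended_action": action}


class TradingHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = decide(json.loads(self.rfile.read(length)))
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            # Answer with a valid action even when the request is malformed.
            body = {"recommended_action": "HOLD"}
        data = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), TradingHandler).serve_forever()


print(decide({"symbol": ["TSLA"], "momentum": {"TSLA": "bullish"}}))
```

Calling `serve()` starts the server, after which a POST request with one of the sample inputs should return a JSON body in the format above.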

Input Samples

Sample A (TSLA):

Note: 10k and 10q can each be an object or null.

{
  "date": "2025-01-15",
  "price": {"TSLA": 250.50},
  "news": {"TSLA": ["Tesla announces new production milestone"]},
  "symbol": ["TSLA"],
  "momentum": {"TSLA": "bullish"},
  "10k": {"TSLA": ["[SEC 10-K Filing - 2025-01-15]\nSummary..."]},
  "10q": {"TSLA": ["[SEC 10-Q Filing - 2025-01-15]\nSummary..."]},
  "history_price": {"TSLA": [
    {"date": "2025-01-12", "price": 249.80},
    ...,
    {"date": "2025-01-13", "price": 250.50},
    {"date": "2025-01-14", "price": 250.30}
  ]}
}

Sample B (BTC):

{
  "date": "2025-01-15",
  "price": {"BTC": 67890.50},
  "news": {"BTC": ["Bitcoin ETF inflows remain strong"]},
  "symbol": ["BTC"],
  "momentum": {"BTC": "neutral"},
  "10k": null,
  "10q": null,
  "history_price": {"BTC": [
    {"date": "2025-01-12", "price": 67580.00},
    ...,
    {"date": "2025-01-13", "price": 67720.00},
    {"date": "2025-01-14", "price": 67810.00}
  ]}
}
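
Because 10k and 10q may be null, as in Sample B, endpoint code should not index into them directly. The helper below is a hedged sketch of one way to read the request safely. The function name and the returned summary fields are hypothetical, and the example payload is Sample B with the elided history entries left out.

```python
def summarize_request(payload: dict) -> dict:
    """Extract a few fields from a request, tolerating null 10k/10q values."""
    symbol = payload["symbol"][0]
    closes = [point["price"] for point in payload["history_price"][symbol]]
    # `or {}` turns a null (None) filing field into an empty mapping.
    filings_10k = (payload.get("10k") or {}).get(symbol, [])
    filings_10q = (payload.get("10q") or {}).get(symbol, [])
    return {
        "symbol": symbol,
        "price": payload["price"][symbol],
        "headline_count": len(payload["news"].get(symbol, [])),
        "closes": closes,
        "has_filings": bool(filings_10k or filings_10q),
    }


btc_request = {
    "date": "2025-01-15",
    "price": {"BTC": 67890.50},
    "news": {"BTC": ["Bitcoin ETF inflows remain strong"]},
    "symbol": ["BTC"],
    "momentum": {"BTC": "neutral"},
    "10k": None,
    "10q": None,
    "history_price": {"BTC": [
        {"date": "2025-01-12", "price": 67580.00},
        {"date": "2025-01-13", "price": 67720.00},
        {"date": "2025-01-14", "price": 67810.00},
    ]},
}
summary = summarize_request(btc_request)
print(summary["has_filings"], summary["closes"][-1])  # False 67810.0
```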

cURL Sample

curl -X POST "<your_endpoint>" \
  -H "Content-Type: application/json" \
  -d '{
    "date": "2025-01-15",
    "price": {"TSLA": 250.50},
    "news": {"TSLA": ["Tesla announces new production milestone"]},
    "symbol": ["TSLA"],
    "momentum": {"TSLA": "bullish"},
    "10k": {"TSLA": ["[SEC 10-K Filing - 2025-01-15]\nSummary..."]},
    "10q": {"TSLA": ["[SEC 10-Q Filing - 2025-01-15]\nSummary..."]},
    "history_price": {"TSLA": [
      {"date": "2025-01-12", "price": 249.80},
      ...,
      {"date": "2025-01-13", "price": 250.50},
      {"date": "2025-01-14", "price": 250.30}
    ]}
  }'

Data & Submission

To participate, submit your endpoint via the Agent Market Arena page (Google Form): https://huggingface.co/spaces/TheFinAI/Agent-Market-Arena.

Reference endpoint example (FastAPI): examples/simple_trading_api.py.

Evaluation

Primary: Cumulative Return (CR)

Secondary: Sharpe Ratio (SR), Maximum Drawdown (MD), Daily Volatility (DV), and Annualized Volatility (AV)
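
For local backtesting, these metrics can be approximated from a series of daily strategy returns. The sketch below uses common textbook definitions, which are assumptions here: compounded cumulative return, sample standard deviation for volatility, a zero risk-free rate, and 252 periods per year for annualization. The official evaluation may define or annualize them differently.

```python
import math
from statistics import mean, stdev
from typing import Sequence

PERIODS_PER_YEAR = 252  # assumption; the official annualization may differ


def cumulative_return(returns: Sequence[float]) -> float:
    """CR: compounded growth over the whole period, minus one."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def daily_volatility(returns: Sequence[float]) -> float:
    """DV: sample standard deviation of daily returns (needs two or more)."""
    return stdev(returns)


def annualized_volatility(returns: Sequence[float]) -> float:
    """AV: daily volatility scaled by the square root of periods per year."""
    return daily_volatility(returns) * math.sqrt(PERIODS_PER_YEAR)


def sharpe_ratio(returns: Sequence[float], risk_free_daily: float = 0.0) -> float:
    """SR: annualized mean excess return divided by its volatility."""
    excess = [r - risk_free_daily for r in returns]
    volatility = stdev(excess)
    if volatility == 0:
        return 0.0
    return mean(excess) / volatility * math.sqrt(PERIODS_PER_YEAR)


def maximum_drawdown(returns: Sequence[float]) -> float:
    """MD: largest peak-to-trough loss of the equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Hypothetical daily returns, for illustration only.
example = [0.10, -0.20, 0.05]
print(round(cumulative_return(example), 4))  # -0.076
print(round(maximum_drawdown(example), 4))   # 0.2
```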

Important Dates

Current lab milestones and submission deadlines.

Lab registration opens: 17 November 2025
Training data released: 15 December 2025 • Available now via the Hugging Face collection
Lab registration closes: 23 April 2026
Task 3 endpoint submission deadline: 10 May 2026 AoE
Task 1/2 dev leaderboards and test questions release: 06 May 2026
Task 1/2 final run submission deadline and leaderboard release: 25 May 2026 AoE submission deadline; final leaderboards released on 27 May 2026
Deadline for the submission of working notes [CEUR-WS]: 28 May 2026
Review process of participant papers: 28 May – 30 June 2026
Submission of Condensed Lab Overviews [LNCS]: 08 June 2026
Notification of Acceptance for Condensed Lab Overviews [LNCS]: 15 June 2026
Camera Ready Copy of Condensed Lab Overviews [LNCS] due: 22 June 2026
Notification of Acceptance for Participant Papers [CEUR-WS]: 30 June 2026
Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS] due: 06 July 2026
CLEF 2026 Conference: 21–24 September 2026 • Jena, Germany

Awards

Awards will consider both leaderboard performance and the quality of the submitted CLEF Working Notes participant paper.

🏆

Best Paper Award

USD 500

Single award based on overall excellence, considering both leaderboard performance and paper quality.

🥈

Outstanding Paper Award

USD 300 each (x3)

Three awards recognizing strong systems, clear methodology, and high-quality papers.

🌱

Merit / Encouragement Award

USD 200 each (x2)

Two awards highlighting promising approaches and well-documented submissions.

Award eligibility

FinMMEval awards are evaluated over FinMMEval participant submissions and their CLEF Working Notes papers. They are not evaluated through the separate CLEF main conference paper track. Participants may later develop a substantially extended version for another conference or journal, subject to that venue's originality, prior-publication, and dual-submission policies.

How to Participate

Engage with the challenges in a way that suits you - from a quick, one-time experiment to a detailed research project. To document an official FinMMEval participant system and be considered for FinMMEval awards, submit a CLEF Working Notes participant paper for the lab.

The CLEF main conference paper track is separate from the FinMMEval lab working notes. Promising systems may later be developed into a substantially extended conference or journal paper, subject to the target venue's policies.

Ready to join?

Task 1 Submission

The Task 1 final-test leaderboards for English, Chinese, Arabic, and Hindi are now available. The submission hub remains linked for reference.

Task 2 Final Results

The Task 2 final-test leaderboard is now available. Completed submissions are ranked by ROUGE-1 F1, the primary metric for the task.
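
For a rough local estimate of the ranking metric, ROUGE-1 F1 can be sketched as the harmonic mean of unigram precision and recall. This is an approximation under stated assumptions (lowercasing and whitespace tokenization, no stemming), so its scores may not match the official scorer exactly. The example sentences are invented for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference text."""
    # Assumption: lowercase, whitespace-separated tokens.
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "revenue rose ten percent",
    "revenue rose by ten percent in 2025",
)
print(round(score, 4))  # 0.7273 (precision 1.0, recall 4/7)
```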

Working Notes Template

FinMMEval participant working notes should follow the CLEF 2026 Labs Working Notes format. Papers are expected to be written in English, use the 1-column CEURART template, and be at least 5 pages long. There is no maximum page limit.

See the CLEF 2026 submission instructions for the official working-notes requirements. A generic CEURART Overleaf template is also available as a reference.

Packaging Checklist

✓ Results JSONL (per task)
✓ System Card (architecture, data usage, risks)
✓ For Task 1/2, clearly disclose any dataset expansion or real-time retrieval used during inference
✓ Reproducibility (seed, versions, hardware)
✓ License compliance acknowledgements (if applicable)
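
For the reproducibility item, one lightweight option is to record the seed and environment details next to the results. The sketch below is only a suggestion: the field names are hypothetical and not a required format, and it captures basic platform information rather than a full hardware inventory.

```python
import json
import platform
import random


def reproducibility_record(seed: int) -> dict:
    """Seed Python's random module and describe the runtime environment."""
    random.seed(seed)
    # Hypothetical field names; adapt them to your own system card.
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
    }


record = reproducibility_record(seed=42)
print(json.dumps(record, indent=2))
```

Systems that use other sources of randomness, such as a numerical or deep learning library, would need to seed and version those separately.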

Recommended Citations

Please cite the FinMMEval Lab overview paper when referring to the full lab. If your work is associated with a specific task, please also cite the corresponding task overview paper.

FinMMEval Lab Overview

LNCS

@inproceedings{FinMMEval2026,
  title = {Overview of {FinMMEval} 2026: Multilingual and Multimodal Financial Evaluation},
  author = {Zhuohan Xie and Yuyang Dai and Rania Elbadry and Vanshikaa Jani and Xueqing Peng and Lingfei Qian and Georgi Georgiev and Dimitar Dimitrov and Fan Zhang and Jimin Huang and Jiahui Geng and Yankai Chen and Ye Yuan and Haolun Wu and Yuxia Wang and Ivan Koychev and Veselin Stoyanov and Mingzi Song and Yu Chen and Steve Liu and Preslav Nakov},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
  series = {Proceedings of the Seventeenth International Conference of the CLEF Association (CLEF 2026)},
  year = {2026},
  month = {September 21--24},
  address = {Jena, Germany},
  publisher = {Springer Lecture Notes in Computer Science LNCS},
}

Task 1 Overview

CEUR-WS

@inproceedings{FinMMEvalTask1Overview2026,
  title = {Overview of the {FinMMEval} 2026 Task 1: Multilingual Financial Multiple-Choice Question Answering},
  author = {Zhuohan Xie and Yuyang Dai and Rania Elbadry and Vanshikaa Jani and Georgi Georgiev and Dimitar Dimitrov and Fan Zhang and Xueqing Peng and Lingfei Qian and Jimin Huang and Jiahui Geng and Yankai Chen and Ye Yuan and Haolun Wu and Yuxia Wang and Ivan Koychev and Veselin Stoyanov and Mingzi Song and Yu Chen and Steve Liu and Preslav Nakov},
  booktitle = {CLEF 2026 Working Notes},
  series = {CEUR Workshop Proceedings},
  year = {2026},
  month = {September 21--24},
  address = {Jena, Germany},
  publisher = {CEUR-WS.org},
}

Task 2 Overview

CEUR-WS

@inproceedings{FinMMEvalTask2Overview2026,
  title = {Overview of the {FinMMEval} 2026 Task 2: Financial Question Answering and Summarization},
  author = {Zhuohan Xie and Xueqing Peng and Georgi Georgiev and Dimitar Dimitrov and Rania Elbadry and Fan Zhang and Lingfei Qian and Jimin Huang and Vanshikaa Jani and Yuyang Dai and Jiahui Geng and Yankai Chen and Ye Yuan and Haolun Wu and Yuxia Wang and Ivan Koychev and Veselin Stoyanov and Mingzi Song and Yu Chen and Steve Liu and Preslav Nakov},
  booktitle = {CLEF 2026 Working Notes},
  series = {CEUR Workshop Proceedings},
  year = {2026},
  month = {September 21--24},
  address = {Jena, Germany},
  publisher = {CEUR-WS.org},
}

Task 3 Overview

CEUR-WS

@inproceedings{FinMMEvalTask3Overview2026,
  title = {Overview of the {FinMMEval} 2026 Task 3: Financial Decision Making},
  author = {Zhuohan Xie and Lingfei Qian and Georgi Georgiev and Dimitar Dimitrov and Rania Elbadry and Fan Zhang and Xueqing Peng and Jimin Huang and Vanshikaa Jani and Yuyang Dai and Jiahui Geng and Yankai Chen and Ye Yuan and Haolun Wu and Yuxia Wang and Ivan Koychev and Veselin Stoyanov and Mingzi Song and Yu Chen and Steve Liu and Preslav Nakov},
  booktitle = {CLEF 2026 Working Notes},
  series = {CEUR Workshop Proceedings},
  year = {2026},
  month = {September 21--24},
  address = {Jena, Germany},
  publisher = {CEUR-WS.org},
}

Organizers

Organizing committee and partner institutions.

Zhuohan Xie

MBZUAI (UAE)

Rania Elbadry

MBZUAI (UAE)

Fan Zhang

The University of Tokyo (Japan)

Georgi Georgiev

Sofia University "St. Kliment Ohridski" (Bulgaria)

Xueqing Peng

The Fin AI (USA)

Lingfei Qian

The Fin AI (USA)

Jimin Huang

The Fin AI (USA)

Dimitar Dimitrov

Sofia University "St. Kliment Ohridski" (Bulgaria)

Vanshikaa Jani

University of Arizona (USA)

Yuyang Dai

INSAIT (Bulgaria)

Jiahui Geng

Linköping University (Sweden)

Yankai Chen

McGill University (Canada) & MBZUAI (UAE)

Ye Yuan

McGill University (Canada) & Mila - Quebec AI Institute (Canada)

Haolun Wu

McGill University (Canada) & Mila - Quebec AI Institute (Canada)

Yuxia Wang

INSAIT (Bulgaria)

Ivan Koychev

Sofia University "St. Kliment Ohridski" (Bulgaria)

Veselin Stoyanov

MBZUAI (UAE)

Mingzi Song

Nikkei Financial Technology Research Institute, Inc. (Japan)

Yu Chen

The University of Tokyo (Japan)

Steve Liu

McGill University (Canada) & MBZUAI (UAE)

Preslav Nakov

MBZUAI (UAE)

Frequently Asked Questions

Who can participate?

Researchers and practitioners from academia and industry. Student teams are particularly welcome.

How is data licensed?

Research-only license; redistribution of raw sources may be restricted.

Can we submit to multiple tasks?

Yes. Submit independent result bundles per task.

Is there a hard limit on the number of submissions per task?

At present, we do not enforce a hard submission cap per task. Participants may submit multiple times as needed, especially for Task 1 dev validation and Task 3 endpoint iteration, but should avoid unnecessary rapid resubmission.

Are ensembles allowed?

Yes, but disclose all components in the system card.

Can we use dataset expansion or real-time retrieval for Task 1 or Task 2?

Yes. For Task 1 and Task 2, dataset expansion and/or real-time retrieval during inference are allowed, but they must be clearly and fully disclosed in the submitted system paper.

How will Task 2 submissions work?

The Task 2 final-test submission window has closed. The final leaderboard is available on the Results page, while the portal status table remains available for submission-history checks.

Can we train on the official public Hugging Face collection for Task 1?

Yes. The FinMMEval Hugging Face collection is the official public data release. Participants may use the released Task 1 datasets for training and may reorganize or re-split them as needed. The original split names on dataset cards do not restrict participant usage. However, the separate Task 1 dev leaderboard sets are organizer-held evaluation data and should not be used for training.

How does the Task 3 timeline work?

The updated Task 3 endpoint submission deadline is 10 May 2026 AoE. This is the deadline for submitting or updating an endpoint, not the end of evaluation. Submitting the Google Form registers or updates an endpoint, but it may not appear on the leaderboard immediately.

Because Task 3 evaluates market decisions across multiple trading days, organizers first verify submitted endpoints and then compute official performance over a common evaluation window for all accepted endpoints, rather than starting separately from each team’s individual submission date. We expect this window to run through late June or early July, aligned with the final lab reporting schedule.

The daily runner starts at 00:00 UTC. Teams do not need to keep endpoints online for the full day, but should start them shortly before 00:00 UTC and keep them available for several hours after the runner starts.

Do we need final Task 3 results before writing the paper?

No. Participants are encouraged to prepare working notes early. The paper should focus on the system architecture, methodology, and experimental setup, and results may be updated later as long as the evaluation status is stated clearly. Awards are decided primarily based on paper quality, with leaderboard performance used as supporting evidence.

Which paper should FinMMEval participants submit?

Participants should submit a CLEF Working Notes paper for their FinMMEval system. This is the participant paper used for lab reporting and FinMMEval award consideration. The CLEF main conference paper track is separate and should not receive the same or near-identical manuscript. A later conference or journal version should be substantially extended and follow that venue's policies.

Which template should we use for the working notes?

Use the CLEF 2026 CEUR-WS working notes template. Working notes should be written in English, formatted in 1-column CEURART style, with a minimum length of 5 pages and no maximum page limit. The official CLEF 2026 files are available as a LaTeX template, ODT template, and instructions PDF.

Contact

zhuohan.xie@mbzuai.ac.ae

Discord

Join the FinMMEval server

Ask questions, share progress, and get updates.
