FinMMEval Lab 2026

Latest News

Leaderboard Ready Updated: 4 February 2026 Deadline: 28 April 2026

Leaderboard Ready for Task 3 - Financial Decision Making

The Task 3 leaderboard is live. Deploy your endpoint and receive daily evaluations on BTC and TSLA decision-making.

Daily Requests

One call per day at 00:00 UTC.

Actions

BUY / HOLD / SELL mapped to long / flat / short.

Live Metrics

Track performance updates on the leaderboard as evaluations run.

Submit Now

Submit your endpoint via the Agent Market Arena Google Form.

Submit Endpoint

Dataset: TheFinAI/CLEF_Task3_Trading

Awards Announced Updated: 4 February 2026

New awards for top submissions

We are recognizing the best systems with Best Paper, Outstanding Paper, and Merit / Encouragement awards.

🏆

Best Paper

USD 500

🥈

Outstanding Paper

USD 300 each (x3)

🌱

Merit / Encouragement

USD 200 each (x2)

Awards

Top submissions will be recognized with monetary awards.

View Awards

See details in the Awards section.

Training Data Released Released: 15 December 2025

Ready-to-use splits for all tasks

Get calibrated training splits for exam-style Q&A, multilingual financial reasoning, and trading decision-making. Each dataset card includes format, licensing, and citation guidance.

Task 1

Multilingual exam-style multiple choice.

Task 2

PolyFiQA (Easy + Expert) filings with multilingual news Q&A.

Task 3

BTC and TSLA daily contexts for Buy/Hold/Sell reasoning.

New

Browse the Hugging Face collection to download splits and review licensing.

Open Collection

Check each dataset card for citation, licensing, and format details.

About the Lab

FinMMEval Lab integrates financial reasoning, multilingual understanding, and decision-making into a unified evaluation suite designed to promote robust, transparent, and globally competent financial AI. The 2026 edition introduces three interconnected tasks spanning five languages.

✓

Multi-modal inputs: news, filings, macro indicators, tests.

✓

Multiple languages with low-resource representations: English, Chinese, Arabic, Hindi, Greek, Japanese, Spanish.

✓

Tasks spanning Q&A, and decision making.

✓

Metrics centered on Accuracy, ROUGE-1, BLEURT and performance quantitative metrics (e.g. CR, SR, MD).

"How can I tailor my setup to make an LLM exceptionally good at finance?"

Tasks

Choose one or more tasks. Each submission must provide calibrated confidence scores and an evidence trace.

Task 1 - Financial Exam Q&A

Given a stand-alone multiple-choice question Q with four candidate options { A₁, A₂, A₃, A₄ }, the system must select the correct answer A^∗. Questions cover valuation, accounting, ethics, corporate finance, and regulatory knowledge.

Motivation

Professional financial qualification exams (e.g., CFA, EFPA) require the integration of theoretical and regulatory knowledge with applied reasoning. Existing LLMs often rely on factual recall without demonstrating the analytical rigor expected from human candidates.

Data

EFPA (Spanish): 50 exam-style financial questions on investment and regulation.
GRFinQA (Greek): 225 multiple-choice finance questions from university-level exams.
CFA (English): 600 exam-style multiple-choice questions covering nine core domains.
CPA (Chinese): 300 exam-style financial questions focusing on major modules.
BBF (Hindi): 500-1000 exam-style financial multiple-choice questions covering over 30 domains.

Evaluation

Models are required to output the correct answer label. Performance is measured by accuracy, defined as the proportion of correctly identified options in the test set.

Task 3 - Financial Decision Making

Task 3 is a daily, news-driven trading workflow. We collect market news each day, call each submitted endpoint once, and execute positions from the returned action: BUY, HOLD, or SELL.

Data & Submission

Historical data for backtesting, validation, and model training: TheFinAI/CLEF_Task3_Trading.

To participate, submit your endpoint via the Agent Market Arena page (Google Form): Agent Market Arena.

Daily Workflow

Each endpoint receives one request per day and must return one action: BUY, HOLD, or SELL.
Position mapping: BUY -> long, HOLD -> flat, SELL -> short.
Execution rule: each new action fully replaces the previous day’s position.
Price convention: daily close price.

Scheduling & Request Policy

Daily process starts at 00:00 UTC.
Requests are sent progressively after start time.
Per-request timeout is 3 minutes.
If a request fails (timeout, server error, or invalid response), the action defaults to HOLD.

Request / Response Format

Input is a JSON object containing date, price, news, symbol, momentum, history_price, and optional 10k/10q (object or null). The symbol varies by asset (e.g., TSLA, BTC).

{
  "recommended_action": "BUY"
}

Valid actions: BUY, HOLD, SELL.

Input Samples

Sample A (TSLA):

Note: 10k and 10q can each be an object or null.

{
  "date": "2025-01-15",
  "price": {"TSLA": 250.50},
  "news": {"TSLA": ["Tesla announces new production milestone"]},
  "symbol": ["TSLA"],
  "momentum": {"TSLA": "bullish"},
  "10k": {"TSLA": ["[SEC 10-K Filing - 2025-01-15]\nSummary..."]},
  "10q": {"TSLA": ["[SEC 10-Q Filing - 2025-01-15]\nSummary..."]},
  "history_price": {"TSLA": [
    {"date": "2025-01-12", "price": 249.80},
    ...,
    {"date": "2025-01-13", "price": 250.50},
    {"date": "2025-01-14", "price": 250.30}
  ]}
}

Sample B (BTC):

{
  "date": "2025-01-15",
  "price": {"BTC": 67890.50},
  "news": {"BTC": ["Bitcoin ETF inflows remain strong"]},
  "symbol": ["BTC"],
  "momentum": {"BTC": "neutral"},
  "10k": null,
  "10q": null,
  "history_price": {"BTC": [
    {"date": "2025-01-12", "price": 67580.00},
    ...,
    {"date": "2025-01-13", "price": 67720.00},
    {"date": "2025-01-14", "price": 67810.00}
  ]}
}

cURL Sample

curl -X POST "<your_endpoint>" \
  -H "Content-Type: application/json" \
  -d '{
    "date": "2025-01-15",
    "price": {"TSLA": 250.50},
    "news": {"TSLA": ["Tesla announces new production milestone"]},
    "symbol": ["TSLA"],
    "momentum": {"TSLA": "bullish"},
    "10k": {"TSLA": ["[SEC 10-K Filing - 2025-01-15]\nSummary..."]},
    "10q": {"TSLA": ["[SEC 10-Q Filing - 2025-01-15]\nSummary..."]},
    "history_price": {"TSLA": [
      {"date": "2025-01-12", "price": 249.80},
      ...,
      {"date": "2025-01-13", "price": 250.50},
      {"date": "2025-01-14", "price": 250.30}
    ]}
  }'

Data & Submission

Historical data for backtesting, validation, and model training: TheFinAI/CLEF_Task3_Trading.

To participate, submit your endpoint via the Agent Market Arena page (Google Form): https://huggingface.co/spaces/TheFinAI/Agent-Market-Arena.

Reference endpoint example (FastAPI): examples/simple_trading_api.py.

Evaluation

Primary: Cumulative Return (CR)

Secondary: Sharpe Ratio (SR), Maximum Drawdown (MD), Daily Volatility (DV), and Annualized Volatility (AV)

Important Dates

Specific dates will be announced once they are fixed.

Lab registration opens

17 November 2025
Training data released

15 December 2025 • Available now via the Hugging Face collection
Lab registration closes

23 April 2026
Task 3 submission deadline

28 April 2026
Beginning of the evaluation cycle (test sets release)

May 2026
End of the evaluation cycle (run submission)

07 May 2026
Deadline for the submission of working notes [CEUR-WS]

28 May 2026
Review process of participant papers

28 May – 30 June 2026
Submission of Condensed Lab Overviews [LNCS]

08 June 2026
Notification of Acceptance for Condensed Lab Overviews [LNCS]

15 June 2026
Camera Ready Copy of Condensed Lab Overviews [LNCS] due

22 June 2026
Notification of Acceptance for Participant Papers [CEUR-WS]

30 June 2026
Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS] due

06 July 2026
CLEF 2026 Conference

21–24 September 2026 • Jena, Germany

Awards

Top submissions will be recognized with monetary awards.

🏆

Best Paper Award

USD 500

Single award for the top-ranked submission and paper.

🥈

Outstanding Paper Award

USD 300 each (x3)

Three awards recognizing strong submissions and writing.

🌱

Merit / Encouragement Award

USD 200 each (x2)

Two awards to celebrate promising approaches.

How to Participate

Engage with the challenges in a way that suits you - from a quick, one-time experiment to a detailed research project. While we invite you to share your findings in our workshop notes, you are also free to develop promising results into a full paper for an archival journal.

The workshop itself is a perfect opportunity to refine your ideas through discussion with peers.

Ready to join?