Zero-shot Model Performance Scores
| Rank | Model | Size | ChainEval ↑ | ROUGE R₂ ↑ | ROUGE Rₗ ↑ | BERTScore ↑ |
|---|---|---|---|---|---|---|
Performance on Random Samples
| Rank | Model | ChainEval ↑ | ROUGE R₂ ↑ |
|---|---|---|---|
An interactive dashboard for evaluating financial reasoning in LLMs.