Zero-shot Model Performance Scores
Rank | Model | Size | ChainEval ↑ | ROUGE R₂ ↑ | ROUGE Rₗ ↑ | BERTScore ↑ |
---|---|---|---|---|---|---|
Performance on Random Samples
Rank | Model | ChainEval ↑ | ROUGE R₂ ↑ |
---|---|---|---|
An interactive dashboard for evaluating financial reasoning in LLMs.