### Benchmark
**Legend**

- **Text tasks:** IE (Information Extraction), TA (Text Analysis), QA (Question Answering), RM (Risk Management), FO (Forecasting), DM (Decision-Making), CQA (Complex Question Answering).
- **Multimodal tasks:** VQA (Visual Question Answering), CU (Chart Understanding), NU (Numeral Understanding), IU (Image Understanding), I2T (Image-to-Text).
- **Features:** RAG (Retrieval-Augmented Generation), Agent (in-benchmark agent evaluation).
- **Symbols:** “✓” = capability covered; “-” = not covered.