Skip to content

Korean financial text-table benchmark for evaluating tool-augmented agents on QA tasks [JKSCI 2025]

Notifications You must be signed in to change notification settings

Astro36/kraft3qa

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KRAFT3-QA

Korean financial text-table benchmark for evaluating tool-augmented agents on QA tasks

Python PyTorch CUDA

Get Started

1. Create the SGLang Environment

conda create -n sglang python=3.10
conda activate sglang

Install PyTorch

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

Install FlashInfer

pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/

Install SGLang

pip install sglang[srt]

2. Create the vLLM Environment

Recommended: Keep vLLM in a separate Conda environment to avoid conflicts.

conda create -n vllm python=3.10
conda activate vllm

Install PyTorch

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

Install vLLM

pip install vllm==0.8.5

3. Create the Benchmark Environment

conda create -n kraft3qa python=3.10
conda activate kraft3qa
pip install -e .

Build the Dataset

0. Fetch KRX-Listed Companies

python scripts/data/0_fetch_listed_companies.py

1. Retrieve Coperate Filings and Financial Statements from Open DART

Set your Open DART API key:

export OPENDART_API_KEY='YOUR_API_KEY'

Then run:

python scripts/data/1_fetch_dart_filings.py
python scripts/data/2_fetch_dart_finstates.py

2. Extract Coperate Filing Sections

python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/1_overview
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/2_products+services
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/3_materials+facilities
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/4_sales+orders
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/5_risk_management
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/6_contracts+rnd
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/7_others

3. Convert Financial Statements

Transform CSV-format financial statements into structured JSON+HTML tables:

python scripts/data/4_generate_finstate_tables.py

4. Extract Management's Discussion & Analysis (MD&A) Sections

python scripts/data/5_extract_mda_from_dart_filings.py

5. Generate QA Dataset

Run SGLang server with Qwen3 32B:

python -m sglang.launch_server --model-path Qwen/Qwen3-32B --port 8000 --reasoning-parser qwen3

Then generate the QA dataset:

python scripts/data/6_make_qa_dataset.py

6. Filter Low-Quality Questions

6.1. First filtering with Qwen3 32B

python -m sglang.launch_server --model-path Qwen/Qwen3-32B --port 8000 --reasoning-parser qwen3
python scripts/data/7_filter_qa_dataset.py --indir data/qa_raw --outdir data/qa_qwen3-filter

6.2. Second filtering with EXAONE 3.5 32B

python -m sglang.launch_server --model-path LGAI-EXAONE/EXAONE-3.5-32B-Instruct --port 8000 --attention-backend triton
python scripts/data/7_filter_qa_dataset.py --indir data/qa_qwen3-filter --outdir data/qa_qwen3-filter+exaone3.5-filter

7. Shuffle Answer Choices Randomly

python scripts/data/8_shuffle_dataset_answers.py --indir data/qa_qwen3-filter+exaone3.5-filter --outdir dataset

Main Experimental Results

Model Params Accuracy (%) Valid Response Rate (%)
Qwen3 (w/ Thinking) 32B 71.2 98.8
Gemma 3 27B 66.3 99.6
EXAONE 3.5 32B 62.2 95.1
Kanana 1.5 8B 54.2 92.9
Llama 3.2 3B 23.2 61.5
HyperCLOVA X SEED 1.5B 14.1 41.8

Citation

If you use KRAFT3-QA, please cite:

@article{park2025kraft3qa,
    title = {{KRAFT}³-{QA}: {Korean} financial text-table benchmark for evaluating tool-augmented agents on {QA} tasks},
    author = {Park, Seungjae and Cho, Sung-Bae and Kim, Ha-Young},
    journal = {Journal of the Korea Society of Computer and Information},
    volume = {30},
    number = {8},
    pages = {29--39},
    year = {2025},
    doi = {10.9708/jksci.2025.30.08.029},
    language = {ko},
}

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No.2023R1A2C200337911 and No. RS-2023-00220762).

About

Korean financial text-table benchmark for evaluating tool-augmented agents on QA tasks [JKSCI 2025]

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Languages