KRAFT3-QA

Korean financial text-table benchmark for evaluating tool-augmented agents on QA tasks

Get Started

1. Create the SGLang Environment

conda create -n sglang python=3.10
conda activate sglang

Install PyTorch

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

Install FlashInfer

pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/

Install SGLang

pip install "sglang[srt]"

2. Create the vLLM Environment

Recommended: Keep vLLM in a separate Conda environment to avoid conflicts.

conda create -n vllm python=3.10
conda activate vllm

Install PyTorch

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

Install vLLM

pip install vllm==0.8.5

3. Create the Benchmark Environment

conda create -n kraft3qa python=3.10
conda activate kraft3qa
pip install -e .
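
To confirm that an environment picked up the pinned versions listed above, a small check along the following lines can be run inside it. This is a minimal sketch, not part of the repository: `installed_version` and `check_pins` are hypothetical helper names, and the pins in the example call are copied from the vLLM steps above.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package name to (expected, found, matches)."""
    report = {}
    for name, expected in pins.items():
        found = installed_version(name)
        # Wheels from the PyTorch CUDA index carry a local suffix such as "+cu124",
        # so only the part before "+" is compared with the pin.
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    # Pins taken from the vLLM environment steps (assumed to be run in that environment).
    pins = {"torch": "2.6.0", "torchvision": "0.21.0", "vllm": "0.8.5"}
    for name, (expected, found, ok) in check_pins(pins).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} [{status}]")
```

For the SGLang environment, the same helper can be called with only the `torch` and `torchvision` pins, since the steps above do not pin an SGLang or FlashInfer version.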

Build the Dataset

0. Fetch KRX-Listed Companies

python scripts/data/0_fetch_listed_companies.py

1. Retrieve Corporate Filings and Financial Statements from Open DART

Set your Open DART API key:

export OPENDART_API_KEY='YOUR_API_KEY'

Then run:

python scripts/data/1_fetch_dart_filings.py
python scripts/data/2_fetch_dart_finstates.py
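
Both fetch scripts depend on the API key exported above, so it can help to fail early when the variable is missing or still holds the placeholder. The following is a hedged sketch only: `require_api_key` is a hypothetical helper, and how the repository's scripts actually read the key is not shown in this README.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "OPENDART_API_KEY") -> str:
    """Return the key from `env` (os.environ by default), or raise if it is unusable."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Treat the README placeholder as "not set" as well.
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"{name} is not set; export it before running the fetch scripts")
    return key
```

Calling `require_api_key()` at the top of a wrapper script stops the run before any request is sent to Open DART.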

2. Extract Corporate Filing Sections

python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/1_overview
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/2_products+services
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/3_materials+facilities
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/4_sales+orders
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/5_risk_management
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/6_contracts+rnd
python scripts/data/3_extract_business_from_dart_filings.py --outdir data/dart_filings/business/7_others
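
The seven commands above differ only in the output directory, so they can be driven from a short loop. This is a sketch under that assumption: `build_commands` and `run_all` are hypothetical names, the script path and directory names are copied from the commands above, and the default is a dry run that only prints each command.

```python
import subprocess
import sys
from typing import List

SCRIPT = "scripts/data/3_extract_business_from_dart_filings.py"
BASE = "data/dart_filings/business"
SECTIONS = [
    "1_overview",
    "2_products+services",
    "3_materials+facilities",
    "4_sales+orders",
    "5_risk_management",
    "6_contracts+rnd",
    "7_others",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one argument list per section, mirroring the README commands."""
    return [[python, SCRIPT, "--outdir", f"{BASE}/{section}"] for section in SECTIONS]


def run_all(dry_run: bool = True) -> None:
    """Print the commands, or execute them in order and stop at the first failure."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the arguments as a list avoids shell quoting issues with the `+` characters in the directory names.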

3. Convert Financial Statements

Transform CSV-format financial statements into structured JSON+HTML tables:

python scripts/data/4_generate_finstate_tables.py
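
To illustrate the kind of transformation this step describes, the sketch below turns CSV text into a JSON document and an HTML table using only the standard library. It is not the repository's implementation: `csv_to_tables`, the JSON layout, and the sample column names are all assumptions made for illustration.

```python
import csv
import html
import io
import json
from typing import Tuple


def csv_to_tables(csv_text: str) -> Tuple[str, str]:
    """Return (json_string, html_string) for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    # JSON side: keep the column order plus one dict per row (hypothetical layout).
    json_out = json.dumps({"columns": columns, "rows": rows}, ensure_ascii=False)

    # HTML side: escape every cell so account names cannot break the markup.
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(row[col] or '')}</td>" for col in columns) + "</tr>"
        for row in rows
    )
    html_out = f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    return json_out, html_out


if __name__ == "__main__":
    # Illustrative values only; not taken from any real filing.
    sample = "account,2023,2024\nRevenue,100,120\n"
    as_json, as_html = csv_to_tables(sample)
    print(as_json)
    print(as_html)
```

Keeping the values as strings avoids guessing at number formats or units in the source CSV.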

4. Extract Management's Discussion & Analysis (MD&A) Sections

python scripts/data/5_extract_mda_from_dart_filings.py

5. Generate QA Dataset

Start the SGLang server with Qwen3 32B:

python -m sglang.launch_server --model-path Qwen/Qwen3-32B --port 8000 --reasoning-parser qwen3

Then generate the QA dataset:

python scripts/data/6_make_qa_dataset.py
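
The server needs time to load the model, so the generation script can fail if it is started too early. A minimal readiness check might look like the sketch below, assuming the server listens on localhost port 8000 as in the launch command above. `wait_for_port` is a hypothetical helper; an open port does not by itself guarantee that the model is fully loaded.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port(timeout=5.0, interval=1.0):
        print("server port is open")
    else:
        print("server did not come up in time")
```

The same check applies before each filtering run later in the pipeline, since those steps also talk to a server on port 8000.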

6. Filter Low-Quality Questions

6.1. First filtering with Qwen3 32B

Start the server and leave it running:

python -m sglang.launch_server --model-path Qwen/Qwen3-32B --port 8000 --reasoning-parser qwen3

Then, in a separate terminal, run the filter:

python scripts/data/7_filter_qa_dataset.py --indir data/qa_raw --outdir data/qa_qwen3-filter

6.2. Second filtering with EXAONE 3.5 32B

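Stop any server already using port 8000, then start the EXAONE server and leave it running:

python -m sglang.launch_server --model-path LGAI-EXAONE/EXAONE-3.5-32B-Instruct --port 8000 --attention-backend triton

Then, in a separate terminal, run the filter on the output of the first pass:

python scripts/data/7_filter_qa_dataset.py --indir data/qa_qwen3-filter --outdir data/qa_qwen3-filter+exaone3.5-filter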

7. Shuffle Answer Choices Randomly

python scripts/data/8_shuffle_dataset_answers.py --indir data/qa_qwen3-filter+exaone3.5-filter --outdir dataset
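
Shuffling answer choices only works if the index of the correct answer is remapped along with them. The sketch below shows that idea on a single item. It is an illustration under assumed field names (`choices` as a list, `answer` as a zero-based index); the actual schema and logic of the shuffle script may differ.

```python
import random
from typing import Any, Dict


def shuffle_choices(item: Dict[str, Any], rng: random.Random) -> Dict[str, Any]:
    """Return a copy of `item` with choices shuffled and the answer index remapped."""
    order = list(range(len(item["choices"])))
    rng.shuffle(order)  # order[new_position] == old_position
    return {
        **item,
        "choices": [item["choices"][old] for old in order],
        # The correct choice now sits wherever its old index landed in `order`.
        "answer": order.index(item["answer"]),
    }


if __name__ == "__main__":
    # Hypothetical item, for illustration only.
    example = {"question": "Q?", "choices": ["A", "B", "C", "D"], "answer": 2}
    shuffled = shuffle_choices(example, random.Random(42))
    print(shuffled["choices"], shuffled["answer"])
```

Passing a seeded `random.Random` instance keeps the shuffle reproducible across runs.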

Main Experimental Results

Model	Params	Accuracy (%)	Valid Response Rate (%)
Qwen3 (w/ Thinking)	32B	71.2	98.8
Gemma 3	27B	66.3	99.6
EXAONE 3.5	32B	62.2	95.1
Kanana 1.5	8B	54.2	92.9
Llama 3.2	3B	23.2	61.5
HyperCLOVA X SEED	1.5B	14.1	41.8

Citation

If you use KRAFT3-QA, please cite:

@article{park2025kraft3qa,
    title = {{KRAFT}³-{QA}: {Korean} financial text-table benchmark for evaluating tool-augmented agents on {QA} tasks},
    author = {Park, Seungjae and Cho, Sung-Bae and Kim, Ha-Young},
    journal = {Journal of the Korea Society of Computer and Information},
    volume = {30},
    number = {8},
    pages = {29--39},
    year = {2025},
    doi = {10.9708/jksci.2025.30.08.029},
    language = {ko},
}

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2023R1A2C200337911 and No. RS-2023-00220762).
