FireRedTeam/IVC-Prune

IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning

Zhichao Sun, Yidong Ma, Gang Liu, Yibo Chen, Xu Tang, Yao Hu, Yongchao Xu

Xiaohongshu Inc.     Wuhan University

ICLR 2026


Overview

💡 We reveal a fundamental mechanism by which LVLMs process spatial information:

LVLMs implicitly establish visual coordinate systems through Rotary Position Embeddings (RoPE).

Through theoretical analysis, we discover that specific token positions serve as Implicit Visual Coordinates (IVC tokens)—spatial reference points essential for absolute object localization. These positions occur where RoPE's rotation matrices approximate:
  • Identity matrix (real-axis references)

  • 90° rotation matrix (imaginary-axis references)

This provides the first theoretical characterization of spatial reasoning mechanisms in LVLMs.
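As a rough illustration of this criterion (not the authors' code; the per-band check, tolerance, and function name below are assumptions), one can flag, for each RoPE frequency band, the positions whose rotation angle lies close to the two reference rotations:

```python
import numpy as np

def rope_reference_masks(num_pos=64, dim=128, base=10000.0, tol=0.05):
    """For each (position, frequency band) pair, test whether the RoPE
    rotation angle is close to 0 mod 2*pi (identity / real-axis
    reference) or to pi/2 mod 2*pi (90-degree / imaginary-axis
    reference). How these band-level hits are aggregated into IVC
    tokens is the paper's contribution and is not reproduced here."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = np.mod(np.outer(np.arange(num_pos), inv_freq),
                    2 * np.pi)                                # (num_pos, dim/2)
    near_identity = np.minimum(angles, 2 * np.pi - angles) < tol  # cos~1, sin~0
    near_quarter = np.abs(angles - np.pi / 2) < tol               # cos~0, sin~1
    return near_identity, near_quarter
```

Position 0, for example, is a trivial real-axis reference: its rotation angle is 0 in every band, so the whole first row of the identity mask is true.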

🚀 Method: IVC-Prune

A training-free, prompt-aware pruning strategy that preserves two crucial token types:

  1. IVC Tokens: Identified by analyzing RoPE's mathematical properties (cosine/sine components across dimensions)
  2. Foreground Tokens: Selected via a robust two-stage process:
    • Stage 1: Semantic seed identification using value-vector similarity (avoiding positional bias)
    • Stage 2: Contextual refinement to capture complete objects

Key Innovation: Single-selection pruning strategy—tokens are selected once at an intermediate layer and applied across all layers, maximizing KV-cache reduction while preserving original position IDs.
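The single-selection idea can be sketched as follows. This is a minimal illustration under stated assumptions: `single_selection_prune`, the score vector, and the keep ratio are stand-ins, not the released implementation, and the IVC + foreground scoring itself is only assumed here.

```python
import numpy as np

def single_selection_prune(vision_tokens, scores, keep_ratio=0.25):
    """Select vision tokens ONCE (at a single intermediate layer) and
    return both the kept tokens and their original indices. Because the
    indices are preserved, they double as the original position ids and
    the same selection can be reused by every subsequent layer and by
    the KV cache."""
    num_keep = max(1, int(len(vision_tokens) * keep_ratio))
    # indices of the top-scoring tokens, restored to original order
    keep_idx = np.sort(np.argpartition(scores, -num_keep)[-num_keep:])
    return vision_tokens[keep_idx], keep_idx
```

Selecting once (rather than re-ranking at every layer) is what lets the KV cache shrink for all layers at the cost of a single scoring pass.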

📝 TODO List:

  • Open-source support for Qwen, LLaVA, InternVL, and DeepSeek

Supported LVLMs


Installation

Our code is built on VLMEvalKit and transformers; we add grounding-evaluation support to VLMEvalKit.

# Step 1: Create and activate conda environment
conda create --name IVCP python=3.10.6 -y
conda activate IVCP

# Step 2: Install PyTorch (cu118)
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118

# Step 3: Install local transformers
pip install -e /path/to/IVCP/transformers

# Step 4: Install flash-attn
pip install flash-attn==2.5.8 --no-build-isolation -v

# Step 5: Install DeepSeek-VL2
cd /path/to/IVCP/DeepSeek-VL2
pip install -e .

# Step 6: Install VLMEvalKit
cd /path/to/IVCP/VLMEvalKit
pip install -e .

# Step 7: Reinstall numba and install extra dependencies
pip uninstall numba -y
pip install numba
pip install qwen_vl_utils

Dataset

For the RefCOCO grounding dataset, we provide MDETR-format annotations on Hugging Face.

Setup Instructions

Step 1: Download Required Files

Step 2: Generate TSV Files

Standard models:

cd IVCP/VLMEvalKit
python convert_to_tsv.py \
    --images_folder /path/to/train2014/ \
    --annotations_files /path/to/finetune_refcoco_val.json \
    --output_dir /path/to/output/

Qwen2.5-VL (coordinates resized to multiples of 28):

cd IVCP/VLMEvalKit
python convert_to_tsv.py \
    --images_folder /path/to/train2014/ \
    --annotations_files /path/to/finetune_refcoco_val.json \
    --output_dir /path/to/output/ \
    --qwen25
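For orientation, a TSV of the general shape consumed by the evaluation code can be written with the standard library alone. The column names and field contents below are assumptions for illustration, not `convert_to_tsv.py`'s actual schema:

```python
import base64
import csv

# One row per referring expression; the image is inlined as base64 so
# the TSV is self-contained. (Hypothetical schema, not the real script's.)
rows = [{
    "index": 0,
    "image": base64.b64encode(b"<jpeg bytes here>").decode(),
    "question": "the man in the red shirt",        # referring expression
    "answer": "[103.2, 45.0, 210.7, 330.1]",       # GT box (x1, y1, x2, y2)
}]

with open("refcoco_val.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```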

Step 3: Configure Paths

Modify DATASET_URL in vlmeval/dataset/image_grounding.py:

DATASET_URL = {
    'RefCOCO_testA': '/PATH/TO/refcoco_testA.tsv',
    'RefCOCO_testB': '/PATH/TO/refcoco_testB.tsv',
    'RefCOCO_val': '/PATH/TO/refcoco_val.tsv',
    # ... other splits
}

Note: Always use the --qwen25 flag when evaluating Qwen2.5-VL so that the GT boxes match the model's input grid (multiples of 28).
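The rescaling behind the --qwen25 flag can be sketched as follows. Qwen2.5-VL sees images whose sides are rounded to multiples of 28 (14-pixel patches merged 2×2), so GT boxes must be rescaled accordingly. This is a simplified guess at the flag's effect; `resize_box_for_qwen25` is a hypothetical helper and the real script's rounding may differ:

```python
def resize_box_for_qwen25(box, orig_w, orig_h, factor=28):
    """Rescale an (x1, y1, x2, y2) box from the original image size to
    the nearest size whose sides are multiples of `factor`, matching
    the resolution the model actually sees. (Illustrative only.)"""
    new_w = max(factor, round(orig_w / factor) * factor)
    new_h = max(factor, round(orig_h / factor) * factor)
    sx, sy = new_w / orig_w, new_h / orig_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

For a 100×100 image, the nearest multiple of 28 is 112, so a full-image box (0, 0, 100, 100) would be rescaled to roughly (0, 0, 112, 112).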

Usage

To evaluate a model on grounding tasks, run the corresponding script:

cd IVCP/VLMEvalKit
bash test_ivcp_qwen_grounding.sh

bash test_ivcp_internvl_grounding.sh

bash test_ivcp_deepseekvl_grounding.sh

To evaluate general VQA tasks, run:

bash test_ivcp_qwen_generalvqa.sh

bash test_ivcp_internvl_generalvqa.sh

bash test_ivcp_deepseekvl_generalvqa.sh

bash test_ivcp_llava_generalvqa.sh

Note:
Before running the scripts, modify the dataset path in each script (e.g. lines 2-3 of test_ivcp_qwen_grounding.sh) to point to your actual dataset location, e.g.: export LMUData="/your/dataset/path"

Some VQA benchmarks require an API for answer scoring. Configure your API key following the VLMEvalKit instructions; by default, we use GPT-4o.

Citation

@misc{sun2026ivcprunerevealingimplicitvisual,
      title={IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning}, 
      author={Zhichao Sun and Yidong Ma and Gang Liu and Yibo Chen and Xu Tang and Yao Hu and Yongchao Xu},
      year={2026},
      eprint={2602.03060},
      archivePrefix={arXiv},
}
