Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection

Installation

# Create environment
uv sync

# Or with pip
pip install -e .

Data

Prepare HICO-DET by following the data preparation instructions of ADA-CM. Then link the downloaded dataset as follows:

ln -s /path/to/hico_20160224_det data/hicodet
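
If you would rather create the link from Python, the snippet below is a minimal sketch of the same step. The helper name `link_dataset` is hypothetical and not part of this repository, and the source path remains a placeholder that you must replace with your own.

```python
from pathlib import Path


def link_dataset(source: str, target: str = "data/hicodet") -> Path:
    """Create a symlink at `target` pointing to `source`, like `ln -s`.

    Hypothetical helper for illustration; not part of this repository.
    """
    src = Path(source).expanduser().resolve()
    dst = Path(target)
    if not src.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {src}")
    # Create the parent folder (e.g. data/) if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.is_symlink() or dst.exists():
        # Keep an existing link or folder, but only if it is the same dataset.
        if dst.resolve() != src:
            raise FileExistsError(f"{dst} already exists and points elsewhere")
        return dst
    dst.symlink_to(src, target_is_directory=True)
    return dst
```

Calling `link_dataset("/path/to/hico_20160224_det")` from the repository root should leave `data/hicodet` pointing at the dataset. Calling it again with the same source does nothing.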

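Place the following in the weights/ folder. The LongCLIP checkpoints must be downloaded manually; the two OpenAI CLIP models need no manual download, as noted in the table: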

File	Source
`longclip-B.pt`	LongCLIP-B
`longclip-L.pt`	LongCLIP-L
`openai/clip-vit-base-patch16`	downloaded automatically from HuggingFace
`openai/clip-vit-large-patch14`	downloaded automatically from HuggingFace

Download the pre-computed signatures and the DETR detections from this link and unzip the archive into weights/. It contains:

weights/<model>/prompts_X.pth: pre-computed text features.
weights/detr/hicodet_test_bbox_R50.pt: pre-computed DETR detections.

Generate vision features

python scripts/precompute_vision_features.py \
    --data_root data/hicodet \
    --output_dir weights \
    --models clip-vit_b_16 clip-vit_l_14 longclip_b longclip_l \
    --batch_size 256

After all the steps above, the weights/ folder should look like:

weights/
├── longclip-B.pt
├── longclip-L.pt
├── detr/
│   └── hicodet_test_bbox_R50.pt
├── clip-vit_b_16/
│   ├── prompts_0.pth
│   ├── prompts_1.pth
│   └── 0_hicodet_train_openai-clip-vit-base-patch16_vision_features.pth
├── clip-vit_l_14/
│   ├── prompts_0.pth
│   ├── prompts_1.pth
│   └── 0_hicodet_train_openai-clip-vit-large-patch14_vision_features.pth
├── longclip_b/
│   ├── prompts_0.pth
│   ├── prompts_1.pth
│   └── 0_hicodet_train_longclip-B_vision_features.pth
└── longclip_l/
    ├── prompts_0.pth
    ├── prompts_1.pth
    └── 0_hicodet_train_longclip-L_vision_features.pth
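
Before running experiments, it can help to confirm that nothing in this layout is missing. The following is a minimal sketch that checks the file names listed in the tree above. The function names `expected_files` and `missing_files` are hypothetical and not part of this repository, and the check only tests that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Per-model folder -> vision-feature file name, copied from the tree above.
VISION_FEATURES = {
    "clip-vit_b_16": "0_hicodet_train_openai-clip-vit-base-patch16_vision_features.pth",
    "clip-vit_l_14": "0_hicodet_train_openai-clip-vit-large-patch14_vision_features.pth",
    "longclip_b": "0_hicodet_train_longclip-B_vision_features.pth",
    "longclip_l": "0_hicodet_train_longclip-L_vision_features.pth",
}


def expected_files(root="weights"):
    """Return every path that the layout above says should exist."""
    root = Path(root)
    files = [
        root / "longclip-B.pt",
        root / "longclip-L.pt",
        root / "detr" / "hicodet_test_bbox_R50.pt",
    ]
    for model, features in VISION_FEATURES.items():
        files.append(root / model / "prompts_0.pth")
        files.append(root / model / "prompts_1.pth")
        files.append(root / model / features)
    return files


def missing_files(root="weights"):
    """Return the expected paths that are not present as regular files."""
    return [path for path in expected_files(root) if not path.is_file()]
```

Run from the repository root, `missing_files()` returns an empty list when the folder matches the layout; otherwise, each returned path names a file that still needs to be downloaded or generated.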

Experiments

Run any experiment with:

python src/main.py experiment=<config>

Available configurations:

Config	VLM
`clip-vit_b_16`	CLIP ViT-B/16
`clip-vit_l_14`	CLIP ViT-L/14
`longclip_b`	LongCLIP-B
`longclip_l`	LongCLIP-L

For example:

python src/main.py experiment=clip-vit_b_16
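
To run all four configurations one after another, a small driver script is one option. This is a minimal sketch, not part of this repository: the names `experiment_command` and `run_all` are hypothetical, and it assumes the script is launched from the repository root with the project environment active.

```python
import subprocess
import sys

# Configuration names taken from the table above.
CONFIGS = ["clip-vit_b_16", "clip-vit_l_14", "longclip_b", "longclip_l"]


def experiment_command(config):
    """Build the argument list for `python src/main.py experiment=<config>`."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # sys.executable reuses the interpreter that runs this script.
    return [sys.executable, "src/main.py", f"experiment={config}"]


def run_all(configs=CONFIGS):
    """Run each experiment in sequence, stopping at the first failure."""
    for config in configs:
        subprocess.run(experiment_command(config), check=True)
```

Calling `run_all()` launches the experiments sequentially. Because of `check=True`, a non-zero exit code raises `subprocess.CalledProcessError` and the remaining configurations are skipped.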

Cite us!

If you find our paper and/or code helpful, please consider citing:

@inproceedings{tonini2025dynamic,
    title={Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection},
    author={Tonini, Francesco and Vaquero, Lorenzo and Conti, Alessandro and Beyan, Cigdem and Ricci, Elisa},
    booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
    pages={2801--2810},
    year={2025}
}

Acknowledgement

We gratefully thank the authors of ADA-CM and Lightning Hydra for open-sourcing their code.
