Rethinking the Functionality of Latent Representation: A Logarithmic Rate-Distortion Model for Learned Image Compression (TCSVT2025)
Please feel free to contact Ziqing Ge (ziqing.ge@vipl.ict.ac.cn) if you have any questions.
End-to-end optimized Learned Image Compression (LIC) has demonstrated remarkable Rate-Distortion (R-D) efficiency. However, the R-D characteristics of LIC codecs remain underexplored. Previous research has investigated R-D behavior through numerical and statistical approaches, but these methods provide only empirical results and lack theoretical insight. In this work, we introduce a novel methodology for studying the R-D characteristics of LIC. Rethinking the LIC paradigm from a fresh perspective, we propose a plug-and-play module, the Latent-domain Auto-Encoder (LAE). This approach not only naturally leads to Variable Bit-Rate (VBR) compression, but also allows for theoretical modeling of the R-D behavior of LIC codecs. Our findings reveal that the bit-rate is the logarithmic sum of the neurons in the latent representation.
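As a very rough illustration of the idea (a minimal sketch, not the paper's exact architecture; the channel widths and the 1x1-convolution design below are assumptions), an LAE is a small network that re-encodes the codec's latent and reconstructs it entirely in the latent domain:

```python
import torch
import torch.nn as nn

class LatentAutoEncoder(nn.Module):
    """Illustrative latent-domain auto-encoder (LAE) sketch.

    Operates on the latent y produced by a codec's analysis transform,
    not on pixels. `latent_ch` and `code_ch` are hypothetical knobs.
    """

    def __init__(self, latent_ch: int = 320, code_ch: int = 128):
        super().__init__()
        # 1x1 convolutions keep the spatial layout and act per latent vector.
        self.enc = nn.Sequential(
            nn.Conv2d(latent_ch, code_ch, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(code_ch, code_ch, kernel_size=1),
        )
        self.dec = nn.Sequential(
            nn.Conv2d(code_ch, code_ch, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(code_ch, latent_ch, kernel_size=1),
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # Re-encode and reconstruct the latent; the inner code is the
        # representation that would be entropy-coded.
        return self.dec(self.enc(y))
```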
models/tcm.py includes the model definitions, where TCM_VBR is our proposed VBR method integrated into the TCM model of Liu2023 [46]. In this file, the module fcn represents the proposed LAE.
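For orientation only, the sketch below shows one plausible way such a plug-and-play module could sit between a frozen codec's analysis and synthesis transforms. The g_a/g_s names follow the CompressAI convention, the gain-based rate control is a stand-in borrowed from gain-unit VBR methods, and the entropy coding step is omitted; none of this is confirmed to match the repo's wiring:

```python
import torch
import torch.nn as nn

class VBRCodecSketch(nn.Module):
    """Hypothetical wiring of an LAE into a frozen pretrained codec (illustration only)."""

    def __init__(self, base_codec: nn.Module, lae: nn.Module):
        super().__init__()
        self.base = base_codec  # pixel-domain codec, e.g. a TCM backbone
        self.lae = lae          # plug-and-play latent-domain auto-encoder

    def forward(self, x: torch.Tensor, gain: float = 1.0) -> torch.Tensor:
        y = self.base.g_a(x)        # analysis transform: image -> latent
        # Scaling the latent by a gain is one common way to move along the
        # R-D curve (cf. gain units); entropy coding is omitted for brevity.
        y_hat = self.lae(y * gain) / gain
        return self.base.g_s(y_hat) # synthesis transform: latent -> image
```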
train_latent_e2e.py is a toy experiment that explores the relationship between latents at different bit-rates.
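Such an exploration might look roughly like the following, here using the CompressAI model zoo purely as a stand-in (downloads pretrained weights on first run; the actual script may differ):

```python
import torch
import torch.nn.functional as F
from compressai.zoo import cheng2020_anchor

# Two pretrained codecs at different qualities (these two share the same latent width).
low = cheng2020_anchor(quality=1, pretrained=True).eval()
high = cheng2020_anchor(quality=3, pretrained=True).eval()

x = torch.rand(1, 3, 256, 256)  # stand-in for a real image batch

with torch.no_grad():
    y_low = low.g_a(x)    # latent from the lower-rate model
    y_high = high.g_a(x)  # latent from the higher-rate model

# A crude proxy for the "relationship": cosine similarity of the flattened latents.
cos = F.cosine_similarity(y_low.flatten(1), y_high.flatten(1), dim=1)
print(f"latent cosine similarity: {cos.item():.3f}")
```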
train_vbr.py is the main training entry point, where you may modify some configurations, e.g. the dataset path.
eval_vbr.py is the test script for evaluating the R-D performance of the VBR model, and online_training_vbr.py implements the proposed DLM approach.
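As general background only (not necessarily the DLM procedure itself), "online training" in LIC typically means per-image, test-time refinement of the latent against the joint R-D loss; a minimal sketch assuming a CompressAI-style factorized-prior interface:

```python
import torch
import torch.nn.functional as F

def refine_latent(codec, x, lmbda: float = 0.01, steps: int = 100, lr: float = 1e-3):
    """Generic per-image (online) latent refinement, shown as background only.

    `codec` is assumed to expose g_a / g_s and an `entropy_bottleneck`
    returning (y_hat, likelihoods), mirroring the CompressAI factorized-prior
    interface; this is not necessarily this repository's DLM procedure.
    """
    codec.train()  # keep the noise-based quantization differentiable
    for p in codec.parameters():
        p.requires_grad_(False)  # only the latent is optimized, not the codec

    y = codec.g_a(x).detach().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    num_pixels = x.size(0) * x.size(2) * x.size(3)

    for _ in range(steps):
        opt.zero_grad()
        y_hat, y_likelihoods = codec.entropy_bottleneck(y)
        x_hat = codec.g_s(y_hat)
        bpp = -torch.log2(y_likelihoods).sum() / num_pixels  # rate in bits per pixel
        loss = bpp + lmbda * F.mse_loss(x_hat, x)            # joint R-D objective
        loss.backward()
        opt.step()
    return y.detach()
```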
To get the final BD-rate performance, run:

```
python online_training_vbr.py -l [0~5] -d [Your dataset directory]
```
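For reference, BD-rate summarizes the average bitrate difference between two R-D curves by fitting cubic polynomials to log-rate as a function of PSNR and integrating over the shared quality range; a standard self-contained implementation (independent of this repo's scripts) is:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate change (%) of `test` vs `anchor`."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    p_a = np.asarray(psnr_anchor, dtype=float)
    p_t = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of PSNR for each curve.
    fit_a = np.polyfit(p_a, lr_a, 3)
    fit_t = np.polyfit(p_t, lr_t, 3)

    # Overlapping PSNR interval shared by both curves.
    lo = max(p_a.min(), p_t.min())
    hi = min(p_a.max(), p_t.max())

    # Integrate each fit over the shared quality range.
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100.0
```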
This repository does not contain the original code of our TCSVT2025 paper; the code has been rearranged and re-implemented by us. However, the core code is almost identical, and the results are consistent with those reported in the paper.