CLIPImageProcessor resizes every input image to [448, 448], regardless of its original resolution. So the model's input size is effectively fixed, right?
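To make the question concrete, here is a minimal pure-Python sketch of the shape arithmetic. It does not call CLIPImageProcessor. It assumes the processor maps every image straight to 448x448 as described above. The helper names `preprocess_shape` and `num_patches` are hypothetical, and `patch_size=14` is an assumed value, not something read from this model's config.

```python
def preprocess_shape(height, width, target=(448, 448), channels=3):
    """Shape (C, H, W) of one preprocessed image.

    Assumption: the processor resizes directly to `target`, so the
    original height and width do not affect the output shape.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    return (channels, target[0], target[1])


def num_patches(shape, patch_size=14):
    """Count non-overlapping square patches for a (C, H, W) shape.

    patch_size=14 is an assumed value for illustration only.
    """
    _, h, w = shape
    return (h // patch_size) * (w // patch_size)


# Very different input resolutions all end up with the same shape,
# and therefore the same number of patches for the vision encoder.
for h, w in [(224, 224), (1080, 1920), (3000, 500)]:
    shape = preprocess_shape(h, w)
    print((h, w), "->", shape, "patches:", num_patches(shape))
```

If that assumption holds, the vision encoder always sees the same number of patches, and images with an aspect ratio far from 1:1 would be distorted rather than padded or tiled.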