This work leverages pre-trained CLIP and DistilBERT models to train a network that performs image-based question answering on the Toronto COCO-QA dataset. Image embeddings are derived from a pre-trained CLIP vision encoder and concatenated with the word embedding sequence of the question. A DistilBERT language model takes the resulting representation as input and generates an answer to the image-based question. Our results show significant improvements in test accuracy when the CLIP embeddings are first passed through a small MLP mapping network to obtain the final image embeddings, with further gains after fine-tuning the DistilBERT model for the task.
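For reference, a minimal PyTorch sketch of this architecture is shown below. The checkpoint names, the single-token image prefix, and the classification head over a fixed answer vocabulary are assumptions for illustration, not the exact implementation in clipqa_bert.py.

```python
# Minimal sketch of the architecture described above. Assumed details:
# checkpoint names, prepending one mapped image token, and treating the
# single-word COCO-QA answers as a fixed classification vocabulary.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, DistilBertModel

class CLIPQA(nn.Module):
    def __init__(self, num_answers, num_layers=2, hidden_dim=1024, freeze_txt=True):
        super().__init__()
        self.clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        self.bert = DistilBertModel.from_pretrained("distilbert-base-uncased")
        if freeze_txt:  # optionally keep the language model weights frozen
            for p in self.bert.parameters():
                p.requires_grad = False

        # Small MLP mapping network: CLIP image embedding -> DistilBERT embedding space.
        clip_dim, bert_dim = self.clip.config.hidden_size, self.bert.config.dim
        layers, in_dim = [], clip_dim
        for _ in range(num_layers - 1):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, bert_dim))
        self.mapper = nn.Sequential(*layers)

        # num_answers = size of the answer vocabulary (dataset-dependent).
        self.classifier = nn.Linear(bert_dim, num_answers)

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.clip(pixel_values=pixel_values).pooler_output          # (B, clip_dim)
        img = self.mapper(img).unsqueeze(1)                               # (B, 1, bert_dim)
        txt = self.bert.embeddings.word_embeddings(input_ids)             # (B, T, bert_dim)
        inputs_embeds = torch.cat([img, txt], dim=1)                      # image token + question tokens
        img_mask = torch.ones(img.shape[:2], dtype=attention_mask.dtype,
                              device=attention_mask.device)
        out = self.bert(inputs_embeds=inputs_embeds,
                        attention_mask=torch.cat([img_mask, attention_mask], dim=1))
        return self.classifier(out.last_hidden_state[:, 0])               # logits over answer vocabulary
```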
The Toronto COCO-QA dataset can be found here.
Images are sourced from Microsoft's 2014 COCO dataset, which can be found here.
clipqa_bert.py handles preprocessing the raw data (expected in a /data folder), training the architecture with the parameters shown below, and evaluating performance on the test set.
--model_name: model name used to save the model checkpoint
--checkpoint: model checkpoint to initialize the model with
--batch_size: training and eval batch size
--learning_rate: training learning rate
--epochs: number of training epochs
--eval_steps: number of training iterations between evaluations on the validation set
--num_layers: number of linear layers for intermediate MLP mapping network
--hidden_dim: hidden dimension of the MLP network
--freeze_txt: whether or not language model weights should be frozen during training
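These flags might be wired up with argparse roughly as follows (a sketch only; the actual argument definitions in clipqa_bert.py may differ, and the defaults here simply mirror the example command below):

```python
# Assumed argparse wiring for the parameters listed above.
import argparse

def str2bool(v: str) -> bool:
    # Accept "True"/"False" style values, as in the example command below.
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(description="Train CLIP + DistilBERT on COCO-QA")
parser.add_argument("--model_name", type=str, required=True,
                    help="name used to save the model checkpoint")
parser.add_argument("--checkpoint", type=str, default=None,
                    help="checkpoint to initialize the model with")
parser.add_argument("--batch_size", type=int, default=8)
parser.add_argument("--learning_rate", type=float, default=1e-4)
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--eval_steps", type=int, default=400,
                    help="training iterations between validation evaluations")
parser.add_argument("--num_layers", type=int, default=2,
                    help="linear layers in the MLP mapping network")
parser.add_argument("--hidden_dim", type=int, default=1024)
parser.add_argument("--freeze_txt", type=str2bool, default=True,
                    help="freeze the language model weights during training")
args = parser.parse_args()
```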
An example command to train a model with 2 linear layers in the mapping network is shown below:
python clipqa_bert.py --model_name clipqa_mlp2_e1 --batch_size 8 --learning_rate 1e-4 --epochs 1 --eval_steps 400 --num_layers 2 --hidden_dim 1024 --freeze_txt True
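Since fine-tuning DistilBERT gave further accuracy gains, the language model can presumably be unfrozen by setting --freeze_txt to False (the model name below is arbitrary):

python clipqa_bert.py --model_name clipqa_mlp2_ft_e1 --batch_size 8 --learning_rate 1e-4 --epochs 1 --eval_steps 400 --num_layers 2 --hidden_dim 1024 --freeze_txt False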
A paper discussing our implementation and results can be found in clipqa.pdf. The /results directory contains training and validation loss plots for the various architectures we experimented with.