How to install the environment
$LIPS directory structure
ㄴconfigs
this is where we keep config files, which follow the structure of lrw_resnet18_mstcn.yaml
ㄴdata
this is where we keep all our datasets including…
lipread_mp4 (LRW dataset)
visual_data (preprocessed LRW landmarks)
ㄴdatasets
this is where we keep our dataloaders, which inherit from base_dataset.py
lrw_dataset is the dataloader for the LRW dataset
ㄴlabels
labels for the dataloaders
ㄴlandmarks
original LRW landmarks
ㄴmodels
this is where we keep our models, which inherit from base_model.py
Each model is indexed by its name (e.g. 'lipreading') and is a class
with the following functions: __init__, forward, learn, evaluate
lipreading.py is the default model that we use (a minimal sketch of
this interface appears after the directory tree)
ㄴpreprocessing
data augmentation
ㄴutils
dataloading and training utils
ㄴdata.py
data scheduler used in train/test
ㄴmain
includes train_model, test_model in one main file
ㄴtrain.py
ㄴtest.py
ㄴrequirements.txt
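As noted in the models entry above, every model (and, analogously, every dataloader) plugs in through a small common interface. Below is a minimal sketch of that interface; the constructor arguments and the learn/evaluate signatures are assumptions for illustration, not the actual code in base_model.py.

import torch
import torch.nn as nn

class Lipreading(nn.Module):  # the real class would inherit from base_model.py
    def __init__(self, num_classes=500):  # LRW has 500 word classes
        super().__init__()
        # placeholder backbone; the actual model is a ResNet-18 + MS-TCN
        self.backbone = nn.Linear(96 * 96, num_classes)

    def forward(self, x):
        # x: (batch, frames, 96, 96) grayscale mouth crops
        return self.backbone(x.flatten(2)).mean(dim=1)

    def learn(self, batch, optimizer):
        # one optimization step on a (video, label) batch
        video, label = batch
        loss = nn.functional.cross_entropy(self(video), label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    @torch.no_grad()
    def evaluate(self, batch):
        # top-1 word accuracy on a batch
        video, label = batch
        pred = self(video).argmax(dim=1)
        return (pred == label).float().mean().item()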
As described in the paper, each video sequence from the LRW dataset is processed by 1) performing face detection and face alignment, 2) aligning each frame to a reference mean face shape, 3) cropping a fixed 96 × 96 pixel ROI from the aligned face image so that the mouth region is always roughly centered in the crop, and 4) converting the cropped image to gray level.
You can run the pre-processing script provided in the preprocessing folder to extract the mouth ROIs.
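For illustration, the four steps per frame might look roughly like the sketch below, assuming precomputed 68-point landmarks and a reference mean face shape are available; the function name and signature here are hypothetical, and the actual implementation is preprocessing/crop_mouth_from_video.py.

import cv2
import numpy as np

CROP_SIZE = 96  # fixed ROI size in pixels, as described above

def crop_mouth(frame, landmarks, mean_face):
    """frame: HxWx3 image; landmarks, mean_face: (68, 2) float arrays."""
    # steps 1)-2): estimate a similarity transform from the detected
    # landmarks to the reference mean face shape and warp the frame
    tform, _ = cv2.estimateAffinePartial2D(
        landmarks.astype(np.float32), mean_face.astype(np.float32))
    aligned = cv2.warpAffine(frame, tform, (frame.shape[1], frame.shape[0]))

    # step 3): crop a fixed 96 x 96 ROI centered on the mouth landmarks
    # (points 48-67 in the standard 68-point convention)
    mouth = cv2.transform(
        landmarks[48:68].reshape(-1, 1, 2).astype(np.float32), tform
    ).reshape(-1, 2)
    cx, cy = mouth.mean(axis=0).astype(int)
    half = CROP_SIZE // 2
    roi = aligned[cy - half:cy + half, cx - half:cx + half]

    # step 4): convert the crop to gray level
    return cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)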
- Clone the repository into a directory. We refer to that directory as
$LIPS.
git clone https://github.com/ib33x99/lips.git
- Install all required packages.
pip install -r requirements.txt
- Download the pre-computed landmarks from GoogleDrive or BaiduDrive (key: kumy) and unzip them to the
$LIPS/landmarks/ folder.
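For example, assuming the downloaded archive is named LRW_landmarks.zip (the actual file name may differ):
unzip LRW_landmarks.zip -d $LIPS/landmarks/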
- Pre-process mouth ROIs using the script crop_mouth_from_video.py in the preprocessing folder and save them to
$LIPS/data/visual_data/.
- Place the original LRW dataset, which includes the timestamp files (.txt), in
$LIPS/data/lipread_mp4.
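The expected layout follows the standard LRW release structure, for example:
$LIPS/data/lipread_mp4/ABOUT/train/ABOUT_00001.mp4
$LIPS/data/lipread_mp4/ABOUT/train/ABOUT_00001.txt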
- Before running the code, make a $LIPS/logs directory and link a subdirectory of logs to
$LIPS/log. See the following shell commands for an example.
mkdir logs
mkdir logs/1225
ln -s logs/1225 log
(rm log erases the symlink without deleting the original logs directory)
- Train a visual-only model. This saves the config file you used inside the specified log directory.
python main --config configs/lrw_resnet18_mstcn.yaml -l log/train_visual
- To avoid editing the config file every time, you can use --override.
python main --config configs/lrw_resnet18_mstcn.yaml --override "num_workers=0" -l debug/1
python main --config configs/lrw_resnet18_mstcn.yaml --override "eval_step=1|summary_step=100|ckpt_step=1000" -l debug/2
- Resume from a checkpoint. Pass a checkpoint path (.ckpt) <CHECKPOINT-PATH> to --resume-ckpt, or a log directory to --resume-latest-ckpt, to resume training.
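For example (the checkpoint path and log directories are placeholders):
python main --config configs/lrw_resnet18_mstcn.yaml --resume-ckpt <CHECKPOINT-PATH> -l log/train_visual
python main --config configs/lrw_resnet18_mstcn.yaml --resume-latest-ckpt log/train_visual -l log/train_visual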
- Evaluate the visual-only performance (lipreading).
python main --config configs/lrw_resnet18_mstcn.yaml --resume-latest-ckpt log/train_visual -l log/test_visual --test

Most of the code is based on the following papers:
@INPROCEEDINGS{ma2020towards,
author={Ma, Pingchuan and Martinez, Brais and Petridis, Stavros and Pantic, Maja},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Towards Practical Lipreading with Distilled and Efficient Models},
year={2021},
pages={7608-7612},
doi={10.1109/ICASSP39728.2021.9415063}
}
@INPROCEEDINGS{martinez2020lipreading,
author={Martinez, Brais and Ma, Pingchuan and Petridis, Stavros and Pantic, Maja},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Lipreading Using Temporal Convolutional Networks},
year={2020},
pages={6319-6323},
doi={10.1109/ICASSP40776.2020.9053841}
}

It is noted that the code can only be used for comparative or benchmarking purposes, and only for non-commercial purposes under the terms of the License.