Welcome to the ClipABit team!
As part of your onboarding, we want to make sure you are familiar with the software development lifecycle and up to date on the basics of building software in a team. This includes working with Git, writing Python code, testing, and submitting a PR. By the end of this task, you should have learned or refreshed all the foundational software skills you need to be a contributing member of the team.
The first step in our search engine's pipeline is processing videos. To do that, we need to extract individual frames and audio transcriptions from our clips. You will be creating a slightly modified version of that step. The goal is to open a video file, extract frames at a fixed interval, create an audio transcription of the entire clip, and then save the results to an output directory.
The first thing you need to do is clone the repository. Make sure Python and Git are installed on your computer, then run the following command:
git clone https://github.com/ClipABit/Onboarding-Project.git
Next, open the project and, before doing anything else, create a new branch from main to work on.
- Open a new terminal and create a new branch using
git checkout -b "FirstName_Onboarding"
- Replace FirstName with your own first name.
- As you work through this exercise, commit and push your changes regularly to save your work. Everything from here on should be done in your branch.
Before you can do any video processing, you need to set up your environment and download the video to your project.
- Check your Python installation and version (use whichever command is available on your system):
python3 --version
python --version
Make sure Python 3.7 or higher is installed.
- (macOS only) If you encounter SSL certificate errors when running Python scripts, run:
/Applications/Python\ 3.x/Install\ Certificates.command
Replace 3.x with your installed Python version (e.g., 3.12).
- Open a new terminal and make sure you are in the root directory of the project.
~/Onboarding-Project
- Create and activate a new virtual environment with the following commands.
macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
Windows:
python -m venv .venv
.venv\Scripts\activate
To deactivate the virtual environment when you are done, run:
deactivate
- After activation, install dependencies with:
pip install -r requirements.txt
- Download the sample footage by running
python download_video.py
You should see a new directory /video appear with a file PeterGriffin.mp4
- You are now ready to begin! Open main.py and proceed.
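If something in the setup steps above went wrong, it is easier to find out now than halfway through the exercise. The following is a minimal sketch of a sanity check, not part of the repository: the names `check_setup` and `in_virtualenv` are hypothetical, and it assumes the sample video lands at `video/PeterGriffin.mp4` relative to the project root, as described above.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 7)  # minimum version stated in this guide
SAMPLE_VIDEO = Path("video") / "PeterGriffin.mp4"  # assumed output of download_video.py


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    return sys.prefix != sys.base_prefix


def check_setup(project_root=".", version_info=sys.version_info):
    """Hypothetical helper: return a list of problems; an empty list means OK."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.7 or higher is required.")
    video = Path(project_root) / SAMPLE_VIDEO
    if not video.is_file() or video.stat().st_size == 0:
        problems.append(f"Sample video not found or empty: {video}")
    return problems


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtual environment appears to be active.")
    for problem in check_setup():
        print("Problem:", problem)
```

Run it from the project root after activating the virtual environment. No output means the checks passed.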
You will notice there are two functions left blank. Those are what you will be working on. Please refer to config.py to see all the available libraries to use.
- The function extract_frames() should read a video from the directory /video and extract one frame every interval seconds. Each frame should then be saved as a .jpg file in the output directory. Hint: you will need to use the cv2 library.
- The function extract_audio() should transcribe the audio from the video and save the transcription to the output directory. Hint: you will need to use the ffmpeg library.
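One part of extract_frames() that is easy to get subtly wrong is converting "every interval seconds" into frame numbers. The sketch below shows one possible way to do that arithmetic in isolation. It is not the expected solution, and the helper name `frame_indices` and its parameters are hypothetical, not taken from the skeleton code.

```python
def frame_indices(fps, total_frames, interval):
    """Return the frame numbers to save, one every `interval` seconds.

    fps          -- frames per second of the video (may be fractional)
    total_frames -- number of frames in the video
    interval     -- seconds between saved frames
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if interval <= 0:
        raise ValueError("interval must be positive")

    indices = []
    k = 0
    while True:
        # Timestamp k * interval seconds corresponds to this frame number.
        index = int(round(k * interval * fps))
        if index >= total_frames:
            break
        # Very small intervals can round to the same frame twice; skip repeats.
        if not indices or index != indices[-1]:
            indices.append(index)
        k += 1
    return indices


# A 10-second clip at 30 fps, sampled every 2 seconds.
print(frame_indices(fps=30, total_frames=300, interval=2))
# prints [0, 60, 120, 180, 240]
```

With OpenCV, the frame rate and frame count are usually read from an opened capture via `cv2.CAP_PROP_FPS` and `cv2.CAP_PROP_FRAME_COUNT`. The reported frame count can be approximate for some files, so a robust implementation should still handle a failed read near the end of the video.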
You are provided with skeleton code containing the necessary imports and function signatures; you only need to implement the function bodies. You may write each function however you like, and the provided outline should be enough to do so. You do not need to modify any code outside the function bodies.
This exercise is meant to get everyone on the same page about writing high-quality software. Your code should be robust, with error handling and comments where appropriate. You may use AI, but make sure you understand the code deeply and that it is of high quality. (Think: Would this code be suitable for a production environment?)
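As an illustration of what "robust" can mean in practice, the sketch below validates its inputs up front and fails with a clear message instead of crashing later with a confusing one. The function name `validate_inputs` and its parameters are hypothetical and are not part of the skeleton code. Treat it as one possible pattern, not a required structure.

```python
from pathlib import Path


def validate_inputs(video_path, output_dir, interval):
    """Check arguments before doing any work and return normalized paths.

    Raises FileNotFoundError if the video does not exist and ValueError
    if the interval is not a positive number.
    """
    video = Path(video_path)
    if not video.is_file():
        raise FileNotFoundError(f"Video not found: {video}")

    # bool is a subclass of int in Python, so reject it explicitly.
    is_number = isinstance(interval, (int, float)) and not isinstance(interval, bool)
    if not is_number or interval <= 0:
        raise ValueError(f"interval must be a positive number, got {interval!r}")

    # Create the output directory (and any missing parents) if needed.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return video, out
```

Raising specific exception types lets callers and tests distinguish "bad input" from "unexpected failure", which is harder to do when a function returns None or prints an error.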
Unit tests are provided in test_main.py using pytest.
To run the tests, go to the root directory of the project and run
pytest test_main.py
If all tests pass, you have successfully completed the project.
Note: if you get any incompatible architecture errors, run pip cache purge and try again.
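If you have not used pytest before, a test is simply a function whose name starts with test_ and that checks results with plain assert statements. The sketch below is illustrative only and is not taken from test_main.py. The helper `count_jpgs` is hypothetical. `tmp_path` is a built-in pytest fixture that provides a fresh temporary directory for each test.

```python
from pathlib import Path


def count_jpgs(directory):
    """Hypothetical helper: count the .jpg files directly inside `directory`."""
    return sum(1 for p in Path(directory).iterdir() if p.suffix.lower() == ".jpg")


def test_count_jpgs(tmp_path):
    # Arrange: create two fake frames and one unrelated file.
    (tmp_path / "frame_0000.jpg").write_bytes(b"")
    (tmp_path / "frame_0001.jpg").write_bytes(b"")
    (tmp_path / "notes.txt").write_text("not a frame")

    # Act and assert: only the .jpg files should be counted.
    assert count_jpgs(tmp_path) == 2
```

When a test fails, running pytest with the -v flag lists each test by name, which makes it easier to see which behavior is still missing.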
Once you have finished, submit a Pull Request for us to review. On GitHub, open a new PR from your branch to main, then send us a message on Discord with the link to the PR.