Decode the Drawings - Fun with Radu

A very cool puzzle created by Radu Mariescu-Istodor (Lecturer in Computer Science at Karelia University of Applied Sciences): decode drawings by analyzing video of a triangle of balls, filmed by a camera on top of the pencil.

https://radufromfinland.com/decodeTheDrawings/

Excluded directories (data)

drawings/ - txt files with drawing data
drawings_ai/ - txt files with drawing data generated by neural network
frames/ - screenshots for visual comparison
models/ - neural network model files
output/ - generated plots
output_ai/ - plots generated by neural network
video_locations/ - metadata for simulated videos used to train neural network
videos/ - videos from Radu and simulation data

Installation

Python Setup

Install Python 3.12+ if not already installed (earlier versions might also work)

Install required Python packages:

pip install opencv-python numpy scipy matplotlib moviepy torch tqdm scikit-learn

JavaScript Setup (for simulation)

Install Node.js (if not already installed)
Navigate to the JavaScript directory and install dependencies:
```
cd javascript
npm install
```

Data Setup

Place Radu's video files in the videos/ directory
Videos should be named as 1.mp4, 2.mp4, etc., or 1.webm, 2.webm, etc.

Usage

Process video data: Run the main processing script. This extracts ball positions and audio data, and creates a plot + point data of the result:
```
python process_data.py
```
Train neural network (optional): If you want to use the AI approach:
```
python train_nn.py
```
Make predictions: Use the trained model to predict drawings. This creates a plot + point data of the result:
```
python predict_nn.py
```
Run JavaScript simulation (optional): To validate approaches or generate training data:
```
cd javascript
npm run dev
```
This will start a Vite development server. Open the provided local URL (usually http://localhost:5173) in your browser to run the 3D simulation.

Set VIDEO to select the video to process. For predictions only, also set MODEL to select the video the model was trained on:

  VIDEO = "1"  # Change to process different videos

Output

Decoded drawings are saved as text files in drawings/ (traditional approach) or drawings_ai/ (neural network approach)
Visualization plots are generated in output/ or output_ai/
Each line in the drawing files contains x,z coordinates of the pen position
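
A minimal sketch of reading such a file back in. It assumes one point per line with x and z separated by a comma; the exact delimiter is an assumption, and `load_points` is a hypothetical helper, not part of the repository:

```python
import io


def load_points(lines):
    """Parse lines of 'x,z' text into a list of (x, z) float tuples."""
    points = []
    for line in lines:
        # Accept comma or whitespace as separator (assumed format).
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            points.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue  # skip lines that are not numeric
    return points


# In-memory stand-in for a file in drawings/
sample = io.StringIO("0.0,1.5\n2.25,-3.0\n\n")
points = load_points(sample)
print(points)
```

With a real file, the same function could be called as `load_points(open(path))`, where `path` points at one of the txt files.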

Configuration

Adjust parameters in global_settings.py for different camera setups or ball configurations
Modify filtering and processing parameters in the respective processing scripts

Languages

Python (for analysis)
JavaScript (for simulation)

Libraries

Three.js (3D simulation)
SciPy (optimization and filtering)
OpenCV (image analysis)
MoviePy (audio analysis)
Matplotlib (for visualization and result analysis)
PyTorch (for the Neural Network)

Simulation

A JavaScript simulation of the first drawing (a circle) was used to validate the solution on a known problem with no disturbances: the camera is always horizontal and pointing at the centroid.

The simulation was also used to validate the assumption that roll and pitch are less important than yaw. Update: roll cannot be ignored when combined with yaw.

The horizontal offset estimation was also validated using the simulation.

Camera Intrinsics

Both OpenCV and mathematical approaches failed at estimating the camera intrinsics - shame on me! :-(

I therefore used a ruler and visually estimated the horizontal field of view at 60 degrees, assuming square pixels and a principal point exactly at the center of the screen.
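
Under these assumptions the pinhole intrinsics follow directly from the field of view. The sketch below shows the arithmetic; the 1280x720 frame size is a made-up example value, not taken from the videos:

```python
import math


def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)


WIDTH, HEIGHT = 1280, 720           # hypothetical frame size
f = focal_length_px(WIDTH, 60.0)    # same focal length in x and y (square pixels)
cx, cy = WIDTH / 2, HEIGHT / 2      # principal point at the screen center

# The vertical field of view follows from the same focal length.
vertical_fov_deg = math.degrees(2 * math.atan((HEIGHT / 2) / f))
print(round(f, 2), round(vertical_fov_deg, 2))
```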

Image Processing

For image processing, OpenCV was used. Due to the camera projection of 3D balls onto a 2D surface, they appear as ellipses. OpenCV can find contours, fit ellipses, and extract the center and axes of these ellipses.

Reconstructing Camera Position

Assuming the camera starts 18 cm in front of the centroid of the equilateral triangle, the initial minor-axis lengths of the balls (in pixels) serve as a reference for calculating the distances from the balls to the camera, using the fact that apparent size is inversely proportional to distance. These 3 distances, combined with the known viewing angle and ball positions, are used to reconstruct the apex of the viewing 'pyramid'.
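
One way to illustrate the idea is classic three-sphere trilateration. This is a sketch under assumed ball coordinates, not necessarily how the repository solves it; all function names and numbers are hypothetical:

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance_from_size(ref_distance, ref_size_px, size_px):
    """Apparent size is inversely proportional to distance."""
    return ref_distance * ref_size_px / size_px


def trilaterate(p1, p2, p3, r1, r2, r3):
    """Return the two points at distances r1, r2, r3 from p1, p2, p3."""
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1 / d)
    i = dot(ex, sub(p3, p1))
    ey_raw = sub(sub(p3, p1), scale(ex, i))
    ey = scale(ey_raw, 1 / norm(ey_raw))
    ez = cross(ex, ey)
    j = dot(ey, sub(p3, p1))
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z = math.sqrt(max(r1**2 - x**2 - y**2, 0.0))  # clamp noise below zero
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return add(base, scale(ez, z)), sub(base, scale(ez, z))


# Hypothetical ball centers (cm) in the plane of the triangle.
balls = [(-5.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 8.66, 0.0)]
true_camera = (1.0, 2.0, 18.0)
dists = [norm(sub(true_camera, b)) for b in balls]

front, back = trilaterate(*balls, *dists)
print(front)  # the solution on the camera's side of the triangle
```

The second solution is the mirror image behind the plane of the triangle, so the caller has to pick the one on the camera's side.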

Correcting Errors

Rotation (Roll)

To correct camera rotation, the angle of the line through the blue and green balls is used. This angle changes slowly due to perspective while moving, so sudden changes are likely errors. These errors are detected by subtracting a low-pass-filtered version of the signal from the measured data. The error angle is then reversed by rotating the coordinates of the red, green, and blue balls around the center of the screen before estimating the horizontal offset.
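
A rough sketch of this correction, using a centered moving average as a stand-in for the low-pass filter (the actual filter and window size used in the scripts may differ; all names here are hypothetical):

```python
import math


def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def roll_errors(angles, window=15):
    """Residual between measured angles and their low-pass-filtered version."""
    smooth = moving_average(angles, window)
    return [a - s for a, s in zip(angles, smooth)]


def line_angle(blue, green):
    """Angle (radians) of the line from the blue ball to the green ball."""
    return math.atan2(green[1] - blue[1], green[0] - blue[0])


def rotate_about(point, center, angle):
    """Rotate a 2D point around a center by the given angle in radians."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


# Synthetic angle track: flat, with one sudden spike at frame 10.
angles = [0.0] * 21
angles[10] = 0.21
errors = roll_errors(angles)

# Undo the detected roll for one ball position in that frame.
screen_center = (640.0, 360.0)
corrected = rotate_about((700.0, 360.0), screen_center, -errors[10])
print(errors[10], corrected)
```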

Horizontal Offset (Yaw)

The assumption that the camera always points at the centroid is not true due to small aiming errors. Simulation experiments show that rotation (roll), which is very small, and the y-axis (pitch) are not as problematic. The x-axis (yaw) is the most important error to correct. We take advantage of the fact that vertical lines stay vertical under perspective projection (assuming roll is small). We can therefore use the x-positions of the blue, red, and green balls along with the x-position of the screen center, combined with the known triangle size, to calculate the camera's true center and find the horizontal offset. This is done using the cross-ratio of the 4 points, which is projectively invariant. This offset is then used to shift the pencil perpendicular to the viewing direction (this is not fully accurate, but helps reduce the error).
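
The invariance that this step relies on can be checked numerically. The sketch below only demonstrates that the cross-ratio of four collinear points survives a 1D projective map; it is not the offset calculation itself, and the map coefficients are arbitrary example values:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, p=2.0, q=1.0, r=1.0, s=3.0):
    """A 1D projective transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)


world = [0.0, 1.0, 2.0, 4.0]                 # positions along a line in the scene
image = [projective_map(x) for x in world]   # where they land on the image line

print(cross_ratio(*world))   # 1.5
print(cross_ratio(*image))   # 1.5 up to floating-point rounding
```

Because the value is the same before and after projection, three known points plus the measured cross-ratio pin down the position of the fourth.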

Vertical Offset (Pitch)

As mentioned earlier, the camera doesn't always point at the triangle's center. Using the camera intrinsics and estimated distance, the angle toward (or away from) the triangle can be calculated. Simple trigonometry then determines the offset toward or away from the triangle.
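
One plausible reading of that trigonometry is sketched below. The function name, the sign convention, and the example numbers are assumptions made for illustration:

```python
import math


def pitch_offset(centroid_y_px, principal_y_px, focal_px, distance):
    """Return (pitch angle in radians, offset in the same unit as distance).

    The angle is how far the triangle centroid sits above or below the
    principal point; the offset is that angle projected over the distance.
    """
    angle = math.atan2(centroid_y_px - principal_y_px, focal_px)
    return angle, distance * math.tan(angle)


# Hypothetical values: centroid 100 px off center, f = 1000 px, 18 cm away.
angle, offset = pitch_offset(460.0, 360.0, 1000.0, 18.0)
print(math.degrees(angle), offset)
```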

Pen Lift Detection

To detect when the pen is lifted, the audio track is analyzed. By calculating the dB level of the audio and applying a low-pass filter to avoid reacting to brief sounds, it becomes clear when the pen is touching the paper versus when it's in the air.
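
A simplified sketch of this idea on synthetic samples. It assumes that a louder signal means the pen is on the paper; the frame size, smoothing window, and threshold are made-up values, and a moving average stands in for the low-pass filter:

```python
import math


def frame_db(samples, frame_size):
    """RMS level in dB for consecutive non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        levels.append(20 * math.log10(rms + 1e-12))  # epsilon avoids log(0)
    return levels


def smooth(values, window):
    """Centered moving average so brief sounds do not flip the state."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def pen_down(samples, frame_size=100, window=3, threshold_db=-30.0):
    """True for frames whose smoothed level exceeds the threshold."""
    levels = smooth(frame_db(samples, frame_size), window)
    return [lvl > threshold_db for lvl in levels]


# Synthetic audio: a loud stretch (drawing) followed by a quiet one (pen lifted).
loud = [0.5 * math.sin(0.3 * n) for n in range(1000)]
quiet = [0.001 * math.sin(0.3 * n) for n in range(1000)]
states = pen_down(loud + quiet)
print(states)
```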

Neural Network Approach

Simulation data was used to train a neural network to predict positions in real videos. The results were not spectacular, but still much better than expected. With more training data, results improved, but the biggest improvement came from normalizing the data to the range [-1, 1] around the screen center. Reducing the network from 4 to 3 hidden layers also improved results, indicating there was (and might still be) overfitting to the simulated training data.
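
The normalization step might look like the sketch below, here applied to pixel coordinates. Which features were actually normalized is not stated above, so the inputs and the frame size are assumptions:

```python
def normalize_point(x_px, y_px, width_px, height_px):
    """Map pixel coordinates to [-1, 1] with (0, 0) at the screen center."""
    cx, cy = width_px / 2, height_px / 2
    return (x_px - cx) / cx, (y_px - cy) / cy


WIDTH, HEIGHT = 1280, 720  # hypothetical frame size
print(normalize_point(0, 0, WIDTH, HEIGHT))        # (-1.0, -1.0)
print(normalize_point(640, 360, WIDTH, HEIGHT))    # (0.0, 0.0)
print(normalize_point(1280, 720, WIDTH, HEIGHT))   # (1.0, 1.0)
```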

Fun fact: I also tried using real video 1 (under the assumption that Radu has drawn a perfect circle with constant speed) as training data for the network, and used it to predict the star. The result - a morphed star - was far from perfect, but at the same time far better than expected! Perhaps using the (perfect) five-pointed star as training data (tune the corners!) could be an effective approach.

Note: The AI has not learned to detect pen lifting; instead, I used the audio analysis approach as in the non-AI decodings.

Future Ideas / TODO

Color models: Compare different color models. RGB works well in simulation, but HSV might be better for real videos
Filtering: Filtering makes images smoother but not necessarily more accurate. More experiments needed. Kalman filtering/sensor fusion might perform better than simple filtering
Position estimation: Current decoding assumes the camera is held correctly, with corrections applied afterward. Estimating position without assumptions might be more robust
Distance estimation: Currently based on the minor axis size of circles. More sophisticated methods exist in these papers (which I unfortunately couldn't get working):
- Single View 3D Reconstruction under an Uncalibrated Camera and an Unknown Mirror Sphere
- A Minimal Solution for Image-Based Sphere Estimation
