basboot/decode_drawings

Decode the Drawings - Fun with Radu

A very cool puzzle created by Radu Mariescu-Istodor (Lecturer in Computer Science at Karelia University of Applied Sciences) to decode drawings by analyzing video of a triangle of balls, captured from the top of a pencil.

https://radufromfinland.com/decodeTheDrawings/

Excluded directories (data)

  • drawings/ - txt files with drawing data
  • drawings_ai/ - txt files with drawing data generated by neural network
  • frames/ - screenshots for visual comparison
  • models/ - neural network model files
  • output/ - generated plots
  • output_ai/ - plots generated by neural network
  • video_locations/ - metadata for simulated videos used to train neural network
  • videos/ - videos from Radu and simulation data

Installation

Python Setup

  1. Install Python 3.12 or later if it is not already installed (earlier versions might also work)
  2. Install required Python packages:
    pip install opencv-python numpy scipy matplotlib moviepy torch tqdm scikit-learn

JavaScript Setup (for simulation)

  1. Install Node.js (if not already installed)
  2. Navigate to the JavaScript directory and install dependencies:
    cd javascript
    npm install

Data Setup

  • Place Radu's video files in the videos/ directory
  • Videos should be named as 1.mp4, 2.mp4, etc., or 1.webm, 2.webm, etc.

Usage

  1. Process video data: Run the main processing script. This extracts ball positions and audio data, and creates a plot + point data of the result:

    python process_data.py
  2. Train neural network (optional): If you want to use the AI approach:

    python train_nn.py
  3. Make predictions: Use the trained model to predict drawings. This creates a plot + point data of the result:

    python predict_nn.py
  4. Run JavaScript simulation (optional): To validate approaches or generate training data:

    cd javascript
    npm run dev

    This will start a Vite development server. Open the provided local URL (usually http://localhost:5173) in your browser to run the 3D simulation.

Set VIDEO to select the video to process. For predictions only, also set MODEL to select the video the model was trained on:

  VIDEO = "1"  # Change to process different videos

Output

  • Decoded drawings are saved as text files in drawings/ (traditional approach) or drawings_ai/ (neural network approach)
  • Visualization plots are generated in output/ or output_ai/
  • Each line in the drawing files contains x,z coordinates of the pen position

Configuration

  • Adjust parameters in global_settings.py for different camera setups or ball configurations
  • Modify filtering and processing parameters in the respective processing scripts

Languages

  • Python (for analysis)
  • JavaScript (for simulation)

Libraries

  • Three.js (3D simulation)
  • SciPy (optimization and filtering)
  • OpenCV (image analysis)
  • MoviePy (audio analysis)
  • Matplotlib (for visualization and result analysis)
  • PyTorch (for the Neural Network)

Simulation

A JavaScript simulation of the first drawing (a circle) was used to validate the solution on a known problem with no disturbances: the camera is always horizontal and pointing at the centroid.

The simulation was also used to validate the assumption that roll and pitch are less important than yaw. Update: roll cannot be ignored when combined with yaw.

The horizontal offset estimation was also validated using the simulation.

Camera Intrinsics

Both OpenCV and mathematical approaches failed at estimating the camera intrinsics - shame on me! :-(

I therefore used a ruler and visually estimated the horizontal field of view at 60 degrees, assuming square pixels and a principal point exactly at the center of the screen.
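
As a rough sketch of what these assumptions imply under a pinhole model: a 60 degree horizontal field of view fixes the focal length in pixels once the image width is known. The function names and the 1280-pixel width below are illustrative assumptions, not values from this repository.

```python
import math


def focal_length_px(image_width_px: float, hfov_deg: float = 60.0) -> float:
    """Pinhole focal length in pixels derived from the horizontal field of view."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def pixel_to_angle(x_px: float, image_width_px: float, hfov_deg: float = 60.0) -> float:
    """Horizontal angle (radians) between the optical axis and the ray through x_px.

    Assumes square pixels and a principal point at the image center.
    """
    f = focal_length_px(image_width_px, hfov_deg)
    cx = image_width_px / 2.0
    return math.atan((x_px - cx) / f)


# Hypothetical 1280-pixel-wide frame
f_px = focal_length_px(1280)
edge_angle_deg = math.degrees(pixel_to_angle(1280, 1280))
```

With square pixels the same focal length applies vertically, so the vertical field of view follows from the image height.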

Image Processing

For image processing, OpenCV was used. Due to the camera projection of 3D balls onto a 2D surface, they appear as ellipses. OpenCV can find contours, fit ellipses, and extract the center and axes of these ellipses.

Reconstructing Camera Position

Under the assumption that the camera starts 18 cm in front of the centroid of the equilateral triangle, the minor-axis lengths (in pixels) of the balls in the initial frame serve as a reference. Because apparent size is inversely proportional to distance, comparing each later minor-axis length with this reference gives the distance from that ball to the camera. These 3 distances, combined with the known viewing angle and ball positions, are used to reconstruct the apex of the viewing 'pyramid'.
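
The inverse-proportionality step can be sketched as follows. This is a minimal illustration, not the code in process_data.py: the function name is hypothetical, and the reference distance for each ball would have to be derived from the 18 cm starting assumption and the triangle geometry, which is not reproduced here.

```python
def distance_from_size(minor_axis_px: float,
                       ref_minor_axis_px: float,
                       ref_distance_cm: float) -> float:
    """Estimate the ball-to-camera distance from the apparent minor-axis length.

    Apparent size is inversely proportional to distance, so
    distance = ref_distance * ref_size / size.
    """
    if minor_axis_px <= 0:
        raise ValueError("minor axis length must be positive")
    return ref_distance_cm * ref_minor_axis_px / minor_axis_px


# Hypothetical numbers: a ball that measured 100 px at 18 cm now measures 50 px.
estimated_cm = distance_from_size(50.0, 100.0, 18.0)
```

A ball that appears half as large is therefore estimated to be twice as far away.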

Correcting Errors

Rotation (Roll)

To correct camera rotation, the angle between the blue and green balls is used. This angle changes slowly due to perspective while moving. Sudden changes are therefore likely errors. These errors are detected by subtracting a low-pass filter from the measured data. The error angle is then reversed by rotating the coordinates of the red, green, and blue balls around the center of the screen before estimating the horizontal offset.
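
A minimal sketch of this idea, assuming a centered moving average as the low-pass filter and a counter-clockwise-positive angle convention. The window size, function names, and sign convention are assumptions for illustration and may differ from the actual scripts.

```python
import math


def moving_average(values, window):
    """Centered moving average used as a simple low-pass filter.

    Near the edges the window is truncated to the available samples.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def roll_errors(angles, window=15):
    """Sudden deviations: measured angle minus its low-pass filtered version."""
    return [a - s for a, s in zip(angles, moving_average(angles, window))]


def rotate_about(point, center, angle_rad):
    """Rotate a 2D point around a center by angle_rad."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def undo_roll(balls, center, error_rad):
    """Rotate all ball centers by the negative error angle around the screen center."""
    return {name: rotate_about(p, center, -error_rad) for name, p in balls.items()}
```

Subtracting the filtered signal keeps the slow, perspective-driven change in the angle and isolates only the abrupt part, which is then rotated away.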

Horizontal Offset (Yaw)

The assumption that the camera always points at the centroid is not true due to small aiming errors. Simulation experiments show that rotation (roll), which is very small, and the y-axis (pitch) are not as problematic. The x-axis (yaw) is the most important error to correct. We take advantage of the fact that vertical lines stay vertical under perspective projection (assuming roll is small). We can therefore use the x-positions of the blue, red, and green balls along with the x-position of the screen center, combined with the known triangle size, to calculate the camera's true center and find the horizontal offset. This is done using the cross-ratio of the 4 points, which is projectively invariant. This offset is then used to shift the pencil perpendicular to the viewing direction (this is not fully accurate, but helps reduce the error).
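
One way the cross-ratio could be used here, sketched under stated assumptions: if the x-coordinates in the triangle plane map to image x-coordinates through a 1D projective map, then the cross-ratio of (blue, red, green, screen center) in the image equals the cross-ratio of the corresponding points in the triangle plane. With the three ball positions known, the fourth point (where the optical axis meets the plane) can be solved for. The ball x-positions, the synthetic `project` map, and all function names are hypothetical, and the repository's actual computation may differ.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their 1D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def solve_fourth_point(k, a, b, c):
    """Return d such that cross_ratio(a, b, c, d) == k."""
    ac = a - c
    bc = b - c
    return (k * bc * a - ac * b) / (k * bc - ac)


def optical_axis_world_x(img_blue, img_red, img_green, img_center,
                         world_blue, world_red, world_green):
    """World x-coordinate that projects onto the screen center."""
    k = cross_ratio(img_blue, img_red, img_green, img_center)
    return solve_fourth_point(k, world_blue, world_red, world_green)


def project(x):
    """Synthetic 1D projective map, used only to generate test data."""
    return (60.0 * x + 600.0) / (0.01 * x + 1.0)


# Hypothetical ball x-positions in cm and a known aiming point of 1.2 cm
world = {"blue": -4.5, "red": 0.0, "green": 4.5}
true_center = 1.2
recovered = optical_axis_world_x(
    project(world["blue"]), project(world["red"]), project(world["green"]),
    project(true_center),
    world["blue"], world["red"], world["green"],
)
```

The recovered value is the aiming point measured from the triangle's own origin, so its distance from the centroid gives the horizontal offset.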

Vertical Offset (Pitch)

As mentioned earlier, the camera doesn't always point at the triangle's center. Using the camera intrinsics and estimated distance, the angle toward (or away from) the triangle can be calculated. Simple trigonometry then determines the offset toward or away from the triangle.
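
The trigonometry might look like the sketch below. It assumes a pinhole camera with a known focal length in pixels and measures how far the triangle's centroid sits above or below the image center. The parameter `lever_cm` is a hypothetical name for the length over which the tilt acts, since the text does not state which length is used, and the sign convention is likewise an assumption.

```python
import math


def pitch_angle(centroid_y_px: float, image_height_px: float, focal_px: float) -> float:
    """Vertical aiming error (radians) from the centroid's offset to the image center."""
    dy = centroid_y_px - image_height_px / 2.0
    return math.atan2(dy, focal_px)


def pitch_offset(centroid_y_px: float, image_height_px: float,
                 focal_px: float, lever_cm: float) -> float:
    """Offset toward or away from the triangle caused by the pitch error.

    lever_cm is a hypothetical length over which the tilt acts.
    """
    return lever_cm * math.tan(pitch_angle(centroid_y_px, image_height_px, focal_px))
```

When the centroid sits exactly at the image center, the angle and the offset are both zero, so no correction is applied.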

Pen Lift Detection

To detect when the pen is lifted, the audio track is analyzed. By calculating the dB level of the video's audio and applying a low-pass filter to avoid reacting to brief sounds, it becomes clear when the pen is touching the paper and when it is in the air.
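
A minimal sketch of this kind of detection, written in plain Python on a list of audio samples. The frame size, smoothing factor `alpha`, threshold, and function names are assumptions chosen for illustration; an exponential filter stands in for whatever low-pass filter the processing script actually uses.

```python
import math


def frame_db(samples, frame_size):
    """RMS level in dB for consecutive, non-overlapping frames of samples."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # floor avoids log(0)
    return levels


def low_pass(values, alpha=0.2):
    """Exponential smoothing so that brief sounds do not flip the decision."""
    if not values:
        return []
    state = values[0]
    smoothed = []
    for v in values:
        state = alpha * v + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def pen_down(levels_db, threshold_db, alpha=0.2):
    """True where the smoothed level exceeds the (hypothetical) threshold."""
    return [level > threshold_db for level in low_pass(levels_db, alpha)]
```

The smoothing introduces a short delay before a change is registered, which is the trade-off for ignoring brief noises.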

Neural Network Approach

Simulation data was used to train a neural network to predict positions in real videos. The results were not spectacular, but still much better than expected. With more training data, results improved, but the biggest improvement came from normalizing the data to the range [-1, 1] around the screen center. Reducing the network from 4 to 3 hidden layers also improved results, indicating there was (and might still be) overfitting to the simulated training data.
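
The normalization step could look like the following sketch, which maps pixel coordinates so that the screen center becomes (0, 0) and the image edges become -1 and 1. The function name and the frame size in the example are assumptions, and the exact set of features that is normalized in train_nn.py is not shown here.

```python
def normalize_xy(x_px: float, y_px: float, width_px: float, height_px: float):
    """Map pixel coordinates to [-1, 1] with the screen center at (0, 0)."""
    cx = width_px / 2.0
    cy = height_px / 2.0
    return (x_px - cx) / cx, (y_px - cy) / cy


# Hypothetical 1280x720 frame
centered = normalize_xy(640.0, 360.0, 1280.0, 720.0)
corner = normalize_xy(0.0, 0.0, 1280.0, 720.0)
```

Centering the inputs this way keeps them on a similar scale and symmetric around zero, which is commonly helpful when training a network.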

Fun fact: I also tried using real video 1 (under the assumption that Radu drew a perfect circle at constant speed) as training data for the network, and used it to predict the star. The result, a morphed star, was far from perfect, but at the same time far better than expected! Perhaps using the (perfect) five-pointed star as training data (tune the corners!) could be an effective approach.

Note: The AI has not learned to detect pen lifting; instead, I used the audio analysis approach as in the non-AI decodings.

Future Ideas / TODO

  • Color models: Compare different color models. RGB works well in simulation, but HSV might be better for real videos
  • Filtering: Filtering makes images smoother but not necessarily more accurate. More experiments needed. Kalman filtering/sensor fusion might perform better than simple filtering
  • Position estimation: Current decoding assumes the camera is held correctly, with corrections applied afterward. Estimating position without assumptions might be more robust
  • Distance estimation: Currently based on the minor axis size of circles. More sophisticated methods exist in these papers (which I unfortunately couldn't get working):
