🍉
Fruit Ninja

computer vision edition



What

A webcam-controlled version of Fruit Ninja where your hand is the blade. Move your index finger to slice fruits, pinch your thumb and index to pause. No mouse, no keyboard, just computer vision.

Built to see if hand tracking could replace traditional input methods for simple games. I concluded that if MediaPipe can detect 21 hand landmarks in real-time, it can handle gesture-based game controls.

Source code

You can clone the repo to try it out for yourself. And please, if you want to add a feature or spot a bug, open up a pull request.


Me playing the game.


Case Study

This was the project that finally made all the “probability & statistics + linear algebra + stochastic calculus” (hit me up if you want the whole Obsidian 🔗 note I used) stuff click together into something cool. After doing this, I am definitely going to be dabbling some more in computer vision / machine learning.

It was also my first project without the usual "tight" web tooling: no hot-reload feedback, no build step, no deployment. It was just change code, run, see it work (or not), repeat.


Context


This project started as a basic hand-tracking tool: MediaPipe + OpenCV + PyAutoGUI to move the cursor using my index finger and use a pinch as a left-click. That alone could have sufficed, but being the person that I am, I really wanted to gamify it.

So I booted up Perplexity 🔗 and prompted it, "simple game ideas using hand tracking." It suggested this Fruit Ninja clone, which sounded good (I chose it because the rest of the ideas were rubbish). So I set out to build it.


Approach


Webcam capture using cv2.VideoCapture, MediaPipe’s HandLandmarker for 3D hand landmarks, and OpenCV for all rendering, text, and collision visualization. I chose to encapsulate all of this within a single loop.
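As a rough sketch of that setup (I'm using the classic mp.solutions.hands interface here to keep the snippet short; the HandLandmarker task API needs a bit more wiring but fills the same role), the initialization looks something like this:

```python
import cv2
import mediapipe as mp

# Webcam capture (device 0) and a single-hand tracker.
cap = cv2.VideoCapture(0)
hands = mp.solutions.hands.Hands(
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5,
)
```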

The “blade” is just a trail of recent index finger positions stored in a list and rendered as a line across frames. The sole reason I included it was that I needed the reassurance it was actually detecting my finger movement, since there’s no physical controller.
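Something like this is all it takes (a sketch; names like trail and TRAIL_LEN are mine, not necessarily the repo's):

```python
import cv2
from collections import deque

TRAIL_LEN = 12                   # how many recent fingertip positions to keep
trail = deque(maxlen=TRAIL_LEN)  # old points drop off the back automatically

def draw_trail(frame):
    """Render the trail as a connected polyline so the 'blade' is visible."""
    pts = list(trail)
    for a, b in zip(pts, pts[1:]):
        cv2.line(frame, a, b, (255, 255, 255), 3)
```

Each frame, the current fingertip position (in pixels) gets appended to trail, and the deque's maxlen takes care of forgetting old points.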

Slicing is a distance check between those trail points and each fruit’s center; when within radius, the fruit marks itself sliced and switches to a split animation.

Game state lives in a few scalars and lists:

  • fruits
  • scores
  • lives
  • paused
  • spawn_timer

Fruits spawn with a randomized \( x \), an initial upward velocity, and gravity, then get updated every frame until they’re sliced or fall off-screen. And lastly, a thumb–index pinch, detected via a landmark distance threshold, toggles pause / play.
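A spawn function along those lines might look like this (the fields and value ranges are illustrative guesses, not the repo's actual numbers):

```python
import random

def spawn_fruit(frame_width, frame_height):
    """Create a fruit just below the bottom edge with a random x and an upward kick."""
    return {
        "x": random.uniform(50, frame_width - 50),
        "y": frame_height + 20,
        "vx": random.uniform(-3, 3),      # slight sideways drift
        "vy": random.uniform(-22, -15),   # negative = upward in image coordinates
        "radius": 40,
        "sliced": False,
    }
```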


The Math


Under the hood this project is basically: take 3D hand landmarks, convert to 2D pixel coordinates, then use Euclidean geometry for the rest.

MediaPipe gives each hand as 21 landmarks with \( (x, y, z) \) where \(x, y\) are normalized to \([0, 1]\) in image coordinates and \(z\) is the relative depth value.

To actually get a fingertip in pixels, you do:

  • \( x_{px} = \text{landmark.x} \times \text{frame\_width} \)
  • \( y_{px} = \text{landmark.y} \times \text{frame\_height} \)
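Or, as a tiny helper (a sketch, not the repo's exact code):

```python
def to_pixels(landmark, frame_width, frame_height):
    """Map a normalized MediaPipe landmark to integer pixel coordinates."""
    return int(landmark.x * frame_width), int(landmark.y * frame_height)
```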

Once you have that, you need to detect pinching. The way I did it was simple distance thresholding between the thumb tip (landmark 4) and index finger tip (landmark 8) in normalized landmark space:

In essence, you're computing the 3D Euclidean distance:

\( d = \sqrt{(x_4 - x_8)^2 + (y_4 - y_8)^2 + (z_4 - z_8)^2} \) and treating pinch as when \( d < 0.05 \).


Conceptually, landmarks closer together in camera space \( = \) a smaller \( d \).
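In code, the whole check is a few lines (the 0.05 threshold is the one above; the helper name is mine):

```python
import math

PINCH_THRESHOLD = 0.05  # in normalized landmark units

def is_pinching(landmarks):
    """True when the thumb tip (4) and index tip (8) are close in normalized space."""
    thumb, index = landmarks[4], landmarks[8]
    d = math.sqrt(
        (thumb.x - index.x) ** 2
        + (thumb.y - index.y) ** 2
        + (thumb.z - index.z) ** 2
    )
    return d < PINCH_THRESHOLD
```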

Now for the fruit itself. Each fruit is a point mass with basic 2D kinematics, and it really has just two components:

  1. Position: \( (x, y) \)
  2. Velocity: \( (v_x, v_y) \)

And then you just update each frame:

\( v_y \leftarrow v_y + g \) (gravity term, here \( g = 0.5 \) pixels/frame²)

\( x \leftarrow x + v_x \)

\( y \leftarrow y + v_y \)


So they follow discrete parabolic arcs until sliced or off-screen (bottom). And the off-screen check is just \( y > \text{frame\_height} + \text{margin} \).
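The per-frame update is just those three lines plus the off-screen test (continuing the illustrative fruit dict from the spawn sketch):

```python
GRAVITY = 0.5  # pixels/frame², the same g as above

def update_fruit(fruit, frame_height, margin=50):
    """Advance the fruit one frame; return False once it falls past the bottom edge."""
    fruit["vy"] += GRAVITY
    fruit["x"] += fruit["vx"]
    fruit["y"] += fruit["vy"]
    return fruit["y"] <= frame_height + margin
```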


Lastly, slicing detection. Basically, each fruit is treated as a filled circle with a center \( (f_x, f_y) \) and radius \( r \). The slicing test is a point-circle distance between each trail point \( (x_i, y_i) \) and fruit center:

\( d = \sqrt{(x_i - f_x)^2 + (y_i - f_y)^2} \)


And if any point in the recent trail has \( d < r \), the fruit is counted as sliced.
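Which, sticking with the same illustrative structures, amounts to:

```python
import math

def check_slice(fruit, trail):
    """Mark the fruit sliced if any recent blade point falls inside its circle."""
    if fruit["sliced"]:
        return False
    for (x_i, y_i) in trail:
        if math.hypot(x_i - fruit["x"], y_i - fruit["y"]) < fruit["radius"]:
            fruit["sliced"] = True
            return True
    return False
```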

And that's basically it!

The whole game is essentially a loop over time where you repeatedly:

  1. Map normalized landmark coordinates to pixels
  2. Update fruit state with simple kinematics
  3. Apply distance-based tests for gestures (pinch) and slicing (trail vs. circle) in a consistent 2D coordinate frame
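Stitched together with the hypothetical helpers from the sketches above, the skeleton of that loop looks roughly like this (the actual repo differs in details like the split animation, pinch debouncing, and spawn timing):

```python
score, paused, fruits = 0, False, []

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)  # mirror the image so movement feels natural
    h, w = frame.shape[:2]

    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        trail.append(to_pixels(lm[8], w, h))   # (1) landmarks -> pixels
        if is_pinching(lm):
            paused = not paused                # (a real version debounces this)

    if not paused:
        # (spawn_timer logic that calls spawn_fruit goes here)
        fruits = [f for f in fruits if update_fruit(f, h)]  # (2) kinematics
        for f in fruits:
            if check_slice(f, trail):                       # (3) distance tests
                score += 1

    draw_trail(frame)
    cv2.imshow("Fruit Ninja CV", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```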


Outcome


This project was like scratching an itch I didn't know was there. Web development is engaging enough, but actually seeing code translate to real-time physical interaction?

That's something else entirely for someone who's only ever worked with MERN stacks. There's a certain immediacy to it that frontend work never quite captures.

Anyways, now I have a basic hand-tracking framework set up in Python if I ever decide to do anything more with CV.

Next up: maybe some actual gesture recognition, or even a full-on sign language interpreter. Who knows. Reach out, maybe a collab could happen.


Tech Stack


Core

  • Python 3.8+
  • MediaPipe (Google's ML framework for hand tracking)
  • OpenCV (video processing)
  • PyAutoGUI (mouse control)

ML Model

  • MediaPipe Hand Landmarker (pre-trained model)

Platform

  • Cross-platform (macOS, Windows, Linux)
  • Webcam-based input