
GenAI - more than just cute cats on the moon

  • Writer: Tommaso Pardi
  • Feb 22
  • 2 min read

I spend my days at the intersection of code, hardware, and cutting-edge AI, and lately I’ve been diving deep into diffusion models, a fascinating class of generative AI that’s transforming how we approach data synthesis and perception in robotics. From generating realistic sensor data to enhancing image-based navigation, diffusion models are proving to be a game-changer. Here’s what I’ve been exploring and why it matters for robotics.


What Are Diffusion Models?


Diffusion models, like the Denoising Diffusion Probabilistic Models (DDPMs) I’ve been experimenting with, work by learning to reverse a gradual noising process. Imagine taking a clear image, adding noise until it’s unrecognizable, and then training a neural network to peel back that noise layer by layer. The result? A model that can generate high-quality samples—think crisp images or coherent sensor readings—from random noise. In my latest project, I implemented a DDPM to generate MNIST digits, but the real potential lies in robotics applications.
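
To make the mechanics concrete, here is a minimal PyTorch sketch of the forward (noising) half of a DDPM, assuming a linear beta schedule over T timesteps; the names and values are illustrative, not my exact implementation.

import torch

T = 1000                                        # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)       # cumulative products, one per timestep

def add_noise(x0: torch.Tensor, t: torch.Tensor):
    # Produce x_t, a noisier version of the clean image x0 at timestep t.
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)     # broadcast over image dimensions
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise                            # the noise is the network's training target

The network is then trained to predict that injected noise, so that at inference time it can be peeled away step by step.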


Why Robotics?


Robots rely heavily on perception—interpreting camera feeds, LiDAR point clouds, or tactile sensor data to navigate and interact with the world. Traditional methods often struggle with noisy or incomplete data, especially in unstructured environments like warehouses or outdoor terrains. Diffusion models offer a robust solution:


  • Data Augmentation: Generate synthetic training data to simulate rare edge cases (e.g., occluded objects or low-light conditions); see the sampling sketch after this list.

  • Denoising: Clean up sensor noise in real-time, improving object detection or SLAM (Simultaneous Localization and Mapping).

  • Simulation-to-Reality: Bridge the gap between simulated and real-world data by generating realistic variations.
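
As a concrete example of the data-augmentation use case, here is a hedged sketch of ancestral DDPM sampling: start from pure noise and repeatedly denoise to obtain a synthetic image (a camera frame, a depth map, and so on). The model argument is assumed to be a trained network that predicts the injected noise, mirroring the MNIST setup described below.

import torch

@torch.no_grad()
def sample(model, shape=(16, 1, 28, 28), T=1000, device="cpu"):
    # Ancestral DDPM sampling: start from pure noise and denoise step by step.
    # `model(x_t, t)` is assumed to predict the noise that was injected at step t.
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)                      # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t, device=device))
        mean = (x - (1.0 - alphas[t]) / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)    # add back a little noise
        else:
            x = mean                                            # final, clean sample
    return x                                                    # synthetic images for augmentation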


Starting from zero (almost)


Reading papers is a fantastic way to stay up to date on technology, but implementation is where the rubber meets the road. So I decided to implement a first version of DDPM from scratch on a toy scenario.


I built a DDPM for MNIST using PyTorch and a lightweight U-Net. Starting with random noise, the model learned to reconstruct handwritten digits over 1000 timesteps. While MNIST is a toy dataset, the principles scale to robotic vision tasks. Imagine applying this to a robot’s camera feed: reconstructing a partially obscured object or enhancing low-resolution depth maps. Surprisingly, training took only a few minutes (up to 10) on my laptop GPU, and the results were very interesting! Since I was more interested in testing the functionality than in raw performance, some of the digits are still smudged, but I will train further to improve sharpness and quality.
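
For readers who want to reproduce something similar, here is a self-contained, hypothetical sketch of the training loop. A tiny convolutional network stands in for the lightweight U-Net I actually used, and the hyperparameters are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

T = 1000
device = "cuda" if torch.cuda.is_available() else "cpu"

betas = torch.linspace(1e-4, 0.02, T, device=device)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

# Tiny convolutional stand-in for the lightweight U-Net (illustrative only).
class NoisePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x, t):
        # Feed the normalised timestep to the network as an extra input channel.
        t_map = (t.float() / T).view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, t_map], dim=1))

model = NoisePredictor().to(device)
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

data = datasets.MNIST("data", train=True, download=True,
                      transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=128, shuffle=True)

for epoch in range(5):
    for x0, _ in loader:
        x0 = x0.to(device) * 2.0 - 1.0                           # rescale images to [-1, 1]
        t = torch.randint(0, T, (x0.shape[0],), device=device)
        noise = torch.randn_like(x0)
        a_bar = alpha_bars[t].view(-1, 1, 1, 1)
        xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise    # forward noising
        loss = F.mse_loss(model(xt, t), noise)                   # the network predicts the noise
        opt.zero_grad()
        loss.backward()
        opt.step()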



[Image: diffusion model applied to robotics]


Challenges and Next Steps


Diffusion models aren’t perfect. They’re computationally expensive, requiring hundreds of steps to sample new data, which isn’t ideal for real-time robotics. My next goal is to integrate Denoising Diffusion Implicit Models (DDIM), which promise faster sampling without sacrificing quality. It would also be very interesting to adapt this approach to 3D point clouds for a robotic arm project - think precise grasping in cluttered environments.
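
As a rough illustration of why DDIM helps, here is a hedged sketch of deterministic DDIM sampling (eta = 0) that walks a strided subset of the timesteps, e.g. 50 instead of 1000. The model and alpha_bars arguments are assumed to come from a trained DDPM like the one sketched above; this is not my final implementation.

import torch

@torch.no_grad()
def ddim_sample(model, shape, alpha_bars, n_steps=50, device="cpu"):
    # Deterministic DDIM sampling (eta = 0) over a strided subset of timesteps.
    # `model(x_t, t)` is assumed to predict the injected noise, as in DDPM training.
    T = alpha_bars.shape[0]
    steps = torch.linspace(T - 1, 0, n_steps).long()             # e.g. keep 50 of the 1000 steps

    x = torch.randn(shape, device=device)                        # start from pure noise
    for i in range(n_steps):
        t = steps[i].item()
        a_t = alpha_bars[t]
        a_prev = alpha_bars[steps[i + 1]] if i + 1 < n_steps else torch.ones(1, device=device)

        eps = model(x, torch.full((shape[0],), t, device=device))
        x0_pred = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()    # current estimate of the clean image
        x = a_prev.sqrt() * x0_pred + (1.0 - a_prev).sqrt() * eps  # jump straight to the next kept step
    return x

Because every iteration jumps directly between kept timesteps, the loop runs an order of magnitude fewer network evaluations than full DDPM sampling, which is what makes it attractive for real-time robotics.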


Takeaway


Diffusion models are more than just a generative AI buzzword: they’re a tool with a tangible impact on robotics. By mastering noise, we can teach robots to see and act with greater clarity and confidence. Stay tuned as I refine this approach and share more updates from my home-made lab!

