Ideeza, Israel

3D Mesh Reconstruction

Lead ML Engineer
Dec 2020 – Dec 2021

Developed a single-image to 3D mesh reconstruction system using a CNN + Graph Neural Network architecture, achieving top-3 performance on the ShapeNet benchmark.

  • ShapeNet Benchmark: Top-3
  • Design Speedup: 5-10x
  • Chamfer Distance: 0.42
  • Categories: 13

The Challenge

The client, a hardware design company, needed to accelerate their prototyping process. Designers would sketch concepts or take photos of existing products, but converting these to 3D models for manufacturing required skilled CAD engineers and days of work. They needed AI that could generate initial 3D meshes from single images.

Key Challenges

  • Inferring 3D structure from a single 2D image (inherently ambiguous)
  • Generating meshes with proper topology for manufacturing
  • Handling diverse object categories with varying complexity
  • Producing watertight meshes suitable for 3D printing
  • Achieving sufficient detail for practical use

The Solution

I developed a two-stage architecture: a CNN encoder extracts image features and predicts coarse 3D structure, then a Graph Convolutional Network refines the mesh topology and vertex positions. This approach leverages the CNN's strength in image understanding while using GCNs' ability to reason about mesh connectivity.

  1. Designed CNN backbone using ResNet-50 with feature pyramid for multi-scale feature extraction
  2. Implemented differentiable mesh representation enabling end-to-end training
  3. Built Graph Convolutional Network for iterative mesh refinement with edge-aware convolutions
  4. Developed custom loss functions combining Chamfer distance, edge length, and Laplacian smoothing
  5. Created data augmentation pipeline including synthetic rendering and photometric augmentation

System Architecture

The architecture follows an encode-decode pattern with iterative refinement. Each refinement step progressively improves mesh quality while maintaining topological validity.
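The encode-decode-with-refinement flow can be sketched as a skeleton module. All submodules here (encoder, coarse head, refiner stages) are hypothetical stand-ins; only the control flow reflects the described design:

```python
import torch
import torch.nn as nn

class MeshReconstructor(nn.Module):
    """Skeleton of the encode-decode pipeline; module internals are stand-ins."""
    def __init__(self, encoder, coarse_head, refiners):
        super().__init__()
        self.encoder = encoder          # CNN -> image features
        self.coarse_head = coarse_head  # features -> initial vertex positions
        self.refiners = nn.ModuleList(refiners)  # GCN refinement stages

    def forward(self, image, template_verts, edges):
        feats = self.encoder(image)
        verts = template_verts + self.coarse_head(feats)   # coarse shape
        for refiner in self.refiners:                      # iterative refinement
            verts = verts + refiner(verts, feats, edges)   # residual updates
        return verts
```

Predicting residual displacements at each stage (rather than absolute positions) keeps every refinement step a small, topology-preserving correction.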

Image Encoder

ResNet-50 backbone with Feature Pyramid Network. Extracts hierarchical features capturing both global shape and local details. Outputs multi-scale feature maps.

Coarse Shape Predictor

Fully connected layers predicting initial mesh as deformed sphere or category-specific template. Establishes basic 3D structure from 2D observation.

Graph Neural Network Refiner

3-stage GCN that iteratively refines vertex positions. Each stage uses image features via graph-image attention mechanism. Respects mesh topology during refinement.

Mesh Post-Processor

Handles mesh cleanup: removes self-intersections, ensures watertightness, and applies Laplacian smoothing for manufacturing-ready output.
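The smoothing step can be illustrated with a toy uniform Laplacian smoother: each vertex moves a fraction of the way toward the centroid of its neighbors. This is a simplified stand-in for the production cleanup, which used Trimesh/Open3D:

```python
import numpy as np

def laplacian_smooth(verts, faces, lam=0.5, iterations=10):
    """Uniform Laplacian smoothing: pull each vertex toward its neighbor centroid."""
    n = len(verts)
    # Build vertex adjacency from triangle faces.
    neighbors = [set() for _ in range(n)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    verts = verts.astype(float).copy()
    for _ in range(iterations):
        centroids = np.stack([verts[list(nb)].mean(axis=0) if nb else verts[i]
                              for i, nb in enumerate(neighbors)])
        verts += lam * (centroids - verts)   # damped step toward centroids
    return verts
```

In practice the damping factor `lam` and iteration count trade off smoothness against detail loss; too many iterations shrink the mesh toward its centroid.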

Training Pipeline

Multi-GPU training with mixed precision. Custom dataloader for ShapeNet with on-the-fly rendering from random viewpoints.
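A minimal sketch of one mixed-precision training step with `torch.autocast` and a gradient scaler (the model and loss here are placeholders; DDP wrapping and the ShapeNet dataloader are omitted):

```python
import torch

def train_step(model, batch, optimizer, scaler):
    """One AMP step; autocast is disabled automatically on CPU-only machines."""
    images, targets = batch
    optimizer.zero_grad(set_to_none=True)
    device_type = "cuda" if torch.cuda.is_available() else "cpu"
    with torch.autocast(device_type, enabled=torch.cuda.is_available()):
        preds = model(images)
        loss = torch.nn.functional.mse_loss(preds, targets)  # stand-in loss
    scaler.scale(loss).backward()   # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)          # unscale, then optimizer step
    scaler.update()
    return loss.item()
```

With `enabled=False` both autocast and the scaler become no-ops, so the same loop runs unchanged in full precision.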

Key Implementation Details

Differentiable Mesh Representation

To enable end-to-end training, I implemented a differentiable mesh renderer and mesh-to-point-cloud sampler. This allowed computing losses in both 2D (rendered image comparison) and 3D (point cloud metrics) spaces, providing rich training signal.
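The mesh-to-point-cloud sampler can be sketched as area-weighted face sampling with random barycentric coordinates; gradients flow back to the vertices through the barycentric interpolation. This is an illustrative re-implementation, not the project's actual code:

```python
import torch

def sample_points_from_mesh(verts, faces, n_points=1024):
    """Differentiably sample points on a triangle mesh surface."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    # Face areas via the cross product (sampling itself is non-differentiable,
    # so the probabilities are detached).
    areas = torch.cross(v1 - v0, v2 - v0, dim=1).norm(dim=1) / 2
    face_idx = torch.multinomial(areas.detach(), n_points, replacement=True)
    # Uniform barycentric coordinates via the square-root trick.
    u = torch.sqrt(torch.rand(n_points, 1))
    v = torch.rand(n_points, 1)
    w0, w1, w2 = 1 - u, u * (1 - v), u * v
    return w0 * v0[face_idx] + w1 * v1[face_idx] + w2 * v2[face_idx]
```

Point clouds sampled this way from the predicted and ground-truth meshes are what the Chamfer-style 3D losses compare.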

Graph-Image Attention

The key innovation was allowing each mesh vertex to attend to relevant image features. I implemented a cross-attention mechanism where vertex features query the image feature map, enabling the GCN to access detailed texture and edge information when refining vertex positions.
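A simplified single-head version of such a cross-attention layer, with vertex features as queries and flattened image features as keys/values (dimensions and projections are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GraphImageAttention(nn.Module):
    """Cross-attention: mesh vertex features query a flattened image feature map."""
    def __init__(self, vert_dim, img_dim, attn_dim=64):
        super().__init__()
        self.q = nn.Linear(vert_dim, attn_dim)   # vertex -> query
        self.k = nn.Linear(img_dim, attn_dim)    # pixel  -> key
        self.v = nn.Linear(img_dim, attn_dim)    # pixel  -> value
        self.out = nn.Linear(attn_dim, vert_dim)

    def forward(self, vert_feats, img_feats):
        # vert_feats: (V, vert_dim); img_feats: (C, H, W) -> (H*W, C)
        pixels = img_feats.flatten(1).t()
        q, k, v = self.q(vert_feats), self.k(pixels), self.v(pixels)
        attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (V, H*W)
        return vert_feats + self.out(attn @ v)   # residual update per vertex
```

Each vertex thereby pools information from the image regions most relevant to it, rather than from a single global feature vector.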

Topology-Aware Refinement

Naive vertex movement can create self-intersections. I designed the GCN to predict vertex displacements constrained by neighboring vertex positions, with additional regularization losses penalizing invalid topologies. This produced clean, manufacturable meshes.

Multi-Category Training Strategy

Training a single model across diverse ShapeNet categories (chairs, cars, planes) required careful handling. I implemented category-conditional normalization and category-weighted sampling to prevent dominant categories from overwhelming training.
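Category-weighted sampling can be sketched with PyTorch's `WeightedRandomSampler`, weighting each example inversely to its category's frequency (the helper name is illustrative):

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(category_ids):
    """Sample inversely to category frequency so rare classes aren't drowned out.
    category_ids: one integer category label per training example."""
    labels = torch.as_tensor(category_ids)
    counts = torch.bincount(labels).float()
    weights = 1.0 / counts[labels]          # per-example sampling weight
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```

Passing the sampler to a `DataLoader` makes every epoch draw categories roughly uniformly, regardless of how skewed the raw dataset is.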

Tech Stack

Deep Learning

PyTorch, PyTorch3D, PyTorch Geometric

3D Processing

Open3D, Trimesh, Blender (scripting), OpenGL

Training

Mixed Precision (AMP), Distributed Data Parallel, Weights & Biases

Data

ShapeNet, Pix3D, Custom Synthetic Rendering

Key Learnings

  • Graph neural networks are powerful for geometric reasoning but require careful architecture design
  • Differentiable rendering opens up powerful training signals; invest in good implementations
  • 3D reconstruction inherently involves ambiguity; embracing uncertainty (multiple plausible outputs) often works better than forcing single predictions
  • Manufacturing constraints should be built into the model, not applied as post-processing

Interested in Similar Solutions?

Let's discuss how I can help bring your AI ideas to production.

Get in Touch