3D Mesh Reconstruction
Developed a single-image-to-3D-mesh reconstruction system using a CNN + Graph Neural Network architecture, achieving top-3 performance on the ShapeNet benchmark.
Top-3 on the ShapeNet Benchmark · 5-10x Design Speedup · 0.42 Chamfer Distance · 13 Categories
The Challenge
The client, a hardware design company, needed to accelerate their prototyping process. Designers would sketch concepts or take photos of existing products, but converting these to 3D models for manufacturing required skilled CAD engineers and days of work. They needed AI that could generate initial 3D meshes from single images.
Key Challenges
- Inferring 3D structure from a single 2D image (inherently ambiguous)
- Generating meshes with proper topology for manufacturing
- Handling diverse object categories with varying complexity
- Producing watertight meshes suitable for 3D printing
- Achieving sufficient detail for practical use
The Solution
I developed a two-stage architecture: a CNN encoder extracts image features and predicts coarse 3D structure, then a Graph Convolutional Network refines the mesh topology and vertex positions. This approach leverages the CNN's strength in image understanding while using GCNs' ability to reason about mesh connectivity.
- Designed CNN backbone using ResNet-50 with feature pyramid for multi-scale feature extraction
- Implemented differentiable mesh representation enabling end-to-end training
- Built Graph Convolutional Network for iterative mesh refinement with edge-aware convolutions
- Developed custom loss functions combining Chamfer distance, edge length, and Laplacian smoothing
- Created data augmentation pipeline including synthetic rendering and photometric augmentation
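The combined loss above can be sketched in PyTorch; the function names and the weights `w_edge` and `w_lap` are illustrative, not the production values:

```python
import torch

def chamfer_distance(p, q):
    # Symmetric Chamfer: mean squared nearest-neighbor distance, both directions.
    d = torch.cdist(p, q)  # (N, M) pairwise distances
    return d.min(dim=1).values.pow(2).mean() + d.min(dim=0).values.pow(2).mean()

def edge_length_loss(verts, edges):
    # Penalize long edges to keep triangles uniform (edges: (E, 2) index pairs).
    v0, v1 = verts[edges[:, 0]], verts[edges[:, 1]]
    return (v0 - v1).pow(2).sum(dim=1).mean()

def laplacian_loss(verts, edges):
    # Uniform Laplacian smoothing: each vertex pulled toward its neighbor mean.
    n = verts.shape[0]
    lap = torch.zeros_like(verts)
    deg = torch.zeros(n, 1)
    for a, b in ((edges[:, 0], edges[:, 1]), (edges[:, 1], edges[:, 0])):
        lap.index_add_(0, a, verts[b])
        deg.index_add_(0, a, torch.ones(len(a), 1))
    return (lap / deg.clamp(min=1) - verts).pow(2).sum(dim=1).mean()

def mesh_loss(pred_verts, gt_points, edges, w_edge=0.1, w_lap=0.3):
    # Weighted combination of the three terms.
    return (chamfer_distance(pred_verts, gt_points)
            + w_edge * edge_length_loss(pred_verts, edges)
            + w_lap * laplacian_loss(pred_verts, edges))
```

The edge and Laplacian terms act as regularizers: Chamfer distance alone rewards point coverage but tolerates degenerate, spiky triangles.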
System Architecture
The architecture follows an encoder-decoder pattern with iterative refinement. Each refinement step progressively improves mesh quality while maintaining topological validity.
Image Encoder
ResNet-50 backbone with Feature Pyramid Network. Extracts hierarchical features capturing both global shape and local details. Outputs multi-scale feature maps.
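The multi-scale idea can be illustrated with a minimal PyTorch sketch: a toy three-stage backbone stands in for ResNet-50, with lateral 1x1 convolutions and a top-down pathway as in an FPN. `TinyFPN` and its channel sizes are illustrative, not the production configuration:

```python
import torch
import torch.nn as nn

class TinyFPN(nn.Module):
    # Toy stand-in for ResNet-50 + FPN: three stages at decreasing
    # resolution, lateral 1x1 convs, and top-down feature fusion.
    def __init__(self, ch=64):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(3, 64, 3, stride=2, padding=1),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
        ])
        self.lateral = nn.ModuleList([nn.Conv2d(c, ch, 1) for c in (64, 128, 256)])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = torch.relu(stage(x))
            feats.append(x)
        # Top-down pathway: upsample coarser maps and add lateral projections.
        out = [self.lateral[-1](feats[-1])]
        for lat, f in zip(reversed(self.lateral[:-1]), reversed(feats[:-1])):
            up = nn.functional.interpolate(out[-1], size=f.shape[-2:], mode="nearest")
            out.append(lat(f) + up)
        return out[::-1]  # fine-to-coarse multi-scale feature maps
```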
Coarse Shape Predictor
Fully connected layers predicting the initial mesh as a deformed sphere or a category-specific template. Establishes basic 3D structure from the 2D observation.
Graph Neural Network Refiner
3-stage GCN that iteratively refines vertex positions. Each stage accesses image features via a graph-image attention mechanism and respects mesh topology during refinement.
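One refinement stage might look like the sketch below. The `GCNRefineStage` name, feature sizes, and the fixed 0.1 displacement bound are assumptions for illustration; the production model additionally injects image features via attention:

```python
import torch
import torch.nn as nn

class GCNRefineStage(nn.Module):
    # One refinement stage (sketch): a graph convolution over mesh
    # vertices followed by a small head predicting per-vertex offsets.
    def __init__(self, feat_dim=32):
        super().__init__()
        self.lin_self = nn.Linear(3 + feat_dim, feat_dim)
        self.lin_nbr = nn.Linear(3 + feat_dim, feat_dim)
        self.offset = nn.Linear(feat_dim, 3)

    def forward(self, verts, feats, edges):
        x = torch.cat([verts, feats], dim=1)  # (N, 3 + F): positions + features
        n = x.shape[0]
        # Mean-aggregate neighbor features along undirected edges.
        agg = torch.zeros_like(x)
        deg = torch.zeros(n, 1)
        for a, b in ((edges[:, 0], edges[:, 1]), (edges[:, 1], edges[:, 0])):
            agg.index_add_(0, a, x[b])
            deg.index_add_(0, a, torch.ones(len(a), 1))
        h = torch.relu(self.lin_self(x) + self.lin_nbr(agg / deg.clamp(min=1)))
        # Bounded displacement keeps each stage's update small and stable.
        return verts + 0.1 * torch.tanh(self.offset(h)), h
```

Stacking three such stages, each reusing the updated vertex positions and features, gives the coarse-to-fine refinement described above.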
Mesh Post-Processor
Handles mesh cleanup: removes self-intersections, ensures watertightness, and applies Laplacian smoothing for manufacturing-ready output.
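Two of these cleanup steps can be sketched in NumPy; the helper names are illustrative and the real pipeline used more robust geometry tooling. The watertightness test relies on the fact that every undirected edge of a closed manifold mesh belongs to exactly two faces:

```python
import numpy as np

def is_watertight(faces):
    # A closed manifold mesh has every undirected edge in exactly two faces.
    edges = np.sort(np.concatenate(
        [faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]]), axis=1)
    _, counts = np.unique(edges, axis=0, return_counts=True)
    return bool(np.all(counts == 2))

def laplacian_smooth(verts, faces, iters=5, lam=0.5):
    # Uniform Laplacian smoothing: move each vertex toward its neighbor mean.
    n = len(verts)
    nbrs = [set() for _ in range(n)]
    for f in faces:
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            nbrs[a].add(b)
            nbrs[b].add(a)
    v = verts.astype(float).copy()
    for _ in range(iters):
        mean = np.array([v[list(ns)].mean(axis=0) for ns in nbrs])
        v = v + lam * (mean - v)
    return v
```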
Training Pipeline
Multi-GPU training with mixed precision. Custom dataloader for ShapeNet with on-the-fly rendering from random viewpoints.
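A minimal mixed-precision training step looks roughly like this (sketch; `train_step` is an illustrative name, and in the multi-GPU setting the model would additionally be wrapped in `DistributedDataParallel`):

```python
import torch

def train_step(model, optimizer, scaler, images, targets, use_amp):
    # One mixed-precision step: forward and loss under autocast,
    # scaled backward pass to avoid fp16 gradient underflow.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):
        pred = model(images)
        loss = torch.nn.functional.mse_loss(pred, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # unscales gradients, skips step on inf/nan
    scaler.update()
    return loss.item()
```

On machines without a GPU, `use_amp=False` and `GradScaler(enabled=False)` make the same code run in full precision.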
Key Implementation Details
Differentiable Mesh Representation
To enable end-to-end training, I implemented a differentiable mesh renderer and mesh-to-point-cloud sampler. This allowed computing losses in both 2D (rendered image comparison) and 3D (point cloud metrics) spaces, providing rich training signal.
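The mesh-to-point-cloud sampler can be sketched as differentiable area-weighted surface sampling; the helper below is illustrative, and the 2D rendered-image loss is omitted. Gradients flow back to the vertex positions because the sampled points are affine combinations of vertices:

```python
import torch

def sample_points_from_mesh(verts, faces, n_samples):
    # Pick faces proportional to area, then uniform barycentric coordinates.
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    areas = 0.5 * torch.linalg.cross(v1 - v0, v2 - v0).norm(dim=1)
    idx = torch.multinomial(areas.detach(), n_samples, replacement=True)
    u, v = torch.rand(n_samples, 1), torch.rand(n_samples, 1)
    flip = (u + v) > 1  # reflect samples that fall outside the triangle
    u, v = torch.where(flip, 1 - u, u), torch.where(flip, 1 - v, v)
    return v0[idx] + u * (v1[idx] - v0[idx]) + v * (v2[idx] - v0[idx])
```

Sampled points can then be compared to a ground-truth point cloud with the Chamfer metric, giving a fully differentiable 3D loss.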
Graph-Image Attention
The key innovation was allowing each mesh vertex to attend to relevant image features. I implemented a cross-attention mechanism where vertex features query the image feature map, enabling the GCN to access detailed texture and edge information when refining vertex positions.
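A compact sketch of that cross-attention, single-head and unbatched for clarity (`GraphImageAttention` and its dimensions are illustrative):

```python
import torch
import torch.nn as nn

class GraphImageAttention(nn.Module):
    # Mesh-vertex features act as queries over the flattened image
    # feature map, which supplies the keys and values.
    def __init__(self, vert_dim, img_dim, dim=64):
        super().__init__()
        self.q = nn.Linear(vert_dim, dim)
        self.k = nn.Linear(img_dim, dim)
        self.v = nn.Linear(img_dim, dim)
        self.scale = dim ** -0.5

    def forward(self, vert_feats, img_feats):
        # vert_feats: (N, vert_dim); img_feats: (C, H, W)
        pixels = img_feats.flatten(1).t()  # (H*W, C), one row per pixel
        attn = torch.softmax(
            self.q(vert_feats) @ self.k(pixels).t() * self.scale, dim=-1)
        return attn @ self.v(pixels)  # (N, dim) per-vertex image context
```

The output is concatenated with (or added to) each vertex's graph features before the next GCN layer, so refinement can react to local image evidence.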
Topology-Aware Refinement
Naive vertex movement can create self-intersections. I designed the GCN to predict vertex displacements constrained by neighboring vertex positions, with additional regularization losses penalizing invalid topologies. This produced clean, manufacturable meshes.
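One way to constrain displacements by neighboring geometry is sketched below; the edge-length clamp and the `max_ratio` value are illustrative assumptions, not the exact production constraints:

```python
import torch

def constrain_displacements(verts, disp, edges, max_ratio=0.25):
    # Limit each vertex's move to a fraction of its shortest incident edge,
    # reducing the chance of flipped or self-intersecting triangles.
    n = verts.shape[0]
    lengths = (verts[edges[:, 0]] - verts[edges[:, 1]]).norm(dim=1)
    min_edge = torch.full((n,), float("inf"))
    for col in (0, 1):  # scatter edge lengths to both endpoints, keep the min
        min_edge = min_edge.scatter_reduce(0, edges[:, col], lengths, reduce="amin")
    limit = max_ratio * min_edge.unsqueeze(1)
    norm = disp.norm(dim=1, keepdim=True).clamp(min=1e-8)
    return disp * torch.minimum(norm, limit) / norm  # rescale oversized moves
```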
Multi-Category Training Strategy
Training a single model across diverse ShapeNet categories (chairs, cars, planes) required careful handling. I implemented category-conditional normalization and category-weighted sampling to prevent dominant categories from overwhelming training.
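Both ideas can be sketched briefly in PyTorch (names and sizes illustrative): inverse-frequency sampling via `WeightedRandomSampler`, and a normalization layer whose affine parameters are looked up per category:

```python
import torch
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(labels):
    # Inverse-frequency weights so small categories are drawn as often as large ones.
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels).float()
    weights = 1.0 / counts[labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

class CategoryConditionalNorm(torch.nn.Module):
    # LayerNorm whose scale and shift are embedded per category.
    def __init__(self, dim, n_categories):
        super().__init__()
        self.norm = torch.nn.LayerNorm(dim, elementwise_affine=False)
        self.gamma = torch.nn.Embedding(n_categories, dim)
        self.beta = torch.nn.Embedding(n_categories, dim)
        torch.nn.init.ones_(self.gamma.weight)   # start as identity transform
        torch.nn.init.zeros_(self.beta.weight)

    def forward(self, x, cat):
        return self.gamma(cat) * self.norm(x) + self.beta(cat)
```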
Tech Stack
Deep Learning · 3D Processing · Training · Data
Key Learnings
- Graph neural networks are powerful for geometric reasoning but require careful architecture design
- Differentiable rendering opens up powerful training signals; invest in good implementations
- 3D reconstruction is inherently ambiguous; embracing uncertainty (multiple plausible outputs) often works better than forcing a single prediction
- Manufacturing constraints should be built into the model, not applied as post-processing
Interested in Similar Solutions?
Let's discuss how I can help bring your AI ideas to production.
Get in Touch