Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

Shiyu Tan, Zixuan Zhao, Hao Gao, Zhiheng Chen, Xiaolong Yin, Enya Shen*
Tsinghua University

Teaser

Teaser Image

Img2CADSeq is a state-of-the-art BRep generation pipeline that outputs standardized STEP files. Shown are samples generated with unconditional sampling (columns 1-3), single-view image conditioning (columns 4-5), and point cloud conditioning (column 6).

Abstract

Boundary Representation (BRep) is the standard format for Computer-Aided Design (CAD), yet reconstructing high-quality BReps from single-view images remains challenging due to the complexity of topological constraints and operation sequences.

We present Img2CADSeq, a multi-stage pipeline that addresses these challenges by encoding CAD sequences into a three-level hierarchical codebook. Guided by an importance prioritization that favors profiles over fine details, this encoding compresses long sequences into a stable discrete latent space. To bridge the modality gap, we use a coarse-to-fine point cloud as an intermediate representation and align 2D visual features with 3D CAD sequences via contrastive learning to condition a VQ-Diffusion model.

Supported by the newly introduced CAD-220K and PrintCAD datasets, our approach adapts robustly to the industrial domain. Extensive experiments demonstrate that Img2CADSeq significantly outperforms state-of-the-art methods, producing standard STEP files that can be used directly in commercial CAD software.
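The discrete latent space mentioned in the abstract can be illustrated with a minimal nearest-neighbor quantization sketch. This is not the authors' implementation: the level names (`curve`, `sketch`, `extrude`), the toy 2D codebooks, and the `quantize` and `encode` helpers are all hypothetical, and only show how a continuous latent vector at each level could be replaced by the index of its closest codebook entry.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


# Hypothetical toy codebooks, one per hierarchy level (assumed names).
CODEBOOKS: Dict[str, List[Vector]] = {
    "curve": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "sketch": [(0.0, 0.0), (1.0, 1.0)],
    "extrude": [(0.0, 0.0), (2.0, 2.0)],
}


def encode(latents: Dict[str, List[Vector]]) -> Dict[str, List[int]]:
    """Map every continuous latent to a discrete token, level by level."""
    return {
        level: [quantize(v, CODEBOOKS[level]) for v in vectors]
        for level, vectors in latents.items()
    }


tokens = encode({
    "curve": [(0.9, 0.1), (0.1, 0.8)],
    "sketch": [(0.7, 0.9)],
    "extrude": [(1.6, 1.9)],
})
print(tokens)  # {'curve': [1, 2], 'sketch': [1], 'extrude': [1]}
```

In a setup like this, a discrete diffusion model would operate on the token indices rather than on the raw operation parameters, which is what makes the sequence length manageable.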

CAD generation

Image-conditional CAD generation

Teaser Image

Pipeline

Teaser Image

Overview of the Img2CADSeq framework. In the first stage, hierarchical sequence encoding maps CAD operations into a discrete latent space via a three-level codebook. We then lift the input image into a 3D point cloud using a tailored network trained jointly on synthetic and real-world data; the point cloud is refined by UA-DGCNN to sharpen edges and smooth surfaces. Finally, we employ contrastive learning to align the geometric embeddings with the CAD latent space, guiding a VQ-Diffusion model to predict a valid CAD operation sequence, which is compiled into a watertight BRep.
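The alignment step in the caption above can be sketched with a symmetric InfoNCE objective, a commonly used contrastive loss. The page does not state which loss Img2CADSeq uses, so the choice of InfoNCE, the temperature value, and the function names below are assumptions made for illustration only. Row `i` of `geometry` and row `i` of `cad` are treated as a matching pair; all other combinations in the batch act as negatives.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(row: Sequence[float], index: int) -> float:
    """Numerically stable log-softmax of `row`, evaluated at `index`."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_sum


def info_nce(geometry: List[Sequence[float]],
             cad: List[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    geo = [normalize(v) for v in geometry]
    seq = [normalize(v) for v in cad]
    n = len(geo)
    # Cosine similarities scaled by the temperature.
    logits = [[dot(g, c) / temperature for c in seq] for g in geo]
    # Geometry-to-CAD direction: each row should pick its own column.
    loss_g2c = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # CAD-to-geometry direction: each column should pick its own row.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_c2g = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (loss_g2c + loss_c2g)


geometry_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(geometry_batch, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(geometry_batch, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < shuffled_loss)  # True
```

Minimizing a loss of this kind pulls each geometric embedding toward the latent of its own CAD sequence and away from the others in the batch, which is one way the aligned embedding can later serve as a conditioning signal.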

Point-cloud conditional CAD generation

Workflow

Teaser Image

Workflow of Hierarchical Entity Construction. At the base level, the Curve-Cluster parameterizes geometric primitives, which form closed loops in the Sketch-Patch. These loops are then lifted into 3D space via a normal vector and origin to perform extrusion and Boolean operations, resulting in an Extrude-Block. Multiple blocks are finally assembled to yield the target solid. This process mirrors the construction history of standard CAD workflows, preserving human design intent.
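The hierarchy described above can be sketched as nested data structures. The class and field names below (`Curve`, `SketchPatch`, `ExtrudeBlock`, `Solid`, `distance`, `boolean`) are hypothetical and are not taken from the paper; the sketch only mirrors the stated order of construction: curves form a closed loop, the loop is placed in 3D by an origin and a normal, and extruded blocks are combined with Boolean operations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point2D = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass
class Curve:
    """One geometric primitive in the sketch plane (assumed fields)."""
    kind: str          # e.g. "line" or "arc"; labels are illustrative
    start: Point2D
    end: Point2D


@dataclass
class SketchPatch:
    """An ordered list of curves that should form a closed loop."""
    curves: List[Curve]

    def is_closed(self, tol: float = 1e-9) -> bool:
        """True if each curve ends where the next one starts (cyclically)."""
        if not self.curves:
            return False
        following = self.curves[1:] + self.curves[:1]
        return all(
            abs(a.end[0] - b.start[0]) <= tol and abs(a.end[1] - b.start[1]) <= tol
            for a, b in zip(self.curves, following)
        )


@dataclass
class ExtrudeBlock:
    """A sketch loop lifted into 3D and extruded along its normal."""
    profile: SketchPatch
    origin: Vec3
    normal: Vec3
    distance: float
    boolean: str       # e.g. "new", "union", "cut"; labels are illustrative


@dataclass
class Solid:
    """The target solid as an ordered construction history of blocks."""
    blocks: List[ExtrudeBlock] = field(default_factory=list)

    def add(self, block: ExtrudeBlock) -> None:
        if not block.profile.is_closed():
            raise ValueError("profile loop must be closed before extrusion")
        self.blocks.append(block)


corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square = SketchPatch([
    Curve("line", corners[i], corners[(i + 1) % 4]) for i in range(4)
])
open_loop = SketchPatch(square.curves[:3])

solid = Solid()
solid.add(ExtrudeBlock(square, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.0, "new"))
print(square.is_closed(), open_loop.is_closed(), len(solid.blocks))  # True False 1
```

Keeping the blocks in an ordered list preserves the construction history, so the result can be replayed step by step rather than stored only as a final shape.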

Dataset

Teaser Image

To address the scarcity of industrial data, a key contribution of our work is the combination of two distinct data types: curated synthetic models (CAD-220K) and real-world captured objects (PrintCAD). We train the network that generates intermediate point clouds jointly on both. This combination explicitly bridges the sim-to-real gap, improving the model’s generalization to mechanical parts.

BibTeX

@article{tan2026img2cadseq,
  title={Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion},
  author={Tan, Shiyu and Zhao, Zixuan and Gao, Hao and Chen, Zhiheng and Yin, Xiaolong and Shen, Enya},
  journal={arXiv preprint arXiv:2605.13293},
  year={2026}
}