Seeing Through Clutter:
Structured 3D Scene Reconstruction via Iterative Object Removal

3DV 2026
1University of California, San Diego 2Adobe Research

Given an input image, a VLM repeatedly identifies the most prominent foreground object, reconstructs and poses it in 3D, removes it from the image, and then iterates this process until no objects remain.

Abstract

We present SeeingThroughClutter, a method for reconstructing structured 3D representations from single images by segmenting and modeling objects individually. Prior approaches rely on intermediate tasks such as semantic segmentation and depth estimation, which often underperform in complex scenes, particularly in the presence of occlusion and clutter. We address this by introducing an iterative object removal and reconstruction pipeline that decomposes complex scenes into a sequence of simpler subtasks. Using VLMs as orchestrators, we remove foreground objects one at a time via detection, segmentation, object removal, and 3D fitting. We show that removing objects allows for cleaner segmentations of subsequent objects, even in highly occluded scenes. Our method requires no task-specific training and benefits directly from ongoing advances in foundation models. We demonstrate state-of-the-art robustness on the 3D-Front and ADE20K datasets.

Motivation

Real-world scenes are rarely tidy. Objects overlap and occlude one another, creating clutter that poses fundamental challenges for scene understanding. The intermediate tasks that most 3D reconstruction pipelines rely on — semantic segmentation and depth estimation — degrade significantly under these conditions, producing fragmented masks and unreliable geometry.

The key insight is simple: a cluttered scene is hard to reconstruct all at once, but a clean one is not. By iteratively removing the most prominent foreground object and revisiting the scene, we transform a difficult global problem into a sequence of tractable local ones.
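The peel-one-object-at-a-time idea above can be sketched as a simple loop. This is a minimal illustrative sketch, not the authors' actual implementation: `pick_object`, `segment`, `fit_3d`, and `inpaint` are hypothetical stand-ins for the VLM orchestrator, the detector/segmenter, the single-object 3D fitter, and the object-removal model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PlacedObject:
    label: str    # object name chosen by the VLM
    mask: object  # 2D segmentation mask of the object
    mesh: object  # fitted 3D asset, posed in the scene

def reconstruct_scene(
    image,
    pick_object: Callable,  # VLM: most prominent foreground object, or None
    segment: Callable,      # (image, label) -> mask
    fit_3d: Callable,       # (image, mask) -> posed 3D asset
    inpaint: Callable,      # (image, mask) -> image with the object removed
    max_objects: int = 50,
):
    """Iteratively peel off foreground objects until none remain."""
    scene = []
    for _ in range(max_objects):
        label = pick_object(image)  # VLM orchestrator step
        if label is None:           # no foreground objects left
            break
        mask = segment(image, label)  # cleaner mask thanks to prior removals
        scene.append(PlacedObject(label, mask, fit_3d(image, mask)))
        image = inpaint(image, mask)  # remove the object, then revisit the scene
    return scene
```

Each iteration sees a progressively less cluttered image, which is why segmentation and 3D fitting of later objects become easier rather than harder.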

Motivation: cluttered vs. decluttered reconstruction

(A) In a cluttered scene, occluded objects like the table yield noisy, fragmented 3D reconstructions. (B) After removing foreground objects, the same table is fully visible, enabling a clean and accurate 3D fit.

Results

Segmentation: MIT Scene Parsing

Iterative object removal directly improves segmentation quality on the ADE20K / MIT Scene Parsing benchmark. By inpainting each removed object before segmenting the next, subsequent detections benefit from a cleaner context — particularly for partially occluded objects where the baseline (without inpainting) produces fragmented, noisy masks.

MIT Scene Parsing segmentation results

3D Reconstruction: 3D-Front

We compare against MIDI* and Gen3DSR on the 3D-Front dataset. Our method produces more complete and better-posed scene reconstructions across a range of room layouts and clutter levels.

In-the-Wild Results

We evaluate on diverse real-world images spanning outdoor street scenes, medical settings, and cluttered indoor environments. Our method generalizes well without any scene-specific fine-tuning.

Text-to-Scene Results

Our pipeline extends naturally to text-conditioned inputs. Given a text prompt, we first generate an image and then apply our iterative reconstruction pipeline to produce a structured 3D scene — no 3D supervision or additional training required.

BibTeX

@inproceedings{aguinakang2026stc,
  title={Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal},
  author={Rio Aguina-Kang and Kevin James Blackburn-Matzen and Thibault Groueix and Vladimir Kim and Matheus Gadelha},
  booktitle={Thirteenth International Conference on 3D Vision},
  year={2026}
}