
Advancing VLN: O3D-SIM's Role in Human-Robot Collaboration and Simulation
16 Dec 2025
O3D-SIM introduces an open-set 3D representation that significantly improves the success rate of natural-language queries.

O3D-SIM Visualization: Accurately Identifying Unseen Objects and Instances
16 Dec 2025
Qualitative analysis showcases O3D-SIM's open-set capability: it identifies objects such as wheelchairs and mannequins that closed-set methods miss.

Quantitative Evaluation of O3D-SIM: Success Rate on Matterport3D VLN Tasks
16 Dec 2025
O3D-SIM is evaluated quantitatively on the Matterport3D dataset in the Habitat simulator, using the Success Rate metric.

Evaluating Novel 3D Semantic Instance Map for Vision-Language Navigation
16 Dec 2025
The experimental section details the evaluation of the O3D-SIM representation and its integration with ChatGPT for Vision-Language Navigation (VLN).

VLN: LLM and CLIP for Instance-Specific Navigation on 3D Maps
16 Dec 2025
The Language-Guided Navigation module leverages an LLM (such as ChatGPT) and the open-set O3D-SIM.

3D Mapping Initialization: Using RGB-D Images and Camera Parameters
14 Dec 2025
O3D-SIM creation starts with capturing posed RGB-D images and camera parameters.

Building Open-Set 3D Representation: Feature Fusion and Geometric-Semantic Merging
14 Dec 2025
O3D-SIM is built by projecting 2D masks and embeddings to 3D, using DBSCAN for initial refinement.

Open-Set Semantic Extraction: Grounded-SAM, CLIP, and DINOv2 Pipeline
13 Dec 2025
Describes how open-set semantic instance information is extracted using Grounded-SAM, CLIP, and DINOv2.

Semantic Instance Extraction: CLIP and DINO Features for 3D Mapping
10 Dec 2025
Details the O3D-SIM pipeline for VLN, which extracts open-set semantic instance information (masks and CLIP/DINO features) from RGB-D images.