TL;DR: FlashWorld enables fast (7 seconds on a single A100/A800 GPU, 4 seconds on a single H100/H800 GPU), high-quality 3D scene generation across diverse scenes from a single image or text prompt.
LLaVA-3D can perform both 2D and 3D vision-language tasks. The left block (b) shows that, compared with previous 3D LMMs, our LLaVA-3D achieves state-of-the-art performance across a wide range of 3D ...
Abstract: Accurate segmentation of 3D point clouds in indoor scenes remains a challenging task, often hindered by the labor-intensive nature of data annotation. While weakly supervised learning ...
Abstract: In remote sensing (RS), Few-Shot Novel View Synthesis (FS-NVS) focuses on creating images of unobserved viewpoints using limited training images. Recently, 3D Gaussian Splatting (3DGS) has ...