2025-04-18 11:12:57 +08:00


Research Advancements in Efficient 3D Point Cloud Segmentation: A Literature Review with a Focus on RandLA-Net

The three-dimensional point cloud has emerged as a pivotal data structure for representing the physical world. It supports a deeper understanding of, and interaction with, complex 3D environments. It underpins work in computer vision, robotics, autonomous vehicles, and augmented reality, where it serves as the foundation for object detection, scene understanding, and navigation [1].

Semantic segmentation of 3D point clouds assigns a class label to every individual point. This gives a richer and more detailed description of an environment than object detection, which only delineates bounding boxes [2]. Such granular understanding is crucial for applications such as autonomous driving, where safe navigation depends on detailed perception of the surroundings [2].

As 3D data from diverse sensing technologies becomes more widely available, efficient methods are needed to process these large datasets and extract meaningful high-level features for practical use [4]. Point cloud semantic segmentation plays a central role here: it partitions the raw point cloud into semantically distinct subsets, enabling a more nuanced interpretation of the 3D world [4]. Early 3D segmentation approaches relied on hand-crafted features. The field has since largely moved to deep learning, which learns complex and generalizable features directly from data [5]. The direct acquisition of 3D data through technologies such as motion capture further increases the need for efficient processing in real-time applications [6].

Despite their importance, point clouds are sparse, irregular, and unordered. This makes them difficult to process and analyze, especially at the data volumes produced by modern sensors [1]. Their density can also vary considerably with distance from the sensor, which is a particular problem in outdoor environments [2]. Finally, the scarcity of large-scale, accurately labeled datasets for 3D semantic segmentation complicates both the development and the evaluation of robust deep learning models [2].

The advent of deep learning has transformed point cloud processing. Pioneering work such as PointNet established that networks can learn directly from unstructured point sets [10].

PointNet learns per-point features with shared multilayer perceptrons (MLPs). It aggregates these features with a symmetric function, typically max pooling, so that the output is invariant to the order of the input points [10]. This marked a significant departure from earlier methods, which converted point clouds into regular formats such as voxel grids or images before processing [13]. By consuming point clouds directly, PointNet removed the need for manual feature engineering and offered a computationally efficient way to analyze 3D shapes [11]. It also incorporated input and feature transformation networks (T-Nets) to improve robustness to rigid transformations such as rotation and translation [11]. This foundational work demonstrated the potential of deep learning for a range of 3D recognition tasks, including object classification, part segmentation, and scene semantic parsing [11].

A key limitation, however, is that PointNet processes each point independently before the global pooling step. As a result, it cannot capture local spatial relationships or the fine geometric structures present in point clouds [10]. This lack of local context limits its performance on fine-grained segmentation and its generalization to complex scenes, which motivated subsequent architectures.
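The order-invariance argument above can be checked with a minimal NumPy sketch. The same MLP is applied to every point and the per-point features are max-pooled, so shuffling the input leaves the global feature unchanged.

The following are illustrative assumptions:

- the layer widths;
- the random, untrained weights;
- the helper names `shared_mlp` and `pointnet_global_feature`.

The T-Nets and batch normalization of the actual PointNet are omitted.

```python
import numpy as np

def shared_mlp(points, weights, biases):
    """Apply the same MLP to every point independently (a 'shared' MLP).

    points: (N, C_in) array; weights/biases: one entry per layer.
    """
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # linear layer followed by ReLU
    return h  # (N, C_out)

def pointnet_global_feature(points, weights, biases):
    """Per-point features aggregated with max pooling (a symmetric function)."""
    per_point = shared_mlp(points, weights, biases)
    return per_point.max(axis=0)  # (C_out,), independent of point order

rng = np.random.default_rng(0)
dims = [3, 64, 128, 256]  # hypothetical layer widths
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

cloud = rng.standard_normal((1024, 3))
shuffled = cloud[rng.permutation(len(cloud))]

g1 = pointnet_global_feature(cloud, weights, biases)
g2 = pointnet_global_feature(shuffled, weights, biases)
print(g1.shape, np.allclose(g1, g2))  # (256,) True
```

Because every point passes through the MLP alone, nothing in this sketch lets a point's feature depend on its neighbours. That is exactly the limitation the paragraph describes.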

To capture local structure, PointNet++ applies PointNet recursively on a nested partitioning of the input point set, forming a hierarchical network [10]. This hierarchy lets the network learn features at multiple scales and capture local detail by exploiting metric distances between points [16].

Each set abstraction level consists of a sampling layer, a grouping layer, and a PointNet layer [17]:

- The sampling layer typically uses Farthest Point Sampling (FPS) to select a representative subset of points as centroids.
- The grouping layer collects the neighbors of each centroid with a ball query or a k-nearest-neighbor search.
- The PointNet layer encodes each local group into a feature vector [17].

To cope with varying point density, PointNet++ offers multi-scale grouping (MSG), which extracts features at several neighborhood sizes and aggregates them [17]. For semantic segmentation, a decoder of feature propagation modules interpolates the learned features back to the original point resolution for per-point classification [19]. Experiments show that PointNet++ significantly outperforms PointNet on challenging 3D benchmarks [10].

PointNet++ still has two weaknesses. First, FPS becomes computationally expensive for very large point clouds. Second, each local group is still encoded by a PointNet layer that processes its points independently before pooling, so the relationships among neighboring points are not fully modeled [7].
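A minimal NumPy sketch of the three PointNet++ building blocks described above:

- greedy FPS;
- ball-query grouping;
- the inverse-distance-weighted interpolation used for feature propagation. Here it uses three neighbours and squared-distance weights, a common default.

The function names, the radius, the group size, and the padding rule for sparse balls are illustrative assumptions, not the reference implementation. The FPS loop updates a distance for every point after each pick. Selecting m centroids therefore costs O(N·m), which grows quadratically when m is a fixed fraction of N.

```python
import numpy as np

def farthest_point_sampling(xyz, m, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    n = xyz.shape[0]
    chosen = np.empty(m, dtype=np.int64)
    chosen[0] = start
    min_dist = np.full(n, np.inf)  # squared distance to nearest chosen centroid
    for i in range(1, m):
        d = np.sum((xyz - xyz[chosen[i - 1]]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = int(np.argmax(min_dist))
    return chosen

def ball_query(xyz, centroids, radius, k):
    """Up to k neighbour indices within `radius` of each centroid.

    Groups with fewer than k hits are padded by repeating the first hit
    (an assumed convention; the centroid itself always lies in its ball).
    """
    groups = np.empty((len(centroids), k), dtype=np.int64)
    for g, c in enumerate(centroids):
        d2 = np.sum((xyz - xyz[c]) ** 2, axis=1)
        idx = np.flatnonzero(d2 <= radius ** 2)[:k]
        groups[g] = np.concatenate([idx, np.full(k - len(idx), idx[0])])
    return groups

def interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Feature propagation: inverse-squared-distance weighted average of the
    features of the k nearest sparse points, evaluated at each dense point."""
    d2 = np.sum((dense_xyz[:, None, :] - sparse_xyz[None, :, :]) ** 2, axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]                 # (N, k) neighbour ids
    w = 1.0 / (np.take_along_axis(d2, nn, axis=1) + eps)
    w /= w.sum(axis=1, keepdims=True)                  # normalise the weights
    return np.einsum("nk,nkc->nc", w, sparse_feats[nn])

rng = np.random.default_rng(0)
xyz = rng.random((2048, 3))
centroids = farthest_point_sampling(xyz, 128)
groups = ball_query(xyz, centroids, radius=0.2, k=32)
centroid_feats = rng.standard_normal((128, 16))  # stand-in for PointNet-layer output
upsampled = interpolate_features(xyz, xyz[centroids], centroid_feats)
print(groups.shape, upsampled.shape)  # (128, 32) (2048, 16)
```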

RandLA-Net takes a different approach: it targets efficient semantic segmentation of large-scale 3D point clouds by relying primarily on random point sampling [7]. This contrasts with more elaborate sampling strategies such as FPS, which can become a computational bottleneck on massive datasets [7].

Random sampling may discard useful features. To compensate, RandLA-Net introduces a Local Feature Aggregation (LFA) module [7]. The LFA module progressively increases the receptive field of each point, preserving geometric detail despite the aggressive downsampling [7]. It comprises three units [10]:

- Local Spatial Encoding (LocSE) explicitly encodes the relative spatial coordinates of neighboring points, enabling the network to learn local geometric patterns [10].
- Attentive Pooling uses an attention mechanism to weight and aggregate the features of these neighbors, emphasizing the most informative ones [10].
- The Dilated Residual Block stacks multiple LocSE and Attentive Pooling units to enlarge each point's receptive field efficiently [10].

The overall RandLA-Net architecture typically follows an encoder-decoder structure with skip connections [26] and relies mainly on shared MLPs for computational efficiency [10]. RandLA-Net is trainable end to end and requires no computationally intensive pre- or post-processing steps such as voxelization or graph construction [10].
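The LocSE and Attentive Pooling units can be sketched in NumPy as follows, assuming the formulation in the RandLA-Net paper [10]:

- The relative-position encoding concatenates the centre point, the neighbour, their difference, and their Euclidean distance (10 values) before an MLP.
- The attention scores are normalized with a softmax over the K neighbours.

Simplifications in this sketch:

- single linear layers stand in for the shared MLPs;
- batch normalization is omitted;
- weights are random;
- all function names and sizes are hypothetical.

```python
import numpy as np

def knn(xyz, k):
    """Brute-force k nearest neighbours (each point is its own first neighbour)."""
    d2 = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]  # (N, K)

def relu(x):
    return np.maximum(x, 0.0)

def local_spatial_encoding(xyz, feats, nbr, W_pos):
    """LocSE: encode p_i, p_i^k, p_i - p_i^k and ||p_i - p_i^k||, then
    concatenate that encoding with the neighbour features."""
    center = np.repeat(xyz[:, None, :], nbr.shape[1], axis=1)  # (N, K, 3)
    neigh = xyz[nbr]                                            # (N, K, 3)
    rel = center - neigh
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)          # (N, K, 1)
    geo = np.concatenate([center, neigh, rel, dist], axis=-1)    # (N, K, 10)
    r = relu(geo @ W_pos)                                       # one-layer "MLP"
    return np.concatenate([r, feats[nbr]], axis=-1)             # (N, K, D)

def attentive_pooling(f_hat, W_score, W_out):
    """Score each neighbour feature, softmax over the K neighbours,
    take the weighted sum, then apply a final shared layer."""
    logits = f_hat @ W_score                                    # (N, K, D)
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    scores = np.exp(logits)
    scores /= scores.sum(axis=1, keepdims=True)                 # softmax over K
    pooled = np.sum(f_hat * scores, axis=1)                     # (N, D)
    return relu(pooled @ W_out)

rng = np.random.default_rng(0)
N, K, C, d_pos, d_out = 512, 16, 8, 8, 32   # hypothetical sizes
xyz = rng.random((N, 3))
feats = rng.standard_normal((N, C))
nbr = knn(xyz, K)
D = d_pos + C
W_pos = rng.standard_normal((10, d_pos)) * 0.1
W_score = rng.standard_normal((D, D)) * 0.1
W_out = rng.standard_normal((D, d_out)) * 0.1

f_hat = local_spatial_encoding(xyz, feats, nbr, W_pos)
out = attentive_pooling(f_hat, W_score, W_out)
print(f_hat.shape, out.shape)  # (512, 16, 16) (512, 32)
```

Stacking two such LocSE plus Attentive Pooling units, as the Dilated Residual Block does, lets each output point draw on the neighbours of its neighbours. This is how the receptive field grows without extra sampling.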

Efficiency is central to RandLA-Net's design. It can process about one million points in a single pass and is reported to be up to 200 times faster than methods such as SPG [10].

According to [10], random sampling has O(1) computational complexity, independent of the total number of input points, whereas FPS has O(N²) complexity. Random sampling therefore scales far better to massive datasets. Experiments in [10] confirm that random sampling is significantly faster than FPS and inverse density importance sampling (IDIS) on large point clouds.

The network is also compact, with about 1.24 million parameters, and its memory-efficient design allows up to 1.03 million points to be processed in a single pass [10]. This efficiency comes from combining fast random sampling with the lightweight, MLP-based local feature aggregation module [10]. Even as newer architectures have appeared, RandLA-Net continues to offer a strong balance between speed and accuracy on challenging benchmarks such as the S3DIS 6-fold segmentation task [24].
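The efficiency argument rests on random sampling needing no distance computations. FPS must update a distance for every remaining point after each pick, whereas random decimation just draws indices.

The sketch below assumes a decimation factor of 4 per encoder stage and a hypothetical input of 40,960 points. Both are illustrative choices, and `random_downsample` is a hypothetical helper.

```python
import numpy as np

def random_downsample(xyz, feats, ratio, rng):
    """Keep N // ratio points chosen uniformly at random without replacement.
    No pairwise distances are computed, unlike farthest point sampling."""
    n_keep = xyz.shape[0] // ratio
    idx = rng.choice(xyz.shape[0], size=n_keep, replace=False)
    return xyz[idx], feats[idx], idx

rng = np.random.default_rng(0)
xyz = rng.random((40_960, 3))            # hypothetical input size
feats = rng.standard_normal((40_960, 8))
sizes = [len(xyz)]
for _ in range(4):                       # four encoder stages, factor 4 each (assumed)
    xyz, feats, _ = random_downsample(xyz, feats, ratio=4, rng=rng)
    sizes.append(len(xyz))
print(sizes)  # [40960, 10240, 2560, 640, 160]
```

The price of this speed is that random selection can drop informative points. That is why the LFA module enlarges each point's receptive field before every decimation step.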

RandLA-Net has demonstrated strong performance on several key 3D point cloud semantic segmentation benchmarks:

- On Semantic3D, it outperforms SPG in both mean Intersection over Union (mIoU) and overall accuracy (OA) [10].
- On SemanticKITTI, it achieves higher mIoU than other point-based approaches, including PointNet++ and SPG [10].
- On the S3DIS indoor scene dataset, it achieves higher OA, mean class accuracy (mAcc), and mIoU than these earlier methods [10].

Studies applying RandLA-Net to urban environments report high F1 scores across several cities, often using transfer learning to cope with scarce training data [31]. Later variants such as RandLA-Net++ and RandLA-Net3+ further improve performance on urban scene datasets [7]. RandLA-Net has also been applied in specialized domains. These include foreign object detection in nuclear reactors, where very high accuracy was reported [32], and semantic segmentation for building high-definition maps for autonomous driving [28].

Although RandLA-Net generally performs well, other architectures such as KPConv may achieve slightly higher accuracy in some cases, possibly at a higher computational cost [18]. Attention-based networks such as Point Transformer can also sometimes outperform it by better capturing global context [27].
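OA, mAcc, and mIoU, the metrics quoted above and in Tables 2 to 4, are all derived from a per-point confusion matrix. A minimal sketch on a made-up toy labelling is shown below.

The sketch assumes every class occurs in the ground truth; otherwise the per-class divisions would need guarding. Benchmarks usually accumulate the confusion matrix over the whole test set before computing the scores.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows: ground-truth class, columns: predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(float)
    gt = cm.sum(axis=1)            # points per ground-truth class
    pred = cm.sum(axis=0)          # points per predicted class
    oa = tp.sum() / cm.sum()       # overall accuracy
    macc = np.mean(tp / gt)        # mean per-class accuracy
    iou = tp / (gt + pred - tp)    # per-class intersection over union
    return oa, macc, iou.mean()

# Toy per-point labels for three classes (illustrative only).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, 3)
oa, macc, miou = segmentation_metrics(cm)
print(f"OA={oa:.3f} mAcc={macc:.3f} mIoU={miou:.3f}")  # OA=0.800 mAcc=0.822 mIoU=0.656
```

mIoU penalizes both missed points and false positives for each class. It therefore sits below OA whenever errors are spread across classes, which is why the tables report OA values well above the corresponding mIoU.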

Kernel Point Convolution (KPConv) is another efficient architecture that operates directly on point clouds. It attaches its convolution weights to a set of kernel points placed in Euclidean space [10]. The approach is flexible. The number of kernel points is a design choice, and in the deformable variant their positions are shifted by learned offsets so that the kernel adapts to local geometry [34].

KPConv handles the irregular structure of point clouds well and has shown competitive results on various datasets [22]. Its architecture supports deep networks [34]. In some comparisons it has achieved the best OA and mIoU, though possibly with longer training times than RandLA-Net [18].

Recent developments include the lighter KPConvD and the attention-enhanced KPConvX, which aim to improve both performance and efficiency [35]. An extension called IPCONV explores different kernel point generation strategies to strengthen feature learning [39].
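A minimal NumPy sketch of rigid KPConv at a set of query points. It assumes the linear correlation from the KPConv paper, h = max(0, 1 - ||y - x_k|| / sigma). Each neighbour's features are thus multiplied by a weight matrix W_k in proportion to how close the neighbour lies to kernel point k.

The following are illustrative assumptions:

- the kernel layout (a centre point plus six points on the axes);
- sigma and the radius;
- the random weights.

The actual KPConv optimizes its kernel point dispositions and, in the deformable version, adds learned offsets. Neither is shown here.

```python
import numpy as np

def kpconv(query, support, feats, kernel_pts, weights, radius, sigma):
    """Rigid kernel point convolution evaluated at each query point.

    query: (M, 3), support: (N, 3), feats: (N, C_in)
    kernel_pts: (K, 3) kernel offsets, weights: (K, C_in, C_out)
    """
    out = np.zeros((query.shape[0], weights.shape[2]))
    for m, x in enumerate(query):
        y = support - x                                   # neighbour offsets
        mask = np.linalg.norm(y, axis=1) < radius         # ball neighbourhood
        y, f = y[mask], feats[mask]                       # (n, 3), (n, C_in)
        # linear correlation between every neighbour and every kernel point
        d = np.linalg.norm(y[:, None, :] - kernel_pts[None, :, :], axis=-1)  # (n, K)
        h = np.maximum(0.0, 1.0 - d / sigma)
        # sum over neighbours i and kernel points k of h_ik * (f_i @ W_k)
        out[m] = np.einsum("nk,nc,kco->o", h, f, weights)
    return out

rng = np.random.default_rng(0)
support = rng.random((1000, 3))
feats = rng.standard_normal((1000, 4))
radius = 0.2
axes = np.vstack([np.eye(3), -np.eye(3)])
kernel_pts = np.vstack([np.zeros((1, 3)), 0.5 * radius * axes])  # K = 7 (assumed layout)
weights = rng.standard_normal((7, 4, 16)) * 0.1
out = kpconv(support[:10], support, feats, kernel_pts, weights, radius, sigma=0.5 * radius)
print(out.shape)  # (10, 16)
```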

Point Transformer brings self-attention, which has proven highly successful in NLP and image analysis, to 3D point cloud processing [10]. Its self-attention layers model relationships between points and can capture long-range dependencies [27]. In the original Point Transformer, attention is computed within each point's k-nearest-neighbor neighborhood. Stacking such layers in a hierarchy then propagates information across progressively larger regions.

Self-attention is a set operator: permuting the input points simply permutes its outputs, and it accepts any number of points. This makes it well suited to point cloud data [43]. Point Transformer has shown strong performance in semantic scene segmentation and object part segmentation, achieving state-of-the-art results at the time of publication on datasets such as S3DIS [40]. Recent versions such as Point Transformer V3 (PTv3) prioritize simplicity and efficiency, substantially improving speed and memory usage while enlarging the receptive field [41].
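The vector self-attention of the original Point Transformer can be sketched as below. It assumes the published form

y_i = sum_j softmax_j(gamma(phi(x_i) - psi(x_j) + delta)) * (alpha(x_j) + delta), with delta = theta(p_i - p_j),

where:

- j ranges over the k nearest neighbours of point i;
- gamma and theta are two-layer MLPs;
- the softmax is taken per channel over the neighbours.

Weights are random and the surrounding residual block is omitted. Names and sizes are hypothetical.

```python
import numpy as np

def mlp2(x, W1, W2):
    """Two linear layers with a ReLU in between."""
    return np.maximum(x @ W1, 0.0) @ W2

def point_transformer_layer(xyz, feats, k, P):
    """Vector self-attention over each point's k nearest neighbours."""
    d2 = np.sum((xyz[:, None] - xyz[None]) ** 2, axis=-1)
    nbr = np.argsort(d2, axis=1)[:, :k]                        # (N, k)
    q = feats @ P["phi"]                                       # (N, D)
    kf = feats @ P["psi"]
    v = feats @ P["alpha"]
    delta = mlp2(xyz[:, None, :] - xyz[nbr], *P["theta"])     # (N, k, D) position encoding
    rel = mlp2(q[:, None, :] - kf[nbr] + delta, *P["gamma"])   # (N, k, D) attention logits
    rel -= rel.max(axis=1, keepdims=True)                      # numerical stability
    w = np.exp(rel)
    w /= w.sum(axis=1, keepdims=True)                          # softmax over neighbours, per channel
    return np.sum(w * (v[nbr] + delta), axis=1)                # (N, D)

rng = np.random.default_rng(0)
N, C, D, k = 256, 8, 16, 16   # hypothetical sizes

def lin(a, b):
    return rng.standard_normal((a, b)) * 0.1

P = {"phi": lin(C, D), "psi": lin(C, D), "alpha": lin(C, D),
     "theta": (lin(3, D), lin(D, D)), "gamma": (lin(D, D), lin(D, D))}
xyz = rng.random((N, 3))
feats = rng.standard_normal((N, C))
out = point_transformer_layer(xyz, feats, k, P)
print(out.shape)  # (256, 16)
```

Because the weights are vectors rather than scalars, each feature channel gets its own attention distribution over the neighbourhood.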

Among these efficient architectures, RandLA-Net stands out for its speed on large-scale point clouds, often running significantly faster than SPG and PointNet++ [10]. KPConv can achieve competitive accuracy but may require longer training than RandLA-Net [18]. Point Transformer, especially in its more recent and efficient versions, performs strongly by capturing global context, although its computational cost varies considerably between versions [27].

Earlier methods such as PointNet and PointNet++ struggled to process massive point clouds directly because of computational or memory limits [10]. RandLA-Net's random sampling and lightweight design address these issues effectively [10]. By contrast, SPG relies on computationally intensive pre-processing, and the voxelization used by some methods can also be demanding [10]. RandLA-Net's end-to-end training without complex pre- or post-processing is therefore a significant advantage [10]. In very complex scenes, however, RandLA-Net may be less accurate than methods that better capture fine-grained detail or global context [27].

Recent research on efficient 3D point cloud segmentation is moving in several directions:

- Label-efficient learning is increasingly explored to reduce the high cost of data annotation [9].
- Graph Neural Networks (GNNs) are gaining traction because they handle unstructured data and exploit the geometric relationships within point clouds [8].
- Serialization-based methods, such as certain Point Transformer variants, process point clouds efficiently by converting them into ordered sequences [41].
- Sparse voxelization and superpoint graphs continue to appear in some efficient architectures [10].
- Multi-modal fusion, which combines information from different sensors, is being investigated to improve segmentation accuracy [2].
- Efficient architectures like RandLA-Net are being adapted to specific applications, such as urban scene understanding and industrial inspection [28].

Lightweight architectures with reduced training costs remain a priority [51], as does modernizing existing methods such as KPConv with attention mechanisms [35].
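Serialization can be illustrated with a Z-order (Morton) curve. Points are quantized to a voxel grid, and the bits of the three integer coordinates are interleaved into one key. Sorting by that key yields a sequence in which spatially close points tend to be adjacent, so fixed-size chunks of the sequence can serve as local patches.

This is a minimal sketch under stated assumptions: a single Z-order curve, 10 bits per axis, and a hypothetical patch size. It is not the PTv3 implementation.

```python
import numpy as np

def part1by2(x):
    """Spread the low 10 bits of x so that two zero bits follow each bit."""
    x = x.astype(np.uint32) & 0x3FF
    x = (x | (x << 16)) & 0xFF0000FF
    x = (x | (x << 8)) & 0x0300F00F
    x = (x | (x << 4)) & 0x030C30C3
    x = (x | (x << 2)) & 0x09249249
    return x

def morton_order(xyz, grid_size):
    """Quantise points to a voxel grid and sort them along a Z-order curve."""
    v = np.floor((xyz - xyz.min(axis=0)) / grid_size).astype(np.int64)
    if v.max() >= 1024:
        raise ValueError("this sketch supports at most 10 bits per axis")
    code = part1by2(v[:, 0]) | (part1by2(v[:, 1]) << 1) | (part1by2(v[:, 2]) << 2)
    return np.argsort(code, kind="stable"), code

rng = np.random.default_rng(0)
xyz = rng.random((5000, 3)) * 10.0
order, code = morton_order(xyz, grid_size=0.05)
patches = np.array_split(xyz[order], len(xyz) // 1024)  # hypothetical patch grouping
print([len(p) for p in patches])  # [1250, 1250, 1250, 1250]
```

Once points are serialized, attention or convolution can run over contiguous chunks of the sequence. This replaces per-point k-nearest-neighbour searches with a single sort.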

In conclusion, efficient 3D point cloud segmentation has made remarkable progress, with different architectures offering different strengths and trade-offs. Foundational works such as PointNet and PointNet++ established direct point cloud processing and hierarchical feature learning.

RandLA-Net stands out for its efficiency, particularly its processing speed and scalability on large-scale datasets. It achieves this through random sampling combined with effective local feature aggregation. While RandLA-Net offers a compelling balance of speed and accuracy, KPConv and Point Transformer bring other advantages: flexible convolution operations and the ability to capture global context, respectively.

Ongoing research targets the cost of data annotation and further gains in accuracy and efficiency. It pursues these through label-efficient learning, GNNs, serialization, multi-modal fusion, and new architectural designs. The most suitable technique ultimately depends on the requirements and constraints of the target application, which is why exploring diverse approaches to efficient and accurate 3D point cloud segmentation remains important.

| Model Name | Total Time (seconds) | Network Parameters (millions) | Max Inference Points (millions) |
|---|---|---|---|
| PointNet (Vanilla) | 192 | 0.8 | 0.49 |
| PointNet++ (SSG) | 9831 | 0.97 | 0.98 |
| PointCNN | 8142 | 11 | 0.05 |
| SPG | 43584 | 0.25 | - |
| KPConv | 717 | 14.9 | 0.54 |
| RandLA-Net | 185 | 1.24 | 1.03 |

Table 1: Comparison of Processing Time and Memory Consumption on SemanticKITTI (Sequence 08)
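The relative speeds implied by Table 1 follow from dividing each model's total time by RandLA-Net's 185 seconds. The short sketch below does that arithmetic on the table's values.

```python
# Total processing time in seconds on SemanticKITTI sequence 08 (values from Table 1).
total_time_s = {
    "PointNet (Vanilla)": 192,
    "PointNet++ (SSG)": 9831,
    "PointCNN": 8142,
    "SPG": 43584,
    "KPConv": 717,
    "RandLA-Net": 185,
}

base = total_time_s["RandLA-Net"]
ratios = {name: t / base for name, t in total_time_s.items()}
for name, r in ratios.items():
    print(f"{name:20s} {r:6.1f}x RandLA-Net's time")
```

By this measure, on this sequence SPG takes roughly 236 times as long as RandLA-Net, PointNet++ (SSG) about 53 times as long, and KPConv about 3.9 times as long.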

| Model Name | Overall Accuracy (OA) | Mean Intersection over Union (mIoU) |
|---|---|---|
| SPG | 94.0% | 73.2% |
| RandLA-Net | 94.8% | 77.4% |

Table 2: Performance Comparison on Semantic3D

| Model Name | Mean Intersection over Union (mIoU) |
|---|---|
| PointNet++ | 20.1% |
| SPG | 17.4% |
| RandLA-Net | 53.9% |

Table 3: Performance Comparison on SemanticKITTI

| Model Name | Overall Accuracy (OA) | Mean Accuracy (mAcc) | Mean Intersection over Union (mIoU) |
|---|---|---|---|
| PointNet++ | 81.0% | 74.1% | 54.5% |
| SPG | 85.5% | 77.5% | 62.1% |
| RandLA-Net | 88.0% | 82.0% | 70.0% |

Table 4: Performance Comparison on S3DIS [10]

Works cited

1. Foundational Models for 3D Point Clouds: A Survey and Outlook - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2501.18594v1
2. Point Cloud Based Scene Segmentation: A Survey - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2503.12595v1
3. [2503.12595] Point Cloud Based Scene Segmentation: A Survey - arXiv, accessed on April 13, 2025, https://arxiv.org/abs/2503.12595
4. Deep-Learning-Based Point Cloud Semantic Segmentation: A Survey, accessed on April 13, 2025, https://www.mdpi.com/2079-9292/12/17/3642
5. Deep Learning Based 3D Segmentation: A Survey - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2103.05423v5
6. A Survey on Deep Learning Based Segmentation, Detection and Classification for 3D Point Clouds - PMC, accessed on April 13, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10137403/
7. Multi-Feature Aggregation for Semantic Segmentation of an Urban Scene Point Cloud, accessed on April 13, 2025, https://www.mdpi.com/2072-4292/14/20/5134
8. Graph Neural Networks in Point Clouds: A Survey - MDPI, accessed on April 13, 2025, https://www.mdpi.com/2072-4292/16/14/2518
9. A Survey of Label-Efficient Deep Learning for 3D Point Clouds - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2305.19812v2
10. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020) - openaccess.thecvf.com, accessed on April 13, 2025, https://openaccess.thecvf.com/content_CVPR_2020/papers/Hu_RandLA-Net_Efficient_Semantic_Segmentation_of_Large-Scale_Point_Clouds_CVPR_2020_paper.pdf
11. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR 2017) - openaccess.thecvf.com, accessed on April 13, 2025, https://openaccess.thecvf.com/content_cvpr_2017/papers/Qi_PointNet_Deep_Learning_CVPR_2017_paper.pdf
12. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds - www.cs.ox.ac.uk, accessed on April 13, 2025, https://www.cs.ox.ac.uk/files/11502/RandLA_Net__Efficient_Semantic_Segmentation_of_Large_Scale_Point_Clouds.pdf
13. A quick summary of 3D point cloud segmentation techniques - Mindkosh AI, accessed on April 13, 2025, https://mindkosh.com/blog/a-summary-of-3d-point-cloud-segmentation-techniques/
14. Deep Learning for Point Cloud Segmentation: Whats going on with PointNet? - Reddit, accessed on April 13, 2025, https://www.reddit.com/r/LiDAR/comments/hxc7y2/deep_learning_for_point_cloud_segmentation_whats/
15. Point cloud segmentation with PointNet - Keras, accessed on April 13, 2025, https://keras.io/examples/vision/pointnet_segmentation/
16. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space - Stanford University, accessed on April 13, 2025, https://stanford.edu/~rqi/pointnet2/
17. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space - proceedings.neurips.cc, accessed on April 13, 2025, https://proceedings.neurips.cc/paper/7095-pointnet-deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space.pdf
18. Evaluation Point Cloud semantic Segmentation methods - KTH DiVA, accessed on April 13, 2025, https://kth.diva-portal.org/smash/get/diva2:1942195/FULLTEXT01.pdf
19. Get Started with PointNet++ - MathWorks, accessed on April 13, 2025, https://www.mathworks.com/help/lidar/ug/get-started-pointnetplus.html
20. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space - arXiv, accessed on April 13, 2025, https://arxiv.org/abs/1706.02413
21. You Only Group Once: Efficient Point-Cloud Processing with Token Representation and Relation Inference Module - UC Berkeley EECS, accessed on April 13, 2025, https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-35.pdf
22. ALS Point Cloud Classification using PointNet++ and KPConv with Prior Knowledge, accessed on April 13, 2025, https://isprs-archives.copernicus.org/articles/XLVI-4-W4-2021/91/2021/isprs-archives-XLVI-4-W4-2021-91-2021.pdf
23. Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2404.18665v1
24. Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2501.18246v1
25. randlanet - MathWorks, accessed on April 13, 2025, https://www.mathworks.com/help/lidar/ref/randlanet.html
26. Point cloud classification using RandLA-Net | ArcGIS API for Python, accessed on April 13, 2025, https://developers.arcgis.com/python/latest/guide/point-cloud-classification-using-randlanet/
27. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020 Oral) - ResearchGate, accessed on April 13, 2025, https://www.researchgate.net/publication/337560140_RandLA-Net_Efficient_Semantic_Segmentation_of_Large-Scale_Point_Clouds_CVPR_2020_Oral
28. An Improved RandLa-Net Algorithm Incorporated with NDT for Automatic Classification and Extraction of Raw Point Cloud Data - MDPI, accessed on April 13, 2025, https://www.mdpi.com/2079-9292/11/17/2795
29. [1911.11236] RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds, accessed on April 13, 2025, https://arxiv.org/abs/1911.11236
30. (PDF) RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds, accessed on April 13, 2025, https://www.researchgate.net/publication/343455209_RandLA-Net_Efficient_Semantic_Segmentation_of_Large-Scale_Point_Clouds
31. [2312.11880] Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas - arXiv, accessed on April 13, 2025, https://arxiv.org/abs/2312.11880
32. A point cloud semantic segmentation method for nuclear power reactors based on RandLA-Net Model - SciOpen, accessed on April 13, 2025, https://www.sciopen.com/article/10.51393/j.jamst.2023010
33. Data Study Group Final Report: SenSat - The Alan Turing Institute, accessed on April 13, 2025, https://www.turing.ac.uk/sites/default/files/2020-06/the_alan_turing_institute_data_study_group_final_report_-_sensat_0.pdf
34. KPConv: Flexible and Deformable Convolution for Point Clouds (ICCV 2019) - geometry.stanford.edu, accessed on April 13, 2025, https://geometry.stanford.edu/lgl_2024/papers/tqdmgg-KPconv-iccv19/tqdmgg-KPconv-iccv19.pdf
35. KPConvX: Modernizing Kernel Point Convolution with Kernel Attention - CVF Open Access, accessed on April 13, 2025, http://openaccess.thecvf.com/content/CVPR2024/papers/Thomas_KPConvX_Modernizing_Kernel_Point_Convolution_with_Kernel_Attention_CVPR_2024_paper.pdf
36. KPConvX: Modernizing Kernel Point Convolution with Kernel Attention - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2405.13194v1
37. PointConvFormer: Revenge of the Point-based Convolution - Apple Machine Learning Research, accessed on April 13, 2025, https://machinelearning.apple.com/research/pointconvformer
38. Multi-view KPConv For Enhanced 3D Point Cloud Semantic Segmentation Using Multi-Modal Fusion With 2D Images - mediaTUM, accessed on April 13, 2025, https://mediatum.ub.tum.de/doc/1691326/ktkst0yuqrlvdgdca7izlkqm7.Du_2022_MV-KPConv.pdf
39. IPCONV: Convolution with Multiple Different Kernels for Point Cloud Semantic Segmentation, accessed on April 13, 2025, https://www.mdpi.com/2072-4292/15/21/5136
40. Point Cloud Segmentation | Papers With Code, accessed on April 13, 2025, https://paperswithcode.com/task/point-cloud-segmentation
41. Point Transformer V3: Simpler, Faster, Stronger - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2312.10035v1
42. Point Transformer | Papers With Code, accessed on April 13, 2025, https://paperswithcode.com/paper/point-transformer-1
43. Point Transformer (ICCV 2021) - openaccess.thecvf.com, accessed on April 13, 2025, https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Point_Transformer_ICCV_2021_paper.pdf
44. PCT: Point Cloud Transformer - Tsinghua Graphics and Geometric Computing Group, accessed on April 13, 2025, https://cg.cs.tsinghua.edu.cn/papers/PCT.pdf
45. Tutorial for 3D Semantic Segmentation with Superpoint Transformer - 3D Geodata Academy, accessed on April 13, 2025, https://learngeodata.eu/tutorial-for-3d-semantic-segmentation-with-superpoint-transformer/
46. Point Transformer V2: Grouped Vector Attention and Partition-based Pooling, accessed on April 13, 2025, https://proceedings.neurips.cc/paper_files/paper/2022/hash/d78ece6613953f46501b958b7bb4582f-Abstract-Conference.html
47. Semantic Segmentation of Point Cloud Sequences using Point Transformer v3 - Digital Commons@Kennesaw State, accessed on April 13, 2025, https://digitalcommons.kennesaw.edu/cgi/viewcontent.cgi?article=1006&context=masterstheses
48. [2305.19812] A Survey of Label-Efficient Deep Learning for 3D Point Clouds - arXiv, accessed on April 13, 2025, https://arxiv.org/abs/2305.19812
49. Semantic Point Cloud Segmentation Using Fast Deep Neural Network and DCRF - PMC, accessed on April 13, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8068939/
50. Advancements in Point Cloud-Based 3D Defect Detection and Classification for Industrial Systems: A Comprehensive Survey - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2402.12923v1
51. PointeNet: A Lightweight Framework for Effective and Efficient Point Cloud Analysis - arXiv, accessed on April 13, 2025, https://arxiv.org/html/2312.12743v1