#import "@preview/touying:0.6.1": *
#import themes.university: *
#show: university-theme.with(
aspect-ratio: "16-9",
config-info(
title: [Detection of Foreign Objects on Railway Tracks],
subtitle: [A Pilot Study with RandLA-Net],
author: [Hanwen Yu],
date: datetime.today(),
institution: [SAT, XJTLU],
logo: emoji.train,
),
)
// Title slide
#title-slide()
// Outline slide
#slide[
= Outline
#set text(size: 1.1em)
- Background & Problem Statement
- Pilot Study Design
- Data & Metrics
- Results & Discussion
]
// Motivation & Goals
= Background and Problem Statement
#slide[
#text(size: 1.2em, weight: "bold")[Why This Matters]
- Undetected objects on railway tracks cause derailments and catastrophic accidents
- Manual inspection is time-consuming and error-prone
- Financial impact of railway accidents is significant
][
#image("../assets/reallife-railway.png", width: 100%)
#text(size: 0.8em)[Fig: Real-life railway track scene]
]
#slide[
#text(size: 1.2em, weight: "bold")[Project Objectives]
- Develop automated detection system using LiDAR and 3D point cloud segmentation
- Accurately identify foreign objects amidst complex railway geometry
- Maintain computational efficiency for practical deployment
][
#image("../assets/whurailway.png", width: 100%)
#text(size: 0.8em)[Fig: Railway track point cloud example from the WHU-Railway3D dataset]
]
---
#text(weight: "bold")[Problem Statement]
Given a point cloud *$P = \{p_1, p_2, dots, p_n\}$*,
where each point *$p_i in RR^3$* is a 3D coordinate in the railway environment,
our task is to assign each point a semantic label *$l_i in \{0, 1, dots, C-1\}$*,
where *$C = 13$* is the number of predefined classes.
The function $f: P -> L$ maps the input point cloud to the set of labels $L = \{l_1, l_2, dots, l_n\}$.
---
= Pilot Study Design
---
#slide()[
#v(1em)
#text(size: 1.2em, weight: "bold")[Pilot Study Aims]
- Establish *baseline* performance using RandLA-Net
- Evaluate feasibility of detecting extremely rare objects (0.001% of data)
][
#image("../assignment_3/fig/example.jpg", width: 100%)
]
---
#slide[
#text(size: 1.1em, weight: "bold")[Current Approaches]
- PointNet / PointNet++: pioneering point-based methods; PointNet++ adds local feature extraction but is computationally expensive on large point clouds
- *RandLA-Net*: balances efficiency and accuracy through random sampling with local feature aggregation
- Attention-based methods (e.g. Point Transformer): focus on global context but may overlook local details
We choose *RandLA-Net* as our baseline.
]
= Data and Metrics
#slide[
#text(size: 1.1em, weight: "bold")[Training Setup]
- *858* training files, *172* test files
- Only *18* training files and *1* test file contain foreign objects
- *1/4* downsampling ratio (loading and subsampling sketched on the next slide)
- NVIDIA RTX 4090 GPU
][
#text(size: 1.1em, weight: "bold")[Data Collection]
- *1,031* PLY files with *248M+* points
- *13* semantic classes including railway infrastructure elements
- "Box" class (label *11*) represents foreign objects
- Extreme class imbalance: boxes only *0.001%* of points
]
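#slide[
#text(size: 1.1em, weight: "bold")[Data Loading Sketch]
#text(size: 0.8em)[A minimal sketch of reading one PLY file and applying the 1/4 downsampling, assuming simple random subsampling, the `plyfile` package, and a per-point label property named `class`; it is illustrative, not our exact preprocessing code.]
#set text(size: 0.7em)
```python
import numpy as np
from plyfile import PlyData

def load_and_downsample(path, ratio=4, label_prop="class"):
    """Read one PLY scan and keep a random 1/ratio subset of its points."""
    vertex = PlyData.read(path)["vertex"]
    points = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=1)
    labels = np.asarray(vertex[label_prop])  # label property name is an assumption
    keep = np.random.choice(len(points), len(points) // ratio, replace=False)
    return points[keep], labels[keep]
```
]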
#slide(composer: (2fr, 3fr))[
#text(size: 1.2em, weight: "bold")[Data Inspection]
#v(1em)
#text(size: 0.8em)[Table: Distribution of semantic classes in the railway LiDAR dataset]
][
#set text(size: 0.7em)
#table(
columns: (auto, auto, auto, auto),
inset: 8pt,
align: (center, left, right, right),
stroke: 0.4pt,
[*Label*], [*Class Name*], [*Point Count*], [*Percentage*],
[0], [Track], [16,653,029], [6.71%],
[1], [Track Surface], [39,975,480], [16.11%],
[2], [Ditch], [7,937,154], [3.20%],
[3], [Masts], [4,596,199], [1.85%],
[4], [Cable], [2,562,683], [1.03%],
[5], [Tunnel], [31,412,582], [12.66%],
[6], [Ground], [73,861,934], [29.76%],
[7], [Fence], [7,834,499], [3.16%],
[8], [Mountain], [51,685,366], [20.82%],
[9], [Train], [9,047,963], [3.65%],
[10], [Human], [275,077], [0.11%],
[11], [*Box (foreign object)*], [*3,080*], [*0.001%*],
[12], [Others], [2,360,810], [0.95%],
)
]
#slide[
#text(size: 1.2em, weight: "bold")[Evaluation Metrics]
For each class $c$, the IoU is calculated as:
$
text("IoU")_c = frac("TP"_c, "TP"_c + "FP"_c + "FN"_c)
$
where $text("TP")_c$, $ "FP"_c$, and $"FN"_c$ represent true positives, false positives, and false negatives for class $c$, respectively. The mIoU is then calculated by averaging the IoU values across all classes:
$
"mIoU" = 1 / C sum_(c=1)^(C) "IoU"_c
$
]
#slide[
#text(weight: "bold")[Precision for the "Box" Class]
$
"Precision"_"box" = frac("TP"_"box", "TP"_"box" + "FP"_"box")
$
where $"TP"_"box"$ and $"FP"_"box"$ represent true positives and false positives for the "Box" class, respectively.
]
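#slide[
#text(size: 1.1em, weight: "bold")[Metric Computation Sketch]
#text(size: 0.8em)[A minimal NumPy sketch of the metrics above, computed from a confusion matrix over all test points; names such as `NUM_CLASSES` and `BOX` are illustrative and not taken from our pipeline.]
#set text(size: 0.7em)
```python
import numpy as np

NUM_CLASSES = 13  # C in the problem statement
BOX = 11          # label of the "Box" class

def confusion_matrix(pred, gt, num_classes=NUM_CLASSES):
    """conf[i, j] = number of points with ground truth i predicted as j."""
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def metrics(pred, gt):
    conf = confusion_matrix(pred, gt)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as c, ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth c, predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)  # per-class IoU, guarding against /0
    miou = iou.mean()                       # mean IoU over all classes
    precision_box = tp[BOX] / max(tp[BOX] + fp[BOX], 1)
    return iou, miou, precision_box
```
]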
= Results and Discussion
#slide()[
#text(weight: "bold")[Results]
- The overall mean IoU across all classes was *70.29%*
- The IoU for our target class, *"Box"* (foreign object), was *0.00%* (discussed in the following slides)
- IoU for the other classes was relatively high, with *"Train"* reaching *95.22%* and *"Ground"* *89.68%*
][
#set text(size: 0.8em)
#table(
columns: (auto, auto, auto),
inset: 8pt,
align: (center, left, right),
stroke: 0.4pt,
[*Label*], [*Class Name*], [*IoU (%)*],
[0], [Track], [60.12],
[1], [Track Surface], [74.53],
[2], [Ditch], [74.21],
[3], [Masts], [82.48],
[4], [Cable], [73.62],
[5], [Tunnel], [83.03],
[6], [Ground], [89.68],
[7], [Fence], [79.81],
[8], [Mountain], [91.93],
[9], [Train], [95.22],
[10], [Human], [61.86],
[11], [Box (foreign object)], [0.00],
)
]
#slide[
#text(weight: "bold")[Visualization]
- The model tends to predict roughly half of the points as *"Mountain"*
- Even in scenes with no mountain, it still labels trains and other objects as "Mountain"
][
#image("../assets/pred.png", width: 90%)
#text(size: 0.8em)[Fig: Model prediction]
#image("../assets/truth.png", width: 90%)
#text(size: 0.8em)[Fig: Ground truth]
]
#slide()[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
Let's look at the cross-entropy loss:
$
ell(x, y) = 1 / N sum_(n=1)^N -w_(y_n) log frac(exp(x_(n, y_n)), sum_(c=1)^C exp(x_(n, c)))
$
where $w_(y_n)$ is the weight of class $y_n$ and $N$ is the number of points in the batch.
In our training we simply *set the weight of every class to 1*, which is not suitable here: rare classes such as "Box" and "Human" should receive larger weights (see the sketch after this discussion).
]
#slide()[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
- In our dataset, *"Ground"* and *"Mountain"* are the majority classes, with *29.76%* and *20.82%* of the points respectively, together accounting for *50.58%* of the dataset.
- The "Box" class is extremely rare: only *0.001%* of the points are labeled as "Box".
- A model that mostly predicts these majority classes is already right for about half of the points, so the unweighted loss stays low even though boxes are completely missed.
]
#slide(align: auto)[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
#v(5em)
The model is *biased towards the majority classes*, leading to poor performance on the minority class (foreign objects).
]
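#slide[
#text(size: 1.1em, weight: "bold")[Weighted Loss Sketch]
#text(size: 0.8em)[A minimal PyTorch sketch of the proposed fix: inverse-frequency class weights passed to the cross-entropy loss. The per-class counts come from the class-distribution table; this is an illustration, not the configuration used in this pilot study.]
#set text(size: 0.7em)
```python
import torch
import torch.nn as nn

# Per-class point counts from the class-distribution table (labels 0..12)
counts = torch.tensor([16_653_029, 39_975_480, 7_937_154, 4_596_199, 2_562_683,
                       31_412_582, 73_861_934, 7_834_499, 51_685_366, 9_047_963,
                       275_077, 3_080, 2_360_810], dtype=torch.float32)

# Inverse-frequency weights; on a perfectly balanced dataset every weight would be 1
weights = counts.sum() / (len(counts) * counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4096, 13)          # dummy per-point class scores (N, C)
labels = torch.randint(0, 13, (4096,))  # dummy ground-truth labels (N,)
loss = criterion(logits, labels)        # rare classes now carry much larger weights
```
]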
#slide[
#text(weight: "bold")[Future Work]
- Add class weights to the loss function to *address class imbalance*
- Explore *data augmentation* to increase the representation of the "Box" class (a simple oversampling sketch follows)
- Consider *ensemble methods* or multi-task learning to improve detection performance
]
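#slide[
#text(size: 1.1em, weight: "bold")[Augmentation Sketch]
#text(size: 0.8em)[One simple way to raise the representation of the "Box" class, sketched with NumPy: duplicate box points with small random jitter before training. This is an idea for future work, not something evaluated in this study.]
#set text(size: 0.7em)
```python
import numpy as np

BOX = 11  # label of the "Box" class

def oversample_box(points, labels, copies=10, sigma=0.02, rng=None):
    """Append `copies` jittered duplicates of every box point to the cloud."""
    rng = np.random.default_rng() if rng is None else rng
    box = points[labels == BOX]
    if len(box) == 0:
        return points, labels
    dup = np.repeat(box, copies, axis=0)
    dup = dup + rng.normal(scale=sigma, size=dup.shape)  # small Gaussian jitter (assumes metres)
    points = np.concatenate([points, dup], axis=0)
    labels = np.concatenate([labels, np.full(len(dup), BOX, dtype=labels.dtype)])
    return points, labels
```
]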
= The end
#text(size: 0.6em)[
Work by Hanwen Yu,
supervised by Dr. Siyue Yu
]