#import "@preview/touying:0.6.1": *
#import themes.university: *
#show: university-theme.with(
aspect-ratio: "16-9",
config-info(
title: [Detection of Foreign Objects on Railway Tracks],
subtitle: [A Pilot Study with RandLA-Net],
author: [Hanwen Yu],
date: datetime.today(),
institution: [SAT, XJTLU],
logo: emoji.train,
),
)
// Title slide
#title-slide()
// Outline slide
#slide[
= Outline
#set text(size: 1.1em)
- Background & Problem Statement
- Pilot Study Design
- Data & Metrics
- Results & Discussion
]
// Motivation & Goals
= Background and Problem Statement
#slide[
#text(size: 1.2em, weight: "bold")[Why This Matters]
- Undetected objects on railway tracks cause derailments and catastrophic accidents
- Manual inspection is time-consuming and error-prone
- Financial impact of railway accidents is significant
][
#image("../assets/reallife-railway.png", width: 100%)
#text(size: 0.8em)[Fig: Real-life railway track scene]
]
#slide[
#text(size: 1.2em, weight: "bold")[Project Objectives]
- Develop automated detection system using LiDAR and 3D point cloud segmentation
- Accurately identify foreign objects amidst complex railway geometry
- Maintain computational efficiency for practical deployment
][
#image("../assets/whurailway.png", width: 100%)
#text(size: 0.8em)[Fig: Railway track point cloud example from the WHU-Railway3D dataset]
]
---
#text(weight: "bold")[Problem Statement]
Given a point cloud *$P = \{p_1, p_2, dots, p_n\}$*,
where each point *$p_i in RR^3$* is a 3D coordinate in the railway environment,
our task is to assign each point a semantic label *$l_i in \{0, 1, dots, C-1\}$*,
where *$C = 13$* is the number of predefined classes.
The function $f: P -> L$ maps the input point cloud to the set of labels $L = \{l_1, l_2, dots, l_n\}$.
---
= Pilot Study Design
---
#slide()[
#v(1em)
#text(size: 1.2em, weight: "bold")[Pilot Study Aims]
- Establish *baseline* performance using RandLA-Net
- Evaluate feasibility of detecting extremely rare objects (0.001% of data)
][
#image("../assignment_3/fig/example.jpg", width: 100%)
]
---
#slide[
#text(size: 1.1em, weight: "bold")[Current Approaches]
- PointNet / PointNet++: pioneering point-based methods; PointNet++ adds local feature extraction but is computationally expensive on large point clouds
- *RandLA-Net*: balances efficiency and accuracy through random sampling with local feature aggregation
- Attention-based methods (e.g. Point Transformer): focus on global context but may overlook local details
We choose *RandLA-Net* as our baseline.
]
= Data and Metrics
#slide[
#text(size: 1.1em, weight: "bold")[Training Setup]
- *858* training files, *172* test files
- Only *18* training files and *1* test file contain foreign objects
- *1/4* downsampling ratio (loading and subsampling sketched on the next slide)
- NVIDIA RTX 4090 GPU
][
#text(size: 1.1em, weight: "bold")[Data Collection]
- *1,031* PLY files with *248M+* points
- *13* semantic classes including railway infrastructure elements
- "Box" class (label *11*) represents foreign objects
- Extreme class imbalance: boxes only *0.001%* of points
]
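#slide[
#text(size: 1.1em, weight: "bold")[Data Loading Sketch]
#text(size: 0.8em)[A minimal sketch of reading one PLY file and applying the 1/4 downsampling, assuming simple random subsampling, the `plyfile` package, and a per-point label property named `class`; it is illustrative, not our exact preprocessing code.]
#set text(size: 0.7em)
```python
import numpy as np
from plyfile import PlyData

def load_and_downsample(path, ratio=4, label_prop="class"):
    """Read one PLY scan and keep a random 1/ratio subset of its points."""
    vertex = PlyData.read(path)["vertex"]
    points = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=1)
    labels = np.asarray(vertex[label_prop])  # label property name is an assumption
    keep = np.random.choice(len(points), len(points) // ratio, replace=False)
    return points[keep], labels[keep]
```
]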
#slide(composer: (2fr, 3fr))[
#text(size: 1.2em, weight: "bold")[Data Inspection]
#v(1em)
#text(size: 0.8em)[Table: Distribution of semantic classes in the railway LiDAR dataset]
][
#set text(size: 0.7em)
#table(
columns: (auto, auto, auto, auto),
inset: 8pt,
align: (center, left, right, right),
stroke: 0.4pt,
[*Label*], [*Class Name*], [*Point Count*], [*Percentage*],
[0], [Track], [16,653,029], [6.71%],
[1], [Track Surface], [39,975,480], [16.11%],
[2], [Ditch], [7,937,154], [3.20%],
[3], [Masts], [4,596,199], [1.85%],
[4], [Cable], [2,562,683], [1.03%],
[5], [Tunnel], [31,412,582], [12.66%],
[6], [Ground], [73,861,934], [29.76%],
[7], [Fence], [7,834,499], [3.16%],
[8], [Mountain], [51,685,366], [20.82%],
[9], [Train], [9,047,963], [3.65%],
[10], [Human], [275,077], [0.11%],
[11], [*Box (foreign object)*], [*3,080*], [*0.001%*],
[12], [Others], [2,360,810], [0.95%],
)
]
#slide[
#text(size: 1.2em, weight: "bold")[Evaluation Metrics]
For each class $c$, the IoU is calculated as:
$
text("IoU")_c = frac("TP"_c, "TP"_c + "FP"_c + "FN"_c)
$
where $text("TP")_c$, $ "FP"_c$, and $"FN"_c$ represent true positives, false positives, and false negatives for class $c$, respectively. The mIoU is then calculated by averaging the IoU values across all classes:
$
"mIoU" = 1 / C sum_(c=1)^(C) "IoU"_c
$
]
#slide[
#text(weight: "bold")[Precision for the "Box" Class]
$
"Precision"_"box" = frac("TP"_"box", "TP"_"box" + "FP"_"box")
$
where $"TP"_"box"$ and $"FP"_"box"$ represent true positives and false positives for the "Box" class, respectively.
]
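#slide[
#text(size: 1.1em, weight: "bold")[Metric Computation Sketch]
#text(size: 0.8em)[A minimal NumPy sketch of the metrics above, computed from a confusion matrix over all test points; names such as `NUM_CLASSES` and `BOX` are illustrative and not taken from our pipeline.]
#set text(size: 0.7em)
```python
import numpy as np

NUM_CLASSES = 13  # C in the problem statement
BOX = 11          # label of the "Box" class

def confusion_matrix(pred, gt, num_classes=NUM_CLASSES):
    """conf[i, j] = number of points with ground truth i predicted as j."""
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def metrics(pred, gt):
    conf = confusion_matrix(pred, gt)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as c, ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth c, predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1)  # per-class IoU, guarding against /0
    miou = iou.mean()                       # mean IoU over all classes
    precision_box = tp[BOX] / max(tp[BOX] + fp[BOX], 1)
    return iou, miou, precision_box
```
]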
= Results and Discussion
#slide()[
#text(weight: "bold")[Results]
- The overall mean IoU across all classes was *70.29%*
- The IoU for our target class, *"Box"* (foreign object), was *0.00%* (discussed in the following slides)
- IoU for the other classes was relatively high, with *"Train"* reaching *95.22%* and *"Ground"* *89.68%*
][
#set text(size: 0.8em)
#table(
columns: (auto, auto, auto),
inset: 8pt,
align: (center, left, right),
stroke: 0.4pt,
[*Label*], [*Class Name*], [*IoU (%)*],
[0], [Track], [60.12],
[1], [Track Surface], [74.53],
[2], [Ditch], [74.21],
[3], [Masts], [82.48],
[4], [Cable], [73.62],
[5], [Tunnel], [83.03],
[6], [Ground], [89.68],
[7], [Fence], [79.81],
[8], [Mountain], [91.93],
[9], [Train], [95.22],
[10], [Human], [61.86],
[11], [Box (foreign object)], [0.00],
)
]
#slide[
#text(weight: "bold")[Visualization]
- The model tends to predict roughly half of the points as *"Mountain"*
- Even in scenes with no mountain, it still labels trains and other objects as "Mountain"
][
#image("../assets/pred.png", width: 90%)
#text(size: 0.8em)[Fig: Model prediction]
#image("../assets/truth.png", width: 90%)
#text(size: 0.8em)[Fig: Ground truth]
]
#slide()[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
Let's look at the cross-entropy loss:
$
ell(x, y) = 1 / N sum_(n=1)^N -w_(y_n) log frac(exp(x_(n, y_n)), sum_(c=1)^C exp(x_(n, c)))
$
where $w_(y_n)$ is the weight of class $y_n$ and $N$ is the number of points in the batch.
In our training we simply *set the weight of every class to 1*, which is not suitable here: rare classes such as "Box" and "Human" should receive larger weights (see the sketch after this discussion).
]
#slide()[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
- In our dataset, *"Ground"* and *"Mountain"* are the majority classes, with *29.76%* and *20.82%* of the points respectively, together accounting for *50.58%* of the dataset.
- The "Box" class is extremely rare: only *0.001%* of the points are labeled as "Box".
- A model that mostly predicts these majority classes is already right for about half of the points, so the unweighted loss stays low even though boxes are completely missed.
]
#slide(align: auto)[
#text(weight: "bold")[Why the Model Performs Poorly on Boxes]
#v(5em)
The model is *biased towards the majority classes*, leading to poor performance on the minority class (foreign objects).
]
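#slide[
#text(size: 1.1em, weight: "bold")[Weighted Loss Sketch]
#text(size: 0.8em)[A minimal PyTorch sketch of the proposed fix: inverse-frequency class weights passed to the cross-entropy loss. The per-class counts come from the class-distribution table; this is an illustration, not the configuration used in this pilot study.]
#set text(size: 0.7em)
```python
import torch
import torch.nn as nn

# Per-class point counts from the class-distribution table (labels 0..12)
counts = torch.tensor([16_653_029, 39_975_480, 7_937_154, 4_596_199, 2_562_683,
                       31_412_582, 73_861_934, 7_834_499, 51_685_366, 9_047_963,
                       275_077, 3_080, 2_360_810], dtype=torch.float32)

# Inverse-frequency weights; on a perfectly balanced dataset every weight would be 1
weights = counts.sum() / (len(counts) * counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4096, 13)          # dummy per-point class scores (N, C)
labels = torch.randint(0, 13, (4096,))  # dummy ground-truth labels (N,)
loss = criterion(logits, labels)        # rare classes now carry much larger weights
```
]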
#slide[
#text(weight: "bold")[Future Work]
- Add class weights to the loss function to *address class imbalance*
- Explore *data augmentation* to increase the representation of the "Box" class (a simple oversampling sketch follows)
- Consider *ensemble methods* or multi-task learning to improve detection performance
]
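#slide[
#text(size: 1.1em, weight: "bold")[Augmentation Sketch]
#text(size: 0.8em)[One simple way to raise the representation of the "Box" class, sketched with NumPy: duplicate box points with small random jitter before training. This is an idea for future work, not something evaluated in this study.]
#set text(size: 0.7em)
```python
import numpy as np

BOX = 11  # label of the "Box" class

def oversample_box(points, labels, copies=10, sigma=0.02, rng=None):
    """Append `copies` jittered duplicates of every box point to the cloud."""
    rng = np.random.default_rng() if rng is None else rng
    box = points[labels == BOX]
    if len(box) == 0:
        return points, labels
    dup = np.repeat(box, copies, axis=0)
    dup = dup + rng.normal(scale=sigma, size=dup.shape)  # small Gaussian jitter (assumes metres)
    points = np.concatenate([points, dup], axis=0)
    labels = np.concatenate([labels, np.full(len(dup), BOX, dtype=labels.dtype)])
    return points, labels
```
]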
= The end
#text(size: 0.6em)[
Work by Hanwen Yu,
supervised by Dr. Siyue Yu
]