#import "postercise.typ": *
|
|
#import themes.boxes: *
|
|
#import "@preview/fletcher:0.5.8" as fletcher: diagram, edge, node
|
|
|
|
#import fletcher.shapes: brace, diamond, hexagon, parallelogram, pill
|
|
#set page(width: 16in, height: 22in)
|
|
#set text(size: 16pt)
|
|
|
|

#show: theme.with(
  primary-color: rgb(28, 55, 103), // Dark blue
  background-color: white,
  accent-color: rgb(243, 163, 30), // Yellow
  titletext-color: white,
  titletext-size: 2em,
)

#poster-header(
  title: [Exploring SAM2 for Pavement Crack Segmentation #linebreak()],
  subtitle: [Zero-Shot Performance and Prompt Strategy Analysis],
  authors: [Hanwen Yu, 2467345],
  affiliation: [School of Advanced Technology, Supervisor: Siyue Yu],
  logo-1: image("./img/xjtlu-o.png", width: 22em),
)

#poster-content(col: 3)[

#normal-box(color: none)[
== Introduction

The Segment Anything Model 2 (SAM2) @raviSAM2Segment2024 offers zero-shot segmentation capabilities on natural images. However, its zero-shot performance on domain-specific tasks remains underexplored.

We investigate SAM2's effectiveness for *pavement crack segmentation*, a task characterized by thin, *low-contrast* structures with *complex topologies*.

*Can SAM2 achieve competitive crack segmentation performance without domain-specific training?*
]

#normal-box(color: none)[
== Methodology

We use the *Crack500 dataset* @PDFFeaturePyramid, which consists of 500 images with pixel-wise annotations of pavement cracks; a 100-image test set is used for evaluation.

SAM2's promptable workflow differs from a traditional segmentation pipeline, as shown in @sam-workflow. Because SAM2 supports *different prompt strategies*, we evaluate four prompting approaches:

#show table.cell: set text(size: 14pt)

// Per-cell stroke function for the prompt table: full-weight frame strokes
// mixed with 0.2pt hairlines between interior cells.
#let frame(stroke) = (x, y) => (
  left: if x > 0 { 0.2pt } else { stroke },
  right: stroke,
  top: if y < 2 { stroke } else { 0.2pt },
  bottom: stroke,
)

#set table(
  fill: (rgb("EAF2F5"), none),
  stroke: frame(1pt + rgb("21222C")),
)

#show figure.where(kind: table): set figure.caption(position: bottom)

#figure(
  table(
    columns: 2,
    [Prompt Type], [Description],
    [Bounding Box], [Tight box around the ground-truth mask],
    [1-Point Prompt], [Single point sampled from the GT skeleton (morphological center)],
    [3-Point Prompt], [Three uniformly distributed points along the GT skeleton],
    [5-Point Prompt], [Five uniformly distributed points along the GT skeleton],
  ),
  caption: [Types of Prompts],
) <types-of-prompts>
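
The point prompts in @types-of-prompts are derived from the ground-truth skeleton. A minimal sketch of how such prompts could be generated, assuming binary NumPy masks and `skimage.morphology.skeletonize` (the helper names and the raster-order sampling are illustrative, not the exact implementation):

```python
import numpy as np
from skimage.morphology import skeletonize

def bbox_prompt(gt_mask: np.ndarray) -> np.ndarray:
    """Tight (x0, y0, x1, y1) box around the ground-truth mask."""
    ys, xs = np.nonzero(gt_mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def point_prompts(gt_mask: np.ndarray, n_points: int) -> np.ndarray:
    """n points spread evenly over the GT skeleton pixels, as (x, y) pairs."""
    ys, xs = np.nonzero(skeletonize(gt_mask > 0))
    # Evenly spaced indices in raster order -- a simplification; the 1-point
    # prompt above uses the skeleton's morphological center instead.
    idx = np.linspace(0, len(xs) - 1, n_points).astype(int)
    return np.stack([xs[idx], ys[idx]], axis=1)
```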

// #h(0.1pt)

#set text(size: 16pt)

#figure(
  diagram(
    node-fill: gradient.radial(white, blue, radius: 200%),
    node-stroke: blue,
    spacing: 25pt,
    (
      node((0, 0), [Crack Image], shape: rect),
      node((0, 1), [SAM Image Encoder], shape: rect),
      node((0, 2), [Prompt Generation #linebreak() BBox, 1/3/5 points], shape: rect),
      node((1, 2), [SAM Mask Decoder], shape: rect),
      node((1, 1), [Predicted Mask], shape: rect),
      node((1, 0), [Metrics (IoU, F1)], shape: rect),
    ).intersperse(edge("-|>")).join(),
  ),
  caption: [SAM2 segmentation workflow],
) <sam-workflow>
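
In code, the workflow above corresponds roughly to the `sam2` package's image predictor API; a sketch following the public SAM2 examples (checkpoint name and dummy image are illustrative):

```python
import numpy as np
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained SAM2 image predictor (checkpoint name is illustrative).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a crack image
predictor.set_image(image)  # runs the image encoder once per image

# Box prompt as (x0, y0, x1, y1); point prompts would instead pass
# point_coords (Nx2, xy order) and point_labels (1 = foreground).
masks, scores, _ = predictor.predict(
    box=np.array([50, 200, 600, 260]),
    multimask_output=False,  # return a single mask per prompt
)
```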

Supervised models are included for comparison: UNet @ronnebergerUNetConvolutionalNetworks2015, DeepCrack @liuDeepCrackDeepHierarchical2019, CT-CrackSeg, VM-UNet @ruanVMUNetVisionMamba2024, CrackSegMamba @qiCrackSegMambaLightweightMamba2024, and TransUNet @chenTransUNetRethinkingUNet2024.
]

#normal-box(color: none)[
== Experiments and Results

*Evaluation*

#show math.equation: set text(size: 14pt)
#set math.equation(numbering: "(1)")

$ bold("IoU") = "TP" / ("TP" + "FP" + "FN") $ <iou_e>

$ bold("F1") = 2 * ("Precision" * "Recall") / ("Precision" + "Recall") $ <f1>

SAM2 with bounding-box prompts (39.6% IoU) lags behind the supervised models, including even the original UNet (2015) @ronnebergerUNetConvolutionalNetworks2015.

#figure(
  image("img/metrics.png"),
  caption: [Model Metrics Comparison],
)

Bounding-box prompts yield the best performance among the zero-shot strategies: there is a 4.7x performance gap between bounding-box (39.6% IoU) and 1-point prompts (8.4% IoU).

#figure(
  image("img/sam_iou.png", width: 14em),
  caption: [IoU of SAM2 with 4 prompt strategies],
)

#figure(
  image("img/examples.png"),
  caption: [Examples of SAM2 results],
)
]

#normal-box(color: none)[
== Qualitative Analysis

#figure(
  image("img/fail1.png"),
  caption: [Failure Cases of SAM2 (bbox)],
)

#figure(
  image("img/fail2.png"),
  caption: [Failure Cases of SAM2 (5-point)],
)
]

#normal-box(color: none)[
== Key Findings and Discussion

1-point prompts perform poorly (12.3% IoU), indicating insufficient guidance for complex crack structures. 5-point prompts approach bounding-box performance on *highly irregular cracks*, suggesting that multiple points help capture crack shape.

Because SAM2 was trained on natural images, pavement cracks violate several of its key assumptions: they *lack clear object boundaries*, have *low contrast* with the background, and exhibit *extreme aspect ratios (length >> width)*.
]

#normal-box(color: none)[
== Conclusion and Future Work

SAM2 shows *limited zero-shot capability for crack segmentation*. Bounding-box prompts significantly outperform point-based prompts, yet performance still lags behind supervised methods, indicating a need for domain adaptation.
]

#poster-footer[
// Content
#normal-box(color: none)[
== References
]

#columns()[
#set text(size: 12pt)
#bibliography("./crack.bib", title: none, full: false)
]
// #[
//   // align right
//   #set align(right)
//   2467345 | Hanwen Yu | Email: Hanwen.Yu24\@student.xjtlu.edu.cn
// ]
]
]