#import "postercise.typ": * #import themes.boxes: * #import "@preview/fletcher:0.5.8" as fletcher: diagram, edge, node #import fletcher.shapes: brace, diamond, hexagon, parallelogram, pill #set page(width: 16in, height: 22in) #set text(size: 16pt) #show: theme.with( primary-color: rgb(28, 55, 103), // Dark blue background-color: white, accent-color: rgb(243, 163, 30), // Yellow titletext-color: white, titletext-size: 2em, ) #poster-header( title: [Exploring SAM2 for Pavement Crack Segmentation #linebreak() ], subtitle: [Zero-Shot Performance and Prompt Strategy Analysis], authors: [Hanwen Yu, 2467345], affiliation: [School of Advanced Technology, Supervisor: Siyue Yu ], logo-1: image("./img/xjtlu-o.png", width: 22em), ) #poster-content(col: 3)[ #normal-box(color: none)[ == Introduction The Segment Anything Model (SAM) @raviSAM2Segment2024 has zero-shot segmentation capabilities on natural images. However, its zero-shot performance on domain-specific tasks remains underexplored. We investigate SAM2's effectiveness for *pavement crack segmentation*, a task characterized by thin, *low-contrast* structures with *complex topologies*. *Can SAM2 achieve competitive crack segmentation performance without domain-specific training?* ] #normal-box(color: none)[ == Methodology We use *Crack500 Dataset* @PDFFeaturePyramid , which consists of 500 images with pixel-wise annotations of pavement cracks. The test set is 100 images for evaluation. SAM's segmentation workflow is a bit different from traditional segmentation, as shown in. it also has *different prompt strategies*, we evaluate four prompt approaches: #show table.cell: set text(size: 14pt) #let frame(stroke) = (x, y) => ( left: if x > 0 { 0.2pt } else { stroke }, right: stroke, top: if y < 2 { stroke } else { 0.2pt }, bottom: stroke, ) #set table( fill: (rgb("EAF2F5"), none), stroke: frame(1pt + rgb("21222C")), ) #show figure.where( kind: table, ): set figure.caption(position: bottom) #figure( table( columns: 2, [Prompt Type], [Description], [Bounding Box], [Tight box around ground truth mask], [1-Point Prompt], [Single point sampled from GT skeleton (morphological center)], [3-Point Prompt], [Three uniformly distributed points along GT skeleton], [5-Point Prompt], [Five uniformly distributed points along GT skeleton], ), caption: [Types of Prompts], ) // #h(0.1pt) #set text(size: 16pt) #figure( diagram( node-fill: gradient.radial(white, blue, radius: 200%), node-stroke: blue, spacing: 25pt, ( node((0, 0), [Crack Image], shape: rect), node((0, 1), [SAM Image Encoder], shape: rect), node((0, 2), [Prompt Generation #linebreak() BBox, 1/3/5 points], shape: rect), node((1, 2), [SAM Mask Decoder], shape: rect), node((1, 1), [Predircted Mask], shape: rect), node((1, 0), [Metrics (IoU, F1)], shape: rect), ) .intersperse(edge("-|>")) .join(), ), caption: [SAM2 Segmentation workflow], ) Some supervised models are taken into comparison: UNet @ronnebergerUNetConvolutionalNetworks2015 , DeepCrack @liuDeepCrackDeepHierarchical2019 CT-CrackSeg @liuDeepCrackDeepHierarchical2019 , VM-UNet @ruanVMUNetVisionMamba2024 , CrackSegMamba @qiCrackSegMambaLightweightMamba2024 , TransUNet @chenTransUNetRethinkingUNet2024. 
  ]
  #normal-box(color: none)[
    == Experiments and Results
    *Evaluation*
    #show math.equation: set text(size: 14pt)
    #set math.equation(numbering: "(1)")
    $ bold("IoU") = "TP" / ("TP" + "FP" + "FN") $
    $ bold("F1") = (2 dot "Precision" dot "Recall") / ("Precision" + "Recall") $
    SAM2 with bounding-box prompts (39.6% IoU) lags behind the supervised baselines, including even the 2015 UNet @ronnebergerUNetConvolutionalNetworks2015.
    #figure(
      image("img/metrics.png"),
      caption: [Model metrics comparison],
    )
    Bounding-box prompts yield the best performance among the zero-shot variants: there is a 4.7x gap between bbox (39.6% IoU) and 1-point prompts (8.4% IoU).
    #figure(
      image("img/sam_iou.png", width: 14em),
      caption: [IoU of SAM2 under the four prompt strategies],
    )
    #figure(
      image("img/examples.png"),
      caption: [Examples of SAM2 results],
    )
  ]
  #normal-box(color: none)[
    == Qualitative Analysis
    #figure(
      image("img/fail1.png"),
      caption: [Failure cases of SAM2 (bbox)],
    )
    #figure(
      image("img/fail2.png"),
      caption: [Failure cases of SAM2 (5-point)],
    )
  ]
  #normal-box(color: none)[
    == Key Findings and Discussion
    1-point prompts perform poorly (8.4% IoU), indicating that a single point gives insufficient guidance for complex crack structures. 5-point prompts approach bbox performance on *highly irregular cracks*, suggesting that multiple points help capture crack shape.

    Since SAM2 was trained on natural images, pavement cracks violate several of its key assumptions: they *lack clear object boundaries*, have *low contrast* against the background, and exhibit *extreme aspect ratios (length >> width)*.
  ]
  #normal-box(color: none)[
    == Conclusion and Future Work
    SAM2 shows *limited zero-shot capability for crack segmentation*. Bounding-box prompts significantly outperform point-based prompts, yet performance still lags behind supervised methods, indicating a need for domain adaptation.
  ]
  #poster-footer[
    #normal-box(color: none)[
      == References
    ]
    #columns()[
      #set text(size: 12pt)
      #bibliography("./crack.bib", title: none, full: false)
    ]
  ]
]