Segmentation
This page summarizes a deep learning approach that classifies each MRI pixel into background, vertebrae, spinal canal, or intervertebral discs. The key idea is the combination of careful preprocessing, a modified U-Net, and a Combined Loss.
This section keeps the main paper, the ScienceDirect page, and the SPIDER dataset paper in one place for presentation preparation.
The paper is important because it turns lumbar MRI segmentation into a cleaner four-class learning problem and reports very high performance on high-resolution T2 SPACE scans.
Number of patients in the SPIDER lumbar MRI dataset.
MRI series including T1, T2, and T2 SPACE sequences.
Background, vertebrae, spinal canal, and intervertebral discs.
Best reported performance on T2 SPACE images.
The original SPIDER data is stored as 3D MHA volumes. The paper converts it into 2D slices and merges detailed labels into four clinically useful classes.
This reduces GPU memory cost and makes the data suitable for a 2D U-Net pipeline.
Individual vertebra and disc IDs are converted into broader anatomical categories.
Slices dominated by background or missing key structures are removed to stabilize training.
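The two preprocessing steps above (label merging and slice filtering) can be sketched in numpy. The specific label ID ranges below are assumptions for illustration, not the actual SPIDER label convention:

```python
import numpy as np

# Hypothetical ID convention (assumption): vertebra IDs 1-25, spinal canal
# ID 100, disc IDs 201-225; everything else counts as background.
BACKGROUND, VERTEBRAE, CANAL, DISC = 0, 1, 2, 3

def merge_labels(mask):
    """Map detailed per-structure IDs to the four broad anatomical classes."""
    merged = np.full_like(mask, BACKGROUND)
    merged[(mask >= 1) & (mask <= 25)] = VERTEBRAE
    merged[mask == 100] = CANAL
    merged[(mask >= 201) & (mask <= 225)] = DISC
    return merged

def keep_slice(merged, min_foreground=0.01):
    """Drop slices that are almost entirely background."""
    return (merged != BACKGROUND).mean() >= min_foreground
```

The foreground threshold is likewise an illustrative choice; the paper's exact filtering rule may differ.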
T2 SPACE achieves the best result because its higher resolution makes anatomical boundaries clearer.
The key point is that the model predicts clinically useful anatomical categories, not individual anatomical IDs.
The proposed model is based on U-Net. It compresses image features in the encoder and reconstructs a pixel-level segmentation map in the decoder.
Convolution, Batch Normalization, Leaky ReLU, and Max Pooling extract increasingly abstract features.
A 512-channel layer captures complex anatomical patterns and boundary information.
Fine spatial information is passed from encoder to decoder to preserve boundaries.
Transposed convolution restores resolution and reconstructs class-specific masks.
Each pixel is assigned probabilities over the four output classes.
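The per-pixel class probabilities come from a softmax over the four output channels; a minimal numpy sketch (function name is my own):

```python
import numpy as np

def pixel_probabilities(logits):
    """Softmax over the class axis: logits (C, H, W) -> per-pixel probabilities."""
    z = logits - logits.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

logits = np.zeros((4, 2, 2))
logits[1, 0, 0] = 5.0                    # strong "vertebrae" evidence at pixel (0, 0)
probs = pixel_probabilities(logits)
pred = probs.argmax(axis=0)              # final per-pixel class map
```

Taking the argmax over the class axis turns the probability map into the discrete four-class segmentation mask.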
Combined Loss = 0.6 × Focal Loss + 0.4 × Dice Loss
Keeps a small gradient for negative inputs and reduces the risk of inactive neurons.
Stabilizes the starting weight distribution and helps gradients flow through deeper layers.
Balances hard-pixel learning with direct optimization of mask overlap.
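The weighted combination above can be sketched for a single foreground class in numpy. This is an illustrative binary version with assumed defaults (gamma = 2, small epsilon), not the paper's exact implementation:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Mean binary focal loss; p = predicted foreground probability per pixel."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)            # probability of the true class
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt)))

def dice_loss(p, y, eps=1e-7):
    """1 - soft Dice overlap between prediction and ground truth."""
    inter = np.sum(p * y)
    return float(1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps))

def combined_loss(p, y):
    # Weights follow the paper: 0.6 * Focal + 0.4 * Dice
    return 0.6 * focal_loss(p, y) + 0.4 * dice_loss(p, y)
```

The focal term down-weights easy pixels via the (1 - pt)^gamma factor, while the Dice term directly rewards mask overlap, which is why the combination handles class imbalance well.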
Dice is the main metric. A value closer to 1 means stronger overlap between the predicted mask and the ground-truth annotation.
| Structure | Dice | IoU | Meaning |
|---|---|---|---|
| Intervertebral Discs | 0.9688 | 0.9476 | Thin disc structures are segmented with high overlap. |
| Vertebrae | 0.9712 | 0.9461 | Large bony structures are segmented consistently. |
| Spinal Canal | 0.9671 | 0.9501 | The long canal-like structure remains accurate despite its shape. |
Measures overlap between prediction and ground truth. It is the easiest main metric to explain.
Intersection divided by union. For the same prediction it is never higher than Dice, so it is the stricter metric.
Boundary-distance metrics used to evaluate how far predicted surfaces deviate from the annotation.
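Dice and IoU can be computed from binary masks in a few lines of numpy (helper names are my own, not from the paper):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two boolean masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over Union between two boolean masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

pred = np.array([1, 1, 0, 0], dtype=bool)
gt = np.array([1, 0, 1, 0], dtype=bool)
# Here Dice = 2*1/(2+2) = 0.5 and IoU = 1/3, illustrating that IoU is stricter.
```

Boundary-distance metrics such as Hausdorff distance are usually computed with dedicated library routines rather than a few lines like these.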
Presentation note: Dice around 0.97 is very high, but the reported result should be discussed carefully because test-set details and preprocessing choices affect generalization.
Use this section to quickly explain the technical terms during an English presentation.
A pixel-level classification task. In this project, each pixel is assigned to background, vertebrae, spinal canal, or discs.
A widely used medical image segmentation architecture with an encoder, decoder, and skip connections.
A high-resolution 3D T2-weighted MRI sequence. It produces clearer anatomical boundaries.
A measure of overlap between prediction and ground truth. Higher is better, with 1 meaning perfect overlap.
Intersection over Union. It divides the overlapping region by the total combined region.
A loss function that gives more weight to hard-to-classify pixels.
A loss function that directly optimizes the overlap between predicted and true masks.
An activation function that keeps a small gradient for negative values.
A weight initialization method designed to keep training stable in deep networks.
A situation where some classes, such as background, dominate the image and can bias learning.
For the project, the first goal is reproduction. The second goal is improvement through model, loss, augmentation, and error-analysis experiments.
Implement the SPIDER preprocessing pipeline, four-class labels, Modified U-Net, and Combined Loss.
Compare Attention U-Net, U-Net++, Boundary Loss, Tversky Loss, and stronger augmentation.
Overlay predictions on MRI images and identify which structures or sequences fail most often.
Explain the clinical motivation, technical method, reproduction result, limitations, and proposed improvements.