Feature Extraction (Local Descriptors)
2025-08-12
Why Local Features?
In computer vision, local features are designed to describe distinctive, repeatable patterns in small regions of an image. These features enable:
- Matching between different images (e.g., for image stitching, object recognition)
- Robustness to geometric and photometric transformations (rotation, scale, lighting)
- Efficiency in handling large scenes with partial views or occlusions
Key Concepts Overview
1. Keypoint Detection
Keypoints are specific, stable locations in the image that are distinctive and repeatable — e.g., corners or blob-like regions (edges alone are usually avoided, since they are poorly localized along their length).
- A good keypoint detector should be invariant to image transformations (e.g., scale, rotation).
- Common algorithms: DoG (SIFT), Harris corner, FAST (ORB), etc.
2. Descriptor Generation
A descriptor is a fixed-length vector (e.g., 128-dim for SIFT, 32 bytes for ORB) that encodes the appearance of the image region around a keypoint.
- It must be robust and distinctive to enable reliable matching.
- Descriptors can be floating-point (e.g., SIFT, SURF) or binary (e.g., ORB, BRIEF).
3. Invariance Properties
A good local feature has invariance to:
- Scale (size of object in image)
- Rotation
- Affine transformations
- Lighting changes
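To make the detector/descriptor split concrete, here is a minimal OpenCV sketch (not part of the original notes; the file name is a placeholder and it assumes OpenCV >= 4.4, where SIFT is in the main module). It detects keypoints and computes descriptors with both SIFT and ORB, showing the floating-point vs. binary formats mentioned above.

```python
# Minimal sketch: keypoints + descriptors with SIFT (float) and ORB (binary).
# "scene.jpg" is a placeholder; assumes OpenCV >= 4.4.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_sift, des_sift = sift.detectAndCompute(img, None)
print(des_sift.shape, des_sift.dtype)  # (N, 128), float32 -> 128-dim float descriptor

orb = cv2.ORB_create()
kp_orb, des_orb = orb.detectAndCompute(img, None)
print(des_orb.shape, des_orb.dtype)    # (M, 32), uint8 -> 32 bytes = 256-bit binary descriptor
```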
Local Feature Methods
SIFT — Scale-Invariant Feature Transform
Developed by David Lowe (2004), SIFT is one of the most influential feature extraction algorithms.
Key Properties:
- Scale and rotation invariant
- Robust to noise and small affine distortions
- Widely used in academic research and industry
Steps:
1. Scale-space extrema detection
- Build a scale space by convolving the image with Gaussian kernels of increasing σ:
L(x, y, σ) = G(x, y, σ) ∗ I(x, y)
- Compute the Difference of Gaussians (DoG), a cheap approximation of the scale-normalized Laplacian:
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
- Detect local extrema of D in 3D (x, y, scale), comparing each sample with its 26 neighbours (a minimal sketch of this step follows after step 4).
2. Keypoint localization
- Fit a 3D quadratic function to refine the keypoint location, discard low-contrast points, and reject edge-like responses using the ratio of principal curvatures of the Hessian matrix.
3. Orientation assignment
- Compute image gradients ∇I(x,y) in a region around the keypoint.
- Assign one or more dominant orientations using a histogram (36 bins over 360°).
4. Descriptor computation
- Take a 16×16 patch around the keypoint, sampled at its scale and rotated to its dominant orientation.
- Divide it into 4×4 subregions.
- Compute gradient orientation histograms in each subregion (8 bins).
- Result: 4×4×8=128-dimensional descriptor.
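As referenced in step 1, the following is a rough NumPy/OpenCV sketch of the scale-space and DoG construction. It is illustrative only, not the full SIFT implementation: real SIFT also organizes scales into octaves by repeated downsampling, and the image path and coordinates below are placeholders.

```python
# Sketch of step 1: Gaussian scale space, DoG, and a brute-force extremum check.
import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma0, k, num_scales = 1.6, 2 ** 0.5, 5
# L(x, y, sigma) = G(x, y, sigma) * I(x, y): progressively blurred copies of the image
L = [cv2.GaussianBlur(img, (0, 0), sigma0 * k ** i) for i in range(num_scales)]
# D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
D = [L[i + 1] - L[i] for i in range(num_scales - 1)]

# A candidate keypoint is an extremum among its 26 neighbours in (x, y, scale).
# Brute-force check for one interior pixel and one interior scale:
s, y, x = 1, 100, 100
cube = np.stack(D[s - 1:s + 2])[:, y - 1:y + 2, x - 1:x + 2]
is_extremum = D[s][y, x] in (cube.max(), cube.min())
print("extremum at (100, 100, scale 1):", is_extremum)
```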
Summary:
- Detector: DoG in scale space
- Descriptor: 128-dim floating point
- Invariant to: Scale, rotation, illumination, partial affine
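A typical use of these descriptors is matching two views of the same scene. Below is a hedged usage sketch (file names are placeholders): SIFT descriptors are compared with L2 distance and filtered with Lowe's ratio test, which keeps a match only when it is clearly better than the second-best candidate.

```python
# Sketch: matching SIFT descriptors between two images with a ratio test.
import cv2

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)          # float descriptors -> Euclidean distance
matches = matcher.knnMatch(des1, des2, k=2)   # two nearest neighbours per query

# Lowe's ratio test: accept a match only if it is much closer than the runner-up
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "good matches")
```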
SURF — Speeded Up Robust Features
Developed by Bay et al. (2006), SURF is a faster approximation of SIFT.
Key Properties:
- Faster than SIFT (uses integral images)
- Based on Haar wavelet responses
- ⚠️ Patented — not free for commercial use
Method Overview:
- Scale-space: Uses box filters of increasing size instead of repeated Gaussian blurring; these are evaluated in constant time with integral images (see the sketch after this list).
- Keypoint detection: Blobs are found at maxima of the determinant of the (approximated) Hessian matrix:
H(x, σ) = [ Lxx(x, σ)  Lxy(x, σ) ; Lxy(x, σ)  Lyy(x, σ) ]
where Lxx, Lxy, Lyy are second-order Gaussian derivative responses.
- Orientation: Calculated using Haar wavelet responses around the keypoint
- Descriptor: 64- or 128-dimensional vector built from sums of Haar wavelet responses in subregions
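The speed of SURF comes largely from the integral-image trick mentioned above. This small sketch (illustrative only; the file name is a placeholder) shows why: once the integral image is built, the sum over any axis-aligned box costs four array lookups regardless of box size, so large box filters are as cheap as small ones.

```python
# Sketch: constant-time box sums with an integral image.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
ii = cv2.integral(img)  # shape (H+1, W+1); ii[y, x] = sum of img[:y, :x]

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using four lookups."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

print(box_sum(ii, 10, 10, 50, 50))    # O(1), independent of box size
print(int(img[10:50, 10:50].sum()))   # same value, computed the slow way
```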
ORB — Oriented FAST and Rotated BRIEF
Developed by Rublee et al. (2011) at OpenCV Labs as an efficient, open-source alternative to SIFT and SURF.
Key Properties:
- FAST + BRIEF with rotation invariance
- Binary descriptor (much smaller and faster)
- Free to use, suitable for real-time systems
Method Overview:
1. Keypoint Detection (FAST):
- Detect corners using FAST algorithm (rapid intensity comparison).
- Apply Harris corner score to retain best keypoints.
2. Orientation Assignment:
- Compute the intensity centroid (xc, yc) = (m10/m00, m01/m00) of a patch around the keypoint.
- Estimate the orientation θ from the offset between the patch centre and the centroid (a toy sketch of this computation follows after step 3):
θ = arctan(m01 / m10), where the image moments are m_pq = Σ x^p y^q I(x, y)
3. Descriptor (Rotated BRIEF):
- Use binary tests (intensity comparisons) on pairs of pixels.
- Rotate test pattern according to keypoint orientation.
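Step 2's orientation estimate can be written in a few lines of NumPy. The sketch below is a toy version (the patch is a random placeholder, and the circular masking used by the real ORB is omitted): it computes the image moments about the patch centre and takes the angle of the offset to the intensity centroid.

```python
# Toy sketch of the intensity-centroid orientation used by ORB.
import numpy as np

def patch_orientation(patch):
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0          # coordinates relative to the patch centre
    ys -= (h - 1) / 2.0
    m10 = np.sum(xs * patch)     # m_pq = sum over x, y of x^p y^q I(x, y)
    m01 = np.sum(ys * patch)
    return np.arctan2(m01, m10)  # angle of the centre-to-centroid offset

patch = np.random.rand(31, 31)   # placeholder patch around a keypoint
print(np.degrees(patch_orientation(patch)))
```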
Summary:
- Detector: FAST with Harris ranking
- Descriptor: 256-bit binary string
- Invariant to: Rotation; only partial scale invariance, obtained by detecting on an image pyramid rather than by true scale selection
- Very fast, ideal for mobile or embedded devices
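Because the descriptor is a binary string, ORB matches are scored with Hamming distance rather than L2, which is part of what makes it fast. A short hedged sketch (file names are placeholders):

```python
# Sketch: matching binary ORB descriptors with Hamming distance.
import cv2

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(len(matches), "cross-checked matches")
```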
HOG — Histogram of Oriented Gradients
Originally proposed for pedestrian detection by Dalal and Triggs (2005).
Key Properties:
- Describes object shape and structure
- Based on local gradient orientation histograms
- Often used in object detection with SVM
Method Steps:
1. Compute gradients
- Horizontal and vertical gradients using simple 1-D kernels (e.g., [-1, 0, 1]).
2. Divide image into cells
- Each cell: e.g., 8×8 pixels.
3. Orientation histogram
- For each cell, create a histogram of gradient directions (e.g., 9 bins from 0° to 180°).
4. Block normalization
- Group adjacent cells into blocks (e.g., 2×2 cells).
- Normalize histograms to reduce illumination sensitivity.
5. Concatenate all normalized histograms
- Result: High-dimensional feature vector representing the entire region or image.
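The steps above map directly onto OpenCV's HOGDescriptor. The sketch below (the image path is a placeholder) uses the default Dalal-Triggs parameters: a 64×128 window, 8×8 cells, 2×2-cell blocks with an 8-pixel stride, and 9 bins, which works out to 105 blocks × 36 values = 3780 dimensions.

```python
# Sketch: computing a HOG descriptor with OpenCV's default parameters.
import cv2

img = cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (64, 128))   # match the default 64x128 detection window

hog = cv2.HOGDescriptor()          # default cell/block/bin settings (Dalal & Triggs)
descriptor = hog.compute(img)
print(descriptor.size)             # 3780 = 7x15 blocks * 2x2 cells * 9 bins
```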
Summary:
- Descriptor only (no keypoint detection)
- Dimensionality: Often 3k–10k dimensions
- Invariant to: Illumination, small local distortions
Summary Comparison Table
| Method | Descriptor type | Invariance | Descriptor size | License |
|---|---|---|---|---|
| SIFT | Float | Scale, rotation | 128-dim | Originally patented; patent expired (2020), now free to use |
| SURF | Float | Scale, rotation | 64–128 dim | Patented |
| ORB | Binary | Rotation (partial scale via pyramid) | 256-bit (32 bytes) | Free |
| HOG | Float | Lighting, small local distortions | Varies (often 3k–10k dims) | Free |
Practical Tips
- Use ORB for lightweight, real-time systems.
- Use SIFT when robustness and accuracy are more important than speed.
- Use HOG for shape-based tasks (e.g., pedestrian detection).
- Visualize keypoints using OpenCV:
```python
# Example: ORB keypoint visualization ("scene.jpg" is a placeholder path)
import cv2
image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create()
keypoints = orb.detect(image, None)
image_with_kp = cv2.drawKeypoints(image, keypoints, None, flags=0)
cv2.imshow("Keypoints", image_with_kp)
cv2.waitKey(0)
```