Image Coordinate Systems and Geometric Transformations
2025-08-08
Image coordinate systems and geometric transformations are fundamental topics in computer vision with strong mathematical foundations. They matter most in tasks such as image alignment, warping, stitching, and object tracking.
1 Image Coordinate Systems
Pixel Coordinate System
In digital images, each pixel is addressed using a 2D coordinate system. There are two common conventions:
Convention | Origin | Axes |
---|---|---|
Image coordinate system (OpenCV, NumPy) | Top-left corner (0, 0) | x → right, y → down |
Mathematical coordinate system | Bottom-left or center | x → right, y → up |
Pixel Indexing in NumPy / OpenCV
- A grayscale image is accessed as img[y, x]: the row index (y) comes first, then the column index (x).
- A color image is accessed as img[y, x, c], where c is the channel index (0 = B, 1 = G, 2 = R in OpenCV's default BGR order).
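The row-first indexing described above can be checked on a small synthetic array. This is a minimal sketch that assumes only NumPy; the array size and the pixel position are arbitrary illustration values, and the BGR interpretation is the OpenCV convention mentioned above.

```python
import numpy as np

# Synthetic color image: 4 rows (height) by 6 columns (width), 3 channels.
img = np.zeros((4, 6, 3), dtype=np.uint8)
rows, cols = img.shape[:2]       # shape is (height, width, channels)

x, y = 5, 1                      # column 5, row 1
img[y, x] = (255, 0, 0)          # row index first; under BGR this is pure blue

blue_value = int(img[y, x, 0])   # channel 0 is B under the BGR assumption
gray = img[:, :, 0]              # one channel is a 2D array indexed as gray[y, x]
```

Swapping the indices to img[x, y] would raise an IndexError here, because index 5 is out of range for an axis of length 4. On square images the same mistake fails silently, which is why it is worth getting the habit right.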
Note on Origin and Axes:
- In math: positive y-axis goes up
- In images: positive y-axis goes down
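This inverted y-axis matters when performing geometric transformations: for example, a rotation matrix that turns points counterclockwise in the mathematical convention turns them clockwise on screen when y points down.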
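Converting between the two conventions is a single flip of the y value. The sketch below assumes a bottom-left mathematical origin and integer pixel positions; the helper names image_to_math and math_to_image are hypothetical, not part of any library.

```python
def image_to_math(x, y, height):
    """Map a top-left-origin pixel (y down) to a bottom-left origin (y up)."""
    return x, (height - 1) - y


def math_to_image(x, y, height):
    """Inverse mapping; the flip is its own inverse."""
    return x, (height - 1) - y


height = 480
top_left = image_to_math(0, 0, height)  # the top row becomes y = height - 1
round_trip = math_to_image(*image_to_math(100, 20, height), height)
```

If the mathematical origin is placed at the image center instead, subtract the center coordinates as well as flipping y.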
2 Geometric Transformations Overview
Geometric transformations map a point (x,y) in the source image to a new point (x′,y′) in the transformed image using a transformation matrix.
The general form:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = T \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Where:
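- T is a 2×3 matrix (affine transformations) or a 3×3 matrix (projective transformations, where the result carries a third component w that is divided out)
- Homogeneous coordinates append a 1 to each point, (x, y) → (x, y, 1), so that translation can be written as a matrix multiplication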
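The general form can be sketched for a batch of points with NumPy. The helper apply_transform is a hypothetical name introduced for illustration; it assumes points are given as (x, y) rows and handles both matrix shapes mentioned above.

```python
import numpy as np


def apply_transform(T, points):
    """Apply a 2x3 or 3x3 matrix T to an (N, 2) array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])       # rows are [x, y, 1]
    mapped = homogeneous @ np.asarray(T, dtype=float).T
    if mapped.shape[1] == 3:                      # 3x3 case: divide by w
        mapped = mapped[:, :2] / mapped[:, 2:3]
    return mapped


# A 2x3 matrix that shifts points by (50, 30).
T = np.array([[1.0, 0.0, 50.0],
              [0.0, 1.0, 30.0]])
result = apply_transform(T, [[0, 0], [10, 20]])

# The same transform as a 3x3 matrix: append the row [0, 0, 1].
T3 = np.vstack([T, [0.0, 0.0, 1.0]])
result3 = apply_transform(T3, [[0, 0], [10, 20]])
```

The 3×3 form is convenient when several transformations need to be chained, since 3×3 matrices can be multiplied together directly.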
3 Common 2D Transformations
1. Translation
Shifts an image along the x and y axes.
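$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$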
- Moves the image right by tx and down by ty (for positive values; negative values move it left and up)
- Implemented using cv2.warpAffine()
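As a worked instance of the translation matrix, take the illustrative values tx = 50, ty = 30 and the point (10, 20):

```latex
$$
\begin{bmatrix} x' \\ y' \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 50 \\ 0 & 1 & 30 \end{bmatrix}
  \begin{bmatrix} 10 \\ 20 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \cdot 10 + 0 \cdot 20 + 50 \\ 0 \cdot 10 + 1 \cdot 20 + 30 \end{bmatrix}
= \begin{bmatrix} 60 \\ 50 \end{bmatrix}
$$
```

The trailing 1 in the homogeneous vector is what picks up the tx and ty column.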
2. Scaling
Enlarges or shrinks an image.
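$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$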
- sx, sy: scaling factors along x and y
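A short sketch of the scaling matrix applied to the corners of a rectangle, assuming NumPy and arbitrary illustration values for sx, sy, and the corner coordinates:

```python
import numpy as np

sx, sy = 2.0, 0.5
S = np.array([[sx, 0.0, 0.0],
              [0.0, sy, 0.0]])

# Corners of a 100 x 40 rectangle with one corner at the origin.
corners = np.array([[0, 0], [100, 0], [100, 40], [0, 40]], dtype=float)
homogeneous = np.hstack([corners, np.ones((4, 1))])   # rows are [x, y, 1]
scaled = homogeneous @ S.T                            # rows are [x', y']
```

The rectangle becomes 200 wide and 20 high, and the corner at (0, 0) does not move: this matrix scales about the origin, which in image coordinates is the top-left pixel.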
3. Rotation
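Rotates the image by an angle θ around the origin; rotation about any other center is built from this by composition. The matrix below takes θ in radians when evaluated with standard math libraries, while OpenCV's API takes degrees.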
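$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$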
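To rotate around the image center, translate the center to the origin, rotate, then translate back.
In OpenCV:
M = cv2.getRotationMatrix2D(center, angle, scale)
Here center is an (x, y) point and angle is in degrees; positive values rotate counterclockwise as displayed (origin at the top-left). The returned M is a 2×3 matrix that already includes the translations for the given center.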
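The translate, rotate, translate-back recipe can be sketched by multiplying 3×3 matrices. This uses the rotation matrix exactly as written in this section, with θ in radians; rotation_about is a hypothetical helper name, and the center and angle are illustration values.

```python
import numpy as np


def rotation_about(cx, cy, theta):
    """3x3 matrix rotating by theta (radians) about the point (cx, cy)."""
    c, s = np.cos(theta), np.sin(theta)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # Matrices apply right to left: move to origin, rotate, move back.
    return back @ rotate @ to_origin


M = rotation_about(100.0, 50.0, np.pi / 2)
center_image = M @ np.array([100.0, 50.0, 1.0])     # the center stays put
right_neighbor = M @ np.array([110.0, 50.0, 1.0])   # 10 px right of center
```

The point 10 pixels to the right of the center lands 10 pixels below it, at (100, 60). With y pointing down, that is a clockwise turn on screen for this matrix and a positive θ.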
4. Affine Transformation
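A linear map followed by a translation. It preserves straight lines and parallelism, but not necessarily angles or lengths.
Affine matrix:
$$T = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \end{bmatrix}$$
The matrix has 6 unknowns, so 3 non-collinear point pairs (source and destination) determine it.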
In OpenCV:
M = cv2.getAffineTransform(src_pts, dst_pts)
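To see why three point pairs are enough, the six unknowns can be solved for directly as a small linear system. This is a sketch of the underlying idea using NumPy, not a description of how OpenCV computes it internally; affine_from_points is a hypothetical helper and the points are illustration values.

```python
import numpy as np


def affine_from_points(src, dst):
    """Solve for the 2x3 affine matrix mapping 3 src points onto 3 dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((3, 1))])   # rows are [x, y, 1]
    X = np.linalg.solve(A, dst)             # 3x2 solution of A @ X = dst
    return X.T                              # 2x3: [[a11, a12, tx], [a21, a22, ty]]


src = [[0, 0], [1, 0], [0, 1]]
dst = [[10, 20], [12, 20], [10, 23]]
M = affine_from_points(src, dst)

# Check: mapping the source points through M reproduces the destinations.
mapped = np.hstack([np.asarray(src, dtype=float), np.ones((3, 1))]) @ M.T
```

Here M comes out as a scale of 2 in x and 3 in y plus a shift of (10, 20). If the three source points lie on one line, the system is singular and np.linalg.solve raises LinAlgError.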
5. Perspective Transformation (Projective)
Allows mapping from one plane to another; useful for perspective warping.
Homography matrix:
$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$
Transformation:
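$$\begin{bmatrix} x' \\ y' \\ w \end{bmatrix} = H \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \;\Rightarrow\; \left( \frac{x'}{w}, \frac{y'}{w} \right)$$
Requires 4 point pairs (no three of them collinear) to solve for the 8 degrees of freedom: H has 9 entries, but multiplying H by any nonzero scalar gives the same mapping.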
In OpenCV:
H = cv2.getPerspectiveTransform(src_pts, dst_pts)
warped = cv2.warpPerspective(image, H, output_size)
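The division by w, and the fact that the overall scale of H does not matter, can be sketched for a single point. The helper apply_homography and the matrix values are assumptions chosen for illustration.

```python
import numpy as np


def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography, including the divide by w."""
    xp, yp, w = H @ np.array([x, y, 1.0])
    return xp / w, yp / w


# Identity plus a small h31 term, which makes w depend on x.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.0, 1.0]])

p = apply_homography(H, 100.0, 50.0)              # here w = 1.1
p_scaled = apply_homography(2.0 * H, 100.0, 50.0)  # same point: scale cancels
```

Because w grows with x in this example, points further to the right are pulled in more strongly, which is the foreshortening effect that an affine matrix cannot produce.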
4 OpenCV Transformation Functions
Transformation | Function |
---|---|
Affine | cv2.getAffineTransform() + cv2.warpAffine() |
Perspective | cv2.getPerspectiveTransform() + cv2.warpPerspective() |
Rotation | cv2.getRotationMatrix2D() + cv2.warpAffine() |
Resize | cv2.resize() |
Flip | cv2.flip() |
5 Practical Examples
Example 1: Translation
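import cv2
import numpy as np

img = cv2.imread("your_image.jpg")
if img is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError("your_image.jpg could not be read")
rows, cols = img.shape[:2]  # shape is (height, width, channels)
# Translation matrix: shift right by 50, down by 30
M = np.float32([[1, 0, 50], [0, 1, 30]])
# warpAffine takes the output size as (width, height)
translated = cv2.warpAffine(img, M, (cols, rows))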
Example 2: Rotation Around Center
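# Continues from Example 1 (uses img, rows, cols)
center = (cols // 2, rows // 2)  # (x, y) order
angle = 45   # degrees; positive rotates counterclockwise as displayed
scale = 1.0
M = cv2.getRotationMatrix2D(center, angle, scale)
# Output keeps the original size, so rotated corners are cropped
rotated = cv2.warpAffine(img, M, (cols, rows))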
Example 3: Perspective Warp
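# Continues from Example 1 (uses img, rows, cols)
# Four source points and where each should land, as (x, y) pairs in matching order
pts1 = np.float32([[50, 50], [200, 50], [50, 200], [200, 200]])
pts2 = np.float32([[10, 100], [180, 50], [100, 250], [280, 250]])
H = cv2.getPerspectiveTransform(pts1, pts2)  # 3x3 homography
warped = cv2.warpPerspective(img, H, (cols, rows))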
6 Summary
Transformation | Preserves | Requires | Matrix Type |
---|---|---|---|
Translation | Lengths, angles, parallelism | 2 values (tx, ty) | 2×3 |
Scaling | Parallelism (angles too when sx = sy) | 2 values (sx, sy) | 2×3 |
Rotation | Lengths, angles, parallelism | 1 angle + center | 2×3 |
Affine | Straight lines, parallelism | 3 point pairs | 2×3 |
Perspective | Straight lines only | 4 point pairs | 3×3 |
7 Suggested Exercises
- Rotate an image by 90 degrees around its center using OpenCV.
- Try warping an image using both affine and perspective transformations. Compare the differences.
- Manually compute the result of a rotation + translation matrix on a point and visualize the outcome.