Software Based Video Stabilization in Smartphones
Updated: Oct 19, 2020
Image stabilization is an important technology for improving image quality in almost any contemporary digital camera. The market shift to compact mobile devices with high-megapixel sensors has increased the demand for advanced stabilization techniques. The two most common implementations are electronic image stabilization (EIS), also referred to as Digital Image Stabilization (DIS), and optical image stabilization (OIS). The block diagram below shows the different implementations and sub-types of image stabilization:
Image Stabilization Techniques
Optical Image Stabilization
OIS is a hardware solution implemented in most high-end smartphones. The technology has been around since the mid-1990s and works by using a small motor inside the camera to move one or more of the glass components, counteracting the shake of the camera. A gyroscope measures the direction and amount of movement, and the motor compensates by moving the lens in the opposite direction. This keeps the image steady on the sensor and avoids blurry photos. OIS is found mainly on high-end smartphones because it is considerably more expensive to implement than Digital Image Stabilization.
Optical Image Stabilization (source)
Electronic image stabilization relies on algorithms that model the camera motion and then use that model to correct the images. It is a software solution: it needs no dedicated stabilization hardware and can be added to a smartphone's camera with a software update. The technology is now found on almost every smartphone, including low-priced models. EIS is much cheaper to implement than OIS, and it has only become popular in the last couple of years.
Digital Image Stabilization (source)
Most smartphones on the market from Apple, Google and Samsung use a combination of hardware and software processing, i.e. both OIS and EIS, for image and video stabilization. According to Google, when the Pixel team had to decide between OIS and EIS for the first Pixel, they chose EIS because of its video performance. OIS primarily improves low-light photography by physically compensating for hand shake within each single frame, whereas EIS improves shaky video by maintaining consistent framing across multiple video frames. In short, OIS is primarily for photos and EIS for video. Another advantage of EIS is its ability to get better over time with software updates.
Types of Digital Image Stabilization
There are two main ways to achieve Digital Image Stabilization. The first uses a sensor inside the camera to measure the impact of your body movement on the camera. The visible frame is a crop of the full sensor image; the sensor data determines where that crop should sit, and the spare pixels outside the frame are used to smooth the transition from frame to frame. This method uses a gyroscope in a similar way to Optical Image Stabilization.
The second way is effectively the same, except that the process runs in post-production. It uses the same technique of cropping the image and using the pixels outside the shot to smooth the transition. The drawback is that the resulting video is cropped, and therefore smaller than the original, which not everyone wants. Whether it runs automatically or in post-production, this is a relatively effective way of making video smoother. However, it is not as good for static pictures.
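The crop-and-shift idea can be sketched in a few lines. This is a minimal illustration, not taken from any phone's pipeline: the function name crop_window, the margin parameter and the sign convention (positive shift means the scene content moved right or down in the image) are all assumptions made for the example.

```python
def crop_window(frame_w, frame_h, margin, shake_dx, shake_dy):
    """Return (left, top, width, height) of the crop that counteracts a shake.

    margin   -- spare pixels kept on every side of the output window
    shake_dx -- measured horizontal image shift in pixels (positive = right)
    shake_dy -- measured vertical image shift in pixels (positive = down)
    """
    out_w = frame_w - 2 * margin
    out_h = frame_h - 2 * margin

    # The scene content moved by (shake_dx, shake_dy), so the crop follows it.
    # It can never leave the full frame, so the shift is clamped to the margin.
    shift_x = max(-margin, min(margin, shake_dx))
    shift_y = max(-margin, min(margin, shake_dy))

    return (margin + shift_x, margin + shift_y, out_w, out_h)


# A 1920x1080 frame with a 96-pixel margin: the vertical shake of -200 pixels
# is larger than the margin, so it can only be partly compensated.
print(crop_window(1920, 1080, 96, 30, -200))  # (126, 0, 1728, 888)
```

The clamping step shows the trade-off mentioned above: a larger margin absorbs bigger shakes but leaves a smaller output video.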
Digital Image Stabilization systems can be subdivided into two modules:
Motion Estimation: detects the motion of the current frame relative to the previous frame.
Compensation: when unwanted motion of the frame is detected, it is corrected by motion compensation.
The video stabilization is done in the following steps, as shown in the figure below:
Video Stabilization Process (source)
In general, a DIS system consists of Global Motion Estimation (GME) and Motion Compensation (MC). GME estimates the inter-frame motion, which can include translation, rotation and zooming. Motion in video images is caused by either object motion or camera movement. MC is designed to correct the camera motion by smoothing the motion parameters to reduce jitter. First, the optical flow between two successive frames is computed; then the camera motion is estimated from the computed optical flow field and an affine motion model. A filter is used to smooth the affine parameters. Finally, the stabilized frames are obtained from the previous stabilized frames and the corresponding smoothed affine parameters.
DIS Implementation – Open source code deep dive
To illustrate the implementation details of DIS, we explore the software-based video stabilization method implemented with the Open Source Computer Vision Library (OpenCV). Among the open source implementations of DIS we looked at, OpenCV is well documented and its source code is openly available, so we chose it. OpenCV is a library of programming functions mainly aimed at real-time computer vision, originally developed by Intel. It has a modular structure, which means that the package includes several shared or static libraries. The video stabilization module contains a set of functions and classes for global motion estimation between images. The flow chart below shows the basic steps:
Flowchart - Video Stabilization Algorithm
The first step is estimating the inter-frame motion between adjacent frames, which is implemented with the function calcOpticalFlowPyrLK(). This function calculates the optical flow, which is the apparent motion of objects between consecutive frames of a sequence, caused by the relative movement between the object and the camera. It can be expressed as:
Relative movement of a point from one frame to another (source)
Here, between consecutive frames, the image intensity (I) is expressed as a function of space (x, y) and time (t). In other words, if we take the first image I(x, y, t) and move its pixels by (dx, dy) over a time interval dt, we obtain the new image I(x+dx, y+dy, t+dt).
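Written out, this relation is usually called the brightness constancy assumption. The derivation below is the standard textbook form rather than a reproduction of the figure; the velocity symbols u and v and the subscripted partial derivatives are notation introduced here for illustration.

```latex
\begin{aligned}
I(x, y, t) &= I(x + dx,\; y + dy,\; t + dt) \\
I(x + dx,\; y + dy,\; t + dt) &\approx I(x, y, t) + I_x\,dx + I_y\,dy + I_t\,dt
  \quad \text{(first-order Taylor expansion)} \\
\Rightarrow\quad I_x\,u + I_y\,v + I_t &= 0,
  \qquad u = \frac{dx}{dt},\quad v = \frac{dy}{dt}
\end{aligned}
```

This single equation has two unknowns (u, v) per pixel, which is why a method such as Lucas-Kanade has to combine the equations of several neighboring pixels.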
The method involves tracking a few feature points between two consecutive frames. These tracked feature points make it possible to estimate the motion between frames and compensate for it. To choose which feature points to track, OpenCV provides a feature detector, goodFeaturesToTrack(), that detects features well suited to tracking. See the example figure below:
3X3 Patch – where all 9 points have the same motion (source)
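To make the 3x3 patch idea concrete, here is a minimal sketch of the least-squares step behind the Lucas-Kanade method. It assumes each pixel contributes one constraint of the form Ix*u + Iy*v = -It (spatial gradients Ix, Iy and temporal gradient It), and that all 9 pixels share the same motion (u, v). The function name solve_patch_flow and the gradient values are made up for the example; this is not OpenCV's implementation.

```python
def solve_patch_flow(grad_x, grad_y, grad_t):
    """Least-squares flow (u, v) for one patch.

    Each pixel i contributes one equation:
        grad_x[i] * u + grad_y[i] * v = -grad_t[i]
    """
    # Entries of the 2x2 normal-equation matrix and its right-hand side.
    sxx = sum(gx * gx for gx in grad_x)
    sxy = sum(gx * gy for gx, gy in zip(grad_x, grad_y))
    syy = sum(gy * gy for gy in grad_y)
    bx = -sum(gx * gt for gx, gt in zip(grad_x, grad_t))
    by = -sum(gy * gt for gy, gt in zip(grad_y, grad_t))

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("patch has too little texture to determine the flow")

    # Cramer's rule for the 2x2 system.
    u = (bx * syy - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v


# Hypothetical gradients for the 9 pixels of a 3x3 patch.
grad_x = [1, 2, 0, -1, 3, 1, 0, 2, -2]
grad_y = [0, 1, 2, 1, -1, 3, 1, 0, 2]
# Temporal gradients consistent with a true motion of (1.5, -0.5).
grad_t = [-(gx * 1.5 + gy * -0.5) for gx, gy in zip(grad_x, grad_y)]

u, v = solve_patch_flow(grad_x, grad_y, grad_t)
print(round(u, 6), round(v, 6))
```

The ValueError branch hints at why some features are better to track than others: on a flat or edge-only patch the 2x2 system is (nearly) singular, so the motion cannot be determined reliably.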
Once the feature points are detected in one frame, they are tracked in the next frame using the function calcOpticalFlowPyrLK(). In the function name, LK refers to Lucas-Kanade and Pyr stands for pyramid. An image pyramid in computer vision is used to process an image at different scales or resolutions, which lets the tracker find large motions at a coarse scale first and then refine them at finer scales.
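A minimal sketch of an image pyramid follows. It is an illustration only: it halves the resolution by plain 2x2 averaging, which is a simplification assumed for this example (OpenCV's own pyramid functions blur with a Gaussian kernel before downsampling), and the function names are hypothetical.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows = len(image) // 2
    cols = len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# An 8x8 horizontal gradient image; each level halves the resolution.
image = [[float(c) for c in range(8)] for _ in range(8)]
for level in build_pyramid(image, 3):
    print(len(level), "x", len(level[0]))
```

A motion of 4 pixels at full resolution is only 1 pixel at the coarsest of these three levels, which is why tracking typically starts at the top of the pyramid and works down.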
In other words, we have now detected the locations of the features in the current frame, and as shown above we already know their locations in the previous frame. These two sets of points can be used to find the rigid (Euclidean) transformation that maps the previous frame to the current frame. This is done with the function estimateRigidTransform(); the TransformParam class stores the motion information (dx: motion in x, dy: motion in y, and da: change in angle), and the function getTransform() converts this motion into a transformation matrix.
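The sketch below shows one way such a rigid transformation can be estimated from two matched point sets and converted into a 2x3 matrix. It is a plain least-squares fit written for illustration and is not the code behind estimateRigidTransform() or getTransform(); the function names estimate_rigid_2d and to_matrix are hypothetical.

```python
import math


def estimate_rigid_2d(prev_pts, curr_pts):
    """Least-squares rotation + translation mapping prev_pts onto curr_pts.

    Returns (dx, dy, da) such that curr ~= R(da) * prev + (dx, dy).
    """
    n = len(prev_pts)
    pcx = sum(p[0] for p in prev_pts) / n
    pcy = sum(p[1] for p in prev_pts) / n
    ccx = sum(c[0] for c in curr_pts) / n
    ccy = sum(c[1] for c in curr_pts) / n

    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (cx, cy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = cx - ccx, cy - ccy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    da = math.atan2(cross, dot)
    cos_a, sin_a = math.cos(da), math.sin(da)
    # The translation is what remains after rotating the previous centroid.
    dx = ccx - (cos_a * pcx - sin_a * pcy)
    dy = ccy - (sin_a * pcx + cos_a * pcy)
    return dx, dy, da


def to_matrix(dx, dy, da):
    """2x3 matrix [[cos, -sin, dx], [sin, cos, dy]] for the motion (dx, dy, da)."""
    return [
        [math.cos(da), -math.sin(da), dx],
        [math.sin(da), math.cos(da), dy],
    ]


# Synthetic check: rotate by 0.1 rad, shift by (3, -2), then recover the motion.
prev_pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2)]
m = to_matrix(3.0, -2.0, 0.1)
curr_pts = [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in prev_pts]
print(estimate_rigid_2d(prev_pts, curr_pts))
```

If you are working with a recent OpenCV release, note that estimateRigidTransform() has been deprecated in favor of estimateAffinePartial2D(), which additionally estimates a uniform scale.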
To recap, the process above estimates the motion between consecutive frames and stores it in an array. To find the trajectory of motion, the differential motion estimated in the previous step is cumulatively summed. Once the trajectory of motion is obtained, the curve is smoothed using the function movingAverage(), which applies a moving average filter. As the name suggests, a moving average filter replaces the value of a function at each point with the average of its neighbors within a window.
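The trajectory and smoothing steps can be sketched for a single motion component (for example dx) as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the window is simply truncated at the ends of the sequence, which may differ from how movingAverage() treats the borders.

```python
def cumulative_sum(values):
    """Trajectory: running total of the per-frame motion values."""
    total = 0.0
    trajectory = []
    for v in values:
        total += v
        trajectory.append(total)
    return trajectory


def moving_average(curve, radius):
    """Average each sample with up to `radius` neighbors on either side."""
    smoothed = []
    for i in range(len(curve)):
        window = curve[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def corrections(per_frame, radius):
    """Offset per frame that moves the real trajectory onto the smoothed one."""
    trajectory = cumulative_sum(per_frame)
    smoothed = moving_average(trajectory, radius)
    return [s - t for s, t in zip(smoothed, trajectory)]


# A camera that jitters left and right by 3 pixels on every frame.
dx_per_frame = [0, 3, -3, 3, -3]
print(cumulative_sum(dx_per_frame))          # [0.0, 3.0, 0.0, 3.0, 0.0]
print(moving_average([0, 3, 0, 3, 0], 1))    # [1.5, 1.0, 2.0, 1.0, 1.5]
print(corrections(dx_per_frame, 1))          # [1.5, -2.0, 2.0, -2.0, 1.5]
```

A larger radius gives a smoother trajectory, at the cost of larger corrections and therefore more cropping.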
However, the real trajectory often contains unwanted camera movement, so a Kalman filter is used to stabilize it. The Kalman filter predicts a virtual trajectory from a motion model with Gaussian noise, then combines that prediction with the real (measured) trajectory, weighting each by its uncertainty, to produce an optimal trajectory. The figure below shows an example of the optimal state:
Estimating Optimal Trajectory - Kalman Filter (source)
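For intuition, here is a minimal one-dimensional Kalman filter applied to a single trajectory component. It is a sketch under simplifying assumptions, not the filter used in any particular stabilizer: the state is modeled as a constant position, and the function name and the two noise variances (process_var, measurement_var) are illustrative values chosen for this example.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=0.25):
    """Filter a 1-D sequence with a constant-position Kalman filter."""
    estimate = measurements[0]   # initial state: trust the first sample
    error_var = 1.0              # initial uncertainty of that estimate
    filtered = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement by their uncertainties.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# A jittery trajectory that alternates between 0 and 1.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = kalman_smooth(noisy)
print([round(v, 3) for v in smooth])
```

A small process_var relative to measurement_var tells the filter to trust its own prediction more than each new measurement, which yields a smoother but slower-reacting trajectory.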
Finally, the function warpAffine() applies the resulting transformation to each source frame, creating a new image that follows the optimal trajectory. The output frame sequence is the stabilized video.
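To show what an affine warp does to the pixels, here is a small pure-Python sketch that uses nearest-neighbor sampling on a list-of-lists image. It is an illustration only and not OpenCV's warpAffine(), which supports interpolation and several border modes; the function name warp_affine_nearest and the fill parameter are assumptions made for the example.

```python
def warp_affine_nearest(image, matrix, fill=0):
    """Warp a 2-D list `image` by a 2x3 affine `matrix` (source -> output)."""
    (a, b, tx), (c, d, ty) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is not invertible")

    height, width = len(image), len(image[0])
    output = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: which source pixel lands on output pixel (x, y)?
            sx = (d * (x - tx) - b * (y - ty)) / det
            sy = (-c * (x - tx) + a * (y - ty)) / det
            col, row = int(round(sx)), int(round(sy))
            if 0 <= row < height and 0 <= col < width:
                output[y][x] = image[row][col]
    return output


# Shift a 3x3 image one pixel to the right.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
shift_right = [[1, 0, 1],
               [0, 1, 0]]
print(warp_affine_nearest(image, shift_right))
# [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

The column of fill values on the left is the empty border that appears whenever a frame is shifted, which is why stabilized video is usually cropped or slightly zoomed.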
In a nutshell, the source code flow described above calculates the optical flow between frames using feature points obtained from a feature detector. The optical flow is then used to generate frame-to-frame transformations. Finally, the smoothed transformations are applied to stabilize the video.
In this blog, we've attempted to shed light on the different types of image stabilization techniques used in the latest smartphones, followed by a drill-down into the open source code of an example software implementation of DIS. Digital image stabilization is not optimal when there is heavy motion blur. Hence, today's mobile phones combine DIS and OIS components to achieve image stabilization. In the future, we can expect to see advancements in similar hybrid systems which use a combination of hardware and software processing to achieve image/video stabilization.