Summary

Super-resolution (SR) is the process of estimating the high-resolution (HR) equivalent of low-resolution (LR) data. This work proposes a new deep neural network architecture that achieves real-time SR on mobile GPUs, and introduces a new training paradigm that produces photo-realistic super-resolved images.

Problem and objectives

Deep neural networks have been shown to be powerful tools for SR. By observing millions of LR and HR image pairs and optimising a predefined training cost, a network progressively learns a function that approximately reverses the downscaling process. However, traditional network designs have been computationally inefficient, and classical pixel-wise objective functions are known to produce blurry, visually unsatisfactory images. The goal of this project was to improve both the efficiency and the visual quality of SR with deep networks, by proposing architectural changes and a new objective function that forces the network to generate photo-realistic images.
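The claim that pixel-wise objectives produce blur can be seen in a toy case. Suppose, hypothetically, that one LR patch is equally consistent with two sharp HR patches that differ only in the phase of a fine texture. The following pure-Python sketch (all data made up for illustration) shows that the prediction minimising expected MSE is the pixel-wise average of the candidates, which here washes the texture out entirely, while committing to either sharp candidate is penalised more heavily.

```python
# Toy illustration with hypothetical data: one LR patch is consistent with two
# equally likely HR patches that differ only in the phase of a fine texture.
hr_candidates = [
    [1.0, -1.0, 1.0, -1.0],   # texture, phase A
    [-1.0, 1.0, -1.0, 1.0],   # same texture, phase B
]


def expected_mse(prediction, candidates):
    """Average pixel-wise MSE of one prediction over equally likely targets."""
    total = 0.0
    for target in candidates:
        total += sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
    return total / len(candidates)


# The pixel-wise mean of the candidates is the MSE-optimal prediction.
mean_prediction = [sum(col) / len(col) for col in zip(*hr_candidates)]

# A sharp prediction commits to one plausible texture.
sharp_prediction = hr_candidates[0]

print("mean (blurry) prediction:", mean_prediction)  # [0.0, 0.0, 0.0, 0.0]
print("expected MSE, blurry:", expected_mse(mean_prediction, hr_candidates))  # 1.0
print("expected MSE, sharp: ", expected_mse(sharp_prediction, hr_candidates))  # 2.0
```

The flat, texture-free patch wins under MSE even though it matches neither plausible ground truth, which is the behaviour that makes MSE-trained networks favour blurry outputs.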

Improving efficiency

Traditional SR approaches have often decomposed the problem into a bicubic upsampling stage followed by a de-blurring stage, mirroring the low-pass filtering and decimation steps incurred by image downscaling. However, the computational cost of a convolutional neural network is proportional to the size of its input image, so preprocessing images with bicubic upsampling inflates the network's cost: upsampling by a factor r multiplies the number of pixels, and hence the work done by every layer, by r². Instead, we propose to process the input image directly in LR space and learn a single function that jointly upsamples and de-blurs the image.
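A rough sketch of both points follows, in plain Python with no deep-learning framework assumed. The first part counts multiply-accumulates for a hypothetical three-layer convolutional network (the layer sizes, the frame size and the factor r = 4 are illustrative assumptions, not a published configuration) run either on a bicubic-upsampled HR input or directly on the LR input. The second part shows one well-known way to upsample at the end of an LR-space network, the periodic shuffling step of the efficient sub-pixel convolution from the 2016 CVPR paper listed below: the last layer outputs r² channels per output channel, and these are rearranged into an image r times larger in each dimension. The channel ordering chosen here follows PyTorch's torch.nn.PixelShuffle.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates for one stride-1, 'same'-padded convolution layer."""
    return height * width * kernel * kernel * c_in * c_out


def network_macs(height, width, layers):
    """Total cost of a stack of (kernel, c_in, c_out) layers on an H x W input."""
    return sum(conv_macs(height, width, k, ci, co) for k, ci, co in layers)


def pixel_shuffle(features, r):
    """Rearrange C*r*r channels of size H x W into C channels of size H*r x W*r.

    Ordering matches torch.nn.PixelShuffle:
    out[c][h*r + i][w*r + j] = features[c*r*r + i*r + j][h][w]
    """
    n_in = len(features)
    assert n_in % (r * r) == 0, "channel count must be divisible by r*r"
    n_out = n_in // (r * r)
    height, width = len(features[0]), len(features[0][0])
    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(n_out)]
    for c in range(n_out):
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


R = 4
LR_H, LR_W = 270, 480  # hypothetical LR frame (a 1080p frame downscaled by 4)
# Hypothetical layers: the HR-space network emits 1 channel at full resolution,
# the LR-space network emits R*R channels that pixel_shuffle turns into 1.
HR_LAYERS = [(5, 1, 64), (3, 64, 32), (3, 32, 1)]
LR_LAYERS = [(5, 1, 64), (3, 64, 32), (3, 32, R * R)]

hr_cost = network_macs(LR_H * R, LR_W * R, HR_LAYERS)  # bicubic-upsampled input
lr_cost = network_macs(LR_H, LR_W, LR_LAYERS)          # LR-space input
print(f"HR-space / LR-space cost: {hr_cost / lr_cost:.1f}x")
```

For example, pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2) turns four 1×1 channels into a single 2×2 channel, [[[0, 1], [2, 3]]]. With these assumed layers the HR-space network needs roughly 13 times more operations; the gap is slightly below r² = 16 only because the LR-space network's last layer must emit r² channels.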

Improving visual quality

Pixel-wise error metrics such as mean squared error (MSE) are the usual way to objectively judge whether a super-resolved image is an accurate estimate of its ground truth. Neural networks are therefore commonly trained to optimise these metrics when approximating the SR function. Unfortunately, such metrics are known to correlate poorly with human perception, so results that are optimal under them are not necessarily the most visually pleasing. We modified the training mechanism, using generative adversarial networks (GANs) to relax traditional pixel-wise metrics in favour of producing results that are indistinguishable from photo-realistic images.
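The kind of objective described above can be sketched as a weighted sum of a content term and an adversarial term. In this minimal pure-Python sketch the content term is a plain pixel-wise MSE used as a stand-in (the SRGAN paper listed below computes its content loss on VGG feature maps instead), the adversarial term is the common non-saturating generator loss, the mean of -log D(G(x)), and the discriminator outputs are hypothetical numbers supplied by the caller. The default weight of 10^-3 follows the value reported in that paper and should be treated as a tunable assumption.

```python
import math


def mse(a, b):
    """Pixel-wise mean squared error between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_adversarial_loss(d_probs):
    """Non-saturating GAN generator loss: mean of -log D(G(x)) over a batch.

    d_probs are the discriminator's estimated probabilities that each
    super-resolved image is a real HR image (hypothetical inputs here).
    """
    eps = 1e-12  # guards against log(0)
    return sum(-math.log(max(p, eps)) for p in d_probs) / len(d_probs)


def perceptual_loss(sr, hr, d_probs, adv_weight=1e-3):
    """Content term plus a weighted adversarial term (SRGAN-style sketch)."""
    content = mse(sr, hr)  # stand-in: SRGAN uses a VGG feature-space loss
    return content + adv_weight * generator_adversarial_loss(d_probs)


sr = [0.2, 0.4, 0.6]  # hypothetical super-resolved pixels
hr = [0.0, 0.5, 1.0]  # hypothetical ground-truth pixels
print(perceptual_loss(sr, hr, d_probs=[0.9, 0.8]))  # discriminator mostly fooled
print(perceptual_loss(sr, hr, d_probs=[0.1, 0.2]))  # discriminator not fooled
```

Lowering the discriminator's confidence that an output is real raises the loss, which pushes the generator towards outputs the discriminator cannot distinguish from real HR images, even when that costs some pixel-wise accuracy.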

Results

The architectural changes we introduced for SR neural networks were shown to produce state-of-the-art SR results while reducing runtime by a factor of 30. This translated into real-time operation on mobile GPUs, enabling real-time video SR. Additionally, the new training method proposed for SR networks produced images that, although they score worse than competing methods on traditional objective metrics, are visually much more pleasing.


Selected publications

Real-time video super-resolution with spatio-temporal networks and motion compensation
J. Caballero, C. Ledig, A. Aitken, A. Acosta, J. Totz, Z. Wang, W. Shi
Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017

Photo-realistic single image super-resolution using a generative adversarial network
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi
Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017

Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
W. Shi, J. Caballero, F. Huszar, J. Totz, A. Aitken, R. Bishop, D. Rueckert, Z. Wang
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016