A Validation Study of a Perceptually-Based Metric of Smartphone Image Quality

For years, even as smartphones became ever more prominent in daily life, there was no standard for rating the quality of images taken by mobile phone cameras. Without one, manufacturers had little understanding of how design changes affected perceived quality, and consumers were unsure what level of quality to expect from the devices they purchased. After years of research, however, the IEEE P1858 CPIQ (Camera Phone Image Quality) Standard was recently published [1]. More than 30 companies took part in developing this standard, which can be used to evaluate image quality consistently and to compare phone models. It was intended both as a tool that manufacturers could use in developing particular products and as a way for the industry to level the playing field in evaluating those products.

It is well known that the number of megapixels alone is insufficient for adequately characterizing perceived image quality. To address this, seven metrics, all based on objective measurements, were used to create the new CPIQ standard. The metrics include spatial frequency response (SFR), lateral chromatic displacement (LCD), chroma level (CL), color uniformity (CU), local geometric distortion (LGD), visual noise (VN), and texture blur (TB) [2], [3]. These individual metrics are described in the CPIQ document [1].

Each individual metric was quantified as a quality loss (QL) in units of just noticeable differences (JNDs). The QL values for the individual metrics were then combined using the Minkowski metric to generate the overall predicted QL:

$$QL = \left( \sum_i QL_i^{\,n_{\max}} \right)^{1/n_{\max}}$$

where $n_{\max} = 1 + 2\tanh\left(QL_{\max}/16.9\right)$ and $QL_{\max}$ is the maximum individual QL for a given test condition and a given camera [3].
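The combination rule can be sketched in a few lines of Python. This is a minimal illustration, not code taken from the standard: the function name `combine_quality_loss` is hypothetical, and the sketch assumes the per-metric QL values are non-negative and already expressed in JNDs. It shows how the exponent rises from 1 (plain summation of small losses) toward 3 as the worst individual loss grows, so that the largest degradation increasingly dominates the overall QL.

```python
import math

def combine_quality_loss(ql_values):
    """Combine per-metric quality losses (in JNDs) into an overall QL
    with the variable-exponent Minkowski sum (hypothetical helper)."""
    if not ql_values:
        return 0.0
    if any(ql < 0 for ql in ql_values):
        # Assumption: QL values are non-negative losses.
        raise ValueError("quality loss values must be non-negative")
    ql_max = max(ql_values)
    # n_max = 1 + 2*tanh(QL_max / 16.9): equals 1 when all losses are zero
    # and approaches 3 as the largest loss becomes very large.
    n_max = 1.0 + 2.0 * math.tanh(ql_max / 16.9)
    return sum(ql ** n_max for ql in ql_values) ** (1.0 / n_max)
```

If only one metric has a non-zero loss, the overall QL equals that loss; with several non-zero losses, the result always lies between the largest individual loss and their simple sum, because the exponent is never below 1.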

In developing the standard, the following normative references were used for information on color management, spatial resolution measurement, methods for measuring camera optoelectronic conversion functions (OECFs), and other topics:

IEC 61966-2-1, Multimedia systems and equipment–Color measurement and management–Part 2-1: Color management–Default RGB color space–sRGB
ISO 7589:2002 Photography–Illuminants for sensitometry–Specifications for daylight, incandescent tungsten, and printer
ISO 12233:2014 Photography–Electronic still-picture imaging–Resolution and spatial frequency responses
ISO 14524:2009 Photography–Electronic still-picture cameras–Methods for measuring optoelectronic conversion functions (OECFs)
ISO 15739:2013 Photography–Electronic still-picture imaging–Noise measurements
ISO 16067-1:2003 Photography–Spatial resolution measurements of electronic scanners for photographic images–Part 1: Scanners for reflective media

In recent work, the objective results were compared with the results of subjective testing to determine whether the objective metrics correlate with the image quality that observers perceive. The subjective results, generated using paired comparison [4], [5] and softcopy quality ruler [6], [7] protocols, serve as independent verification of the standard. The objective measurements used in the CPIQ standard were found to be related to perceived image quality.

The devices in the study were selected from a variety of manufacturers, including well-known smartphone makers such as Apple, Samsung, and Nokia, in order to cover a wide range of image quality and pixel counts. A variety of image quality characteristics were analyzed for each camera. For the subjective evaluation, each camera was used to photograph ten real-world scenes. The scenes represented a range of illumination conditions and image content that consumers are likely to photograph, such as flowers and people, and were also selected to resemble the existing set of images used in the softcopy quality ruler protocol. All images were then cropped to the same dimensions, with care taken that the target had the same pixel height in every image, regardless of each camera's vertical resolution. The same images were used in both the paired comparison and the softcopy quality ruler experiments.

Twenty observers participated in each experiment; all were screened for color vision deficiency and visual acuity beforehand. In the paired comparison experiment, two images of the same scene, taken with two different smartphones, were presented side by side on a calibrated display, as shown in Figure 1. Observers pressed the arrow key corresponding to the image they preferred. They were told that preference could reflect multiple factors, such as sharpness, color, and noise, but should not depend on image composition or facial expression. The order of presentation was randomized for each observer.
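A per-observer presentation schedule for the paired comparison can be sketched as follows. This is a hypothetical illustration; the source states only that presentation order was randomized for each observer. The sketch assumes that every pair of cameras is shown once per scene, that the left/right placement of each pair is also randomized, and that trials from all scenes are interleaved. The function name `paired_comparison_trials` is ours.

```python
import itertools
import random

def paired_comparison_trials(cameras, scenes, seed=None):
    """Build a randomized list of (scene, left_camera, right_camera) trials
    for one observer (hypothetical schedule, see assumptions above)."""
    rng = random.Random(seed)
    trials = []
    for scene in scenes:
        # Every unordered pair of cameras is compared once per scene.
        for a, b in itertools.combinations(cameras, 2):
            # Assumption: left/right placement is randomized per pair.
            left, right = (a, b) if rng.random() < 0.5 else (b, a)
            trials.append((scene, left, right))
    # Assumption: trials from all scenes are shuffled together.
    rng.shuffle(trials)
    return trials
```

If every pair were shown, the nine cameras and ten scenes reported for this study would give 36 pairs per scene and 360 trials per observer, which is why the number of cameras strongly drives the length of a paired comparison session.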

Figure 1—An observer participating in the paired comparison experiment.

In the softcopy quality ruler assessment, two images were again displayed side by side on the same display, but here one was a ruler image and the other a test image. The GUI included a slider bar that participants used to adjust the sharpness of the ruler image; they moved the slider until they felt that the overall quality of the ruler image matched that of the test image. The order of the image sets was randomized for each observer, but all images of one scene were assessed before moving on to the next scene.
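The ordering constraint described above (random order for each observer, but one scene completed before the next begins) is a form of blocked randomization. A minimal sketch, assuming that "image set order" means both the scene order and the image order within each scene are shuffled; the function name `quality_ruler_order` and the dictionary input layout are hypothetical, and the slider-based matching itself is not modeled.

```python
import random

def quality_ruler_order(images_by_scene, seed=None):
    """Return (scene, image) pairs in which scenes are shuffled and images
    within each scene are shuffled, but each scene forms one contiguous block."""
    rng = random.Random(seed)
    scenes = list(images_by_scene)
    rng.shuffle(scenes)  # assumption: scene order is randomized per observer
    order = []
    for scene in scenes:
        images = list(images_by_scene[scene])
        rng.shuffle(images)  # assumption: image order within a scene is randomized
        order.extend((scene, image) for image in images)
    return order
```

Keeping each scene in one block lets observers judge all versions of a scene against the same ruler content before switching, while shuffling still spreads order effects across observers.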

The paired comparison results were analyzed by computing, for each image, the probability that it would be selected as preferred and the corresponding z-score. For the softcopy quality ruler study, values on the Standard Quality Scale (SQS), an absolute quality scale, were determined. A correlation coefficient was then calculated between the SQS values from the quality ruler and the z-scores from the paired comparison. For seven of the ten scenes, the paired comparison and quality ruler results were highly correlated, and for eight of the nine cameras tested, the results were highly correlated across all ten scenes. These results indicate that either experimental approach provides a measure of perceived image quality.
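The two analysis steps can be sketched as follows. The z-score step converts each image's proportion of wins into a z-score using the inverse standard normal CDF, as the paragraph describes. Clipping proportions to [0.01, 0.99] is our assumption, added because an image that always or never wins would otherwise receive an infinite z-score; fuller Thurstonian scaling methods, as covered in [4], work from the matrix of pairwise proportions instead, which this sketch does not attempt. The correlation step uses a plain Pearson coefficient, although the source does not say which coefficient was used. All function names are hypothetical.

```python
from statistics import NormalDist

def preference_z_scores(wins, trials, eps=0.01):
    """Map each image's proportion of 'preferred' choices to a z-score.
    wins[image]: times chosen; trials[image]: comparisons it appeared in."""
    z = {}
    for image, n in trials.items():
        p = wins.get(image, 0) / n
        # Assumption: clip to avoid infinite z for images always/never chosen.
        p = min(max(p, eps), 1.0 - eps)
        z[image] = NormalDist().inv_cdf(p)
    return z

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5
```

In use, `pearson_r` would be applied scene by scene (or camera by camera) to the SQS values from the quality ruler and the z-scores from the paired comparison, matched image by image.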

The subjective results were then compared with the objective results, with the ten scenes grouped into three categories according to the lighting conditions under which the images were taken: daylight, indoor lighting, and low light (see Figure 2). The objective metric results correlated fairly well with the subjective assessments of image quality, although the comparison also showed that there is still room for improvement, especially for low-light scenes. This work continues, with the goal of developing objective metrics that accurately measure perceived quality, so that manufacturers can use them in product development and consumers can make more informed purchases.

Figure 2—The subjective results relative to the objective metrics expressed in terms of Quality Loss [2].

This new standard is expected to be used by manufacturers of smartphones and smartphone components to evaluate how design choices affect output image quality, and by testing labs as a common language for reporting image quality results to their customers and to consumers. To provide further assistance to manufacturers and testing labs, work on CPIQ continues in areas such as exposure control, automatic white balance, and video capture, among others. An additional standard incorporating the results of this work is expected to be published in May 2018, bringing the imaging community another step closer to a complete body of standards for measuring and quantifying perceived image quality. Work still to be done includes the evaluation of special camera functions such as high dynamic range (HDR) imaging and portrait mode.

References

[1] IEEE P1858, IEEE Standard for Camera Phone Image Quality (CPIQ), May 2017.
[2] Baxter, D., F. Cao, H. Eliasson, and J. B. Phillips, “Development of the I3A CPIQ spatial metrics,” Proc. SPIE 8293, p. 829302, Jan. 2012.
[3] Jin, E. W., J. B. Phillips, S. Farnand, M. Belska, V. Tran, E. Chang, Y. Wang, and B. Tseng, “Towards the development of the IEEE P1858 CPIQ standard—A validation study,” Electronic Imaging, vol. 2017, no. 12, pp. 88–94, 2017.
[4] Torgerson, W. S., Theory and methods of scaling, New York, NY: J. Wiley & Sons, 1958.
[5] Engeldrum, P. G., Psychometric scaling: A toolkit for imaging systems, Winchester, MA: Imcotek Press, 2000.
[6] Jin, E. W., B. W. Keelan, J. Chen, J. B. Phillips, and Y. Chen, “Softcopy quality ruler method: Implementation and validation,” Proc. SPIE 7242, p. 724206, 2009.
[7] Jin, E. W., and B. W. Keelan, “Slider-adjusted softcopy ruler for calibrated image quality assessment,” Journal of Electronic Imaging, vol. 19, no. 1, p. 011009, 2010.

Katherine Carpenter is a second-year PhD student in the Program of Color Science at the Rochester Institute of Technology. She received her BS in physics from SUNY Oneonta. She conducted the subjective evaluation of the CPIQ objective metric. She can be reached at kmc2582@rit.edu.

Susan Farnand is a Visiting Assistant Professor in the Program of Color Science at the Rochester Institute of Technology. Her research interests include human vision and perception, color science, cultural heritage imaging, and 3D printing. She received her BS in engineering from Cornell University, and her master's degree in Imaging Science and her PhD in Color Science from the Rochester Institute of Technology. She began her career at Eastman Kodak, designing and evaluating printer systems. She is Publications Vice President of the international Society for Imaging Science and Technology (IS&T) and serves as an Associate Editor for the Journal of Imaging Science and Technology. She participates in several standards efforts, including ISO TC 42 JWG26 Archival Imaging and IEEE CPIQ. She can be reached at susan.farnand@rit.edu.