Applications of Saliency Models – Part Two

Missed Part One? We’ve got you covered.

Applications based on normality detection

In this section we focus on a second category of applications, based on the locations with the lowest saliency scores. Those areas correspond to repetitive, less informative regions, which can be compressed easily.

Compression is the process of converting a signal into a format that takes up less storage space or transmission bandwidth. Classical compression methods tend to distribute coding resources evenly across an image. In contrast, attention-based methods encode visually salient regions with high priority while treating less interesting regions with low priority. The aim of these methods is to achieve compression without significant degradation of perceived quality.

In [1], a saliency map is computed for each frame of a video sequence and a smoothing filter is applied to all non-salient regions. Smoothing leads to higher spatial correlation and better prediction efficiency in the encoder, and therefore to a reduced bitrate for the encoded video. An extension of [1] uses a similar neurobiological model of visual attention to generate a saliency map [2]. The most salient locations are used to generate a so-called guidance map, which guides the bit allocation. Using the bit allocation model of [2], a scheme for attentional video compression was suggested in [3]. This method propagates visual saliency using motion vectors in order to save computation time. More recently, attention-based image compression patents like [4] have been accepted, which also shows that these compression algorithms are becoming more and more efficient in real-life applications and are close to reaching the market.
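The smoothing step described above can be illustrated with a minimal sketch. It is not the implementation of [1]: the grayscale list-of-lists image, the 3x3 box filter, the fixed threshold, and the function names `box_blur_at` and `smooth_non_salient` are all assumptions made for illustration.

```python
def box_blur_at(image, y, x):
    """Mean of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    h, w = len(image), len(image[0])
    values = [
        image[j][i]
        for j in range(max(0, y - 1), min(h, y + 2))
        for i in range(max(0, x - 1), min(w, x + 2))
    ]
    return sum(values) / len(values)


def smooth_non_salient(image, saliency, threshold=0.5):
    """Blur pixels whose saliency is below `threshold`; copy the others as-is.

    `image` and `saliency` are equally sized lists of rows (assumed layout).
    The blur always reads from the original image, never from the output.
    """
    return [
        [
            image[y][x] if saliency[y][x] >= threshold else box_blur_at(image, y, x)
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


if __name__ == "__main__":
    image = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]
    saliency = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # only the centre is salient
    for row in smooth_non_salient(image, saliency):
        print(row)
```

The smoothed frame would then be handed to an ordinary encoder; the saving comes from the encoder spending fewer bits on the flattened regions, not from this step itself.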

Compression aims at reducing the amount of data in a signal. A usual approach consists of modifying the coding rate, but the amount of data can also be reduced by cropping or resizing the signal. An obvious way to drastically compress an image is to decrease its size. This size reduction can be abrupt (zooming in on a region and discarding the rest of the image) or softer (the resolution of the context around the region of interest is decreased, but the context is not fully discarded).

The authors of [5] use the Itti algorithm to compute a saliency map [6], which serves as a basis for automatically delineating a rectangular cropping window. Self-Adaptive Image Cropping for Small Displays [7] is based on the Itti and Koch bottom-up attention algorithm but also on top-down considerations such as face detection or skin color. According to a given threshold, a region is either kept or eliminated. A completely automatic solution for creating thumbnails according to the saliency distribution or the cover rate is presented in [8]. The algorithm proposed in [9] starts by adaptively partitioning the input image into a number of strips according to a combined map that contains both gradient information and visual saliency. Intelligent perceptual zooming methods based on saliency algorithms are becoming more and more interesting as saliency map computation advances in terms of both real-time performance and the integration of spatio-temporal cues. Even large companies such as Google [10] are becoming more and more involved in developing applications based on perceptual zoom. The idea is to generalize perceptual zoom to images and videos and to keep the temporal coherence of the zoomed image even when objects of interest appear abruptly in the image, far from the previous zoom area.
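As a rough illustration of saliency-driven cropping, the sketch below takes the bounding box of all pixels whose saliency reaches a fraction of the peak value. This is a deliberate simplification and not the window search of [5] or [7]; the names `saliency_crop_window` and `crop`, the `keep_ratio` parameter, and the assumption of non-negative saliency values are hypothetical.

```python
def saliency_crop_window(saliency, keep_ratio=0.5):
    """Bounding box of pixels with saliency >= keep_ratio * peak saliency.

    Returns (top, left, bottom, right) with exclusive bottom/right bounds.
    Assumes a non-empty map with non-negative values.
    """
    peak = max(max(row) for row in saliency)
    cutoff = keep_ratio * peak
    width = len(saliency[0])
    rows = [y for y, row in enumerate(saliency) if any(v >= cutoff for v in row)]
    cols = [x for x in range(width) if any(row[x] >= cutoff for row in saliency)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image, window):
    """Cut the rectangular window out of a list-of-rows image."""
    top, left, bottom, right = window
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    saliency = [
        [0.0, 0.1, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.9, 0.2, 0.0],
        [0.0, 0.0, 0.3, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.1, 0.0],
    ]
    image = [[10 * y + x for x in range(5)] for y in range(4)]
    window = saliency_crop_window(saliency)
    print(window, crop(image, window))
```

A threshold relative to the peak keeps the sketch independent of the saliency scale; the cited methods use richer criteria, such as the fraction of total saliency covered or detected faces.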

Perceptual zoom does not always preserve the image structure. Two methods exist that keep the image structure intact: warping and seam carving. These methods are also used to provide data “summarization”.

Warping is an operation that maps a position in a source image to a position in a target image by a spatial transformation. This transformation could be a simple scaling transformation [11]. A retargeting method based on global energy optimization is detailed in [12] and extended to combine uniform sampling with a structure-aware image representation [13]. A warping method that uses a grid mesh of quads to retarget images is defined in [14]. The method determines an optimal scaling factor for regions with high content importance as well as for regions with homogeneous content, which will be distorted. A significance map is computed as the product of the gradient and the saliency map. An extended significance measurement is proposed in [15] to preserve the shapes of both visually salient objects and structure lines while minimizing visual distortions.
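The significance map mentioned above, the product of gradient and saliency, can be sketched as follows. The finite-difference gradient with replicated borders and the names `gradient_magnitude` and `significance_map` are assumptions for illustration; [14] may compute the gradient and normalize the maps differently.

```python
def gradient_magnitude(image):
    """Gradient magnitude from finite differences, replicating border pixels."""
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            dy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            grad[y][x] = (dx * dx + dy * dy) ** 0.5
    return grad


def significance_map(image, saliency):
    """Element-wise product of gradient magnitude and saliency."""
    grad = gradient_magnitude(image)
    return [
        [g * s for g, s in zip(grad_row, sal_row)]
        for grad_row, sal_row in zip(grad, saliency)
    ]


if __name__ == "__main__":
    image = [[0, 0, 10], [0, 0, 10]]
    saliency = [[1.0, 0.5, 0.0], [1.0, 0.5, 0.0]]
    for row in significance_map(image, saliency):
        print(row)
```

The product is high only where a pixel is both structurally strong (high gradient) and visually attended (high saliency), which is the property a warping method needs in order to decide which quads to protect from distortion.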

The other method for image retargeting is seam carving. Seam carving [16] retargets an image using an energy function that defines the importance of each pixel. The most classical energy function is the gradient map, but other functions can be used, such as entropy, histograms of oriented gradients, or saliency maps [17]. For spatio-temporal images, [18] proposes removing 2D seam manifolds from 3D space-time volumes, replacing the dynamic programming method with graph cut optimization to find the optimal seams. A saliency-based spatio-temporal seam carving approach with much better spatio-temporal continuity than [18] is proposed in [19]. In [20], the authors describe a saliency map that takes the context more into account and propose applying it to seam carving. Interestingly, recent papers such as [21] propose mixing seam carving and warping techniques.
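To make the role of the energy function concrete, here is a minimal sketch of removing one vertical seam by dynamic programming. The energy map is passed in directly, so a gradient map or a saliency map can be used interchangeably. The function names `find_vertical_seam` and `remove_vertical_seam` and the list-of-rows layout are illustrative assumptions, not the code of [16].

```python
def find_vertical_seam(energy):
    """Return one column index per row forming a minimum-energy 8-connected seam."""
    h, w = len(energy), len(energy[0])
    # cost[y][x] = cheapest cumulative energy of any seam ending at (y, x)
    cost = [list(energy[0])]
    for y in range(1, h):
        prev = cost[-1]
        cost.append([
            energy[y][x] + min(prev[max(0, x - 1):min(w, x + 2)])
            for x in range(w)
        ])
    # Backtrack from the cheapest cell in the last row.
    x = min(range(w), key=cost[-1].__getitem__)
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(0, x - 1), min(w, x + 2)
        x = min(range(lo, hi), key=cost[y - 1].__getitem__)
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Delete the seam pixel from every row, shrinking the width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(image, seam)]


if __name__ == "__main__":
    energy = [[5, 1, 5], [5, 5, 1], [5, 1, 5]]
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    seam = find_vertical_seam(energy)
    print(seam, remove_vertical_seam(image, seam))
```

Repeating the two steps, with the energy recomputed after each removal, narrows the image one column at a time; running the same procedure on the transposed image removes horizontal seams.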

Summarization of images or videos is a notion similar to retargeting. It might be based on cropping [22] or on carving, as in [23]. The main purpose is to provide a relevant summary of a video or an image. In [24], the authors use video summarization to produce a mashup of several videos: a single pleasant video containing the important sequences of all the concatenated videos.


1. Itti, L. (2004) Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing, 13 (10), 1304–1318, doi:.834657.

2. Li, Z., Qin, S., and Itti, L. (2011) Visual attention guided bit allocation in video compression. Image and Vision Computing, 29 (1), 1–14, doi:10.1016/j.imavis.2010.07.001. URL S0262885610001083.

3. Gupta, R. and Chaudhury, S. (2011) A scheme for attentional video compression. Pattern Recognition and Machine Intelligence, 6744, 458–465.

4. Zund, F., Pritch, Y., Hornung, A.S., and Gross, T. (2013), Content-aware image compression method. US Patent App. 13/802,165.

5. Suh, B., Ling, H., Bederson, B.B., and Jacobs, D.W. (2003) Automatic thumbnail cropping and its effectiveness, in Proceedings of the 16th annual ACM symposium on User interface software and technology (UIST), pp. 95–104.

6. Itti, L. and Koch, C. (2001) Computational modelling of visual attention. Nature Reviews Neuroscience, 2 (3), 194–203.

7. Ciocca, G., Cusano, C., Gasparini, F., and Schettini, R. (2007) Self-adaptive image cropping for small displays. IEEE Transactions on Consumer Electronics, 53 (4), 1622–1627.

8. Le Meur, O., Le Callet, P., and Barba, D. (2007) Construction d’images miniatures avec recadrage automatique basé sur un modèle perceptuel bio-inspiré, in Traitement du signal, vol. 24(5), pp. 323–335.

9. Zhu, T., Wang, W., Liu, P., and Xie, Y. (2011) Saliency-based adaptive scaling for image retargeting, in Computational Intelligence and Security (CIS), 2011 Seventh International Conference on, pp. 1201–1205, doi:10.1109/CIS.2011.266.

10. Grundmann, M. and Kwatra, V. (2014), Methods and systems for video retargeting using motion saliency. US Patent App. 14/058,411.

11. Liu, F. and Gleicher, M. (2005) Automatic image retargeting with fisheye-view warping, in Proceedings of User Interface Software Technologies (UIST).

12. Ren, T., Liu, Y., and Wu, G. (2009) Image retargeting using multi-map constrained region warping, in ACM Multimedia, pp. 853–856.

13. Ren, T., Liu, Y., and Wu, G. (2010) Rapid image retargeting based on curve-edge grid representation, in ICIP, pp. 869–872.

14. Wang, Y.S., Tai, C.L., Sorkine, O., and Lee, T.Y. (2008) Optimized scale-and-stretch for image resizing. ACM Trans. Graph. (Proceedings of ACM SIGGRAPH ASIA), 27 (5).

15. Lin, S.S., Yeh, I.C., Lin, C.H., and Lee, T.Y. (2013) Patch-based image warping for content-aware retargeting. Multimedia, IEEE Transactions on, 15 (2), 359–368, doi:10.1109/TMM.2012.2228475.

16. Avidan, S. and Shamir, A. (2007) Seam carving for content-aware image resizing. ACM Trans. Graph., 26 (3), 10.

17. Vaquero, D., Turk, M., Pulli, K., Tico, M., and Gelfand, N. (2010) A survey of image retargeting techniques, in SPIE Applications of Digital Image Processing.

18. Rubinstein, M., Shamir, A., and Avidan, S. (2008) Improved seam carving for video retargeting. ACM Transactions on Graphics (SIGGRAPH), 27 (3), 1–9.

19. Grundmann, M., Kwatra, V., Han, M., and Essa, I. (2010) Discontinuous seam-carving for video retargeting, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 569–576, doi:10.1109/CVPR.2010.5540165.

20. Goferman, S., Zelnik-Manor, L., and Tal, A. (2012) Context-aware saliency detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34 (10), 1915–1926.

21. Wu, L., Cao, L., Xu, M., and Wang, J. (2014) A hybrid image retargeting approach via combining seam carving and grid warping. Journal of Multimedia, 9 (4).

22. Ejaz, N., Mehmood, I., Sajjad, M., and Baik, S.W. (2014) Video summarization by employing visual saliency in a sufficient content change method. International Journal of Computer Theory and Engineering, 6 (1), 26.

23. Dong, W., Zhou, N., Lee, T.Y., Wu, F., Kong, Y., and Zhang, X. (2014) Summarization-based image resizing by intelligent object carving. Visualization and Computer Graphics, IEEE Transactions on, 20 (1), 1–1.

24. Zhang, L., Xia, Y., Mao, K., Ma, H., and Shan, Z. (2015) An effective video summarization framework toward handheld devices. Industrial Electronics, IEEE Transactions on, 62 (2), 1309–1316.