Catch up on Parts One and Two.
Applications based on abnormality processing
The third category of attention-based applications concerns abnormality processing. These applications go further than simply detecting areas of interest: they compare the salient areas found on the saliency maps and process them further. Application domains such as robotics and advertising benefit greatly from this category of applications.
Robotics is a very large application domain with varied needs. There are three research axes where robots can take advantage of saliency models: 1) image registration and landmark extraction, 2) object recognition, and 3) guidance of robot actions.
An important need of a robot is to know where it is located. To this end, the robot can use the data from its sensors to find landmarks (salient feature extraction) and register images taken at different times (salient feature comparison) to build a model of the scene. The general process of building a view of the scene in real time is called Simultaneous Localization and Mapping (SLAM). Saliency models can help greatly in extracting more stable landmarks from images, which can then be compared more robustly [25]. These techniques first require the computation of saliency maps, but the results are not used directly: they need to be further processed (in particular, comparisons of salient areas), as in the sketch below.
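The following is a minimal sketch, not the method of [25], of how saliency-gated landmark extraction and matching between two frames could look in practice. It assumes opencv-contrib-python is installed (for the cv2.saliency module); the spectral-residual saliency model, the ORB detector, the saliency threshold and the file names are illustrative choices.

```python
import cv2
import numpy as np

def salient_keypoints(gray, saliency_thresh=0.5):
    # Spectral-residual saliency: a cheap bottom-up map in [0, 1].
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal_map = sal.computeSaliency(gray)
    orb = cv2.ORB_create(nfeatures=1000)
    kps, desc = orb.detectAndCompute(gray, None)
    if desc is None:
        return [], None
    # Keep only keypoints that fall inside sufficiently salient regions,
    # hoping they correspond to more stable, repeatable landmarks.
    keep = [i for i, kp in enumerate(kps)
            if sal_map[int(kp.pt[1]), int(kp.pt[0])] > saliency_thresh]
    return [kps[i] for i in keep], desc[keep]

def match_landmarks(desc_a, desc_b):
    # Brute-force Hamming matching of ORB descriptors between two frames.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)

frame_a = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
frame_b = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)
kps_a, desc_a = salient_keypoints(frame_a)
kps_b, desc_b = salient_keypoints(frame_b)
matches = match_landmarks(desc_a, desc_b)  # candidate landmark correspondences
```

The matched salient landmarks would then feed the geometric back-end of a SLAM system; the point of the saliency gate is simply to keep fewer, more distinctive candidates.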
Another important need of robots, once they have established the scene, is to recognize the objects that are present in it and that might be worth interacting with. Two steps are needed to recognize objects. First, the robot needs to detect the object in the scene. For this goal saliency models can help a lot, as they can provide information about proto-objects [26] or the objectness of areas [27]. Once objects are detected, they need to be recognized. Here the main approach is to 1) extract features (SIFT, SURF or others) from the object, 2) filter the features based on a saliency map, and 3) perform the recognition with a classifier (such as an SVM). Papers such as [28] and [29] apply this technique, which lets a computer drastically decrease the number of keypoints needed to perform object recognition. Another approach was used in [30] and [31]: the features that are strongly present in the searched object and absent from its surroundings are learned, and this learning phase provides a new set of weights for bottom-up attention models. In this way, the features that are most discriminant for the searched object get the highest response in the final saliency map. A third approach can be found in [32], where the relative positions of salient points (called cliques) are used for image recognition.
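As an illustration of the three-step pipeline above (features, saliency filtering, classifier), here is a hedged sketch using SIFT, a bag-of-words vocabulary and an SVM. It is not the implementation of [28] or [29]; the vocabulary size, thresholds and helper names are assumptions, and the cv2.saliency module again requires opencv-contrib-python.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def salient_sift(gray, thresh=0.5):
    # Steps 1 and 2: SIFT features, kept only where bottom-up saliency is high.
    ok, sal = cv2.saliency.StaticSaliencySpectralResidual_create().computeSaliency(gray)
    kps, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    if desc is None:
        return np.empty((0, 128), np.float32)
    keep = [i for i, kp in enumerate(kps)
            if sal[int(kp.pt[1]), int(kp.pt[0])] > thresh]
    return desc[keep] if keep else np.empty((0, 128), np.float32)

def bow_histogram(desc, vocab):
    # Quantise descriptors onto the visual vocabulary, build a normalised histogram.
    words = vocab.predict(desc) if len(desc) else np.array([], int)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-9)

def train_recognizer(images, labels, n_words=64):
    descs = [salient_sift(img) for img in images]
    vocab = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descs))
    X = np.array([bow_histogram(d, vocab) for d in descs])
    clf = SVC(kernel="rbf").fit(X, labels)  # Step 3: SVM classifier
    return vocab, clf

def recognize(gray, vocab, clf):
    return clf.predict([bow_histogram(salient_sift(gray), vocab)])[0]
```

The saving comes from the saliency filter: fewer keypoints to quantise and match means faster recognition with, ideally, little loss of accuracy.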
Once robots know where they are (attentive visual SLAM) and can recognize the objects around them (attentive object recognition), they need to decide what to do next. One of those decisions is where to look next, and it is naturally based on visual attention. Several robots, such as the iCub, implement multi-modal attention: they combine visual and audio saliency in an egosphere and use the result to point the gaze at the next location. An interesting survey on attention for interactive robots can be found in [33].
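A minimal sketch of such multi-modal gaze selection is given below. The egosphere is approximated here by a simple azimuth/elevation grid, and the fusion weights are illustrative assumptions rather than the iCub's actual mechanism.

```python
import numpy as np

def next_gaze_target(visual_sal, audio_sal, w_visual=0.6, w_audio=0.4):
    """Both maps are (elevation, azimuth) grids normalised to [0, 1]."""
    fused = w_visual * visual_sal + w_audio * audio_sal
    el, az = np.unravel_index(np.argmax(fused), fused.shape)
    return el, az  # grid cell the robot should orient its gaze towards

# Example: a 36x72 egocentric grid (5-degree resolution), with one loud sound source.
visual = np.random.rand(36, 72)
audio = np.zeros((36, 72))
audio[18, 40] = 1.0
print(next_gaze_target(visual, audio))
```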
Another domain also belongs to this abnormal-region-processing category of applications: visual communication optimization. Marketing optimization can be applied to a large number of practical cases, such as websites, advertising, product placement in supermarkets, signage, and the placement of 2D and 3D objects in galleries.
Among the different applications of automatic saliency computation, marketing and communication optimization is probably one of the closest to market. Since it is possible to predict an image's attention map, that is, a map of the probability that people attend to each pixel of the image, it is possible to predict where people are likely to look on marketing material such as an advertisement or a website. Attracting customer attention is the first step of a process that goes on to arouse interest, induce desire and need for the product, and finally push the client to buy it.
Feng-GUI [34] is an Israeli company mainly focusing on web page and advertising optimization, even though its algorithm is also capable of analyzing video sequences. AttentionWizard [35] is a US company mainly focusing on web pages. There are few hints about the underlying algorithm, but it uses bottom-up features such as color differences, contrast, density, brightness and intensity, edges and intersections, length and width, and curve and line orientations. Top-down features include face detection, skin color and text detection (especially large text). 3M VAS [36] is the only big international player in this field. Very few details are given about its algorithm, but it can also provide video saliency. They provide attention maps for web page optimization, but also for advertisements with static images or videos, packaging, and in-store merchandising. EyeQuant [37] is a German company specialized in website optimization. It uses extensive eye-tracking tests to train its algorithm and bring it closer to real eye-tracking for a given task. All these companies claim around 90% accuracy for the first 3 to 5 viewing seconds [38]. They base this claim on comparisons between their algorithms and several existing databases using several ROC metrics, always comparing the results with the maximum ROC score obtained by human observers. Nevertheless, for real-life images and for given tasks and emotion-based communication, this accuracy drops dramatically but still remains usable.
With more and more 3D objects being created, manipulated, sold or even printed, 3D saliency is a very promising research direction. The main idea is to compute a saliency score for each view of a 3D model: the best viewpoint is the one where the total visible saliency of the object is maximized [39]. Mesh saliency was introduced by adapting 2D saliency concepts to the mesh structure [40]. Viewpoint selection and mesh simplification are also related through the use of mesh saliency [41]. While the best-viewpoint application can be used in computer graphics or even for 3D mesh compression, marketing is one of the targets of this research topic: more and more 3D objects are displayed on the internet, and the question of how to display them optimally is very interesting for marketing.
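The following is a hedged sketch of best-viewpoint selection in the spirit of [39]: evaluate the object from several candidate viewpoints and keep the one whose visible saliency is largest. The per-vertex saliency values are assumed to come from a mesh-saliency method such as [40], and visibility is crudely approximated here by back-face culling on vertex normals rather than a real renderer.

```python
import numpy as np

def best_viewpoint(vertex_saliency, vertex_normals, view_directions):
    """vertex_saliency: (N,), vertex_normals: (N, 3),
    view_directions: (M, 3) unit vectors pointing from the object to the camera."""
    best_score, best_view = -np.inf, None
    for view in view_directions:
        visible = vertex_normals @ view > 0.0   # vertices roughly facing the camera
        score = vertex_saliency[visible].sum()  # total visible saliency
        if score > best_score:
            best_score, best_view = score, view
    return best_view, best_score

# Example: 8 candidate viewpoints on a circle around the object.
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
views = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
```

A production system would replace the back-face test with proper occlusion handling, but the selection criterion, maximizing total visible saliency, stays the same.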
Conclusion
During the last two decades, significant progress has been made in the area of visual attention.
Regarding applications, a three-category taxonomy is proposed here:
- Abnormality detection: uses the detection of the most salient areas.
- Normality detection: uses the detection of the least salient areas.
- Abnormality processing: compares and further processes the most salient areas.
This taxonomy lets us simplify and classify a very long list of applications that can benefit from attention models. We are just at the early stages of the use of saliency maps in computer vision applications. Nevertheless, the number of already existing applications shows a promising avenue for saliency models, both for improving existing applications and for creating new ones. Indeed, several factors are nowadays moving saliency computation from the lab to industry:
- Model accuracy has increased drastically over two decades, both for bottom-up saliency and for top-down information and learning.
- Models working on both videos and images are more and more numerous and provide increasingly realistic results. New models including audio signals and 3D data are being released and are expected to provide convincing results in the near future.
- The combined improvement of computing hardware and algorithm optimization has led to real-time or near-real-time, good-quality saliency computation.
References:
25. Frintrop, S. and Jensfelt, P. (2008) Attentional landmarks and active gaze control for visual slam. Robotics, IEEE Transactions on, 24 (5), 1054–1065.
26. Walther, D. and Koch, C. (2006) Modeling attention to salient proto-objects. Neural networks, 19 (9), 1395–1407.
27. Alexe, B., Deselaers, T., and Ferrari, V. (2010) What is an object?, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, pp. 73–80.
28. Zdziarski, Z. and Dahyot, R. (2012) Feature selection using visual saliency for content-based image retrieval, in Signals and Systems Conference (ISSC 2012), IET Irish, IET, pp. 1–6.
29. Awad, D., Courboulay, V., and Revel, A. (2012) Saliency filtering of SIFT detectors: application to CBIR, in Advanced Concepts for Intelligent Vision Systems, Springer, pp. 290–300.
30. Navalpakkam, V. and Itti, L. (2006) An integrated model of top-down and bottom-up attention for optimizing detection speed, in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, IEEE, pp. 2049–2056.
31. Frintrop, S., Backer, G., and Rome, E. (2005) Goal-directed search with a top-down modulated computational attention system, in Pattern Recognition, Springer, pp. 117–124.
32. Stentiford, F. and Bamidele, A. (2010) Image recognition using maximal cliques of interest points, in Image Processing (ICIP), 2010 17th IEEE International Conference on, IEEE, pp. 1121–1124.
33. Ferreira, J.F. and Dias, J. (2014) Attentional mechanisms for socially interactive robots–a survey. Autonomous Mental Development, IEEE Transactions on, 6 (2), 110–125.
34. Feng-GUI website, proposing automatic saliency maps for marketing material. URL http://www.feng-gui.com/.
35. AttentionWizard website, proposing automatic saliency maps for marketing material. URL https://www.attentionwizard.com/.
36. 3M VAS website, proposing automatic saliency maps for marketing material. URL http://solutions.3m.com/wps/portal/3M/en_US/VAS-NA/VAS/.
37. EyeQuant website, proposing automatic saliency maps for marketing material. URL http://www.eyequant.com/.
38. Page containing the 3M VAS studies showing algorithm accuracy in general and in a marketing framework. URL http://solutions.3m.com/wps/portal/3M/en_US/VAS-NA/VAS/eye-tracking-software/eye-tracking-studies/.
39. Takahashi, S., Fujishiro, I., Takeshima, Y., and Nishita, T. (2005) A feature-driven approach to locating optimal viewpoints for volume visualization, in Visualization, 2005. VIS 05. IEEE, IEEE, pp. 495–502.
40. Lee, C.H., Varshney, A., and Jacobs, D.W. (2005) Mesh saliency, in ACM Transactions on Graphics (TOG), vol. 24, ACM, pp. 659–666.
41. Castelló, P., Chover, M., Sbert, M., and Feixas, M. (2014) Reducing complexity in polygonal meshes with view-based saliency. Computer Aided Geometric Design, 31 (6), 279–293.