Image Retrieval Challenge: a new GPU-based MapReduce Framework

Six innovative research papers won the Best Paper Award at CHINACOM 2015, the Tenth International Conference on Communications and Networking in China, held in August in Shanghai.

Lei Wang, Hanli Wang and Bo Xiao, from the Department of Computer Science at Tongji University, presented a study titled ‘A GPU-based MapReduce Framework for MSR-Bing Image Retrieval Challenge’, which describes a new system for efficiently and accurately searching images and scoring image-query pairs by relevance.

State-of-the-art image retrieval systems (e.g. Bing Image, Google Image and Baidu Image) face challenges from the growing volume of multimedia data produced every day as social networks develop. More flexible, powerful and robust large-scale image retrieval systems are needed to keep up with this continuous and growing flow of data.

A new system

Unlike earlier systems, which usually start with text queries to select a subset of images and then process their visual content, the GPU-based MapReduce Framework (GMRF) attempts to search for similar images directly in the dataset by visual content, and then compares their text similarities. GMRF is a computational framework that investigates the parallel power of combining MapReduce and GPUs.

  • MapReduce is a parallel programming model originally proposed by Google to process large-scale data. It abstracts away the complexity of parallelizing and distributing computations, and offers automatic task and data management, inter-machine communication and fault tolerance;
  • The Graphics Processing Unit (GPU), initially designed to accelerate computer graphics applications, is increasingly suited to data-parallel computation: it distributes the data among a massive number of computing units to achieve parallelism.
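As a rough illustration of the MapReduce programming model described above, the following single-process Python sketch shows the map, shuffle and reduce steps. It is not the GMRF implementation: the function names (`map_reduce`, `count_tags`, `sum_counts`) and the toy tag data are assumptions made for this example, and a real framework would distribute the work across machines or GPU threads.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map -> shuffle -> reduce pass (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_tags(record):
    """Hypothetical mapper: emit (tag, 1) for every tag attached to an image."""
    _image_id, tags = record
    for tag in tags:
        yield tag, 1


def sum_counts(_key, values):
    """Hypothetical reducer: add up the counts emitted for one tag."""
    return sum(values)


# Toy input, invented for this example.
records = [
    ("img1", ["cat", "pet"]),
    ("img2", ["cat"]),
    ("img3", ["dog", "pet"]),
]
tag_counts = map_reduce(records, count_tags, sum_counts)
```

The user supplies only the mapper and the reducer, while grouping by key is handled by the framework. This is the abstraction that lets a real MapReduce system take over scheduling, communication and fault tolerance.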

Goals and contributions

The goal of the MSR-Bing Image Retrieval Challenge (IRC) is to encourage contestants to build efficient image retrieval and scoring systems that assess how effectively query terms describe images crawled from the web for image search purposes. A contesting system is asked to produce a floating-point score for each image-query pair that reflects how relevant the query is as a description of the given image. The contributions of the research can be summarized as follows:

  1. The mechanism of data loading and transmission is specifically optimized to improve the throughput capacity of GMRF;
  2. A novel programming prototype, referred to as the Worker Prototype, is designed to jointly exploit the computing power of CPUs and GPUs;
  3. GMRF is simplified and optimized, compared with the traditional MapReduce framework, by removing several unnecessary procedures.
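To make the scoring task more concrete, the visual-first idea described earlier (find visually similar images, then compare their text) can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the authors' method: the feature vectors, the cosine and Jaccard similarity measures, the neighbour count `k` and the function `score_pair` are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(text_a, text_b):
    """Word-overlap similarity between two short texts."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def score_pair(query_text, image_feature, dataset, k=2):
    """Return a floating-point relevance score for one image-query pair.

    Assumed approach: take the k visually nearest dataset images, then
    average the text similarity between the query and their associated
    texts, weighted by visual similarity.
    """
    ranked = sorted(
        dataset,
        key=lambda item: cosine(image_feature, item["feature"]),
        reverse=True,
    )
    neighbours = ranked[:k]
    weights = [max(cosine(image_feature, n["feature"]), 0.0) for n in neighbours]
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * jaccard(query_text, n["text"]) for w, n in zip(weights, neighbours))
    return weighted / total


# Toy dataset with 2-D features, invented for this example.
dataset = [
    {"feature": [1.0, 0.0], "text": "black cat"},
    {"feature": [0.9, 0.1], "text": "cat on sofa"},
    {"feature": [0.0, 1.0], "text": "red car"},
]
relevant = score_pair("black cat", [1.0, 0.0], dataset)
irrelevant = score_pair("red car", [1.0, 0.0], dataset)
```

In this sketch a query that matches the text of visually similar images scores higher than one that does not. In a full system, the per-image similarity computations are the kind of data-parallel work that a GPU-backed map phase could take on.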

The full text of the paper will be available on the European Union Digital Library (EUDL).