Serious Games Evaluation

Evaluation is the systematic and objective assessment of an ongoing or completed activity or project and/or its resulting products. The aim is to determine the relevance and fulfilment of objectives, efficiency, effectiveness, impact and sustainability using a specific, normally predefined, set of criteria. Evaluation can be formative, when the purpose is to extract information that improves the current process or products, or summative, when the purpose is to assess the overall achievements. Some authors also consider diagnostic evaluation, a tool for understanding the existing context before the actual process starts.

Writing about the evaluation of Serious Games (SG), especially in such a limited number of words, is not a simple matter. The diversity of purposes (education and training, skill and competence development, awareness raising, marketing, science and research, etc.), the very different types of gameplay (role-play, adventure, simulation, strategy, etc.), the multiple contexts of use and the diversified target audiences all imply that the evaluation of each individual game must be unique, even when it follows a methodological framework.

The specific nature of a SG implies that its evaluation must consider all its inherent characteristics, that is, it must be assessed as a game (gameplay), as a software [1] application (usability) and as a personal development tool (content). Together these vectors assess the quality of the game, the quality of human-game interaction, and the quality of this interaction in a given “serious” context. SG evaluation is therefore a threefold activity in which the weights of the individual components should be balanced but are not necessarily equal. The quality and relevance of the content is naturally important, but the user experience and the game quality of the product are essential to ensure an effective and engaging tool that leads to skill and competence development. Although these three vectors of assessment should ideally be orthogonal, this is hard to ensure in practice because there are clear dependencies between them. A game has many intrinsic features, which distinct users accommodate differently depending on their previous experience and background but also on their preferences.
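As an illustration of the unequal weighting described above, the following minimal sketch combines the three vectors into a single figure. The 0-to-1 score scale, the names VectorScores and composite_score, and the example weights are all assumptions made for this illustration; none of them come from a published evaluation framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorScores:
    """Hypothetical scores for the three vectors, each on a 0..1 scale."""
    gameplay: float
    usability: float
    content: float


def composite_score(scores: VectorScores, weights: dict[str, float]) -> float:
    """Weighted average of the three vectors.

    The weights need not be equal; they are normalised by their sum,
    so only their relative size matters.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(getattr(scores, name) * w for name, w in weights.items()) / total


# Illustrative values only: content weighted slightly above the other two.
example = composite_score(
    VectorScores(gameplay=0.8, usability=0.6, content=0.9),
    {"gameplay": 0.3, "usability": 0.3, "content": 0.4},
)
```

A single number like this hides the dependencies between the vectors, so it is probably best read alongside the three individual scores rather than instead of them.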

Usability assessment in software applications has been the subject of extensive research over the past few years. However, traditional metrics need to be adapted to digital games and combined with an assessment of the gameplay experience, using for instance Malone’s intrinsic factors for engaging gameplay (challenge, curiosity and fantasy) or Csikszentmihalyi’s flow theory.

Concerning the content vector, or the “serious” objective, the evaluation process can be summed up in one question: “Did the player acquire the knowledge, competences and skills that he/she was supposed to acquire with this game?” The question may be simple, but it is often difficult to answer because skill development is an individual process that depends on the specific context. For SG used for education and training purposes, evaluation of this vector can rely on the extensive research and practice in educational evaluation, namely Kirkpatrick’s work, for instance through knowledge and skill pre- and post-testing, which aims to measure changes in educational outcomes after modifications to the learning process. Advergames (or SG for marketing and advertising) can benefit from the impact assessment processes that this industry already uses for other channels. Awareness-raising SG are at a crossroads between the two previous types and could benefit from existing practice in both domains. In general, then, the evaluation of this vector can be based on existing assessment methodologies, practices and data collection tools, adjusted for this special channel. Focus groups, think-aloud play, questionnaires, semi-structured interviews and user observation, among others, have all been reported in SG evaluation studies at different stages, such as alpha, beta and gamma testing, where user involvement becomes gradually more relevant and the evaluation focus shifts from the more technical aspects to the more content-related ones.
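The pre- and post-testing mentioned above can be summarised with a few lines of code. The sketch below is only one possible way of doing it: it assumes each player has a pre-test and a post-test score on the same scale, and it reports the raw gain together with a normalised gain (the fraction of the possible improvement that was actually achieved). The function names, the 0-to-100 scale and the sample scores are assumptions made for this illustration.

```python
from statistics import mean


def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Fraction of the possible improvement that was actually achieved."""
    if not (0 <= pre <= max_score and 0 <= post <= max_score):
        raise ValueError("scores must lie between 0 and max_score")
    if pre == max_score:
        # No room to improve; treat the gain as zero instead of dividing by zero.
        return 0.0
    return (post - pre) / (max_score - pre)


def summarize(pairs: list[tuple[float, float]], max_score: float = 100.0) -> dict[str, float]:
    """Average raw and normalised gain over (pre, post) score pairs."""
    if not pairs:
        raise ValueError("at least one (pre, post) pair is required")
    return {
        "mean_raw_gain": mean(post - pre for pre, post in pairs),
        "mean_normalized_gain": mean(
            normalized_gain(pre, post, max_score) for pre, post in pairs
        ),
    }


# Made-up scores for three players, for illustration only.
summary = summarize([(40, 70), (50, 75), (60, 80)])
```

A positive gain alone does not show that the game caused the improvement; a comparison group that did not play the game would normally be needed to support that claim.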

One evaluation tool that is gradually gaining relevance, and that exploits the intrinsic aspects of SG, is game analytics. The automatic and continuous collection of interaction data between the player and the SG can contribute to all three vectors, providing real-time information that no other tool is able to capture with the same precision and independence. It can therefore serve a formative purpose, by adapting the SG so that it better fits the needs and skills of the player, and also a summative purpose, in the assessment and even certification of the acquired competences.
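To make the idea of game analytics more concrete, here is a minimal sketch of how interaction data might be collected and turned into simple indicators. The event kinds ("level_start", "level_fail", "level_complete"), the class names and the two indicators are hypothetical choices made for this illustration; a real SG would define its own events according to what it needs to assess.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class GameEvent:
    """One logged interaction (hypothetical schema)."""
    player_id: str
    timestamp: float  # seconds since the start of the session
    kind: str         # e.g. "level_start", "level_fail", "level_complete"
    level: int


@dataclass
class SessionLog:
    """Collects events as they happen and derives simple indicators."""
    events: list[GameEvent] = field(default_factory=list)

    def record(self, event: GameEvent) -> None:
        self.events.append(event)

    def attempts_per_level(self) -> dict[int, int]:
        """Number of times each level was started."""
        counts: dict[int, int] = defaultdict(int)
        for e in self.events:
            if e.kind == "level_start":
                counts[e.level] += 1
        return dict(counts)

    def completion_time(self, level: int) -> Optional[float]:
        """Seconds from the first start to the first completion of a level."""
        starts = [e.timestamp for e in self.events
                  if e.kind == "level_start" and e.level == level]
        ends = [e.timestamp for e in self.events
                if e.kind == "level_complete" and e.level == level]
        if not starts or not ends:
            return None  # level never started or never completed
        return min(ends) - min(starts)


# Illustrative session: one failed attempt, then a successful one.
log = SessionLog()
log.record(GameEvent("p1", 0.0, "level_start", 1))
log.record(GameEvent("p1", 30.0, "level_fail", 1))
log.record(GameEvent("p1", 35.0, "level_start", 1))
log.record(GameEvent("p1", 80.0, "level_complete", 1))
```

Indicators such as these could feed a formative use (for example, lowering the difficulty after repeated failures) or a summative one (reporting how long a player needed to complete a task), in line with the two purposes described above.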

For further reading on this subject, take a look at the following articles. Each of them provides an extended list of studies and research in this domain:

[1] Mayer et al. present the methodological background to an ongoing research project on the scientific evaluation of serious games and/or computer-based simulation games (SGs) for advanced learning.

Igor Mayer, Geertje Bekebrede, Casper Harteveld, Harald Warmelink, Qiqi Zhou, Theo van Ruijven, Julia Lo, Rens Kortmann and Ivo Wenzler, The research and evaluation of serious games: Toward a comprehensive methodology, British Journal of Educational Technology, Volume 45, Issue 3, pages 502–527, May 2014, DOI: 10.1111/bjet.12067

[2] Bellotti et al. review a significant body of literature related to the evaluation of serious games and suggest directions for further research.

Francesco Bellotti, Bill Kapralos, Kiju Lee, Pablo Moreno-Ger, and Riccardo Berta, Assessment in and of Serious Games: An Overview, Advances in Human-Computer Interaction, Volume 2013 (2013), Article ID 136864, 11 pages,

[3] Nacke et al. focus on the evaluation of the gameplay experience in a SG context.

Lennart Nacke, Anders Drachen, Stefan Göbel, Methods for Evaluating Gameplay Experience in a Serious Gaming Context, Available online at:

[1] What if Serious Games are not digital tools? Usability evaluation is also applied in non-digital contexts like board games. However, for the purpose of simplicity, we’ll focus on the use of digital SG.