Rating player strength in computer games

Written by Anders Törlind - 2013 - All rights reserved.

Introduction

Creating a numerical representation of a player's strength in a game may serve several purposes - social assertion, a measure of personal development and so on. In this text, the focus is on rating as a means of creating agreeable matches between players.

Ways of measuring strength

Play strength, or possibly "skill", is an elusive thing. In a sport or a game, many different skills play a part in a player's performance - tactics, experience, raw power, technical excellence, prioritization, willpower, reaction time etc. Measuring all of these factors and weighing them together into a single number is not easy, and takes some consideration. The usual answer is to forgo measuring single aspects, and focus only on results.

In the rest of the text, I will refer to relative rating and absolute rating. These are two different approaches to measuring player strength. The first is a purely relative measure, where the rating is an abstract function of who beat whom at what point in time. The second is an attempt to quantify strength as a function of what was accomplished during a match.

A small case study might be in order.

Golf

One of the most well-known rating systems is the golf handicap system, where a player's strength is measured as a number - the lower the number, the better the player is considered to be. A player's handicap is constantly revised, and changes with improvement (or degradation) of play over time.

The main stated purpose of the handicap system is to ensure that players of dissimilar skill can still enjoy a game together, each competing to beat their handicap by a larger margin than their opponent.

This system is an example of absolute rating - the measured result is a function of the number of strokes used to complete a course and the par for that course. In no way does the result of your opponent play any part in your rating (handicap).
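To make the idea concrete, here is a minimal sketch of an absolute rating. It is not the handicap formula of any golf federation, which is considerably more involved. The function names and the choice to average the best three rounds are my own assumptions for illustration.

```python
def strokes_over_par(strokes: int, par: int) -> int:
    """Result of one round, measured against the par of the course."""
    return strokes - par


def simple_handicap(rounds: list[tuple[int, int]], best_n: int = 3) -> float:
    """Average of the best (lowest) `best_n` results.

    `rounds` holds (strokes, par) pairs. This is a hypothetical,
    simplified formula that only illustrates the principle.
    """
    results = sorted(strokes_over_par(s, p) for s, p in rounds)
    best = results[:best_n]
    return sum(best) / len(best)


# Four rounds by the same player, each given as (strokes, par).
rounds = [(90, 72), (85, 72), (88, 70), (95, 72)]
print(simple_handicap(rounds))
```

Note that nothing about any opponent appears among the inputs, which is what makes the rating absolute.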

Tennis

Ranking in tennis is another approach entirely to measuring play strength. Here, the major purpose is seeding for competitions, where competitions consist of big single-elimination trees.

In brief, ranking points (which we consider to be a form of rating for the purposes of this text) are earned by advancing in certain tournaments. Different tournaments have different point totals, and points are awarded according to what level of the elimination tree the player reaches. Ranking points earned in a tournament are only valid until that same tournament is played the next year, and the tournament's contribution to the total is removed after its successor has concluded.
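The expiry rule can be sketched as follows. The point table, the tournament names and the data layout are all hypothetical. Real tours publish their own tables and have further rules that are left out here. The sketch also assumes that every edition of a tournament is recorded for the player, with "absent" entered when the player did not take part.

```python
# Hypothetical points per round reached.
POINTS = {"absent": 0, "round1": 10, "quarterfinal": 90,
          "semifinal": 180, "final": 300, "winner": 500}


def ranking_total(results: dict[str, dict[int, str]], year: int) -> int:
    """Sum the points from the most recent edition of each tournament.

    `results` maps tournament name -> {year played: round reached}.
    Only the latest edition up to and including `year` counts, so the
    previous year's points drop out once the successor has been played.
    """
    total = 0
    for editions in results.values():
        played = [y for y in editions if y <= year]
        if played:
            total += POINTS[editions[max(played)]]
    return total


results = {
    "Open A": {2012: "winner", 2013: "quarterfinal"},
    "Open B": {2012: "final"},  # the 2013 edition has not been played yet
}
print(ranking_total(results, 2012), ranking_total(results, 2013))
```

In this example the player's total falls sharply in 2013, since the win in "Open A" from the year before is replaced by a quarterfinal.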

This is an example of relative rating, but one that is also subject to a fair bit of chance. Advancement in a tournament depends on your skill and on the skill of the opponents you happen to face. The tennis ranking is therefore not comparable backwards in time - a tennis player at a certain ranking in 1970 is not necessarily at all comparable with one at the same ranking in 2010.

Chess

Chess is, I assume, well known to the reader. Chess ratings are a long-established and well-studied rating system. I'll use the Elo system in this example, though several different versions of that system, and other systems, have been used at different times and places.

The Elo system, used to measure play strength in chess, works by comparing wins and losses of players directly. Tournaments, unlike in tennis, play a part only in bringing together players of similar skill. Rating points are awarded for defeating opponents and deducted for losing to them. The points won and lost relate directly to the rating disparity between the players. A lower rated player beating a higher rated player gets a big raise in rating, while a higher rated player beating a lower rated player gets a marginal raise (or possibly none at all, if the rating gap is too large).
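A minimal sketch of an Elo-style update, using the commonly cited expected-score formula with a scale of 400 points. The K-factor of 20 and the rounding to whole points are assumptions made for illustration. Federations use different K-factors and additional rules that are not shown.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, score: float, k: float = 20.0) -> int:
    """New rating after one game; `score` is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return round(rating + k * (score - expected_score(rating, opponent)))


print(update(1500, 1900, 1))  # underdog wins: a big raise
print(update(1900, 1500, 1))  # favourite wins: a marginal raise
print(update(2400, 1400, 1))  # very large gap: the raise rounds away to nothing
```

The three calls correspond to the three cases described above: the size of the raise shrinks as the winner's rating advantage grows.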

This is the arch-example of relative rating. The rating of a player is entirely dependent on who beat whom and in what order they did it. The rating is an entirely synthetic thing, not tied to performance in tournaments or title bouts, but only dependent on individual match results.

Using player rating for matchmaking

In this section, I'll use terminology from my text on matchmaking. A quick read through would be appropriate at this time, if you have not already done so.

Desirable properties of your rating system

In order to use a rating system for matchmaking, the rating system must make sense in the context of the matchmaking algorithm. The way matches are set up may skew the rating system, unless some thought is given to the problem.

Example #1: You have a multiplayer game where two teams consisting of several players compete. The mode of matchmaking is random and your way to rate players is win rate - that is to say, you measure what percentage of matches played are won by each player's team. Over a large number of games played, this will indicate each player's contribution to the team effort.

Example #1, modified: Let us now say that you wish to switch to a rated mode of matchmaking. The naive approach would be to match people of similar rating (win rate) against each other and be done with it. This would lead to unfortunate skew in the rating system, since a person matched against their equals would be expected to win half of their matches. This would pull every player's rating (win rate) towards 50%, depressing the rating of a strong player and causing them to face weaker opponents. Clearly not an optimal situation. Another rating system is called for.
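The skew can be shown with a small calculation. To keep it short, this sketch uses one-on-one matches rather than teams, and a hypothetical model in which the chance of winning depends only on the gap between two hidden skill values. Neither the model nor the numbers come from a real game.

```python
def win_probability(skill: float, opponent_skill: float) -> float:
    """Hypothetical model: the chance of winning grows with the skill gap."""
    return 1.0 / (1.0 + 10 ** ((opponent_skill - skill) / 400.0))


def expected_win_rate(skill: float, opponents: list[float]) -> float:
    """Average win probability over a pool of equally likely opponents."""
    return sum(win_probability(skill, o) for o in opponents) / len(opponents)


population = [1000.0, 1200.0, 1400.0, 1600.0, 1800.0]
strong = 1800.0

random_rate = expected_win_rate(strong, population)  # opponents drawn at random
matched_rate = expected_win_rate(strong, [strong])   # opponents of equal skill
print(random_rate, matched_rate)
```

Under random matchmaking the strong player's win rate is well above one half. Once matched only against equals it becomes exactly one half, the same as for every other player, and so stops saying anything about skill.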

Example #2: You have a multiplayer game where two teams consisting of several players compete. The mode of matchmaking is random and your way to rate players is by average points scored per match. A strong player will carry their team by scoring often, and so cause wins more often than not. Over a large number of games played, this will indicate each player's contribution to the team effort.

Example #2, modified: Let us now say that you wish to switch to a rated mode of matchmaking. The naive approach would be to match people of similar rating (average score) against each other. This would also, unsurprisingly, lead to skew of the rating system. Evenly matched teams, where all players are of comparable skill level, would not lead to higher scores, but rather to an evening out of points scored across the players of a team. This would depress strong players' average scores and inflate the scores of weak players as compared to the original situation. Again, this would lead to a collapse of the rating system as a good measure of skill.
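A toy model illustrates the evening out. It assumes, purely for illustration, that a team scores a fixed number of points per match and that those points are shared between teammates in proportion to skill. The skill values and the total of 10 points are made up.

```python
def points_per_player(team: list[float], team_points: float = 10.0) -> list[float]:
    """Hypothetical model: a team's points are shared in proportion to skill."""
    total_skill = sum(team)
    return [team_points * skill / total_skill for skill in team]


# Random matchmaking: one strong player (skill 3) with two weak ones (skill 1).
mixed = points_per_player([3.0, 1.0, 1.0])

# Rated matchmaking: teams of equals.
strong_team = points_per_player([3.0, 3.0, 3.0])
weak_team = points_per_player([1.0, 1.0, 1.0])

print(mixed, strong_team, weak_team)
```

In the mixed team the strong player scores three times as much as each weak teammate. In the evenly matched teams, strong and weak players end up with the same average score, so the score no longer separates them.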

Choosing your rating system

As seen, choosing a rating system is not entirely easy. It is all too easy to set up a system that is devalued as a measure of play strength as soon as your matchmaker changes.

In my very personal opinion, relative ratings are the way to go. The chess model of rating is mathematically sound and even though there are some problems with selective matching and other types of manipulation, it is still less vulnerable to skew than most other ways of measuring results.

Conclusion

Your choice of rating system is worthy of some consideration. It should be reasonably easy for the user to understand. It should fit with the game modes and matchmaker algorithms. Lastly, it should be a relevant measure of play strength. If your rating system is subject to manipulation by players, credibility will soon erode. This means that ongoing monitoring of how ratings develop, looking for inflation, deflation, discontinuities and other anomalies, is needed. If your game needs anything dependent on player rating, your rating system had better be in good working order!