Analyzing Rating-Scale Data
Overview
When people are asked to give feedback using rating scales, such as a survey that asks them to rate a statement from “Strongly Disagree” to “Strongly Agree,” researchers need a way to turn those opinions into something measurable. The most common approach is to assign numbers to each option and calculate the average.
For example, imagine you are using a five-point scale where:
1 = Strongly Disagree
2 = Disagree
3 = Neutral
4 = Agree
5 = Strongly Agree
Once every participant has responded, you add up all the numbers and divide by the number of people to get an average. This makes it easier to compare opinions between different groups or across multiple studies.
That said, it is important to understand that rating-scale data is not perfectly numerical. Unlike measurements such as height or temperature, where the difference between each value is consistent, the difference between choosing a 2 versus a 3 may not be the same as the difference between a 4 and a 5. Even so, many researchers treat these scales as evenly spaced because it simplifies analysis and works well enough for most practical use cases.
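In practice, that usually means mapping each label to a number before doing any math. Here is a minimal Python sketch of that step; the scale mapping and the sample responses are made-up assumptions for illustration, not data from a real study.

```python
# A minimal sketch: code labeled survey responses as numbers, then average them.
# The labels and responses below are illustrative, not real data.
SCALE = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

responses = ["Agree", "Strongly Agree", "Neutral", "Agree"]

# Convert each label to its numeric value, then take the mean.
scores = [SCALE[r] for r in responses]
average = sum(scores) / len(scores)
print(average)  # 4.0 for this example
```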
Methods for Analyzing Data
- Averages: Shows overall player sentiment by summarizing responses into a single value.
- Distribution: Reveals how responses are spread out, highlighting polarization or clusters.
- Segments: Compares how different player groups respond, such as new vs. experienced players.
- Top-box: Focuses on the strongest positive responses to show who truly loved something.
- Trends: Tracks how responses change over time, such as improvement across builds or updates.
- Correlations: Explores relationships between two variables.
Averages
- Calculate mean ratings to get a general sense of how players felt.
- Compare averages across levels, game builds, or player segments.
How to calculate an average rating:
To calculate the average (mean) of a rating scale, assign a number to each response option, then add up all the responses and divide by the total number of players.
For example, if you use a 1 to 5 scale and receive these ratings:
5, 4, 4, 3, 4
You would add them together:
5 + 4 + 4 + 3 + 4 = 20
Then divide by the number of responses:
20 ÷ 5 = 4.0
That final number represents the average player rating.
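If the ratings are already stored as numbers, the same calculation takes only a couple of lines of code. Here is a short Python sketch using the example ratings above.

```python
# Average rating for the worked example above.
ratings = [5, 4, 4, 3, 4]

average = sum(ratings) / len(ratings)
print(average)  # 20 / 5 = 4.0
```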
🔎 Example:
- Level 1 Fun = 4.6
- Level 2 Fun = 2.8 → Might need work
⚠️ Caveat: Averages can hide strong differences in opinion.
A big trap is focusing only on the average score. Doing so can obscure important patterns in the data.
For example, imagine you asked 20 players whether they enjoyed a game level using a 1 to 7 scale. If the average response is a 4, you might assume players felt neutral. But if you look closer and see that 10 players rated the level a 1 and 10 rated it a 7, you would discover that no one felt neutral at all. The response was sharply divided between love and hate.
That insight is completely lost when you look only at the average.
This is why researchers also examine the distribution of responses, meaning how many players selected each number on the scale. Looking at the spread of responses provides a much more complete picture of player sentiment.

Here’s a visual example of why averages can be misleading when analyzing player ratings:
- Level 1 shows a polarized split: half of the players rated it 1 (hated it), and half rated it 7 (loved it). The average is 4, but no one actually rated it a 4!
- Level 2 has a true average of 4, and all players rated it exactly that, clearly a more neutral response.
Distribution
- Count how many people gave each score (1, 2, 3, etc.); a counting sketch follows at the end of this section.
- This helps spot polarized responses that averages miss.
Example:
- 10 players rated Level 3: five gave it a 1, five gave it a 5 → Average = 3
- But the level isn’t “meh”; it’s love it or hate it.
This might suggest:
- The level caters to a specific player type
- Something about difficulty or pacing splits opinion
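Here is a minimal Python sketch of that counting step, using made-up ratings that match the Level 3 example above: the average looks middling on its own, but the counts reveal the split.

```python
from collections import Counter

# Hypothetical Level 3 ratings: five players gave a 1, five gave a 5.
ratings = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

average = sum(ratings) / len(ratings)
distribution = Counter(ratings)

print(f"Average: {average}")  # 3.0 -- looks neutral by itself
for score in range(1, 6):
    print(f"Rated {score}: {distribution.get(score, 0)} players")
# The counts show a love-it-or-hate-it split that the average hides.
```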

Segments
Break down ratings by:
- Experience level (new vs. experienced gamers)
- Platform (console vs. PC)
- Play style (explorers vs. speedrunners)
This helps you tailor features or understand what is working for whom.
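If responses are stored in a table, a group-by makes segment comparisons quick. Here is a small pandas sketch; the segment names and ratings are invented for illustration.

```python
import pandas as pd

# Hypothetical fun ratings tagged with a player segment.
data = pd.DataFrame({
    "segment": ["new", "new", "new", "experienced", "experienced", "experienced"],
    "fun":     [5, 4, 5, 2, 3, 2],
})

# Average fun rating per segment.
print(data.groupby("segment")["fun"].mean())
# A gap between segments suggests the level works better for one group than the other.
```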

Top-Box
Let’s say you’re using a 5-point scale, and someone gives a rating of 5 (Strongly Agree):
- A top-box score would be the percentage of people who gave a 5.
- A top-2-box score would be the percentage of people who gave a 4 or a 5.
These scores are useful when you want to know how many people really liked something—not just felt okay about it.
However, when you use this method, you're only looking at the top end of the scale and ignoring the lower ratings. That means:
- You lose some information.
- You shouldn’t average these scores anymore. Just report them as percentages.
Calculate how many players gave very high ratings:
- Top-box = % who gave the highest score (e.g., 5 out of 5)
- Top-2-box = % who gave a 4 or 5
🎯 Why it is useful:
- Quickly shows how many people really loved something
- Great for stakeholder summaries: "85% of players rated the boss fight 4 or 5"
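Here is a short Python sketch of both scores on a 1-to-5 scale, using made-up ratings; note that the results are reported as percentages rather than averaged any further.

```python
# Hypothetical 1-5 ratings from ten players.
ratings = [5, 4, 5, 3, 5, 2, 4, 5, 4, 1]

top_box = sum(r == 5 for r in ratings) / len(ratings) * 100
top_2_box = sum(r >= 4 for r in ratings) / len(ratings) * 100

print(f"Top-box: {top_box:.0f}%")      # 40% gave the highest score
print(f"Top-2-box: {top_2_box:.0f}%")  # 70% gave a 4 or 5
```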

Here’s a chart that shows Top-Box and Top-2-Box scores:
- Top-Box (Yellow): Percentage of players who rated a feature 5 (Strongly Agree).
- Top-2-Box (Orange): Percentage who rated it 4 or 5, showing strong positive sentiment.
For example:
- The Boss Fight had the highest top-box score, meaning a lot of players loved it.
- Tutorial Clarity had a lower top-box score but a decent top-2-box score, meaning more players thought it was good than amazing.
Trends
If you test the same level across different versions:
- Track how ratings change
- Combine this with notes from qualitative feedback (like interviews or open-text responses)
📉 Example:
- Before redesign, Level 2 "Clarity" = 2.1
- After tutorial update, Level 2 "Clarity" = 4.2
This suggests the tutorial or level changes significantly improved player understanding. It's a clear example of how tracking metrics across builds helps validate that your design decisions are working.
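A small Python sketch of what that tracking might look like; the ratings are invented and chosen to mirror the example numbers above.

```python
# Hypothetical clarity ratings for the same level across two builds.
clarity_by_build = {
    "Build 1 (before redesign)": [2, 3, 2, 1, 2, 3, 2, 1, 2, 3],
    "Build 2 (after tutorial update)": [4, 5, 4, 4, 5, 3, 4, 5, 4, 4],
}

for build, ratings in clarity_by_build.items():
    mean = sum(ratings) / len(ratings)
    print(f"{build}: mean clarity = {mean:.1f}")
# Pair the numeric shift with qualitative notes to explain why it moved.
```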

Correlations
Scatter plots let you visually explore relationships (correlations) between two variables. When you're dealing with rating scale data—say, players rate both fun and challenge from 1 to 5—you're sitting on potential insight into how these concepts influence each other.
✅ "Do players who rated 'challenge' high also rate 'fun' high?"
This checks for a positive correlation.
- If you see a rising trend (dots go up together), it suggests players find more challenge = more fun.
- If there's no clear pattern, it may mean challenge and fun are unrelated in your current design.
- A downward trend? Uh-oh: more challenge may actually reduce fun for your players.
⚠️ "Are 'frustration' and 'clarity' negatively correlated?"
This checks for a negative correlation.
- If you see frustration scores go up when clarity goes down, that suggests confusing parts of the game might be frustrating players.
- Fixing clarity issues could directly lower frustration.
Here's an example of two scatter plots:
Fun vs. Challenge: The plot on the left shows how players rated "fun" and "challenge" on a 1–5 scale. This can help you see if players who found the game more challenging also found it more fun (or vice versa).
Clarity vs. Frustration: The plot on the right shows how players rated "clarity" and "frustration." If there's a pattern where lower clarity scores are associated with higher frustration, it might suggest areas in the game that need more explanation or clearer instructions.
These visualizations can help uncover patterns or areas where game elements interact, guiding design decisions.
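As a starting point, here is a minimal Python sketch that draws one of these scatter plots and computes a correlation coefficient. The paired ratings are invented, and because rating-scale data is ordinal, a rank-based measure such as Spearman's correlation is often a safer choice than Pearson's.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired ratings from the same ten players (1-5 scale).
challenge = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
fun       = np.array([2, 2, 3, 3, 4, 4, 5, 4, 5, 5])

# Pearson correlation: +1 rising together, -1 opposite, near 0 no linear pattern.
r = np.corrcoef(challenge, fun)[0, 1]
print(f"Fun vs. challenge: r = {r:.2f}")

# Scatter plot for visual inspection of the relationship.
plt.scatter(challenge, fun, alpha=0.7)
plt.xlabel("Challenge rating")
plt.ylabel("Fun rating")
plt.title("Fun vs. Challenge")
plt.show()
```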

Resources
Albert, Bill, and Tom Tullis. Measuring the User Experience: Collecting, Analyzing, and Presenting UX Metrics. 3rd ed., Morgan Kaufmann, 2022.