A common misconception about usability testing I often hear is: “As a user experience designer, shouldn’t you already know how to make things usable?” While it’s a simple question, it has a complicated answer: partially yes and partially no.
Seasoned experience designers have internalized usability heuristics, standards, and best practices. These are the guiding principles that determine how an interface should function and behave in order to be usable, and they are what inform our everyday work.
But these are just guidelines that work for most people. They don’t tell us how a product’s target audience is actually going to use that product, which is knowledge that helps us make an experience feel straightforward and even intuitive.
What Makes A Product Usable, Anyways?
What makes something “usable” varies greatly depending on a person’s life experiences, technical proficiency, device platform, age, skillset, native language, interests, other products they are familiar with… the list goes on. Additionally, when most people think of something as “usable,” they are often only thinking about a user’s ability to complete an operation, such as completing a purchase online. They are disregarding things such as how autofill makes completing a form more efficient or how micro-interactions make an experience feel more satisfying.
Experience designers measure a product’s usability in six main ways:
Usefulness — Does the product enable people to achieve their goals?
Efficiency — Can those goals be accomplished accurately, completely, and quickly?
Effectiveness — Does the product behave the way people expect it to?
Learnability — Can the product be easily learned within a reasonable amount of time?
Satisfaction — How do people feel about using the product?
Accessibility — Can people with disabilities easily use the product?
How To Make Your Product More Usable
Usability testing is a flexible user research method that can be conducted in a variety of different ways depending upon a product’s maturity within the development cycle and what information you want to learn from users.
There are four types of usability tests: exploratory, assessment, validation, and comparison.
As the name suggests, exploratory testing (also known as formative testing) helps product teams quickly delve into which aspects of a design concept are working, which are not, and why. Conducting this type of research early in the development cycle, while the product is still being defined and designed, allows product teams to better understand their users and provides an opportunity to iterate on design concepts. The goal is to arrive more quickly at the solution that will best meet the needs and preferences of the intended users. In other words, this type of early research keeps product teams from making incorrect assumptions about what users want or need, assumptions that would otherwise be discovered only after the product has shipped and the time and cost of development have been incurred.
Keep It Lean:
Quickly iterate by testing multiple low-fidelity wireframe concepts.
Case Study: Messaging App
We conducted an exploratory test during the early phases of a workplace messaging app concept. Our goal was to learn more about the specific user base’s expectations and preferences with regard to peer-to-peer and group messaging. The client’s industry was highly regulated. As such, we knew that common patterns in consumer messaging apps, though familiar to users, might not be appropriate or robust enough for the type of communications required by our client. In addition, we assumed that users’ understanding of how these other products worked might affect how they expected this product to function and behave. How might we provide a familiar messaging experience that also allows users to be effective in their work?
With our initial low-fidelity concepts in hand, we set out to answer a number of research questions, one of which was:
Should users who are added to a conversation thread be able to view the full history of that thread?
To answer this question we presented users with two scenarios.
In the first scenario, the participants were provided with a chat history and asked to imagine it was a conversation they had with a colleague. In this conversation, the colleagues had been discussing a private work-related topic. It then segued into lunch plans, at which point they decided to invite a third person to the conversation. Study participants were then asked to add Jane Doe to the conversation. Once they completed this task, they were asked, “Should Jane Doe be able to see the conversation that took place before she was added to the thread?” 88% of participants said no because, due to the nature of their work, the private conversation should not be shared.
In the second scenario, participants were once again provided with a chat history. In this conversation, the colleagues had been discussing another private work-related topic, which resulted in the decision to pull in another colleague, John Doe, for a consultation. Study participants were asked to add John Doe to the conversation. Once they completed this task, they were asked the same question: “Should John Doe be able to see the conversation that took place before he was added to the thread?” This time 88% of participants said yes, because it would be annoying to have to repeat the entire conversation for John.
It was not a revolutionary finding that context changed the way study participants answered this question. The important finding for us was under which circumstances user preferences change, which raised the question of how we could create an experience that takes those nuances into account.
Though our lunch example was hardly central to the effectiveness of workplace messaging, the simple concept used in our test made it immediately relatable for study participants. They were able to draw a mental line around what they deemed appropriate to share and were even able to think of their own examples of when they may or may not have wanted to share the contents of a thread. Ultimately, the recommendation (screen 2 above) coming out of this evaluation was to let users decide, on a per-message basis, whether to share the contents of the thread. We also established that the additional development effort to implement this solution would be beneficial for users.
When people think of usability testing, this is the type of test that usually comes to mind. Assessment testing, also known as summative testing, is used to evaluate the overall usability of a product. Assessment tests build on the insights learned in the exploratory testing phase but allow product teams to focus on the more granular aspects of a design and how they impact usability. Typically this type of test is conducted early to midway through the development cycle, when users are able to interact with functioning prototypes. Users are asked to perform specific tasks and to think aloud as they complete them, while their interactions and behaviors are observed and their attitudes are measured.
Keep It Lean:
Test different aspects of a design using clickable prototypes before a single line of code is even written.
Case Study: News App
We conducted an assessment test for a news app concept midway through the development life cycle. Our intent was to evaluate the overall effectiveness of the product and to establish baseline user performance and satisfaction levels. Establishing baseline measures at this stage would create a point of comparison for future testing, which would enable us to see if our design iterations were on the right track. In addition, we were particularly interested in developing an understanding of a customization concept which would allow users to curate their news feed according to their own interests. Would users be able to discover this feature on their own and would they be able to successfully use it? Some of our other research questions included:
How quickly and easily are users able to customize their news feed by turning on and off desired topics?
How quickly and easily are users able to customize their news feed by arranging topics in a desired order?
To answer these questions, we asked users to perform a series of tasks; then we asked them to evaluate how easy they found them to be. This combination of methodologies provided us with both quantitative data (how many people were able to complete the task) and qualitative data (how easy or enjoyable users found it to be).
In the first task, users were instructed: “Update your news feed to include only four of your favorite sections.” We specifically didn’t tell users where to go or what to click on because we wanted to observe how easily they could find the customization screen and then turn desired sections on and off on their own.
100% of participants were able to complete this task and were observed easily turning on and off desired subtopics within each section (screens 1 & 2 above). When asked, “On a scale of 1–5, how easy was it to update your news feed?” 33% of users marked down their score, stating that it felt like too many steps and that it could be easier. Our recommendation (screen 3 above) was to combine the functionality from both screens into one. Using an accordion to hide and show the subtopics within each section allowed users to easily see all the sections at a glance, much like the initial concept, but created efficiency for users because they only had to deal with one screen. We also proposed fixing each section header to the top of the screen, rather than letting it scroll off the screen, while its subtopics were present in the viewport. This would provide context to users and keep the ability to collapse the section persistent on the screen.
In the second task, we asked users: “Update your news feed so that your favorite sections appear first.” For this task, we expected users to tap on “Sort”, which would hide the arrows and display drag icons (screen 1 above), allowing users to drag and drop items into a priority of their choosing. We were surprised to discover that the majority of users struggled, to varying degrees, with this task. 37.5% of users were unable to complete it at all. While observing their attempts, we noticed that they were tapping on the entire cell or icon rather than tapping and holding to drag it up or down.
Users generally understood what to do; it was just not happening the way they had expected. When asked, “What does the icon mean to you?” the same 37.5% who were unable to complete this task were also unable to correctly describe the meaning of the icon. We also observed that some users, after tapping on “Sort”, did not initially notice the icons change. They paused and waited for more feedback. Our recommendation (screen 2 above) was to add arrows to the icon, which might make it more meaningful to users unfamiliar with it. We also recommended adding simple instructional text and hiding the sections that were turned off in the previous screen, making it quicker and easier for users to arrange sections in a desired order.
Validation testing, also known as verification testing, is used to evaluate how a product compares to predetermined usability standards or benchmarks. This type of evaluation is best conducted late in the development cycle, following the exploratory and assessment research done earlier in the process. Participants are asked to complete tasks, much like in the assessment test. The difference is that users are not asked to “think aloud” and are not questioned mid-task about why they are choosing to interact with something. Validation testing is a way to gather quantitative data, such as how much time it takes users to sign up for a new service (referred to as time on task), or how many users successfully complete a task (referred to as completion rate). This quantitative data can be measured against baseline metrics from previous iterations of the product, a company’s internal standard, or even a competitor’s standard. The objective of the validation test is to make sure a product meets defined criteria before it is released.
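As a rough illustration of the two metrics named above, the following Python sketch computes completion rate and mean time on task from a handful of session records. The `Session` type, its field names, and the sample values are hypothetical, invented purely for illustration; they are not data from any study described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's attempt at a single task (hypothetical record shape)."""
    participant: str
    completed: bool
    seconds: float  # time on task


def completion_rate(sessions):
    """Fraction of sessions in which the task was completed.

    Assumes at least one session was recorded.
    """
    return sum(s.completed for s in sessions) / len(sessions)


def mean_time_on_task(sessions):
    """Average time on task, counting successful attempts only.

    Assumes at least one participant completed the task.
    """
    return mean(s.seconds for s in sessions if s.completed)


# Made-up sample data, for illustration only.
sessions = [
    Session("P1", True, 42.0),
    Session("P2", True, 58.0),
    Session("P3", False, 120.0),
    Session("P4", True, 50.0),
]

print(f"Completion rate: {completion_rate(sessions):.1%}")
print(f"Mean time on task: {mean_time_on_task(sessions):.1f}s")
```

Whether failed attempts should count toward time on task is a judgment call. This sketch excludes them so that abandoned sessions do not distort the average.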
Keep It Lean:
Remote unmoderated studies are a quick and cost-efficient way to get user feedback.
Case Study: News App
Validating the recommendations for the news app (formulated during the assessment test described above) starts with creating the revised concept and defining the success criteria it will be measured against. The baseline measurement shows where the product stands at a point in time and makes progress visible. The standard is where it needs to get to.
The objectives for the news app assessment test were to establish a baseline of user performance measures (how the app was performing at that moment in time) and to provide recommendations for improvement. During the verification test, a revised concept is tested with new users, and their performance is compared to the baseline. In this case, the baseline was the 62.5% successful completion rate for arranging news sections in a desired order. When new users attempt this task using the revised UI, their actual performance is measured against both this baseline of 62.5% and an established standard, such as 87.5%.
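The comparison described above can be sketched in a few lines of Python. The 62.5% baseline and the 87.5% example standard come from the text. The function name `evaluate_against_targets` and the observed result of 7 out of 8 participants are hypothetical placeholders, since no validation results are reported here.

```python
BASELINE = 0.625  # completion rate measured in the assessment test
STANDARD = 0.875  # example target named in the text


def evaluate_against_targets(completed, attempted,
                             baseline=BASELINE, standard=STANDARD):
    """Compare an observed completion rate to the baseline and the standard.

    `completed` and `attempted` are participant counts for one task.
    """
    rate = completed / attempted
    return {
        "rate": rate,
        "improved_on_baseline": rate > baseline,
        "meets_standard": rate >= standard,
    }


# Hypothetical outcome for the revised UI: 7 of 8 new users succeed.
result = evaluate_against_targets(completed=7, attempted=8)
print(result)
```

With participant groups this small, a difference of one or two people moves the rate considerably, so a check like this is better read as a directional signal than as statistical proof.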
Comparison testing can be used for a variety of purposes throughout the development cycle. It is used to compare two or more designs with the objective of establishing which design, or parts of a design, are preferred by users. Early in the development cycle multiple workflows and even visual design concepts can be compared in order to learn what aspects resonate most with target users. Later on, comparison testing can be used to evaluate the effectiveness of specific elements and even to determine which workflows allow users to complete their goal more quickly.
Keep It Lean:
Testing two or more concepts at any stage in the development cycle allows you to get more data out of your usability testing participants.
In today’s market, competition exists for just about every product, and users of those products expect a certain ease of use. As such, good user experience is a market differentiator and provides a significant edge over your competition. To keep that edge sharp, usability testing is key. The earlier you can add testing into your design and development process, the better. Regardless of where you are in the software development cycle, though, there’s always something to be learned from usability testing.
The four types of tests we’ve outlined above are the main categories, but within each there are plenty of variations. With comparison tests, for example, you could do either A/B testing or parallel testing (testing multiple subcomponents of one application concurrently in order to reduce your test time).
Overall, it all comes down to whether you’re seeking qualitative feedback or quantitative feedback from users. Do you want to learn if your product is conceptually aligning with target users or are you looking for more specific information? Depending on what you’re trying to learn, you can structure your tests and tasks to suit your specific needs and find out exactly what you need to deliver the best user experience possible.