Why couldn't shoppers tell six Echo Shows apart?

A usability study of Amazon's mobile shopping experience for the Echo Show line, and why Alexa+, the most-promoted feature of 2026, was invisible to the people meant to buy it.

CLIENT

Amazon Consumer Devices

DELIVERABLE

Final usability report to Amazon Consumer Devices, Marketing & UXR partners

TIMELINE

12 weeks

Jan–Mar 2026

LOCATION

Seattle, WA

MY ROLE

· UX Researcher

· Moderator

· Notetaker

· Documentation Lead

TEAM

4 Graduate Researchers

(equal contribution)

METHODS

· Moderated remote usability testing

· Think-aloud protocol

· Semantic differential survey

· Task analysis


PARTICIPANTS

8

(recruited via UserTesting)

PLATFORM

Amazon mobile app

Context

In late 2025, Amazon launched Alexa+ — a generative-AI assistant bundled at no cost with Prime, integrated into the Echo Show line of smart displays. It is a meaningful product upgrade, and a commercially important one: every Echo Show sold with Alexa+ is a household onboarded into Amazon's next assistant generation.

The Echo Show line itself is six devices — 5, 5 Kids, 8, 11, 15, and 21 — varying in screen size and form factor. Customers shopping for one almost always start on the Amazon mobile app, where the Search Results Page (SRP) and Product Detail Pages (PDPs) do the work of explaining what is what.

Amazon's Consumer Devices team, working with Marketing and internal UX Research, asked us a question that mattered to both teams at once:

When a first-time shopper opens the Amazon app to browse Echo Shows, do they understand what they're looking at — and do they notice Alexa+?

We had four weeks to find out.

What we set out to learn

We translated the brief into two research questions that could be answered with task-based testing:

RQ1. How accurately do users differentiate Echo Show models using the information available on the Search Results Page and on first exposure to a Product Detail Page?

RQ2. How effectively do users identify and understand Alexa+ as a built-in feature when browsing the Amazon mobile app?

Two questions, each tied to a different commercial risk:
RQ1 is a purchase-confidence problem: if shoppers can't tell models apart, they bounce.
RQ2 is a feature-adoption problem: if shoppers don't see Alexa+, Amazon's most strategic launch of the year goes silent at the point of sale.

Method

We ran a moderated remote usability study — 8 participants, 60-minute sessions, conducted on UserTesting between February 19 and 23, 2026. Each session was facilitated by one moderator and observed by three notetakers, with roles rotating across the team so every researcher moderated, observed, and synthesized.

I led moderation on a portion of the sessions, took notes on the others, and owned the documentation track — cleaning raw data, building the shared synthesis sheet, and writing the final report alongside the team.

We made three deliberate methodological choices worth naming:

Choice 1 — A/B isolation tasks.

We created two prototype variants: one with all product images blurred, one with all titles blurred. Half the participants started with one, half with the other. This let us isolate whether shoppers were leaning on imagery or text to differentiate models — a cleaner read than asking people to introspect on their own scan strategy.
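To make the counterbalancing concrete, here is a minimal Python sketch of the assignment scheme. Everything in it (the participant IDs, variant names, and the counterbalance helper) is illustrative, not taken from our actual session tooling.

```python
import random

PARTICIPANTS = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
ORDERS = [
    ("images_blurred", "titles_blurred"),  # variant A first
    ("titles_blurred", "images_blurred"),  # variant B first
]

def counterbalance(participants, orders, seed=2026):
    """Randomly split participants into two equal halves, one per variant order."""
    rng = random.Random(seed)   # fixed seed makes the roster reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {p: orders[0] for p in shuffled[:half]} | {
        p: orders[1] for p in shuffled[half:]
    }

assignments = counterbalance(PARTICIPANTS, ORDERS)
for p, order in sorted(assignments.items()):
    print(p, "->", " then ".join(order))
```

Because each half sees the opposite variant first, any order effect washes out across the sample, which is what lets the image-vs-text comparison stand on its own.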

Choice 2 — Pre-tap recall, not just observation.

For Alexa+, we deliberately did not ask participants whether they noticed it during browsing. Instead, after they had spent fifteen minutes browsing, we asked — without showing the screen — whether they remembered seeing anything called "Alexa+." This separated discoverability (did they see it) from comprehension (did they understand it). The two failures need different fixes.
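A hypothetical sketch of how those two failure modes stay separate when coding recall answers; the RecallResult fields and the labels are ours, invented for illustration, not the study's actual codebook.

```python
from dataclasses import dataclass

@dataclass
class RecallResult:
    participant: str
    recalled_alexa_plus: bool   # discoverability: did they see it?
    explained_correctly: bool   # comprehension: did they understand it?

def classify(r: RecallResult) -> str:
    """Map the two independent axes onto three distinct failure modes."""
    if not r.recalled_alexa_plus:
        return "not discovered"              # the page never registered it
    if not r.explained_correctly:
        return "discovered, not understood"  # seen, but meaning unclear
    return "discovered and understood"

# A participant who noticed the badge but couldn't say what Alexa+
# does counts against comprehension, not discoverability.
print(classify(RecallResult("P3", True, False)))
```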

Choice 3 — Triangulated data.

Quantitative metrics (task success, time-on-task, swipe count, recall rate, perceived model count) ran alongside qualitative think-aloud and a post-task semantic-differential survey. Findings only made it into the final report if at least two of the three pillars supported them.
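The two-of-three rule is easy to state precisely. Below is a minimal sketch of it in Python; the pillar names and the example finding are illustrative, not lifted from the report.

```python
PILLARS = ("quantitative", "think_aloud", "survey")

def survives_triangulation(finding: dict, threshold: int = 2) -> bool:
    """Keep a finding only when at least `threshold` pillars support it."""
    support = sum(finding["support"][p] for p in PILLARS)
    return support >= threshold

finding = {
    "title": "Example: shoppers struggle to tell models apart from titles",
    "support": {"quantitative": True, "think_aloud": True, "survey": False},
}
print(survives_triangulation(finding))  # True: two of three pillars agree
```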

Participants.

Eight regular online shoppers, ages 26–50, mixed gender, screened to exclude current Echo Show owners, electronics retail specialists, and anyone working in tech, in UX, or at Amazon. Tech familiarity ranged from "somewhat familiar" to "extremely familiar," with the sample intentionally stratified across that range.
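For illustration, those screener rules translate directly into code. This hypothetical sketch is ours; the field names are not UserTesting's actual screener API.

```python
FAMILIARITY_LEVELS = ["somewhat familiar", "very familiar", "extremely familiar"]

def qualifies(candidate: dict) -> bool:
    """Apply the study's inclusion/exclusion rules to one screener response."""
    return (
        26 <= candidate["age"] <= 50
        and not candidate["owns_echo_show"]           # exclude current owners
        and not candidate["works_in_tech_ux_amazon"]  # exclude industry insiders
        and not candidate["electronics_retail"]       # exclude retail specialists
        and candidate["tech_familiarity"] in FAMILIARITY_LEVELS
    )

candidate = {
    "age": 34,
    "owns_echo_show": False,
    "works_in_tech_ux_amazon": False,
    "electronics_retail": False,
    "tech_familiarity": "very familiar",
}
print(qualifies(candidate))  # True
```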

What we asked participants to do

Five tasks, ordered from passive to active:

1. Image Isolation. Looking only at images (titles blurred): how many distinct device types are on the page? Find the largest, wall-mountable model.

2. Text Isolation. Looking only at titles (images blurred): how many distinct device types are on the page?

3. Grouping. Sort the six models into three categories, first from the Search Results Page only, then again after exploring Product Detail Pages.

4. Pre-tap Recall. Without looking back at the screen: did you notice anything called Alexa+? What is it?

5. Targeted Search. Find a smart display for the kitchen with Fire TV built-in, without using the Filter button.

Severity Matrix: 7 findings, rated from Low to Critical.

What this study taught me

Three things, in order of how much they surprised me.

Severity is a design conversation, not a research output. Picking "1 — Critical" vs. "2 — Major" looks like a researcher's call. It isn't. It's a forecast about which fix Amazon's PMs and engineers will actually prioritize when our report lands on their desk. Calibrating severity is a political exercise as much as an analytical one — and I left this study with much more respect for the senior researchers who do it well.

Recall is the cleanest measure of a design's voice. The fifteen-minute, no-screen recall question for Alexa+ told us more about the SRP than any direct observation did. If a shopper can browse for fifteen minutes and not remember the most-promoted feature in your roadmap, the design is not failing to teach — it's failing to speak. That distinction will shape how I run the next study.

Equal contribution is a research method. Every member of our team moderated, observed, synthesized, and wrote. No one was the "lead" and no one was a junior. The findings in this report are sharper because four researchers stress-tested every claim before it left the room. I want to design that into every team I work with going forward.

Limitations & Next Steps

Three honest critiques of the study itself:

Remote testing limited what we could see. UserTesting is fast and geographically flexible, but the friction of screen-sharing the prototype over Zoom occasionally interrupted natural browsing. For a study that hinges on what shoppers notice, in-person sessions with eye-tracking would surface attention patterns we could only infer.

Our sample skewed familiar. All eight participants rated themselves at least "somewhat familiar" with technology. The discoverability problem we found is likely worse for first-time smart-home shoppers, exactly the population Amazon is most trying to reach with Alexa+. A future round should include lower-familiarity participants.

Mobile-only is half the picture. Echo Show shoppers cross devices — they may scan the SRP on mobile and read PDPs on desktop. Replicating the study on desktop would test whether the title and Alexa+ findings hold or attenuate on a larger viewport.

Team

Researched and written with Alefiya Haveliwala, Hajra Lat, and Lori Li, under the guidance of Dr. Katya Cherukumilli at the University of Washington's Human-Centered Design & Engineering program.

Conducted in partnership with the Amazon Consumer Devices team, with thanks to the Marketing and UX Research partners who scoped the brief and reviewed the findings.
