Spotify Agent

Case study on redesigning core moments of the Spotify user experience for a proprietary voice user interface.

// Spring_2018 Information Architecture course project

Duration
6 weeks

Input delivered
Interaction design, information architecture, voice user interface design, user testing, mock-ups.

Prompt
Choose an interactive system that consists of a series of micro-interactions and holds potential for redesign, then create that redesign.

Solution
A context-aware Voice Agent that provides a personalized and curated music experience to its users. It offers greater control over, and immersion in, the music listening experience, which sets it apart from other offerings in the market.

The Agent

Persona
The Spotify Agent must represent the brand whenever it presents itself to users, so I designed his persona and attitude to be:

Assertive
Charismatic
Competent
Intuitive

Think of him as the fun young uncle who bridges the age divide at family gatherings: he is as wise as an adult but young enough to encourage exploration and spontaneous behavior.

His archetype is best represented by the Magician (I), a Major Arcana card of the Tarot. The Magician is a figure who can provide users with all of the resources and tools they need to realize their goals. He bridges the gap between two realms, that of the user and that of the service platform.

“I can play for you, the right music, at the right time.”

Invocation
I determined that it shouldn’t matter how users call the Agent to action, as long as its name is included in the call. The Agent therefore answers both to Spotify and to the abbreviated form used in this case study, Spot.

From competitor research, I found that the top solutions in the market usually give their assistant a wake phrase in a three-syllable format*. Hence the following possible callings:

“Hey Spot”
“Okay Spot”
“Spotify”

*Depending on how the user pronounces “Ok” and “Google”, the phrase may or may not exceed three syllables.
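
The invocation rule above (any call works as long as it contains the Agent's name) can be sketched in a few lines of Python. This is only an illustration that works on already-transcribed text; a real smart speaker would detect the wake phrase in audio. The function name `parse_invocation` is hypothetical and not part of the case study.

```python
import re
from typing import Optional

# The two names the Agent answers to in this case study: "Spot" and "Spotify".
# Word boundaries keep words such as "spotless" from triggering the Agent.
_NAME = re.compile(r"\b(spotify|spot)\b", re.IGNORECASE)


def parse_invocation(utterance: str) -> Optional[str]:
    """Return the command that follows the Agent's name, or None if the
    name is absent (hypothetical helper, assumes transcribed text)."""
    match = _NAME.search(utterance)
    if match is None:
        return None
    # Whatever comes after the name is treated as the command.
    return utterance[match.end():].strip(" ,.!?")


print(parse_invocation("Hey Spot, play some jazz"))
print(parse_invocation("Play some jazz"))
```

With this rule, “Hey Spot”, “Okay Spot” and “Spotify” are all accepted without listing each phrase separately, which is the point of keying only on the name.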

“[…] I don’t care! As long as you call me!” -RuPaul Charles

Voice
The Agent’s voice can be approached effectively in three ways. The first two are conventional: synthesized and recorded.

Spotify is often associated with a male voice in his late 20s, which can be synthesized or recorded.

But there is another opportunity that would set the Spotify Agent apart from the competition and greatly improve enjoyment of use and immersion. If legal rights allow it, Spotify could generate voice banks from artists’ own songs on the platform and use those banks to voice the Agent, providing a more personal experience for each user. The technology would be akin to offerings such as Yamaha’s VOCALOID software.

Tone
The Spotify Agent’s tone is:

Confident
Lighthearted

This was a deliberate decision, so that even the Agent’s speech embodies the brand’s public image. The Agent will serve as a constant ambassador for Spotify to its users.

Intent
Beyond the commands found in competitors’ solutions, the Agent’s voice user interface offers more in-depth navigation and features, with the aim of providing a memorable and captivating listening experience.

Talk-Over
Since the Spotify Agent is aware of the listening context and caters to users who are deeply dedicated to their music listening experience, it’s crucial to design how the Agent interacts in the middle of a session. So I introduced a Talk-Over feature.

This feature assigns a priority level to tasks and queries.
Based on that priority level, the Agent decides whether to talk over the song or pause it before speaking.
For high-priority tasks, the Agent pauses the song.
For low-priority tasks, the Agent talks over the song.
The assigned levels change according to the user’s music taste and recorded listening behaviors.
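
As a rough sketch, the Talk-Over rules above could be expressed as follows. The intent names, their default priorities, and the choice to pause for unknown intents are all assumptions made for illustration; the case study does not specify them. The `overrides` dictionary stands in for the per-user adjustments learned from taste and listening behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    LOW = 1
    HIGH = 2


class Delivery(Enum):
    TALK_OVER = "talk_over"  # speak while the song keeps playing
    PAUSE = "pause"          # pause the song, then speak


# Hypothetical defaults: these intents and levels are assumptions.
DEFAULT_PRIORITIES = {
    "song_information": Priority.LOW,
    "queue": Priority.LOW,
    "search": Priority.HIGH,
}


@dataclass
class TalkOverPolicy:
    # Per-user priority changes, e.g. learned from listening behavior.
    overrides: dict = field(default_factory=dict)

    def priority_for(self, intent: str) -> Priority:
        if intent in self.overrides:
            return self.overrides[intent]
        # Assumption: unknown intents are treated as high priority.
        return DEFAULT_PRIORITIES.get(intent, Priority.HIGH)

    def decide(self, intent: str) -> Delivery:
        if self.priority_for(intent) is Priority.HIGH:
            return Delivery.PAUSE
        return Delivery.TALK_OVER


policy = TalkOverPolicy()
print(policy.decide("song_information"))  # low priority: talk over the song
print(policy.decide("search"))            # high priority: pause the song
```

Keeping the per-user overrides separate from the defaults means a listener who dislikes interruptions could have an intent raised to high priority without changing the behavior for anyone else.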

Data Points
To make the Agent context-aware, these are the main data points it must be given.

Music Library
Taste*
Date & Time
Location**
Schedule**
[Other]’s Music Library
Voice recognition***
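
One way to picture these data points is as a single context record handed to the Agent. The sketch below is an assumption throughout: the class name `AgentContext`, the field types, and the `missing_data_points` helper are hypothetical, and it does not encode the asterisk annotations, whose meaning is not defined in this text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class AgentContext:
    """Snapshot of the data points listed above (all types are assumptions)."""
    music_library: list[str]            # track identifiers saved by the user
    taste: dict[str, float]             # e.g. genre -> affinity score
    date_time: datetime
    location: Optional[str] = None      # e.g. "home"
    schedule: list[str] = field(default_factory=list)
    others_music_libraries: dict[str, list[str]] = field(default_factory=dict)
    recognized_speaker: Optional[str] = None  # result of voice recognition

    def missing_data_points(self) -> list[str]:
        """Name the optional data points that have not been provided yet."""
        missing = []
        if self.location is None:
            missing.append("location")
        if not self.schedule:
            missing.append("schedule")
        if not self.others_music_libraries:
            missing.append("others_music_libraries")
        if self.recognized_speaker is None:
            missing.append("voice_recognition")
        return missing


context = AgentContext(
    music_library=["track-1"],
    taste={"jazz": 0.8},
    date_time=datetime(2018, 4, 1, 18, 30),
    location="home",
)
print(context.missing_data_points())
```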


Agent Flow
Below is the projected flow of interacting with the Agent.

Speaker Integration

Requirements
To take advantage of the Spotify Agent alongside Internet of Things connected devices, the preferred option would be a proprietary smart speaker. This speaker would need the following features to deliver the minimum lovable product when packaged with the Agent:

Multi-microphone array
OLED Display
Highest fidelity feasible

The microphone array enables the Agent to distinguish the user’s voice from the music that is playing. The OLED display would show the album art of the media currently playing as well as visual feedback, something Spotify as a brand gives importance to. And since the product would be affiliated with a music streaming service, it is of utmost importance that it deliver the best sound quality possible.

Smart speaker by Spotify that was leaked during the making of this project

Core Moments

Redesigns
The ten core moments that I chose to focus on in this case study were:

On-boarding
Discover
Sharing
Playlists
Search
Queue
Song Information
Events
Planning Sessions
Continuing Sessions

Scenarios

Scenario no. 1

The Agent is aware of A’s location.
Based on A’s routine and the tasks involved, the Agent has some grasp of A’s usual mood on arriving home, drawn from previously logged occurrences.
When A arrives home, the Agent automatically plays music it understands will cater to A at that moment in time.
A asks which song is playing.
After the Agent responds, A tells it to save the current song to their library and show the lyrics.
The Agent does so, and then asks for any follow-up commands.
B returns home for a while and commands the Agent to play music, interrupting the session.
The Agent has access to both A’s and B’s profiles.
Because it is a communal speaker, the Agent plays music that caters to both users.
Later on, based on schedules, the Agent knows that B is supposed to be out of the house, so it prioritizes A’s tastes.
B returns and, unlike A, no longer wants to hear music in the house.
A tells the Agent to continue the session on their mobile device.
The switch occurs.

Scenario no. 2

B prompts the Agent to surprise them.
B then asks the Agent what song is playing, and the answer includes the song’s attributes.
Afterwards, B asks the Agent to play more music with a similar attribute.
The Agent adds similar songs to B’s queue.
B tells the Agent to share the currently playing song with a friend.
When B leaves for an event, the Agent curates a playlist based on the event.
The Agent plays the curated playlist for B when they come back.
B schedules a session to play after every one of those events using the playlist.

Prototyping

The Wizard of Oz
I decided to use this methodology to get a better assessment of my proposed design solution. When test users believe that the product is real and that they can interact with it, they are more likely to suggest other interactions and features that they not only find important, but also expect from a Voice Agent.

To achieve this illusion, I saved a wide range of sample phrases and responses and played them back to users based on their interactions. I also kept the Spotify player open for any music playback.

User Testing

Results
The prototype was tested with multiple volunteers. From the sessions I gathered the following feedback, which was applied to the final solution presented in this case study:

Define whether the Agent should talk over or pause media
Make special actions more explicit
Use less verbiage
Exploit more non-invasive points of data collection
Create an on-boarding process for new users
Use more casual wording
Add key nuances that express the personality of the Agent
Reinvent casual intents

VUI

Core Interface
The final Voice User Interface for the Spotify Agent, showcasing the 10 redesigned core moments.

Thank you.