Smart speaker


A smart speaker is a type of speaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of a "hot word". Some smart speakers also act as smart devices that use Wi-Fi, Bluetooth and other protocol standards to extend usage beyond audio playback, for example to control home automation devices. Such features can include, but are not limited to, compatibility across a number of services and platforms, peer-to-peer connection through mesh networking, and virtual assistants. Each device can have its own designated interface and in-house features, usually launched or controlled via an application or home automation software. Some smart speakers also include a screen to show the user a visual response.
NPR and Edison Research estimated that, as of winter 2017, 39 million Americans owned a smart speaker.
A smart speaker with a touchscreen is known as a smart display. It is a smart Bluetooth device that integrates a conversational user interface with a display screen to augment voice interaction with images and video. Smart displays are powered by one of the common voice assistants; they offer controls for smart home devices and feature streaming apps and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon.

Accuracy

In March 2020, Proceedings of the National Academy of Sciences of the United States of America published a study conducted by Stanford University that measured potential bias in smart speakers and other speech recognition devices. It found that systems from five of the biggest tech development companies, Amazon, Apple, Google, IBM and Microsoft, misidentified more words spoken by "black people" than by "white people". The study measured errors and unreadability: the average word error rate was 19 percent for white speakers and 35 percent for black speakers, and 2 percent of audio snippets from white speakers were judged unreadable, compared with 20 percent of those from black speakers.
The North American Chapter of the Association for Computational Linguistics also identified a discrepancy between male and female voices. The research concluded that Google's speech recognition software was 13 percent more accurate for men than for women. Google's software performed better than the systems used by Bing, AT&T and IBM.
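Accuracy comparisons of this kind are commonly expressed as a word error rate (WER): the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, assuming lowercase normalization and whitespace tokenization; the function name and the sample sentences are illustrative and are not taken from either study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Two of the five reference words are transcribed incorrectly.
print(word_error_rate("turn on the kitchen lights", "turn on a kitchen light"))  # 0.4
```

Because the denominator is the reference length, a WER can exceed 1.0 when the recognizer inserts many extra words.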

Privacy concerns

The built-in microphone in smart speakers is continuously listening for "hot words" followed by a command. These continuously listening microphones raise privacy concerns among users, including what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising. Further, an analysis of Amazon Alexa Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio at times other than immediately after detection of the hot word.
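The intended behavior, in which audio is kept only once a hot word has been detected, can be sketched as a simple gate. This is an illustrative model rather than any vendor's implementation: it assumes the audio has already been reduced to a stream of word-like frames, uses an exact string match as a stand-in for an on-device acoustic detector, and the names `gated_commands`, `hot_word` and `command_frames` are hypothetical.

```python
from typing import Iterable, Iterator, List


def gated_commands(
    frames: Iterable[str],
    hot_word: str = "computer",  # hypothetical hot word
    command_frames: int = 3,     # assumed fixed-length command window
) -> Iterator[List[str]]:
    """Yield the frames that follow each hot word; all other frames are dropped."""
    captured: List[str] = []
    remaining = 0  # frames still to capture for the current command
    for frame in frames:
        if remaining > 0:
            captured.append(frame)
            remaining -= 1
            if remaining == 0:
                yield captured
                captured = []
        elif frame == hot_word:  # stand-in for an acoustic hot-word detector
            remaining = command_frames
        # Any other frame is heard by the detector but never stored.
    if captured:
        yield captured  # stream ended in the middle of a command


stream = ["hello", "there", "computer", "play", "some", "jazz", "private", "talk"]
print(list(gated_commands(stream)))  # [['play', 'some', 'jazz']]
```

In this model the privacy of the surrounding conversation depends entirely on the detector: a false match on a word that merely resembles the hot word would open the gate and capture whatever is said next.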

As a wiretap

There are strong concerns that the ever-listening microphone of a smart speaker presents a perfect candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echos have exposed pins that allow a compromised OS to be booted.

Voice assistance vs privacy

While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users. Only recently, however, have users begun interacting with voice assistants through smart speakers rather than through the phone. On the phone, most voice assistants can be engaged by a physical button, whereas a smart speaker relies solely on hot word-based engagement. While the button increases privacy by limiting when the microphone is on, users felt that having to press a button first removed the convenience of voice interaction. This trade-off is not unique to voice assistants; as more and more devices come online, there is an increasing trade-off between convenience and privacy.

Factors influencing adoption

While there are many factors influencing smart speaker adoption, specifically with regard to privacy, Lau et al. define five distinct categories as pros and cons: convenience, identity as an early adopter, contributing factors, perceived lack of utility, and privacy and security concerns.

Security concerns

When configured without authentication, smart speakers can be activated by people other than the intended user or owner. For example, visitors to a home or office, or people in a publicly accessible area outside an open window, partial wall, or security fence, may be heard by a speaker. One team demonstrated the ability to stimulate the microphones of smart speakers and smartphones with a laser, through a closed window, from another building across the street.

Most popular smart speaker devices and platforms
