HTML5 audio


HTML5 Audio is a subject of the HTML5 specification, incorporating audio input, playback, and synthesis, as well as speech to text, in the browser.

<audio> element

The <audio> element represents a sound, or an audio stream. It is commonly used to play back a single audio file within a web page, showing a GUI widget with play/pause/volume controls.
The <audio> element has these attributes:
autoplay: Instructs the User-Agent to automatically begin playback of the audio stream as soon as it can do so without stopping.
preload: Represents a hint to the User-Agent about whether optimistic downloading of the audio stream itself or its metadata is considered worthwhile.
controls: Instructs the User-Agent to expose a user interface for controlling playback of the audio stream.
loop: Instructs the User-Agent to seek back to the start of the audio stream upon reaching the end.
mediagroup: Instructs the User-Agent to link multiple audio and/or video streams together.
muted: Represents the default muted state of the audio stream, potentially overriding user preferences.
src: The URL of the audio stream.
Example:
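A minimal illustrative snippet (sample.ogg and sample.mp3 are placeholder file names):

  <audio controls preload="auto" loop>
    <!-- placeholder sources; the browser uses the first format it can play -->
    <source src="sample.ogg" type="audio/ogg">
    <source src="sample.mp3" type="audio/mpeg">
    Your browser does not support the audio element.
  </audio>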


Supporting browsers

On PC:
On mobile devices:
The adoption of HTML5 audio, as with HTML5 video, has become polarized between proponents of free and patent-encumbered formats. In 2007, the recommendation to use Vorbis was retracted from the specification by the W3C together with that to use Ogg Theora, citing the lack of a format accepted by all the major browser vendors.
Apple and Microsoft support the ISO/IEC-defined formats AAC and the older MP3. Mozilla and Opera support the free and open, royalty-free Vorbis format in Ogg and WebM containers, and criticize the patent-encumbered nature of MP3 and AAC, which are guaranteed to be “non-free”. Google has so far provided support for all common formats.
Most AAC files of finite length are wrapped in an MPEG-4 container, which is supported natively in Internet Explorer, Safari, and Chrome, and supported via the operating system in Firefox and Opera. Most AAC live streams, which have no fixed length, are wrapped in an Audio Data Transport Stream (ADTS) container, which is supported by Chrome, Safari, Firefox, and Edge.
Many browsers also support uncompressed PCM audio in a WAVE container.
In 2012, the free and open, royalty-free Opus format was released and standardized by the IETF. It is supported by Mozilla, Google, Opera, and Edge.
The following table lists audio coding formats that can be used with the <audio> element, together with their containers and MIME types; browser support for each combination is described above.

Format   Container   MIME type
PCM      WAV         audio/wav
MP3      MP3         audio/mpeg
AAC      MP4         audio/mp4
AAC      ADTS        audio/aac, audio/aacp
Vorbis   Ogg         audio/ogg
Vorbis   WebM        audio/webm
Opus     Ogg         audio/ogg
Opus     WebM        audio/webm
FLAC     FLAC        audio/flac
FLAC     Ogg         audio/ogg
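Because support differs between browsers, pages commonly probe for it at run time with HTMLMediaElement.canPlayType(), which returns "probably", "maybe", or an empty string. A minimal sketch:

  // Create an element purely to query format support; nothing is loaded or played.
  var audio = document.createElement('audio');

  var oggVorbis = audio.canPlayType('audio/ogg; codecs="vorbis"');
  var mp3       = audio.canPlayType('audio/mpeg');
  var aacInMp4  = audio.canPlayType('audio/mp4; codecs="mp4a.40.2"');

  console.log('Ogg Vorbis:', oggVorbis || 'no');
  console.log('MP3:',        mp3       || 'no');
  console.log('AAC in MP4:', aacInMp4  || 'no');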

Web Audio API and MediaStream Processing API

The Web Audio API specification developed by W3C describes a high-level JavaScript API for processing and synthesizing audio in web applications. The primary paradigm is of an audio routing graph, where a number of AudioNode objects are connected together to define the overall audio rendering. The actual processing will primarily take place in the underlying implementation, but direct JavaScript processing and synthesis is also supported.
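A minimal sketch of the routing-graph idea: a source node (an oscillator) is connected through a processing node (a gain stage) to the context's destination, i.e. the speakers.

  // Create the audio context (older WebKit builds used a prefixed name).
  var context = new (window.AudioContext || window.webkitAudioContext)();

  // Source node: a 440 Hz sine-wave oscillator.
  var oscillator = context.createOscillator();
  oscillator.frequency.value = 440;

  // Processing node: a gain (volume) stage at half volume.
  var gain = context.createGain();
  gain.gain.value = 0.5;

  // Build the graph: oscillator -> gain -> speakers.
  oscillator.connect(gain);
  gain.connect(context.destination);

  // Play the tone for one second.
  oscillator.start();
  oscillator.stop(context.currentTime + 1);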
Mozilla's Firefox browser has included a similar Audio Data API extension since version 4 (implemented in 2010 and released in 2011), but Mozilla warns that it is non-standard and deprecated, and recommends the Web Audio API instead.
Some JavaScript audio processing and synthesis libraries support both APIs.
The W3C is also considering the MediaStream Processing API specification developed by Mozilla.
In addition to audio mixing and processing, it covers more general media streaming, including synchronization with HTML elements, capture of audio and video streams, and peer-to-peer routing of such media streams.
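The proposal's own interfaces are not shown here, but the capture-and-route idea it describes can be sketched with the now-standard getUserMedia() and Web Audio APIs:

  // Ask for microphone access, then feed the captured stream into the
  // Web Audio routing graph (here, straight to the speakers, which may
  // cause feedback; a real application would process or mix it first).
  navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
    var context = new (window.AudioContext || window.webkitAudioContext)();
    var source = context.createMediaStreamSource(stream);
    source.connect(context.destination);
  });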

Supporting browsers

On PC:
Web Speech API

The Web Speech API aims to provide an alternative input method for web applications. With this API, developers can give web apps the ability to transcribe speech to text from the computer's microphone. The recorded audio is sent to speech servers for transcription, after which the text is typed out for the user. The API itself is agnostic of the underlying speech-recognition implementation and can support both server-based and embedded recognizers.
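A minimal sketch using the SpeechRecognition interface from the Web Speech API (prefixed as webkitSpeechRecognition in Chrome; availability varies by browser):

  // Use the prefixed constructor where the unprefixed one is unavailable.
  var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  var recognizer = new Recognition();
  recognizer.lang = 'en-US';

  // Fired when a transcription comes back from the recognizer.
  recognizer.onresult = function (event) {
    var transcript = event.results[0][0].transcript;
    console.log('You said: ' + transcript);
  };

  recognizer.start(); // begins capturing audio from the microphone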
The HTML Speech Incubator group has proposed the implementation of audio-speech technology in browsers in the form of uniform, cross-platform APIs, covering both speech recognition (speech to text) and speech synthesis (text to speech).
Google integrated this feature into Google Chrome in March 2011, letting users search the web with their voice.
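A sketch along those lines, assuming the experimental x-webkit-speech attribute that early versions of Chrome exposed for speech input (illustrative only, not Google's actual markup):

  <form action="https://www.google.com/search">
    <!-- x-webkit-speech added a microphone button to the field in early Chrome -->
    <input type="search" name="q" x-webkit-speech>
  </form>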





Supporting browsers