<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang=en>
<head>
<title>CSS Speech Module</title>
<meta content="text/html; charset=utf-8" http-equiv=Content-Type>
<link href="../default.css" rel=stylesheet type="text/css">
<style type="text/css">
p
{
padding-bottom : 1em;
}
p + p
{
text-indent : 0;
}
*:target
{
border : 1px dashed #66CC66;
}</style>
<!--
.prod
{
font-family : inherit;
font-size : inherit
}
pre.prod
{
white-space : pre-wrap;
margin : 1em 0 1em 2em
}
code
{
font-size : inherit;
}
#box-shadow-samples td
{
background : white;
color : black;
}
caption
{
text-align : left;
font-weight : bold
}
.note
{
font-style : italic
}
.issue
{
color : maroon;
font-style : italic
}
div.example pre
{
color : green;
margin-left : 2em
}
dl
{
margin-left : 2em
}
caption dfn
{
font-size : 120%
}
-->
<link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel=stylesheet
type="text/css">
<body>
<div class=head> <!--begin-logo-->
<p><a href="http://www.w3.org/"><img alt=W3C height=48
src="http://www.w3.org/Icons/w3c_home" width=72></a> <!--end-logo-->
<h1 id=top>CSS Speech Module</h1>
<h2 class="no-num no-toc" id=longstatus-date>Editor's Draft 20 March 2012</h2>
<dl id=versions>
<dt>This version:
<dd>
<!--<a href="http://www.w3.org/TR/2012/WD-css3-speech-20120320/">http://www.w3.org/TR/2012/ED-css3-speech-20120320/</a>-->
<a
href="http://dev.w3.org/csswg/css3-speech">http://dev.w3.org/csswg/css3-speech</a>
<dt>Latest version:
<dd> <a
href="http://www.w3.org/TR/css3-speech/">http://www.w3.org/TR/css3-speech/</a>
<dt>Previous versions:
<dd> <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110818/">http://www.w3.org/TR/2011/WD-css3-speech-20110818/</a>
<dt>Feedback:
<dd><a href="mailto:www-style@w3.org?subject=%5Bcss-speech%5D%20feedback">www-style@w3.org</a>
with subject line &ldquo;<kbd>[css-speech] <var>&hellip; message topic &hellip;</var></kbd>&rdquo;
(<a rel="discussion" href="http://lists.w3.org/Archives/Public/www-style/">archives</a>)
</dl>
<dl id=editors-list>
<dt>Editor:
<dd><a href="mailto:dweck@daisy.org">Daniel Weck</a> (<a
href="http://www.daisy.org">DAISY Consortium</a>)
<dt>Former editors:
<dd><a href="mailto:dsr@w3.org">Dave Raggett</a> (<a
href="http://www.w3.org/">W3C</a>/<a
href="http://www.canon.com/">Canon</a>)
<dd><a href="mailto:daniel@glazman.org">Daniel Glazman</a> (<a
href="http://www.disruptive-innovations.com/">Disruptive
Innovations</a>)
<dd><a href="mailto:csant@opera.com">Claudio Santambrogio</a> (<a
href="http://www.opera.com/">Opera Software</a>)
</dl>
<!--begin-copyright-->
<p class=copyright><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright"
rel=license>Copyright</a> &copy; 2012 <a href="http://www.w3.org/"><abbr
title="World Wide Web Consortium">W3C</abbr></a><sup>&reg;</sup> (<a
href="http://www.csail.mit.edu/"><abbr
title="Massachusetts Institute of Technology">MIT</abbr></a>, <a
href="http://www.ercim.eu/"><abbr
title="European Research Consortium for Informatics and Mathematics">ERCIM</abbr></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a
href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
<!--end-copyright-->
<hr title="Separator for header">
</div>
<h2 class="no-num no-toc" id=abstract>Abstract</h2>
<p>CSS (Cascading Style Sheets) is a language that describes the rendering
of markup documents (e.g. HTML, XML) on various media, such as screen,
paper, speech, etc. The Speech module defines aural CSS properties that
enable authors to declaratively control the rendering of documents via
speech synthesis, optionally combined with audio cues. Note that this standard
was developed in cooperation with the <a
href="http://www.w3.org/Voice/">Voice Browser Activity</a>.
<h2 class="no-num no-toc" id=status>Status of this document</h2>
<!--begin-status-->
<p>This is a public copy of the editors' draft. It is provided for
discussion only and may change at any moment. Its publication here does
not imply endorsement of its contents by W3C. Don't cite this document
other than as work in progress.
<p>The (<a
href="http://lists.w3.org/Archives/Public/www-style/">archived</a>) public
mailing list <a
href="mailto:www-style@w3.org?Subject=%5Bcss3-speech%5D%20PUT%20SUBJECT%20HERE">
www-style@w3.org</a> (see <a
href="http://www.w3.org/Mail/Request">instructions</a>) is preferred for
discussion of this specification. When sending e-mail, please put the text
&#8220;css3-speech&#8221; in the subject, preferably like this:
&#8220;[<!---->css3-speech<!---->] <em>&hellip;summary of
comment&hellip;</em>&#8221;
<p>This document was produced by the <a href="/Style/CSS/members">CSS
Working Group</a> (part of the <a href="/Style/">Style Activity</a>).
<p>This document was produced by a group operating under the <a
href="/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent
Policy</a>. W3C maintains a <a href="/2004/01/pp-impl/32061/status"
rel=disclosure>public list of any patent disclosures</a> made in
connection with the deliverables of the group; that page also includes
instructions for disclosing a patent. An individual who has actual
knowledge of a patent which the individual believes contains <a
href="/Consortium/Patent-Policy-20040205/#def-essential">Essential
Claim(s)</a> must disclose the information in accordance with <a
href="/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the
W3C Patent Policy</a>.</p>
<!--end-status-->
<!-- <h3 class="no-num no-toc" id="maturity">Maturity Level</h3> -->
<p> This document is based on the <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110818/">Last Call
Working Draft (18 August 2011)</a> and includes changes that reflect the
outcome of the <a
href="http://wiki.csswg.org/spec/css3-speech">disposition of comments</a>.
<p>Before the specification can progress to <a
href="http://www.w3.org/2005/10/Process-20051014/tr#cfr">Proposed
Recommendation</a>, the <a href="#exit">CR exit criteria</a> must be met.
The specification will not become Proposed Recommendation before 20
September 2012. A test suite and an implementation report will be made
during the Candidate Recommendation period.
<p id=at-risk>The following features are at-risk and may be dropped at the
end of the Candidate Recommendation period if there has not been enough
interest from implementers: &lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo;, &lsquo;<a
href="#voice-duration"><code
class=property>voice-duration</code></a>&rsquo;, &lsquo;<a
href="#voice-pitch"><code class=property>voice-pitch</code></a>&rsquo;,
&lsquo;<a href="#voice-range"><code
class=property>voice-range</code></a>&rsquo;, and &lsquo;<a
href="#voice-stress"><code class=property>voice-stress</code></a>&rsquo;.
<h2 class="no-num no-toc" id=contents>Table of contents</h2>
<!--begin-toc-->
<ul class=toc>
<li><a href="#intro"><span class=secno>1. </span>Introduction, design
goals</a>
<li><a href="#background"><span class=secno>2. </span>Background
information, CSS 2.1</a>
<li><a href="#ssml-rel"><span class=secno>3. </span>Relationship with
SSML</a>
<li><a href="#css-values"><span class=secno>4. </span>CSS values</a>
<li><a href="#example"><span class=secno>5. </span>Example</a>
<li><a href="#aural-model"><span class=secno>6. </span>The aural
formatting model</a>
<li><a href="#mixing-props"><span class=secno>7. </span>Mixing
properties</a>
<ul class=toc>
<li><a href="#mixing-props-voice-volume"><span class=secno>7.1.
</span>The &lsquo;<code class=property>voice-volume</code>&rsquo;
property</a>
<li><a href="#mixing-props-voice-balance"><span class=secno>7.2.
</span>The &lsquo;<code class=property>voice-balance</code>&rsquo;
property</a>
</ul>
<li><a href="#speaking-props"><span class=secno>8. </span>Speaking
properties</a>
<ul class=toc>
<li><a href="#speaking-props-speak"><span class=secno>8.1. </span>The
&lsquo;<code class=property>speak</code>&rsquo; property</a>
<li><a href="#speaking-props-speak-as"><span class=secno>8.2. </span>The
&lsquo;<code class=property>speak-as</code>&rsquo; property</a>
</ul>
<li><a href="#pause-props"><span class=secno>9. </span>Pause properties
</a>
<ul class=toc>
<li><a href="#pause-props-pause-before-after"><span class=secno>9.1.
</span>The &lsquo;<code class=property>pause-before</code>&rsquo; and
&lsquo;<code class=property>pause-after</code>&rsquo; properties</a>
<li><a href="#pause-props-pause"><span class=secno>9.2. </span>The
&lsquo;<code class=property>pause</code>&rsquo; shorthand property</a>
<li><a href="#collapsed-pauses"><span class=secno>9.3. </span>Collapsing
pauses</a>
</ul>
<li><a href="#rest-props"><span class=secno>10. </span>Rest properties</a>
<ul class=toc>
<li><a href="#rest-props-rest-before-after"><span class=secno>10.1.
</span>The &lsquo;<code class=property>rest-before</code>&rsquo; and
&lsquo;<code class=property>rest-after</code>&rsquo; properties</a>
<li><a href="#rest-props-rest"><span class=secno>10.2. </span>The
&lsquo;<code class=property>rest</code>&rsquo; shorthand property</a>
</ul>
<li><a href="#cue-props"><span class=secno>11. </span>Cue properties</a>
<ul class=toc>
<li><a href="#cue-props-cue-before-after"><span class=secno>11.1.
</span>The &lsquo;<code class=property>cue-before</code>&rsquo; and
&lsquo;<code class=property>cue-after</code>&rsquo; properties</a>
<li><a href="#cue-props-volume"><span class=secno>11.2. </span>Relation
between audio cues and speech synthesis volume levels</a>
<li><a href="#cue-props-cue"><span class=secno>11.3. </span>The
&lsquo;<code class=property>cue</code>&rsquo; shorthand property</a>
</ul>
<li><a href="#voice-char-props"><span class=secno>12. </span>Voice
characteristic properties</a>
<ul class=toc>
<li><a href="#voice-props-voice-family"><span class=secno>12.1.
</span>The &lsquo;<code class=property>voice-family</code>&rsquo;
property</a>
<li><a href="#voice-props-voice-rate"><span class=secno>12.2. </span>The
&lsquo;<code class=property>voice-rate</code>&rsquo; property</a>
<li><a href="#voice-props-voice-pitch"><span class=secno>12.3.
</span>The &lsquo;<code class=property>voice-pitch</code>&rsquo;
property</a>
<li><a href="#voice-props-voice-range"><span class=secno>12.4.
</span>The &lsquo;<code class=property>voice-range</code>&rsquo;
property</a>
<li><a href="#voice-props-voice-stress"><span class=secno>12.5.
</span>The &lsquo;<code class=property>voice-stress</code>&rsquo;
property</a>
</ul>
<li><a href="#duration-props"><span class=secno>13. </span>Voice duration
property</a>
<ul class=toc>
<li><a href="#mixing-props-voice-duration"><span class=secno>13.1.
</span>The &lsquo;<code class=property>voice-duration</code>&rsquo;
property</a>
</ul>
<li><a href="#lists"><span class=secno>14. </span>List items and counters
styles</a>
<li><a href="#content"><span class=secno>15. </span>Inserted and replaced
content</a>
<li><a href="#pronunciation"><span class=secno>16. </span> Pronunciation,
phonemes </a>
<li class=no-num><a href="#property-index">Appendix A &mdash; Property
index</a>
<li class=no-num><a href="#index">Appendix B &mdash; Index</a>
<li class=no-num><a href="#definitions">Appendix C &mdash; Definitions</a>
<ul class=toc>
<li class=no-num><a href="#glossary">Glossary</a>
<li class=no-num><a href="#conformance">Conformance</a>
<li class=no-num><a href="#exit">CR exit criteria</a>
</ul>
<li class=no-num><a href="#ack">Appendix D &mdash; Acknowledgements</a>
<li class=no-num><a href="#changes">Appendix E &mdash; Changes from
previous draft</a>
<li class=no-num><a href="#references">Appendix F &mdash; References</a>
<ul class=toc>
<li class=no-num><a href="#normative-references">Normative
references</a>
<li class=no-num><a href="#other-references">Other references</a>
</ul>
</ul>
<!--end-toc-->
<h2 id=intro><span class=secno>1. </span>Introduction, design goals</h2>
<p class=note>Note that this section is informative.
<p>The aural presentation of information is commonly used by people who are
blind, visually-impaired or otherwise print-disabled. For instance,
"screen readers" allow users to interact with visual interfaces that would
otherwise be inaccessible to them. There are also circumstances in which
<em>listening</em> to content (as opposed to <em>reading</em>) is
preferred, or sometimes even required, irrespective of a person's physical
ability to access information. For instance: playing an e-book whilst
driving a vehicle, learning how to manipulate industrial and medical
devices, interacting with home entertainment systems, teaching young
children how to read.
<p> The CSS properties defined in the Speech module enable authors to
declaratively control the presentation of a document in the aural
dimension. The aural rendering of a document combines speech synthesis
(also known as "TTS", the acronym for "Text to Speech") and auditory icons
(which are referred to as "audio cues" in this specification). The CSS
Speech properties provide the ability to control speech pitch and rate,
sound levels, TTS voices, etc. These stylesheet properties can be used
together with visual properties (mixed media), or as a complete aural
alternative to a visual presentation.
<h2 id=background><span class=secno>2. </span>Background information, CSS
2.1</h2>
<p class=note>Note that this section is informative.
<p> The CSS Speech module is a re-work of the informative CSS2.1 Aural
appendix, within which the "aural" media type was described, but also
deprecated (in favor of the "speech" media type). Although the <a
href="#CSS21" rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> specification
reserves the "speech" media type, it doesn't actually define the
corresponding properties. The Speech module describes the CSS properties
that apply to the "speech" media type, and defines a new "box" model
specifically for the aural dimension.
<p> Content creators can conditionally include CSS properties dedicated to
user agents with text-to-speech synthesis capabilities, by specifying the
"speech" media type via the <code>media</code> attribute of the
<code>link</code> element, or with the <code>@media</code> at-rule, or
within an <code>@import</code> statement. When styles are authored within
the scope of such conditional statements, they are ignored by user agents
that do not support the Speech module.
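<div class=example>
<p>As a minimal sketch of the mechanisms listed above, the fragment below
scopes speech properties with an <code>@import</code> statement, an
<code>@media</code> at-rule and a <code>link</code> element. The file name
<code>speech.css</code>, the selector and the property values are
illustrative assumptions, not part of this specification.</p>
<pre>
/* Hypothetical file name: imported only for the "speech" media type. */
@import url(speech.css) speech;

@media speech
{
  /* Applied only when rendering to the "speech" media type. */
  h1
  {
    voice-volume: loud;
    pause-after: strong;
  }
}
...
&lt;link rel="stylesheet" type="text/css" media="speech" href="speech.css"&gt;</pre>
</div>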
<h2 id=ssml-rel><span class=secno>3. </span>Relationship with SSML</h2>
<p class=note>Note that this section is informative.
<p>Some of the features in this specification are conceptually similar to
functionality described in the Speech Synthesis Markup Language (SSML)
Version 1.1 <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
However, the specificities of the CSS model mean that compatibility with
SSML in terms of syntax and/or semantics is only partially achievable. The
definition of each property in the Speech module includes informative
statements, wherever necessary, to clarify their relationship with similar
functionality from SSML.
<h2 id=css-values><span class=secno>4. </span>CSS values</h2>
<p>This specification follows the <a
href="http://www.w3.org/TR/CSS21/about.html#property-defs">CSS property
definition conventions</a> from <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>. Value types not defined in
this specification are defined in CSS Value and Units Level 3 <a
href="#CSS3VAL" rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>.
<p>In addition to the property-specific values listed in their definitions,
all properties defined in this specification also accept the <a
href="http://www.w3.org/TR/CSS21/cascade.html#value-def-inherit">inherit</a>
keyword as their property value. For readability it has not been repeated
explicitly.
<h2 id=example><span class=secno>5. </span>Example</h2>
<div class=example>
<p>This example shows how authors can tell the speech synthesizer to speak
HTML headings with a voice called "paul", using "moderate" emphasis
(which is more than normal) and how to insert an audio cue (pre-recorded
audio clip located at the given URL) before the start of TTS rendering
for each heading. In a stereo-capable sound system, paragraphs marked
with the CSS class "heidi" are rendered on the left audio channel (and
with a female voice, etc.), whilst the class "peter" corresponds to the
right channel (and to a male voice, etc.). The volume level of text spans
marked with the class "special" is lower than normal, and a prosodic
boundary is created by introducing a strong pause after it is spoken
(note how the <code>span</code> inherits the voice-family from its parent
paragraph).</p>
<pre>
h1, h2, h3, h4, h5, h6
{
voice-family: paul;
voice-stress: moderate;
cue-before: url(../audio/ping.wav);
voice-volume: medium 6dB;
}
p.heidi
{
voice-family: female;
voice-balance: left;
voice-pitch: high;
voice-volume: -6dB;
}
p.peter
{
voice-family: male;
voice-balance: right;
voice-rate: fast;
}
span.special
{
voice-volume: soft;
pause-after: strong;
}
...
&lt;h1&gt;I am Paul, and I speak headings.&lt;/h1&gt;
&lt;p class="heidi"&gt;Hello, I am Heidi.&lt;/p&gt;
&lt;p class="peter"&gt;
&lt;span class="special"&gt;Can you hear me ?&lt;/span&gt;
I am Peter.
&lt;/p&gt;</pre>
</div>
<h2 id=aural-model><span class=secno>6. </span>The aural formatting model</h2>
<p>The CSS formatting model for aural media is based on a sequence of
sounds and silences that occur within a nested context similar to the <a
href="#box-model-def">visual box model</a>, which we name the <dfn
id=aural-box-model>aural "box" model</dfn>. The aural "canvas" consists of
a two-channel (stereo) space and of a temporal dimension, within which
synthetic speech and audio cues coexist. The selected element is
surrounded by &lsquo;<a href="#rest"><code
class=property>rest</code></a>&rsquo;, &lsquo;<a href="#cue"><code
class=property>cue</code></a>&rsquo; and &lsquo;<a href="#pause"><code
class=property>pause</code></a>&rsquo; properties (from the innermost to
the outermost position). These can be seen as aural equivalents to
&lsquo;<code class=property>padding</code>&rsquo;, &lsquo;<code
class=property>border</code>&rsquo; and &lsquo;<code
class=property>margin</code>&rsquo;, respectively. When used, the
&lsquo;<code class=css>:before</code>&rsquo; and &lsquo;<code
class=css>:after</code>&rsquo; pseudo-elements <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> get inserted between the
element's contents and the &lsquo;<a href="#rest"><code
class=property>rest</code></a>&rsquo;.
<p> The following diagram illustrates the equivalence between properties of
the visual and aural box models, applied to the selected &lt;element&gt;:
<p> <img
alt="The aural 'box' model, illustrated by a diagram: the selected element is positioned in the center, on its left side are (from innermost to outermost) rest-before, cue-before, pause-before, on its right side are (from innermost to outermost) rest-after, cue-after, pause-after, where rest is conceptually similar to padding, cue is similar to border, pause is similar to margin."
id=aural-box src=aural-box.png
title="The aural 'box' model, illustrated by a diagram: the selected element is positioned in the center, on its left side are (from innermost to outermost) rest-before, cue-before, pause-before, on its right side are (from innermost to outermost) rest-after, cue-after, pause-after, where rest is conceptually similar to padding, cue is similar to border, pause is similar to margin.">
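<div class=example>
<p>The ordering described in this section can be sketched with the
longhand properties defined later in this specification. The selector, the
audio file name and the keyword values are illustrative assumptions. The
numbered comments give the playback order that would be expected under the
aural "box" model.</p>
<pre>
h1
{
  pause-before: strong;        /* 1. outermost silence, akin to margin */
  cue-before: url(chime.wav);  /* 2. audio cue, akin to border (hypothetical file) */
  rest-before: weak;           /* 3. innermost silence, akin to padding */
  /* 4. the content of the element is spoken here */
  rest-after: weak;            /* 5. innermost silence after the content */
  cue-after: url(chime.wav);   /* 6. audio cue after the content */
  pause-after: strong;         /* 7. outermost silence after the content */
}</pre>
</div>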
<h2 id=mixing-props><span class=secno>7. </span>Mixing properties</h2>
<h3 id=mixing-props-voice-volume><span class=secno>7.1. </span>The
&lsquo;<a href="#voice-volume"><code
class=property>voice-volume</code></a>&rsquo; property</h3>
<table class=propdef summary="name: syntax">
<tbody>
<tr>
<td>Name:
<td> <dfn id=voice-volume>voice-volume</dfn>
<tr>
<td> <em>Value:</em>
<td>silent | [[x-soft | soft | medium | loud | x-loud] ||
&lt;decibel&gt;]
<tr>
<td> <em>Initial:</em>
<td>medium
<tr>
<td> <em>Applies&nbsp;to:</em>
<td>all elements
<tr>
<td> <em>Inherited:</em>
<td>yes
<tr>
<td> <em>Percentages:</em>
<td>N/A
<tr>
<td> <em>Media:</em>
<td>speech
<tr>
<td> <em>Computed value:</em>
<td>&lsquo;<code class=property>silent</code>&rsquo;, or a keyword value
and optionally also a decibel offset (if not zero)
</table>
<p>The &lsquo;<a href="#voice-volume"><code
class=property>voice-volume</code></a>&rsquo; property allows authors to
control the amplitude of the audio waveform generated by the speech
synthesizer, and is also used to adjust the relative volume level of <a
href="#cue-props">audio cues</a> within the <a href="#aural-model">aural
box model</a> of the selected element.
<p class=note> Note that although the functionality provided by this
property is similar to the <a
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>volume</code>
attribute of the <code>prosody</code> element</a> from the SSML markup
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
are notable discrepancies. For example, CSS Speech volume keywords and
decibel units are not mutually exclusive, due to how values are inherited
and combined for selected elements.
<dl><!-- dt>
<strong>normal</strong>
</dt>
<dd>
<p> Corresponds to +0.0dB, which means that there is no modification of volume level. This
value overrides the inherited value.</p>
</dd -->
<dt> <strong>silent</strong>
<dd>
<p> Specifies that no sound is generated (the text is read "silently").</p>
<p class=note> Note that this has the same effect as using negative
infinity decibels. Also note that there is a difference between an
element whose &lsquo;<a href="#voice-volume"><code
class=property>voice-volume</code></a>&rsquo; property has a value of
&lsquo;<code class=property>silent</code>&rsquo;, and an element whose
&lsquo;<a href="#speak"><code class=property>speak</code></a>&rsquo;
property has the value &lsquo;<code class=property>none</code>&rsquo;.
With the former, the selected element takes up the same time as if it
were spoken, including any pauses before and after the element, but no
sound is generated (descendants within the <a href="#aural-model">aural
box model</a> of the selected element can override the &lsquo;<a
href="#voice-volume"><code class=property>voice-volume</code></a>&rsquo;
value, and may therefore generate audio output). With the latter, the
selected element is not rendered in the aural dimension and no time is
allocated for playback (descendants within the <a
href="#aural-model">aural box model</a> of the selected element can
override the &lsquo;<a href="#speak"><code
class=property>speak</code></a>&rsquo; value, and may therefore generate
audio output).</p>
<dt><strong>x-soft</strong>, <strong>soft</strong>,
<strong>medium</strong>, <strong>loud</strong>, <strong>x-loud</strong>
<dd>
<p>This sequence of keywords corresponds to monotonically non-decreasing
volume levels, mapped to implementation-dependent values that meet the
listener's requirements with regard to perceived loudness. These audio
levels are typically provided via a preference mechanism that allows
users to calibrate sound options according to their auditory
environment. The keyword &lsquo;<code
class=property>x-soft</code>&rsquo; maps to the user's <em>minimum
audible</em> volume level, &lsquo;<code
class=property>x-loud</code>&rsquo; maps to the user's <em>maximum
tolerable</em> volume level, &lsquo;<code
class=property>medium</code>&rsquo; maps to the user's
<em>preferred</em> volume level, &lsquo;<code
class=property>soft</code>&rsquo; and &lsquo;<code
class=property>loud</code>&rsquo; map to intermediary values.</p>
<dt> <strong>&lt;decibel&gt;</strong>
<dd>
<p>A <a href="#number-def">number</a> immediately followed by "dB"
(decibel unit). This represents a change (positive or negative) relative
to the given keyword value (see enumeration above), or to the default
value for the root element, or otherwise to the inherited volume level
(which may itself be a combination of a keyword value and of a decibel
offset, in which case the decibel values are combined additively). When
the inherited volume level is &lsquo;<code
class=property>silent</code>&rsquo;, this &lsquo;<a
href="#voice-volume"><code class=property>voice-volume</code></a>&rsquo;
resolves to &lsquo;<code class=property>silent</code>&rsquo; too,
regardless of the specified &lt;decibel&gt; value. Decibels express, on a
logarithmic scale, the ratio of the squares of the new signal amplitude
(a1) and the current amplitude (a0), which reduces to the following
equation: volume(dB) = 20 &times; log<sub>10</sub>(a1 / a0)</p>
<p class=note> Note that -6.0dB is approximately half the amplitude of
the audio signal, and +6.0dB is approximately twice the amplitude.</p>
</dl>
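<div class=example>
<p>As a sketch of how keyword values and decibel offsets combine, consider
the rules below. The class names are hypothetical, and the values given in
the comments follow from the additive rule and the &lsquo;<code
class=property>silent</code>&rsquo; rule described above. Solving the
decibel equation for the amplitude ratio gives a1 / a0 =
10<sup>dB / 20</sup>, so -6dB corresponds to a ratio of approximately 0.501
and +6dB to approximately 1.995.</p>
<pre>
div.announcement
{
  voice-volume: loud 3dB;  /* keyword plus offset: 3dB above 'loud' */
}
div.announcement p.aside
{
  voice-volume: -6dB;      /* offset only: applied to the inherited 'loud 3dB',
                              3dB + (-6dB) = -3dB relative to 'loud' */
}
div.muted
{
  voice-volume: silent;
}
div.muted p
{
  voice-volume: 6dB;       /* inherited level is 'silent': resolves to 'silent' */
}</pre>
</div>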
<p class=note>Note that perceived loudness depends on various factors, such
as the listening environment, user preferences or physical abilities. The
effective volume variation between &lsquo;<code
class=property>x-soft</code>&rsquo; and &lsquo;<code
class=property>x-loud</code>&rsquo; represents the dynamic range (in terms
of loudness) of the audio output. Typically, this range would be
compressed in a noisy context, i.e. the perceived loudness corresponding
to &lsquo;<code class=property>x-soft</code>&rsquo; would effectively be
closer to &lsquo;<code class=property>x-loud</code>&rsquo; than it would
be in a quiet environment. There may also be situations where both
&lsquo;<code class=property>x-soft</code>&rsquo; and &lsquo;<code
class=property>x-loud</code>&rsquo; would map to low volume levels, such
as in listening environments requiring discretion (e.g. library,
night-reading).
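<div class=example>
<p>The distinction drawn in this section between &lsquo;<code
class=property>silent</code>&rsquo; and a &lsquo;<code
class=property>speak</code>&rsquo; value of &lsquo;<code
class=property>none</code>&rsquo; can be sketched as follows. The class
names are hypothetical, and the comments summarize the behavior that would
be expected from the definitions above.</p>
<pre>
p.redacted
{
  voice-volume: silent;  /* no sound, but the time the speech would take still elapses */
}
p.redacted span.audible
{
  voice-volume: medium;  /* a descendant overrides 'silent' with a keyword and is heard */
}
p.skipped
{
  speak: none;           /* not rendered aurally: no sound, no time allocated */
}</pre>
</div>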
<h3 id=mixing-props-voice-balance><span class=secno>7.2. </span>The
&lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property</h3>
<table class=propdef summary="name: syntax">
<tbody>
<tr>
<td>Name:
<td> <dfn id=voice-balance>voice-balance</dfn>
<tr>
<td> <em>Value:</em>
<td>&lt;number&gt; | left | center | right | leftwards | rightwards
<tr>
<td> <em>Initial:</em>
<td>center
<tr>
<td> <em>Applies&nbsp;to:</em>
<td>all elements
<tr>
<td> <em>Inherited:</em>
<td>yes
<tr>
<td> <em>Percentages:</em>
<td>N/A
<tr>
<td> <em>Media:</em>
<td>speech
<tr>
<td> <em>Computed value:</em>
<td>the specified value resolved to a &lt;number&gt; between
&lsquo;<code class=css>-100</code>&rsquo; and &lsquo;<code
class=css>100</code>&rsquo; (inclusive)
</table>
<p> The &lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property controls the
spatial distribution of audio output across a lateral sound stage: one
extremity is on the left, the other extremity is on the right hand side,
relative to the listener's position. Authors can specify intermediary
steps between left and right extremities, to represent the audio
separation along the resulting left-right axis.
<p class=note> Note that the functionality provided by this property has no
match in the SSML markup language <a href="#SSML"
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
<dl>
<dt> <strong>&lt;number&gt;</strong>
<dd>
<p>A <a href="#number-def">number</a> between &lsquo;<code
class=css>-100</code>&rsquo; and &lsquo;<code
class=css>100</code>&rsquo; (inclusive). Values smaller than
&lsquo;<code class=css>-100</code>&rsquo; are clamped to &lsquo;<code
class=css>-100</code>&rsquo;. Values greater than &lsquo;<code
class=css>100</code>&rsquo; are clamped to &lsquo;<code
class=css>100</code>&rsquo;. The value &lsquo;<code
class=css>-100</code>&rsquo; represents the left side, and the value
&lsquo;<code class=css>100</code>&rsquo; represents the right side. The
value &lsquo;<code class=css>0</code>&rsquo; represents the center point
whereby there is no discernible audio separation between left and right
sides (in a stereo sound system, this corresponds to equal distribution
of audio signals between left and right speakers).</p>
<dt> <strong>left</strong>
<dd>
<p>Same as &lsquo;<code class=css>-100</code>&rsquo;.</p>
<dt> <strong>center</strong>
<dd>
<p>Same as &lsquo;<code class=css>0</code>&rsquo;.</p>
<dt> <strong>right</strong>
<dd>
<p>Same as &lsquo;<code class=css>100</code>&rsquo;.</p>
<dt> <strong>leftwards</strong>
<dd>
<p>Moves the sound to the left, by subtracting 20 from the inherited
&lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; value, and by clamping
the resulting number to &lsquo;<code class=css>-100</code>&rsquo;.</p>
<dt> <strong>rightwards</strong>
<dd>
<p>Moves the sound to the right, by adding 20 to the inherited &lsquo;<a
href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; value, and by clamping
the resulting number to &lsquo;<code class=css>100</code>&rsquo;.</p>
</dl>
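<div class=example>
<p>The relative keywords and the clamping behavior can be illustrated with
the following sketch. The class names are hypothetical, and the resolved
numbers in the comments follow from the arithmetic described above.</p>
<pre>
div.stage-left
{
  voice-balance: -90;
}
div.stage-left p.further
{
  voice-balance: leftwards;   /* -90 - 20 = -110, clamped to -100 */
}
div.stage-left p.nearer
{
  voice-balance: rightwards;  /* -90 + 20 = -70 */
}
p.out-of-range
{
  voice-balance: 250;         /* clamped to 100, same as 'right' */
}</pre>
</div>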
<p> User agents may be connected to different kinds of sound systems,
featuring varying audio mixing capabilities. The expected behavior for
mono, stereo, and surround sound systems is defined as follows:
<ul>
<li> When user agents produce audio via a monaural sound system (i.e.
single-speaker setup), the &lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property has no effect.
<li> When user agents produce audio through a stereo sound system (e.g.
two speakers, a pair of headphones), the left-right distribution of audio
signals can precisely match the authored values for the &lsquo;<a
href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property.
<li> When user agents are capable of mixing audio signals through more
than two channels (e.g. a 5-speaker surround sound system, including a
dedicated center channel), the physical distribution of audio signals
resulting from the application of the &lsquo;<a
href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property should be
performed so that the listener perceives sound as if it were coming from a
basic stereo layout. For example, the center channel as well as the
left/right speakers may be used together in order to emulate the
behavior of the &lsquo;<code class=property>center</code>&rsquo; value.
</ul>
<p> Future revisions of the CSS Speech module may include support for
three-dimensional audio, which would effectively enable authors to specify
"azimuth" and "elevation" values. In the future, content authored using
the current specification may therefore be consumed by user agents which
are compliant with the version of CSS Speech that supports
three-dimensional audio. In order to prepare for this possibility, the
values enabled by the current &lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property are designed to
remain compatible with "azimuth" angles. More precisely, the mapping
between the current left-right audio axis (lateral sound stage) and the
envisioned 360-degree plane around the listener's position is defined as
follows:
<ul>
<li>The value &lsquo;<code class=css>0</code>&rsquo; maps to zero degrees
(&lsquo;<code class=property>center</code>&rsquo;). This is directly in front of
the listener, not behind.
<li>The value &lsquo;<code class=css>-100</code>&rsquo; maps to -40
degrees (&lsquo;<code class=property>left</code>&rsquo;). Negative angles
are in the counter-clockwise direction (the audio stage is seen from the
top).
<li>The value &lsquo;<code class=css>100</code>&rsquo; maps to 40 degrees
(&lsquo;<code class=property>right</code>&rsquo;). Positive angles are in
the clockwise direction (the audio stage is seen from the top).
<li>Intermediary values on the scale from &lsquo;<code
class=css>-100</code>&rsquo; to &lsquo;<code class=css>100</code>&rsquo;
map linearly to the angles between -40 and 40 degrees. For example, &lsquo;<code
class=css>-50</code>&rsquo; maps to -20 degrees.
</ul>
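<div class=example>
<p>As an informal illustration of the mapping above (a sketch, not a
normative requirement), the linear relationship amounts to multiplying the
&lsquo;<code class=property>voice-balance</code>&rsquo; number by 0.4 to
obtain the envisioned azimuth angle in degrees. The class names below are
hypothetical and are used only for illustration.</p>
<pre>
/* azimuth (degrees) = voice-balance x 0.4 */
.far-left   { voice-balance: -100; } /* -100 x 0.4 = -40 degrees (same as 'left') */
.mid-left   { voice-balance: -50;  } /*  -50 x 0.4 = -20 degrees */
.middle     { voice-balance: 0;    } /*    0 x 0.4 =   0 degrees (same as 'center') */
.near-right { voice-balance: 25;   } /*   25 x 0.4 =  10 degrees */
.far-right  { voice-balance: 100;  } /*  100 x 0.4 =  40 degrees (same as 'right') */
</pre>
</div>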
<p class=note> Note that users may configure their sound systems in such a
way that the left-right audio distribution specified by document authors
is altered. Typically, the various "surround" modes
available in modern sound systems (including systems based on basic stereo
speakers) tend to greatly alter the perceived spatial arrangement of audio
signals. The illusion of a three-dimensional sound stage is often achieved
using a combination of phase shifting, digital delay, volume control
(channel mixing), and other techniques. Some users may even configure
their system to "downgrade" any rendered sound to a single mono channel,
in which case the effect of the &lsquo;<a href="#voice-balance"><code
class=property>voice-balance</code></a>&rsquo; property would obviously
not be perceivable at all. The rendering fidelity of authored content is
therefore dependent on such user customizations, and the &lsquo;<a
href="#voice-balance"><code class=property>voice-balance</code></a>&rsquo;
property merely specifies the desired end-result.
<p class=note> Note that many speech synthesizers only generate mono sound,
and therefore do not intrinsically support the &lsquo;<a
href="#voice-balance"><code class=property>voice-balance</code></a>&rsquo;
property. The sound distribution along the left-right axis consequently
occurs at the post-synthesis stage (when the speech-enabled user agent mixes
the various audio sources authored within the document).
<h2 id=speaking-props><span class=secno>8. </span>Speaking properties</h2>
<h3 id=speaking-props-speak><span class=secno>8.1. </span>The &lsquo;<a
href="#speak"><code class=property>speak</code></a>&rsquo; property</h3>
<table class=propdef summary="name: syntax">
<tbody>
<tr>
<td>Name:
<td> <dfn id=speak>speak</dfn>
<tr>
<td> <em>Value:</em>
<td>auto | none | normal
<tr>
<td> <em>Initial:</em>
<td>auto
<tr>
<td> <em>Applies&nbsp;to:</em>
<td>all elements
<tr>
<td> <em>Inherited:</em>
<td>yes
<tr>
<td> <em>Percentages:</em>
<td>N/A
<tr>
<td> <em>Media:</em>
<td>speech
<tr>
<td> <em>Computed value:</em>
<td>specified value
</table>
<p>The &lsquo;<a href="#speak"><code class=property>speak</code></a>&rsquo;
property determines whether or not to render text aurally.
<p class=note> Note that the functionality provided by this property has no
match in the SSML markup language <a href="#SSML"
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
<dl>
<dt> <strong>auto</strong>
<dd>
<p>Resolves to a computed value of &lsquo;<code
class=property>none</code>&rsquo; when <a
href="#display-def">&lsquo;<code
class=property>display</code>&rsquo;</a> is &lsquo;<code
class=property>none</code>&rsquo;; otherwise resolves to a computed
value of &lsquo;<code class=property>auto</code>&rsquo;, which yields a
used value of &lsquo;<code class=property>normal</code>&rsquo;.</p>
<p class=note> Note that the &lsquo;<code
class=property>none</code>&rsquo; value of the <a
href="#display-def">&lsquo;<code
class=property>display</code>&rsquo;</a> property cannot be overridden
by descendants of the selected element, whereas the &lsquo;<code
class=property>auto</code>&rsquo; value of &lsquo;<a href="#speak"><code
class=property>speak</code></a>&rsquo; can be overridden using
either &lsquo;<code class=property>none</code>&rsquo; or &lsquo;<code
class=property>normal</code>&rsquo;.</p>
<dt> <strong>none</strong>
<dd>
<p> This value causes an element (including pauses, cues, rests and
actual content) to not be rendered (i.e., the element has no effect in
the aural dimension).</p>
<p class=note> Note that any of the descendants of the affected element
are allowed to override this value, so descendants can actually take
part in the aural rendering despite using &lsquo;<code
class=property>none</code>&rsquo; at this level. However, the pauses,
cues, and rests of the ancestor element remain "deactivated" in the
aural dimension, and therefore do not contribute to the <a
href="#collapsed-pauses">collapsing of pauses</a> or additive behavior
of adjoining rests.</p>
<dt> <strong>normal</strong>
<dd>
<p> The element is rendered aurally (regardless of its <a
href="#display-def">&lsquo;<code
class=property>display</code>&rsquo;</a> value and the <a
href="#display-def">&lsquo;<code
class=property>display</code>&rsquo;</a> and &lsquo;<a
href="#speak"><code class=property>speak</code></a>&rsquo; values of its
ancestors).</p>
<p class=note> Note that using this value can result in the element being
rendered in the aural dimension even though it would not be rendered on
the visual canvas.</p>
</dl>
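<div class=example>
<p>The following sketch illustrates how the three values described above
might interact with the &lsquo;<code class=property>display</code>&rsquo;
property. It is an informal illustration only; the class names are
hypothetical and do not come from this specification.</p>
<pre>
/* 'speak' is initially 'auto', so an element that is not displayed
   is not spoken either */
.hidden-and-silent { display: none; }

/* 'normal' applies regardless of 'display': the element is spoken
   even though it is absent from the visual canvas */
.spoken-only { display: none; speak: normal; }

/* Displayed but silent: the element's content, pauses, cues and rests
   are not rendered aurally */
.decorative { speak: none; }

/* A descendant may override 'none' and take part in the aural
   rendering; the ancestor's pauses, cues and rests stay deactivated */
.decorative .caption { speak: normal; }
</pre>
</div>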
<h3 id=speaking-props-speak-as><span class=secno>8.2. </span>The &lsquo;<a