'mix' is not specified, the element's background sound replaces
the parent's.
<dt><strong>repeat</strong>
<dd>When present, this keyword means that the sound will repeat if it
is too short to fill the entire duration of the element. Otherwise,
the sound plays once and then stops. This is similar to the <span
class="propinst-background-repeat">'background-repeat'</span>
property. If the sound is too long for the element, it is clipped once
the element has been spoken.
<dt><strong>auto</strong>
<dd>The sound of the parent element continues to play
(it is not restarted, which would have been the case if this property
had been inherited).
<dt><strong>none</strong>
<dd>This keyword means that there is silence. The sound of the
parent element (if any) is silent during the current element and
continues after the current element.
</dl>
<div class="example"><P>
<PRE>
blockquote.sad { play-during: url("violins.aiff") }
blockquote Q { play-during: url("harp.wav") mix }
span.quiet { play-during: none }
</pre>
</div>
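<div class="example"><P>As a further illustration of the 'repeat' and
'auto' keywords (the class names below are hypothetical, and the sound
files are the ones used in the example above):</p>
<PRE>
/* Loop the violins if they are shorter than the spoken quotation */
blockquote.sad   { play-during: url("violins.aiff") repeat }
/* Mix a looping harp with the parent's background sound */
blockquote.sad Q { play-during: url("harp.wav") mix repeat }
/* Let the parent's sound continue without restarting it */
blockquote.sad P { play-during: auto }
</PRE>
</div>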
<H2><a name="spatial-props">Spatial properties</a>: <span
class="propinst-azimuth">'azimuth'</span> and
<span class="propinst-elevation">'elevation'</span>
</H2>
<p>Spatial audio is an important stylistic property for aural
presentation. It provides a natural way to tell several voices apart,
as in real life (people rarely all stand in the same spot in a
room). Stereo speakers produce a lateral sound stage. Binaural
headphones or the increasingly popular 5-speaker home theater setups
can generate full surround sound, and multi-speaker setups can create
a true three-dimensional sound stage. VRML 2.0 also includes spatial
audio, which implies that in time consumer-priced spatial audio
hardware will become more widely available.</p>
<!-- #include src=properties/azimuth.srb -->
<P>Values have the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;angle&gt;"><span class="value-inst-angle"><strong>&lt;angle&gt;</strong></span></span>
<dd>Position is described in terms of an angle
within the range '-360deg' to '360deg'.
The value '0deg' means directly ahead in the center of the sound
stage. '90deg' is to the right, '180deg' behind, and '270deg' (or,
equivalently and more conveniently, '-90deg') to the left.
<dt><strong>left-side</strong>
<dd>Same as '270deg'. With 'behind', '270deg'.
<dt><strong>far-left</strong>
<dd>Same as '300deg'. With 'behind', '240deg'.
<dt><strong>left</strong>
<dd>Same as '320deg'. With 'behind', '220deg'.
<dt><strong>center-left</strong>
<dd>Same as '340deg'. With 'behind', '200deg'.
<dt><strong>center</strong>
<dd>Same as '0deg'. With 'behind', '180deg'.
<dt><strong>center-right</strong>
<dd>Same as '20deg'. With 'behind', '160deg'.
<dt><strong>right</strong>
<dd>Same as '40deg'. With 'behind', '140deg'.
<dt><strong>far-right</strong>
<dd>Same as '60deg'. With 'behind', '120deg'.
<dt><strong>right-side</strong>
<dd>Same as '90deg'. With 'behind', '90deg'.
<dt><strong>leftwards</strong>
<dd>Moves the sound
to the left, relative to the current angle.
More precisely, subtracts 20 degrees.
Arithmetic is carried out modulo 360 degrees. Note that
'leftwards' is more accurately described as "turned
counter-clockwise," since it <em>always</em> subtracts 20 degrees,
even if the inherited azimuth is already behind the listener (in which
case the sound actually appears to move to the right).
<dt><strong>rightwards</strong>
<dd>Moves the sound
to the right, relative to the
current angle. More precisely, adds 20 degrees. See 'leftwards'
for arithmetic.
</dl>
<p>This property is most likely to be implemented by mixing the same
signal into different channels at differing volumes. It might also
use phase shifting, digital delay, and other such techniques to
provide the illusion of a sound stage. The precise means used to
achieve this effect and the number of speakers used to do so are
user agent-dependent; this property merely identifies the desired end
result.
<div class="example"><P>
<PRE>
h1 { azimuth: 30deg }
td.a { azimuth: far-right } /* 60deg */
#12 { azimuth: behind far-right } /* 120deg */
p.comment { azimuth: behind } /* 180deg */
</PRE>
</div>
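<div class="example"><P>The relative keywords can be worked through as
follows. This is only a sketch: the selectors are hypothetical, and each
comment shows the arithmetic described above for 'leftwards' and
'rightwards', applied to the inherited azimuth.</p>
<PRE>
td.a          { azimuth: far-right }         /* 60deg */
td.a em       { azimuth: leftwards }         /* 60deg - 20deg = 40deg */
td.b          { azimuth: behind far-right }  /* 120deg */
td.b em       { azimuth: leftwards }         /* 120deg - 20deg = 100deg; behind the
                                                listener, so the sound appears
                                                to move to the right */
p.intro       { azimuth: center }            /* 0deg */
p.intro em    { azimuth: leftwards }         /* 0deg - 20deg = 340deg (modulo 360) */
p.intro cite  { azimuth: rightwards }        /* 0deg + 20deg = 20deg */
</PRE>
</div>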
<p>If <span class="propinst-azimuth">'azimuth'</span> is specified and the output device cannot
produce sounds <em>behind</em> the listening position, user agents
should convert values in the rearwards hemisphere to forwards
hemisphere values. One method is as follows:</p>
<ul>
<li>if 90deg &lt; x &lt;= 180deg then x := 180deg - x
<li>if 180deg &lt; x &lt;= 270deg then x := 540deg - x
</ul>
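<div class="example"><P>Applying the method above to a few rearward
values gives the following results. The selectors are hypothetical, and
a user agent may use a different conversion.</p>
<PRE>
td.a { azimuth: behind far-right }  /* 120deg; rendered at 180deg - 120deg = 60deg */
td.b { azimuth: behind }            /* 180deg; rendered at 180deg - 180deg = 0deg */
td.c { azimuth: behind far-left }   /* 240deg; rendered at 540deg - 240deg = 300deg */
td.d { azimuth: behind left-side }  /* 270deg; rendered at 540deg - 270deg = 270deg */
</PRE>
</div>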
<!-- #include src=properties/elevation.srb -->
<P>Values of this property have the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;angle&gt;"><span class="value-inst-angle"><strong>&lt;angle&gt;</strong></span></span>
<dd>Specifies the elevation as an angle, between '-90deg' and '90deg'.
'0deg' means on the forward horizon, which loosely means level with
the listener. '90deg' means directly overhead and '-90deg' means directly
below.
<dt><strong>below</strong>
<dd>Same as '-90deg'.
<dt><strong>level</strong>
<dd>Same as '0deg'.
<dt><strong>above</strong>
<dd>Same as '90deg'.
<dt><strong>higher</strong>
<dd>Adds 10 degrees to the current elevation.
<dt><strong>lower</strong>
<dd>Subtracts 10 degrees from the current elevation.
</dl>
<P>The precise means used to achieve this effect and the
number of speakers used to do so are undefined. This property merely
identifies the desired end result.
<div class="example"><P>
<PRE>
h1 { elevation: above }
tr.a { elevation: 60deg }
tr.b { elevation: 30deg }
tr.c { elevation: level }
</pre>
</div>
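<div class="example"><P>The relative keywords 'higher' and 'lower' can
be illustrated in the same way. The nested selectors are hypothetical;
the comments show the arithmetic applied to the inherited elevation.</p>
<PRE>
tr.a    { elevation: 60deg }
tr.a em { elevation: higher }  /* 60deg + 10deg = 70deg */
tr.c    { elevation: level }   /* 0deg */
tr.c em { elevation: lower }   /* 0deg - 10deg = -10deg */
</PRE>
</div>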
<h2><a name="voice-char-props">Voice characteristic properties</a>: <span
class="propinst-speech-rate">'speech-rate'</span>, <span
class="propinst-voice-family">'voice-family'</span>,
<span class="propinst-pitch">'pitch'</span>,
<span class="propinst-pitch-range">'pitch-range'</span>,
<span class="propinst-stress">'stress'</span>, and
<span class="propinst-richness">'richness'</span></H2>
<!-- #include src=properties/speech-rate.srb -->
<P>This property specifies the speaking rate. Note that both absolute
and relative keyword values are allowed (compare with <span
class="propinst-font-size">'font-size'</span>). Values have
the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span
class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>Specifies the speaking rate in words per minute, a quantity that varies
somewhat by language but is nevertheless widely supported by speech
synthesizers.
<dt><strong>x-slow</strong>
<dd>Same as 80 words per minute.
<dt><strong>slow</strong>
<dd>Same as 120 words per minute.
<dt><strong>medium</strong>
<dd>Same as 180 - 200 words per minute.
<dt><strong>fast</strong>
<dd>Same as 300 words per minute.
<dt><strong>x-fast</strong>
<dd>Same as 500 words per minute.
<dt><strong>faster</strong>
<dd>Adds 40 words per minute to the current speech rate.
<dt><strong>slower</strong>
<dd>Subtracts 40 words per minute from the current speech rate.
</dl>
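<div class="example"><P>For instance, absolute and relative values might
be combined as follows (a sketch; the class names are hypothetical):</p>
<PRE>
p.aside     { speech-rate: slow }    /* 120 words per minute */
p.aside em  { speech-rate: faster }  /* 120 + 40 = 160 words per minute */
p.legal     { speech-rate: 300 }     /* same as 'fast' */
p.legal dfn { speech-rate: slower }  /* 300 - 40 = 260 words per minute */
</PRE>
</div>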
<!-- #include src=properties/voice-family.srb -->
<P>The value is a comma-separated, prioritized list of voice family
names (compare with <span
class="propinst-font-family">'font-family'</span>). Values have the
following meanings:</P>
<dl>
<dt><span class="index-def" title="&lt;generic-voice&gt;,
definition of"><a
name="value-def-generic-voice"><strong>&lt;generic-voice&gt;</strong></a></span>
<dd>Values are voice families. Possible values
are 'male', 'female', and 'child'.
<dt><span class="index-def" title="&lt;specific-voice&gt;::definition of"><a name="value-def-specific-voice"><strong>&lt;specific-voice&gt;</strong></a></span>
<dd>Values are specific instances (e.g., comedian, trinoids, carlos, lani).
</dl>
<div class="example"><P>
<pre>
h1 { voice-family: announcer, male }
p.part.romeo { voice-family: romeo, male }
p.part.juliet { voice-family: juliet, female }
</pre>
</div>
<p>Names of specific voices may be quoted, and indeed must be quoted
if any of the words that make up the name does not conform to the
syntax rules for <a
href="syndata.html#tokenization">identifiers</a>. It is also
recommended to quote specific voices with a name consisting of more
than one word. If quoting is omitted, any <a
href="syndata.html#whitespace">white space</a> characters before and
after the voice family name are ignored and any sequence of white space
characters inside the voice family name is converted to a single space.
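<div class="example"><P>The quoting rules might be applied as follows.
The voice names are invented for illustration and are not defined by
this specification.</p>
<PRE>
/* Single-word name that is a valid identifier: quotes are optional */
h1        { voice-family: announcer, male }
/* Multi-word name: quoting is recommended */
p.narrate { voice-family: "Uncle Bob", male }
/* A word starting with a digit is not an identifier: quotes are required */
p.second  { voice-family: "2nd narrator", female }
</PRE>
</div>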
<!-- #include src=properties/pitch.srb -->
<p>Specifies the average pitch (a frequency) of the speaking voice. The
average pitch of a voice depends on the voice family. For example,
the average pitch for a standard male voice is around 120Hz,
but for a female voice, it's around 210Hz.</p>
<P>Values have the following meanings:</P>
<dl>
<dt><span class="index-inst" title="&lt;frequency&gt;"><span class="value-inst-frequency"><strong>&lt;frequency&gt;</strong></span></span>
<dd>Specifies the average pitch of the speaking voice in hertz (Hz).
<dt><strong>x-low</strong>, <strong>low</strong>,
<strong>medium</strong>, <strong>high</strong>, <strong>x-high</strong>
<dd>These values do not map to absolute frequencies, since
they depend on the voice family. User agents should map
these values to appropriate frequencies based on the voice family
and user environment. However, user agents must map these values in
order (i.e., 'x-low' is a lower frequency than 'low', etc.).
</dl>
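<div class="example"><P>A brief sketch of both kinds of value (the
class names are hypothetical). The frequency is taken from the typical
male pitch mentioned above; the keyword is resolved by the user agent
relative to the voice family in use.</p>
<PRE>
p.narrator { voice-family: male; pitch: 120Hz }
p.reply    { voice-family: female; pitch: high }  /* frequency chosen by the user agent */
</PRE>
</div>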
<!-- #include src=properties/pitch-range.srb -->
<p>Specifies variation in average pitch. The perceived pitch of a
human voice is determined by the fundamental frequency and typically
has a value of 120Hz for a male voice and 210Hz for a female voice.
Human languages are spoken with varying inflection and pitch; these
variations convey additional meaning and emphasis. Thus, a highly
animated voice, i.e., one that is heavily inflected, displays a high
pitch range. This property specifies the range over which these
variations occur, i.e., how much the fundamental frequency may deviate
from the average pitch.
<P>Values have the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'. A pitch range of '0' produces
a flat, monotonic voice. A pitch range of '50' produces normal
inflection. Pitch ranges greater than '50' produce animated voices.
</dl>
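<div class="example"><P>For example (a sketch with hypothetical class
names):</p>
<PRE>
pre       { pitch-range: 0 }   /* flat, monotonic voice */
p         { pitch-range: 50 }  /* normal inflection */
p.excited { pitch-range: 80 }  /* animated voice */
</PRE>
</div>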
<!-- #include src=properties/stress.srb -->
<p>Specifies the height of "local peaks" in the intonation contour
of a voice. For example, English is a <strong>stressed</strong>
language, and different parts of a sentence are assigned primary,
secondary, or tertiary stress. The value of <span
class="propinst-stress">'stress'</span> controls the amount of
inflection that results from these stress markers. This property is a
companion to the <span
class="propinst-pitch-range">'pitch-range'</span> property and is
provided to allow developers to exploit higher-end auditory displays.
<P>Values have the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'. The meaning of values
depends on the language being spoken. For example,
a level of '50' for a
standard, English-speaking male voice (average pitch = 122Hz), speaking
with normal intonation and emphasis would have a different
meaning than '50' for an Italian voice.
</dl>
<!-- #include src=properties/richness.srb -->
<P>Specifies the richness, or brightness, of the speaking voice. A
rich voice will "carry" in a large room; a smooth voice will not.
(The term "smooth" refers to how the wave form looks when drawn.)
<P>Values have the following meanings:</p>
<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'.
The higher the value, the more the voice will carry.
A lower value will produce a soft, mellifluous voice.
</dl>
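<div class="example"><P>The 'stress' and 'richness' properties might be
used together as follows. This is only a sketch: the class names are
hypothetical, and the audible effect of a given number depends on the
language and the speech synthesizer.</p>
<PRE>
strong    { stress: 90; richness: 90 }  /* strong peaks; the voice carries */
p.whisper { stress: 20; richness: 10 }  /* little inflection; soft voice */
</PRE>
</div>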
<H2><a name="speech-props">Speech properties</a>:
<span class="propinst-speak-punctuation">'speak-punctuation'</span>
and <span class="propinst-speak-numeral">'speak-numeral'</span>
</h2>
<p>An additional speech property, <span
class="propinst-speak-header">'speak-header'</span>, is
described below.
<!-- #include src=properties/speak-punctuation.srb -->
<P>This property specifies how punctuation is spoken. Values have the
following meanings:</p>
<dl>
<dt><strong>code</strong>
<dd>Punctuation such as semicolons,
braces, and so on are to be spoken literally.
<dt><strong>none</strong>
<dd>Punctuation is not to be spoken, but instead rendered
naturally as various pauses.
</dl>
<!-- #include src=properties/speak-numeral.srb -->
<p>This property controls how numerals are spoken. Values have the
following meanings:</P>
<dl>
<dt><strong>digits</strong>
<dd>Speak the numeral as individual digits. Thus, "237" is spoken
"Two Three Seven".
<dt><strong>continuous</strong>
<dd>Speak the numeral as a full number. Thus, "237" is spoken
"Two hundred thirty seven". Word representations are language-dependent.
</dl>
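<div class="example"><P>A possible use of these two properties (the
class name is hypothetical):</p>
<PRE>
code       { speak-punctuation: code }    /* semicolons, braces, etc. are spoken */
p          { speak-punctuation: none }    /* punctuation rendered as pauses */
span.phone { speak-numeral: digits }      /* "237" spoken "Two Three Seven" */
td         { speak-numeral: continuous }  /* "237" spoken "Two hundred thirty seven" */
</PRE>
</div>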
<h2><a name="aural-tables">Audio rendering of tables</a></h2>
<p>When a table is spoken by a speech generator, the relation between
the data cells and the header cells must be expressed in a different
way than by horizontal and vertical alignment. Some speech browsers
may allow a user to move around in the 2-dimensional space, thus
giving them the opportunity to map out the spatially represented
relations. When that is not possible, the style sheet must specify at
which points the headers are spoken.</p>
<h3><a name="speak-headers">Speaking headers:</a> the <span
class="propinst-speak-header">'speak-header'</span> property</h3>
<!-- #include src=properties/speak-header.srb -->
<P>This property specifies whether table headers
are spoken before every
cell, or only before a cell when that cell is associated with a
different header than the previous cell. Values have
the following meanings:</p>
<dl>
<dt><strong>once</strong>
<dd>The header is spoken one time, before a series of
cells.
<dt><strong>always</strong>
<dd>The header is spoken before every pertinent cell.
</dl>
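<div class="example"><P>For instance, an author might ask for headers to
be repeated in a dense table and spoken only on change elsewhere. This
is a sketch; the class name is hypothetical.</p>
<PRE>
td          { speak-header: once }    /* header spoken only when it changes */
table.dense td { speak-header: always }  /* header spoken before every cell */
</PRE>
</div>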
<p>Each document language may have different mechanisms that allow
authors to specify headers. For example, in HTML 4 ([[HTML4]]),
it is possible to specify header information with three different
attributes ("headers", "scope", and "axis"), and the specification
gives an algorithm for determining header information when these
attributes have not been specified.</p>
<div class="html-example">
<div class="figure">
<P><img src="images/table1.png" alt="Image of a table created in MS
Word"><p class="caption"> Image of a table with header cells ("San
Jose" and "Seattle") that are not in the same column or row as the
data they apply to.
</div>
<p>This HTML example presents the money spent on meals, hotels and
transport in two locations (San Jose and Seattle) for successive
days. Conceptually, you can think of the table in terms of an
n-dimensional space. The headers of this space are: location, day,
category and subtotal. Some cells define marks along an axis while
others give money spent at points within this space. The markup
for this table is:</p>
<pre>
&lt;TABLE&gt;
&lt;CAPTION&gt;Travel Expense Report&lt;/CAPTION&gt;
&lt;TR&gt;
&lt;TH&gt;&lt;/TH&gt;
&lt;TH&gt;Meals&lt;/TH&gt;
&lt;TH&gt;Hotels&lt;/TH&gt;
&lt;TH&gt;Transport&lt;/TH&gt;
&lt;TH&gt;subtotal&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH id="san-jose" axis="san-jose"&gt;San Jose&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="san-jose"&gt;25-Aug-97&lt;/TH&gt;
&lt;TD&gt;37.74&lt;/TD&gt;
&lt;TD&gt;112.00&lt;/TD&gt;
&lt;TD&gt;45.00&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="san-jose"&gt;26-Aug-97&lt;/TH&gt;
&lt;TD&gt;27.28&lt;/TD&gt;
&lt;TD&gt;112.00&lt;/TD&gt;
&lt;TD&gt;45.00&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="san-jose"&gt;subtotal&lt;/TH&gt;
&lt;TD&gt;65.02&lt;/TD&gt;
&lt;TD&gt;224.00&lt;/TD&gt;
&lt;TD&gt;90.00&lt;/TD&gt;
&lt;TD&gt;379.02&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH id="seattle" axis="seattle"&gt;Seattle&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="seattle"&gt;27-Aug-97&lt;/TH&gt;
&lt;TD&gt;96.25&lt;/TD&gt;
&lt;TD&gt;109.00&lt;/TD&gt;
&lt;TD&gt;36.00&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="seattle"&gt;28-Aug-97&lt;/TH&gt;
&lt;TD&gt;35.00&lt;/TD&gt;
&lt;TD&gt;109.00&lt;/TD&gt;
&lt;TD&gt;36.00&lt;/TD&gt;
&lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH headers="seattle"&gt;subtotal&lt;/TH&gt;
&lt;TD&gt;131.25&lt;/TD&gt;
&lt;TD&gt;218.00&lt;/TD&gt;
&lt;TD&gt;72.00&lt;/TD&gt;
&lt;TD&gt;421.25&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TH&gt;Totals&lt;/TH&gt;
&lt;TD&gt;196.27&lt;/TD&gt;
&lt;TD&gt;442.00&lt;/TD&gt;
&lt;TD&gt;162.00&lt;/TD&gt;
&lt;TD&gt;800.27&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TABLE&gt;
</pre>
<p>By providing the data model in this way, authors make it
possible for speech-enabled browsers to explore the table in
rich ways, e.g., each cell could be spoken as a list, repeating the
applicable headers before each data cell:</p>
<pre>
San Jose, 25-Aug-97, Meals: 37.74
San Jose, 25-Aug-97, Hotels: 112.00
San Jose, 25-Aug-97, Transport: 45.00
...
</pre>
<p>The browser could also speak the headers only when they change:</p>
<pre>
San Jose, 25-Aug-97, Meals: 37.74
Hotels: 112.00
Transport: 45.00
26-Aug-97, Meals: 27.28
Hotels: 112.00
...
</pre>
</div>
<h2><a name="sample">Sample style sheet for HTML</a></h2>
<p>This style sheet describes a possible rendering of HTML 4:
<pre>
@media aural {
h1, h2, h3,
h4, h5, h6 { voice-family: paul, male; stress: 20; richness: 90 }
h1 { pitch: x-low; pitch-range: 90 }
h2 { pitch: x-low; pitch-range: 80 }
h3 { pitch: low; pitch-range: 70 }
h4 { pitch: medium; pitch-range: 60 }
h5 { pitch: medium; pitch-range: 50 }
h6 { pitch: medium; pitch-range: 40 }
li, dt, dd { pitch: medium; richness: 60 }
dt { stress: 80 }
pre, code, tt { pitch: medium; pitch-range: 0; stress: 0; richness: 80 }
em { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
strong { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
dfn { pitch: high; pitch-range: 60; stress: 60 }
s, strike { richness: 0 }
i { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
b { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
u { richness: 0 }
a:link { voice-family: harry, male }
a:visited { voice-family: betty, female }
a:active { voice-family: betty, female; pitch-range: 80; pitch: x-high }
}
</pre>
<h2><a name="Emacspeak">Emacspeak</a></h2>
<p>For information, here is the list of properties implemented by
Emacspeak, a speech subsystem for the Emacs editor.
<ul>
<li>voice-family
<li>stress (but with a different range of values)
<li>richness (but with a different range of values)
<li>pitch (but with differently named values)
<li>pitch-range (but with a different range of values)
</ul>
<p>(We thank T. V. Raman for the information about implementation
status of aural properties.)
</BODY>
</HTML>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-declaration:"~/SGML/HTML4.decl"
sgml-default-doctype-name:"html"
sgml-minimize-attributes:t
sgml-nofill-elements:("pre" "style" "br")
End:
-->