csswg-drafts/css2/aural.src at 0cf85a28f88caa25b9920a1da943cb7c5834a3aa · w3c/csswg-drafts · GitHub

<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01//EN'>

<html lang="en">
<!-- $Id: aural.src,v 2.59 2011-10-18 19:25:27 bbos Exp $ -->
<HEAD>
<TITLE>Aural style sheets</TITLE>
<!--script src="http://www.w3c-test.org/css/harness/annotate.js#CSS21_DEV" type="text/javascript" defer></script-->
</HEAD>
<BODY>
<H1>Aural style sheets</H1>

<p>This chapter is informative. UAs are not required to implement the
properties of this chapter in order to conform to CSS 2.1.

<h2><a name="aural-media-group">The media types 'aural' and 'speech'</a></h2>

<p>We expect that in a future level of CSS there will be new
properties and values defined for speech output. Therefore
CSS&nbsp;2.1 reserves the 'speech' media type (see <a
href="media.html">chapter 7, "Media types"</a>), but does not yet
define which properties do or do not apply to it.

<p>The properties in this appendix apply to the media type 'aural', which
was introduced in CSS2. The type 'aural' is now deprecated.

<div class=note>
<p>This means that a style sheet such as

<pre>
@media speech {
  body { voice-family: Paul }
}
</pre>

<p>is valid, but that its meaning is not defined by CSS&nbsp;2.1,
while

<pre>
@media aural {
  body { voice-family: Paul }
}
</pre>

<p>is deprecated, but defined by this appendix.
</div>


<H2><a name="aural-intro">Introduction to aural style sheets</a></H2>

<p>The aural rendering of a document, already commonly used by the
blind and print-impaired communities, combines speech synthesis and
<span class="index-def" title="auditory icon">"auditory icons."</span> Often
such aural presentation occurs by converting the document to plain
text and feeding this to a <span class="index-def" title="screen
reader"><dfn>screen reader</dfn></span> -- software or hardware that
simply reads all the characters on the screen.  This results in less
effective presentation than would be the case if the document
structure were retained.  Style sheet properties for aural presentation
may be used together with visual properties (mixed media) or as an
aural alternative to visual presentation.

<p>Besides the obvious accessibility advantages, there are other large
markets for listening to information, including in-car use, industrial
and medical documentation systems (intranets), home entertainment, and
assistance for users who are learning to read or who have difficulty reading.

<p>When using aural properties, the <span class="index-inst"
title="canvas">canvas</span> consists of a three-dimensional physical
space (sound surrounds) and a temporal space (one may specify sounds
before, during, and after other sounds). The CSS properties also
allow authors to vary the quality of synthesized speech (voice type,
frequency, inflection, etc.).

<div class="example"><p>
<pre>
h1, h2, h3, h4, h5, h6 {
    voice-family: paul;
    stress: 20;
    richness: 90;
    cue-before: url("ping.au")
}
p.heidi { azimuth: center-left }
p.peter { azimuth: right }
p.goat  { volume: x-soft }
</pre>

<p>This will direct the speech synthesizer to speak headings in a voice
(a kind of "audio font") called "paul", in a flat tone, but in a very
rich voice. Before speaking the headings, a sound sample will be played
from the given URL. Paragraphs with class "heidi" will appear to come
from the front left (if the sound system is capable of spatial audio), and
paragraphs of class "peter" from the right. Paragraphs with class
"goat" will be very soft.
</div>


<H3><a name="angles">Angles</a></H3>
<P>Angle values are denoted by <span class="index-def"
title="&lt;angle&gt;::definition of"><a
name="value-def-angle">&lt;angle&gt;</a></span> in the text.
Their format is a <span class="index-inst"
title="&lt;number&gt;"><span
class="value-inst-number">&lt;number&gt;</span></span> immediately
followed by an angle unit identifier.

<P>Angle unit identifiers are:</p>

<ul>
<li><strong>deg</strong>: degrees
<LI><strong>grad</strong>: grads
<LI><strong>rad</strong>: radians
</UL>

<p>Angle values may be negative. They should be normalized to the
range 0-360deg by the user agent. For example, -10deg and 350deg are
equivalent.

<P>For example, a right angle is '90deg' or '100grad' or
'1.570796326794897rad'.

<p>As with &lt;length&gt;, the unit may be omitted if the value is
zero: '0deg' may be written as '0'.
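
<div class="example"><p>
As an illustration, the following rules express the equivalences
described above as declarations. The class names are hypothetical and
are not used elsewhere in this chapter.
<pre>
/* Illustrative only; the selectors are hypothetical. */
p.a { azimuth: -10deg }   /* equivalent to 350deg after normalization */
p.b { azimuth: 90deg }    /* a right angle */
p.c { azimuth: 100grad }  /* the same right angle, in grads */
p.d { azimuth: 0 }        /* unit omitted because the value is zero */
</pre>
</div>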

<H3><a name="times">Times</a></H3>

<P>Time values are denoted by <span class="index-def" title="&lt;time&gt;::definition of"><a name="value-def-time">&lt;time&gt;</a></span> in the
text.
Their format is a <span class="index-inst"
title="&lt;number&gt;"><span
class="value-inst-number">&lt;number&gt;</span></span> immediately
followed by a time unit identifier.

<P>Time unit identifiers are:</p>

<UL>
<LI><strong>ms</strong>: milliseconds
<LI><strong>s</strong>: seconds
</UL>

<p>Time values may not be negative.

<p>As with &lt;length&gt;, the unit may be omitted if the value is
zero: '0s' may be written as '0'.
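
<div class="example"><p>
As a sketch (the class names are hypothetical), both rules below
request the same half-second pause, once in milliseconds and once in
seconds:
<pre>
/* Illustrative only; the selectors are hypothetical. */
p.ms { pause-before: 500ms }
p.s  { pause-before: 0.5s }   /* the same duration as 500ms */
</pre>
</div>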

<H3><a name="frequencies">Frequencies</a></H3>

<P>Frequency values are denoted by <span class="index-def"
title="&lt;frequency&gt;::definition of"><a
name="value-def-frequency">&lt;frequency&gt;</a></span> in the text.
Their format is a <span class="index-inst"
title="&lt;number&gt;"><span
class="value-inst-number">&lt;number&gt;</span></span> immediately
followed by a frequency unit identifier.

<p>Frequency unit identifiers are:</p>

<ul>
<li><strong>Hz</strong>: Hertz
<li><strong>kHz</strong>: kilohertz
</ul>

<p>Frequency values may not be negative.

<P> For example, 200Hz (or 200hz) is a bass sound, and 6kHz
is a treble sound.

<p>As with &lt;length&gt;, the unit may be omitted if the value is
zero: '0Hz' may be written as '0'.
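
<div class="example"><p>
As a sketch (the class names are hypothetical), the two rules below
specify the same frequency, once in hertz and once in kilohertz:
<pre>
/* Illustrative only; the selectors are hypothetical. */
p.a { pitch: 120Hz }
p.b { pitch: 0.12kHz }   /* the same frequency as 120Hz */
</pre>
</div>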

<H2><a name="volume-props">Volume properties</a>: <span
class="propinst-volume">'volume'</span></H2>

<!-- #include src=properties/volume.srb -->

<P><span class="index-def" title="volume">Volume</span> refers to the
median volume of the waveform. In other words, a highly inflected
voice at a volume of 50 might peak well above that. The overall values
are likely to be human adjustable for comfort, for example with a
physical volume control (which would increase both the 0 and 100
values proportionately); what this property does is adjust the dynamic
range.


<P>Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span
		class="value-inst-number"><strong>&lt;number&gt;</strong>
</span></span>
<dd>Any number between '0' and '100'.
'0' represents the <em>minimum audible</em>
volume level and 100 corresponds to the
<em>maximum comfortable</em> level.
<dt><span class="index-inst" title="&lt;percentage&gt;"><span class="value-inst-percentage"><strong>&lt;percentage&gt;</strong></span></span>
<dd>Percentage values are calculated relative to the inherited value,
and are then clipped to the range '0' to '100'.
<dt><strong>silent</strong>
<dd>No sound at all. The value '0' does not mean
the same as 'silent'.
<dt><strong>x-soft</strong>
<dd>Same as '0'.
<dt><strong>soft</strong>
<dd>Same as '25'.
<dt><strong>medium</strong>
<dd>Same as '50'.
<dt><strong>loud</strong>
<dd>Same as '75'.
<dt><strong>x-loud</strong>
<dd>Same as '100'.
</dl>
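
<div class="example"><p>
The following sketch illustrates how percentages might resolve. It
assumes that each EM and STRONG element is a child of an element whose
'volume' is 60; the selectors are chosen only for illustration.
<pre>
/* Assumes EM and STRONG inherit a 'volume' of 60 from their parent. */
body   { volume: 60 }
em     { volume: 150% }    /* 150% of 60 = 90 */
strong { volume: 200% }    /* 200% of 60 = 120, clipped to 100 */
p.off  { volume: silent }  /* no sound at all, unlike '0' */
</pre>
</div>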

<p>User agents should allow the values corresponding to '0' and '100'
to be set by the listener. No one setting is universally applicable;
suitable values depend on the equipment in use (speakers, headphones),
the environment (in car, home theater, library) and personal
preferences. Some examples:</p>

<ul>
<li>A browser for in-car use has a setting for when there is lots of
background noise. '0' would map to a fairly high level and '100' to a
quite high level. The speech is easily audible over the road noise but
the overall dynamic range is compressed. Cars with better
insulation might allow a wider dynamic range.

<li>Another speech browser is being used in an apartment, late at
night, or in a shared study room. '0' is set to a very quiet level and
'100' to a fairly quiet level, too. As with the first example, there
is a low slope; the dynamic range is reduced. The actual volumes are
low here, whereas they were high in the first example.

<li>A third speech browser runs on an expensive hi-fi home theater
setup in a quiet and isolated house. '0' is set fairly low and '100'
quite high; there is a wide dynamic range.
</ul>

<p>The same author style sheet could be used in all cases, simply by
mapping the '0' and '100' points suitably at the client side.

<H2><a name="speaking-props">Speaking properties</a>: <span
class="propinst-speak">'speak'</span></H2>

<!-- #include src=properties/speak.srb -->

<P>This property specifies whether text will be rendered aurally and
if so, in what manner. The possible values are:

<dl>
<dt><strong>none</strong></dt>
<dd>Suppresses aural rendering so that the
element requires no time to render. Note, however, that
descendants may override this value and will be spoken. (To
be sure to suppress rendering of an
element and its descendants, use the
<span class="propinst-display">'display'</span> property).

<dt><strong>normal</strong></dt>
<dd>Uses  language-dependent pronunciation rules for rendering
an element and its children.

<dt><strong>spell-out</strong></dt>
<dd>Spells the text one letter at a time (useful for acronyms and
abbreviations).
</dl>

<p>Note the difference between an element whose <span
class="propinst-volume">'volume'</span> property has a value of
'silent' and an element whose <span
class="propinst-speak">'speak'</span> property has the value 'none'.
The former takes up the same time as if it had been spoken, including
any pause before and after the element, but no sound is generated. The
latter requires no time and is not rendered (though its descendants
may be).
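
<div class="example"><p>
The distinction can be sketched with two hypothetical classes and an
acronym rule:
<pre>
/* Illustrative only; the class names are hypothetical. */
p.muted   { volume: silent }    /* takes up speaking time, but produces no sound */
p.skipped { speak: none }       /* takes no time; descendants may override */
acronym   { speak: spell-out }  /* read one letter at a time */
</pre>
</div>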

<H2><a name="pause-props">Pause properties</a>: <span
class="propinst-pause-before">'pause-before'</span>, <span
class="propinst-pause-after">'pause-after'</span>, and <span
class="propinst-pause">'pause'</span></H2>

<!-- #include src=properties/pause-before.srb -->

<!-- #include src=properties/pause-after.srb -->


<P>These properties specify a pause to be observed before (or after)
speaking an element's content.  Values have the following
meanings:</p>

<p class=note><strong>Note.</strong> In CSS3 pauses are inserted
around the cues and content rather than between them. See
[[-CSS3SPEECH]] for details.

<dl>
<dt><span class="index-inst" title="&lt;time&gt;"><span class="value-inst-time"><strong>&lt;time&gt;</strong></span></span>
<dd>Expresses the pause in absolute time units (seconds and milliseconds).
<dt><span class="index-inst" title="&lt;percentage&gt;"><span class="value-inst-percentage"><strong>&lt;percentage&gt;</strong></span></span>
<dd>Refers to the inverse of the value of the
<span class="propinst-speech-rate">'speech-rate'</span> property.
For example, if the speech-rate is 120 words per minute
(i.e., a word takes half a second, or 500ms), then a <span
class="propinst-pause-before">'pause-before'</span> of 100% means a
pause of 500ms and a <span
class="propinst-pause-before">'pause-before'</span> of 20% means
a pause of 100ms.
</dl>

<p>The pause is inserted between the element's content and any <span
class="propinst-cue-before">'cue-before'</span> or <span
class="propinst-cue-after">'cue-after'</span> content.

<p>Authors should use relative units to create more robust style
sheets in the face of large changes in speech-rate.</p>
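
<div class="example"><p>
As a worked illustration of relative pauses (the class names are
hypothetical), the same percentage yields a shorter pause when the
speech rate is higher:
<pre>
/* Illustrative only; word duration = 60000ms / speech-rate. */
p.slow { speech-rate: 120; pause-before: 20% }  /* word = 500ms, pause = 100ms */
p.fast { speech-rate: 240; pause-before: 20% }  /* word = 250ms, pause = 50ms */
</pre>
</div>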

<!-- #include src=properties/pause.srb -->

<P>The <span class="propinst-pause">'pause'</span> property is a
shorthand for setting <span
class="propinst-pause-before">'pause-before'</span> and <span
class="propinst-pause-after">'pause-after'</span>.  If two values are
given, the first value is <span
class="propinst-pause-before">'pause-before'</span> and the second is
<span class="propinst-pause-after">'pause-after'</span>. If only one
value is given, it applies to both properties.

<div class="example"><P>
<PRE>
h1 { pause: 20ms } /* pause-before: 20ms; pause-after: 20ms */
h2 { pause: 30ms 40ms } /* pause-before: 30ms; pause-after: 40ms */
h3 { pause-after: 10ms } /* pause-before unspecified; pause-after: 10ms */
</PRE>
</div>

<H2><a name="cue-props">Cue properties</a>: <span
class="propinst-cue-before">'cue-before'</span>, <span
class="propinst-cue-after">'cue-after'</span>, and <span
class="propinst-cue">'cue'</span></H2>

<!-- #include src=properties/cue-before.srb -->

<!-- #include src=properties/cue-after.srb -->

<P>Auditory icons are another way to distinguish semantic
elements. Sounds may be played before and/or after the element to
delimit it. Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;uri&gt;"><span class="value-inst-uri"><strong>&lt;uri&gt;</strong></span></span>
<dd> The URI must designate an auditory icon resource. If the URI resolves to something other than an audio file, such as an image, the resource should be ignored and the property treated as if it had the value 'none'.

<dt><strong>none</strong>
<dd> No auditory icon is specified.
</dl>

<div class="example"><P>
<PRE>
a {cue-before: url("bell.aiff"); cue-after: url("dong.wav") }
h1 {cue-before: url("pop.au"); cue-after: url("pop.au") }
</pre>
</div>

<!-- #include src=properties/cue.srb -->

<P>The <span class="propinst-cue">'cue'</span> property is a shorthand
for setting <span class="propinst-cue-before">'cue-before'</span>
and <span class="propinst-cue-after">'cue-after'</span>.  If two
values are given, the first value is <span
class="propinst-cue-before">'cue-before'</span> and the second is
<span class="propinst-cue-after">'cue-after'</span>. If only one
value is given, it applies to both properties.</p>

<div class="example"><P>
The following two rules are equivalent:
<PRE>
h1 {cue-before: url("pop.au"); cue-after: url("pop.au") }
h1 {cue: url("pop.au") }
</pre>
</div>

<P>If a user agent cannot render an auditory icon (e.g., the user's
environment does not permit it), we recommend that it produce an
alternative cue.

<P>Please see the sections on <a
href="generate.html#before-after-content"> the :before and :after
pseudo-elements</a> for information on other content generation
techniques. 'Cue-before' sounds and 'pause-before' gaps are inserted
before content from the ':before' pseudo-element. Similarly,
'pause-after' gaps and 'cue-after' sounds are inserted after content
from the ':after' pseudo-element.
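
<div class="example"><p>
The ordering described above can be sketched as follows. The
generated text is only illustrative and is not taken from this
chapter.
<pre>
h1        { cue: url("pop.au"); pause: 20ms }
h1:before { content: "Heading: " }          /* hypothetical generated text */
h1:after  { content: " End of heading." }   /* hypothetical generated text */

/* Rendering order for each H1, following the rules in this section:
   1. cue-before sound ("pop.au")
   2. pause-before gap (20ms)
   3. ':before' content ("Heading: ")
   4. the element's own content
   5. ':after' content (" End of heading.")
   6. pause-after gap (20ms)
   7. cue-after sound ("pop.au") */
</pre>
</div>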

<H2><a name="mixing-props">Mixing properties</a>: <span
class="propinst-play-during">'play-during'</span></H2>

<!-- #include src=properties/play-during.srb -->

<p>Similar to the <span
class="propinst-cue-before">'cue-before'</span> and <span
class="propinst-cue-after">'cue-after'</span> properties, this
property specifies a sound to be played as a background
while an element's content is spoken.
Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;uri&gt;"><span class="value-inst-uri"><strong>&lt;uri&gt;</strong></span></span>
<dd>The sound designated by this <span class="index-inst"
title="&lt;uri&gt;"><span
class="value-inst-uri">&lt;uri&gt;</span></span> is played
as a background while the element's content is spoken.
<dt><strong>mix</strong>
<dd>When present, this keyword means that
the sound inherited from the parent element's <span
class="propinst-play-during">'play-during'</span> property continues
to play and the sound designated by the <span
class="index-inst" title="&lt;uri&gt;"><span
class="value-inst-uri">&lt;uri&gt;</span></span> is mixed with it. If
'mix' is not specified, the element's background sound replaces
the parent's.
<dt><strong>repeat</strong>
<dd>When present, this keyword means that the sound will repeat if it
is too short to fill the entire duration of the element. Otherwise,
the sound plays once and then stops. This is similar to the <span
class="propinst-background-repeat">'background-repeat'</span>
property. If the sound is too long for the element, it is clipped once
the element has been spoken.
<dt><strong>auto</strong>
<dd>The sound of the parent element continues to play
(it is not restarted, which would have been the case if this property
had been inherited).
<dt><strong>none</strong>
<dd>This keyword means that there is silence. The sound of the
parent element (if any) is silent during the current element and
continues after the current element.
</dl>

<div class="example"><P>
<PRE>
blockquote.sad { play-during: url("violins.aiff") }
blockquote Q   { play-during: url("harp.wav") mix }
span.quiet     { play-during: none }
</pre>
</div>
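
<div class="example"><p>
As a further sketch (the file name and class names are hypothetical),
'repeat' and 'auto' might be used as follows:
<pre>
/* Illustrative only; "rain.wav" and the selectors are hypothetical. */
div.storm    { play-during: url("rain.wav") repeat }  /* repeats until the element has been spoken */
div.storm em { play-during: auto }                    /* parent's sound continues without restarting */
</pre>
</div>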

<H2><a name="spatial-props">Spatial properties</a>: <span
class="propinst-azimuth">'azimuth'</span> and
<span class="propinst-elevation">'elevation'</span>
</H2>

<p>Spatial audio is an important stylistic property for aural
presentation. It provides a natural way to tell several voices apart,
as in real life (people rarely all stand in the same spot in a
room). Stereo speakers produce a lateral sound stage. Binaural
headphones or the increasingly popular 5-speaker home theater setups
can generate full surround sound, and multi-speaker setups can create
a true three-dimensional sound stage. VRML 2.0 also includes spatial
audio, which implies that in time consumer-priced spatial audio
hardware will become more widely available.</p>

<!-- #include src=properties/azimuth.srb -->

<P>Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;angle&gt;"><span class="value-inst-angle"><strong>&lt;angle&gt;</strong></span></span>
<dd>Position is described in terms of an angle
within the range '-360deg' to '360deg'.
The value '0deg' means directly ahead in the center of the sound
stage.  '90deg' is to the right, '180deg' behind, and '270deg' (or,
equivalently and more conveniently, '-90deg') to the left.
<dt><strong>left-side</strong>
<dd>Same as '270deg'. With 'behind', '270deg'.
<dt><strong>far-left</strong>
<dd>Same as '300deg'. With 'behind', '240deg'.
<dt><strong>left</strong>
<dd>Same as '320deg'. With 'behind', '220deg'.
<dt><strong>center-left</strong>
<dd>Same as '340deg'. With 'behind', '200deg'.
<dt><strong>center</strong>
<dd>Same as '0deg'. With 'behind', '180deg'.
<dt><strong>center-right</strong>
<dd>Same as '20deg'. With 'behind', '160deg'.
<dt><strong>right</strong>
<dd>Same as '40deg'. With 'behind', '140deg'.
<dt><strong>far-right</strong>
<dd>Same as '60deg'. With 'behind', '120deg'.
<dt><strong>right-side</strong>
<dd>Same as '90deg'. With 'behind', '90deg'.
<dt><strong>leftwards</strong>
<dd>Moves the sound
to the left, relative to the current angle.
More precisely, subtracts 20 degrees.
Arithmetic is carried out modulo 360 degrees. Note that
'leftwards' is more accurately described as "turned
counter-clockwise," since it <em>always</em> subtracts 20 degrees,
even if the inherited azimuth is already behind the listener (in which
case the sound actually appears to move to the right).
<dt><strong>rightwards</strong>
<dd>Moves the sound
to the right, relative to the
current angle. More precisely, adds 20 degrees. See 'leftwards'
for arithmetic.
</dl>

<p>This property is most likely to be implemented by mixing the same
signal into different channels at differing volumes.  It might also
use phase shifting, digital delay, and other such techniques to
provide the illusion of a sound stage.  The precise means used to
achieve this effect and the number of speakers used to do so are
user agent-dependent; this property merely identifies the desired end
result.

<div class="example"><P>
<PRE>
h1   { azimuth: 30deg }
td.a { azimuth: far-right }          /*  60deg */
#n12 { azimuth: behind far-right }   /* 120deg */
p.comment { azimuth: behind }        /* 180deg */
</PRE>
</div>
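
<div class="example"><p>
The relative keywords can be illustrated as follows, assuming each
EM element inherits the azimuth shown for its parent (the class names
are hypothetical):
<pre>
/* Assumes each EM inherits its parent's azimuth; selectors are hypothetical. */
p.front    { azimuth: center-left }  /* 340deg */
p.front em { azimuth: rightwards }   /* 340deg + 20deg = 360deg, i.e., 0deg */
p.back     { azimuth: behind left }  /* 220deg */
p.back em  { azimuth: leftwards }    /* 220deg - 20deg = 200deg; appears to move right */
</pre>
</div>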

<p>If <span class="propinst-azimuth">'azimuth'</span> is specified and the output device cannot
produce sounds <em>behind</em> the listening position, user agents
should convert values in the rearward hemisphere to forward
hemisphere values.  One method is as follows:</p>

<ul>
    <li>if 90deg &lt; x &lt;= 180deg then x := 180deg - x
    <li>if 180deg &lt; x &lt;= 270deg then x := 540deg - x
</ul>
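
<div class="example"><p>
Applying this method to a few values gives the results shown in the
comments. The class names are hypothetical, and a user agent may use a
different conversion.
<pre>
/* Illustrative only; shows the conversion method listed above. */
td.a { azimuth: behind far-right }  /* 120deg becomes 180deg - 120deg = 60deg  */
td.b { azimuth: behind }            /* 180deg becomes 180deg - 180deg = 0deg   */
td.c { azimuth: behind far-left }   /* 240deg becomes 540deg - 240deg = 300deg */
td.d { azimuth: left-side }         /* 270deg becomes 540deg - 270deg = 270deg */
</pre>
</div>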

<!-- #include src=properties/elevation.srb -->

<P>Values of this property have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;angle&gt;"><span class="value-inst-angle"><strong>&lt;angle&gt;</strong></span></span>
<dd>Specifies the elevation as an angle, between '-90deg' and '90deg'.
'0deg' means on the forward horizon, which loosely means level with
the listener.  '90deg' means directly overhead and '-90deg' means directly
below.
<dt><strong>below</strong>
<dd>Same as '-90deg'.
<dt><strong>level</strong>
<dd>Same as '0deg'.
<dt><strong>above</strong>
<dd>Same as '90deg'.
<dt><strong>higher</strong>
<dd>Adds 10 degrees to the current elevation.
<dt><strong>lower</strong>
<dd>Subtracts 10 degrees from the current elevation.
</dl>

<P>The precise means used to achieve this effect and the
number of speakers used to do so are undefined.  This property merely
identifies the desired end result.

<div class="example"><P>
<PRE>
h1   { elevation: above }
tr.a { elevation: 60deg }
tr.b { elevation: 30deg }
tr.c { elevation: level }
</pre>
</div>
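
<div class="example"><p>
The relative keywords can be sketched as follows, assuming the TH and
TD elements inherit the elevation of their parent row (the class name
is hypothetical):
<pre>
/* Assumes TH and TD inherit 30deg from the row; selectors are hypothetical. */
tr.b    { elevation: 30deg }
tr.b th { elevation: higher }  /* 30deg + 10deg = 40deg */
tr.b td { elevation: lower }   /* 30deg - 10deg = 20deg */
</pre>
</div>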

<h2><a name="voice-char-props">Voice characteristic properties</a>: <span
class="propinst-speech-rate">'speech-rate'</span>, <span
class="propinst-voice-family">'voice-family'</span>,
<span class="propinst-pitch">'pitch'</span>,
<span class="propinst-pitch-range">'pitch-range'</span>,
<span class="propinst-stress">'stress'</span>, and
<span class="propinst-richness">'richness'</span></H2>

<!-- #include src=properties/speech-rate.srb -->

<P>This property specifies the speaking rate. Note that both absolute
and relative keyword values are allowed (compare with <span
class="propinst-font-size">'font-size'</span>). Values have
the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span
class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>Specifies the speaking rate in words per minute, a quantity that varies
somewhat by language but is nevertheless widely supported by speech
synthesizers.
<dt><strong>x-slow</strong>
<dd>Same as 80 words per minute.
<dt><strong>slow</strong>
<dd>Same as 120 words per minute.
<dt><strong>medium</strong>
<dd>Same as 180 - 200 words per minute.
<dt><strong>fast</strong>
<dd>Same as 300 words per minute.
<dt><strong>x-fast</strong>
<dd>Same as 500 words per minute.
<dt><strong>faster</strong>
<dd>Adds 40 words per minute to the current speech rate.
<dt><strong>slower</strong>
<dd>Subtracts 40 words per minute from the current speech rate.
</dl>
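
<div class="example"><p>
As a sketch of the relative keywords, assume that EM and STRONG
elements inherit a rate of 160 words per minute from their parent (the
values are chosen only for illustration):
<pre>
/* Assumes EM and STRONG inherit a rate of 160 from their parent. */
body   { speech-rate: 160 }     /* 160 words per minute */
em     { speech-rate: slower }  /* 160 - 40 = 120 words per minute */
strong { speech-rate: faster }  /* 160 + 40 = 200 words per minute */
</pre>
</div>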

<!-- #include src=properties/voice-family.srb -->

<P>The value is a comma-separated, prioritized list of voice family
names (compare with <span
class="propinst-font-family">'font-family'</span>). Values have the
following meanings:</P>

<dl>
<dt><span class="index-def" title="&lt;generic-voice&gt;,
definition of"><a
name="value-def-generic-voice"><strong>&lt;generic-voice&gt;</strong></a></span>
<dd>Values are voice families. Possible values
are 'male', 'female', and 'child'.
<dt><span class="index-def" title="&lt;specific-voice&gt;::definition of"><a name="value-def-specific-voice"><strong>&lt;specific-voice&gt;</strong></a></span>
<dd>Values are specific instances (e.g., comedian, trinoids, carlos, lani).
</dl>

<div class="example"><P>

<pre>
h1 { voice-family: announcer, male }
p.part.romeo  { voice-family: romeo, male }
p.part.juliet { voice-family: juliet, female }
</pre>
</div>

<p>Names of specific voices may be quoted, and indeed must be quoted
if any of the words that make up the name does not conform to the
syntax rules for <a
href="syndata.html#tokenization">identifiers</a>. It is also
recommended to quote specific voices with a name consisting of more
than one word. If quoting is omitted, any <a
href="syndata.html#whitespace">white space</a> characters before and
after the voice family name are ignored and any sequence of white space
characters inside the voice family name is converted to a single space.
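
<div class="example"><p>
The quoting rules can be sketched as follows. The voice names "Radio
Announcer" and "2nd-narrator" are hypothetical; whether such voices
exist depends on the user agent.
<pre>
/* Illustrative only; the quoted voice names are hypothetical. */
h1  { voice-family: "Radio Announcer", male }  /* more than one word: quoting recommended */
p.a { voice-family: "2nd-narrator", female }   /* starts with a digit, so not an identifier: must be quoted */
p.b { voice-family: carlos, male }             /* a single identifier: quotes optional */
</pre>
</div>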

<!-- #include src=properties/pitch.srb -->

<p>Specifies the average pitch (a frequency) of the speaking voice.  The
average pitch of a voice depends on the voice family.  For example,
the average pitch for a standard male voice is around 120Hz,
but for a female voice, it's around 210Hz.</p>

<P>Values have the following meanings:</P>

<dl>
<dt><span class="index-inst" title="&lt;frequency&gt;"><span class="value-inst-frequency"><strong>&lt;frequency&gt;</strong></span></span>
<dd>Specifies the average pitch of the speaking voice in hertz (Hz).
<dt><strong>x-low</strong>, <strong>low</strong>,
<strong>medium</strong>, <strong>high</strong>, <strong>x-high</strong>
<dd>These values do not map to absolute frequencies since
these values depend on the voice family. User agents should map
these values to appropriate frequencies based on the voice family
and user environment. However, user agents must map these values in
order (i.e., 'x-low' is a lower frequency than 'low', etc.).
</dl>


<!-- #include src=properties/pitch-range.srb -->

<p>Specifies variation in average pitch.  The perceived pitch of a
human voice is determined by the fundamental frequency and typically
has a value of 120Hz for a male voice and 210Hz for a female voice.
Human languages are spoken with varying inflection and pitch; these
variations convey additional meaning and emphasis.  Thus, a highly
animated voice, i.e., one that is heavily inflected, displays a high
pitch range. This property specifies the range over which these
variations occur, i.e., how much the fundamental frequency may deviate
from the average pitch.

<P>Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'. A pitch range of '0' produces
a flat, monotonic voice. A pitch range of 50 produces normal
inflection.  Pitch ranges greater than 50 produce animated voices.
</dl>
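
<div class="example"><p>
As an illustration (the class names are hypothetical):
<pre>
/* Illustrative only; the selectors are hypothetical. */
p.robot   { pitch-range: 0 }   /* flat, monotonic voice */
p.plain   { pitch-range: 50 }  /* normal inflection */
p.excited { pitch-range: 80 }  /* animated voice */
</pre>
</div>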


<!-- #include src=properties/stress.srb -->

<p>Specifies the height of "local peaks" in the intonation contour
of a voice. For example, English is a <strong>stressed</strong>
language, and different parts of a sentence are assigned primary,
secondary, or tertiary stress. The value of <span
class="propinst-stress">'stress'</span> controls the amount of
inflection that results from these stress markers.  This property is a
companion to the <span
class="propinst-pitch-range">'pitch-range'</span> property and is
provided to allow developers to exploit higher-end auditory displays.

<P>Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'. The meaning of values
depends on the language being spoken. For example,
a level of '50' for a
standard, English-speaking male voice (average pitch = 122Hz), speaking
with normal intonation and emphasis would have a different
meaning than '50' for an Italian voice.
</dl>

<!-- #include src=properties/richness.srb -->

<P>Specifies the richness, or brightness, of the speaking voice.  A
rich voice will "carry" in a large room; a smooth voice will not.
(The term "smooth" refers to how the waveform looks when drawn.)

<P>Values have the following meanings:</p>

<dl>
<dt><span class="index-inst" title="&lt;number&gt;"><span class="value-inst-number"><strong>&lt;number&gt;</strong></span></span>
<dd>A value between '0' and '100'.
The higher the value, the more the voice will carry.
A lower value will produce a soft, mellifluous voice.
</dl>

<H2><a name="speech-props">Speech properties</a>:
<span class="propinst-speak-punctuation">'speak-punctuation'</span>
and <span class="propinst-speak-numeral">'speak-numeral'</span>
</h2>

<p>An additional speech property, <span
class="propinst-speak-header">'speak-header'</span>, is
described below.

<!-- #include src=properties/speak-punctuation.srb -->

<P>This property specifies how punctuation is spoken.  Values have the
following meanings:</p>

<dl>
<dt><strong>code</strong>
<dd>Punctuation such as semicolons,
braces, and so on is to be spoken literally.
<dt><strong>none</strong>
<dd>Punctuation is not to be spoken, but instead rendered
naturally as various pauses.
</dl>
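
<div class="example">
<P>For instance, an author might want punctuation read aloud in
program listings but rendered as pauses in ordinary paragraphs. The
selectors here are only one possible choice.

<pre>
@media aural {
  pre, code { speak-punctuation: code }  /* speak ";" and "{" literally */
  p         { speak-punctuation: none }  /* render punctuation as pauses */
}
</pre>
</div>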

<!-- #include src=properties/speak-numeral.srb -->

<p>This property controls how numerals are spoken.  Values have the
following meanings:</P>

<dl>
<dt><strong>digits</strong>
<dd>Speak the numeral as individual digits. Thus, "237" is spoken
"Two Three Seven".
<dt><strong>continuous</strong>
<dd>Speak the numeral as a full number. Thus, "237" is spoken
"Two hundred thirty seven". Word representations are language-dependent.
</dl>
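
<div class="example">
<P>The sketch below assumes a document that marks telephone numbers and
prices with the class names "phone" and "price". Both class names are
hypothetical and are not defined by HTML or by this specification.

<pre>
@media aural {
  .phone { speak-numeral: digits }      /* hypothetical class: "237" as "Two Three Seven" */
  .price { speak-numeral: continuous }  /* hypothetical class: "237" as a full number */
}
</pre>
</div>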

<h2><a name="aural-tables">Audio rendering of tables</a></h2>
<p>When a table is spoken by a speech generator, the relation between
the data cells and the header cells must be expressed in a different
way than by horizontal and vertical alignment. Some speech browsers
may allow a user to move around in the 2-dimensional space, thus
giving them the opportunity to map out the spatially represented
relations. When that is not possible, the style sheet must specify at
which points the headers are spoken.</p>

<h3><a name="speak-headers">Speaking headers:</a> the <span
class="propinst-speak-header">'speak-header'</span> property</h3>


<!-- #include src=properties/speak-header.srb -->

<P>This property specifies whether table headers
are spoken before every
cell, or only before a cell when that cell is associated with a
different header than the previous cell. Values have
the following meanings:</p>

<dl>
<dt><strong>once</strong>
<dd>The header is spoken one time, before a series of
cells.
<dt><strong>always</strong>
<dd>The header is spoken before every pertinent cell.
</dl>

<p>Each document language may have different mechanisms that allow
authors to specify headers. For example, in HTML 4 ([[HTML4]]),
it is possible to specify header information with three different
attributes ("headers", "scope", and "axis"), and the specification
gives an algorithm for determining header information when these
attributes have not been specified.</p>

<div class="html-example">
<div class="figure">
<P><img src="images/table1.png" alt="Image of a table created in MS
Word"><p class="caption"> Image of a table with header cells ("San
Jose" and "Seattle") that are not in the same column or row as the
data they apply to.
</div>

<p>This HTML example presents the money spent on meals, hotels and
transport in two locations (San Jose and Seattle) for successive
days. Conceptually, you can think of the table in terms of an
n-dimensional space. The headers of this space are: location, day,
category and subtotal. Some cells define marks along an axis while
others give money spent at points within this space. The markup
for this table is:</p>

<pre>
&lt;TABLE&gt;
&lt;CAPTION&gt;Travel Expense Report&lt;/CAPTION&gt;
&lt;TR&gt;
  &lt;TH&gt;&lt;/TH&gt;
  &lt;TH&gt;Meals&lt;/TH&gt;
  &lt;TH&gt;Hotels&lt;/TH&gt;
  &lt;TH&gt;Transport&lt;/TH&gt;
  &lt;TH&gt;subtotal&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH id="san-jose" axis="san-jose"&gt;San Jose&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="san-jose"&gt;25-Aug-97&lt;/TH&gt;
  &lt;TD&gt;37.74&lt;/TD&gt;
  &lt;TD&gt;112.00&lt;/TD&gt;
  &lt;TD&gt;45.00&lt;/TD&gt;
  &lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="san-jose"&gt;26-Aug-97&lt;/TH&gt;
  &lt;TD&gt;27.28&lt;/TD&gt;
  &lt;TD&gt;112.00&lt;/TD&gt;
  &lt;TD&gt;45.00&lt;/TD&gt;
  &lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="san-jose"&gt;subtotal&lt;/TH&gt;
  &lt;TD&gt;65.02&lt;/TD&gt;
  &lt;TD&gt;224.00&lt;/TD&gt;
  &lt;TD&gt;90.00&lt;/TD&gt;
  &lt;TD&gt;379.02&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH id="seattle" axis="seattle"&gt;Seattle&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="seattle"&gt;27-Aug-97&lt;/TH&gt;
  &lt;TD&gt;96.25&lt;/TD&gt;
  &lt;TD&gt;109.00&lt;/TD&gt;
  &lt;TD&gt;36.00&lt;/TD&gt;
  &lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="seattle"&gt;28-Aug-97&lt;/TH&gt;
  &lt;TD&gt;35.00&lt;/TD&gt;
  &lt;TD&gt;109.00&lt;/TD&gt;
  &lt;TD&gt;36.00&lt;/TD&gt;
  &lt;TD&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH headers="seattle"&gt;subtotal&lt;/TH&gt;
  &lt;TD&gt;131.25&lt;/TD&gt;
  &lt;TD&gt;218.00&lt;/TD&gt;
  &lt;TD&gt;72.00&lt;/TD&gt;
  &lt;TD&gt;421.25&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
  &lt;TH&gt;Totals&lt;/TH&gt;
  &lt;TD&gt;196.27&lt;/TD&gt;
  &lt;TD&gt;442.00&lt;/TD&gt;
  &lt;TD&gt;162.00&lt;/TD&gt;
  &lt;TD&gt;800.27&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TABLE&gt;
</pre>

<p>By providing the data model in this way, authors make it
possible for speech-enabled browsers to explore the table in
rich ways. For example, each cell could be spoken as a list, repeating the
applicable headers before each data cell:</p>

<pre>
  San Jose, 25-Aug-97, Meals:  37.74
  San Jose, 25-Aug-97, Hotels:  112.00
  San Jose, 25-Aug-97, Transport:  45.00
 ...
</pre>

<p>The browser could also speak the headers only when they change:</p>

<pre>
San Jose, 25-Aug-97, Meals: 37.74
    Hotels: 112.00
    Transport: 45.00
  26-Aug-97, Meals: 27.28
    Hotels: 112.00
...
</pre>
</div>
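
<div class="example">
<P>The two renderings above correspond roughly to the two values of
<span class="propinst-speak-header">'speak-header'</span>. As a
sketch, and assuming the user agent can determine the headers for each
cell, an author could request the first rendering as follows:

<pre>
@media aural {
  td { speak-header: always }  /* repeat the headers before every data cell */
}
</pre>

<P>Replacing 'always' with 'once' would instead request that headers be
spoken only when they change, as in the second rendering.
</div>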

<h2><a name="sample">Sample style sheet for HTML</a></h2>

<p>This style sheet describes a possible rendering of HTML 4:

<pre>
@media aural {
h1, h2, h3,
h4, h5, h6    { voice-family: paul, male; stress: 20; richness: 90 }
h1            { pitch: x-low; pitch-range: 90 }
h2            { pitch: x-low; pitch-range: 80 }
h3            { pitch: low; pitch-range: 70 }
h4            { pitch: medium; pitch-range: 60 }
h5            { pitch: medium; pitch-range: 50 }
h6            { pitch: medium; pitch-range: 40 }
li, dt, dd    { pitch: medium; richness: 60 }
dt            { stress: 80 }
pre, code, tt { pitch: medium; pitch-range: 0; stress: 0; richness: 80 }
em            { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
strong        { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
dfn           { pitch: high; pitch-range: 60; stress: 60 }
s, strike     { richness: 0 }
i             { pitch: medium; pitch-range: 60; stress: 60; richness: 50 }
b             { pitch: medium; pitch-range: 60; stress: 90; richness: 90 }
u             { richness: 0 }
a:link        { voice-family: harry, male }
a:visited     { voice-family: betty, female }
a:active      { voice-family: betty, female; pitch-range: 80; pitch: x-high }
}
</pre>

<h2><a name="Emacspeak">Emacspeak</a></h2>

<p>For information, here is the list of properties implemented by
Emacspeak, a speech subsystem for the Emacs editor.

<ul>
<li>voice-family
<li>stress (but with a different range of values)
<li>richness (but with a different range of values)
<li>pitch (but with differently named values)
<li>pitch-range (but with a different range of values)
</ul>

<p>(We thank T. V. Raman for the information about implementation
status of aural properties.)

</BODY>
</HTML>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-declaration:"~/SGML/HTML4.decl"
sgml-default-doctype-name:"html"
sgml-minimize-attributes:t
sgml-nofill-elements:("pre" "style" "br")
End:
-->