9090
9191 < h1 id =top > CSS Speech Module</ h1 >
9292
93- < h2 class ="no-num no-toc " id =longstatus-date > Editor's Draft 06 July 2011</ h2 >
93+ < h2 class ="no-num no-toc " id =longstatus-date > Editor's Draft 07 July 2011</ h2 >
9494
9595 < dl >
9696 < dt > This version:
9797
9898 < dd >
99- <!--<a href="http://www.w3.org/TR/2011/WD-css3-speech-20110706 ">http://www.w3.org/TR/2011/ED-css3-speech-20110706 /</a>-->
99+ <!--<a href="http://www.w3.org/TR/2011/WD-css3-speech-20110707 ">http://www.w3.org/TR/2011/ED-css3-speech-20110707 /</a>-->
100100 < a
101101 href ="http://dev.w3.org/csswg/css3-speech "> http://dev.w3.org/csswg/css3-speech</ a >
102102
@@ -442,6 +442,7 @@ <h2 id=example><span class=secno>3. </span>Example</h2>
442442 voice-family: paul;
443443 voice-stress: moderate;
444444 cue-before: url(../audio/ping.wav);
445+ voice-volume: medium 6dB;
445446}
446447p.heidi
447448{
@@ -516,13 +517,13 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
516517 < tr >
517518 < td > < em > Value:</ em >
518519
519- < td > normal | silent | x-soft | soft | medium | loud | x-loud |
520- <decibel>
520+ < td > silent | [[ x-soft | soft | medium | loud | x-loud] ||
521+ <decibel>]
521522
522523 < tr >
523524 < td > < em > Initial:</ em >
524525
525- < td > normal
526+ < td > medium
526527
527528 < tr >
528529 < td > < em > Applies to:</ em >
@@ -547,7 +548,7 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
547548 < tr >
548549 < td > < em > Computed value:</ em >
549550
550- < td > specified value
551+ < td > keyword value, and decibel offset (if not zero)
551552 </ table >
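As an illustrative sketch of the revised value grammar (the selectors below are hypothetical and do not come from the specification), a keyword and a decibel offset may be given together or on their own, whereas 'silent' stands alone:

```css
/* Hypothetical selectors, for illustration only. */
p.aside  { voice-volume: soft; }      /* keyword only */
p.alert  { voice-volume: loud 6dB; }  /* keyword plus an offset relative to 'loud' */
p.nudge  { voice-volume: -3dB; }      /* offset only: relative to the inherited level,
                                         or to the default value on the root element */
p.hidden { voice-volume: silent; }    /* not combinable with a keyword or an offset */

p.hidden em     { voice-volume: 6dB; }     /* inherited level is 'silent', so this
                                              resolves to 'silent' as well */
p.hidden strong { voice-volume: medium; }  /* a descendant can override 'silent'
                                              with a keyword and be heard again */
```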
552553
553554 < p > The ‘< a href ="#voice-volume "> < code
@@ -563,12 +564,13 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
563564 attribute of the < code > prosody</ code > element</ a > from the SSML markup
564565 language < a href ="#SSML " rel =biblioentry > [SSML]<!--{{!SSML}}--> </ a > .
565566
566- < dl >
567- < dt > < strong > normal</ strong >
568-
569- < dd >
570- < p > Corresponds to +0.0dB, which means that there is no modification of
571- volume level. This value overrides the inherited value.</ p >
567+ < dl > <!-- dt>
568+ <strong>normal</strong>
569+ </dt>
570+ <dd>
571+ <p> Corresponds to +0.0dB, which means that there is no modification of volume level. This
572+ value overrides the inherited value.</p>
573+ </dd -->
572574
573575 < dt > < strong > silent</ strong >
574576
@@ -582,9 +584,9 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
582584 ‘< code class =property > silent</ code > ’, and an element whose
583585 ‘< a href ="#speak "> < code class =property > speak</ code > </ a > ’
584586 property has the value ‘< code class =property > none</ code > ’.
585- With the former, the selected takes up the same time as if it had been
586- spoken, including any pause before and after the element, but no sound
587- is generated (descendants can override the ‘< a
587+ With the former, the selected element takes up the same time as if it
588+ were spoken, including any pause before and after the element, but no
589+ sound is generated (descendants can override the ‘< a
588590 href ="#voice-volume "> < code class =property > voice-volume</ code > </ a > ’
589591 value and may therefore generate audio output). With the latter, the
590592 selected element is not rendered in the aural dimension and no time is
@@ -598,8 +600,8 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
598600 < dd >
599601 < p > This sequence of keywords corresponds to monotonically non-decreasing
600602 volume levels, mapped to implementation-dependent values (i.e. inferred
601- by the user-agent) that meet user's requirements in terms of perceived
602- sound loudness . The keyword ‘< code
603+ by the user-agent) that meet the user's requirements in terms of
604+ perceived sound loudness . The keyword ‘< code
603605 class =property > x-soft</ code > ’ maps to the user's < em > minimum
604606 audible</ em > volume level, ‘< code
605607 class =property > x-loud</ code > ’ maps to the user's < em > maximum
@@ -614,10 +616,17 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
614616 < dd >
615617 < p > A < a href ="#number-def "> number</ a > immediately followed by "dB"
616618 (decibel unit). This represents a change (positive or negative) relative
617- to the default value for the root element, or to the inherited volume
618- level otherwise. This is expressed as the ratio of the squares of the
619- new signal amplitude (a1) and the current amplitude (a0), as per the
620- following logarithmic equation: volume(dB) = 20 log10 (a1 / a0)</ p >
619+ to the given keyword value (see enumeration above), or to the default
620+ value for the root element, or otherwise to the inherited volume level
621+ (which may itself be a combination of a keyword value and of a
622+ decibel offset). When the inherited volume level is ‘< code
623+ class =property > silent</ code > ’, this ‘< a
624+ href ="#voice-volume "> < code class =property > voice-volume</ code > </ a > ’
625+ resolves to ‘< code class =property > silent</ code > ’ too,
626+ regardless of the provided <decibel> value. Decibels express the
627+ ratio of the squares of the new signal amplitude (a1) and the current
628+ amplitude (a0), as per the following logarithmic equation: volume(dB) =
629+ 20 log10 (a1 / a0)</ p >
621630
622631 < p class =note > Note that -6.0dB is approximately half the amplitude of
623632 the audio signal, and +6.0dB is approximately twice the amplitude.</ p >
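The approximation in this note can be checked by inverting the equation. The working below is a sketch that uses only the symbols a0 and a1 already defined in this section:

```latex
\begin{align*}
\mathrm{volume(dB)} &= 20 \log_{10}\!\left(\frac{a_1}{a_0}\right)
  \quad\Longrightarrow\quad \frac{a_1}{a_0} = 10^{\,\mathrm{volume(dB)}/20} \\
+6.0\,\mathrm{dB}:\quad \frac{a_1}{a_0} &= 10^{0.3} \approx 1.995 \\
-6.0\,\mathrm{dB}:\quad \frac{a_1}{a_0} &= 10^{-0.3} \approx 0.501
\end{align*}
```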
@@ -1369,9 +1378,8 @@ <h3 id=rest-props-rest-before-after><span class=secno>8.1. </span>The
13691378 < dt > < strong > none</ strong >
13701379
13711380 < dd >
1372- < p > Equivalent to 0ms (no prosodic break in the speech output). This
1373- value can be used to inhibit a prosodic break which the processor would
1374- otherwise produce.</ p >
1381+ < p > Equivalent to 0ms (no prosodic break is produced by the speech
1382+ processor).</ p >
13751383
13761384 < dt > < strong > x-weak</ strong > , < strong > weak</ strong > ,
13771385 < strong > medium</ strong > , < strong > strong</ strong > , and
@@ -1579,23 +1587,18 @@ <h3 id=cue-props-cue-before-after><span class=secno>9.1. </span>The
15791587 < dd >
15801588 < p > A < a href ="#number-def "> number</ a > immediately followed by "dB"
15811589 (decibel unit). This represents a change (positive or negative) relative
1582- to the default sound level of audio clip. This is expressed as the ratio
1590+ to the computed value of the ‘< a href ="#voice-volume "> < code
1591+ class =property > voice-volume</ code > </ a > ’ property within the < a
1592+ href ="#aural-model "> aural "box" model</ a > of the selected element. When
1593+ the ‘< a href ="#voice-volume "> < code
1594+ class =property > voice-volume</ code > </ a > ’ property is set to
1595+ ‘< code class =property > silent</ code > ’, the audio cue is also
1596+ set to ‘< code class =property > silent</ code > ’ (regardless of
1597+ the value provided for this <decibel>). Decibels express the ratio
15831598 of the squares of the new signal amplitude (a1) and the current
15841599 amplitude (a0), as per the following logarithmic equation: volume(dB) =
15851600 20 log10 (a1 / a0)</ p >
15861601
1587- < p > Audio cues apply to the selected element within the < a
1588- href ="#aural-model "> audio "box" model</ a > , so when the inherited value
1589- from the ‘< a href ="#voice-volume "> < code
1590- class =property > voice-volume</ code > </ a > ’ property is ‘< code
1591- class =property > silent</ code > ’, the volume level for the audio cue
1592- is resolved to -infinity decibels (which effectively silences the audio
1593- cue), regardless of the value provided for this <decibel>. In
1594- other words, a selected element can be entirely silenced (i.e. including
1595- its associated audio cues) by setting the ‘< a
1596- href ="#voice-volume "> < code class =property > voice-volume</ code > </ a > ’
1597- property to ‘< code class =property > silent</ code > ’.</ p >
1598-
15991602 < p class =note > Note that -6.0dB is approximately half the amplitude of
16001603 the audio signal, and +6.0dB is approximately twice the amplitude.</ p >
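One possible illustration of how a cue's decibel offset follows 'voice-volume'. The selectors are hypothetical, the audio path is reused from the example section, and placing the <decibel> after the URI is an assumption about the cue property grammar, which this passage does not restate:

```css
/* Hypothetical selectors; the <uri> <decibel> ordering is assumed. */
h1 {
  voice-volume: soft;
  cue-before: url(../audio/ping.wav) +6dB;  /* roughly twice the amplitude of the
                                               element's computed 'soft' level */
}
h1.muted {
  voice-volume: silent;
  cue-before: url(../audio/ping.wav) +6dB;  /* the cue is silent too, whatever
                                               offset is given */
}
```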
16011604
@@ -1802,6 +1805,12 @@ <h3 id=voice-props-voice-family><span class=secno>10.1. </span>The
18021805 rel =biblioentry > [SSML]<!--{{!SSML}}--> </ a > , voice names are
18031806 space-separated and cannot contain whitespace characters.</ p >
18041807
1808+ < p > It is recommended to quote voice names that contain white space,
1809+ digits, or punctuation characters other than hyphens - even if these
1810+ voice names are valid in unquoted form - in order to improve code
1811+ clarity. For example: < code > voice-family: "john doe", "Henry
1812+ the-8th";</ code > </ p >
1813+
18051814 < dt > < strong > <age></ strong >
18061815
18071816 < dd >
@@ -1855,15 +1864,6 @@ <h3 id=voice-props-voice-family><span class=secno>10.1. </span>The
18551864voice-family: john 1st; /* identifier cannot start with digit */</ pre >
18561865 </ div >
18571866
1858- < div class =example >
1859- < p > This is an example of valid voice names that contain white space,
1860- digits, or punctuation characters other than hyphens, but which are
1861- quoted nonetheless, for reading clarity.</ p >
1862-
1863- < pre >
1864- voice-family: "john doe", "Henry the-8th";</ pre >
1865- </ div >
1866-
18671867 < h4 class =no-toc id =voice-selection > < span class =secno > 10.1.1. </ span > Voice
18681868 selection, content language</ h4 >
18691869
@@ -2079,10 +2079,12 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The ‘<a
20792079
20802080 < p > The ‘< a href ="#voice-pitch "> < code
20812081 class =property > voice-pitch</ code > </ a > ’ property specifies the
2082- average pitch of generated speech output, and depends on the ‘< a
2083- href ="#voice-family "> < code class =property > voice-family</ code > </ a > ’.
2084- For example, the default average pitch for a common male voice is around
2085- 120Hz, whereas it is around 210Hz for a female voice.
2082+ "baseline" pitch of the generated speech output, which depends on the
2083+ ‘< a href ="#voice-family "> < code
2084+ class =property > voice-family</ code > </ a > ’ instance in use, and varies across
2085+ speech synthesis processors (it approximately corresponds to the average
2086+ pitch of the output). For example, the common pitch for a male voice is
2087+ around 120Hz, whereas it is around 210Hz for a female voice.
20862088
20872089 < p class =note > Note that the functionality provided by this property is
20882090 related to the < a
@@ -2095,24 +2097,18 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The ‘<a
20952097
20962098 < dd >
20972099 < p > A value in < a href ="#frequency-def "> frequency</ a > units (Hertz or
2098- kiloHertz, e.g. "100Hz", "+2kHz"). Unless the ‘< code
2099- class =property > relative</ code > ’ keyword is used, values are
2100- restricted to positive numbers (using negative numbers results in the
2101- property value being ignored). When the ‘< code
2102- class =property > relative</ code > ’ keyword is used, the provided
2103- value specifies a relative change (decrement or increment) to the
2104- inherited value. When the ‘< code
2105- class =property > relative</ code > ’ keyword is not used, the provided
2106- value specifies the average pitch of the speaking voice, expressed as an
2107- absolute frequency.</ p >
2100+ kiloHertz, e.g. "100Hz", "+2kHz"). Values are restricted to positive
2101+ numbers (unless the ‘< code class =property > relative</ code > ’
2102+ keyword is used), and using negative numbers results in the property
2103+ value being ignored.</ p >
21082104
21092105 < dt > < strong > relative</ strong >
21102106
21112107 < dd >
21122108 < p > This keyword specifies that the provided frequency value is expressed
2113- relatively to another base value. This disambiguates absolute positive
2114- <frequency> values from increments (e.g. "+2kHz" can either be an
2115- increment or an absolute value) .</ p >
2109+ relative to the inherited value, with positive or negative numbers.
2110+ For example, "+2kHz relative" is an increment, unlike "+2kHz" which is a
2111+ positive absolute value.</ p >
21162112
21172113 < dt > < strong > <semitones></ strong >
21182114
@@ -2132,7 +2128,7 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The ‘<a
21322128 < p > Only non-negative < a href ="#percentage-def "> percentage</ a > values are
21332129 allowed. Computed values are calculated relative to the inherited value.
21342130 For example, 50% means that the inherited value gets multiplied by 0.5,
2135- which results in half the inherited average pitch of the voice.</ p >
2131+ which results in half the inherited pitch of the voice.</ p >
21362132
21372133 < dt > < strong > x-low</ strong > , < strong > low</ strong > , < strong > medium</ strong > ,
21382134 < strong > high</ strong > , < strong > x-high</ strong >
@@ -2150,8 +2146,10 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The ‘<a
21502146h1 { voice-pitch: +250Hz; } /* identical to the line above */
21512147h2 { voice-pitch: +30Hz relative; }
21522148h2 { voice-pitch: 30Hz relative; } /* identical to the line above */
2153- h3 { voice-pitch: relative -2st; } /* the swapped keyword placement is a legal syntax */
2154- h4 { voice-pitch: -2st; } /* Illegal syntax ! ("relative" keyword is missing) */</ pre >
2149+ h3 { voice-pitch: relative -20Hz; } /* the swapped keyword placement is a legal syntax */
2150+ h4 { voice-pitch: -20Hz; } /* Illegal syntax ! ("relative" keyword is missing for negative frequency) */
2151+ h4 { voice-pitch: -3.5st; } /* Legal syntax: semitones are always relative, no need for the keyword. */
2152+ </ pre >
21552153 </ div >
21562154
21572155 < h3 id =voice-props-voice-pitch-range > < span class =secno > 10.4. </ span > The
@@ -2204,11 +2202,12 @@ <h3 id=voice-props-voice-pitch-range><span class=secno>10.4. </span>The
22042202
22052203 < p > The ‘< a href ="#voice-pitch-range "> < code
22062204 class =property > voice-pitch-range</ code > </ a > ’ property specifies the
2207- variability in average pitch, i.e. how much the fundamental frequency may
2208- deviate from the average pitch. The dynamic pitch range of the generated
2209- speech output typically increases for a highly animated voice, for example
2210- when variations in inflection are used to convey meaning and emphasis in
2211- speech.
2205+ variability in the "baseline" pitch, i.e. how much the fundamental
2206+ frequency may deviate from the average pitch of the speech output. The
2207+ dynamic pitch range of the generated speech generally increases for a
2208+ highly animated voice, for example when variations in inflection are used
2209+ to convey meaning and emphasis in speech. Typically, a low range produces
2210+ a flat, monotonic voice, whereas a high range produces an animated voice.
22122211
22132212 < p class =note > Note that the functionality provided by this property is
22142213 related to the < a
@@ -2221,27 +2220,18 @@ <h3 id=voice-props-voice-pitch-range><span class=secno>10.4. </span>The
22212220
22222221 < dd >
22232222 < p > A value in < a href ="#frequency-def "> frequency</ a > units (Hertz or
2224- kiloHertz, e.g. "100Hz", "+2kHz"). Unless the ‘< code
2225- class =property > relative</ code > ’ keyword is used, values are
2226- restricted to positive numbers (using negative numbers results in the
2227- property value being ignored). When the ‘< code
2228- class =property > relative</ code > ’ keyword is used, the provided
2229- value specifies a relative change (decrement or increment) to the
2230- inherited value. When the ‘< code
2231- class =property > relative</ code > ’ keyword is not used, the provided
2232- value specifies the average pitch of the speaking voice, expressed as an
2233- absolute frequency.</ p >
2234-
2235- < p class =note > Low ranges produce a flat, monotonic voice. A high range
2236- produces animated voices.</ p >
2223+ kiloHertz, e.g. "100Hz", "+2kHz"). Values are restricted to positive
2224+ numbers (unless the ‘< code class =property > relative</ code > ’
2225+ keyword is used), and using negative numbers results in the property
2226+ value being ignored.</ p >
22372227
22382228 < dt > < strong > relative</ strong >
22392229
22402230 < dd >
22412231 < p > This keyword specifies that the provided frequency value is expressed
2242- relatively to another base value. This disambiguates absolute positive
2243- <frequency> values from increments (e.g. "+2kHz" can either be an
2244- increment or an absolute value) .</ p >
2232+ relative to the inherited value, with positive or negative numbers.
2233+ For example, "+2kHz relative" is an increment, unlike "+2kHz" which is a
2234+ positive absolute value.</ p >
22452235
22462236 < dt > < strong > <semitones></ strong >
22472237
@@ -2260,7 +2250,7 @@ <h3 id=voice-props-voice-pitch-range><span class=secno>10.4. </span>The
22602250 < p > Only non-negative < a href ="#percentage-def "> percentage</ a > values are
22612251 allowed. Computed values are calculated relative to the inherited value.
22622252 For example, 50% means that the inherited value gets multiplied by 0.5,
2263- which results in half the inherited average pitch range of the voice.</ p >
2253+ which results in half the inherited pitch range of the voice.</ p >
22642254
22652255 < dt > < strong > x-low</ strong > , < strong > low</ strong > , < strong > medium</ strong > ,
22662256 < strong > high</ strong > and < strong > x-high</ strong >
@@ -2958,10 +2948,10 @@ <h2 class=no-num id=property-index>Appendix A — Property index</h2>
29582948 < tr >
29592949 < td > < a class =property href ="#voice-volume "> voice-volume</ a >
29602950
2961- < td > normal | silent | x-soft | soft | medium | loud | x-loud |
2962- <decibel>
2951+ < td > silent | [[ x-soft | soft | medium | loud | x-loud] ||
2952+ <decibel>]
29632953
2964- < td > normal
2954+ < td > medium
29652955
29662956 < td > all elements
29672957