@@ -436,8 +436,8 @@ <h2 id=ssml-rel><span class=secno>3. </span>Relationship with SSML</h2>
  However, the specificities of the CSS model mean that compatibility with
  SSML in terms of syntax and/or semantics is only partially achievable. The
  definition of each property in the Speech module includes informative
- statements, wherever necessary, to clarify the relationship with similar
- features in SSML.
+ statements, wherever necessary, to clarify their relationship with similar
+ functionality from SSML.

  <h2 id=css-values><span class=secno>4. </span>CSS values</h2>

@@ -1206,7 +1206,8 @@ <h3 id=pause-props-pause-before-after><span class=secno>9.1. </span>The
  property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
  element</a> from the SSML markup language <a href="#SSML"
- rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of prosodic
+ rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of ‘<a
+ href="#pause"><code class=property>pause</code></a>’ prosodic
  boundaries within the <a href="#aural-model">aural "box" model</a> of CSS
  Speech requires special considerations (e.g. <a
  href="#collapsed-pauses">"collapsed" pauses</a>).
@@ -1482,11 +1483,15 @@ <h3 id=rest-props-rest-before-after><span class=secno>10.1. </span>The
  that occurs before (or after) the speech synthesis rendition of an element
  within the <a href="#aural-model">audio "box" model</a>.

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
  element</a> from the SSML markup language <a href="#SSML"
- rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of ‘<a
+ href="#rest"><code class=property>rest</code></a>’ prosodic
+ boundaries within the <a href="#aural-model">aural "box" model</a> of CSS
+ Speech requires special considerations (e.g. interspersed audio cues,
+ additive adjacent rests).

  <dl>
  <dt><strong>&lt;time&gt;</strong>
@@ -1683,11 +1688,15 @@ <h3 id=cue-props-cue-before-after><span class=secno>11.1. </span>The
  clips) to be played before (or after) the selected element within the <a
  href="#aural-model">audio "box" model</a>.

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property may appear related to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code>
  element</a> from the SSML markup language <a href="#SSML"
- rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there are in fact major
+ discrepancies. For example, the <a href="#aural-model">aural "box"
+ model</a> means that audio cues are associated to the selected element's
+ volume level, and CSS Speech's auditory icons provide limited
+ functionality compared to SSML's <code>audio</code> element.

  <dl>
  <dt><strong>&lt;uri&gt;</strong>
@@ -1936,11 +1945,14 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
  <p><strong>&lt;generic-voice&gt;</strong> = [&lt;age&gt;? &lt;gender&gt;
  &lt;integer&gt;?]

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code>
  element</a> from the SSML markup language <a href="#SSML"
- rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, CSS Speech does not provide an
+ equivalent to SSML's sophisticated voice language selection. This
+ technical limitation may be alleviated in a future revision of the Speech
+ module.

  <dl>
  <dt><strong>&lt;name&gt;</strong>
@@ -1986,24 +1998,14 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
  <p>Possible values are ‘<code class=property>child</code>’,
  ‘<code class=property>young</code>’ and ‘<code
  class=property>old</code>’, indicating the preferred age category
- to match during voice selection. The mapping with <a href="#SSML"
- rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is defined as follows:
- ‘<code class=property>child</code>’ = 6 y/o, ‘<code
- class=property>young</code>’ = 24 y/o, ‘<code
- class=property>old</code>’ = 75 y/o (note that more flexible age
- ranges may be used by the processor-dependent voice-matching algorithm).
- </p>
+ to match during voice selection.</p>

- <p class=note>Note that the interpretation of the relationship between a
- person's age and a recognizable type of voice cannot realistically be
- defined in a universal manner, as it effectively depends on numerous
- criteria (cultural, linguistic, biological, etc.). The values provided
- by this specification therefore represent a simplified model that can be
- reasonably applied to a broad variety of speech contexts, albeit at the
- cost of a certain degree of approximation. Future versions of this
- specification may refine the level of precision of the voice-matching
- algorithm, as speech processor implementations become more standardized.
- </p>
+ <p class=note>Note that a recommended mapping with <a href="#SSML"
+ rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is: ‘<code
+ class=property>child</code>’ = 6 y/o, ‘<code
+ class=property>young</code>’ = 24 y/o, ‘<code
+ class=property>old</code>’ = 75 y/o. More flexible age ranges may
+ be used by the processor-dependent voice-matching algorithm.</p>

  <dt><strong>&lt;gender&gt;</strong>

@@ -2013,6 +2015,17 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
  class=property>neutral</code>’, specifying a male, female, or
  neutral voice, respectively.</p>

+ <p class=note>Note that the interpretation of the relationship between a
+ person's age or gender, and a recognizable type of voice, cannot
+ realistically be defined in a universal manner as it effectively depends
+ on numerous criteria (cultural, linguistic, biological, etc.). The
+ functionality provided by this specification therefore represent a
+ simplified model that can be reasonably applied to a broad variety of
+ speech contexts, albeit at the cost of a certain degree of
+ approximation. Future versions of this specification may refine the
+ level of precision of the voice-matching algorithm, as speech processor
+ implementations become more standardized.</p>
+
  <dt><strong>&lt;integer&gt;</strong>

  <dd>
@@ -2184,11 +2197,14 @@ <h3 id=voice-props-voice-rate><span class=secno>12.2. </span>The ‘<a
  class=property>voice-rate</code></a>’ property manipulates the rate
  of generated synthetic speech in terms of words per minute.

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code>
  attribute of the <code>prosody</code> element</a> from the SSML markup
- language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+ are notable discrepancies. For example, CSS Speech rate keywords and
+ percentage modifiers are not mutually-exclusive, due to how values are
+ inherited and combined for selected elements.

  <dl>
  <dt><strong>normal</strong>
@@ -2323,11 +2339,15 @@ <h3 id=voice-props-voice-pitch><span class=secno>12.3. </span>The ‘<a
  pitch of the output). For example, the common pitch for a male voice is
  around 120Hz, whereas it is around 210Hz for a female voice.

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code>
  attribute of the <code>prosody</code> element</a> from the SSML markup
- language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+ are notable discrepancies. For example, CSS Speech pitch keywords and
+ relative changes (frequency, semitone or percentage) are not
+ mutually-exclusive, due to how values are inherited and combined for
+ selected elements.

  <dl>
  <dt><strong>&lt;frequency&gt;</strong>
@@ -2483,11 +2503,15 @@ <h3 id=voice-props-voice-range><span class=secno>12.4. </span>The ‘<a
  to convey meaning and emphasis in speech. Typically, a low range produces
  a flat, monotonic voice, whereas a high range produces an animated voice.

- <p class=note>Note that the functionality provided by this property is
- related to the <a
+ <p class=note>Note that although the functionality provided by this
+ property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code>
  attribute of the <code>prosody</code> element</a> from the SSML markup
- language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+ language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+ are notable discrepancies. For example, CSS Speech pitch range keywords
+ and relative changes (frequency, semitone or percentage) are not
+ mutually-exclusive, due to how values are inherited and combined for
+ selected elements.

  <dl>
  <dt><strong>&lt;frequency&gt;</strong>
@@ -2689,7 +2713,7 @@ <h3 id=voice-props-voice-stress><span class=secno>12.5. </span>The
  spoken.

  <p class=note>Note that the functionality provided by this property is
- related to the <a
+ similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code>
  element</a> from the SSML markup language <a href="#SSML"
  rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
@@ -2817,7 +2841,7 @@ <h3 id=mixing-props-voice-duration><span class=secno>13.1. </span>The
  property).

  <p class=note>Note that the functionality provided by this property is
- related to the <a
+ similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code>
  attribute of the <code>prosody</code> element</a> from the SSML markup
  language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
@@ -2916,7 +2940,7 @@ <h2 id=content><span class=secno>15. </span>Inserted and replaced content</h2>
  unlikely to be recognized by the synthesizer. The ‘<a
  href="#content-def"><code class=property>content</code></a>’
  property can be used to replace one string by another. The functionality
- provided by this property is related to the <a
+ provided by this property is similar to the <a
  href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code>
  attribute of the <code>sub</code> element</a> from the SSML markup
  language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
@@ -3002,11 +3026,11 @@ <h2 id=pronunciation><span class=secno>16. </span> Pronunciation, phonemes</h2>
  <p>Additionally, an attribute-based mechanism can be used within the
  markup to author text-pronunciation associations. At the time of writing,
  such mechanism isn't formally defined in the W3C HTML standard(s).
- However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft
- specification </a> allows (x)HTML5 documents to contain attributes derived
- from the <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>
- specification, that describe how to pronounce text based on a particular
- phonetic alphabet.</p>
+ However, the <a href="http://idpf.org/epub/30">EPUB 3.0 specification </a>
+ allows (x)HTML5 documents to contain attributes derived from the <a
+ href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a> specification,
+ that describe how to pronounce text based on a particular phonetic
+ alphabet.</p>
  <!-- p>
  One avenue to explore is the use CSS to "bind" HTML text with a
  phoneme (also declared in the HTML document). This would maintain a