Commit 47b7828

finished SSML relationship prose fixes.
1 parent 808c936 commit 47b7828

2 files changed

Lines changed: 127 additions & 84 deletions

css3-speech/Overview.html

Lines changed: 70 additions & 46 deletions
@@ -436,8 +436,8 @@ <h2 id=ssml-rel><span class=secno>3. </span>Relationship with SSML</h2>
 However, the specificities of the CSS model mean that compatibility with
 SSML in terms of syntax and/or semantics is only partially achievable. The
 definition of each property in the Speech module includes informative
-statements, wherever necessary, to clarify the relationship with similar
-features in SSML.
+statements, wherever necessary, to clarify their relationship with similar
+functionality from SSML.

 <h2 id=css-values><span class=secno>4. </span>CSS values</h2>

@@ -1206,7 +1206,8 @@ <h3 id=pause-props-pause-before-after><span class=secno>9.1. </span>The
 property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
 element</a> from the SSML markup language <a href="#SSML"
-rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of prosodic
+rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of &lsquo;<a
+href="#pause"><code class=property>pause</code></a>&rsquo; prosodic
 boundaries within the <a href="#aural-model">aural "box" model</a> of CSS
 Speech requires special considerations (e.g. <a
 href="#collapsed-pauses">"collapsed" pauses</a>).
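
As a rough illustration of the "collapsed" pauses this note refers to, the sketch below sets pauses on two adjoining elements. The selectors are hypothetical, and the outcome described in the comment is an assumption based on the collapsed-pauses section of the specification, which remains the reference for the exact rules.

```css
/* Hypothetical stylesheet: selectors are chosen for illustration only. */
h1 { pause-after: strong; }
p  { pause-before: medium; }

/* Roughly comparable SSML, where each break is explicit markup:
     <break strength="strong"/> ... <break strength="medium"/>
   In CSS the two pauses adjoin at the h1/p boundary and are expected to
   collapse into a single pause (the stronger one is understood to win)
   rather than being rendered back to back. */
```
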
@@ -1482,11 +1483,15 @@ <h3 id=rest-props-rest-before-after><span class=secno>10.1. </span>The
 that occurs before (or after) the speech synthesis rendition of an element
 within the <a href="#aural-model">audio "box" model</a>.

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
 element</a> from the SSML markup language <a href="#SSML"
-rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of &lsquo;<a
+href="#rest"><code class=property>rest</code></a>&rsquo; prosodic
+boundaries within the <a href="#aural-model">aural "box" model</a> of CSS
+Speech requires special considerations (e.g. interspersed audio cues,
+additive adjacent rests).

 <dl>
 <dt> <strong>&lt;time&gt;</strong>
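
The "interspersed audio cues" consideration can be sketched as follows. The selector and sound file are hypothetical, and the ordering in the comment is an assumption drawn from the aural "box" model section rather than from this diff.

```css
/* Hypothetical selector and sound file. */
li {
  pause-before: 400ms;
  cue-before: url(bullet.wav);
  rest-before: 200ms;
}

/* Assumed rendering order within the aural "box" model:
     pause-before, cue-before, rest-before, then the spoken content.
   The rest sits between the cue and the content, which a lone SSML
   <break time="200ms"/> does not express by itself. */
```
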
@@ -1683,11 +1688,15 @@ <h3 id=cue-props-cue-before-after><span class=secno>11.1. </span>The
 clips) to be played before (or after) the selected element within the <a
 href="#aural-model">audio "box" model</a>.

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property may appear related to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code>
 element</a> from the SSML markup language <a href="#SSML"
-rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there are in fact major
+discrepancies. For example, the <a href="#aural-model">aural "box"
+model</a> means that audio cues are associated to the selected element's
+volume level, and CSS Speech's auditory icons provide limited
+functionality compared to SSML's <code>audio</code> element.

 <dl>
 <dt> <strong>&lt;uri&gt;</strong>
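
A minimal sketch of the volume association described in this note, assuming a hypothetical selector and sound file. The interpretation of the decibel offset in the comment is an assumption to be checked against the property definition.

```css
/* Hypothetical selector and sound file. */
h2 {
  voice-volume: soft;
  cue-before: url(chime.wav) -6dB;
}

/* The optional decibel offset is understood to be relative to the
   element's voice-volume level. By contrast, SSML's
     <audio src="chime.wav">fallback text</audio>
   is a standalone element that can carry its own fallback content. */
```
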
@@ -1936,11 +1945,14 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
 <p> <strong>&lt;generic-voice&gt;</strong> = [&lt;age&gt;? &lt;gender&gt;
 &lt;integer&gt;?]

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code>
 element</a> from the SSML markup language <a href="#SSML"
-rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, CSS Speech does not provide an
+equivalent to SSML's sophisticated voice language selection. This
+technical limitation may be alleviated in a future revision of the Speech
+module.

 <dl>
 <dt> <strong>&lt;name&gt;</strong>
@@ -1986,24 +1998,14 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
 <p> Possible values are &lsquo;<code class=property>child</code>&rsquo;,
 &lsquo;<code class=property>young</code>&rsquo; and &lsquo;<code
 class=property>old</code>&rsquo;, indicating the preferred age category
-to match during voice selection. The mapping with <a href="#SSML"
-rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is defined as follows:
-&lsquo;<code class=property>child</code>&rsquo; = 6 y/o, &lsquo;<code
-class=property>young</code>&rsquo; = 24 y/o, &lsquo;<code
-class=property>old</code>&rsquo; = 75 y/o (note that more flexible age
-ranges may be used by the processor-dependent voice-matching algorithm).
-</p>
+to match during voice selection.</p>

-<p class=note> Note that the interpretation of the relationship between a
-person's age and a recognizable type of voice cannot realistically be
-defined in a universal manner, as it effectively depends on numerous
-criteria (cultural, linguistic, biological, etc.). The values provided
-by this specification therefore represent a simplified model that can be
-reasonably applied to a broad variety of speech contexts, albeit at the
-cost of a certain degree of approximation. Future versions of this
-specification may refine the level of precision of the voice-matching
-algorithm, as speech processor implementations become more standardized.
-</p>
+<p class=note> Note that a recommended mapping with <a href="#SSML"
+rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is: &lsquo;<code
+class=property>child</code>&rsquo; = 6 y/o, &lsquo;<code
+class=property>young</code>&rsquo; = 24 y/o, &lsquo;<code
+class=property>old</code>&rsquo; = 75 y/o. More flexible age ranges may
+be used by the processor-dependent voice-matching algorithm.</p>

 <dt> <strong>&lt;gender&gt;</strong>

@@ -2013,6 +2015,17 @@ <h3 id=voice-props-voice-family><span class=secno>12.1. </span>The
 class=property>neutral</code>&rsquo;, specifying a male, female, or
 neutral voice, respectively.</p>

+<p class=note> Note that the interpretation of the relationship between a
+person's age or gender, and a recognizable type of voice, cannot
+realistically be defined in a universal manner as it effectively depends
+on numerous criteria (cultural, linguistic, biological, etc.). The
+functionality provided by this specification therefore represent a
+simplified model that can be reasonably applied to a broad variety of
+speech contexts, albeit at the cost of a certain degree of
+approximation. Future versions of this specification may refine the
+level of precision of the voice-matching algorithm, as speech processor
+implementations become more standardized.</p>
+
 <dt> <strong>&lt;integer&gt;</strong>

 <dd>
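
To make the generic-voice syntax and the recommended age mapping concrete, here is a small sketch. The class selectors are hypothetical, and the SSML shown in the comment is only an approximate counterpart, since the voice-matching algorithm is processor-dependent.

```css
/* Hypothetical selectors. Generic voice = [<age>? <gender> <integer>?] */
.narrator  { voice-family: old male; }
.character { voice-family: child female 2; }

/* Using the recommended mapping from the note (child = 6, young = 24,
   old = 75), a roughly comparable SSML request might be:
     <voice age="75" gender="male"> ... </voice>
     <voice age="6" gender="female" variant="2"> ... </voice> */
```
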
@@ -2184,11 +2197,14 @@ <h3 id=voice-props-voice-rate><span class=secno>12.2. </span>The &lsquo;<a
 class=property>voice-rate</code></a>&rsquo; property manipulates the rate
 of generated synthetic speech in terms of words per minute.

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code>
 attribute of the <code>prosody</code> element</a> from the SSML markup
-language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+are notable discrepancies. For example, CSS Speech rate keywords and
+percentage modifiers are not mutually-exclusive, due to how values are
+inherited and combined for selected elements.

 <dl>
 <dt> <strong>normal</strong>
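
The point about keywords and percentages not being mutually exclusive might look like this in a stylesheet. The selectors are hypothetical, and the way the values combine, as described in the comments, is an assumption to be confirmed against the property's value definitions.

```css
/* Hypothetical selectors. */
blockquote   { voice-rate: fast; }
blockquote q { voice-rate: 80%; }       /* assumed relative to the inherited rate */
.aside       { voice-rate: slow 120%; } /* keyword and percentage together */

/* In SSML, <prosody rate="..."> takes a single value per element,
   for example rate="fast" or rate="80%". */
```
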
@@ -2323,11 +2339,15 @@ <h3 id=voice-props-voice-pitch><span class=secno>12.3. </span>The &lsquo;<a
 pitch of the output). For example, the common pitch for a male voice is
 around 120Hz, whereas it is around 210Hz for a female voice.

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code>
 attribute of the <code>prosody</code> element</a> from the SSML markup
-language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+are notable discrepancies. For example, CSS Speech pitch keywords and
+relative changes (frequency, semitone or percentage) are not
+mutually-exclusive, due to how values are inherited and combined for
+selected elements.

 <dl>
 <dt> <strong>&lt;frequency&gt;</strong>
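
A comparable sketch for pitch, again with hypothetical selectors. The combination semantics noted in the comments are assumptions based on the wording of this note, not a statement of the normative behaviour.

```css
/* Hypothetical selectors. */
em     { voice-pitch: high; }
em b   { voice-pitch: +2st; }       /* assumed relative to the inherited pitch */
.alert { voice-pitch: x-high -10%; } /* keyword and relative change together */

/* In SSML, <prosody pitch="..."> takes a single value per element,
   for example pitch="high" or pitch="+2st". */
```
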
@@ -2483,11 +2503,15 @@ <h3 id=voice-props-voice-range><span class=secno>12.4. </span>The &lsquo;<a
 to convey meaning and emphasis in speech. Typically, a low range produces
 a flat, monotonic voice, whereas a high range produces an animated voice.

-<p class=note> Note that the functionality provided by this property is
-related to the <a
+<p class=note> Note that although the functionality provided by this
+property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code>
 attribute of the <code>prosody</code> element</a> from the SSML markup
-language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
+language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there
+are notable discrepancies. For example, CSS Speech pitch range keywords
+and relative changes (frequency, semitone or percentage) are not
+mutually-exclusive, due to how values are inherited and combined for
+selected elements.

 <dl>
 <dt> <strong>&lt;frequency&gt;</strong>
@@ -2689,7 +2713,7 @@ <h3 id=voice-props-voice-stress><span class=secno>12.5. </span>The
 spoken.

 <p class=note> Note that the functionality provided by this property is
-related to the <a
+similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code>
 element</a> from the SSML markup language <a href="#SSML"
 rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
@@ -2817,7 +2841,7 @@ <h3 id=mixing-props-voice-duration><span class=secno>13.1. </span>The
 property).

 <p class=note> Note that the functionality provided by this property is
-related to the <a
+similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code>
 attribute of the <code>prosody</code> element</a> from the SSML markup
 language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
@@ -2916,7 +2940,7 @@ <h2 id=content><span class=secno>15. </span>Inserted and replaced content</h2>
 unlikely to be recognized by the synthesizer. The &lsquo;<a
 href="#content-def"><code class=property>content</code></a>&rsquo;
 property can be used to replace one string by another. The functionality
-provided by this property is related to the <a
+provided by this property is similar to the <a
 href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code>
 attribute of the <code>sub</code> element</a> from the SSML markup
 language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
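
The string replacement described here can be sketched with an abbreviation whose expansion lives in a `title` attribute. The markup in the comment is hypothetical, and the rule is an illustration rather than text taken from this commit.

```css
/* Hypothetical markup:
     <abbr title="World Wide Web Consortium">W3C</abbr> */
abbr { content: attr(title); }

/* Roughly comparable SSML:
     <sub alias="World Wide Web Consortium">W3C</sub> */
```
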
@@ -3002,11 +3026,11 @@ <h2 id=pronunciation><span class=secno>16. </span> Pronunciation, phonemes</h2>
 <p> Additionally, an attribute-based mechanism can be used within the
 markup to author text-pronunciation associations. At the time of writing,
 such mechanism isn't formally defined in the W3C HTML standard(s).
-However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft
-specification</a> allows (x)HTML5 documents to contain attributes derived
-from the <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>
-specification, that describe how to pronounce text based on a particular
-phonetic alphabet.</p>
+However, the <a href="http://idpf.org/epub/30">EPUB 3.0 specification</a>
+allows (x)HTML5 documents to contain attributes derived from the <a
+href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a> specification,
+that describe how to pronounce text based on a particular phonetic
+alphabet.</p>
 <!-- p>
 One avenue to explore is the use CSS to "bind" HTML text with a
 phoneme (also declared in the HTML document). This would maintain a
