saschanaz
diff --git a/css3-speech/Overview.html b/css3-speech/Overview.html
Lines changed: 101 additions & 53 deletions
@@ -427,7 +427,7 @@ <h2 id=example><span class=secno>3. </span>Example</h2>
   <div class=example>
    <p>This example shows how authors can tell the speech synthesizer to speak
     HTML headings with a voice called "paul", using "moderate" emphasis
-    (which is more than normal) and how to insert an audio cue (prerecorded
+    (which is more than normal) and how to insert an audio cue (pre-recorded
     audio clip located at the given URL) before the start of TTS rendering
     for each heading. In a stereo-capable sound system, paragraphs marked
     with the CSS class "heidi" are rendered on the left audio channel (and
@@ -554,9 +554,9 @@ <h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
   </table>
 
   <p>The &lsquo;<a href="#voice-volume"><code
-   class=property>voice-volume</code></a>&rsquo; property manipulates the
-   amplitude of the audio waveform generated by the speech synthesiser, and
-   is also used when calculating the relative volume level of <a
+   class=property>voice-volume</code></a>&rsquo; property allows authors to
+   control the amplitude of the audio waveform generated by the speech
+   synthesiser, and is also used to adjust the relative volume level of <a
    href="#cue-props">audio cues</a> within the <a href="#aural-model">audio
    "box" model</a>.
 
@@ -1225,9 +1225,9 @@ <h3 id=pause-props-pause><span class=secno>7.2. </span>The &lsquo;<a
      <td> <em>Value:</em>
 
      <td>&lt;&lsquo;<a href="#pause-before"><code
-      class=property>pause-before</code></a>&rsquo;&gt; || &lt;&lsquo;<a
+      class=property>pause-before</code></a>&rsquo;&gt; &lt;&lsquo;<a
       href="#pause-after"><code
-      class=property>pause-after</code></a>&rsquo;&gt;
+      class=property>pause-after</code></a>&rsquo;&gt;?
 
     <tr>
      <td> <em>Initial:</em>
@@ -1495,9 +1495,9 @@ <h3 id=rest-props-rest><span class=secno>8.2. </span>The &lsquo;<a
      <td> <em>Value:</em>
 
      <td>&lt;&lsquo;<a href="#rest-before"><code
-      class=property>rest-before</code></a>&rsquo;&gt; || &lt;&lsquo;<a
+      class=property>rest-before</code></a>&rsquo;&gt; &lt;&lsquo;<a
       href="#rest-after"><code
-      class=property>rest-after</code></a>&rsquo;&gt;
+      class=property>rest-after</code></a>&rsquo;&gt;?
 
     <tr>
      <td> <em>Initial:</em>
@@ -1639,8 +1639,8 @@ <h3 id=cue-props-cue-before-after><span class=secno>9.1. </span>The
   <p>The &lsquo;<a href="#cue-before"><code
    class=property>cue-before</code></a>&rsquo; and &lsquo;<a
    href="#cue-after"><code class=property>cue-after</code></a>&rsquo;
-   properties specify auditory icons (i.e. prerecorded audio clips) to be
-   played before (or after) the selected element within the <a
+   properties specify auditory icons (i.e. pre-recorded / pre-generated sound
+   clips) to be played before (or after) the selected element within the <a
    href="#aural-model">audio "box" model</a>.
 
   <p class=note> Note that the functionality provided by this property is
@@ -1670,15 +1670,61 @@ <h3 id=cue-props-cue-before-after><span class=secno>9.1. </span>The
      (decibel unit). This represents a change (positive or negative) relative
      to the computed value of the &lsquo;<a href="#voice-volume"><code
      class=property>voice-volume</code></a>&rsquo; property within the <a
-     href="#aural-model">aural "box" model</a> of the selected element. When
-     the &lsquo;<a href="#voice-volume"><code
+     href="#aural-model">aural "box" model</a> of the selected element.
+     Decibels express the ratio of the squares of the new signal amplitude
+     (a1) and the current amplitude (a0), as per the following logarithmic
+     equation: volume(dB) = 20 log10 (a1 / a0)</p>
+
+    <p> When the &lsquo;<a href="#voice-volume"><code
      class=property>voice-volume</code></a>&rsquo; property is set to
      &lsquo;<code class=property>silent</code>&rsquo;, the audio cue is also
      set to &lsquo;<code class=property>silent</code>&rsquo; (regardless of
-     the value specified for this &lt;decibel&gt;). Decibels express the
-     ratio of the squares of the new signal amplitude (a1) and the current
-     amplitude (a0), as per the following logarithmic equation: volume(dB) =
-     20 log10 (a1 / a0)</p>
+     this specified &lt;decibel&gt; value). Otherwise (when not &lsquo;<code
+     class=property>silent</code>&rsquo;), &lsquo;<a
+     href="#voice-volume"><code class=property>voice-volume</code></a>&rsquo;
+     values are always specified relative to the volume level keywords,
+     which map to a user-configured scale of "preferred" loudness settings
+     (see the definition of &lsquo;<a href="#voice-volume"><code
+     class=property>voice-volume</code></a>&rsquo;). If the inherited
+     &lsquo;<a href="#voice-volume"><code
+     class=property>voice-volume</code></a>&rsquo; value already contains a
+     decibel offset, the dB offset specific to the audio cue is combined
+     additively.
+
+    <p> The desired effect of an audio cue set at +0dB is that the volume
+     level during playback of the pre-recorded / pre-generated audio signal
+     is effectively the same as the volume level of live (i.e. real-time)
+     speech synthesis rendition. In order to achieve this effect, speech
+     processors are capable of directly controlling the waveform amplitude of
+     generated text-to-speech audio, user agents must be able to adjust the
+     volume output of audio cues (i.e. amplify or attenuate audio signals
+     based on the intrinsic waveform amplitude of sound clips), and last but
+     not least, authors must ensure that the "normal" volume level of
+     pre-recorded audio cues (on average, as there may be discrete variations
+     due to changes in the audio stream, such as intonation, stress, etc.)
+     matches that of a "typical" TTS voice output (based on the &lsquo;<a
+     href="#voice-family"><code class=property>voice-family</code></a>&rsquo;
+     intended for use), given standard listening conditions (i.e. default
+     system volume levels, centered equalization across the frequency
+     spectrum). This latter prerequisite sets a baseline that enables a user
+     agent to align the volume outputs of both TTS and cue audio streams
+     within the same "aural box model". Due to the complex relationship
+     between perceived audio characteristics and the processing applied to
+     the digitized audio signal, we will simplify the definition of "normal"
+     volume levels by referring to a canonical recording scenario, whereby
+     the attenuation is typically indicated in decibels, ranging from 0dB
+     (maximum audio input, near clipping threshold) to -60dB (total silence).
+     In this common context, a "standard" audio clip would oscillate between
+     these values, the loudest peak levels would be close to -3dB (to avoid
+     distortion), and the audible passages would have average volume levels
+     as high as possible (i.e. not too quiet, to avoid background noise
+     during amplification). This would roughly provide an audio experience
+     that could be seamlessly combined with text-to-speech output (i.e. there
+     would be no discernible difference in volume levels when switching from
+     pre-recorded audio to speech synthesis). Although there exists no
+     industry-wide standard to back up such a convention, TTS engines usually
+     generate comparably loud audio signals when no amplification (or
+     attenuation) is specified.</p>
 
     <p class=note> Note that -6.0dB is approximately half the amplitude of
      the audio signal, and +6.0dB is approximately twice the amplitude.</p>
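Reviewer illustration (not part of the patch): the decibel arithmetic in the hunk above can be checked with a short sketch. It assumes the stated equation volume(dB) = 20 log10 (a1 / a0), the additive combination of an inherited dB offset with the cue's own offset, and the rule that a 'silent' voice-volume silences the cue. All function and parameter names are hypothetical and do not come from the specification.

```python
import math


def amplitude_ratio_to_db(a1: float, a0: float) -> float:
    """Return the change in dB between current amplitude a0 and new amplitude a1."""
    return 20.0 * math.log10(a1 / a0)


def db_to_amplitude_ratio(db: float) -> float:
    """Inverse of the equation above: the amplitude ratio a1 / a0 for a dB change."""
    return 10.0 ** (db / 20.0)


def cue_linear_gain(voice_is_silent: bool,
                    inherited_offset_db: float,
                    cue_offset_db: float) -> float:
    """Amplitude gain for a cue, relative to the voice-volume keyword level.

    Assumption: a 'silent' voice-volume forces the cue to silence regardless
    of its own dB value; otherwise the two dB offsets are summed.
    """
    if voice_is_silent:
        return 0.0
    return db_to_amplitude_ratio(inherited_offset_db + cue_offset_db)


if __name__ == "__main__":
    print(round(amplitude_ratio_to_db(2.0, 1.0), 2))  # doubling the amplitude
    print(round(db_to_amplitude_ratio(-6.0), 3))      # roughly half the amplitude
    print(cue_linear_gain(False, -6.0, 6.0))          # the two offsets cancel out
```

Doubling the amplitude comes out slightly above 6 dB, which is why the note describes the -6.0dB and +6.0dB figures as approximate.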
@@ -1906,15 +1952,16 @@ <h3 id=voice-props-voice-family><span class=secno>10.1. </span>The
      ranges may be used by the processor-dependent voice-matching algorithm).
      </p>
 
-    <p class=note> The interpretation of the relationship between a person's
-     age and a recognizable type of voice cannot realistically be defined in
-     a universal manner, as it effectively depends on numerous cultural and
-     linguistic variations. The values provided by this specification
-     therefore represent a simplified model that can be reasonably applied to
-     a great variety of speech locales, albeit at the cost of a certain
-     degree of approximation. Future versions of this specification may
-     refine the level of precision of the voice-matching algorithm, as speech
-     processor implementations become more standardized.</p>
+    <p class=note> Note that the interpretation of the relationship between a
+     person's age and a recognizable type of voice cannot realistically be
+     defined in a universal manner, as it effectively depends on numerous
+     criteria (cultural, linguistic, biological, etc.). The values provided
+     by this specification therefore represent a simplified model that can be
+     reasonably applied to a broad variety of speech contexts, albeit at the
+     cost of a certain degree of approximation. Future versions of this
+     specification may refine the level of precision of the voice-matching
+     algorithm, as speech processor implementations become more standardized.
+     </p>
 
    <dt> <strong>&lt;gender&gt;</strong>
 
@@ -2218,10 +2265,11 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The &lsquo;<a
     <tr>
      <td> <em>Computed value:</em>
 
-     <td> one of the predefined keywords if only the keyword is specified by
-      itself, otherwise a fixed frequency calculated by converting the
-      keyword value (if any) to an absolute value based on the current
-      voice-family and by applying the specified relative offset (if any)
+     <td> one of the predefined pitch keywords if only the keyword is
+      specified by itself, otherwise an absolute frequency calculated by
+      converting the keyword value (if any) to a fixed frequency based on the
+      current voice-family and by applying the specified relative offset (if
+      any)
   </table>
 
   <p>The &lsquo;<a href="#voice-pitch"><code
@@ -2306,14 +2354,14 @@ <h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The &lsquo;<a
      the conversion from a keyword to a concrete, voice-dependent frequency.</p>
   </dl>
 
-  <p> Computed absolute frequency values that are negative are clamped to
-   zero Hertz. Speech-capable user agents are likely to support a specific
-   range of values rather than the full range of possible calculated
-   numerical values for frequencies. The actual values in user agents may
-   therefore be clamped to implementation-dependent minimum and maximum
-   boundaries. For example: although the 0Hz frequency can be legitimately
-   calculated, it may be clamped to a more meaningful value in the context of
-   the speech synthesizer.
+  <p> Computed absolute frequencies that are negative are clamped to zero
+   Hertz. Speech-capable user agents are likely to support a specific range
+   of values rather than the full range of possible calculated numerical
+   values for frequencies. The actual values in user agents may therefore be
+   clamped to implementation-dependent minimum and maximum boundaries. For
+   example: although the 0Hz frequency can be legitimately calculated, it may
+   be clamped to a more meaningful value in the context of the speech
+   synthesizer.
 
   <div class=example>
    <p>Examples of property values:</p>
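Reviewer illustration (not part of the patch): the two clamping steps described in the paragraph above could look roughly like this. Negative computed frequencies are first clamped to 0Hz, then the result may be clamped to a range the synthesizer supports. The 50Hz and 500Hz bounds are made-up placeholders for implementation-dependent limits, and the function name is hypothetical.

```python
def clamp_frequency_hz(computed_hz: float,
                       ua_min_hz: float = 50.0,
                       ua_max_hz: float = 500.0) -> float:
    """Clamp a computed absolute frequency the way the paragraph describes.

    ua_min_hz and ua_max_hz stand in for implementation-dependent boundaries;
    the specification does not define concrete values for them.
    """
    # Step 1: negative computed frequencies are clamped to zero Hertz.
    non_negative_hz = max(0.0, computed_hz)
    # Step 2: the user agent may clamp further to the range it supports.
    return min(max(non_negative_hz, ua_min_hz), ua_max_hz)


if __name__ == "__main__":
    for hz in (-20.0, 0.0, 120.0, 900.0):
        print(hz, "->", clamp_frequency_hz(hz))
```

With these placeholder bounds, a legitimately calculated 0Hz value is raised to the lower bound, matching the example given in the paragraph.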
@@ -2377,10 +2425,11 @@ <h3 id=voice-props-voice-range><span class=secno>10.4. </span>The &lsquo;<a
     <tr>
      <td> <em>Computed value:</em>
 
-     <td> one of the predefined keywords if only the keyword is specified by
-      itself, otherwise a fixed frequency calculated by converting the
-      keyword value (if any) to an absolute value based on the current
-      voice-family and by applying the specified relative offset (if any)
+     <td> one of the predefined pitch keywords if only the keyword is
+      specified by itself, otherwise an absolute frequency calculated by
+      converting the keyword value (if any) to a fixed frequency based on the
+      current voice-family and by applying the specified relative offset (if
+      any)
   </table>
 
   <p> The &lsquo;<a href="#voice-range"><code
@@ -2465,14 +2514,14 @@ <h3 id=voice-props-voice-range><span class=secno>10.4. </span>The &lsquo;<a
      the conversion from a keyword to a concrete, voice-dependent frequency.</p>
   </dl>
 
-  <p> Computed absolute frequency values that are negative are clamped to
-   zero Hertz. Speech-capable user agents are likely to support a specific
-   range of values rather than the full range of possible calculated
-   numerical values for frequencies. The actual values in user agents may
-   therefore be clamped to implementation-dependent minimum and maximum
-   boundaries. For example: although the 0Hz frequency can be legitimately
-   calculated, it may be clamped to a more meaningful value in the context of
-   the speech synthesizer.
+  <p> Computed absolute frequencies that are negative are clamped to zero
+   Hertz. Speech-capable user agents are likely to support a specific range
+   of values rather than the full range of possible calculated numerical
+   values for frequencies. The actual values in user agents may therefore be
+   clamped to implementation-dependent minimum and maximum boundaries. For
+   example: although the 0Hz frequency can be legitimately calculated, it may
+   be clamped to a more meaningful value in the context of the speech
+   synthesizer.
 
   <div class=example>
    <p>Examples of inherited values:</p>
@@ -3000,8 +3049,8 @@ <h2 class=no-num id=property-index>Appendix A &mdash; Property index</h2>
     <tr>
      <th><a class=property href="#pause">pause</a>
 
-     <td>&lt;&lsquo;pause-before&rsquo;&gt; ||
-      &lt;&lsquo;pause-after&rsquo;&gt;
+     <td>&lt;&lsquo;pause-before&rsquo;&gt;
+      &lt;&lsquo;pause-after&rsquo;&gt;?
 
      <td>N/A (see individual properties)
 
@@ -3046,8 +3095,7 @@ <h2 class=no-num id=property-index>Appendix A &mdash; Property index</h2>
     <tr>
      <th><a class=property href="#rest">rest</a>
 
-     <td>&lt;&lsquo;rest-before&rsquo;&gt; ||
-      &lt;&lsquo;rest-after&rsquo;&gt;
+     <td>&lt;&lsquo;rest-before&rsquo;&gt; &lt;&lsquo;rest-after&rsquo;&gt;?
 
      <td>N/A (see individual properties)