bocoup
diff --git a/css-text/Overview.bs b/css-text/Overview.bs
Lines changed: 98 additions & 46 deletions
@@ -140,59 +140,113 @@ Values</h3>
 <h3 id="terms">
 Terminology</h3>
 
-  <p><dfn>semantically-perceived character</dfn>
-  <p><dfn>visually-perceived character</dfn>
-  <p><dfn>semantically-perceived letters</dfn>
-  <p><dfn>visually-perceived letters</dfn>
-
-  <p id="grapheme-cluster">A <dfn>grapheme cluster</dfn> is what
-    a language user considers to be a character or a basic unit of the
-    script. The term is described in detail in the Unicode Technical
-    Report: Text Boundaries [[!UAX29]]. This specification uses the
-    <em>extended grapheme cluster</em> definition in [[!UAX29]] (not
-    the <em>legacy grapheme cluster</em> definition).
-    The UA may further tailor the definition as required by typographical tradition.
+  <p>In addition to the terms defined below,
+    other terminology and concepts used in this specification are defined
+    in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
+
+<h4 id="characters">
+Characters and Letters</h4>
+
+  <p>The basic unit of typesetting is the <dfn>character</dfn>.
+  However, because writing systems are not always as simple as the basic English alphabet,
+  what a <i>character</i> actually is depends on the context in which the term is used.
+  For example, in Hangul (the Korean writing system),
+  each square representation of a syllable can be considered a <i>character</i>.
+  However, the square symbol is really composed of multiple symbols each representing a phoneme,
+  and these also could each be considered a <i>character</i>.
+  A basic unit of computer text encoding, for any given encoding, is also called a <i>character</i>,
+  and depending on the encoding, a single encoding <i>character</i> might correspond
+  either to a single phonemic <i>character</i>
+  or to a unitary pre-composed syllabic <i>character</i>.
+  In turn, a single encoding <i>character</i> can be represented in the data stream as one or more bytes;
+  and in programming environments one or a pair of such bytes is sometimes also called a <i>character</i>.
+    
+  <p>For text layout, the relevant unit is
+  the “user-perceived character”, also known as the <dfn>grapheme cluster</dfn>.
+  It is roughly equivalent to what a <em>language user</em> (as opposed to a computer programmer)
+  considers to be a <i>character</i> or basic unit of the script.
+  This term is described in detail in the Unicode Technical Report: Text Boundaries [[!UAX29]]. 
+  Since even typesetting alone requires different notions of <i>grapheme clusters</i>
+  depending on the application, CSS introduces the following terms:
+
+  <dl>
+    <dt><dfn>semantically-perceived character</dfn>
+    <dd>
+      <p>Represents a unit of the writing system,
+      such as a Latin alphabetic letter (including its diacritics),
+      Hangul syllable,
+      Chinese ideographic character,
+      or Myanmar syllable cluster,
+      that is indivisible with regard to segmentation
+      (line-breaking, first-letter effects, etc.).
+    
+      <p>The UA must interpret this as an <em>extended grapheme cluster</em>
+      (not <em>legacy grapheme cluster</em>) as defined in [[!UAX29]].
+      However, the UA should tailor the definition as required by typographical tradition,
+      since the default rules are not always appropriate.
+
+    <dt><dfn>visually-perceived character</dfn>
+    <dd>
+      <p>Represents a unit of the writing system
+      that is indivisible with regard to spacing
+      (letter-spacing, justification, etc.).
+  </dl>
 
+  <p>The UA must interpret both <i>visually-perceived characters</i> and <i>semantically-perceived characters</i>
+  as <em>extended grapheme clusters</em> (not <em>legacy grapheme clusters</em>)
+  as defined in [[!UAX29]].
+  However, the UA may tailor the definitions as required by typographical tradition,
+  since the default rules are not always appropriate or ideal,
+  and is expected to tailor them differently
+  for <i>visually-perceived characters</i> than for <i>semantically-perceived characters</i>
+  as needed.
+  
   <div class="example">
     <p>For example,
-      in some scripts such as Myanmar or Devanagari,
-      the typographic unit for 'letter-spacing' is an entire syllable,
-      which may include multiple [[!UAX29]] <i>grapheme clusters</i>.
-  </div>
+    in some scripts such as Myanmar or Devanagari,
+    the typographic unit for both justification and line-breaking
+    (<i>visually-perceived characters</i> and <i>semantically-perceived characters</i>)
+    is an entire syllable,
+    which can include multiple [[!UAX29]] <i>grapheme clusters</i>.
 
-  <div class="example">
     <p>In other scripts such as Thai or Lao,
-      the typographic unit for 'letter-spacing' is more than a single Unicode codepoint,
-      but less than a [[!UAX29]] <i>grapheme cluster</i>,
-      and may require decomposition or other substitutions.
+    even though a <i>semantically-perceived character</i> matches the default <i>grapheme cluster</i> definition,
+    a <i>visually-perceived character</i>
+    is <em>less</em> than a [[!UAX29]] <i>grapheme cluster</i>,
+    and may require decomposition or other substitutions before spacing can be inserted.
 
-    <p>For example,
-      to properly letter-space the Thai word คำ (U+0E04 + U+0E33),
-      the U+0E33 needs to be decomposed into U+0E4D + U+0E32,
-      and then the extra letter-space inserted before the U+0E32: คํ า.
+    <p>For instance,
+    to properly letter-space the Thai word คำ (U+0E04 + U+0E33),
+    the U+0E33 needs to be decomposed into U+0E4D + U+0E32,
+    and then the extra letter-space inserted before the U+0E32: คํ า.
 
     <p>A slightly more complex example is น้ำ (U+0E19 + U+0E49 + U+0E33).
-       In this case, normal Thai shaping will first decompose the U+0E33 into U+0E4D + U+0E32
-       and then swap the U+0E4D with the U+0E49, giving U+0E19 + U+0E4D + U+0E49 + U+0E32.
-       As before the extra letter-space is then inserted before the U+0E32: นํ้ า.
+     In this case, normal Thai shaping will first decompose the U+0E33 into U+0E4D + U+0E32
+     and then swap the U+0E4D with the U+0E49, giving U+0E19 + U+0E4D + U+0E49 + U+0E32.
+     As before the extra letter-space is then inserted before the U+0E32: นํ้ า.
   </div>
 
-  <p>Within this specification,
-    the ambiguous term <dfn>character</dfn> is used as a friendlier synonym
-    for <i>grapheme cluster</i>.
-    See <a href="http://www.w3.org/TR/css3-writing-modes/#character-properties">Characters and Properties</a>
-    for how to determine the Unicode properties of a character.
+  <p>A <dfn>letter</dfn> for the purpose of this specification
+  is a <i>character</i> belonging to one of the Letter or Number general
+  categories in Unicode. [[!UAX44]]
+  To be more precise,
+  a <dfn>semantically-perceived letter</dfn> is a <i>semantically-perceived character</i>
+  belonging to one of the Letter or Number general categories
+  and a <dfn>visually-perceived letter</dfn> is likewise a <i>visually-perceived character</i>
+  belonging to one of the Letter or Number general categories.
+
+  See <a href="http://www.w3.org/TR/css3-writing-modes/#character-properties">Characters and Properties</a>
+  for how to determine the Unicode properties of a <i>character</i>.
 
-  <p id="letter">A <dfn>letter</dfn> for the purpose of this specification
-    is a <i>semantically-perceived character</i> belonging to one of the Letter or Number general
-    categories in Unicode. [[!UAX44]]
+  <p>The rendering characteristics of a <i>character</i> divided
+  by an element boundary are undefined:
+  it may be rendered as belonging to either side of the boundary,
+  or as some approximation of belonging to both.
+  Authors are forewarned that dividing <i>grapheme clusters</i>
+  by element boundaries may give inconsistent or undesired results.
 
-  <p>The rendering characteristics of a <i>semantically-perceived character</i>
-    and a <i>visually-perceived character</i> divided by an
-    element boundary is undefined: it may be rendered as belonging to
-    either side of the boundary, or as some approximation of belonging
-    to both. Authors are forewarned that dividing grapheme clusters by
-    element boundaries may give inconsistent or undesired results.
+<h4 id="languages">
+Languages and Typesetting</h4>
 
   <p>The <dfn>content language</dfn> of an element is the (human) language
     the element is declared to be in, according to the rules of the
@@ -208,11 +262,9 @@ Terminology</h3>
   <p class="note">
     Many typographic effects vary by linguistic context.
     In CSS, language-specific typographic tailorings
-    are only applied when the content language is known.
-    Authors should tag their content accurately for the best typographic behavior.
-
-  <p>Other terminology and concepts used in this specification are defined
-    in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
+    are only applied when the content language is known (declared).
+  
+  <strong>Authors should tag their content accurately for the best typographic behavior.</strong>
 
 <h2 id="transforming">
   Transforming Text</h2>