Title: CSS Text Module Level 4
Shortname: css-text
Level: 4
Status: ED
Work Status: Exploring
Group: csswg
ED: https://drafts.csswg.org/css-text-4/
TR: https://www.w3.org/TR/css-text-4/
Editor: Elika J. Etemad / fantasai, Invited Expert, http://fantasai.inkedblade.net/contact, w3cid 35400
Editor: Koji Ishii, Google, kojiishi@gmail.com, w3cid 45369
Editor: Alan Stearns, Adobe Systems, stearns@adobe.com, w3cid 46659
Abstract: This module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.
Ignored terms: segment break, segment breaks

Introduction

Issue: Add final level 3 content

Transforming Text

Issue: Add final level 3 content

White Space Processing

Issue: Add final level 3 tab-size and processing details

White Space Collapsing: the 'text-space-collapse' property

	Name: text-space-collapse
	Value: collapse | discard | preserve | preserve-breaks | preserve-spaces
	Initial: collapse
	Applies to: all elements
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property declares whether and how white space inside the element is collapsed. Values have the following meanings, which must be interpreted according to the white space processing rules:
collapse
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character).
preserve
This value prevents user agents from collapsing sequences of white space. Segment breaks are preserved as forced line breaks.
preserve-breaks
This value collapses white space as for ''collapse'', but preserves segment breaks as forced line breaks.
preserve-spaces
This value prevents user agents from collapsing sequences of white space, and converts tabs and segment breaks to spaces. (This value is intended to match the behavior of xml:space="preserve" in SVG.)
discard
This value directs user agents to “discard” all white space in the element.
Issue: Does this preserve line break opportunities or not? Do we need a ''hide'' value?
The following style rules implement MathML's white space processing:
			@namespace m "http://www.w3.org/1998/Math/MathML";
			m|* {
				text-space-collapse: discard;
			}
			m|mi, m|mn, m|mo, m|ms, m|mtext {
				text-space-trim: trim-inner;
			}
		
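The following sketch further illustrates two values from the list above. ''preserve-breaks'' is used for content whose line breaks carry meaning, and ''preserve-spaces'' is used for content marked xml:space="preserve". The class name `verse` is hypothetical. The attribute selector assumes that xml:space is exposed in the XML namespace.

```css
@namespace xml "http://www.w3.org/XML/1998/namespace";

/* Line breaks are meaningful here: keep segment breaks as forced
   line breaks, but still collapse other runs of white space. */
address, .verse { text-space-collapse: preserve-breaks; }

/* Match xml:space="preserve": keep every space, and convert tabs and
   segment breaks to spaces. */
*[xml|space="preserve"] { text-space-collapse: preserve-spaces; }
```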

White Space Trimming: the 'text-space-trim' property

	Name: text-space-trim
	Value: none | trim-inner || discard-before || discard-after
	Initial: none
	Applies to: all elements
	Inherited: no
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property allows authors to specify trimming behavior at the beginning and end of a box. Values have the following meanings, which must be interpreted according to the white space processing rules:
trim-inner
For block containers, this value directs UAs to discard all white space at the beginning of the element up to and including the last segment break before the first non-white-space character in the element. It also directs UAs to discard all white space at the end of the element, starting with the first segment break after the last non-white-space character in the element. For other elements, this value directs UAs to discard all white space at the beginning and end of the element.
discard-before
This value directs the UA to collapse all collapsible white space immediately before the start of the element.
discard-after
This value directs the UA to collapse all collapsible white space immediately after the end of the element.

The following style rules render DT elements as a comma-separated list:

			dt { display: inline; }
			dt + dt:before { content: ", "; text-space-trim: discard-before; }
		
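As a further sketch of ''trim-inner'' on a block container, assume a code listing (hypothetical class `listing`) whose source begins with a line break right after the opening tag and ends with a line break right before the closing tag. The rule below discards that leading segment break. The indentation after it is kept, because that indentation comes after the last segment break before the first non-white-space character.

```css
/* Preserve white space inside the listing. Discard everything up to and
   including the last segment break before the first visible character,
   and everything from the first segment break after the last visible
   character to the end of the element. */
pre.listing {
  text-space-collapse: preserve;
  text-space-trim: trim-inner;
}
```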

Line Breaking and Word Boundaries

Issue: Add final level 3 content

Text Wrapping

Text wrapping is controlled by the 'text-wrap', 'wrap-before', 'wrap-after', 'wrap-inside', and 'overflow-wrap' properties.

Issue: Add final level 3 'overflow-wrap'

Text Wrap Settings: the 'text-wrap' property

	Name: text-wrap
	Value: wrap | nowrap | balance | multi-line
	Initial: wrap
	Applies to: all elements
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property specifies the mode for text wrapping. Possible values:
wrap
Lines may break at allowed break points, as determined by the line-breaking rules in effect. Line breaking behavior defined for the WJ, ZW, and GL line-breaking classes in [[!UAX14]] must be honored. The exact algorithm is UA-defined. The algorithm may consider multiple lines when making break decisions. The UA may bias for speed over best layout.
nowrap
Lines may not break; text that does not fit within the block container overflows it.
balance
Same as ''text-wrap/wrap'' for inline-level elements. For block-level elements that contain line boxes as direct children, line breaks are chosen to balance the remaining (empty) space in each line box, if better balance than ''text-wrap/wrap'' is possible. This must not change the number of line boxes the block would contain if 'text-wrap' were set to ''text-wrap/wrap''. The remaining space to consider is that which remains after placing floats and inline content, but before any adjustments due to text justification. Line boxes are balanced when the standard deviation from the average inline-size of the remaining space in each line box is reduced over the block (including lines that end in a forced break). The exact algorithm is UA-defined. UAs may treat this value as ''text-wrap/wrap'' if there are more than ten lines to balance.
multi-line
Same as ''text-wrap/wrap'' for inline-level elements. For block-level elements, same as ''text-wrap/wrap'' except that the algorithm should consider multiple lines when making break decisions, and the UA should bias for best layout over speed. The exact algorithm is UA-defined.
Issue: This feature does not have CSSWG consensus; it is being proposed in Issue 672.
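The balancing criterion in the ''text-wrap/balance'' definition can be written out as follows. This is a sketch under stated assumptions:

- A_i is the inline-size available in line box i after floats are placed.
- w_i is the inline-size of its inline content before justification.
- n is the number of line boxes.
- The standard deviation uses the population form. ''text-wrap/balance'' may not change n, so the choice of normalization does not affect which layout has the smaller deviation.

```latex
r_i = A_i - w_i, \qquad
\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - \bar{r}\right)^2}
```

For example, take n = 2 line boxes with A_i = 300px each. A greedy layout with w = (290px, 110px) leaves r = (10px, 190px), so the mean remaining space is 100px and σ = 90px. A balanced layout with w = (200px, 200px) leaves r = (100px, 100px) and σ = 0.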
Regardless of the 'text-wrap' value, lines always break at forced breaks: for all values, line-breaking behavior defined for the BK, CR, LF, CM, NL, and SG line breaking classes in [[!UAX14]] must be honored.

UAs that allow breaks at punctuation other than spaces should prioritize breakpoints. For example, if breaks after slashes have a lower priority than spaces, the sequence “check /etc” will never break between the ‘/’ and the ‘e’. The UA may use the width of the containing block, the text's language, and other factors in assigning priorities. As long as care is taken to avoid such awkward breaks, allowing breaks at appropriate punctuation other than spaces is recommended, because it results in more even-looking margins, particularly in narrow measures.

Note: The ''text-wrap/wrap'' value is intended for speedy legacy line breaking, which has so far used first-fit/greedy algorithms that often give sub-optimal results. UAs could experiment with better line breaking algorithms under this default value, but optimal results will probably take more time. The ''text-wrap/multi-line'' and ''text-wrap/balance'' values are intended as opt-in choices that spend more time for better results. ''text-wrap/balance'' is intended for titles and captions, and ''text-wrap/multi-line'' is intended for body text.

Note: Some line breaking algorithms can interact unexpectedly with editing. Changing upstream line breaks on user edits can be unsettling. As UAs experiment with better line breaking algorithms, we will likely need to add a property to constrain upstream changes while editing.
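The following sketch applies the intended uses of each value. The selectors and the class name `badge` are hypothetical, and ''text-wrap/multi-line'' is still a proposed value.

```css
/* Titles and captions: balance the remaining space across line boxes. */
h1, h2, h3, figcaption { text-wrap: balance; }

/* Body text: opt in to a slower break search that considers multiple
   lines (proposed value, see Issue 672). */
article p { text-wrap: multi-line; }

/* Short labels that must stay on one line; text that does not fit overflows. */
.badge { text-wrap: nowrap; }
```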

Inline breaks between boxes: the 'wrap-before'/'wrap-after' properties

	Name: wrap-before, wrap-after
	Value: auto | avoid | avoid-line | avoid-flex | line | flex
	Initial: auto
	Applies to: inline-level boxes and flex items
	Inherited: no
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
These properties specify modifications to break opportunities in line breaking (and flex line breaking [[CSS3-FLEXBOX]]). Possible values:
auto
Lines may break at allowed break points before and after the box, as determined by the line-breaking rules in effect.
avoid
Line breaking is suppressed immediately before/after the box: the UA may only break there if there are no other valid break points in the line. If the text breaks, line-breaking restrictions are honored as for ''wrap-before/auto''.
avoid-line
Same as ''wrap-before/avoid'', but only for line breaks.
avoid-flex
Same as ''wrap-before/avoid'', but only for flex line breaks.
line
Force a line break immediately before/after the box if the box is an inline-level box.
flex
Force a flex line break immediately before/after the box if the box is a flex item in a multi-line flex container.
Forced line breaks on inline-level boxes propagate upward through any parent inline boxes the same way forced breaks on block-level boxes propagate upward through any parent block boxes in the same fragmentation context. [[!CSS3-BREAK]]
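The following sketch shows both kinds of break control; the class names are hypothetical. The first rule keeps an inline icon on the same line as the text that follows it. The last two rules start a new flex line at a chosen item.

```css
/* Avoid a line break immediately after an inline icon. */
.icon { wrap-after: avoid-line; }

/* A multi-line flex container ('flex-wrap' is from CSS Flexbox). */
.toolbar { display: flex; flex-wrap: wrap; }

/* Force a flex line break immediately before this flex item. */
.toolbar > .new-row { wrap-before: flex; }
```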

Line breaks within boxes: the 'wrap-inside' property

	Name: wrap-inside
	Value: auto | avoid
	Initial: auto
	Applies to: inline boxes
	Inherited: no
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
auto
Lines may break at allowed break points within the box, as determined by the line-breaking rules in effect.
avoid
Line breaking is suppressed within the box: the UA may only break within the box if there are no other valid break points in the line. If the text breaks, line-breaking restrictions are honored as for ''wrap-inside/auto''. If boxes with ''wrap-inside/avoid'' are nested and the UA must break within these boxes, a break in an outer box must be used before a break within an inner box may be used.

Example of using 'wrap-inside: avoid' in presenting a footer

The priority of breakpoints can be set to reflect the intended grouping of text. Given the rules
			footer { wrap-inside: avoid; }
			venue { wrap-inside: avoid; }
			date { wrap-inside: avoid; }
			place { wrap-inside: avoid; }
		
and the following markup:
			<footer>
			<venue>27th Internationalization and Unicode Conference</venue>
			&#8226; <date>April 7, 2005</date> &#8226;
			<place>Berlin, Germany</place>
			</footer>
		
In a narrow window the footer could be broken as
			27th Internationalization and Unicode Conference •
			April 7, 2005 • Berlin, Germany
		
or in a narrower window as
			27th Internationalization and Unicode
			Conference • April 7, 2005 •
			Berlin, Germany
		
but not as
			27th Internationalization and Unicode Conference • April
			7, 2005 • Berlin, Germany
		

Last Line Minimum Length

Issue: See thread. This issue is about requiring a minimum length for lines. Common measures seem to be: at least as long as the 'text-indent', at least a given number of characters, or a percentage of the line length. Suggestion for value space is ''match-indent | <length> | <percentage>'' (with ''Xch'' given as an example to make that use case clear). Alternately, <integer> could actually count the characters. It's unclear how this would interact with text balancing (above); one earlier proposal had them be the same property (with ''100%'' meaning full balancing). People have requested word-based limits, but since this is really dependent on the length of the word, character-based is better.

Shorthand for White Space and Wrapping: the 'white-space' property

	Name: white-space
	Value: normal | pre | nowrap | pre-wrap | pre-line
	Initial: normal
	Applies to: all elements
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property is a shorthand for 'text-space-collapse', 'text-wrap', and 'text-space-trim'.

Note: This shorthand combines both inheritable and non-inheritable properties. If this is a problem, please inform the CSSWG.

The following table gives the mapping of the values of the shorthand to its longhands.

| 'white-space' | 'text-space-collapse' | 'text-wrap' | 'text-space-trim' |
|---|---|---|---|
| ''white-space/normal'' | ''text-space-collapse/collapse'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
| ''pre'' | ''text-space-collapse/preserve'' | ''text-wrap/nowrap'' | ''text-space-trim/none'' |
| ''nowrap'' | ''text-space-collapse/collapse'' | ''text-wrap/nowrap'' | ''text-space-trim/none'' |
| ''pre-wrap'' | ''text-space-collapse/preserve'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
| ''pre-line'' | ''text-space-collapse/preserve-breaks'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
Issue: Add details from level 3
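Reading the ''pre-wrap'' row of the mapping table, the two rules below are expected to be equivalent; the class name `wrapping` is hypothetical. Because the shorthand also sets the non-inherited 'text-space-trim', a 'white-space' declaration sets it to ''text-space-trim/none''. This overrides any lower-priority 'text-space-trim' declaration on the same element.

```css
/* Shorthand form. */
pre.wrapping { white-space: pre-wrap; }

/* Equivalent longhand form, per the pre-wrap row of the table. */
pre.wrapping {
  text-space-collapse: preserve;
  text-wrap: wrap;
  text-space-trim: none;
}
```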

Breaking Within Words

Issue: Add final level 3 content

Hyphens: the 'hyphenate-character' property

	Name: hyphenate-character
	Value: auto | <string>
	Initial: auto
	Applies to: all elements
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property specifies the string that is shown between parts of hyphenated words. The ''hyphenate-character/auto'' value means that the user agent should find an appropriate value, preferably from the same source as the hyphenation dictionary. If a string is specified, it appears at the end of the line before a hyphenation break.
In Latin scripts, the hyphen character (U+2010) is often used to indicate that a word has been split. Normally, it will not be necessary to set it explicitly. However, this can easily be done:
			article { hyphenate-character: "\2010" }
		
Note: Both hyphens triggered by automatic hyphenation and hyphens triggered by soft hyphens are rendered according to 'hyphenate-character'.

Hyphenation Size Limit: the 'hyphenate-limit-zone' property

	Name: hyphenate-limit-zone
	Value: <percentage> | <length>
	Initial: 0
	Applies to: block containers
	Inherited: yes
	Percentages: refers to width of the line box
	Computed value: as specified
	Media: visual
	

Issue: Is 'hyphenate-limit-zone' a good name? Comments/suggestions?

This property specifies the maximum amount of unfilled space (before justification) that may be left in the line box before hyphenation is triggered to pull part of a word from the next line back up into the current line.
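The following sketch shows both value types; the selectors are hypothetical. 'hyphens' is the CSS Text Level 3 property that enables automatic hyphenation. With a percentage, the threshold scales with the line box: for a 500px-wide line box, 8% is 0.08 × 500px = 40px of unfilled space.

```css
/* Consider hyphenation only when more than 8% of the line box would
   otherwise be left unfilled (40px for a 500px-wide line box). */
p { hyphens: auto; hyphenate-limit-zone: 8%; }

/* A fixed threshold, independent of the line box width. */
aside p { hyphenate-limit-zone: 1em; }
```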

Hyphenation Character Limits: the 'hyphenate-limit-chars' property

	Name: hyphenate-limit-chars
	Value: [ auto | <integer> ]{1,3}
	Initial: auto
	Applies to: all elements
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property specifies the minimum number of characters in a hyphenated word. If the word does not meet the required minimum number of characters in the word, before the hyphen, or after the hyphen, then the word must not be hyphenated. Nonspacing combining marks (Unicode General Category Mn) and intra-word punctuation (Unicode General Categories P*) do not count towards the minimum.

If three values are specified, the first value is the required minimum for the total characters in a word, the second value is the minimum for characters before the hyphenation point, and the third value is the minimum for characters after the hyphenation point. If the third value is missing, it is the same as the second. If the second value is missing, then it is ''hyphenate-limit-chars/auto''. The ''hyphenate-limit-chars/auto'' value means that the UA chooses a value that adapts to the current layout.

Note: Unless the UA is able to calculate a better value, it is suggested that ''hyphenate-limit-chars/auto'' means 2 for before and after, and 5 for the word total.
In the example below, the minimum size of a hyphenated word is left to the UA (which means it may vary depending on the language, the length of the line, or other factors), but the minimum number of characters before and after the hyphenation point is set to 3.
			p { hyphenate-limit-chars: auto 3; }
		
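Setting all three values gives fully explicit limits; the selectors in this sketch are assumptions. Under the first rule, a word must have at least 6 counted characters, with at least 3 before the hyphenation point and at least 2 after it. A five-letter word is therefore never hyphenated. The second rule gives two values, so the third is copied from the second.

```css
/* Word total >= 6, before the hyphen >= 3, after the hyphen >= 2. */
p { hyphens: auto; hyphenate-limit-chars: 6 3 2; }

/* UA-chosen word total; at least 4 characters on each side. */
.narrow p { hyphenate-limit-chars: auto 4; }
```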

Hyphenation Line Limits: the 'hyphenate-limit-lines' and 'hyphenate-limit-last' properties

	Name: hyphenate-limit-lines
	Value: no-limit | <integer>
	Initial: no-limit
	Applies to: block containers
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property indicates the maximum number of successive hyphenated lines in an element. The ''no-limit'' value means that there is no limit. In some cases, user agents may not be able to honor the specified value. (See overflow-wrap.) It is not defined whether hyphenation introduced by such emergency breaking influences nearby hyphenation points.
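A short sketch, with hypothetical selectors:

```css
/* At most two consecutive hyphenated lines. */
p { hyphens: auto; hyphenate-limit-lines: 2; }

/* No limit on consecutive hyphenated lines (the initial value). */
td p { hyphenate-limit-lines: no-limit; }
```

As described above, emergency breaking may still exceed the limit in some cases.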
	Name: hyphenate-limit-last
	Value: none | always | column | page | spread
	Initial: none
	Applies to: block containers
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
	
This property indicates hyphenation behavior at the end of elements, columns, pages, and spreads. A spread is a set of two pages that are visible to the reader at the same time. Values are:
none
No restrictions imposed.
always
The last full line of the element, or the last line before any column, page, or spread break inside the element should not be hyphenated.
column
The last line before any column, page, or spread break inside the element should not be hyphenated.
page
The last line before any page or spread break inside the element should not be hyphenated.
spread
The last line before any spread break inside the element should not be hyphenated.
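The following rules keep the last full line of each paragraph unhyphenated, and keep the last line before each spread break within a chapter unhyphenated:
			p { hyphenate-limit-last: always }
			div.chapter { hyphenate-limit-last: spread }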
		
A paragraph may be formatted like this when 'hyphenate-limit-last: none' is set:
			This is just a
			simple example
			to show Antarc-
			tica.
		
With 'hyphenate-limit-last: always' one would get:
			This is just a
			simple example
			to        show
			Antarctica.
		

Alignment and Justification

Issue: Add final level 3 content

Add this value to 'text-align':
<string>
The string must be a single character; otherwise the declaration must be ignored. When applied to a table cell, it specifies the alignment character around which the cell's contents will align. See below for further details and how this value combines with keywords.

Character-based Alignment in a Table Column

When multiple cells in a column have an alignment character specified, the alignment character of each such cell in the column is centered along a single column-parallel axis and the rest of the text in the column shifted accordingly. (Note that the strings do not have to be the same for each cell, although they usually are.)

Issue: Is this intended to say that it's the centers of the alignment characters that should be aligned? It's not clear that's what it says, but that (or a different behavior) needs to be specified, to describe what happens when different occurrences of the alignment character are in different fonts. (Further, is that the intended behavior? Probably the most significant use case to consider is bold vs. non-bold text, which only varies slightly in width.) [feedback] [minutes face-to-face 2016-02-02 10:00 AM]

The following style sheet:
			TD { text-align: "." center }
		
will cause the column of dollar figures in the following HTML table:
			<TABLE>
			<COL width="40">
			<TR> <TH>Long distance calls
			<TR> <TD> $1.30
			<TR> <TD> $2.50
			<TR> <TD> $10.80
			<TR> <TD> $111.01
			<TR> <TD> $85.
			<TR> <TD> N/A
			<TR> <TD> $.05
			<TR> <TD> $.06
			</TABLE>
		
to align along the decimal point. The table might be rendered as follows:
			+---------------------+
			| Long distance calls |
			+---------------------+
			|         $1.30       |
			|         $2.50       |
			|        $10.80       |
			|       $111.01       |
			|        $85.         |
			|        N/A          |
			|          $.05       |
			|          $.06       |
			+---------------------+
		
A keyword value may be specified in conjunction with the <string> value; if it is not given, it defaults to ''text-align/right''. This value is used:

Note: Right alignment is used by default for character-based alignment because numbering systems are almost all left-to-right even in right-to-left writing systems, and the primary use case of character-based alignment is for numerical alignment.

If the alignment character appears more than once in the text, the first instance is used for alignment. If the alignment character does not appear in a cell at all, the string is aligned as if the alignment character had been inserted at the end of its contents.
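The following sketch assumes a column of figures that use a decimal comma; the class names are hypothetical. Without a keyword, alignment falls back to ''text-align/right'' as described above. The second rule requests ''center'' explicitly, as in the earlier example.

```css
/* Align on the decimal comma; the keyword defaults to right. */
td.amount { text-align: ","; }

/* Same alignment character, with an explicit keyword. */
td.amount-centered { text-align: "," center; }
```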

Issue: This needs to specify what text is searched for the alignment character. Is it only in-flow text whose containing block is the cell? Or is text within any in-flow descendants in the block formatting context established by the cell considered? If so, is it considered only as long as its 'text-align' property is consistent with the cell's? (Consistent in the alignment character, or fully consistent?)

Issue: This behavior of aligning as though the alignment character had been inserted at the end of the contents of the cell, combined with center-of-character alignment, will produce gaps on the end-side of lines that are alone on a line with <string> text-alignment, when none of the lines of the column has the alignment character, or, more importantly, when some of the lines do have the alignment character, but the column is not laid out at its max-content width. This is probably undesirable.

Issue: When the alignment character is inserted at the end of the contents, which font is used? (In particular, if the alignment character might be within a descendant block, is it the font of the block or the font of the table cell? Or if the insertion is at a forced break within an inline, does it use the font of the inline or the font of the block or cell?)

Character-based alignment occurs before table cell width computation so that auto width computations can leave enough space for alignment. Whether column-spanning cells participate in the alignment prior to or after width computation is undefined. If width constraints on the cell contents prevent full alignment throughout the column, the resulting alignment is undefined.

Issue: This should have a formal definition of how character alignment affects the min-content and max-content intrinsic widths (of table columns and all content that can be inside table columns). Max-content intrinsic widths need to be split into three numbers (assuming that it's the centers of the alignment character that are aligned): one for widths without alignment characters, one for widths on the inline-start side of the center of the alignment character, one for widths on the inline-end side of the center of the alignment character. This operates based on all segments of text between forced breaks for max-content widths. For min-content widths, segments of text between forced breaks that contain optional breaks within them should clearly contribute only to the without-alignment-character width. However, it's less clear whether all min-content widths should work this way, or whether segments between forced breaks that do not have optional breaks (and perhaps only those that actually contain the alignment character) should contribute to start-side-of-alignment-character and end-side-of-alignment-character min-content widths instead; this choice is a tradeoff between the meaning of min-content sizing of a table meaning the narrowest reasonable size versus honoring alignment characters in more cases. Another option might be to use whether line-breaking of optional breaks is allowed as a control for which behavior to use.

Issue: Formally defining the intrinsic width contributions of column-spanning cells with <string> values of 'text-align' is a complicated (although straightforward) extension of the decisions made for intrinsic width contributions of non-column-spanning cells; this should also be formally defined. Contributions end up being made to the split intrinsic widths of the startmost or endmost column (whichever is used for alignment), and to the without-alignment-character intrinsic widths of the other spanned columns.

Spacing

Issue: Add final level 3 word-spacing, letter-spacing

Character Class Spacing: the 'text-spacing' property

	Name: text-spacing
	Value: normal | none | [ trim-start | space-start ] || [ trim-end | space-end | allow-end ] || [ trim-adjacent | space-adjacent ] || no-compress || ideograph-alpha || ideograph-numeric || punctuation
	Initial: normal
	Applies to: block containers
	Inherited: yes
	Percentages: n/a
	Computed value: as specified
	Media: visual
This property controls spacing between adjacent characters on the same line within the same inline formatting context using a set of character-class-based rules. Such spacing can either be created between or trimmed from the affected glyphs. Values are defined as follows:
normal
Specifies the baseline behavior, equivalent to ''space-start allow-end trim-adjacent''.
none
Turns off all text-spacing features. All fullwidth characters are set with full-width glyphs.
ideograph-alpha
Creates 1/4em extra spacing between runs of ideographs and non-ideographic letters. Note: A commonly used algorithm for determining this behavior is specified in [[JLREQ]].
ideograph-numeric
Creates 1/4em extra spacing between runs of ideographs and non-ideographic numerals. Note: A commonly used algorithm for determining this behavior is specified in [[JLREQ]].
punctuation
Creates extra non-breaking spacing around punctuation as required by language-specific typographic conventions. In this level, if the element's content language is French, narrow no-break space (U+202F) and no-break space (U+00A0) are inserted where required by French typographic guidelines. Otherwise this value has no effect. However, future specifications may add automatic spacing behavior for other languages.
space-start
Set fullwidth opening punctuation with full-width glyphs (spaced) at the start of each line.
trim-start
Set fullwidth opening punctuation with half-width glyphs (flush) at the start of each line.
allow-end
Set fullwidth closing punctuation with half-width glyphs (flush) at the end of each line if it does not otherwise fit prior to justification; otherwise set the punctuation with full-width glyphs.
space-end
Set fullwidth closing punctuation with full-width glyphs (spaced) at the end of each line.
trim-end
Set fullwidth closing punctuation with half-width glyphs (flush) at the end of each line.
space-adjacent
Set fullwidth opening punctuation with full-width glyphs (spaced) when not at the start of the line. Set fullwidth closing punctuation with full-width glyphs (spaced) when not at the end of the line.
trim-adjacent
Collapse spacing between punctuation glyphs as described below.
no-compress
Justification may not compress text-spacing. (If this value is not specified, the justification process may reduce autospacing except when the spacing is at the start or end of the line.) Note: An example of compression rules is given for Japanese in 3.8 Line Adjustment in [[JLREQ]].
This property is additive with the 'word-spacing' and 'letter-spacing' properties. That is, the amount of spacing contributed by the 'letter-spacing' setting (if any) is added to the spacing created by 'text-spacing'. The same applies to 'word-spacing'.

At element boundaries, the amount of extra spacing introduced between characters is determined by and rendered within the innermost element that contains the boundary. If the extra spacing is applied to a particular glyph, then the spacing is determined by the innermost element containing that glyph.

Note: Values other than ''text-spacing/normal'', ''text-spacing/none'', ''trim-start'', ''trim-end'', and ''space-end'' are at-risk and may be dropped from this level of CSS. They are defined here currently to help work out a complete design of this feature.

Support for this property is optional. It is strongly recommended for UAs that wish to support CJK typography.

Issue: It was requested to add a value for doubling the space after periods.
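The following sketch shows per-language settings. The selectors are assumptions, and several of these values are at-risk as noted above. Because 'text-spacing' applies to block containers and is inherited, the rules are meant for block-level elements.

```css
/* Japanese: set opening punctuation half-width (flush) at line starts, and
   add 1/4em between ideographs and adjacent non-ideographic letters or digits. */
:lang(ja) { text-spacing: trim-start ideograph-alpha ideograph-numeric; }

/* French: insert the no-break spaces required around punctuation. */
:lang(fr) { text-spacing: punctuation; }

/* Turn off all text-spacing features in preformatted blocks. */
pre { text-spacing: none; }
```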

Fullwidth Punctuation Collapsing

Typically, fullwidth characters have glyphs with the same advance width as a standard Han character (e.g. 水 U+6C34). However, many fullwidth punctuation glyphs only take up part of the fullwidth design space, so such punctuation is not always set full-width. Several values of 'text-spacing' allow the author to control when such characters are set half-width (typically half the width of an ideograph) and when they are set full-width. In order to set the text as specified, the UA will need to either

Some fonts use proportional glyphs for fullwidth punctuation characters. For such proportional glyphs, the given advance width is considered simultaneously full-width and half-width: no space is added or removed.

Note: The advance width of a standard Han character can be determined either from font metrics such as the OpenType 'ideo' and 'idtp' baselines for the opposite writing mode, or by taking the advance width of a Han character such as 水 U+6C34. (The opposite writing mode must be used because some fonts are compressed so that the characters are not square.) More information on OpenType metrics can be found in the OpenType spec. Note that if 水 U+6C34, 卜 U+535C, and 一 U+4E00 do not all have the same advance width, the font has proportional ideographs and the fullwidth advance width cannot be reliably determined by measuring glyphs.

Unless 'text-spacing' is set to ''space-adjacent'' or ''text-spacing/none'' (or the font has proportional fullwidth punctuation glyphs), the UA must collapse the space typically associated with such full-width glyphs when placed adjacently on a line as follows:
The following example table lists the punctuation pairs affected by adjacent-pairs trimming. It uses halfwidth equivalents to approximate the trimming effect.
Demonstration of adjacent-pairs punctuation trimming
Combination Sample Pair Looks Like
Opening—Opening + (
Middle Dot—Opening + (
Closing—Opening + )  (
Ideographic Space—Opening  +  (
Closing—Closing + )
Closing—Middle Dot + )
Closing—Ideographic Space +  ) 

Text Spacing Character Classes

In the context of this property the following definitions apply: Issue: Classes and Unicode code points need to be reviewed.
ideographs
Includes all typographic character units [[CSS3TEXT]] whose base character is listed below:
non-ideographic letters
Includes all typographic character units that belong to Unicode Letters [L*] and Mark [M*] category, except when any of the following conditions are met:
non-ideographic numerals
Includes all typographic character units that belong to the Unicode Decimal Digit Number [Nd] category, except when any of the following conditions are met:
fullwidth opening punctuation
Includes any opening punctuation character (Unicode category Ps) that belongs to the CJK Symbols and Punctuation block (U+3000–U+303F) or is categorized as East Asian Fullwidth (F) by [[!UAX11]]. Also includes LEFT SINGLE QUOTATION MARK (U+2018) and LEFT DOUBLE QUOTATION MARK (U+201C). When trimmed, the left (for horizontal text) or top (for vertical text) half is kerned.
fullwidth closing punctuation
Includes any closing punctuation character (Unicode category Pe) that belongs to the CJK Symbols and Punctuation block (U+3000–U+303F) or is categorized as East Asian Fullwidth (F) by [[!UAX11]]. Also includes RIGHT SINGLE QUOTATION MARK (U+2019) and RIGHT DOUBLE QUOTATION MARK (U+201D). May also include fullwidth colon punctuation and/or fullwidth dot punctuation (see below). When trimmed, the right (for horizontal text) or bottom (for vertical text) half is kerned.
fullwidth middle dot punctuation
Includes MIDDLE DOT (U+00B7), HYPHENATION POINT (U+2027), and KATAKANA MIDDLE DOT (U+30FB). May also include fullwidth colon punctuation and/or fullwidth dot punctuation (see below).
fullwidth colon punctuation
Includes FULLWIDTH COLON (U+FF1A) and FULLWIDTH SEMICOLON (U+FF1B).
fullwidth dot punctuation
Includes IDEOGRAPHIC COMMA (U+3001), IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH COMMA (U+FF0C), FULLWIDTH FULL STOP (U+FF0E).

Whether fullwidth colon punctuation and fullwidth dot punctuation should be considered fullwidth closing punctuation or fullwidth middle dot punctuation depends on where in the glyph's box the punctuation is drawn. If the punctuation is centered, then it should be considered middle dot punctuation. If the punctuation is drawn to one side (left in horizontal text, top in vertical text) and the other half is therefore blank then the punctuation should be considered closing punctuation and trimmed accordingly. The UA must classify fullwidth colon punctuation and fullwidth dot punctuation under either the fullwidth closing punctuation category or the fullwidth middle dot punctuation category as appropriate. The UA may rely on language conventions and the writing mode (horizontal vs. vertical), and/or font information to determine this categorization. The UA may also add additional characters to any category as appropriate.

The following informative table summarizes language conventions for classifying fullwidth colon and dot punctuation:
| Language | Colon punctuation | Dot punctuation |
|---|---|---|
| Simplified Chinese (horizontal) | closing | closing |
| Simplified Chinese (vertical) | closing | closing |
| Traditional Chinese | middle dot | middle dot |
| Korean | middle dot | closing |
| Japanese | middle dot | closing |
Note that for Chinese fonts at least, the author observes that the standard convention is often not followed.

Japanese Paragraph-start Conventions in CSS

Japanese has three common start-edge typesetting schemes, which are distinguished by their handling of opening brackets.
The first scheme aligns opening brackets flush with the indent edge on the first line and with the start edge of other lines.
The second scheme gives the opening bracket its full width, so that it is effectively indented half an em from the indent edge and from the start edge of other lines.
The third scheme aligns the opening brackets flush with the start edge of lines, but hangs them inside the indent on the first line (resulting in an effective half-em indent instead of the full em for paragraphs that begin with an opening bracket).

Positioning of opening brackets at line head [[JLREQ]]

Assuming a UA style sheet of p { margin: 1em 0; }, CSS can achieve the Japanese typesetting styles with the following rules:
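One possible mapping of the three schemes is sketched below. It assumes that the ''trim-start'' and ''space-start'' values of 'text-spacing' defined in this module, combined with 'text-indent' and the CSS Text Level 3 'hanging-punctuation' property, produce the described layouts. The class names are hypothetical.

```css
/* Scheme 1: opening bracket flush with the indent edge on the first line
   and with the start edge of other lines. */
p.ja-flush { text-indent: 1em; text-spacing: trim-start; }

/* Scheme 2: opening bracket keeps its full width, so it sits half an em
   in from the indent edge and from the start edge of other lines. */
p.ja-spaced { text-indent: 1em; text-spacing: space-start; }

/* Scheme 3: flush with the start edge of lines, but on the first line
   the opening bracket hangs into the indent. */
p.ja-hang { text-indent: 1em; text-spacing: trim-start; hanging-punctuation: first; }
```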

Edge Effects

Note: Add final level 3 content

Acknowledgements

Note: Add final level 3 list, with Randy Edmunds and Florian Rivoal added