Title: CSS Text Module Level 4
Shortname: css-text
Level: 4
Status: ED
Work Status: Exploring
Group: csswg
ED: https://drafts.csswg.org/css-text-4/
TR: https://www.w3.org/TR/css-text-4/
Editor: Elika J. Etemad / fantasai, Invited Expert, http://fantasai.inkedblade.net/contact, w3cid 35400
Editor: Koji Ishii, Google, kojiishi@gmail.com, w3cid 45369
Editor: Alan Stearns, Adobe Systems, stearns@adobe.com, w3cid 46659
Abstract: This module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.
Ignored terms: segment break, segment breaks
spec: css-text-3; type: property
text: text-align
text: letter-spacing
text: word-spacing
Introduction
Issue: Add final level 3 content
White Space Processing
Issue: Add final level 3 tab-size and processing details
White Space Collapsing: the 'text-space-collapse' property
Name: text-space-collapse
Value: collapse | discard | preserve | preserve-breaks | preserve-spaces
Initial: collapse
Applies to: all elements
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property declares whether and how
white space inside the element is
collapsed. Values have the following meanings, which must be interpreted
according to the white space processing rules:
- collapse
-
This value directs user agents to collapse sequences of white space
into a single character
(or in some cases, no character).
- preserve
-
This value prevents user agents
from collapsing sequences of white space.
Segment breaks are preserved as forced line breaks.
- preserve-breaks
-
This value collapses white space as for ''collapse'', but preserves
segment breaks as forced line breaks.
- preserve-spaces
-
This value prevents user agents
from collapsing sequences of white space,
and converts tabs and segment breaks to spaces.
(This value is intended to match the behavior
of xml:space="preserve" in SVG.)
- discard
-
This value directs user agents to “discard”
all white space in the element.
Issue: Does this preserve line break opportunities or no? Do we need a "hide" value?
The following style rules implement MathML's white space processing:
@namespace m "http://www.w3.org/1998/Math/MathML";
m|* {
text-space-collapse: discard;
}
m|mi, m|mn, m|mo, m|ms, m|mtext {
text-space-trim: trim-inner;
}
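As a further non-normative sketch, the following rules show two of the other values as drafted above. The class names are hypothetical and chosen only for illustration, and the property is assumed to behave as this section describes.

```css
/* Hypothetical class names, for illustration only. */
.poem {
  /* Collapse runs of spaces as for 'collapse',
     but keep each segment break as a forced line break. */
  text-space-collapse: preserve-breaks;
}
.label {
  /* Keep every space; tabs and segment breaks are converted to spaces. */
  text-space-collapse: preserve-spaces;
}
```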
White Space Trimming: the 'text-space-trim' property
Name: text-space-trim
Value: none | trim-inner || discard-before || discard-after
Initial: none
Applies to: all elements
Inherited: no
Percentages: n/a
Computed value: as specified
Media: visual
This property allows authors to specify trimming behavior
at the beginning and end of a box.
Values have the following meanings,
which must be interpreted according to the white space processing rules:
- trim-inner
-
For block containers this value directs UAs to discard all white space
at the beginning of the element up to and including the last segment break
before the first non-white-space character in the element as well as
to discard all white space at the end of the element starting with the
first segment break after the last non-white-space character in the element.
For other elements this value directs UAs to discard all white space
at the beginning and end of the element.
- discard-before
-
This value directs the UA to collapse all collapsible white space
immediately before the start of the element.
- discard-after
-
This value directs the UA to collapse all collapsible white space
immediately after the end of the element.
The following style rules render DT elements as a comma-separated list:
dt { display: inline; }
dt + dt:before { content: ", "; text-space-trim: discard-before; }
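The ''trim-inner'' value could be used, for instance, on preformatted blocks whose content starts on the line after the start tag and ends on the line before the end tag. The rule below is a sketch under that assumption. The selector is hypothetical, and how trimming interacts with preserved white space depends on the white space processing rules, which this draft has not yet filled in.

```css
/* Hypothetical selector, for illustration only. */
pre.listing {
  text-space-collapse: preserve;
  /* For a block container: discard leading white space up to and
     including the last segment break before the first
     non-white-space character, and trailing white space starting
     with the first segment break after the last one. */
  text-space-trim: trim-inner;
}
```

Under this reading, indentation on the first line of content is kept, because only white space up to the preceding segment break is discarded.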
Line Breaking and Word Boundaries
Issue: Add final level 3 content
Text Wrapping
Text wrapping is controlled by the 'text-wrap',
'wrap-before',
'wrap-after',
'wrap-inside',
and 'overflow-wrap' properties:
Issue: Add final level 3 overflow-wrap
Text Wrap Settings: the 'text-wrap' property
Name: text-wrap
Value: wrap | nowrap | balance | multi-line
Initial: wrap
Applies to: all elements
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property specifies the mode for text wrapping.
Possible values:
- wrap
-
Lines may break at allowed break points,
as determined by the line-breaking rules in effect.
Line breaking behavior defined
for the WJ, ZW, and GL line-breaking classes
in [[!UAX14]] must be honored.
The exact algorithm is UA-defined.
The algorithm may consider multiple lines when making break decisions.
The UA may bias for speed over best layout.
- nowrap
-
Lines may not break; text that does not fit within the block container
overflows it.
- balance
-
Same as ''text-wrap/wrap'' for inline-level elements.
For block-level elements that
contain line boxes as direct children,
line breaks are chosen to balance
the remaining (empty) space in each line box,
if better balance than ''text-wrap/wrap'' is possible.
This must not change the number of line boxes
the block would contain
if 'text-wrap' were set to ''text-wrap/wrap''.
The remaining space to consider
is that which remains after placing floats and inline content,
but before any adjustments due to text justification.
Line boxes are balanced when the standard deviation
from the average inline-size of the remaining space in each line box
is reduced over the block
(including lines that end in a forced break).
The exact algorithm is UA-defined.
UAs may treat this value as ''text-wrap/wrap'' if there are more than ten lines to balance.
- multi-line
-
Same as ''text-wrap/wrap'' for inline-level elements.
Same as ''text-wrap/wrap'' for block-level elements,
except as below.
The exact algorithm is UA-defined.
The algorithm should consider multiple lines when making break decisions.
The UA should bias for best layout over speed.
Issue: This feature does not have CSSWG consensus; it is being proposed in Issue 672.
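The balancing criterion given for ''text-wrap/balance'' above can be written out as a worked calculation. This is only a sketch: the symbols n (number of line boxes) and r_i (remaining space in line box i) are introduced here for illustration, the population form of the standard deviation is an assumption since the text does not say which form is meant, and the pixel values are made up.

```latex
% r_i: remaining (empty) inline space in line box i, measured after
%      floats and inline content are placed, before justification.
\[
  \bar{r} = \frac{1}{n} \sum_{i=1}^{n} r_i ,
  \qquad
  \sigma = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( r_i - \bar{r} \right)^2 }
\]
% Hypothetical two-line heading, first-fit breaks: r = (5px, 65px)
\[
  \bar{r} = 35 , \qquad
  \sigma = \sqrt{ \frac{(5 - 35)^2 + (65 - 35)^2}{2} } = 30
\]
% Same content with the break moved earlier: r = (35px, 35px)
\[
  \bar{r} = 35 , \qquad \sigma = 0
\]
```

Both layouts use two line boxes, so the second satisfies the requirement that the number of line boxes not change, and its smaller deviation makes it the better balanced of the two.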
Regardless of the 'text-wrap' value,
lines always break at forced breaks:
for all values,
line-breaking behavior defined
for the BK, CR, LF, CM, NL, and SG line breaking classes
in [[!UAX14]] must be honored.
UAs that allow breaks at punctuation other than spaces
should prioritize breakpoints.
For example,
if breaks after slashes have a lower priority than spaces,
the sequence “check /etc”
will never break between the ‘/’ and the ‘e’.
The UA may use the width of the containing block,
the text's language,
and other factors in assigning priorities.
As long as care is taken to avoid such awkward breaks,
allowing breaks at appropriate punctuation other than spaces
is recommended,
as it results in more even-looking margins,
particularly in narrow measures.
Note: The ''text-wrap/wrap'' value is intended
for speedy legacy line breaking,
which has so far used first-fit/greedy algorithms
that can often give sub-optimal results.
UAs could experiment with better line breaking algorithms
with this default value,
but optimal results will probably take more time.
The ''text-wrap/multi-line'' and ''text-wrap/balance'' values
are intended as opt-in choices to take more time for better results.
The ''text-wrap/balance'' value is intended for titles and captions,
and the ''text-wrap/multi-line'' is intended for body text.
Note: Some line breaking algorithms can interact unexpectedly with editing.
Changing upstream line breaks on user edits can be unsettling.
As UAs experiment with better line breaking algorithms,
we will likely need to add a property
to constrain upstream changes while editing.
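The intended uses described in the notes above might be expressed as follows. This is a non-normative sketch. The selectors are hypothetical, and ''text-wrap/multi-line'' in particular does not have CSSWG consensus.

```css
/* Titles and captions: opt in to balanced line lengths. */
h1, h2, figcaption {
  text-wrap: balance;
}
/* Body text: opt in to multi-line break decisions
   (best layout over speed). */
article p {
  text-wrap: multi-line;
}
/* Hypothetical class: never wrap; overflowing text
   overflows the block container. */
.keep-on-one-line {
  text-wrap: nowrap;
}
```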
Inline breaks between boxes: the 'wrap-before'/'wrap-after' properties
Name: wrap-before, wrap-after
Value: auto | avoid | avoid-line | avoid-flex | line | flex
Initial: auto
Applies to: inline-level boxes and flex items
Inherited: no
Percentages: n/a
Computed value: as specified
Media: visual
These properties specify modifications to break opportunities
in line breaking (and flex line breaking [[CSS3-FLEXBOX]]).
Possible values:
- auto
-
Lines may break at allowed break points
before and after the box,
as determined by the line-breaking rules in effect.
- avoid
-
Line breaking is suppressed immediately before/after the box:
the UA may only break there
if there are no other valid break points
in the line.
If the text breaks,
line-breaking restrictions are honored as for
''wrap-before/auto''.
- avoid-line
-
Same as ''wrap-before/avoid'',
but only for line breaks.
- avoid-flex
-
Same as ''wrap-before/avoid'',
but only for flex line breaks.
- line
-
Force a line break immediately before/after the box
if the box is an inline-level box.
- flex
-
Force a flex line break immediately before/after the box
if the box is a flex item
in a multi-line flex container.
Forced line breaks on inline-level boxes propagate upward
through any parent inline boxes
the same way forced breaks on block-level boxes propagate upward
through any parent block boxes
in the same fragmentation context.
[[!CSS3-BREAK]]
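The following rules sketch how these values might be used on inline-level boxes and on flex items. The class names are hypothetical, and the behavior is assumed to be as drafted above.

```css
/* Inline-level box: discourage a line break right before a unit
   label, so that a number and its unit stay together when possible. */
.unit {
  wrap-before: avoid-line;
}
/* Inline-level box: force a line break after each line of verse. */
.verse-line {
  wrap-after: line;
}
/* Flex item in a multi-line flex container:
   start a new flex line before this item. */
.toolbar {
  display: flex;
  flex-wrap: wrap;
}
.toolbar > .group-start {
  wrap-before: flex;
}
```

Note that ''wrap-after/line'' only has an effect on inline-level boxes, and ''wrap-before/flex'' only on flex items in a multi-line flex container, which is why the second example sets 'flex-wrap'.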
Line breaks within boxes: the 'wrap-inside' property
Name: wrap-inside
Value: auto | avoid
Initial: auto
Applies to: inline boxes
Inherited: no
Percentages: n/a
Computed value: as specified
Media: visual
- auto
-
Lines may break at allowed break points
within the box,
as determined by the line-breaking rules in effect.
- avoid
-
Line breaking is suppressed within the box:
the UA may only break within the box
if there are no other valid break points in the line.
If the text breaks,
line-breaking restrictions are honored as for
''wrap-inside/auto''.
If boxes with ''wrap-inside/avoid'' are nested
and the UA must break within these boxes,
a break in an outer box must be used
before a break within an inner box may be used.
Example of using 'wrap-inside: avoid' in presenting a footer
The priority of breakpoints can be set
to reflect the intended grouping of text.
Given the rules
footer { wrap-inside: avoid; }
venue { wrap-inside: avoid; }
date { wrap-inside: avoid; }
place { wrap-inside: avoid; }
and the following markup:
<footer>
<venue>27th Internationalization and Unicode Conference</venue>
• <date>April 7, 2005</date> •
<place>Berlin, Germany</place>
</footer>
In a narrow window the footer could be broken as
27th Internationalization and Unicode Conference •
April 7, 2005 • Berlin, Germany
or in a narrower window as
27th Internationalization and Unicode
Conference • April 7, 2005 •
Berlin, Germany
but not as
27th Internationalization and Unicode Conference • April
7, 2005 • Berlin, Germany
Last Line Minimum Length
See
thread.
Issue is about requiring a minimum length for lines.
Common measures seem to be
- At least as long as the text-indent.
- At least X characters.
- Percentage-based.
Suggestion for value space is ''match-indent | <length> | <percentage>''
(with ''Xch'' given as an example to make that use case clear).
Alternately <integer> could actually count the characters.
It's unclear how this would interact with text balancing (above);
one earlier proposal had them be the same property
(with ''100%'' meaning full balancing).
People have requested word-based limits, but since this is really
dependent on the length of the word, character-based is better.
Shorthand for White Space and Wrapping: the 'white-space' property
Name: white-space
Value: normal | pre | nowrap | pre-wrap | pre-line
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property is a shorthand for 'text-space-collapse', 'text-wrap', and 'text-space-trim'.
Note: This shorthand combines both inheritable and non-inheritable properties.
If this is a problem, please inform the CSSWG.
The following table gives the mapping of the values of the shorthand to its longhands.
| 'white-space' | 'text-space-collapse' | 'text-wrap' | 'text-space-trim' |
|---|---|---|---|
| ''white-space/normal'' | ''text-space-collapse/collapse'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
| ''pre'' | ''text-space-collapse/preserve'' | ''text-wrap/nowrap'' | ''text-space-trim/none'' |
| ''nowrap'' | ''text-space-collapse/collapse'' | ''text-wrap/nowrap'' | ''text-space-trim/none'' |
| ''pre-wrap'' | ''text-space-collapse/preserve'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
| ''pre-line'' | ''text-space-collapse/preserve-breaks'' | ''text-wrap/wrap'' | ''text-space-trim/none'' |
Issue: Add details from level 3
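Reading the mapping for ''pre-wrap'' from the table, the two rules below would be expected to give the same result. This is a sketch under the mapping as drafted, with hypothetical class names.

```css
/* Shorthand form. */
.log-output {
  white-space: pre-wrap;
}
/* Longhand form, per the 'pre-wrap' mapping. */
.log-output-longhand {
  text-space-collapse: preserve;
  text-wrap: wrap;
  text-space-trim: none;
}
```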
Breaking Within Words
Issue: Add final level 3 content
Hyphens: the 'hyphenate-character' property
Name: hyphenate-character
Value: auto | <string>
Initial: auto
Applies to: all elements
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property specifies strings that are shown between parts of
hyphenated words. The auto value means that the user agent should
find an appropriate value, preferably from the same source as the
hyphenation dictionary. If a string is specified, it appears at
the end of the line before a hyphenation break.
In Latin scripts, the hyphen character (U+2010) is often used to
indicate that a word has been split. Normally, it will not be
necessary to set it explicitly. However, this can easily be done:
article { hyphenate-character: "\2010" }
Note: Both hyphens triggered by automatic hyphenation and
hyphens triggered by soft hyphens are rendered according to
'hyphenate-character'.
Hyphenation Size Limit: the 'hyphenate-limit-zone' property
Name: hyphenate-limit-zone
Value: <percentage> | <length>
Initial: 0
Applies to: block containers
Inherited: yes
Percentages: refers to width of the line box
Computed value: as specified
Media: visual
Issue: Is 'hyphenate-limit-zone' a good name? Comments/suggestions?
This property specifies the maximum amount of unfilled space (before
justification) that may be left in the line box before hyphenation is
triggered to pull part of a word from the next line back up into the
current line.
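As a non-normative sketch, the rule below sets the zone to a percentage of the line box width. The 'hyphens' property comes from Level 3, and the widths in the comment are illustrative assumptions only.

```css
p {
  hyphens: auto;
  /* Percentages refer to the width of the line box: in a line box
     400px wide, 8% is 32px. Hyphenation is only triggered when more
     than that amount of space would otherwise be left unfilled
     (before justification). */
  hyphenate-limit-zone: 8%;
}
```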
Hyphenation Character Limits: the 'hyphenate-limit-chars' property
Name: hyphenate-limit-chars
Value: [ auto | <integer> ]{1,3}
Initial: auto
Applies to: all elements
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property specifies the minimum number of characters in a
hyphenated word. If the word does not meet the required minimum
number of characters in the word / before the hyphen / after the
hyphen, then the word must not be hyphenated. Nonspacing combining
marks (Unicode class Mn) and intra-word
punctuation (Unicode classes P*) do not count towards the minimum.
If three values are specified, the first value is the required
minimum for the total characters in a word, the second value is
the minimum for characters before the hyphenation point, and
the third value is the minimum for characters after the hyphenation
point. If the third value is missing, it is the same as the second.
If the second value is missing,
then it is ''hyphenate-limit-chars/auto''.
The ''hyphenate-limit-chars/auto''
value means that the UA chooses a value that adapts to the current
layout.
Note: Unless the UA is able to calculate a better value, it
is suggested that ''hyphenate-limit-chars/auto'' means 2 for before and after, and 5 for
the word total.
In the example below, the minimum size of a hyphenated word is
left to the UA (which means it may vary depending on the language,
the length of the line, or other factors), but the minimum number
of characters before and after the hyphenation point is set to 3.
p { hyphenate-limit-chars: auto 3; }
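The three-value form can be sketched in the same way. The numbers below are arbitrary and chosen only to show which position controls which minimum.

```css
p {
  /* First value:  at least 6 characters in the whole word.
     Second value: at least 3 characters before the hyphenation point.
     Third value:  at least 2 characters after the hyphenation point. */
  hyphenate-limit-chars: 6 3 2;
}
```

With these values a five-character word is never hyphenated, and a break that would leave only two characters before the hyphen is not used.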
Hyphenation Line Limits: the 'hyphenate-limit-lines' and 'hyphenate-limit-last' properties
Name: hyphenate-limit-lines
Value: no-limit | <integer>
Initial: no-limit
Applies to: block containers
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property indicates the maximum number of successive hyphenated
lines in an element. The ''no-limit'' value means that there is no limit.
In some cases, user agents may not be able to honor the specified value.
(See 'overflow-wrap'.) It is not defined whether hyphenation introduced by
such emergency breaking influences nearby hyphenation points.
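For example, the following rule (a minimal sketch, using the Level 3 'hyphens' property to enable automatic hyphenation) asks for no more than two hyphenated lines in a row:

```css
p {
  hyphens: auto;
  /* At most two successive lines may end in a hyphen. */
  hyphenate-limit-lines: 2;
}
```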
Name: hyphenate-limit-last
Value: none | always | column | page | spread
Initial: none
Applies to: block containers
Inherited: yes
Percentages: n/a
Computed value: as specified
Media: visual
This property indicates hyphenation behavior at the end of elements,
columns, pages, and spreads. A spread is a set of two pages that are
visible to the reader at the same time. Values are:
- none
-
No restrictions imposed.
- always
-
The last full line of the element, or the last line before any
column, page, or spread break inside the element should not be
hyphenated.
- column
-
The last line before any column, page, or spread break inside
the element should not be hyphenated.
- page
-
The last line before any page or spread break inside the element
should not be hyphenated.
- spread
-
The last line before any spread break inside the element should
not be hyphenated.
p { hyphenate-limit-last: always }
div.chapter { hyphenate-limit-last: spread }
A paragraph may be formatted like this when 'hyphenate-limit-last: none' is set:
This is just a
simple example
to show Antarc-
tica.
With 'hyphenate-limit-last: always' one would get:
This is just a
simple example
to show
Antarctica.
Alignment and Justification
Issue: Add final level 3 content
Add this value to 'text-align'
- <string>
-
The string must be a single character; otherwise the declaration must
be ignored.
When applied to a table cell, specifies the alignment character
around which the cell's contents will align. See
below for further details and
how this value combines with keywords.
Character-based Alignment in a Table Column
When multiple cells in a column have an alignment character specified,
the alignment character of each such cell in the column is centered along
a single column-parallel axis and the rest of the text in the column is
shifted accordingly. (Note that the strings do not have to be the same
for each cell, although they usually are.)
Is this intended to say that it's the centers
of the alignment characters that should be aligned?
It's not clear that's what it says,
but that (or a different behavior) needs to be specified,
to describe what happens
when different occurrences of the alignment character
are in different fonts.
(Further, is that the intended behavior? Probably the most
significant use case to consider is bold vs. non-bold text,
which only varies slightly in width.)
[feedback]
[minutes face-to-face 2016-02-02 10:00 AM]
The following style sheet:
TD { text-align: "." center }
will cause the column of dollar figures in the following HTML table:
<TABLE>
<COL width="40">
<TR> <TH>Long distance calls
<TR> <TD> $1.30
<TR> <TD> $2.50
<TR> <TD> $10.80
<TR> <TD> $111.01
<TR> <TD> $85.
<TR> <TD> N/A
<TR> <TD> $.05
<TR> <TD> $.06
</TABLE>
to align along the decimal point. The table might be rendered as
follows:
+---------------------+
| Long distance calls |
+---------------------+
|        $1.30        |
|        $2.50        |
|       $10.80        |
|      $111.01        |
|       $85.          |
|       N/A           |
|         $.05        |
|         $.06        |
+---------------------+
A keyword value may be specified in conjunction with the <string>
value; if it is not given, it defaults to ''text-align/right''. This value is used:
-
when character-based alignment is applied to boxes that are not table
cells.
-
when the text wraps to multiple lines (at unforced break points).
-
when a character-aligned cell spans more than one column. In this
case the keyword alignment value is used to determine which column's
axis to align with: the leftmost column for ''text-align/left'', the rightmost
column for ''text-align/right'' and ''text-align/center'', the startmost column for ''text-align/start'',
the endmost column for ''text-align/end''.
-
when the column is wide enough that the character alignment alone does
not determine the positions of its character-aligned contents. In this
case the keyword alignment of the first cell in the column with a
specified alignment character is used to slide the position of the
character-aligned contents to match the keyword alignment insofar as
possible without changing the width of the column.
For ''text-align/center'', the UA may center
the aligned contents using its extremes, center the alignment axis
itself (insofar as possible), or optically center the aligned contents
some other way (such as by taking a weighted average of the extent of
the cells' contents to either side of the axis).
Note: Right alignment is used by default for character-based
alignment because numbering systems are almost all left-to-right even
in right-to-left writing systems, and the primary use case of
character-based alignment is for numerical alignment.
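The keyword fallback described above can be sketched as follows. The class names are hypothetical, and the order of string and keyword follows the earlier example in this section.

```css
/* No keyword given: the fallback alignment defaults to right. */
td.amount {
  text-align: ".";
}
/* Explicit keyword: used when the text wraps, when the cell spans
   several columns, or when the column is wider than the
   character-aligned contents require. */
td.amount-start {
  text-align: "." start;
}
```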
If the alignment character appears more than once in the text, the first
instance is used for alignment. If the alignment character does not appear
in a cell at all, the string is aligned as if the alignment character had
been inserted at the end of its contents.
This needs to specify what text is searched
for the alignment character.
Is it only in-flow text whose containing block is the cell?
Or is text within any in-flow descendants
in the block formatting context established by the cell considered?
If so, is it considered only as long as its 'text-align' property
is consistent with the cell's?
(Consistent in the alignment character, or fully consistent?)
This behavior of aligning as though
the alignment character had been inserted at the end of
the contents of the cell,
combined with center-of-character alignment,
will produce gaps on the end-side of lines
that are alone on a line with <string> text-alignment,
when none of the lines of the column has the alignment character,
or, more importantly, when some of the lines
do have the alignment character,
but the column is not laid out at its max-content width.
This is probably undesirable.
When the alignment character is inserted at
the end of the contents, which font is used?
(In particular, if the alignment character might be within
a descendant block, is it the font of the block or
the font of the table cell?
Or if the insertion is at a forced break within an inline,
does it use the font of the inline or the font of the block or cell?)
Character-based alignment occurs before table cell width computation so
that auto width computations can leave enough space for alignment.
Whether column-spanning cells participate in the alignment prior to
or after width computation is undefined.
If width constraints on the cell contents prevent full alignment
throughout the column, the resulting alignment is undefined.
This should have a formal definition
of how character alignment affects
the min-content and max-content intrinsic widths
(of table columns and all content that can be inside table columns).
Max-content intrinsic widths need to be split
into three numbers (assuming that it's the centers of the
alignment character that are aligned):
one for widths without alignment characters,
one for widths on the inline-start side
of the center of the alignment character,
one for widths on the inline-end side
of the center of the alignment character.
This operates based on all segments of text
between forced breaks for max-content widths.
For min-content widths, segments of text between forced breaks
that contain optional breaks within them should clearly contribute
only to the without-alignment-character width.
However, it's less clear
whether all min-content widths should work this way,
or whether segments between forced breaks
that do not have optional breaks
(and perhaps only those that actually contain the alignment character)
should contribute to start-side-of-alignment-character
and end-side-of-alignment-character min-content widths instead;
this choice is a tradeoff between the meaning of min-content
sizing of a table meaning the narrowest reasonable size versus
honoring alignment characters in more cases.
Another option might be to use whether line-breaking of optional breaks
is allowed as a control for which behavior to use.
Formally defining the intrinsic width contributions
of column-spanning cells with <string> values of
'text-align' is a complicated (although straightforward) extension
of the decisions made for intrinsic width contributions
of non-column-spanning cells;
this should also be formally defined.
Contributions end up being made to the split intrinsic widths
of the startmost or endmost column (whichever is used for alignment),
and to the without-alignment-character intrinsic widths
of the other spanned columns.
Spacing
Issue: Add final level 3 word-spacing, letter-spacing
Character Class Spacing: the 'text-spacing' property
Name: text-spacing
Value: normal | none | [ trim-start | space-start ] || [ trim-end | space-end | allow-end ] || [ trim-adjacent | space-adjacent ] || no-compress || ideograph-alpha || ideograph-numeric || punctuation
Initial: normal
Applies to: block containers
Inherited: yes
Percentages: n/a
Media: visual
Computed value: specified value
This property controls spacing between adjacent characters
on the same line within the same inline formatting context
using a set of character-class-based rules.
Such spacing can either be created between or trimmed from the affected glyphs.
Values are defined as follows:
- normal
-
Specifies the baseline behavior,
equivalent to ''space-start allow-end trim-adjacent''.
- none
-
Turns off all text-spacing features.
All fullwidth characters are set with full-width glyphs.
- ideograph-alpha
-
Creates 1/4em extra spacing between runs of
ideographs and non-ideographic letters.
Note: A commonly used algorithm for determining this behavior is specified in [[JLREQ]].
- ideograph-numeric
-
Creates 1/4em extra spacing between runs of
ideographs and non-ideographic numerals.
Note: A commonly used algorithm for determining this behavior is specified in [[JLREQ]].
- punctuation
-
Creates extra non-breaking spacing around punctuation as required by language-specific typographic conventions.
In this level, if the element's content language is French,
narrow no-break space (U+202F) and no-break space (U+00A0) are inserted
where required by French typographic guidelines.
Otherwise this value has no effect.
However, future specifications may add automatic spacing behavior for other languages.
- space-start
-
Set fullwidth opening punctuation with full-width glyphs (spaced)
at the start of each line.
- trim-start
-
Set fullwidth opening punctuation with half-width glyphs (flush)
at the start of each line.
- allow-end
-
Set fullwidth closing punctuation with half-width glyphs (flush)
at the end of each line
if it does not otherwise fit prior to justification;
otherwise set the punctuation with full-width glyphs.
- space-end
-
Set fullwidth closing punctuation with full-width glyphs (spaced)
at the end of each line.
- trim-end
-
Set fullwidth closing punctuation with half-width glyphs (flush)
at the end of each line.
- space-adjacent
-
Set fullwidth opening punctuation with full-width glyphs (spaced)
when not at the start of the line.
Set fullwidth closing punctuation with full-width glyphs (spaced)
when not at the end of the line.
- trim-adjacent
-
Collapse spacing between punctuation glyphs
as described below.
- no-compress
-
Justification may not compress text-spacing.
(If this value is not specified, the justification process may reduce autospacing
except when the spacing is at the start or end of the line.)
Note: An example of compression rules is given for Japanese
in 3.8 Line Adjustment in [[JLREQ]].
This property is additive with the 'word-spacing' and 'letter-spacing' properties.
That is, the amount of spacing contributed by the 'letter-spacing' setting (if any)
is added to the spacing created by 'text-spacing'.
The same applies to 'word-spacing'.
At element boundaries, the amount of extra spacing introduced between characters
is determined by and rendered within the innermost element that contains the boundary.
If the extra spacing is applied to a particular glyph,
then the spacing is determined by the innermost element containing that glyph.
Note: Values other than ''text-spacing/normal'', ''text-spacing/none'', ''trim-start'', ''trim-end'', and ''space-end''
are at-risk and may be dropped from this level of CSS.
They are defined here currently to help work out a complete design of this feature.
Support for this property is optional.
It is strongly recommended for UAs that wish to support CJK typography.
Issue: It was requested to add a value for doubling the space after periods.
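The following rules sketch how the keywords might be combined under the grammar above. The selectors are hypothetical. Each rule lists one keyword from each of the start, end, and adjacent groups explicitly, because this draft does not say what an omitted group defaults to.

```css
/* Japanese body text: flush opening brackets at line start, plus
   1/4em spacing between ideographs and Latin letters or digits. */
article:lang(ja) {
  text-spacing: trim-start allow-end trim-adjacent
                ideograph-alpha ideograph-numeric;
}
/* French text: the equivalent of 'normal', plus automatic
   spacing around punctuation. */
p:lang(fr) {
  text-spacing: space-start allow-end trim-adjacent punctuation;
}
```

Most of these keywords are marked at-risk in the note above, so the sketch illustrates syntax only.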
Fullwidth Punctuation Collapsing
Typically, fullwidth characters have glyphs with the same advance width
as a standard Han character (e.g. 水 U+6C34).
However, many fullwidth punctuation glyphs only take up part of the fullwidth design space.
Thus such punctuation is not always set full-width.
Several values of 'text-spacing' allow the author to control
when such characters are set half-width (typically half the width of an ideograph)
and when they are set full-width.
In order to set the text as specified, the UA will need to either
-
trim (kern) the blank half of the glyphs,
if they are given full-width and must be set half-width, or
-
add space to the glyphs,
if they are given half-width and must be set full-width.
Some fonts use proportional glyphs for fullwidth punctuation characters.
For such proportional glyphs, the given advance width is considered
simultaneously full-width and half-width: no space is added or removed.
Note: The advance width of a standard Han character
can be determined either from font metrics
such as the OpenType ideo
and idtp
baselines for the opposite writing mode,
or by taking the advance width of a Han character such as 水 U+6C34.
(The opposite writing mode must be used because some fonts are compressed
so that the characters are not square.)
More information on OpenType metrics can be found
in the OpenType spec.
Note that if 水 U+6C34, 卜 U+535C, and 一 U+4E00 do not all have the same advance width,
the font has proportional ideographs
and the fullwidth advance width cannot be reliably determined by measuring glyphs.
Unless 'text-spacing' is set to ''space-adjacent'' or ''text-spacing/none''
(or the font has proportional fullwidth punctuation glyphs),
the UA must collapse the space typically associated with such full width glyphs
when placed adjacently on a line
as follows:
The following example table lists the punctuation pairs
affected by adjacent-pairs trimming.
It uses halfwidth equivalents to approximate the trimming effect.
Demonstration of adjacent-pairs punctuation trimming
| Combination | Sample Pair | Looks Like |
|---|---|---|
| Opening–Opening | 〔+( | 〔( |
| Middle Dot–Opening | ・+( | ・( |
| Closing–Opening | 〕+( | ) ( |
| Ideographic Space–Opening | +( | ( |
| Closing–Closing | )+〕 | )〕 |
| Closing–Middle Dot | )+・ | )・ |
| Closing–Ideographic Space | )+ | ) |
Text Spacing Character Classes
In the context of this property the following definitions apply:
Issue: Classes and Unicode code points need to be reviewed.
- ideographs
- Includes all typographic character units [[CSS3TEXT]] whose base character is listed below:
- All characters in the range of U+3041 to U+30FF,
except those that belong to Unicode Punctuation [P*] category.
- CJK Strokes (U+31C0 to U+31EF).
- Katakana Phonetic Extensions (U+31F0 to U+31FF).
- All characters that belong to the Han Unicode Script Property [[!UAX24]].
- non-ideographic letters
-
Includes all typographic character units that
belong to the Unicode Letter [L*] and Mark [M*] categories,
except when any of the following conditions are met:
- is defined as an ideograph.
- is categorized as East Asian Fullwidth (F) by [[!UAX11]].
- is upright in vertical text flow using the 'text-orientation' property
or the 'text-combine-upright' property.
- non-ideographic numerals
-
Includes all typographic character units that
belong to the Unicode Decimal Digit Number [Nd] category,
except when any of the following conditions are met:
- is categorized as East Asian Fullwidth (F) by [[!UAX11]].
- is upright in vertical text flow using the 'text-orientation' property
or the 'text-combine-upright' property.
- fullwidth opening punctuation
-
Includes any opening punctuation character (Unicode category Ps)
that belongs to the CJK Symbols and Punctuation block (U+3000–U+303F)
or is categorized as East Asian Fullwidth (F) by [[!UAX11]].
Also includes LEFT SINGLE QUOTATION MARK (U+2018) and LEFT DOUBLE QUOTATION MARK (U+201C).
When trimmed, the left (for horizontal text) or top (for vertical text) half is kerned.
- fullwidth closing punctuation
-
Includes any closing punctuation character (Unicode category Pe)
that belongs to the CJK Symbols and Punctuation block (U+3000–U+303F)
or is categorized as East Asian Fullwidth (F) by [[!UAX11]].
Also includes RIGHT SINGLE QUOTATION MARK (U+2019) and RIGHT DOUBLE QUOTATION MARK (U+201D).
May also include fullwidth colon punctuation and/or fullwidth dot punctuation
(see below).
When trimmed, the right (for horizontal text) or bottom (for vertical text) half is kerned.
- fullwidth middle dot punctuation
-
Includes MIDDLE DOT (U+00B7), HYPHENATION POINT (U+2027), and KATAKANA MIDDLE DOT (U+30FB).
May also include fullwidth colon punctuation and/or fullwidth dot punctuation
(see below).
- fullwidth colon punctuation
-
Includes FULLWIDTH COLON (U+FF1A) and FULLWIDTH SEMICOLON (U+FF1B).
- fullwidth dot punctuation
-
Includes
IDEOGRAPHIC COMMA (U+3001),
IDEOGRAPHIC FULL STOP (U+3002),
FULLWIDTH COMMA (U+FF0C),
FULLWIDTH FULL STOP (U+FF0E).
Whether fullwidth colon punctuation and fullwidth dot punctuation
should be considered fullwidth closing punctuation or fullwidth middle dot punctuation
depends on where in the glyph's box the punctuation is drawn.
If the punctuation is centered,
then it should be considered middle dot punctuation.
If the punctuation is drawn to one side (left in horizontal text, top in vertical text)
and the other half is therefore blank
then the punctuation should be considered closing punctuation and trimmed accordingly.
The UA must classify fullwidth colon punctuation and fullwidth dot punctuation
under either the fullwidth closing punctuation category or the fullwidth middle dot punctuation category
as appropriate.
The UA may rely on language conventions and the writing mode (horizontal vs. vertical),
and/or font information to determine this categorization.
The UA may also add additional characters to any category as appropriate.
The following informative table summarizes language conventions
for classifying fullwidth colon and dot punctuation:
| | colon punctuation | dot punctuation |
|---|---|---|
| Simplified Chinese (horizontal) | closing | closing |
| Simplified Chinese (vertical) | closing | closing |
| Traditional Chinese | middle dot | middle dot |
| Korean | middle dot | closing |
| Japanese | middle dot | closing |
Note that for Chinese fonts at least,
the author observes that the standard convention is often not followed.
Japanese Paragraph-start Conventions in CSS
Japanese has three common start-edge typesetting schemes,
which are distinguished by their handling of opening brackets.
Assuming a UA style sheet of p { margin: 1em 0; },
CSS can achieve the Japanese typesetting styles with the following rules:
-
Brackets flush with indent, flush with other lines (first scheme):
p { /* Flush alignment */
margin: 0;
text-indent: 1em;
text-spacing: trim-start;
}
-
Brackets preserve fullwidth spacing on all lines (second scheme):
p { /* Fullwidth alignment */
margin: 0;
text-indent: 1em;
text-spacing: normal;
}
-
Brackets hang in indent, flush with other lines (third scheme):
p { /* Hanging alignment */
margin: 0;
text-indent: 1em;
text-spacing: trim-start;
hanging-punctuation: first;
}
Edge Effects
Note: Add final level 3 content
Acknowledgements
Note: Add final level 3 list, with Randy Edmunds and Florian Rivoal added