Title: CSS Text Module Level 3
Shortname: css-text
Level: 3
Status: ED
Work Status: Refining
Group: csswg
ED: https://drafts.csswg.org/css-text-3/
TR: https://www.w3.org/TR/css-text-3/
Previous Version: https://www.w3.org/TR/2019/WD-css-text-3-20191113/
Previous Version: https://www.w3.org/TR/2018/WD-css-text-3-20181212/
Previous Version: https://www.w3.org/TR/2018/WD-css-text-3-20181206/
Previous Version: https://www.w3.org/TR/2018/WD-css-text-3-20180920/
Previous Version: https://www.w3.org/TR/2017/WD-css-text-3-20170822/
Previous Version: https://www.w3.org/TR/2013/WD-css-text-3-20131010/
Previous Version: https://www.w3.org/TR/2012/WD-css3-text-20121113/
Issue Tracking: Tracker http://www.w3.org/Style/CSS/Tracker/products/10
Test Suite: http://test.csswg.org/suites/css3-text/nightly-unstable/
Editor: Elika J. Etemad / fantasai, Invited Expert, http://fantasai.inkedblade.net/contact, w3cid 35400
Editor: Koji Ishii, Invited Expert, kojiishi@gluesoft.co.jp, w3cid 45369
Editor: Florian Rivoal, Invited Expert, https://florian.rivoal.net, w3cid 43241
Abstract: This CSS module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.
At Risk: the ''full-width'' value of 'text-transform'
At Risk: the ''full-size-kana'' value of 'text-transform'
At Risk: the <length> values of the 'tab-size' property
At Risk: the 'text-justify' property
At Risk: the percentage values of 'word-spacing'
At Risk: the 'hanging-punctuation' property
At Risk: Writing-system specific adjustments to line-breaking
At Risk: Trimming trailing Ogham space marks
Ignored Vars: letter-spacing
Status Text: This publication partially addresses the issues in the disposition of comments since the October 2013 Last Call Working Draft, and, while a marked improvement over the previous draft, is not considered to be entirely up-to-date at the time of publication. A completed disposition of comments and corresponding draft will be published once the issues are fully addressed and reviewed by the CSSWG and Internationalization WG.
WPT Path Prefix: /css/css-text/
Further information about the typesetting requirements
of various languages and writing systems around the world
can be found in the Internationalization Working Group’s
Typography Index.
[[TYPOGRAPHY]]
Module Interactions
This module, together with [[CSS-TEXT-DECOR-3]],
replaces and extends the text-level features defined in [[!CSS2]] chapter 16.
In addition to the terms defined below,
other terminology and concepts used in this specification are defined
in [[!CSS2]] and [[!CSS-WRITING-MODES-3]].
Value Definitions
This specification follows the CSS property definition conventions from [[!CSS2]]
using the value definition syntax from [[!CSS-VALUES-3]].
Value types not defined in this specification are defined in CSS Values & Units [[!CSS-VALUES-3]].
Combination with other CSS modules may expand the definitions of these value types.
In addition to the property-specific values listed in their definitions,
all properties defined in this specification
also accept the CSS-wide keywords as their property value.
For readability they have not been repeated explicitly.
Languages and Typesetting
Authors should accurately language-tag their content for the best typographic behavior.
Many typographic effects vary by linguistic context.
Language and writing system conventions can affect
line breaking, hyphenation, justification, glyph selection,
and many other typographic effects.
In CSS, language-specific typographic tailorings
are only applied when the content language is known (declared).
Therefore,
higher quality typography requires authors to communicate to the UA
the correct linguistic context of the text in the document.
The content language of an element is the (human) language
the element is declared to be in, according to the rules of the
document language.
Note that the content language of an element can be unknown:
for example, untagged content,
or content in a document language that does not have a language-tagging facility,
is considered to have an unknown content language.
Note: Authors can declare the [=content language=]
using the global lang attribute in HTML
or the universal xml:lang attribute in XML.
See the rules
for determining the content language of an HTML element in [[HTML]],
and the rules
for determining the content language of an XML element in [[XML10]].
The [=content language=] an element is declared to be in
also identifies the specific written form of that language used in that element,
known as the content writing system.
Depending on the [=document language=]'s facilities for identifying the [=content language=],
this information can be explicit or implied.
See the normative [[#script-tagging]].
Note: Some languages have more than one writing system tradition;
in other cases a language can be transliterated into a foreign writing system.
Authors should subtag such cases
so that the UA can adapt appropriately.
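As a rough illustration of why subtagging matters: the declared content language is also what the `:lang()` selector matches, so a script subtag lets style rules (and the UA) tell two writing systems of one language apart. The sketch below assumes a document tagged with the BCP 47 values `sr-Cyrl` and `sr-Latn`; the font choices are hypothetical and not part of this specification.

```css
/* Assumed markup:
   <p lang="sr-Cyrl">…</p>   Serbian written in Cyrillic
   <p lang="sr-Latn">…</p>   Serbian written in Latin script */

:lang(sr-Cyrl) {
  font-family: "Noto Serif", serif;      /* hypothetical font choice */
}

:lang(sr-Latn) {
  font-family: "Noto Sans", sans-serif;  /* hypothetical font choice */
}
```

A plain `lang="sr"` would match neither rule, leaving the writing system to be inferred.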
Characters and Letters
The basic unit of typesetting is the character.
However, because writing systems are not always as simple as the basic English alphabet,
what a character actually is depends on the context in which the term is used.
For example, in Hangul (the Korean writing system),
each square representation of a syllable
(e.g. 한=Han)
can be considered a character.
However, the square symbol is really composed of multiple letters each representing a phoneme
(e.g. ㅎ=h,
ㅏ=a,
ㄴ=n)
and these also could each be considered a character.
A basic unit of computer text encoding, for any given encoding,
is also called a character,
and depending on the encoding,
a single encoding character might correspond
to the entire pre-composed syllabic character (e.g. 한),
to the individual phonemic character (e.g. ㅎ),
or to smaller units such as
a base letterform (e.g. ㅇ)
and any combining marks that vary it (e.g. extra strokes that represent aspiration).
In turn, a single encoding character can be represented in the data stream as one or more bytes;
and in programming environments one byte is sometimes also called a character.
Therefore the term character is fairly ambiguous where technical precision is required.
For text layout, we will refer to the typographic character unit
as the basic unit of text.
Even within the realm of text layout,
the relevant character unit depends on the operation.
For example, line-breaking and letter-spacing will segment
a sequence of Thai characters that include U+0E33 THAI CHARACTER SARA AM differently;
or the behaviour of a conjunct consonant in a script such as Devanagari
may depend on the font in use.
So the typographic character represents a unit of the writing system—such as a Latin alphabetic letter (including its diacritics),
Hangul syllable,
Chinese ideographic character,
Myanmar syllable cluster—that is indivisible with respect to a particular typographic operation
(line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).
Unicode Standard Annex #29: Text Segmentation
defines a unit called the grapheme cluster
which approximates the typographic character.
A UA must use the extended grapheme cluster
(not legacy grapheme cluster), as defined in [[!UAX29]],
as the basis for its typographic character unit.
However, the UA should tailor the definitions
as required by typographic tradition
since the default rules are not always appropriate or ideal--
and is expected to tailor them differently
depending on the operation as needed.
Note: The rules for such tailorings are out of scope for CSS.
The following are some examples of typographic character unit tailorings
required by standard typesetting practice:
In some scripts such as Myanmar or Devanagari,
the typographic character unit for both justification and line-breaking
is an entire syllable,
which can include more than one [[!UAX29]] grapheme cluster.
In other scripts such as Thai or Lao,
even though for line-breaking the typographic character
matches Unicode’s default grapheme clusters,
for letter-spacing the relevant unit
is less than a [[!UAX29]] grapheme cluster,
and may require decomposition or other substitutions
before spacing can be inserted.
For instance,
to properly letter-space the Thai word คำ (U+0E04 + U+0E33),
the U+0E33 needs to be decomposed into U+0E4D + U+0E32,
and then the extra letter-space inserted before the U+0E32: คํ า.
A slightly more complex example is น้ำ (U+0E19 + U+0E49 + U+0E33).
In this case, normal Thai shaping will first decompose the U+0E33 into U+0E4D + U+0E32
and then swap the U+0E4D with the U+0E49, giving U+0E19 + U+0E4D + U+0E49 + U+0E32.
As before the extra letter-space is then inserted before the U+0E32: นํ้ า.
Vertical typesetting [[!CSS-WRITING-MODES-3]] can also require tailoring.
For example, when typesetting ''text-orientation/upright'' text,
Tibetan tsek and shad marks are kept with the preceding grapheme cluster,
rather than treated as an independent typographic character unit.
A typographic letter unit or letter for the purpose of this specification
is a typographic character unit belonging to one of the Letter or Number general
categories in Unicode. [[!UAX44]]
See Character Properties
for how to determine the Unicode properties of a typographic character unit.
The rendering characteristics of a typographic character unit divided
by an element boundary are undefined.
Ideally each component should be rendered
according to the formatting requirements of its respective element’s properties
while maintaining correct shaping and positioning
of the typographic character unit as a whole.
However, depending on the nature of the formatting differences between its parts
and the capabilities of the font technology in use,
this is not always possible.
Therefore such a typographic character unit
may be rendered as belonging to either side of the boundary,
or as some approximation of belonging to both.
Authors are forewarned that dividing grapheme clusters
or ligatures
by element boundaries may give inconsistent or undesired results.
Text Processing
CSS is built on [[UNICODE]].
UAs that support Unicode must adhere to all normative requirements
of the Unicode Core Standard,
except where explicitly overridden by CSS.
UAs that use a different encoding are not explicitly supported by the CSS specifications;
they are, however, expected to fulfill the same text handling requirements
by assuming an appropriate mapping between that encoding and Unicode.
For the purpose of determining adjacency for text processing
(such as white space processing, text transformation, line-breaking, etc.),
and thus in general within this specification,
intervening [=inline box=] boundaries and [=out-of-flow=] elements
must be ignored.
With respect to text shaping, however, see [[#boundary-shaping]].
The 'text-transform' property transforms text for styling purposes.
It has no effect on the underlying content,
and must not affect the content of a plain text copy & paste operation.
Note: The 'text-transform' property only affects the presentation layer;
correct casing for semantic purposes is expected to be represented
in the source document.
Values have the following meanings:
none
No effects.
capitalize
Puts the first typographic letter unit of each word, if lowercase, in titlecase;
other characters are unaffected.
full-width
Puts all typographic character units in fullwidth form.
If a character does not have a corresponding fullwidth form,
it is left as is.
This value is typically used to typeset Latin letters and digits
as if they were ideographic characters.
full-size-kana
Converts all [=small Kana=] characters to the equivalent [=full-size Kana=].
This value is typically used for ruby annotation text,
where authors may want all small Kana to be drawn as large Kana
to compensate for legibility issues at the small font sizes typically used in ruby.
The following example converts the ASCII characters
used in abbreviations in Japanese text to their fullwidth variants
so that they lay out and line break like ideographs:
abbr:lang(ja) { text-transform: full-width; }
Note: The purpose of 'text-transform' is
to allow for presentational casing transformations
without affecting the semantics of the document.
Note in particular that 'text-transform' casing operations are lossy,
and can distort the meaning of a text.
While accessibility interfaces may wish to convey
the apparent casing of the rendered text to the user,
the transformed text cannot be relied on to accurately represent
the underlying meaning of the document.
In this example,
the first line of text is capitalized as a visual effect.
This effect cannot be written into the source document
because the position of the line break depends on layout.
But also, the capitalization is not reflecting a semantic distinction
and is not intended to affect the paragraph’s reading;
therefore it belongs in the presentation layer.
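A rule of roughly the following shape produces the effect described. The selector is an assumption chosen for illustration (the opening paragraph of each section).

```css
/* Uppercase only the first formatted line of each section's opening paragraph.
   Which words fall on that line depends on layout, not on the source text. */
section > p:first-of-type::first-line {
  text-transform: uppercase;
}
```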
In this example,
the [=ruby=] annotations,
which are half the size of the main paragraph text,
are transformed to use regular-size kana
in place of [=small kana=].
Note that while this makes such letters easier to see at small type sizes,
the transformation distorts the text:
the reader needs to mentally substitute [=small kana=]
in the appropriate places--
not unlike reading a text in English
with all “s” characters substituted by “f”.
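A style sheet along these lines would give the behavior described. It is a sketch: the heading exception is an illustrative assumption about where annotations are large enough to read without the transformation.

```css
/* Ruby annotations are set at half size, so small kana are enlarged for legibility. */
rt {
  font-size: 50%;
  text-transform: full-size-kana;
}

/* Assumed exception: in headings the annotations are already large,
   so the original small kana are kept. */
:is(h1, h2, h3, h4) rt {
  text-transform: none;
}
```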
Mapping Rules
For ''capitalize'', what constitutes a “word” is UA-dependent;
[[!UAX29]] is suggested (but not required)
for determining such word boundaries.
Authors should not expect ''capitalize'' to follow
language-specific titlecasing conventions
(such as skipping articles in English).
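The point about titlecasing conventions can be seen in a small sketch. The markup in the comment is assumed, and the exact word boundaries remain UA-dependent.

```css
/* Assumed markup: <h2>the art of war</h2> */
h2 {
  text-transform: capitalize;
}
/* A typical rendering is "The Art Of War": "of" is capitalized too,
   because no language-specific titlecasing conventions are applied. */
```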
Out-of-flow elements and inline element boundaries
must not introduce a 'text-transform' word boundary
and must be ignored when determining such word boundaries.
The UA must use the full case mappings for Unicode
characters, including any conditional casing rules, as defined in
the Default Case Algorithm section of The Unicode Standard [[!UNICODE]].
If (and only if) the content language
of the element is, according to the rules of the
document language,
known,
then any appropriate language-specific rules must be applied as well.
These minimally include, but are not limited to, the language-specific
rules in Unicode's
SpecialCasing.txt.
For example, in Turkish there are two “i”s, one with
a dot (“İ” and “i”) and one
without (“I” and “ı”). Thus the usual
case mappings between “I” and “i” are
replaced with a different set of mappings to their respective
undotted/dotted counterparts, which do not exist in English. This
mapping must only take effect if the content language is Turkish
written in its modern Latin-based writing system
(or another Turkic language that uses Turkish casing rules);
in other languages, the usual mapping of “I”
and “i” is required. This rule is thus conditionally
defined in Unicode's SpecialCasing.txt file.
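The Turkish case can be sketched as follows. The `.shout` class and the markup in the comment are hypothetical; the only difference between the two paragraphs is their declared content language.

```css
/* Assumed markup:
   <p lang="tr" class="shout">istanbul</p>
   <p lang="en" class="shout">istanbul</p> */
.shout {
  text-transform: uppercase;
}
/* With lang="tr", "i" maps to "İ" (U+0130): İSTANBUL.
   With lang="en", the default mapping applies: ISTANBUL. */
```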
The definition of fullwidth and halfwidth forms can be found on the
Unicode consortium web site at [[!UAX11]].
The mapping to fullwidth form is defined by taking code points with
the <wide> or the <narrow> tag
in their Decomposition_Mapping in [[!UAX44]].
For the <narrow> tag,
the mapping is from the code point to the decomposition (minus <narrow> tag),
and for the <wide> tag,
the mapping is from the decomposition (minus the <wide> tag)
back to the original code point.
The mappings for small Kana to full-size Kana are defined in [[#small-kana]].
Order of Operations
When multiple values are specified and therefore multiple transformations need to be applied,
they are applied in the following order:
1. ''capitalize'', ''uppercase'', and ''lowercase''
2. ''full-width''
3. ''full-size-kana''
Text transformation happens after [[#white-space-phase-1]]
but before [[#white-space-phase-2]].
This means that ''full-width'' only transforms
U+0020 spaces to U+3000 within preserved [=white space=].
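The ordering of the case and width transformations above can be illustrated with a declaration that combines two values. This is a minimal sketch with assumed markup.

```css
/* Assumed markup: <abbr lang="ja">nhk</abbr> */
abbr:lang(ja) {
  text-transform: uppercase full-width;
}
/* Step 1 (case):  "nhk" becomes "NHK".
   Step 2 (width): each letter is replaced by its fullwidth form: "ＮＨＫ". */
```

The result does not depend on the order in which the keywords are written in the declaration.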
normal
This value directs user agents to collapse sequences of white space
into a single character (or in some
cases, no character).
Lines may wrap at allowed soft wrap opportunities,
as determined by the line-breaking rules in effect,
in order to minimize inline-axis overflow.
pre
This value prevents user agents from collapsing sequences of white space.
Segment breaks such as line feeds
are preserved as forced line breaks.
Lines only break at forced line breaks;
content that does not fit within the block container overflows it.
nowrap
Like ''white-space/normal'', this value collapses white space;
but like ''pre'', it does not allow wrapping.
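For instance, a single-line label could be styled as below. The `.label` class is a hypothetical name used only for illustration.

```css
/* Hypothetical class for a label that should stay on one line. */
.label {
  white-space: nowrap; /* collapses white space like 'normal', but never wraps */
}
/* Content wider than the container overflows rather than breaking. */
```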
break-spaces
The behavior is identical to that of ''white-space/pre-wrap'',
except that:
* Any sequence of preserved [=white space=]
or [=other space separators=]
always takes up space,
including at the end of the line.
* A line breaking opportunity exists after every preserved white space character
and after every [=other space separator=]
(including between adjacent spaces).
Such [=preserved=] [=white space characters=]
and [=other space separators=]
take up space and do not [=hang=],
and thus affect the box's intrinsic sizes
([=min-content size=] and [=max-content size=]).
Note: This value does not guarantee that there will never be any overflow due to white space:
for example, if the line length is so short that even a single white space character does not fit,
overflow is unavoidable.
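A sketch of where ''break-spaces'' might be applied follows. The `.editor` class is a hypothetical name for an editable plain-text area.

```css
/* Hypothetical class for an editable plain-text area. */
.editor {
  white-space: break-spaces;
}
/* Preserved spaces at the end of a line take up space and can wrap
   to the next line instead of hanging past the end of the line. */
```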
pre-line
Like ''white-space/normal'', this value collapses consecutive [=white space characters=] and allows wrapping,
but preserves segment breaks in the source as forced line breaks.
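As an illustration, ''pre-line'' suits content such as a postal address, where source line breaks carry meaning but runs of spaces do not. The `.address` class and the markup in the comment are assumptions.

```css
/* Assumed markup (the line feed in the source is significant):
   <p class="address">221B Baker Street
   London      NW1</p> */
.address {
  white-space: pre-line;
}
/* Renders as two lines; the run of spaces before "NW1" collapses to one. */
```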
[=White space=] that was not removed or collapsed due to white space processing
is called preserved white space.
Note: In some cases,
[=preserved white space=] and [=other space separators=]
can hang when at the end of the line;
this can affect whether they are measured for [=intrinsic sizing=].
The following informative table summarizes the behavior of various
'white-space' values:
| | New Lines | Spaces and Tabs | Text Wrapping | End-of-line [=spaces=] | End-of-line [=other space separators=] |
|---|---|---|---|---|---|
| ''white-space/normal'' | Collapse | Collapse | Wrap | Remove | Hang |
| ''pre'' | Preserve | Preserve | No wrap | Preserve | No wrap |
| ''nowrap'' | Collapse | Collapse | No wrap | Remove | Hang |
| ''pre-wrap'' | Preserve | Preserve | Wrap | Hang | Hang |
| ''break-spaces'' | Preserve | Preserve | Wrap | Wrap | Wrap |
| ''pre-line'' | Preserve | Collapse | Wrap | Remove | Hang |
See White Space Processing Rules
for details on how [=white space=] collapses. An informative summary of
collapsing (''white-space/normal'' and ''nowrap'') is presented below:
* A sequence of segment breaks and other white space between two
Chinese, Japanese, or Yi characters collapses into nothing.
* A zero width space before or after a [=white space=] sequence
containing a segment break causes the entire sequence of white space
to collapse into a zero width space.
* Otherwise, consecutive white space collapses into a single [=space=].
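The last rule of the summary above is the one authors meet most often. A minimal sketch, with assumed markup:

```css
/* Assumed markup (source wrapped and indented for editing):
   <p>Hello,
        world!</p> */
p {
  white-space: normal;
}
/* The line feed and the indentation collapse into a single space:
   "Hello, world!" */
```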
The source text of a document often contains formatting
that is not relevant to the final rendering: for example,
breaking the source into segments
(lines) for ease of editing
or adding [=white space characters=] such as [=tabs=] and [=spaces=] to indent the source code.
CSS white space processing allows the author to control interpretation of such formatting:
to preserve or collapse it away when rendering the document.
White space processing in CSS interprets [=white space characters=] only for rendering:
it has no effect on the underlying document data.
White space processing in CSS is controlled with the 'white-space' property.
CSS does not define document segmentation rules. Segments can be
separated by a particular newline sequence (such as a line feed or
CRLF pair), or delimited by some other mechanism, such as the SGML
RECORD-START and RECORD-END tokens.
For CSS processing, each document language–defined segment break
and each line feed (U+000A)
in the text is treated as a segment break,
which is then interpreted for rendering as specified by the 'white-space' property.
Note: A document parser might
not only normalize any segment breaks,
but also collapse other space characters or
otherwise process white space according to markup rules.
Because CSS processing occurs after the parsing stage,
it is not possible to restore these characters for styling.
Therefore, some of the behavior specified below
can be affected by these limitations and
may be user agent dependent.
Note: Anonymous blocks consisting entirely of
collapsible white space are removed from the rendering tree.
Thus any such white space surrounding a block-level element is collapsed away.
See [[CSS2]] section 9.2.2.1.
Control characters (Unicode category Cc),
other than tabs (U+0009),
line feeds (U+000A),
carriage returns (U+000D)
and sequences that form a segment break,
must be rendered as a visible glyph
which the UA must synthesize if the glyphs found in the font are not visible,
and must be otherwise treated as any other character
of the Other Symbols (So) general category and Common script.
The UA may use a glyph provided by a font specifically for the control character,
substitute the glyphs provided for the corresponding symbol in the Control Pictures block,
generate a visual representation of its code point value,
or use some other method to provide an appropriate visible glyph.
As required by [[!UNICODE]],
unsupported Default_ignorable characters
must be ignored for text rendering.
white-space/control-chars-000.html
white-space/control-chars-001.html
white-space/control-chars-002.html
white-space/control-chars-003.html
white-space/control-chars-004.html
white-space/control-chars-005.html
white-space/control-chars-006.html
white-space/control-chars-007.html
white-space/control-chars-008.html
white-space/control-chars-00B.html
white-space/control-chars-00C.html
white-space/control-chars-00D.html
white-space/control-chars-00E.html
white-space/control-chars-00F.html
white-space/control-chars-010.html
white-space/control-chars-011.html
white-space/control-chars-012.html
white-space/control-chars-013.html
white-space/control-chars-014.html
white-space/control-chars-015.html
white-space/control-chars-016.html
white-space/control-chars-017.html
white-space/control-chars-018.html
white-space/control-chars-019.html
white-space/control-chars-01A.html
white-space/control-chars-01B.html
white-space/control-chars-01C.html
white-space/control-chars-01D.html
white-space/control-chars-01E.html
white-space/control-chars-01F.html
white-space/control-chars-07F.html
white-space/control-chars-080.html
white-space/control-chars-081.html
white-space/control-chars-082.html
white-space/control-chars-083.html
white-space/control-chars-084.html
white-space/control-chars-085.html
white-space/control-chars-086.html
white-space/control-chars-087.html
white-space/control-chars-088.html
white-space/control-chars-089.html
white-space/control-chars-08A.html
white-space/control-chars-08B.html
white-space/control-chars-08C.html
white-space/control-chars-08D.html
white-space/control-chars-08E.html
white-space/control-chars-08F.html
white-space/control-chars-090.html
white-space/control-chars-091.html
white-space/control-chars-092.html
white-space/control-chars-093.html
white-space/control-chars-094.html
white-space/control-chars-095.html
white-space/control-chars-096.html
white-space/control-chars-097.html
white-space/control-chars-098.html
white-space/control-chars-099.html
white-space/control-chars-09A.html
white-space/control-chars-09B.html
white-space/control-chars-09C.html
white-space/control-chars-09D.html
white-space/control-chars-09E.html
white-space/control-chars-09F.html
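As a rough illustration of the control character rule above, the following Python sketch decides whether a character falls under the visible-glyph requirement and picks a stand-in from the Control Pictures block where one exists. The function names are hypothetical, sequences that form a segment break are assumed to have been handled beforehand, and printing the code point value is only one of the fallbacks the text permits.

```python
import unicodedata

# Tabs, line feeds, and carriage returns are exempt from the rule.
EXEMPT = {"\u0009", "\u000A", "\u000D"}

def needs_visible_glyph(ch: str) -> bool:
    """True for category Cc characters that must be rendered visibly."""
    return unicodedata.category(ch) == "Cc" and ch not in EXEMPT

def fallback_glyph(ch: str) -> str:
    """Pick a visible stand-in for a control character (illustrative only)."""
    cp = ord(ch)
    if cp <= 0x1F:
        # Control Pictures U+2400..U+241F mirror the C0 controls.
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    # C1 controls have no Control Picture: show the code point value.
    return f"[U+{cp:04X}]"
```

A UA would normally prefer a glyph supplied by the font; this sketch only covers the substitution and code point options.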
Carriage returns (U+000D) are treated identically to spaces (U+0020) in all respects.
white-space/control-chars-00D.html
white-space-processing-005.xht
white-space-processing-006.xht
white-space-processing-007.xht
Note: For HTML documents,
carriage returns present in the source code
are converted to line feeds at the parsing stage
(see [[HTML#preprocessing-the-input-stream]]
and the definition of [=normalize newlines=] in [[INFRA]])
and therefore do not appear as U+000D to CSS.
However, the character is preserved--
and the above rule observable--
when encoded using an escape sequence (&#x0d;
).
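The newline normalization that the note refers to can be sketched in a few lines of Python. This is only an illustration of the two replacement steps (CRLF pairs first, then any remaining lone CR); the function name is chosen here for readability, and carriage returns introduced by character references are not affected because they are resolved after this step.

```python
def normalize_newlines(s: str) -> str:
    """Replace each CRLF pair, then each remaining CR, with a single LF."""
    return s.replace("\r\n", "\n").replace("\r", "\n")
```

After this step no U+000D remains in the parsed source text, which is why the carriage return rule above is rarely observable in HTML.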
The White Space Processing Rules
Except where specified otherwise,
white space processing in CSS affects only
the document white space characters:
spaces (U+0020), tabs (U+0009), and segment breaks.
white-space-normal-003.xht
white-space-normal-004.xht
white-space-normal-005.xht
white-space-normal-006.xht
white-space-normal-007.xht
white-space-normal-008.xht
white-space-nowrap-005.xht
white-space-nowrap-006.xht
white-space-pre-005.xht
white-space-pre-006.xht
white-space-processing-054.xht
white-space-processing-055.xht
white-space-processing-056.xht
Note: The set of characters considered document white space (part of the document content)
and those considered syntactic white space (part of the CSS syntax)
are not necessarily identical.
However, since both include spaces (U+0020), tabs (U+0009), and line feeds (U+000A),
most authors won't notice any differences.
Besides
Space (U+0020)
and No-Break Space (U+00A0),
Unicode [[UNICODE]] defines a number of additional space separator characters.
In this specification
all characters in the Unicode Zs category (see [[!UAX44]])
except Space (U+0020)
and No-Break Space (U+00A0)
are collectively referred to as
other space separators.
white-space/trailing-other-space-separators-001.html
white-space/trailing-other-space-separators-002.html
white-space/trailing-other-space-separators-003.html
white-space/trailing-other-space-separators-004.html
white-space/trailing-other-space-separators-break-spaces-001.html
white-space/trailing-other-space-separators-break-spaces-002.html
white-space/trailing-other-space-separators-break-spaces-003.html
white-space/trailing-other-space-separators-break-spaces-004.html
white-space/trailing-other-space-separators-break-spaces-005.html
white-space/trailing-other-space-separators-break-spaces-006.html
white-space/trailing-other-space-separators-break-spaces-007.html
white-space/trailing-other-space-separators-break-spaces-008.html
white-space/trailing-other-space-separators-break-spaces-009.html
white-space/trailing-other-space-separators-break-spaces-010.html
white-space/trailing-other-space-separators-break-spaces-011.html
white-space/trailing-other-space-separators-break-spaces-012.html
white-space/trailing-other-space-separators-break-spaces-013.html
white-space/trailing-other-space-separators-break-spaces-014.html
white-space/trailing-other-space-separators-break-spaces-015.html
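The definition of other space separators above maps directly onto the Unicode general category. A minimal Python sketch, assuming the Unicode data shipped with the interpreter is acceptable for illustration (the function name is hypothetical):

```python
import unicodedata

def is_other_space_separator(ch: str) -> bool:
    """True for Zs characters other than U+0020 and U+00A0."""
    return unicodedata.category(ch) == "Zs" and ch not in ("\u0020", "\u00A0")
```

For instance, U+3000 IDEOGRAPHIC SPACE and U+1680 OGHAM SPACE MARK qualify, while U+200B ZERO WIDTH SPACE does not, since its general category is Cf rather than Zs.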
Phase I: Collapsing and Transformation
For each inline (including anonymous inlines;
see [[CSS2]] section 9.2.2.1)
within an [=inline formatting context=],
[=white space characters=] are processed as follows
prior to [=line breaking=] and bidi reordering,
ignoring bidi formatting characters
(characters with the Bidi_Control property [[!UAX9]])
as if they were not there:
white-space/white-space-collapse-002.html
white-space/white-space-normal-011.html
white-space/white-space-nowrap-011.html
If 'white-space' is set to
''white-space/normal'', ''nowrap'', or ''pre-line'',
[=white space characters=] are considered collapsible
and are processed by performing the following steps:
Any sequence of collapsible [=spaces=] and [=tabs=]
immediately preceding or following a segment break
is removed.
white-space/seg-break-transformation-000.html
white-space/white-space-normal-011.html
white-space/white-space-nowrap-011.html
white-space-processing-002.xht
white-space-processing-003.xht
white-space-processing-004.xht
white-space-processing-008.xht
white-space-processing-009.xht
white-space-processing-010.xht
white-space-processing-005.xht
white-space-processing-006.xht
white-space-processing-007.xht
Every [=collapsible=] [=tab=] is converted to a collapsible space (U+0020).
white-space/white-space-collapse-000.html
white-space/white-space-normal-011.html
white-space/white-space-nowrap-011.html
white-space-processing-019.xht
white-space-processing-020.xht
white-space-processing-021.xht
Any [=collapsible=] [=space=] immediately following another [=collapsible=] [=space=]—even
one outside the boundary of the inline containing that [=space=],
provided both [=spaces=] are within the same inline formatting
context—is collapsed to have zero advance width. (It is
invisible, but retains its soft wrap opportunity, if any.)
white-space/white-space-collapse-001.html
white-space/white-space-empty-text-sibling.html
white-space/white-space-normal-011.html
white-space/white-space-nowrap-011.html
white-space-001.xht
white-space-003.xht
white-space-005.xht
white-space-collapsing-001.xht
white-space-collapsing-002.xht
white-space-collapsing-004.xht
white-space-collapsing-005.xht
white-space-collapsing-breaks-001.xht
white-space-mixed-001.xht
white-space-mixed-002.xht
white-space-normal-001.xht
white-space-processing-001.xht
white-space-processing-022.xht
white-space-processing-023.xht
white-space-processing-024.xht
white-space-processing-025.xht
white-space-processing-026.xht
white-space-processing-027.xht
white-space-processing-028.xht
white-space-processing-029.xht
white-space-processing-030.xht
white-space-processing-031.xht
white-space-processing-032.xht
white-space-processing-033.xht
white-space-processing-034.xht
white-space-processing-035.xht
white-space-processing-036.xht
white-space-processing-050.xht
white-space-processing-051.xht
white-space-processing-053.xht
If 'white-space' is set to ''pre'', ''pre-wrap'', or ''break-spaces'',
any sequence of spaces is treated as a sequence of non-breaking spaces.
However, for ''pre-wrap'',
a soft wrap opportunity exists at the end of a sequence of [=spaces=] and/or [=tabs=],
while for ''break-spaces'',
a soft wrap opportunity exists after every [=space=] and every [=tab=].
white-space/white-space-pre-011.html
overflow-wrap/overflow-wrap-break-word-004.html
overflow-wrap/overflow-wrap-break-word-005.html
overflow-wrap/overflow-wrap-break-word-006.html
overflow-wrap/overflow-wrap-break-word-007.html
overflow-wrap/overflow-wrap-anywhere-004.html
overflow-wrap/overflow-wrap-anywhere-005.html
word-break/word-break-break-all-011.html
word-break/word-break-break-all-012.html
word-break/word-break-break-all-013.html
word-break/word-break-break-all-015.html
white-space/break-spaces-006.html
white-space/break-spaces-007.html
white-space/break-spaces-008.html
white-space/break-spaces-009.html
white-space/break-spaces-010.html
white-space/break-spaces-before-first-char-001.html
white-space/break-spaces-before-first-char-002.html
white-space/break-spaces-before-first-char-003.html
white-space/break-spaces-before-first-char-004.html
white-space/break-spaces-before-first-char-005.html
white-space/break-spaces-before-first-char-006.html
white-space/break-spaces-before-first-char-011.html
white-space/pre-wrap-008.html
white-space/pre-wrap-015.html
white-space/pre-wrap-016.html
white-space/pre-wrap-leading-spaces-001.html
white-space/pre-wrap-leading-spaces-002.html
white-space/pre-wrap-leading-spaces-003.html
white-space/pre-wrap-leading-spaces-004.html
white-space/pre-wrap-leading-spaces-005.html
white-space/pre-wrap-leading-spaces-006.html
white-space/pre-wrap-leading-spaces-007.html
white-space/pre-wrap-leading-spaces-008.html
white-space/pre-wrap-leading-spaces-009.html
white-space/pre-wrap-leading-spaces-010.html
white-space/pre-wrap-leading-spaces-011.html
white-space/pre-wrap-leading-spaces-012.html
white-space/pre-wrap-leading-spaces-013.html
white-space/pre-wrap-leading-spaces-014.html
white-space/break-spaces-tab-001.html
white-space/break-spaces-tab-002.html
white-space/break-spaces-tab-003.html
white-space/break-spaces-tab-004.html
white-space/pre-wrap-tab-001.html
white-space/pre-wrap-tab-002.html
white-space/pre-wrap-tab-003.html
white-space/pre-wrap-tab-004.html
white-space-002.xht
white-space-004.xht
white-space-processing-011.xht
white-space-processing-012.xht
white-space-processing-013.xht
white-space-processing-052.xht
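The Phase I steps listed above can be sketched in Python as follows. This is a simplified model, not a normative algorithm: text is a plain string, segment breaks are represented by "\n" and left in place (their transformation is described separately), a collapsed space is simply dropped rather than kept with zero advance width, and bidi formatting characters are not handled. Both function names are hypothetical.

```python
import re

COLLAPSIBLE_VALUES = {"normal", "nowrap", "pre-line"}

def collapse_phase1(text: str, white_space: str = "normal") -> str:
    """Apply the collapsing steps for normal, nowrap, and pre-line."""
    if white_space not in COLLAPSIBLE_VALUES:
        return text  # pre, pre-wrap, break-spaces: white space is preserved
    # Remove spaces and tabs immediately before or after a segment break.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    # Convert every remaining collapsible tab to a space.
    text = text.replace("\t", " ")
    # Collapse each space that immediately follows another space.
    return re.sub(r" {2,}", " ", text)

def preserved_wrap_opportunities(text: str, white_space: str) -> list[int]:
    """Indices i such that a soft wrap opportunity exists after text[i].

    Only opportunities created by preserved spaces and tabs are reported.
    """
    if white_space == "break-spaces":
        return [i for i, ch in enumerate(text) if ch in " \t"]
    if white_space == "pre-wrap":
        return [m.end() - 1 for m in re.finditer(r"[ \t]+", text)]
    return []  # pre: spaces behave like non-breaking spaces
```

For the input "a  b c" (two spaces after the a), this model reports opportunities after indices 2 and 4 under pre-wrap, and after indices 1, 2, and 4 under break-spaces.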
The following example illustrates
the interaction of white-space collapsing and bidirectionality.
Consider the following markup fragment, taking special note of [=spaces=]
(each shown here as ␣ for identification):
<ltr>A␣<rtl>␣B␣</rtl>␣C</ltr>
where the <ltr> element represents a left-to-right embedding
and the <rtl> element represents a right-to-left embedding.
If the 'white-space' property is set to ''white-space/normal'',
the white-space processing model will result in the following:
The [=space=] before the B
will collapse with the [=space=] after the A.
The [=space=] before the C
will collapse with the [=space=] after the B.
This will leave two [=spaces=],
one after the A in the left-to-right embedding level,
and one after the B in the right-to-left embedding level.
The text will then be ordered according to the Unicode bidirectional algorithm,
with the end result being:
A␣␣BC
Note that there will be two [=spaces=] between A and B,
and none between B and C.
This is best avoided by putting [=spaces=] outside the element
instead of just inside the opening and closing tags and, where practical,
by relying on implicit bidirectionality instead of explicit embedding levels.
white-space-bidirectionality-001.xht
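The outcome of this example can be reproduced with a deliberately small Python model. It assumes each inline is a (text, direction) pair, that a right-to-left run is displayed by simply reversing its characters, and that embeddings are not nested; this is far simpler than the Unicode bidirectional algorithm and is meant only to show why collapsing happens in logical order before reordering. The function names are hypothetical.

```python
def collapse_across_inlines(runs):
    """Drop any space that follows another space, even across inline boundaries."""
    out = []
    prev_was_space = False
    for text, direction in runs:
        kept = []
        for ch in text:
            if ch == " " and prev_was_space:
                continue  # collapsed: the preceding character was a kept space
            kept.append(ch)
            prev_was_space = (ch == " ")
        out.append(("".join(kept), direction))
    return out

def visual_order(runs):
    """Toy reordering: reverse the characters of each right-to-left run."""
    return "".join(t[::-1] if d == "rtl" else t for t, d in runs)

runs = [("A ", "ltr"), (" B ", "rtl"), (" C", "ltr")]
collapsed = collapse_across_inlines(runs)
print(repr(visual_order(collapsed)))
```

In this model the space kept after the B belongs to the right-to-left run, so reversing that run moves it next to the space kept after the A.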
Phase II: Trimming and Positioning
Then, the entire block is rendered.
Inlines are laid out,
taking bidi reordering into account,
and wrapping as specified by the 'white-space' property.
As each line is laid out,
A sequence of [=collapsible=] [=spaces=] at the beginning of a line
is removed.
white-space/line-edge-white-space-collapse-002.html
white-space/break-spaces-051.html
white-space/break-spaces-052.html
white-space/pre-line-051.html
white-space/pre-line-052.html
white-space/pre-wrap-051.html
white-space/pre-wrap-052.html
white-space/white-space-pre-051.html
white-space/white-space-pre-052.html
white-space-collapsing-003.xht
white-space-collapsing-bidi-003.xht
white-space-normal-001.xht
white-space-normal-002.xht
white-space-processing-037.xht
white-space-processing-038.xht
white-space-processing-039.xht
white-space-processing-040.xht
white-space-processing-041.xht
If the [=tab size=] is zero, [=preserved=] [=tabs=] are not rendered.
Otherwise, each [=preserved=] [=tab=] is rendered as a horizontal shift
that lines up the start edge of the next glyph with the next tab stop.
If this distance is less than 0.5ch,
then the subsequent tab stop is used instead.
Tab stops occur at points that are multiples of the [=tab size=]
from the starting content edge
of the [=preserved=] [=tab=]'s nearest [=block container=] ancestor.
The [=tab size=] is given by the 'tab-size' property.
white-space/tab-stop-threshold-001.html
white-space/tab-stop-threshold-002.html
white-space/tab-stop-threshold-003.html
white-space/tab-stop-threshold-004.html
white-space/tab-stop-threshold-005.html
white-space/tab-stop-threshold-006.html
white-space/white-space-pre-011.html
tab-size/tab-min-rendered-width-1.html
tab-size/tab-size-integer-003.html
text-indent/text-indent-tab-positions-001.html
tab-size/tab-size-inline-002.html
white-space-processing-042.xht
Note: See [[UAX9]] for rules on how U+0009 tabulation interacts with bidi.
A sequence of collapsible [=spaces=]
at the end of a line
is removed,
as well as any trailing U+1680 OGHAM SPACE MARK
whose 'white-space' property is ''white-space/normal'', ''nowrap'', or ''pre-line''.
white-space/line-edge-white-space-collapse-001.html
white-space/pre-float-001.html
white-space/pre-wrap-float-001.html
white-space/trailing-space-before-br-001.html
white-space/trailing-ogham-001.html
white-space/trailing-ogham-002.html
white-space/trailing-ogham-003.html
white-space-collapsing-003.xht
white-space-collapsing-bidi-003.xht
white-space-normal-001.xht
white-space-normal-002.xht
white-space-processing-005.xht
white-space-processing-043.xht
white-space-processing-044.xht
white-space-processing-045.xht
white-space-processing-046.xht
white-space-processing-047.xht
In the case of bidirectional text,
any sequence of [=collapsible=] [=spaces=] located at the end of the line
prior to bidi reordering [[CSS-WRITING-MODES-3]]
is also removed,
and bidi reordering is applied on the remaining content of the line.
white-space/eol-spaces-bidi-001.html
white-space/trailing-space-rtl-001.html
If there remains any sequence of white space,
and/or [=other space separators=],
at the end of a line (after bidi reordering [[CSS-WRITING-MODES-3]]):
If 'white-space' is set to ''white-space/normal'', ''white-space/nowrap'', or ''white-space/pre-line'',
the UA must [=hang=] this sequence (unconditionally).
white-space/trailing-ideographic-space-001.html
white-space/trailing-ideographic-space-002.html
white-space/trailing-other-space-separators-001.html
white-space/trailing-other-space-separators-003.html
white-space/trailing-other-space-separators-004.html
text-transform/text-transform-fullwidth-008.html
white-space/white-space-intrinsic-size-019.html
white-space/white-space-intrinsic-size-020.html
If 'white-space' is set to ''pre-wrap'',
the UA must (unconditionally) [=hang=] this sequence,
unless the sequence is followed by a [=forced line break=],
in which case it must [=conditionally hang=] the sequence instead.
It may also visually collapse the character advance widths
of any that would otherwise overflow.
white-space/pre-wrap-001.html
white-space/pre-wrap-002.html
white-space/pre-wrap-003.html
white-space/pre-wrap-004.html
white-space/pre-wrap-005.html
white-space/pre-wrap-006.html
white-space/pre-wrap-007.html
white-space/pre-wrap-011.html
white-space/pre-wrap-012.html
white-space/pre-wrap-013.html
white-space/pre-wrap-014.html
white-space/pre-wrap-017.html
white-space/pre-wrap-018.html
white-space/pre-wrap-019.html
white-space/pre-wrap-020.html
white-space/textarea-pre-wrap-001.html
white-space/textarea-pre-wrap-002.html
white-space/textarea-pre-wrap-003.html
white-space/textarea-pre-wrap-004.html
white-space/textarea-pre-wrap-005.html
white-space/textarea-pre-wrap-006.html
white-space/textarea-pre-wrap-007.html
white-space/textarea-pre-wrap-011.html
white-space/textarea-pre-wrap-012.html
white-space/textarea-pre-wrap-013.html
white-space/textarea-pre-wrap-014.html
white-space/trailing-ideographic-space-003.html
white-space/trailing-ideographic-space-004.html
white-space/white-space-pre-wrap-trailing-spaces-001.html
white-space/white-space-pre-wrap-trailing-spaces-002.html
white-space/white-space-pre-wrap-trailing-spaces-003.html
white-space/white-space-pre-wrap-trailing-spaces-004.html
white-space/white-space-pre-wrap-trailing-spaces-005.html
white-space/white-space-pre-wrap-trailing-spaces-006.html
white-space/white-space-pre-wrap-trailing-spaces-007.html
white-space/white-space-pre-wrap-trailing-spaces-008.html
white-space/white-space-pre-wrap-trailing-spaces-010.html
white-space/white-space-pre-wrap-trailing-spaces-011.html
white-space/white-space-pre-wrap-trailing-spaces-012.html
white-space/white-space-pre-wrap-trailing-spaces-013.html
white-space/white-space-pre-wrap-trailing-spaces-014.html
white-space/white-space-pre-wrap-trailing-spaces-015.html
white-space/white-space-intrinsic-size-003.html
white-space/white-space-intrinsic-size-004.html
white-space/pre-wrap-tab-005.html
white-space/pre-wrap-tab-006.html
white-space/trailing-other-space-separators-002.html
text-transform/text-transform-fullwidth-009.html
white-space/trailing-space-position-001.html
line-break/line-break-anywhere-and-white-space-006.html
line-break/line-break-anywhere-and-white-space-007.html
white-space/white-space-intrinsic-size-013.html
white-space/white-space-intrinsic-size-014.html
white-space/white-space-intrinsic-size-017.html
white-space/eol-spaces-bidi-002.html
white-space/trailing-space-in-inline-box.html
Note: [=Hanging=] the white space rather than collapsing it
allows users to see the space when selecting or editing text.
If 'white-space' is set to ''break-spaces'',
[=hanging=] or collapsing the advance width of the [=spaces=],
[=tabs=],
or [=other space separators=]
at the end of the line is not allowed;
those that overflow must wrap to the next line.
white-space/break-spaces-001.html
white-space/break-spaces-005.html
white-space/textarea-break-spaces-001.html
white-space/break-spaces-tab-005.html
white-space/break-spaces-tab-006.html
white-space/trailing-other-space-separators-break-spaces-001.html
white-space/trailing-other-space-separators-break-spaces-002.html
white-space/trailing-other-space-separators-break-spaces-003.html
white-space/trailing-other-space-separators-break-spaces-004.html
white-space/trailing-other-space-separators-break-spaces-005.html
white-space/trailing-other-space-separators-break-spaces-006.html
white-space/trailing-other-space-separators-break-spaces-007.html
white-space/trailing-other-space-separators-break-spaces-008.html
white-space/trailing-other-space-separators-break-spaces-009.html
white-space/trailing-other-space-separators-break-spaces-010.html
white-space/trailing-other-space-separators-break-spaces-011.html
white-space/trailing-other-space-separators-break-spaces-012.html
white-space/trailing-other-space-separators-break-spaces-013.html
white-space/trailing-other-space-separators-break-spaces-014.html
white-space/trailing-other-space-separators-break-spaces-015.html
line-break/line-break-anywhere-and-white-space-008.html
line-break/line-break-anywhere-and-white-space-009.html
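The line-edge rules above can be summarized as a small decision table. The Python sketch below is an informal restatement under simplifying assumptions: a line is a plain string that has already gone through Phase I, Ogham space marks and bidi reordering are not modeled, and both function names are hypothetical.

```python
COLLAPSIBLE_VALUES = {"normal", "nowrap", "pre-line"}

def trim_line_edges(line: str, white_space: str) -> str:
    """Remove collapsible spaces at the beginning and end of a line."""
    if white_space in COLLAPSIBLE_VALUES:
        return line.strip(" ")
    return line  # preserved spaces are never removed at line edges

def trailing_space_treatment(white_space: str, before_forced_break: bool = False) -> str:
    """How white space remaining at the end of a line is treated."""
    if white_space in COLLAPSIBLE_VALUES:
        return "hang"
    if white_space == "pre-wrap":
        return "conditionally hang" if before_forced_break else "hang"
    if white_space == "break-spaces":
        return "wrap"  # neither hanging nor collapsing is allowed
    raise ValueError(f"no end-of-line rule listed for {white_space!r}")
```

The value 'pre' raises an error here only because the list above gives no end-of-line rule for it, not because it is invalid.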
This example shows that [=conditionally hanging=] white space
at the end of lines with forced breaks
provides symmetry with the start of the line.
An underline is added to help visualize the spaces.
Since the final [=space=] is before a forced line break
and does not overflow,
it does not hang,
and centering works as expected.
This example illustrates the difference
between [=hanging=] [=spaces=]
at the end of lines without forced breaks,
and [=conditionally hanging=] them at the end of lines with forced breaks.
An underline is added to help visualize the [=spaces=].
If p { text-align: right; } was added,
the result would be as follows:
0 0 0 0
As the [=preserved=] [=spaces=] at the end of lines without a forced break must [=hang=],
they are not considered when placing the rest of the line during text alignment.
When aligning towards the end,
this means any such [=spaces=] will overflow,
and will not prevent the rest of the line's content from being flush with the edge of the line.
On the other hand,
preserved spaces at the end of a line with a forced break
[=conditionally hang=].
Since the space at the end of the last line would not overflow in this example,
it does not [=hang=]
and therefore is considered during text alignment.
In the following example,
there is not enough room on any line to fit the end-of-line spaces,
so they [=hang=] on all lines:
the one on the line without a forced break because it must,
as well as the one on the line with a forced break,
because it [=conditionally hangs=] and overflows.
An underline is added to help visualize the spaces.
The last line is not wrapped before the last 0
because characters that [=conditionally hang=] are not considered
when measuring the line’s contents for fit.
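The effect of hanging on text alignment, as described in these examples, can be approximated numerically. The Python sketch below assumes ''pre-wrap'', a monospace font with widths counted in ch, and treats the trailing space sequence as a single unit that either hangs or does not; the function name, parameters, and sample widths are hypothetical and are not taken from the examples' markup.

```python
def line_start_offset(content_w, trailing_w, available_w, forced_break, align="right"):
    """Offset of the line's content from the start edge under pre-wrap."""
    if forced_break:
        # Conditionally hang: only if the spaces would otherwise overflow.
        hangs = content_w + trailing_w > available_w
    else:
        hangs = True  # unconditional hang
    # Hanging spaces are ignored when placing the rest of the line.
    aligned_w = content_w if hangs else content_w + trailing_w
    free = available_w - aligned_w
    if align == "right":
        return free
    if align == "center":
        return free / 2
    return 0
```

With a 3ch-wide box and right alignment, a line holding 3ch of content plus one trailing space and no forced break gets offset 0 (the space hangs outside the box), while a final line holding 1ch of content plus one trailing space gets offset 1 (the space fits, does not hang, and takes part in alignment).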
Segment Break Transformation Rules
When 'white-space' is ''pre'', ''pre-wrap'', ''break-spaces'', or ''pre-line'',
segment breaks are not collapsible
and are instead transformed into a preserved line feed (U+000A).
white-space-008.xht
white-space-processing-016.xht
white-space-processing-017.xht
white-space-processing-018.xht
For other values of 'white-space', segment breaks are collapsible.
Any collapsible segment break immediately following another collapsible segment break
is removed.
Then any remaining segment break is
either transformed into a space (U+0020) or removed
depending on the context before and after the break:
segment-break-transformation-removable-2.html
segment-break-transformation-removable-4.html
white-space/seg-break-transformation-000.html
If the character immediately before or immediately after the [=segment break=]
is the zero-width space character (U+200B), then the break
is removed, leaving behind the zero-width space.
white-space/seg-break-transformation-016.html
white-space/seg-break-transformation-017.html
white-space/seg-break-transformation-018.html
white-space/seg-break-transformation-019.html
segment-break-transformation-removable-1.html
segment-break-transformation-removable-3.html
Otherwise, if both the characters before and after the [=segment break=]
belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
then the [=segment break=] is removed.
Otherwise, the [=segment break=] is converted to a space (U+0020).
white-space/seg-break-transformation-000.html
white-space-processing-005.xht
white-space-processing-014.xht
white-space-processing-015.xht
Note: The white space processing rules have already
removed any [=tabs=] and [=spaces=] around the [=segment break=]
before these checks take place.
The purpose of the segment break transformation rules
(and white space collapsing in general)
is to “unbreak” text that has been
broken into segments
to make the document source code easier to work with.
In languages that use word separators, such as English and Korean,
“unbreaking” a line requires joining the two lines with a [=space=].
Here is an English paragraph
that is broken into multiple lines
in the source code so that it can
be more easily read in a text editor.
Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read in a text editor.
Eliminating a line break in English requires maintaining a [=space=] in its place.
In languages that have no word separators, such as Chinese,
“unbreaking” a line requires joining the two lines with no intervening space.
這個段落是那麼長,
在一行寫不行。最好
用三行寫。
這個段落是那麼長,在一行寫不行。最好用三行寫。
Eliminating a line break in Chinese requires eliminating any intervening [=white space=].
The segment break transformation rules thus use adjacent context
to either transform the segment break into a space
or eliminate it entirely.
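The transformation rules above can be sketched in Python as follows. The sketch assumes segment breaks are represented by "\n" and that surrounding spaces and tabs have already been removed. The space-discarding character set is defined elsewhere in this specification, so it is passed in as a predicate; the `looks_cjk_ideograph` helper is a hypothetical stand-in used only to exercise the code and is not that definition.

```python
import re

ZWSP = "\u200b"
PRESERVING_VALUES = {"pre", "pre-wrap", "break-spaces", "pre-line"}

def looks_cjk_ideograph(ch: str) -> bool:
    """Hypothetical stand-in predicate: CJK Unified Ideographs only."""
    return "\u4e00" <= ch <= "\u9fff"

def transform_segment_breaks(text, white_space="normal",
                             is_space_discarding=lambda ch: False):
    if white_space in PRESERVING_VALUES:
        return text  # breaks are kept as preserved line feeds
    # A collapsible break immediately following another one is removed.
    text = re.sub(r"\n+", "\n", text)
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before == ZWSP or after == ZWSP:
            continue  # removed, leaving the zero-width space behind
        if before and after and is_space_discarding(before) and is_space_discarding(after):
            continue  # removed between two space-discarding characters
        out.append(" ")  # otherwise converted to a space
    return "".join(out)
```

With the default predicate every remaining break becomes a space, which matches the English case; passing a suitable predicate removes the break between two ideographs, as in the Chinese case.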
Comments on how well these rules would work in practice would
be very much appreciated, particularly from people who work with
Thai and similar scripts.
Note that browser implementations do not currently follow these rules consistently
(although IE does in some cases transform the break,
and Firefox follows the first two bullet points).
ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
Tab Character Size: the 'tab-size' property
Name: tab-size
Value: <number> | <length>
Initial: 8
Applies to: inline boxes
Inherited: yes
Computed value: the specified number or absolute length
Animation type: by computed value type
Canonical order: n/a
This property determines the tab size used to render [=preserved=] tab characters (U+0009).
A <number> represents the measure as a multiple of the advance width of the space character (U+0020)
of the nearest [=block container=] ancestor of the [=preserved=] [=tab=],
including its associated 'letter-spacing' and 'word-spacing'.
Negative values are not allowed.
tab-size/tab-size-integer-001.html
tab-size/tab-size-integer-002.html
tab-size/tab-size-integer-003.html
tab-size/tab-size-integer-004.html
tab-size/tab-size-length-001.html
tab-size/tab-size-length-002.html
tab-size/tab-size-percent-001.html
tab-size/tab-min-rendered-width-1.html
tab-size/tab-size-spacing-001.html
white-space/tab-stop-threshold-001.html
white-space/tab-stop-threshold-002.html
white-space/tab-stop-threshold-003.html
white-space/tab-stop-threshold-004.html
white-space/tab-stop-threshold-005.html
white-space/tab-stop-threshold-006.html
text-indent/text-indent-tab-positions-001.html
white-space-processing-042.xht
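To make the arithmetic concrete, the following Python sketch computes a tab size from a number value and then finds the tab stop used for a preserved tab, following the tab stop rule given under Phase II of white space processing. Positions are measured in pixels from the starting content edge of the block container. Adding 'letter-spacing' and 'word-spacing' to the space advance is this sketch's reading of "including its associated" spacing, and all names are hypothetical.

```python
import math

def tab_size_px(number, space_advance_px, letter_spacing_px=0.0, word_spacing_px=0.0):
    """Tab size for a <number> value: a multiple of the spaced-out space advance."""
    if number < 0:
        raise ValueError("negative values are not allowed")
    return number * (space_advance_px + letter_spacing_px + word_spacing_px)

def next_tab_stop(x_px, tab_size, ch_px):
    """Position where the glyph following a preserved tab starts."""
    if tab_size == 0:
        return x_px  # the tab is not rendered
    # Tab stops sit at multiples of the tab size; take the next one after x.
    stop = (math.floor(x_px / tab_size) + 1) * tab_size
    if stop - x_px < 0.5 * ch_px:
        stop += tab_size  # too close: use the subsequent tab stop
    return stop
```

With the initial value 8 and a 10px space advance, tab stops fall every 80px. A tab starting at 75px lands on the 80px stop (the distance is exactly 0.5ch when 1ch is 10px), whereas one starting at 76px skips ahead to 160px.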
Line Breaking and Word Boundaries
When inline-level content is laid out into lines, it is broken across line boxes.
Such a break is called a line break.
When a line is broken due to explicit line-breaking controls
(such as a preserved newline character),
or due to the start or end of a block,
it is a forced line break.
When a line is broken due to content wrapping
(i.e. when the UA creates unforced line breaks in order to fit the content within the measure),
it is a soft wrap break.
The process of breaking inline-level content into lines is called line breaking.
Wrapping is only performed at an allowed break point,
called a soft wrap opportunity.
When wrapping is enabled (see 'white-space'),
the UA must minimize the amount of content overflowing a line
by wrapping the line at a soft wrap opportunity,
if one exists.
line-breaking/line-breaking-020.html
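The terms defined above can be illustrated with a small greedy line breaker in Python. This is one simple strategy, not a required algorithm. It assumes a monospace measure counted in characters, treats each single space as the only kind of soft wrap opportunity, treats "\n" as a forced line break, and drops the space at which a line wraps. The function name is hypothetical.

```python
def wrap_greedy(text: str, measure: int) -> list[str]:
    """Break text into lines, wrapping only at soft wrap opportunities."""
    lines = []
    for paragraph in text.split("\n"):      # forced line breaks
        current = ""
        for word in paragraph.split(" "):   # spaces are the wrap opportunities
            candidate = word if not current else current + " " + word
            if current and len(candidate) > measure:
                lines.append(current)        # soft wrap break
                current = word
            else:
                current = candidate
        lines.append(current)
    return lines
```

A word longer than the measure still overflows, because no soft wrap opportunity exists inside it; wrapping before and after it merely keeps the overflow as small as this model allows.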
In most writing systems,
in the absence of hyphenation a soft wrap opportunity occurs only at word boundaries.
Many such systems use [=spaces=] or punctuation to explicitly separate words,
and soft wrap opportunities can be identified by these characters.
Scripts such as Thai, Lao, and Khmer, however,
do not use spaces or punctuation to separate words.
Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts,
this practice is not common.
As a result, a lexical resource is needed to correctly identify soft wrap opportunities in such texts.
In some other writing systems,
soft wrap opportunities are based on orthographic syllable boundaries,
not word boundaries.
Some of these systems, such as Javanese and Balinese,
are similar to Thai and Lao in that they
require analysis of the text to find breaking opportunities.
In others such as Chinese (as well as Japanese, Yi, and sometimes also Korean),
each syllable tends to correspond to a single typographic letter unit,
and thus line breaking conventions allow the line to break
anywhere except between certain character combinations.
Additionally the level of strictness in these restrictions
varies with the typesetting style.
i18n/css3-text-line-break-baspglwj-003.html
i18n/css3-text-line-break-baspglwj-004.html
i18n/css3-text-line-break-baspglwj-005.html
i18n/css3-text-line-break-baspglwj-006.html
i18n/css3-text-line-break-baspglwj-007.html
i18n/css3-text-line-break-baspglwj-008.html
i18n/css3-text-line-break-baspglwj-009.html
i18n/css3-text-line-break-baspglwj-010.html
i18n/css3-text-line-break-baspglwj-011.html
i18n/css3-text-line-break-baspglwj-012.html
i18n/css3-text-line-break-baspglwj-014.html
i18n/css3-text-line-break-baspglwj-015.html
i18n/css3-text-line-break-baspglwj-016.html
i18n/css3-text-line-break-baspglwj-017.html
i18n/css3-text-line-break-baspglwj-018.html
i18n/css3-text-line-break-baspglwj-019.html
i18n/css3-text-line-break-baspglwj-020.html
i18n/css3-text-line-break-baspglwj-021.html
i18n/css3-text-line-break-baspglwj-022.html
i18n/css3-text-line-break-baspglwj-023.html
i18n/css3-text-line-break-baspglwj-024.html
i18n/css3-text-line-break-baspglwj-025.html
i18n/css3-text-line-break-baspglwj-026.html
i18n/css3-text-line-break-baspglwj-030.html
i18n/css3-text-line-break-baspglwj-031.html
i18n/css3-text-line-break-baspglwj-032.html
i18n/css3-text-line-break-baspglwj-033.html
i18n/css3-text-line-break-baspglwj-034.html
i18n/css3-text-line-break-baspglwj-035.html
i18n/css3-text-line-break-baspglwj-036.html
i18n/css3-text-line-break-baspglwj-037.html
i18n/css3-text-line-break-baspglwj-038.html
i18n/css3-text-line-break-baspglwj-039.html
i18n/css3-text-line-break-baspglwj-040.html
i18n/css3-text-line-break-baspglwj-041.html
i18n/css3-text-line-break-baspglwj-042.html
i18n/css3-text-line-break-baspglwj-043.html
i18n/css3-text-line-break-baspglwj-044.html
i18n/css3-text-line-break-baspglwj-045.html
i18n/css3-text-line-break-baspglwj-046.html
i18n/css3-text-line-break-baspglwj-047.html
i18n/css3-text-line-break-baspglwj-048.html
i18n/css3-text-line-break-baspglwj-049.html
i18n/css3-text-line-break-baspglwj-050.html
i18n/css3-text-line-break-baspglwj-051.html
i18n/css3-text-line-break-baspglwj-052.html
i18n/css3-text-line-break-baspglwj-060.html
i18n/css3-text-line-break-baspglwj-061.html
i18n/css3-text-line-break-baspglwj-062.html
i18n/css3-text-line-break-baspglwj-063.html
i18n/css3-text-line-break-baspglwj-064.html
i18n/css3-text-line-break-baspglwj-065.html
i18n/css3-text-line-break-baspglwj-066.html
i18n/css3-text-line-break-baspglwj-067.html
i18n/css3-text-line-break-baspglwj-068.html
i18n/css3-text-line-break-baspglwj-069.html
i18n/css3-text-line-break-baspglwj-070.html
i18n/css3-text-line-break-baspglwj-071.html
i18n/css3-text-line-break-baspglwj-072.html
i18n/css3-text-line-break-baspglwj-073.html
i18n/css3-text-line-break-baspglwj-074.html
i18n/css3-text-line-break-baspglwj-075.html
i18n/css3-text-line-break-baspglwj-076.html
i18n/css3-text-line-break-baspglwj-077.html
i18n/css3-text-line-break-baspglwj-078.html
i18n/css3-text-line-break-baspglwj-080.html
i18n/css3-text-line-break-baspglwj-081.html
i18n/css3-text-line-break-baspglwj-082.html
i18n/css3-text-line-break-baspglwj-083.html
i18n/css3-text-line-break-baspglwj-084.html
i18n/css3-text-line-break-baspglwj-085.html
i18n/css3-text-line-break-baspglwj-086.html
i18n/css3-text-line-break-baspglwj-090.html
i18n/css3-text-line-break-baspglwj-091.html
i18n/css3-text-line-break-baspglwj-092.html
i18n/css3-text-line-break-baspglwj-093.html
i18n/css3-text-line-break-baspglwj-095.html
i18n/css3-text-line-break-baspglwj-096.html
i18n/css3-text-line-break-baspglwj-097.html
i18n/css3-text-line-break-baspglwj-098.html
i18n/css3-text-line-break-baspglwj-099.html
i18n/css3-text-line-break-baspglwj-100.html
i18n/css3-text-line-break-baspglwj-101.html
i18n/css3-text-line-break-baspglwj-102.html
i18n/css3-text-line-break-baspglwj-103.html
i18n/css3-text-line-break-baspglwj-104.html
i18n/css3-text-line-break-baspglwj-105.html
i18n/css3-text-line-break-baspglwj-106.html
i18n/css3-text-line-break-baspglwj-107.html
i18n/css3-text-line-break-baspglwj-108.html
i18n/css3-text-line-break-baspglwj-109.html
i18n/css3-text-line-break-baspglwj-110.html
i18n/css3-text-line-break-baspglwj-111.html
i18n/css3-text-line-break-baspglwj-112.html
i18n/css3-text-line-break-baspglwj-113.html
i18n/css3-text-line-break-baspglwj-114.html
i18n/css3-text-line-break-baspglwj-115.html
i18n/css3-text-line-break-baspglwj-116.html
i18n/css3-text-line-break-baspglwj-117.html
i18n/css3-text-line-break-baspglwj-118.html
i18n/css3-text-line-break-opclns-001.html
i18n/css3-text-line-break-opclns-002.html
i18n/css3-text-line-break-opclns-003.html
i18n/css3-text-line-break-opclns-004.html
i18n/css3-text-line-break-opclns-005.html
i18n/css3-text-line-break-opclns-006.html
i18n/css3-text-line-break-opclns-007.html
i18n/css3-text-line-break-opclns-008.html
i18n/css3-text-line-break-opclns-009.html
i18n/css3-text-line-break-opclns-010.html
i18n/css3-text-line-break-opclns-011.html
i18n/css3-text-line-break-opclns-012.html
i18n/css3-text-line-break-opclns-014.html
i18n/css3-text-line-break-opclns-015.html
i18n/css3-text-line-break-opclns-016.html
i18n/css3-text-line-break-opclns-017.html
i18n/css3-text-line-break-opclns-018.html
i18n/css3-text-line-break-opclns-019.html
i18n/css3-text-line-break-opclns-020.html
i18n/css3-text-line-break-opclns-021.html
i18n/css3-text-line-break-opclns-022.html
i18n/css3-text-line-break-opclns-023.html
i18n/css3-text-line-break-opclns-024.html
i18n/css3-text-line-break-opclns-025.html
i18n/css3-text-line-break-opclns-026.html
i18n/css3-text-line-break-opclns-027.html
i18n/css3-text-line-break-opclns-028.html
i18n/css3-text-line-break-opclns-029.html
i18n/css3-text-line-break-opclns-030.html
i18n/css3-text-line-break-opclns-031.html
i18n/css3-text-line-break-opclns-032.html
i18n/css3-text-line-break-opclns-033.html
i18n/css3-text-line-break-opclns-034.html
i18n/css3-text-line-break-opclns-035.html
i18n/css3-text-line-break-opclns-036.html
i18n/css3-text-line-break-opclns-037.html
i18n/css3-text-line-break-opclns-038.html
i18n/css3-text-line-break-opclns-039.html
i18n/css3-text-line-break-opclns-040.html
i18n/css3-text-line-break-opclns-041.html
i18n/css3-text-line-break-opclns-042.html
i18n/css3-text-line-break-opclns-043.html
i18n/css3-text-line-break-opclns-044.html
i18n/css3-text-line-break-opclns-045.html
i18n/css3-text-line-break-opclns-046.html
i18n/css3-text-line-break-opclns-047.html
i18n/css3-text-line-break-opclns-049.html
i18n/css3-text-line-break-opclns-050.html
i18n/css3-text-line-break-opclns-051.html
i18n/css3-text-line-break-opclns-052.html
i18n/css3-text-line-break-opclns-053.html
i18n/css3-text-line-break-opclns-054.html
i18n/css3-text-line-break-opclns-055.html
i18n/css3-text-line-break-opclns-056.html
i18n/css3-text-line-break-opclns-057.html
i18n/css3-text-line-break-opclns-058.html
i18n/css3-text-line-break-opclns-059.html
i18n/css3-text-line-break-opclns-060.html
i18n/css3-text-line-break-opclns-061.html
i18n/css3-text-line-break-opclns-062.html
i18n/css3-text-line-break-opclns-063.html
i18n/css3-text-line-break-opclns-064.html
i18n/css3-text-line-break-opclns-065.html
i18n/css3-text-line-break-opclns-100.html
i18n/css3-text-line-break-opclns-101.html
i18n/css3-text-line-break-opclns-102.html
i18n/css3-text-line-break-opclns-103.html
i18n/css3-text-line-break-opclns-104.html
i18n/css3-text-line-break-opclns-105.html
i18n/css3-text-line-break-opclns-106.html
i18n/css3-text-line-break-opclns-107.html
i18n/css3-text-line-break-opclns-108.html
i18n/css3-text-line-break-opclns-109.html
i18n/css3-text-line-break-opclns-110.html
i18n/css3-text-line-break-opclns-111.html
i18n/css3-text-line-break-opclns-112.html
i18n/css3-text-line-break-opclns-113.html
i18n/css3-text-line-break-opclns-114.html
i18n/css3-text-line-break-opclns-115.html
i18n/css3-text-line-break-opclns-116.html
i18n/css3-text-line-break-opclns-117.html
i18n/css3-text-line-break-opclns-119.html
i18n/css3-text-line-break-opclns-120.html
i18n/css3-text-line-break-opclns-121.html
i18n/css3-text-line-break-opclns-122.html
i18n/css3-text-line-break-opclns-123.html
i18n/css3-text-line-break-opclns-124.html
i18n/css3-text-line-break-opclns-125.html
i18n/css3-text-line-break-opclns-126.html
i18n/css3-text-line-break-opclns-127.html
i18n/css3-text-line-break-opclns-128.html
i18n/css3-text-line-break-opclns-129.html
i18n/css3-text-line-break-opclns-130.html
i18n/css3-text-line-break-opclns-131.html
i18n/css3-text-line-break-opclns-132.html
i18n/css3-text-line-break-opclns-133.html
i18n/css3-text-line-break-opclns-134.html
i18n/css3-text-line-break-opclns-135.html
i18n/css3-text-line-break-opclns-136.html
i18n/css3-text-line-break-opclns-137.html
i18n/css3-text-line-break-opclns-138.html
i18n/css3-text-line-break-opclns-139.html
i18n/css3-text-line-break-opclns-140.html
i18n/css3-text-line-break-opclns-141.html
i18n/css3-text-line-break-opclns-142.html
i18n/css3-text-line-break-opclns-143.html
i18n/css3-text-line-break-opclns-144.html
i18n/css3-text-line-break-opclns-145.html
i18n/css3-text-line-break-opclns-146.html
i18n/css3-text-line-break-opclns-147.html
i18n/css3-text-line-break-opclns-148.html
i18n/css3-text-line-break-opclns-149.html
i18n/css3-text-line-break-opclns-150.html
i18n/css3-text-line-break-opclns-151.html
i18n/css3-text-line-break-opclns-152.html
i18n/css3-text-line-break-opclns-153.html
i18n/css3-text-line-break-opclns-155.html
i18n/css3-text-line-break-opclns-156.html
i18n/css3-text-line-break-opclns-157.html
i18n/css3-text-line-break-opclns-158.html
i18n/css3-text-line-break-opclns-159.html
i18n/css3-text-line-break-opclns-160.html
i18n/css3-text-line-break-opclns-161.html
i18n/css3-text-line-break-opclns-162.html
i18n/css3-text-line-break-opclns-163.html
i18n/css3-text-line-break-opclns-164.html
i18n/css3-text-line-break-opclns-165.html
i18n/css3-text-line-break-opclns-166.html
i18n/css3-text-line-break-opclns-167.html
i18n/css3-text-line-break-opclns-168.html
i18n/css3-text-line-break-opclns-169.html
i18n/css3-text-line-break-opclns-170.html
i18n/css3-text-line-break-opclns-171.html
i18n/css3-text-line-break-opclns-200.html
i18n/css3-text-line-break-opclns-201.html
i18n/css3-text-line-break-opclns-202.html
i18n/css3-text-line-break-opclns-203.html
i18n/css3-text-line-break-opclns-204.html
i18n/css3-text-line-break-opclns-205.html
i18n/css3-text-line-break-opclns-206.html
i18n/css3-text-line-break-opclns-207.html
i18n/css3-text-line-break-opclns-208.html
i18n/css3-text-line-break-opclns-209.html
i18n/css3-text-line-break-opclns-210.html
i18n/css3-text-line-break-opclns-211.html
i18n/css3-text-line-break-opclns-212.html
i18n/css3-text-line-break-opclns-213.html
i18n/css3-text-line-break-opclns-214.html
i18n/css3-text-line-break-opclns-215.html
i18n/css3-text-line-break-opclns-217.html
i18n/css3-text-line-break-opclns-218.html
i18n/css3-text-line-break-opclns-219.html
i18n/css3-text-line-break-opclns-220.html
i18n/css3-text-line-break-opclns-221.html
i18n/css3-text-line-break-opclns-222.html
i18n/css3-text-line-break-opclns-223.html
i18n/css3-text-line-break-opclns-225.html
i18n/css3-text-line-break-opclns-226.html
While CSS does not fully define where soft wrap opportunities occur,
some controls are provided to distinguish common variations:
* The 'line-break' property allows choosing various levels of “strictness”
  for line breaking restrictions.
* The 'word-break' property controls what types of letters
  are glommed together to form unbreakable “words”,
  causing CJK characters to behave like non-CJK text or vice versa.
* The 'hyphens' property controls whether automatic hyphenation
  is allowed to break words in scripts that hyphenate.
* The 'overflow-wrap' property allows the UA to take a break anywhere
  in otherwise-unbreakable strings that would otherwise overflow.
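As a rough illustration of the four controls listed above, the following sketch applies one of them per rule. The selectors are hypothetical and chosen only for this example; the property names and values are those defined in this module.

```css
/* All selectors below are hypothetical. */

/* Use the most stringent punctuation-related breaking rules. */
.body-text { line-break: strict; }

/* Treat runs of letters (including CJK) as unbreakable "words". */
.caption { word-break: keep-all; }

/* Permit automatic hyphenation in scripts that hyphenate. */
.article { hyphens: auto; }

/* Allow a break anywhere in an otherwise-unbreakable string,
   but only when that string would otherwise overflow. */
.url-cell { overflow-wrap: anywhere; }
```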
Note: [[UAX14]] defines a baseline behavior
for line breaking for all scripts in Unicode,
which is expected to be further tailored.
Further information on line breaking conventions
can be found in [[JLREQ]] and [[JIS4051]] for Japanese,
[[CLREQ]] and [[ZHMARK]] for Chinese.
See also the
Internationalization Working Group’s
Typography Index [[TYPOGRAPHY]]
which includes more information on additional languages.
Any guidance for additional appropriate references
would be much appreciated.
The interaction of [=line breaking=] and bidirectional text is defined by
[[!CSS-WRITING-MODES-3]] and [[!UAX9]];
see in particular [[css-writing-modes-3#bidi-algo]]
and UAX9 §3.4, Reordering Resolved Levels.
bidi-breaking-001.xht
bidi-breaking-002.xht
bidi-breaking-003.xht
Regardless of the 'white-space' value,
lines always break at each preserved forced break character:
thus
for all values, line-breaking behavior defined for
the BK and NL Unicode line breaking classes [[!UAX14]]
must be honored.
Note: The bidi implications of such [=forced line breaks=] are defined by [[!UAX9]].
Except where explicitly defined otherwise
(e.g. for ''line-break: anywhere'' or ''overflow-wrap: anywhere'')
line breaking behavior defined for
the WJ, ZW, GL, and ZWJ Unicode line breaking classes [[!UAX14]]
must be honored.
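One way an author might rely on these classes is through generated content, sketched below. The class names are hypothetical, and the sketch assumes ordinary text on both sides of the generated character (not an [=atomic inline=]).

```css
/* Hypothetical class names. */

/* U+200B ZERO WIDTH SPACE (class ZW) offers a break opportunity
   after the element's text. */
.path-segment::after { content: "\200B"; }

/* U+2060 WORD JOINER (class WJ) prohibits a break
   between the element's text and what follows. */
.no-break-after::after { content: "\2060"; }
```

Note that, as stated above, ''line-break: anywhere'' is one of the explicitly defined exceptions to the WJ behavior.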
word-break/word-break-normal-001.html
line-breaking/line-breaking-001.html
line-breaking/line-breaking-002.html
line-breaking/line-breaking-003.html
line-breaking/line-breaking-004.html
line-breaking/line-breaking-005.html
line-breaking/line-breaking-006.html
line-breaking/line-breaking-007.html
line-breaking/line-breaking-008.html
line-breaking/line-breaking-021.html
i18n/css3-text-line-break-baspglwj-001.html
i18n/css3-text-line-break-baspglwj-002.html
i18n/css3-text-line-break-baspglwj-120.html
i18n/css3-text-line-break-baspglwj-121.html
i18n/css3-text-line-break-baspglwj-122.html
i18n/css3-text-line-break-baspglwj-123.html
i18n/css3-text-line-break-baspglwj-124.html
i18n/css3-text-line-break-baspglwj-125.html
i18n/css3-text-line-break-baspglwj-126.html
i18n/css3-text-line-break-baspglwj-127.html
i18n/css3-text-line-break-baspglwj-128.html
i18n/css3-text-line-break-baspglwj-130.html
i18n/css3-text-line-break-baspglwj-131.html
word-break/word-break-break-all-018.html
word-break/word-break-break-all-021.html
word-break/word-break-break-all-022.html
UAs that allow wrapping at punctuation
other than word separators
in writing systems that use them
should prioritize breakpoints.
(For example, if breaks after slashes are given a lower priority than spaces,
the sequence “check /etc” will never break between the "/" and the "e".)
As long as care is taken to avoid such awkward breaks,
allowing breaks at appropriate punctuation other than word separators
is recommended,
as it results in more even-looking margins, particularly in narrow measures.
The UA may use the width of the containing block, the text's language,
the 'line-break' value,
and other factors in assigning priorities:
CSS does not define prioritization of line breaking opportunities.
However, prioritization of word separators is not expected
if ''word-break: break-all'' is specified
(since this value explicitly requests line breaking behavior
not based on breaking at word separators),
and it is forbidden under ''line-break: anywhere''.
Out-of-flow elements
and inline element boundaries
do not introduce a forced line break
or soft wrap opportunity in the flow.
line-breaking/line-breaking-012.html
line-breaking/line-breaking-015.html
line-breaking/line-breaking-016.html
line-breaking/line-breaking-017.html
line-breaking/line-breaking-018.html
line-breaking/line-breaking-019.html
For Web-compatibility
there is a [=soft wrap opportunity=]
before and after each replaced element or other [=atomic inline=],
even when adjacent to a character that would normally suppress them,
such as U+00A0 NO-BREAK SPACE.
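Because U+00A0 does not suppress the wrap opportunity next to an atomic inline, an author who wants an image to stay with its neighboring text may need a different tool. The following is a minimal sketch, assuming hypothetical markup and a hypothetical class name.

```css
/* Hypothetical markup:
     <p>Download <span class="keep">the&nbsp;<img src="icon.png" alt="PDF">&nbsp;file</span> here.</p>

   Without the rule below, a line may wrap on either side of the <img>,
   even though U+00A0 NO-BREAK SPACE is adjacent to it. */

/* Disallow wrapping inside the span, including around the image. */
.keep { white-space: nowrap; }
```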
line-breaking/line-breaking-atomic-001.html
line-breaking/line-breaking-atomic-002.html
line-breaking/line-breaking-atomic-003.html
line-breaking/line-breaking-atomic-004.html
line-breaking/line-breaking-atomic-005.html
line-breaking/line-breaking-atomic-006.html
line-breaking/line-breaking-atomic-007.html
line-breaking/line-breaking-atomic-008.html
line-breaking/line-breaking-atomic-009.html
line-breaking/line-breaking-replaced-001.html
line-breaking/line-breaking-replaced-002.html
line-breaking/line-breaking-replaced-003.html
line-breaking/line-breaking-replaced-004.html
line-breaking/line-breaking-replaced-005.html
line-breaking/line-breaking-replaced-006.html
line-breaking/line-breaking-atomic-nowrap-001.html
For soft wrap opportunities created by characters that disappear at the line break (e.g. U+0020 SPACE),
properties on the box directly containing that character control the line breaking at that opportunity.
For soft wrap opportunities defined by the boundary between two characters,
the 'white-space' property on the nearest common ancestor of the two characters controls breaking;
which elements’ 'line-break', 'word-break', and 'overflow-wrap' properties
control the determination of [=soft wrap opportunities=]
at such boundaries
is undefined in Level 3.
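The first rule above, for characters that disappear at the line break, can be illustrated with a space placed inside or outside a non-wrapping span. This is a sketch under assumed markup; the class name is hypothetical.

```css
/* Hypothetical markup, two variants:
     (1) <p>alpha<span class="nw"> </span>beta</p>   (the space is inside .nw)
     (2) <p><span class="nw">alpha</span> beta</p>   (the space is directly in the <p>)

   In (1) the box directly containing the space has white-space: nowrap,
   so no wrap is expected at that space.
   In (2) the space is contained directly by the <p>,
   whose white-space value allows wrapping at it. */
p   { white-space: normal; }
.nw { white-space: nowrap; }
```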
line-breaking/line-breaking-009.html
line-breaking/line-breaking-010.html
line-breaking/line-breaking-011.html
line-breaking/line-breaking-ic-001.html
line-breaking/line-breaking-ic-002.html
line-breaking/line-breaking-ic-003.html
white-space/white-space-wrap-after-nowrap-001.html
word-break/word-break-break-all-inline-010.html
overflow-wrap/overflow-wrap-anywhere-inline-003.html
For soft wrap opportunities before the first or after the last character of a box,
the break occurs immediately before/after the box (at its margin edge)
rather than breaking the box between its content edge and the content.
Line breaking in/around Ruby is defined in CSS Ruby [[!CSS-RUBY-1]].
In order to avoid unexpected overflow,
if the User Agent is unable to perform the requisite lexical or orthographic analysis
for line breaking any [=content language=] that requires it
(for example, because it lacks a dictionary for certain languages),
it must assume a [=soft wrap opportunity=]
between pairs of [=typographic letter units=] in that writing system.
Note: This provision is not triggered merely when
the UA fails to find a word boundary in a particular text run;
the text run may well be a single unbreakable word.
It applies for example
when a text run is composed of Khmer characters (U+1780 to U+17FF)
if the User Agent does not know how to determine
word boundaries in Khmer.
Breaking Rules for Letters: the 'word-break' property
Name: word-break
Value: normal | keep-all | break-all | break-word
Initial: normal
Applies to: inline boxes
Inherited: yes
Canonical order: n/a
Computed value: specified keyword
Animation type: discrete
This property specifies soft wrap opportunities between letters,
i.e. where it is “normal” and permissible to break lines of text.
Specifically it controls whether a soft wrap opportunity
generally exists
between adjacent typographic letter units,
treating non-letter typographic character units
belonging to the
NU, AL, AI, or ID
Unicode line breaking classes [[!UAX14]]
as typographic letter units for this purpose (only).
It does not affect rules governing the soft wrap opportunities
created by [=white space=] (as well as by [=other space separators=]) and around punctuation.
(See 'line-break' for controls affecting punctuation and small kana.)
word-break/word-break-break-all-010.html
word-break/word-break-break-all-020.html
word-break/word-break-keep-all-005.html
word-break/word-break-keep-all-006.html
word-break/word-break-keep-all-007.html
word-break/word-break-keep-all-008.html
word-break/word-break-break-all-012.html
word-break/word-break-break-all-013.html
word-break/word-break-break-all-015.html
word-break/word-break-break-all-016.html
word-break/word-break-break-all-019.html
word-break/word-break-break-all-023.html
word-break/word-break-break-all-024.html
word-break/word-break-break-all-025.html
word-break/word-break-break-all-026.html
word-break/word-break-break-all-027.html
word-break/word-break-break-all-028.html
word-break/word-break-break-all-inline-008.html
white-space/break-spaces-008.html
word-break/word-break-min-content-003.html
For example, in some styles of CJK typesetting, English words are allowed
to break between any two letters, rather than only at spaces or hyphenation points;
this can be enabled with ''word-break:break-all''.
An example of English text embedded in Japanese
being broken at an arbitrary point in the word.
As another example, Korean has two styles of line-breaking:
between any two Korean syllables (''word-break: normal'')
or, like English, mainly at spaces (''word-break: keep-all'').
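With ''word-break: normal'' (breaks between any two syllables):
각 줄의 마지막에 한글이 올 때 줄 나눔 기
준을 “글자” 또는 “어절” 단위로 한다.
With ''word-break: keep-all'' (breaks mainly at spaces):
각 줄의 마지막에 한글이 올 때 줄 나눔
기준을 “글자” 또는 “어절” 단위로 한다.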
Ethiopic similarly has two styles of line-breaking,
either only breaking at [=word separators=] (''word-break: normal''),
or also allowing breaks between letters within a word (''word-break: break-all'').
Note: To enable additional break opportunities only in the case of overflow,
see 'overflow-wrap'.
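The language-specific styles described above might be selected along the following lines. This is only a sketch: keying the rules on '':lang()'' is an assumption made for illustration, and the class name is hypothetical.

```css
/* Japanese text with short embedded Latin words:
   allow breaks between any two letters. */
:lang(ja) { word-break: break-all; }

/* Korean set on a "word basis": break mainly at spaces. */
:lang(ko) { word-break: keep-all; }

/* Ethiopic (here Amharic) in the style that also breaks within words. */
:lang(am) { word-break: break-all; }

/* Hypothetical class: add break opportunities only in the case of overflow,
   leaving the customary rules in place otherwise. */
.overflow-only { word-break: normal; overflow-wrap: break-word; }
```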
Values have the following meanings:
normal
Words break according to their customary rules,
as described above.
Korean, which commonly exhibits two different behaviors,
allows breaks between any two consecutive Hangul/Hanja.
For Ethiopic, which also exhibits two different behaviors,
such breaks within words are not allowed.
word-break/word-break-normal-ar-000.html
word-break/word-break-normal-bo-000.html
word-break/word-break-normal-en-000.html
word-break/word-break-normal-hi-000.html
word-break/word-break-normal-ja-000.html
word-break/word-break-normal-ja-001.html
word-break/word-break-normal-ja-002.html
word-break/word-break-normal-ja-004.html
word-break/word-break-normal-km-000.html
word-break/word-break-normal-ko-000.html
word-break/word-break-normal-lo-000.html
word-break/word-break-normal-my-000.html
word-break/word-break-normal-tdd-000.html
word-break/word-break-normal-th-000.html
word-break/word-break-normal-zh-000.html
break-all
Breaking is allowed within “words”:
specifically,
in addition to soft wrap opportunities allowed for ''word-break/normal'',
any typographic letter units
(and any typographic character units resolving to the
NU (“numeric”), AL (“alphabetic”), or SA (“Southeast Asian”)
line breaking classes [[!UAX14]])
are instead treated as ID (“ideographic characters”)
for the purpose of line-breaking.
Hyphenation is not applied.
word-break/word-break-break-all-000.html
word-break/word-break-break-all-001.html
word-break/word-break-break-all-002.html
word-break/word-break-break-all-003.html
word-break/word-break-break-all-005.html
word-break/word-break-break-all-006.html
word-break/word-break-break-all-012.html
word-break/word-break-break-all-013.html
word-break/word-break-break-all-014.html
word-break/word-break-break-all-015.html
word-break/word-break-break-all-016.html
word-break/word-break-break-all-017.html
word-break/word-break-break-all-018.html
word-break/word-break-break-all-019.html
word-break/word-break-break-all-020.html
word-break/word-break-break-all-021.html
word-break/word-break-break-all-022.html
word-break/word-break-break-all-023.html
word-break/word-break-break-all-024.html
word-break/word-break-break-all-025.html
word-break/word-break-break-all-026.html
word-break/word-break-break-all-027.html
word-break/word-break-break-all-028.html
word-break/word-break-break-all-029.html
word-break/word-break-break-all-030.html
word-break/word-break-break-all-inline-001.html
word-break/word-break-break-all-inline-002.html
word-break/word-break-break-all-inline-003.html
word-break/word-break-break-all-inline-004.html
word-break/word-break-break-all-inline-005.html
word-break/word-break-break-all-inline-006.html
word-break/word-break-break-all-inline-007.html
word-break/word-break-break-all-inline-008.html
word-break/word-break-break-all-inline-009.html
word-break/word-break-break-all-inline-010.html
white-space/break-spaces-006.html
white-space/break-spaces-008.html
white-space/break-spaces-before-first-char-004.html
white-space/break-spaces-before-first-char-005.html
white-space/break-spaces-before-first-char-006.html
overflow-wrap/overflow-wrap-anywhere-006.html
line-break/line-break-loose-hyphens-002.html
line-break/line-break-normal-hyphens-002.html
line-break/line-break-strict-hyphens-002.html
Note: This value does not affect
whether there are soft wrap opportunities
around punctuation characters.
To allow breaks anywhere, see ''line-break: anywhere''.
white-space/break-spaces-before-first-char-011.html
Note: This option enables the other common behavior for Ethiopic.
It is also often used in a context where
the text consists predominantly of CJK characters
with only short non-CJK excerpts,
and it is desired that the text be better distributed on each line.
keep-all
Breaking is forbidden within “words”:
implicit soft wrap opportunities between typographic letter units
(or other typographic character units
belonging to the
NU, AL, AI, or ID
Unicode line breaking classes [[!UAX14]])
are suppressed,
i.e. breaks are prohibited between pairs of such characters
(regardless of 'line-break' settings other than ''line-break/anywhere'')
except where opportunities exist due to dictionary-based breaking.
Otherwise this option is equivalent to ''word-break/normal''.
In this style, sequences of CJK characters do not break.
word-break/word-break-keep-all-000.html
word-break/word-break-keep-all-001.html
word-break/word-break-keep-all-002.html
word-break/word-break-keep-all-003.html
Note: This is the other common behavior for Korean (which uses [=spaces=] between words),
and is also useful for mixed-script text where CJK snippets are mixed
into another language that uses [=spaces=] for separation.
Symbols that line-break the same way as letters of a particular category
are affected the same way as those letters.
Here's a mixed-script sample text:
这是一些汉字 and some Latin و کمی خط عربی และตัวอย่างการเขียนภาษาไทย በጽሑፍ፡ማራዘሙን፡አንዳንድ፡
The break-points are determined as follows (indicated by ‘·’):
Japanese is usually typeset allowing line breaks within words.
However, it is sometimes preferred to suppress these wrapping opportunities
and to only allow wrapping at the end of certain sentence fragments.
This is most commonly done in very short pieces of text,
such as headings and table or figure captions.
This can be achieved by marking the allowed wrapping points
with <{wbr}> or U+200B ZERO WIDTH SPACE,
and suppressing the other ones using ''word-break: keep-all''.
For instance, the following markup can produce either of the renderings below,
depending on the value of the 'word-break' property:
<h1>窓ぎわの<wbr>トットちゃん</h1>
Expected rendering with h1 { word-break: normal }:
窓ぎわのトットちゃ
ん
Expected rendering with h1 { word-break: keep-all }:
窓ぎわの
トットちゃん
When shaping scripts such as Arabic
are allowed to break within words due to ''word-break/break-all''
the characters must still be shaped
as if the word were not broken
(see [[#word-break-shaping]]).
word-break/word-break-break-all-004.html
For compatibility with legacy content,
the 'word-break' property also supports a deprecated break-word keyword.
When specified, this has the same effect as
''word-break: normal'' and ''overflow-wrap: anywhere'',
regardless of the actual value of the 'overflow-wrap' property.
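The equivalence stated above can be written out as two rules that are expected to behave the same way. The class names are hypothetical.

```css
/* Legacy content using the deprecated keyword. */
.legacy { word-break: break-word; }

/* The pair of declarations it is defined to behave like. */
.modern { word-break: normal; overflow-wrap: anywhere; }
```

Note that in the first rule the effect applies regardless of any 'overflow-wrap' value set on the same element.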
white-space/pre-wrap-008.html
white-space/pre-wrap-016.html
word-break/word-break-break-word-overflow-wrap-interactions.html
word-break/word-break-break-word-crash-001.html
white-space/break-spaces-003.html
white-space/break-spaces-004.html
white-space/break-spaces-008.html
white-space/break-spaces-before-first-char-010.html
word-break/word-break-min-content-001.html
word-break/word-break-min-content-002.html
word-break/word-break-min-content-003.html
word-break/word-break-min-content-004.html
word-break/word-break-min-content-005.html
word-break/word-break-min-content-006.html
Line Breaking Strictness: the 'line-break' property
Name: line-break
Value: auto | loose | normal | strict | anywhere
Initial: auto
Applies to: inline boxes
Inherited: yes
Canonical order: n/a
Computed value: specified keyword
Animation type: discrete
This property specifies the strictness of line-breaking rules applied
within an element:
especially how wrapping interacts with punctuation and symbols.
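A hedged sketch of how an author might choose among the strictness levels follows. The selectors are hypothetical, and the exact rule sets behind ''line-break/loose'', ''line-break/normal'', and ''line-break/strict'' remain largely up to the UA.

```css
/* Hypothetical selectors. */

/* Japanese body text: the most stringent rule set. */
:lang(ja) { line-break: strict; }

/* Narrow columns, such as in newspapers: the least restrictive rule set. */
.narrow-column:lang(ja) { line-break: loose; }

/* Let the UA decide, possibly based on the length of the line. */
.ua-default { line-break: auto; }
```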
Values have the following meanings:
auto
The UA determines the set of line-breaking restrictions to use,
and it may vary the restrictions based on the length of the line; e.g.,
use a less restrictive set of line-break rules for short lines.
loose
Breaks text using the least restrictive set of line-breaking
rules. Typically used for short lines, such as in newspapers.
normal
Breaks text using the most common set of line-breaking rules.
strict
Breaks text using the most stringent set of line-breaking
rules.
anywhere
There is a soft wrap opportunity around every typographic character unit,
including around any punctuation character or [=preserved white spaces=],
or in the middle of words,
disregarding any prohibition against line breaks,
even those introduced by characters with the GL, WJ, or ZWJ character class (see [[UAX14]])
or mandated by the 'word-break' property.
The different wrapping opportunities must not be prioritized.
Hyphenation is not applied.
line-break/line-break-anywhere-001.html
line-break/line-break-anywhere-002.html
line-break/line-break-anywhere-003.html
line-break/line-break-anywhere-004.html
line-break/line-break-anywhere-005.html
line-break/line-break-anywhere-006.html
line-break/line-break-anywhere-007.html
line-break/line-break-anywhere-008.html
line-break/line-break-anywhere-009.html
line-break/line-break-anywhere-010.html
line-break/line-break-anywhere-011.html
line-break/line-break-anywhere-012.html
line-break/line-break-anywhere-013.html
line-break/line-break-anywhere-014.html
line-break/line-break-anywhere-015.html
line-break/line-break-anywhere-016.html
line-break/line-break-anywhere-017.html
line-break/line-break-anywhere-and-white-space-001.html
line-break/line-break-anywhere-and-white-space-002.html
line-break/line-break-anywhere-and-white-space-003.html
line-break/line-break-anywhere-and-white-space-004.html
line-break/line-break-anywhere-and-white-space-005.html
line-break/line-break-anywhere-and-white-space-006.html
line-break/line-break-anywhere-and-white-space-007.html
line-break/line-break-anywhere-and-white-space-008.html
line-break/line-break-anywhere-and-white-space-009.html
line-break/line-break-anywhere-overrides-uax-behavior-001.html
line-break/line-break-anywhere-overrides-uax-behavior-002.html
line-break/line-break-anywhere-overrides-uax-behavior-003.html
line-break/line-break-anywhere-overrides-uax-behavior-004.html
line-break/line-break-anywhere-overrides-uax-behavior-005.html
line-break/line-break-anywhere-overrides-uax-behavior-006.html
line-break/line-break-anywhere-overrides-uax-behavior-007.html
line-break/line-break-anywhere-overrides-uax-behavior-008.html
line-break/line-break-anywhere-overrides-uax-behavior-009.html
line-break/line-break-anywhere-overrides-uax-behavior-010.html
line-break/line-break-anywhere-overrides-uax-behavior-011.html
line-break/line-break-anywhere-overrides-uax-behavior-012.html
line-break/line-break-anywhere-overrides-uax-behavior-013.html
line-break/line-break-anywhere-overrides-uax-behavior-014.html
line-break/line-break-anywhere-overrides-uax-behavior-015.html
line-break/line-break-anywhere-overrides-uax-behavior-016.html
line-break/line-break-shaping-001.html
white-space/break-spaces-before-first-char-007.html
white-space/break-spaces-before-first-char-011.html
Note: This value triggers the line breaking rules typically seen in terminals.
Note: ''line-break/anywhere'' only allows [=preserved white spaces=] at the end of the line
to be wrapped to the next line when 'white-space' is set to ''white-space/break-spaces'',
because in other cases:
* [=preserved white space=] at the end/start of the line is discarded (''white-space/normal'', ''white-space/pre-line'')
* wrapping is forbidden altogether (''nowrap'', ''pre'')
* the [=preserved white space=] [=hang=] (''pre-wrap'').
When it does have an effect on [=preserved white space=],
with ''white-space: break-spaces'',
it allows breaking before the first space of a sequence,
which ''break-spaces'' on its own does not.
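The combination discussed in this note could be used for terminal-like output, as in the following sketch. The class name is hypothetical.

```css
/* Hypothetical class for terminal-like output. */
.terminal {
  /* Preserved spaces take up space and can wrap to the next line. */
  white-space: break-spaces;
  /* A wrap opportunity exists around every typographic character unit,
     including before the first space of a sequence. */
  line-break: anywhere;
}
```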
The rules here are following guidelines from KLREQ for Korean,
which don't allow the Chinese/Japanese-specific breaks.
However, the resulting behavior could use some review and feedback to make sure it is correct,
particularly when “word basis” breaking is used (''word-break: keep-all'') in Korean.
CSS distinguishes between four levels of strictness in the rules for
text wrapping.
The precise set of rules in effect for each of ''line-break/loose'', ''line-break/normal'', and ''line-break/strict'' is up to the UA
and should follow language conventions.
However, this specification does require that:
The following breaks are forbidden in ''strict'' line breaking
and allowed in ''line-break/normal'' and ''loose'':
breaks before Japanese small kana or the Katakana-Hiragana prolonged sound mark,
i.e. characters from the Unicode line breaking class CJ [[!UAX14]].
line-break/line-break-loose-011.xht
line-break/line-break-loose-012.xht
line-break/line-break-normal-011.xht
line-break/line-break-normal-012.xht
line-break/line-break-strict-011.xht
line-break/line-break-strict-012.xht
i18n/ja/css-text-line-break-ja-cj-loose.html
i18n/ja/css-text-line-break-ja-cj-normal.html
i18n/ja/css-text-line-break-ja-cj-strict.html
i18n/zh/css-text-line-break-zh-cj-loose.html
i18n/zh/css-text-line-break-zh-cj-normal.html
i18n/zh/css-text-line-break-zh-cj-strict.html
i18n/other-lang/css-text-line-break-de-cj-loose.html
i18n/other-lang/css-text-line-break-de-cj-normal.html
i18n/other-lang/css-text-line-break-de-cj-strict.html
i18n/unknown-lang/css-text-line-break-cj-loose.html
i18n/unknown-lang/css-text-line-break-cj-normal.html
i18n/unknown-lang/css-text-line-break-cj-strict.html
The following breaks are allowed for ''line-break/normal'' and ''loose'' line breaking
if the writing system is Chinese or Japanese,
and are otherwise forbidden:
The following breaks are allowed for ''loose'' line breaking
if the preceding character belongs to the Unicode line breaking class ID [[!UAX14]]
(including when the preceding character is treated as ID due to ''word-break: break-all''),
and are otherwise forbidden:
breaks between inseparable characters
(such as ‥ U+2025, … U+2026)
i.e. characters from the Unicode line breaking class IN [[!UAX14]].
line-break/line-break-loose-015.xht
line-break/line-break-normal-015a.xht
line-break/line-break-normal-015b.xht
line-break/line-break-strict-015a.xht
line-break/line-break-strict-015b.xht
i18n/ja/css-text-line-break-ja-in-loose.html
i18n/ja/css-text-line-break-ja-in-normal.html
i18n/ja/css-text-line-break-ja-in-strict.html
i18n/zh/css-text-line-break-zh-in-loose.html
i18n/zh/css-text-line-break-zh-in-normal.html
i18n/zh/css-text-line-break-zh-in-strict.html
i18n/other-lang/css-text-line-break-de-in-loose.html
i18n/other-lang/css-text-line-break-de-in-normal.html
i18n/other-lang/css-text-line-break-de-in-strict.html
i18n/unknown-lang/css-text-line-break-in-loose.html
i18n/unknown-lang/css-text-line-break-in-normal.html
i18n/unknown-lang/css-text-line-break-in-strict.html
The following breaks are allowed for ''loose''
if the writing system is Chinese or Japanese
and are otherwise forbidden:
breaks before suffixes:
Characters with the Unicode line breaking class PO [[!UAX14]]
and the East Asian Width property [[!UAX11]]
Ambiguous, Fullwidth, or Wide.
line-break/line-break-loose-017a.xht
line-break/line-break-loose-017b.xht
line-break/line-break-normal-017a.xht
line-break/line-break-normal-017b.xht
line-break/line-break-strict-017a.xht
line-break/line-break-strict-017b.xht
i18n/ja/css-text-line-break-ja-po-loose.html
i18n/ja/css-text-line-break-ja-po-normal.html
i18n/ja/css-text-line-break-ja-po-strict.html
i18n/zh/css-text-line-break-zh-po-loose.html
i18n/zh/css-text-line-break-zh-po-normal.html
i18n/zh/css-text-line-break-zh-po-strict.html
i18n/other-lang/css-text-line-break-de-po-loose.html
i18n/other-lang/css-text-line-break-de-po-normal.html
i18n/other-lang/css-text-line-break-de-po-strict.html
i18n/unknown-lang/css-text-line-break-po-loose.html
i18n/unknown-lang/css-text-line-break-po-normal.html
i18n/unknown-lang/css-text-line-break-po-strict.html
breaks after prefixes:
Characters with the Unicode line breaking class PR [[!UAX14]]
and the East Asian Width property [[!UAX11]]
Ambiguous, Fullwidth, or Wide.
line-break/line-break-loose-018.xht
line-break/line-break-normal-018.xht
line-break/line-break-strict-018.xht
i18n/ja/css-text-line-break-ja-pr-loose.html
i18n/ja/css-text-line-break-ja-pr-normal.html
i18n/ja/css-text-line-break-ja-pr-strict.html
i18n/zh/css-text-line-break-zh-pr-loose.html
i18n/zh/css-text-line-break-zh-pr-normal.html
i18n/zh/css-text-line-break-zh-pr-strict.html
i18n/other-lang/css-text-line-break-de-pr-loose.html
i18n/other-lang/css-text-line-break-de-pr-normal.html
i18n/other-lang/css-text-line-break-de-pr-strict.html
i18n/unknown-lang/css-text-line-break-pr-loose.html
i18n/unknown-lang/css-text-line-break-pr-normal.html
i18n/unknown-lang/css-text-line-break-pr-strict.html
Note:
The requirements listed above
only create distinctions in CJK text.
In an implementation that matches only the rules above,
and no additional rules,
'line-break' would only affect CJK code points
unless the writing system is tagged as
Chinese or Japanese.
Future levels may add additional specific rules
for other writing systems and languages
as their requirements become known.
As UAs can add additional distinctions
between ''line-break/strict''/''line-break/normal''/''line-break/loose'' modes,
these values can exhibit differences in other writing systems as well.
For example, a UA with sufficiently-advanced Thai language processing ability
could choose to map different levels of strictness in Thai line-breaking
to these keywords,
e.g. disallowing breaks within compound words in ''line-break/strict'' mode
(e.g. breaking ตัวอย่างการเขียนภาษาไทย as ตัวอย่าง·การเขียน·ภาษาไทย)
while allowing more breaks in ''line-break/loose''
(ตัวอย่าง·การ·เขียน·ภาษา·ไทย).
Note: The CSSWG recognizes that in a future edition of the
specification finer control over line breaking may be necessary to
satisfy high-end publishing requirements.
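As an illustration of how an author might select among these strictness levels, the following sketch applies stricter rules to Japanese running text and looser rules to a narrow column. The class name is hypothetical, and the exact breaks obtained depend on the UA's rules as described above.

```css
/* Content tagged as Japanese: use the stricter set of rules. */
:lang(ja) {
  line-break: strict;
}

/* Hypothetical class for short lines (e.g. narrow captions),
   where additional break opportunities help avoid uneven lines. */
.narrow-caption:lang(ja) {
  line-break: loose;
}
```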
Hyphenation: the 'hyphens' property
Hyphenation
is the controlled splitting of words
where they usually would not be allowed to break
to improve the layout of paragraphs,
typically splitting words at syllabic or morphemic boundaries,
and visually indicating the split (usually by inserting a hyphen, U+2010).
In some cases, hyphenation may also alter the spelling of a word.
Regardless, hyphenation is a rendering effect only:
it must have no effect on the underlying document content
or on text selection or searching.
Hyphenation occurs when the line breaks at a valid
hyphenation opportunity,
which is a type of soft wrap opportunity
that exists within a word where hyphenation is allowed.
In CSS, hyphenation opportunities are controlled
with the 'hyphens' property.
CSS Text Level 3 does not define the exact rules for hyphenation;
however, UAs are strongly encouraged
to optimize their choice of break points
and to choose language-appropriate hyphenation points.
hyphens/hyphens-overflow-001.html
Note: The [=soft wrap opportunity=] introduced by
the U+002D HYPHEN-MINUS character
or the U+2010 HYPHEN character
is not a [=hyphenation opportunity=],
as no visual indication of the split is created when wrapping:
these characters are visible whether the line is wrapped at that point or not.
Hyphenation opportunities are considered when calculating
min-content intrinsic sizes.
This property controls whether hyphenation is allowed to create more
soft wrap opportunities within a line of text.
Values have the following meanings:
none
Words are not hyphenated, even if characters inside
the word explicitly define hyphenation opportunities.
Note: This does not suppress the existing [=soft wrap opportunities=]
introduced by always visible characters such as
U+002D HYPHEN-MINUS
or U+2010 HYPHEN.
hyphens/hyphens-none-011.html
hyphens/hyphens-none-012.html
hyphens/hyphens-none-013.html
manual
Words are only hyphenated where there are characters inside the word
that explicitly suggest hyphenation opportunities.
hyphens/hyphens-overflow-001.html
hyphens/hyphens-manual-010.html
hyphens/hyphens-manual-011.html
hyphens/hyphens-manual-012.html
hyphens/hyphens-manual-013.html
hyphens/hyphens-out-of-flow-001.html
hyphens/hyphens-shaping-001.html
hyphens/hyphens-shaping-002.html
hyphens/hyphens-span-001.html
hyphens/shy-styling-001.html
In Unicode, U+00AD is a conditional "soft hyphen" and U+2010 is an
unconditional hyphen. Unicode Standard Annex #14 describes the
role of soft hyphens in
Unicode line breaking. [[!UAX14]]
In HTML, &shy; represents the soft hyphen character,
which suggests a hyphenation opportunity.
ex&shy;ample
auto
Words may be broken at hyphenation opportunities
determined automatically by a language-appropriate hyphenation resource
in addition to those indicated explicitly by a conditional hyphen.
Automatic hyphenation opportunities within a word must be ignored
if the word contains a conditional hyphen (&shy; or U+00AD),
in favor of the conditional hyphen(s).
However, if, even after breaking at such opportunities,
a portion of that word is still too long to fit on one line,
an automatic hyphenation opportunity may be used.
hyphenation-control-1.html
hyphens/hyphens-auto-001.html
hyphens/hyphens-auto-010.html
hyphens/hyphens-out-of-flow-002.html
hyphens/hyphens-span-002.html
Note: In some languages
(such as English but not German),
it may be appropriate to avoid having hyphenation opportunities in mixed case words,
as those may indicate proper nouns.
This type of heuristic is however not mandated by this specification,
as it is up to the User Agent and its language-specific hyphenation resource.
Correct automatic hyphenation requires a hyphenation resource
appropriate to the language of the text being broken.
The UA must therefore only automatically hyphenate text
for which the content language is known
and for which it has an appropriate hyphenation resource.
hyphens/hyphens-auto-001.html
For the purpose of the 'hyphens' property,
what constitutes a “word” is UA-dependent.
However, inline element boundaries
and out-of-flow elements
must be ignored when determining word boundaries.
hyphens/hyphens-span-001.html
hyphens/hyphens-span-002.html
hyphens/hyphens-out-of-flow-001.html
hyphens/hyphens-out-of-flow-002.html
Authors should correctly tag their content’s language
(e.g. using the HTML lang attribute
or XML xml:lang attribute)
in order to obtain correct automatic hyphenation.
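For example, assuming the document declares its language (such as with lang="en" on the root element) and the UA has a hyphenation resource for that language, a style sheet might enable automatic hyphenation for running text while restricting it elsewhere. The selectors below are illustrative only.

```css
/* Running text: allow automatic hyphenation. This takes effect only where
   the content language is known and the UA has a resource for it. */
article p {
  hyphens: auto;
}

/* Headings: hyphenate only at explicit soft hyphens (U+00AD). */
h1, h2 {
  hyphens: manual;
}

/* Code samples: never hyphenate, even at soft hyphens. */
code {
  hyphens: none;
}
```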
The character or characters visually shown due to hyphenation
at a [=hyphenation opportunity=] created by a conditional hyphen character (&shy; or U+00AD)
are inserted in-place,
and styled according to any property that applies to the conditional hyphen character.
hyphens/shy-styling-001.html
ex<span style="color:red">&shy;</span>ample
When the markup above is hyphenated, it is rendered as
ex-
ample
When shaping scripts such as Arabic are allowed to break within words
due to hyphenation,
the characters must still be shaped
as if the word were not broken
(see [[#word-break-shaping]]).
hyphens/hyphens-shaping-001.html
hyphens/hyphens-shaping-002.html
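For example, if the Uyghur word “داميدى”
were hyphenated, the letters on either side of the break
would keep the joining forms they have in the unbroken word,
rather than being reshaped as if each fragment were a separate word.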
Overflow Wrapping: the 'overflow-wrap'/'word-wrap' property
Name: overflow-wrap, word-wrap
Value: normal | break-word | anywhere
Initial: normal
Applies to: inline boxes
Inherited: yes
Canonical order: n/a
Computed value: specified keyword
Animation type: discrete
This property specifies whether the UA may break at otherwise disallowed points within a line
to prevent overflow,
when an otherwise-unbreakable string is too long to fit within the line box.
It only has an effect when
'white-space' allows wrapping. Possible values:
overflow-wrap/overflow-wrap-002.html
normal
Lines may break only at allowed break points. However, the restrictions
introduced by ''word-break: keep-all'' may be relaxed to match ''word-break: normal''
if there are no otherwise-acceptable break points in the line.
anywhere
An otherwise unbreakable sequence of characters
may be broken at an arbitrary point if
there are no otherwise-acceptable break points in the line.
Shaping characters are still shaped as if the word were not
broken, and grapheme clusters must stay together as one unit.
No hyphenation character is inserted at the break point.
Soft wrap opportunities introduced by ''overflow-wrap/anywhere''
are considered
when calculating min-content intrinsic sizes.
overflow-wrap/overflow-wrap-min-content-size-001.html
overflow-wrap/overflow-wrap-min-content-size-002.html
overflow-wrap/overflow-wrap-min-content-size-003.html
overflow-wrap/overflow-wrap-anywhere-001.html
overflow-wrap/overflow-wrap-anywhere-003.html
overflow-wrap/overflow-wrap-anywhere-006.html
overflow-wrap/overflow-wrap-anywhere-007.html
overflow-wrap/overflow-wrap-anywhere-008.html
overflow-wrap/overflow-wrap-anywhere-009.html
overflow-wrap/overflow-wrap-anywhere-010.html
overflow-wrap/overflow-wrap-anywhere-inline-001.html
overflow-wrap/overflow-wrap-anywhere-inline-002.html
overflow-wrap/overflow-wrap-anywhere-inline-003.html
overflow-wrap/overflow-wrap-anywhere-inline-004.html
overflow-wrap/overflow-wrap-anywhere-fit-content-001.html
overflow-wrap/overflow-wrap-min-content-size-005.html
overflow-wrap/overflow-wrap-min-content-size-007.html
overflow-wrap/overflow-wrap-cluster-002.html
overflow-wrap/overflow-wrap-shaping-002.html
white-space/break-spaces-before-first-char-009.html
white-space/break-spaces-before-first-char-013.html
word-break/word-break-min-content-001.html
word-break/word-break-min-content-002.html
word-break/word-break-min-content-003.html
word-break/word-break-min-content-004.html
word-break/word-break-min-content-005.html
word-break/word-break-min-content-006.html
white-space/pre-wrap-009.html
white-space/pre-wrap-010.html
white-space/break-spaces-with-overflow-wrap-002.html
white-space/break-spaces-with-overflow-wrap-004.html
white-space/break-spaces-with-overflow-wrap-006.html
white-space/break-spaces-with-overflow-wrap-008.html
white-space/break-spaces-with-overflow-wrap-010.html
break-word
As for ''overflow-wrap/anywhere'' except that
soft wrap opportunities introduced by ''overflow-wrap/break-word''
are not considered
when calculating min-content intrinsic sizes.
overflow-wrap/overflow-wrap-001.html
overflow-wrap/overflow-wrap-break-word-001.html
overflow-wrap/overflow-wrap-break-word-003.html
overflow-wrap/overflow-wrap-break-word-008.html
overflow-wrap/overflow-wrap-break-word-009.html
overflow-wrap/overflow-wrap-break-word-010.html
overflow-wrap/overflow-wrap-min-content-size-004.html
overflow-wrap/overflow-wrap-min-content-size-006.html
overflow-wrap/overflow-wrap-min-content-size-008.html
overflow-wrap/overflow-wrap-break-word-fit-content-001.html
overflow-wrap/overflow-wrap-cluster-001.html
overflow-wrap/overflow-wrap-shaping-001.html
white-space/break-spaces-before-first-char-008.html
white-space/break-spaces-before-first-char-012.html
overflow-wrap/overflow-wrap-break-word-keep-all-001.html
white-space/break-spaces-with-overflow-wrap-001.html
white-space/break-spaces-with-overflow-wrap-003.html
white-space/break-spaces-with-overflow-wrap-005.html
white-space/break-spaces-with-overflow-wrap-007.html
white-space/break-spaces-with-overflow-wrap-009.html
For legacy reasons, UAs must treat 'word-wrap' as a [=legacy name alias=]
of the 'overflow-wrap' property.
overflow-wrap/word-wrap-alias.html
overflow-wrap/word-wrap-001.html
overflow-wrap/word-wrap-002.html
overflow-wrap/word-wrap-004.html
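The difference between ''overflow-wrap/anywhere'' and ''overflow-wrap/break-word'' shows up only in intrinsic sizing. In the following sketch (class names are hypothetical), both boxes are sized to their min-content width, and both may break a long unbreakable string to prevent overflow.

```css
/* Both rules allow an otherwise-unbreakable string (e.g. a long URL)
   to break when it would overflow the line. */

.cell-anywhere {
  width: min-content;
  /* The extra break opportunities count toward min-content,
     so this box can become narrower than its longest unbreakable string. */
  overflow-wrap: anywhere;
}

.cell-break-word {
  width: min-content;
  /* The extra break opportunities are ignored for min-content,
     so this box stays as wide as its longest unbreakable string. */
  overflow-wrap: break-word;
}

/* Legacy alias: treated the same as overflow-wrap: break-word. */
.legacy {
  word-wrap: break-word;
}
```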
Shaping Across Intra-word Breaks
When shaping scripts such as Arabic wrap
at unforced soft wrap opportunities within words
(such as when breaking due to
''word-break: break-all'',
''line-break: anywhere'',
''overflow-wrap: break-word'',
''overflow-wrap: anywhere'',
or when hyphenating)
the characters must still be shaped
(their joining forms chosen)
as if the word were still whole.
hyphens/hyphens-shaping-001.html
hyphens/hyphens-shaping-002.html
line-break/line-break-shaping-001.html
overflow-wrap/overflow-wrap-shaping-001.html
overflow-wrap/overflow-wrap-shaping-002.html
word-break/word-break-break-all-004.html
For example, if the word “نوشتن” is broken between the “ش” and “ت”,
the “ش” still takes its initial form (“ﺷ”), and the “ت” its medial form (“ﺘ”),
forming “ﻧﻮﺷ | ﺘﻦ”, not “نوش | تن”.
Alignment and Justification
Alignment and justification control how inline content is distributed within a line box.
Text Alignment: the 'text-align' shorthand
Name: text-align
Value: start | end | left | right | center | justify | match-parent | justify-all
Initial: start
Applies to: block containers
Inherited: yes
Canonical order: n/a
Animation type: discrete
This shorthand property
sets the 'text-align-all' and 'text-align-last' properties
and describes how the inline-level content of a block
is aligned along the inline axis
if the content does not completely fill the line box.
Values other than ''justify-all'' or ''match-parent'' are assigned to 'text-align-all'
and reset 'text-align-last' to ''text-align-last/auto''.
Values have the following meanings:
start
Inline-level content is aligned to the start edge of the line box.
text-align/text-align-006.html
text-align/text-align-start-001.html
text-align/text-align-start-002.html
text-align/text-align-start-003.html
text-align/text-align-start-004.html
text-align/text-align-start-005.html
text-align/text-align-start-006.html
text-align/text-align-start-007.html
text-align/text-align-start-008.html
text-align/text-align-start-009.html
text-align/text-align-start-010.html
text-align/text-align-start-014.html
text-align/text-align-start-015.html
text-align/text-align-start-016.html
text-align/text-align-start-017.html
end
Inline-level content is aligned to the end edge of the line box.
text-align/text-align-007.html
text-align/text-align-end-001.html
text-align/text-align-end-002.html
text-align/text-align-end-003.html
text-align/text-align-end-004.html
text-align/text-align-end-005.html
text-align/text-align-end-006.html
text-align/text-align-end-007.html
text-align/text-align-end-008.html
text-align/text-align-end-009.html
text-align/text-align-end-010.html
text-align/text-align-end-014.html
text-align/text-align-end-015.html
text-align/text-align-end-016.html
text-align/text-align-end-017.html
left
Inline-level content is aligned to the
line left
edge of the line box.
(In vertical writing modes,
this will be either the physical top or bottom,
depending on 'text-orientation'.) [[CSS-WRITING-MODES-3]]
right
Inline-level content is aligned to the
line right
edge of the line box.
(In vertical writing modes,
this will be either the physical top or bottom,
depending on 'text-orientation'.) [[CSS-WRITING-MODES-3]]
center
Inline-level content is centered within the line box.
justify
Text is justified according to the method specified by the 'text-justify' property,
in order to exactly fill the line box.
Unless otherwise specified by 'text-align-last',
the last line before a forced break or the end of the block is ''start''-aligned.
text-align/text-align-justify-001.html
text-align/text-align-justify-002.html
text-align/text-align-justify-003.html
text-align/text-align-justify-004.html
text-align/text-align-justify-005.html
text-align/text-align-justify-006.html
letter-spacing-justify-001.xht
text-align-white-space-002.xht
text-align-white-space-004.xht
text-align-white-space-006.xht
text-align-white-space-008.xht
word-spacing-justify-001.xht
justify-all
Sets both 'text-align-all' and 'text-align-last' to ''text-align/justify'',
forcing the last line to justify as well.
text-align/text-align-justifyall-001.html
text-align/text-align-justifyall-002.html
text-align/text-align-justifyall-003.html
text-align/text-align-justifyall-004.html
text-align/text-align-justifyall-005.html
text-align/text-align-justifyall-006.html
letter-spacing/letter-spacing-bidi-003.xht
match-parent
This value behaves the same as ''inherit''
(computes to its parent's computed value)
except that an inherited value of ''start'' or ''end''
is interpreted against the parent’s
(or the initial containing block’s, if there is no parent)
'direction' value
and results in a computed value of either ''left'' or ''right''.
When specified on the 'text-align' shorthand,
sets both 'text-align-all' and 'text-align-last' to ''text-align/match-parent''.
text-align/text-align-008.html
text-align-match-parent-01.html
text-align-match-parent-02.html
text-align-match-parent-03.html
text-align-match-parent-04.html
text-align-match-parent-root-ltr.html
text-align-match-parent-root-rtl.html
A block of text is a stack of
line boxes.
This property specifies how the inline-level boxes within each line box
align with respect to the start and end sides of the line box.
Alignment is not with respect to the
viewport
or containing block.
In the case of ''justify'', the UA may stretch or shrink any inline boxes
by adjusting their text. (See 'text-justify'.)
If an element's [=white space=] is not collapsible,
then the UA is not required to adjust its text for the purpose of justification
and may instead treat the text as having no justification opportunities.
If the UA chooses to adjust the text, then it must ensure
that tab stops continue to line up as required by the
white space processing rules.
text-align-white-space-001.xht
text-align-white-space-003.xht
text-align-white-space-005.xht
text-align-white-space-007.xht
If (after justification, if any) the inline contents of a line box are too long to fit within it,
then the contents are start-aligned:
any content that doesn't fit overflows the line box's end edge.
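The effect of the shorthand on its longhands can be summarized by the following sketch, in which each comment gives the longhand values that result under the rules above. The class names are hypothetical.

```css
.a { text-align: center; }
/* text-align-all: center;  text-align-last: auto */

.b { text-align: justify; }
/* text-align-all: justify; text-align-last: auto
   (so the last line is start-aligned) */

.c { text-align: justify-all; }
/* text-align-all: justify; text-align-last: justify */

.d { text-align: match-parent; }
/* text-align-all: match-parent; text-align-last: match-parent */
```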
Default Text Alignment: the 'text-align-all' property
Name: text-align-all
Value: start | end | left | right | center | justify | match-parent
Initial: start
Applies to: block containers
Inherited: yes
Computed value: keyword as specified, except for ''match-parent'' which computes as defined above
Canonical order: n/a
Animation type: discrete
This longhand of the 'text-align' shorthand property
specifies the inline alignment of all lines of inline content in the block container,
except for last lines overridden by a non-''text-align-last/auto'' value of 'text-align-last'.
See 'text-align' for a full description of values.
Authors should use the 'text-align' shorthand instead of this property.
Last Line Alignment: the 'text-align-last' property
Name: text-align-last
Value: auto | start | end | left | right | center | justify | match-parent
Initial: auto
Applies to: block containers
Inherited: yes
Canonical order: n/a
Computed value: specified keyword
Animation type: discrete
This property describes how the last line of a block or a line
right before a forced line break is aligned.
text-align/text-align-last-010.html
text-align/text-align-last-011.html
text-align/text-align-last-wins-001.html
If auto is specified,
content on the affected line is aligned per 'text-align-all'
unless 'text-align-all' is set to ''justify'',
in which case it is start-aligned.
All other values are interpreted as described for 'text-align'.
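Because the 'text-align' shorthand resets 'text-align-last', a 'text-align-last' declaration needs to come after the shorthand within the same rule to take effect. A minimal sketch, using a hypothetical class name:

```css
.colophon {
  text-align: justify;      /* also resets text-align-last to auto */
  text-align-last: center;  /* last line, and lines before forced breaks, are centered */
}
```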
Justification Method: the 'text-justify' property
Name: text-justify
Value: auto | none | inter-word | inter-character
Initial: auto
Applies to: inline boxes
Inherited: yes
Canonical order: n/a
Computed value: specified keyword
Animation type: discrete
This property selects the justification method used when a line's
alignment is set to ''justify'' (see 'text-align').
The property applies to inlines,
but is inherited from block containers to the root inline box containing their inline-level contents.
It takes the following values:
text-justify/text-justify-006.html
auto
The UA determines the justification algorithm to follow, based
on a balance between performance and adequate presentation quality.
Since justification rules vary by writing system and language,
UAs should, where possible, use a justification algorithm appropriate to the text.
text/letter-spacing-justify-001.xht
text/word-spacing-justify-001.xht
For example, the UA could by default use a justification method that is a
simple universal compromise for all writing systems, such as
primarily expanding at word separators
and between CJK typographic letter units,
along with secondarily expanding between Southeast Asian typographic letter units.
Then, in cases where the content language of the paragraph is known,
it could choose a more language-tailored justification behavior
e.g. following [[JLREQ]] for Japanese,
using cursive elongation for Arabic,
using ''inter-word'' for German,
etc.
An example of cursively-justified Arabic text,
rendered by Tasmeem.
Like English, Arabic can be justified by adjusting the spacing between words,
but in most styles it can also be justified by calligraphically elongating or compressing the letterforms themselves.
In this example, the upper text is extended to fill the line by the use of elongated (kashida) forms and swash forms,
while the bottom line is compressed slightly by using a stacked combination for the characters between ت and م.
By employing traditional calligraphic techniques,
a typesetter can justify the line while preserving flow and color,
providing a very high quality justification effect.
However, this is by its nature a very script-specific effect.
Mixed-script text with ''text-justify: auto'':
this interpretation uses a universal-compromise justification method,
expanding at spaces as well as between CJK and Southeast Asian letters.
This effectively uses inter-word + inter-ideograph spacing
for lines that have word-separators and/or CJK characters
and falls back to inter-cluster behavior for lines that don't
or for which the space stretches too far.
none
Justification is disabled: there are no justification opportunities within the text.
text-justify/text-justify-001.html
text-justify/text-justify-006.html
text-justify-none-001.html
Mixed-script text with ''text-justify: none''
Note: This value is intended for use in user stylesheets
to improve readability or for accessibility purposes.
inter-word
Justification adjusts spacing at word separators only
(effectively varying the used 'word-spacing' on the line).
This behavior is typical for languages that separate words using spaces,
like English or Korean.
text-justify-inter-word-001.html
Mixed-script text with ''text-justify: inter-word''
inter-character
Justification adjusts spacing between each pair of adjacent typographic character units
(effectively varying the used 'letter-spacing' on the line).
This value is sometimes used in East Asian systems such as Japanese.
text-justify-inter-character-001.html
Mixed-script text with ''text-justify: inter-character''
For legacy reasons, UAs must also support the alternate keyword distribute
with the exact same meaning and behavior.
text-justify-distribute-001.html
Since optimal justification is language-sensitive,
authors should correctly language-tag their content for the best results.
Note: The guidelines in this level of CSS do not describe a complete
justification algorithm. They are merely a minimum set of requirements
that a complete algorithm should meet. Limiting the set of requirements
gives UAs some latitude in choosing a justification algorithm that
meets their needs and desired balance of quality, speed, and complexity.
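As an illustration, an author might leave the choice of method to the UA in general, but request a particular method for content in known languages. The selectors are illustrative only; note also that this property is marked at risk in this draft.

```css
p {
  text-align: justify;
  text-justify: auto;            /* initial value: the UA chooses the method */
}

p:lang(de) {
  text-justify: inter-word;      /* adjust spacing at word separators only */
}

p:lang(ja) {
  text-justify: inter-character; /* adjust spacing between adjacent typographic character units */
}
```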
Expanding and Compressing Text
When justifying text, the user agent takes the remaining space between
the ends of a line's contents and the edges of its line box, and
distributes that space throughout its contents so that the contents
exactly fill the line box.
The user agent may alternatively distribute negative space,
putting more content on the line than would otherwise fit under normal spacing conditions.
Space distributed by justification is in addition to
the spacing defined by the 'letter-spacing' or 'word-spacing' properties.
When such additional space is distributed to a word-separator justification opportunity,
it is applied under the same rules as for 'word-spacing'.
Similarly, when space is distributed to a justification opportunity between
two typographic character units,
it is applied under the same rules as for 'letter-spacing'.
A justification algorithm may divide justification opportunities into different priority levels.
All justification opportunities within a given level
are expanded or compressed at the same priority,
regardless of which typographic character units created that opportunity.
For example, if justification opportunities between two Han characters
and between two Latin letters are defined to be at the same level
(as they are in the ''inter-character'' justification style),
they are not treated differently because they originate from different typographic character units.
It is not defined in this level
whether or how other factors
(such as font size, letter-spacing, glyph shape, position within the line, etc.)
may influence the distribution of space to justification opportunities within the line.
The UA may enable or break optional ligatures or use other font features
such as alternate glyphs or glyph compression
to help justify the text under any method.
This behavior is not controlled by this level of CSS.
However, UAs must not break required ligatures
or otherwise disable features required to correctly shape complex scripts.
If a justification opportunity exists within a line,
and text alignment specifies
full justification (''text-align/justify'') for that line,
it must be justified.
However, by typographic tradition there may be additional rules
controlling the justification of symbols and punctuation.
Therefore, the UA may reassign specific characters
or introduce additional levels of prioritization
to handle justification opportunities involving symbols and punctuation.
For example, there are traditionally no justification opportunities
between consecutive
U+2014 Em Dash ‘—’,
U+2015 Horizontal Bar ‘―’,
U+2026 Horizontal Ellipsis ‘…’,
or U+2025 Two Dot Leader ‘‥’
characters [[JLREQ]];
thus a UA might assign these characters to a “never” prioritization level.
As another example, certain fullwidth punctuation characters
(such as U+301A Left White Square Bracket ‘〚’)
are considered to contain a justification opportunity in Japanese.
The UA might therefore assign these characters to a higher prioritization
level than the opportunities between ideographic characters.
Unexpandable Text
If the inline contents of a line cannot be stretched to the full width of the line box,
then they must be aligned as specified by the 'text-align-last' property.
(If 'text-align-last' is ''justify'', then
they must be aligned as for ''center''.)
text-align/text-align-last-empty-inline.html
The following are examples of unacceptable justification:
Adding gaps between every pair of Arabic letters
Adding gaps between every pair of unjoined Arabic letters
Some font designs allow for the use of the tatweel character for justification.
A UA that performs tatweel-based justification must properly handle the rules for its use.
Note that correct insertion of tatweel characters depends on context, including
the letter-combinations involved, location within the word, and location of the word within the line.
Minimum Requirements for ''text-justify/auto'' Justification
For auto justification, this specification does not define
what all of the justification opportunities are,
how they are prioritized, or
when and how multiple levels of justification opportunities interact.
However, it does require that:
Unless contraindicated by the typographic traditions of the content language or adjacent symbols/punctuation,
each of the following provides a justification opportunity:
All letters belonging to all block scripts are treated the same,
and all letters belonging to all clustered scripts are treated the same.
For example, no distinction is made between
the justification opportunity between a Han letter followed by another Han letter,
vs. the justification opportunity between a Han letter followed by a Hangul letter.
Spacing
CSS offers control over text spacing
via the 'word-spacing' and 'letter-spacing' properties, which specify additional space
around word separators or between typographic character units, respectively.
Word Spacing: the 'word-spacing' property
Name: word-spacing
Value: normal | <length>
Initial: normal
Applies to: inline boxes
Inherited: yes
Percentages: N/A
Computed value: an absolute length
Animation type: by computed value type
Canonical order: n/a
This property specifies additional spacing
between “words”.
Missing values are assumed to be ''word-spacing:normal''.
Values are interpreted as defined below:
normal
No additional spacing is applied.
Computes to zero.
word-spacing-100.xht
<length>
Specifies extra spacing in addition to
the intrinsic inter-word spacing defined by the font.
tab-size/tab-size-spacing-001.html
word-spacing-004.xht
word-spacing-005.xht
word-spacing-006.xht
word-spacing-007.xht
word-spacing-008.xht
word-spacing-016.xht
word-spacing-017.xht
word-spacing-018.xht
word-spacing-019.xht
word-spacing-020.xht
word-spacing-028.xht
word-spacing-029.xht
word-spacing-030.xht
word-spacing-031.xht
word-spacing-032.xht
word-spacing-040.xht
word-spacing-041.xht
word-spacing-042.xht
word-spacing-043.xht
word-spacing-044.xht
word-spacing-052.xht
word-spacing-053.xht
word-spacing-054.xht
word-spacing-055.xht
word-spacing-056.xht
word-spacing-064.xht
word-spacing-065.xht
word-spacing-066.xht
word-spacing-067.xht
word-spacing-068.xht
word-spacing-076.xht
word-spacing-077.xht
word-spacing-078.xht
word-spacing-079.xht
word-spacing-080.xht
word-spacing-088.xht
word-spacing-089.xht
word-spacing-090.xht
word-spacing-091.xht
word-spacing-092.xht
word-spacing-097.xht
word-spacing-098.xht
word-spacing-099.xht
Additional spacing is applied to each word separator
left in the text after the white space processing rules have been applied,
and should be applied half on each side of the character
unless otherwise dictated by typographic tradition.
Values may be negative, but there may be implementation-dependent limits.
word-spacing-remove-space-001.xht
word-spacing-remove-space-002.xht
word-spacing-remove-space-003.xht
word-spacing-remove-space-004.xht
word-spacing-remove-space-005.xht
word-spacing-remove-space-006.xht
Word-separator characters
are typographic character units whose purpose and general usage is to separate words.
In [[UNICODE]] this includes
the space (U+0020), the no-break space (U+00A0), the Ethiopic word space (U+1361),
the Aegean word separators (U+10100, U+10101), the Ugaritic word divider (U+1039F),
and the Phoenician word separator (U+1091F).
word-spacing-characters-001.xht
If there are no word-separator characters, or if a word-separator
character has a zero advance width (such as the zero width space U+200B),
then the user agent must not create any additional spacing between words.
word-spacing-characters-003.xht
General punctuation (from U+2000 to U+206F)
and fixed-width spaces (such as U+3000 and U+2000 through U+200A)
are not considered word-separator characters.
word-spacing-characters-002.xht
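For instance, the following sketch shows the kinds of values this property accepts. The selectors and specific lengths are illustrative only.

```css
h1 {
  word-spacing: 0.25em;   /* adds a quarter em at each word separator */
}

.tight {
  word-spacing: -0.05em;  /* negative values are allowed, subject to UA limits */
}

.reset {
  word-spacing: normal;   /* computes to zero: no additional spacing */
}
```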
Tracking: the 'letter-spacing' property
Name: letter-spacing
Value: normal | <length>
Initial: normal
Applies to: inline boxes
Inherited: yes
Computed value: an absolute length
Animation type: by computed value type
Canonical order: n/a
This property specifies additional spacing (commonly called tracking)
between adjacent typographic character units.
Letter-spacing is applied after
bidi reordering [[CSS-WRITING-MODES-3]]
and is in addition to
kerning [[CSS-FONTS-3]]
and 'word-spacing'.
Depending on the justification rules in effect,
user agents may further increase or decrease the space between typographic character units
in order to justify text.
letter-spacing/letter-spacing-bidi-001.html
letter-spacing/letter-spacing-bidi-004.xht
letter-spacing/letter-spacing-nesting-003.xht
letter-spacing/letter-spacing-bidi-003.xht
letter-spacing/letter-spacing-bidi-004.xht
letter-spacing/letter-spacing-bidi-005.xht
letter-spacing/letter-spacing-206.html
letter-spacing/letter-spacing-211.html
letter-spacing/letter-spacing-212.html
bidi-005.xht
bidi-006.xht
bidi-007.xht
bidi-008.xht
bidi-009.xht
bidi-010.xht
bidi-text/bidi-005a.xht
bidi-text/bidi-005b.xht
bidi-text/bidi-006a.xht
bidi-text/bidi-006b.xht
bidi-text/bidi-007b.xht
bidi-text/bidi-008b.xht
bidi-text/bidi-009b.xht
bidi-text/bidi-010b.xht
text/letter-spacing-justify-001.xht
Values have the following meanings:
normal
No additional spacing is applied. Computes to zero.
letter-spacing-100.xht
<length>
Specifies additional spacing between typographic character units.
Values may be negative, but there may be implementation-dependent limits.
tab-size/tab-size-spacing-001.html
letter-spacing-004.xht
letter-spacing-005.xht
letter-spacing-006.xht
letter-spacing-007.xht
letter-spacing-008.xht
letter-spacing-016.xht
letter-spacing-017.xht
letter-spacing-018.xht
letter-spacing-019.xht
letter-spacing-020.xht
letter-spacing-028.xht
letter-spacing-029.xht
letter-spacing-030.xht
letter-spacing-031.xht
letter-spacing-032.xht
letter-spacing-040.xht
letter-spacing-041.xht
letter-spacing-042.xht
letter-spacing-043.xht
letter-spacing-044.xht
letter-spacing-052.xht
letter-spacing-053.xht
letter-spacing-054.xht
letter-spacing-055.xht
letter-spacing-056.xht
letter-spacing-064.xht
letter-spacing-065.xht
letter-spacing-066.xht
letter-spacing-067.xht
letter-spacing-068.xht
letter-spacing-076.xht
letter-spacing-077.xht
letter-spacing-078.xht
letter-spacing-079.xht
letter-spacing-080.xht
letter-spacing-088.xht
letter-spacing-089.xht
letter-spacing-090.xht
letter-spacing-091.xht
letter-spacing-092.xht
letter-spacing-097.xht
letter-spacing-098.xht
letter-spacing-099.xht
letter-spacing-101.xht
letter-spacing-102.xht
For legacy reasons,
a computed 'letter-spacing' of zero
yields a resolved value (getComputedStyle() return value)
of ''letter-spacing/normal''.
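As an illustration of the values described above, here is a short sketch using hypothetical selectors:

```css
/* Looser and tighter tracking. Negative values are allowed,
   but implementation-dependent limits may apply. */
h1     { letter-spacing: 0.1em; }
.dense { letter-spacing: -0.02em; }

/* Both declarations compute to zero. Per the legacy rule above,
   the resolved value reported by getComputedStyle() is "normal"
   in either case. */
.reset-keyword { letter-spacing: normal; }
.reset-zero    { letter-spacing: 0; }
```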
For the purpose of 'letter-spacing', each consecutive run of atomic
inlines (such as images and inline blocks) is treated as a single
typographic character unit.
letter-spacing/letter-spacing-204.html
Letter-spacing must not be applied at the beginning or at the end of a line.
letter-spacing/letter-spacing-end-of-line-001.html
tab-size/tab-size-spacing-001.html
letter-spacing/letter-spacing-200.html
letter-spacing/letter-spacing-201.html
white-space/white-space-letter-spacing-001.html
Because letter-spacing is not applied at the beginning or end of a line,
text always fits flush with the edge of the block.
p { letter-spacing: 1em; }
<p>abc</p>
a b c
a b c
UAs therefore must not append letter spacing to the right or trailing edge of a line:
a b c
Letter spacing between two typographic character units effectively “belongs”
to the innermost element that contains the two typographic character units:
the total letter spacing between two adjacent typographic character units (after bidi reordering)
is specified by and rendered within
the innermost element that contains the boundary between the two typographic character units.
letter-spacing/letter-spacing-bidi-002.html
letter-spacing/letter-spacing-bidi-004.xht
letter-spacing/letter-spacing-bidi-005.xht
letter-spacing/letter-spacing-nesting-001.html
letter-spacing/letter-spacing-nesting-002.html
letter-spacing/letter-spacing-nesting-003.xht
letter-spacing/letter-spacing-203.html
letter-spacing/letter-spacing-205.html
A given value of 'letter-spacing' only affects the spacing
between characters completely contained within the element for which it is specified.
Likewise, an inline box only includes
letter spacing between characters completely contained within that element:
p { letter-spacing: 1em; }
<p>a<span>bb</span>c</p>
a b b c
It is incorrect to include the letter spacing on the right or trailing edge of the element:
a b b c
Letter spacing is inserted after RTL reordering,
so the letter spacing applied to the inner span below has no effect,
since after reordering the "c" doesn't end up next to "א":
p { letter-spacing: 1em; }
span { letter-spacing: 2em; }
<!-- abc followed by Hebrew letters alef (א), bet (ב) and gimel (ג) -->
<!-- Reordering will display these in reverse order. -->
<p>ab<span>cא</span>בג</p>
a b cא ב ג
Letter spacing ignores invisible zero-width formatting characters
(such as those from the Unicode Cf category).
Spacing must be added as if those characters did not exist in the document.
letter-spacing/letter-spacing-control-chars-001.html
letter-spacing/letter-spacing-202.html
For example, 'letter-spacing' applied to
A&#x200B;B (“A”, U+200B ZERO WIDTH SPACE, “B”) is identical to AB,
regardless of where any element boundaries might fall.
When the effective spacing between two characters is not zero
(due to either justification
or a non-zero value of 'letter-spacing'),
user agents should not apply optional ligatures,
i.e. those that are not defined as required
for fundamentally correct glyph shaping.
However, ligatures and other font features
specified via the low-level 'font-feature-settings' property
take precedence over this rule.
See [[css-fonts-3#feature-precedence]].
For example, if the word “filial” is letter-spaced,
an “fi” ligature should not be used
as it will prevent even spacing of the text.
filial vs filial
Note: In OpenType, required ligatures are expected
to be associated with the rlig feature.
All other ligatures are therefore considered optional.
In some cases, however, UA or platform heuristics
apply additional ligatures in order to handle broken fonts;
this specification does not define or override such exceptional handling.
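The interaction between tracking and ligatures can be sketched as follows. The selectors are hypothetical, and whether a given font exposes an “fi” ligature through the liga feature is an assumption about the font:

```css
/* Non-zero tracking: per the rule above, the UA should not apply
   optional ligatures (such as “fi”) in this heading. */
h2.spaced { letter-spacing: 0.15em; }

/* The low-level property takes precedence, so an author who
   explicitly requests the "liga" feature gets ligatures even
   though the text is letter-spaced. */
h2.spaced.keep-ligatures { font-feature-settings: "liga" 1; }
```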
Cursive Scripts
If it is able, the UA may apply letter spacing to cursive scripts
by translating the total extra space to be distributed to a run of such letters
into some form of cursive elongation (or compression, for negative tracking values) for that run
that results in an equivalent total expansion (or compression) of the run.
Otherwise, if the UA cannot expand text from a cursive script
without breaking its cursive connections,
it must not apply spacing
between any pair of that script's typographic letter units at all
(effectively treating each word as a single typographic letter unit
for the purpose of letter-spacing).
Both cases will result in an effective spacing of zero between such letters;
however the former will preserve the sense of stretching out the text.
Below are some appropriate and inappropriate examples of spacing out Arabic text.
—
Original text
BAD
Even distribution of space between each letter.
Notice this breaks cursive joins!
OK
Distributing ∑letter-spacing by typographically-appropriate cursive elongation.
The resulting text is as long as the previous evenly-spaced example.
OK
Suppressing 'letter-spacing' between Arabic letters.
Notice 'letter-spacing' is nonetheless applied to non-Arabic characters (like [=spaces=]).
BAD
Applying 'letter-spacing' only between non-joined letters.
This distorts typographic color and obfuscates word boundaries.
Note: Proper cursive elongation or compression of a text
can vary depending on the
script, typeface, language,
location within a word, location within a line,
implementation complexity, font capabilities,
and calligraphic preferences,
and may not be possible in certain cases at all.
It may involve the use of shortening ligatures,
swash variants, contextual forms,
elongation glyphs such as U+0640 ARABIC TATWEEL,
or other microtypography.
It is outside the scope of CSS to define rules for these effects.
Authors should avoid applying 'letter-spacing' to cursive scripts
unless they are prepared to accept non-interoperable results.
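One way an author might follow this advice is to reset tracking for content tagged as a cursive-script language. This is only a sketch: the class name is hypothetical, and it assumes the Arabic content carries an appropriate language tag.

```css
/* Tracked labels for most content. */
.label { letter-spacing: 0.1em; }

/* Reset tracking when the label itself, or any element inside it,
   is declared to be in Arabic (for example lang="ar"). */
.label:lang(ar),
.label :lang(ar) { letter-spacing: normal; }
```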
Shaping Across Element Boundaries
Text shaping must be broken at inline box boundaries
when any of the following are true
for any box whose boundary separates the two typographic character units:
* Any of 'margin'/'border'/'padding'
separating the two typographic character units in the inline axis
is non-zero.
shaping/shaping-009.html
shaping/reference/shaping-009-ref.html
shaping/shaping-010.html
shaping/reference/shaping-010-ref.html
shaping/shaping-011.html
shaping/reference/shaping-011-ref.html
boundary-shaping/boundary-shaping-003.html
boundary-shaping/boundary-shaping-004.html
boundary-shaping/boundary-shaping-005.html
boundary-shaping/boundary-shaping-007.html
boundary-shaping/boundary-shaping-009.html
* 'vertical-align' is not ''vertical-align/baseline''.
boundary-shaping/boundary-shaping-002.html
boundary-shaping/boundary-shaping-006.html
* The boundary is a bidi isolation boundary.
shaping/shaping-012.html
shaping/reference/shaping-012-ref.html
shaping/shaping-013.html
boundary-shaping/boundary-shaping-008.html
Text shaping must not be broken across inline box boundaries
when there is no effective change in formatting,
or if the only formatting changes do not affect the glyphs
(as in applying text decoration).
text-transform/text-transform-shaping-001.html
text-transform/text-transform-shaping-002.html
text-transform/text-transform-shaping-003.html
shaping/shaping-000.html
shaping/reference/shaping-000-ref.html
shaping/shaping-004.html
shaping/shaping-005.html
shaping/shaping-006.html
shaping/shaping-007.html
shaping/shaping-014.html
shaping/reference/shaping-014-ref.html
shaping/shaping-016.html
shaping/reference/shaping-016-ref.html
shaping/shaping-022.html
shaping/reference/shaping-022-ref.html
shaping/shaping-025.html
shaping/reference/shaping-025-ref.html
shaping/shaping_lig-000.html
boundary-shaping/boundary-shaping-001.html
boundary-shaping/boundary-shaping-010.html
Text shaping should not be broken across inline box boundaries otherwise,
if it is reasonable and possible for that case given the limitations of the font technology.
shaping/shaping-001.html
shaping/reference/shaping-001-ref.html
shaping/shaping-002.html
shaping/reference/shaping-002-ref.html
shaping/shaping-003.html
shaping/reference/shaping-003-ref.html
shaping/shaping-008.html
shaping/reference/shaping-008-ref.html
shaping/shaping-017.html
shaping/shaping-018.html
shaping/shaping-020.html
shaping/reference/shaping-020-ref.html
shaping/shaping-021.html
shaping/reference/shaping-021-ref.html
shaping/shaping-023.html
shaping/reference/shaping-023-ref.html
shaping/shaping-024.html
shaping/reference/shaping-024-ref.html
boundary-shaping/boundary-shaping-009.html
An example of reasonable and possible shaping across boundaries
is Arabic shaping:
in many systems this is performed by the font engine,
allowing the font to provide variant glyphs
with potentially very sophisticated contextual shaping.
It's not generally possible to rely on this system across a font change
unless the font engine has an API to provide context,
but it is straightforward and therefore quite reasonable
for an engine to work around this limitation by, for example,
using the zero-width-joiner (U+200D) or zero-width-non-joiner (U+200C)
as appropriate to solicit the correct choice of
initial/medial/final/isolated glyph.
An example of possible but not reasonable shaping across boundaries
is handling a font that is sensitive to 20 characters of context
on either side to choose its glyphs:
passing all the text before and after the string in question,
even through multiple inline boundaries with formatting changes,
is complicated.
The UA could handle such cases,
but is not required to,
as they are not typical or fundamentally required
by any modern writing system.
An example of impossible shaping across boundaries
is a change in font weight partway through the word “and”
in a font where a ligature would replace
all three letters of the word “and”
with an ampersand glyph (“&”).
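The boundary conditions listed earlier in this section can be contrasted in a small stylesheet. The class names are hypothetical; each rule is assumed to apply to an inline box placed in the middle of a word:

```css
/* Shaping must not be broken: text decoration does not
   affect the glyphs. */
span.underlined { text-decoration: underline; }

/* Shaping must be broken at the boundaries of these boxes. */
span.padded   { padding-inline: 0.25em; }  /* non-zero inline-axis padding */
span.raised   { vertical-align: super; }   /* 'vertical-align' is not baseline */
span.isolated { unicode-bidi: isolate; }   /* bidi isolation boundary */
```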
Edge Effects
Edge effects control
the indentation of lines with respect to other lines in the block ('text-indent')
and how content is measured at the start and end edges of a line ('hanging-punctuation').
First Line Indentation: the 'text-indent' property
Name: text-indent
Value: [ <length-percentage> ] && hanging? && each-line?
Initial: 0
Applies to: block containers
Inherited: yes
Percentages: refers to block container’s own inline-axis inner size
Computed value: computed <length-percentage> value, plus any specified keywords
Animation type: by computed value type
Canonical order: per grammar
This property specifies the indentation applied to lines of inline
content in a block. The indent is treated as a margin applied to
the start edge of the line box.
text-indent-010.xht
text-indent-012.xht
text-indent-013.xht
text-indent-115.xht
text-indent-on-blank-line-rtl-left-align.html
text-indent-overflow-001.xht
text-indent-overflow-002.xht
text-indent-overflow-003.xht
text-indent-overflow-004.xht
text-indent-rtl-001.xht
text-indent-rtl-002.xht
Unless otherwise specified by the ''each-line'' and/or ''hanging'' keywords,
only lines that are the
first formatted line [[!CSS2]]
of an element are affected.
For example, the first line of an anonymous block box is only affected
if it is the first child of its parent element.
text-indent-014.xht
text-indent-wrap-001.xht
Values have the following meanings:
<length>
Gives the amount of the indent as an absolute length.
<percentage>
Gives the amount of the indent as a percentage of
the block container’s own logical width.
text-indent-011.xht
text-indent-100.xht
text-indent-101.xht
text-indent-102.xht
text-indent-103.xht
text-indent-104.xht
text-indent-percent-001.xht
Percentages must be treated as ''0'' for the purpose of calculating [=intrinsic size contributions=],
but are always resolved normally when performing layout.
text-indent/percentage-value-intrinsic-size.html
text-indent/text-indent-percentage-001.xht
text-indent/text-indent-percentage-002.html
text-indent/text-indent-percentage-003.html
text-indent/text-indent-percentage-004.html
Note: This can lead to the element overflowing.
It is not recommended to use percentage indents and intrinsic sizing together.
each-line
Indentation affects the first line of each block container
and each line after a forced line break
(but not lines after a soft wrap break).
hanging
Inverts which lines are affected.
If 'text-align' is ''start'' and 'text-indent' is ''5em'' in
left-to-right text with no floats present, then the first line of text
will start 5em into the block:
Since CSS1 it has been possible to
indent the first line of a block element
5em by setting the 'text-indent' property
to '5em'.
If we add the ''text-indent/hanging'' keyword,
then the first line will start flush,
but other lines will be indented 5em:
In CSS3 we can instead indent all other
lines of the block element by 5em
by setting the 'text-indent' property
to 'hanging 5em'.
Since the 'text-indent' property only affects the “first formatted line”,
a line after a forced break will not be indented.
For example, in the middle of
this paragraph is an equation,
which is centered:
x + y = z
The first line after the equation
is flush (else it would look like
we started a new paragraph).
However, sometimes (as in poetry or code),
it is appropriate to indent each line
that happens to be long enough to wrap.
In the following example, 'text-indent'
is given a value of ''3em hanging each-line'',
giving the third line of the poem a hanging indent
where it soft-wraps at the block's right boundary:
In a short line of text
There need be no wrapping,
But when we go on and on and on
and on,
Sometimes a soft break
Can help us stay on the page.
Note: Since the 'text-indent' property inherits,
when specified on a block element, it will affect descendant
inline-block elements.
For this reason, it is often wise to specify 'text-indent: 0' on
elements that are specified 'display: inline-block'.
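The cases discussed in this section can be written as style rules along the following lines. The selectors are hypothetical and chosen only for illustration:

```css
/* Only the first formatted line is indented. */
p.indented { text-indent: 5em; }

/* Inverted: the first line starts flush,
   and the other lines are indented. */
p.hanging { text-indent: 5em hanging; }

/* Poetry: each line that follows a forced break (such as a <br>)
   starts flush, and soft-wrapped continuation lines are
   indented by 3em. */
.poem { text-indent: 3em hanging each-line; }

/* 'text-indent' inherits, so reset it on inline-block descendants. */
.badge { display: inline-block; text-indent: 0; }
```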
Hanging Glyphs
When a glyph at the start or end edge of a line hangs,
it is not considered
when measuring the line's contents for fit, alignment, or justification.
Depending on the line's alignment/justification, this can
result in the mark being placed outside the line box.
The [=hanging=] glyph is also not taken into account
when computing intrinsic sizes (min-content size and max-content size),
and any sizes derived thereof.
(The interaction of this measurement and kerning is currently UA-defined;
the CSSWG welcomes advice on this point.)
A hanging glyph
is still enclosed inside its parent inline box
and still participates in text justification:
its character advance is just not measured when determining
how much content fits on the line,
how much the line's contents need to be expanded or compressed for justification,
or how to position the content within the line box for text alignment.
Effectively, the hanging glyph character advance
is re-interpreted as an additional negative margin
on the affected edge of its parent inline box;
the line is otherwise laid out as usual.
An overflowing [=hanging glyph=] should typically be considered
[=ink overflow=] [[!CSS-OVERFLOW-3]]
so as to avoid creating unnecessary scrollbars,
but the UA may treat it as [=scrollable overflow=]
when the content is editable
or in other circumstances where treating it as [=scrollable overflow=]
would be useful to the user.
hanging-punctuation/hanging-scrollable-001.html
white-space/white-space-intrinsic-size-019.html
white-space/white-space-intrinsic-size-020.html
In some cases, a glyph at the end of a line
can conditionally hang:
it [=hangs=] only if it does not otherwise fit in the line prior to justification.
It is not considered when measuring the line’s contents for fit;
however, any part of it that does not fit
is considered to [=hang=].
Glyphs that [=conditionally hang=] are not taken into account
when computing [=min-content sizes=]
and any sizes derived thereof,
but they are taken into account for [=max-content sizes=]
and any sizes derived thereof.
white-space/white-space-intrinsic-size-013.html
white-space/white-space-intrinsic-size-014.html
white-space/white-space-intrinsic-size-017.html
Non-zero inline-axis borders or padding between
a hangable glyph and the edge of the line prevent the glyph from hanging.
For example, a period at the end of an inline box with end padding
does not hang at the end edge of a line.
Multiple adjacent glyphs can hang together,
however there may be limits on how many are allowed to hang
(e.g. at most one punctuation character may hang at each edge of the line).
Hanging Punctuation: the 'hanging-punctuation' property
Name: hanging-punctuation
Value: none | [ first || [ force-end | allow-end ] || last ]
Initial: none
Applies to: inline boxes
Inherited: yes
Canonical order: per grammar
Computed value: specified keyword(s)
Animation type: discrete
This property determines whether a punctuation mark, if one is present,
hangs and may be placed outside the line box (or in the indent)
at the start or at the end of a line of text.
Note: If there is not sufficient padding on the
block container, 'hanging-punctuation' can trigger overflow.
first
An opening bracket or quote at the start of the
first formatted line of an element hangs.
This applies to all characters in the Unicode categories Ps, Pf, Pi
plus the ASCII quote marks “'” U+0027 and “"” U+0022.
hanging-punctuation/hanging-punctuation-first-001.xht
last
A closing bracket or quote at the end of the
last formatted line of an element hangs.
This applies to all characters in the Unicode categories Pe, Pf, Pi
plus the ASCII quote marks “'” U+0027 and “"” U+0022.
hanging-punctuation/hanging-punctuation-last-001.xht
force-end
A stop or comma at the end of a line hangs.
hanging-punctuation/hanging-punctuation-force-end-001.xht
The UA may include other characters as appropriate.
Note: The CSS Working Group would appreciate if UAs including
other characters would inform the working group
of such additions.
The ''allow-end'' and ''force-end'' values are two variations of
hanging punctuation used in East Asia.
p {
  text-align: justify;
  hanging-punctuation: allow-end;
}
p {
  text-align: justify;
  hanging-punctuation: force-end;
}
The punctuation at the end of the first line for ''allow-end''
does not hang, because it fits without hanging.
However, if ''force-end'' is used, it is forced to hang.
The justification measures the line without the hanging punctuation.
Therefore when the line is expanded, the punctuation is pushed outside the line.
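Since hanging marks can be placed outside the line box, an author may want to reserve room for them. The following is a sketch under that assumption, using a hypothetical selector and an arbitrary padding amount:

```css
blockquote {
  /* Hang an opening quote on the first formatted line,
     a closing quote on the last formatted line,
     and a stop or comma at a line end only when it
     would not otherwise fit. */
  hanging-punctuation: first allow-end last;

  /* Padding on the block container gives the hanging
     marks somewhere to go, reducing the chance of overflow. */
  padding-inline: 1em;
}
```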
Bidirectionality and Line Boxes
The start and end edges of a line box
are determined by the inline base direction of the line box.
In most cases, this is given by its containing block's computed 'direction'.
However if its containing block has ''unicode-bidi: plaintext'' [[!CSS-WRITING-MODES-3]],
the line box's inline base direction must be determined
by the inline base direction of the bidi paragraph to which it belongs:
that is, the bidi paragraph for which the line box holds content.
An empty line box
(i.e. one that contains no atomic inlines or
characters other than the line-breaking character, if any),
takes its inline base direction from the preceding line box (if any), or,
if this is the first line box in the containing block,
then from the 'direction' property of the containing block.
In the following example, assuming the <block>
is a preformatted block (''display: block; white-space: pre'') inheriting
''text-align: start'', every other line is right-aligned:
<block style="unicode-bidi: plaintext">
Latin
و·کمی
Latin
و·کمی
Latin
و·کمی
</block>
Note: The inline base direction determined here
applies to the line box itself, and not to its contents.
It affects 'text-align-all', 'text-align-last', 'text-indent', and 'hanging-punctuation',
i.e. the position and alignment of its contents with respect to its edges.
It does not affect the formatting or ordering of its content.
The result should be a left-aligned line looking like this:
"!שלום", he said.
The line is left-aligned
(despite the containing block having ''direction: rtl'')
because the containing block (the <para>) has ''unicode-bidi:plaintext'',
and the line box belongs to a bidi paragraph that is LTR.
This is because that paragraph's first character with a strong direction
is the LTR "h" from "he". The RTL "שלום!" does precede the "he",
but it sits in its own bidi-isolated paragraph that is not
immediately contained by the <para>,
and is thus irrelevant to the line box's alignment.
From the standpoint of the bidi paragraph immediately contained
by the <para> containing block,
the <quote>’s bidi-isolated paragraph inside it is,
by definition, just a neutral U+FFFC character,
so the immediately-contained paragraph becomes LTR by virtue
of the "he" following it.
As expected, the "Hello!" should be displayed LTR
(i.e. with the exclamation mark on the right end,
despite the <textarea>’s ''direction:rtl'')
and left-aligned.
This makes the empty line following it left-aligned as well,
which means that the caret on that line should appear at its
left edge. The first empty line, on the other hand, should
be right-aligned, due to the RTL direction of its containing
paragraph, the <textarea>.
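The text field behavior described above could be set up with rules along these lines. This is only a sketch: the class name is hypothetical, and it assumes the author sets these properties directly rather than relying on UA defaults.

```css
textarea.message {
  /* Used for an empty first line, which has no preceding
     line box to take its direction from. */
  direction: rtl;

  /* Each bidi paragraph determines its own base direction,
     and with it the start and end edges of its line boxes. */
  unicode-bidi: plaintext;

  /* Alignment follows each line box's own start edge. */
  text-align: start;
}
```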
Appendix A:
Text Processing Order of Operations
The following list defines the order of text operations.
(Implementations are not bound to this order as long as the resulting layout is the same.)
This appendix is normative for the purpose of plaintext copy-paste operations.
When a CSS-rendered document is converted to a plaintext format,
it is expected that:
The 'text-transform' property has no effect.
[[#white-space-phase-1]] is applied and any sequence of
collapsible [=white space=] at the beginning of a block
or immediately following a forced line break is removed.
Appendix C: Default UA Stylesheet
This appendix is informative,
and is to help UA developers to implement a default stylesheet for HTML,
but UA developers are free to ignore or modify as appropriate.
/* make list items and option elements align together */
li, option { text-align: match-parent; }
If you find any issues, recommendations to add, or corrections,
please send the information to www-style@w3.org
with [css-text] in the subject line.
Appendix D: Scripts and Spacing
This appendix is normative.
Typographic behavior varies somewhat by language, but varies drastically by writing system.
This appendix categorizes some common scripts in Unicode 6.0
according to their justification and spacing behavior.
Category descriptions are descriptive, not prescriptive;
the determining factor is the prioritization of justification opportunities.
block scripts
CJK and by extension all Wide characters (see [[!UAX11]].)
The following Unicode scripts are included:
Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
Characters of the East Asian Width property Wide and Fullwidth are also included,
but Ambiguous characters are included only if
the writing system is
Chinese, Korean, or Japanese.
clustered scripts
Clustered scripts have discrete units
and break only at word boundaries,
but do not use visible word separators.
They prioritize stretching spaces,
but comfortably admit inter-character spacing for justification.
The clustered scripts include, but are not limited to, the following Unicode scripts:
Khmer,
Lao,
Myanmar,
New Tai Lue,
Tai Le,
Tai Tham,
Tai Viet,
Thai
cursive scripts
Cursive scripts do not admit gaps between their letters for either justification or 'letter-spacing'.
The following Unicode scripts are included:
Arabic,
Mandaic,
Mongolian,
N'Ko,
Phags Pa,
Syriac
User agents should update this list as they update their Unicode support
to handle as-yet-unencoded cursive scripts in future versions of Unicode,
and are encouraged to ask the CSSWG to update this spec accordingly.
Should block and cluster scripts be merged?
They have different tolerances for space-justification vs inter-character justification,
but both admit both.
Appendix E.
Characters and Properties
Unicode defines four code point-level properties that are referenced
in CSS typesetting:
Defined in [[!UAX24]] and given as the Script property
in the Unicode Character Database [[!UAX44]].
(UAs must include any ScriptExtensions.txt assignments in this mapping.)
Defined in [[!UTR50]] as the Vertical_Orientation property
and given in the UTR50 data file.
Unicode defines properties for individual code points, but sometimes
it is necessary to determine the properties of a typographic character unit.
For the purposes of CSS Text,
the properties of a typographic character unit are given by
the base character of its first grapheme cluster—except in two cases:
Grapheme clusters formed with an Enclosing Mark (Me) of the Common script
are considered to be Other Symbols (So) in the Common script.
They are assumed to have the same Unicode properties as the Replacement Character U+FFFD.
Grapheme clusters formed with a Space Separator (Zs) as the base
are considered to be Modifier Symbols (Sk).
They are assumed to have the same East Asian Width property as the base,
but take their other properties from the first combining character in the sequence.
Appendix F.
Space-Discarding Unicode Characters
This appendix is normative.
Characters belonging to the space-discarding character set
bias adjacent [=segment breaks=] towards
being discarded rather than being transformed into U+0020 SPACE
when collapsing white space, see [[#line-break-transform]].
Characters from the following blocks in Unicode 13.0 [[UNICODE]]
are considered part of the [=space-discarding character set=]:
Space-discarding Unicode Blocks
Codepoint Range       Block Name
U+2E80..U+2EFF        CJK Radicals Supplement
U+2F00..U+2FDF        Kangxi Radicals
U+2FF0..U+2FFF        Ideographic Description Characters
U+3000..U+303F        CJK Symbols and Punctuation
U+3040..U+309F        Hiragana
U+30A0..U+30FF        Katakana
U+3130..U+318F        Kanbun
U+3190..U+319F        Kanbun
U+31C0..U+31EF        CJK Strokes
U+31F0..U+31FF        Katakana Phonetic Extensions
U+3300..U+33FF        CJK Compatibility
U+3400..U+4DBF        CJK Unified Ideographs Extension A
U+4E00..U+9FFF        CJK Unified Ideographs
U+A000..U+A48F        Yi Syllables
U+A490..U+A4CF        Yi Radicals
U+F900..U+FAFF        CJK Compatibility Ideographs
U+FE10..U+FE1F        Vertical Forms
U+FE30..U+FE4F        CJK Compatibility Forms
U+FE50..U+FE6F        Small Form Variants
U+FF00..U+FFEF        Halfwidth and Fullwidth Forms
U+1B000..U+1B0FF      Kana Supplement
U+1B100..U+1B12F      Kana Extended-A
U+1B130..U+1B16F      Small Kana Extension
U+20000..U+2A6DF      CJK Unified Ideographs Extension B
U+2A700..U+2B73F      CJK Unified Ideographs Extension C
U+2B740..U+2B81F      CJK Unified Ideographs Extension D
U+2B820..U+2CEAF      CJK Unified Ideographs Extension E
U+2CEB0..U+2EBEF      CJK Unified Ideographs Extension F
U+2F800..U+2FA1F      CJK Compatibility Ideographs Supplement
U+30000..U+3134F      CJK Unified Ideographs Extension G
For future revisions of [[UNICODE]],
any new block whose initially-allocated contents comprise
at least 50% codepoints belonging to the
Han, Hiragana, Katakana, or Yi script
shall also be considered part of the [=space-discarding character set=].
Wherefore this table of “space-discarding characters”?
The purpose of the [[#line-break-transform|segment break transformation rules]]
is to “unbreak” text that has been formatted
with extra white space for source code readability,
see [[#line-break-transform]].
In most cases, “unbreaking” lines of text requires joining them with a space,
but some writing systems don't use spaces,
so such texts need to be joined without any space.
CSS uses the characters before and after to determine
whether to join lines with or without a space.
For simplicity and for ease of implementation,
the classification of characters as space-discarding or space-preserving
is done by Unicode code block.
Ideally, such a list would be maintained in [[UNICODE]],
but the Unicode Technical Committee has yet
to express any intention of taking on this task.
In the meantime, in the interest of bringing
more of the text-processing facilities of CSS and HTML
that are available to Western writing systems
to Eastern writing systems as well,
the CSSWG is maintaining this appendix
and refining the rules in [[#line-break-transform]],
and hopes that in the future,
once CSS has demonstrated its viability,
the Unicode Consortium will recognize the need for an “unbreaking” algorithm
and take over maintenance of such.
Appendix G.
Identifying the Content Writing System
This appendix is normative.
While most languages have a preferred writing system,
some have multiple, and
most can also be transcribed into one or more foreign writing systems.
As a common example, most languages have at least one Latin transcription,
and can thus be written in the Latin writing system.
Transcribed texts typically adopt the typographic conventions of the writing system:
for example Japanese “romaji” and Chinese Pinyin use Latin letters and word spaces,
and follow Latin line-breaking and justification practices accordingly.
As another example, historical ideographic Korean
(ko-Hani)
does not use word spaces,
and should therefore be typeset similarly to Chinese
rather than modern Korean.
In [[HTML]] or any other document language using [[BCP47]] to declare the [=content language=],
authors can disambiguate or indicate the use of an atypical writing system
with script subtags.
For example, to indicate use of the Latin writing system
for languages which don't natively use it,
the -Latn script subtag can be added,
e.g. ja-Latn for Japanese romaji.
Other subtags exist for other writing systems,
see [[ISO15924]] and the ISO15924 script tag registry.
Some common/historical examples of using [[BCP47]] tags with script subtags:
zh-Latn
Chinese, written in Latin transcription.
ko-Hani
Korean, written in Hanja (Chinese ideographic characters).
tr-Arab
Turkish, written in Arabic script.
mn-Cyrl
Mongolian, written in Cyrillic.
mn-Mong
Mongolian, written in traditional Mongolian script.
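As a rough illustration of how the script subtag in tags like these can be picked out, the following Python sketch splits a tag on hyphens and looks for a four-letter subtag after the language and any extended-language subtags. The function name `script_subtag` and the helper `_is_alpha` are hypothetical, and the parsing is deliberately simplified: it assumes a well-formed tag and reports private-use or grandfathered tags (single-letter first subtag) as having no script. It is not a complete [[BCP47]] parser.

```python
from typing import Optional


def _is_alpha(subtag: str, length: int) -> bool:
    """True if the subtag is exactly `length` ASCII letters."""
    return len(subtag) == length and subtag.isascii() and subtag.isalpha()


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a language tag in title case, or None.

    Simplified sketch: only the language[-extlang][-script] prefix is
    examined, and the tag is assumed to be well-formed.
    """
    subtags = tag.split("-")
    # Private-use ("x-...") and grandfathered ("i-...") tags: no script reported.
    if len(subtags[0]) == 1:
        return None
    i = 1
    # Skip up to three extended-language subtags (three letters each).
    while i < len(subtags) and i <= 3 and _is_alpha(subtags[i], 3):
        i += 1
    # A four-letter subtag in this position is the script subtag.
    if i < len(subtags) and _is_alpha(subtags[i], 4):
        return subtags[i].title()
    return None


for example in ["zh-Latn", "ko-Hani", "tr-Arab", "mn-Cyrl", "mn-Mong", "ja"]:
    print(example, "->", script_subtag(example))
```

A tag without a script subtag, such as `ja`, yields `None`; in that case the writing system is implied by the language rather than declared.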
However, [[BCP47]] script subtags are not typically used
(and are in fact discouraged)
for languages strongly associated with a single writing system:
instead that writing system is expected to be implied
when no other is specified.
IANA maintains a database of various languages’ most common writing system
via the Suppress-Script field in its
language subtag registry
for this purpose.
Note: More advice on language tagging can be found in
the Internationalization Working Group’s
“Language tags in HTML and XML”
and “Choosing a Language Tag”.
When no writing system is explicitly indicated,
UAs should assume the most common writing system
of the declared content language
for language-sensitive typographic behaviors
such as line-breaking or justification.
However, UAs must not assume that writing system
if the author has explicitly declared a different one.
If the UA has no language-specific knowledge
of a particular language and writing system combination,
it must use the typographic conventions of the declared writing system
(assuming the conventions of a different language if necessary),
not the conventions of the declared language in an assumed writing system,
which would be inappropriate to the declared writing system.
writing-system/writing-system-font-001.html
writing-system/writing-system-text-transform-001.html
writing-system/writing-system-segment-break-001.html
writing-system/writing-system-line-break-001.html
writing-system/writing-system-line-break-002.html
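The fallback order described above can be sketched as a two-step lookup: first try conventions known for the exact language and writing system combination, then fall back to conventions known for the declared writing system alone. This is only one possible shape for such a lookup. All table contents and names below (`MOST_COMMON_SCRIPT`, `LANGUAGE_SCRIPT_PROFILES`, `SCRIPT_FALLBACK_PROFILES`, `typographic_profile`, and the profile labels) are hypothetical placeholders, not data defined by this specification.

```python
from typing import Optional

# Hypothetical excerpt of "most common writing system" data for a few languages.
MOST_COMMON_SCRIPT = {"en": "Latn", "ja": "Jpan", "ko": "Kore"}

# Hypothetical language-specific knowledge a UA might have.
LANGUAGE_SCRIPT_PROFILES = {
    ("en", "Latn"): "latin-word-spaces",
    ("ja", "Jpan"): "japanese",
    ("ko", "Kore"): "korean",
}

# Hypothetical per-script conventions, borrowed from some other language
# that uses the same writing system.
SCRIPT_FALLBACK_PROFILES = {
    "Latn": "latin-word-spaces",
    "Hani": "han-no-word-spaces",
}


def typographic_profile(language: str, declared_script: Optional[str] = None) -> Optional[str]:
    """Pick typographic conventions for a language and optional declared script."""
    # An explicitly declared script always wins over the implied one.
    script = declared_script or MOST_COMMON_SCRIPT.get(language)
    if script is None:
        return None
    exact = LANGUAGE_SCRIPT_PROFILES.get((language, script))
    if exact is not None:
        return exact
    # No knowledge of this combination: use the conventions of the declared
    # writing system, never those of the language's usual writing system.
    return SCRIPT_FALLBACK_PROFILES.get(script)


print(typographic_profile("ja"))          # implied script
print(typographic_profile("ja", "Latn"))  # romaji: Latin conventions
print(typographic_profile("ko", "Hani"))  # historical ideographic Korean
```

The important property is that `ja-Latn` never resolves to Japanese conventions merely because the language is Japanese: the declared writing system decides.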
The full correspondence between languages and their most common writing systems
is out of scope for this document.
However, User Agents must assume at least the following:
* If the [=content language=] is Chinese and the [=writing system=] is unspecified,
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Hant'', ''Hans'', ''Hani'', ''Hanb'', or ''Bopo'' [[ISO15924]] codes,
then the [=writing system=] is Chinese.
* If the [=content language=] is Japanese and the [=writing system=] is unspecified,
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Jpan'', ''Hrkt'', ''Hira'', or ''Kana'' [[ISO15924]] codes,
then the [=writing system=] is Japanese.
* If the [=content language=] is Korean and the [=writing system=] is unspecified,
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Kore'', ''Hang'', or ''Jamo'' [[ISO15924]] codes,
then the [=writing system=] is Korean.
* The [=writing system=] is only considered to be unknown
if the [=content language=] itself is unknown,
or if it explicitly indicates an unknown writing system.
Note: Mere omission of the [=writing system=] information when the [=content language=] is declared
means that the [=writing system=] is implied, not unknown.
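The minimum rules above could be sketched as follows. This is a non-authoritative illustration under stated assumptions: the function name `resolve_writing_system` is hypothetical, inputs are assumed to be already split and normalized (lowercase language subtag, title-case script subtag), and the choices of `und` for an unknown language and of `Zyyy` and `Zzzz` for an explicitly unknown writing system are assumptions of this sketch, since the rules above do not name specific codes.

```python
from typing import Optional

# ISO 15924 codes listed in the rules above.
CHINESE_SCRIPTS = {"Hant", "Hans", "Hani", "Hanb", "Bopo"}
JAPANESE_SCRIPTS = {"Jpan", "Hrkt", "Hira", "Kana"}
KOREAN_SCRIPTS = {"Kore", "Hang", "Jamo"}

# Writing system implied when the language is declared without a script.
IMPLIED_BY_LANGUAGE = {"zh": "Chinese", "ja": "Japanese", "ko": "Korean"}

# Assumptions of this sketch (not named by the rules above).
UNKNOWN_LANGUAGES = {None, "", "und"}
UNKNOWN_SCRIPTS = {"Zyyy", "Zzzz"}


def resolve_writing_system(language: Optional[str], script: Optional[str] = None) -> Optional[str]:
    """Return "Chinese", "Japanese", "Korean", "unknown", another explicitly
    declared script code, or None when the writing system is implied by a
    language that the minimum rules do not cover."""
    # Explicit scripts apply for any content language.
    if script in CHINESE_SCRIPTS:
        return "Chinese"
    if script in JAPANESE_SCRIPTS:
        return "Japanese"
    if script in KOREAN_SCRIPTS:
        return "Korean"
    if script in UNKNOWN_SCRIPTS:
        return "unknown"
    if script is not None:
        return script  # some other explicitly declared writing system
    # No script declared: unknown only if the language itself is unknown.
    if language in UNKNOWN_LANGUAGES:
        return "unknown"
    # Otherwise the writing system is implied, not unknown.
    return IMPLIED_BY_LANGUAGE.get(language)


print(resolve_writing_system("zh"))          # Chinese (implied)
print(resolve_writing_system("ko", "Hani"))  # Chinese (explicit Hani)
print(resolve_writing_system("ja", "Latn"))  # Latn (explicit, outside the list)
print(resolve_writing_system(None))          # unknown
```

Checking the explicit script before the language reflects the "for any content language" wording: `ko-Hani` resolves to the Chinese writing system even though the language is Korean.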
This specification introduces no new security considerations.
This specification leaks the user's installed hyphenation and line-breaking dictionaries.
Acknowledgements
This specification would not have been possible without the help from:
Ayman Aldahleh, Bert Bos, Tantek Çelik, James Clark, Emilio Cobos Álvarez, Stephen Deach, John Daggett,
Martin Dürst,
Laurie Anna Edlund, Ben Errez, Yaniv Feinberg, Arye Gittelman, Ian
Hickson, Martin Heijdra, Richard Ishida, Masayasu Ishikawa,
Michael Jochimsen, Eric LeVine, Ambrose Li, Håkon Wium Lie, Chris Lilley,
Ken Lunde, Myles Maxfield, Nat McCully, IM Mincheol, Shinyu Murakami, Paul Nelson,
Chris Pratley, Xidorn Quan, Marcin Sawicki,
Arnold Schrijver, Rahul Sonnad, Alan Stearns, Michel Suignard, Takao Suzuki,
Frank Tang, Chris Thrasher, Etan Wexler, Chris Wilson, Masafumi Yabe
and Steve Zilles.
* Clarify that ''hyphens: none'' does not suppress wrapping opportunities after U+002D or U+2010.
* Fix stray text leftover when removing percentages as a possible value for word-spacing.
* Add support for ''word-break: break-word'' as a deprecated value, as needed for compatibility with this previously proprietary syntax.
* Use ''allow-end'' hanging rules rather than ''force-end'' rules for trailing white space;
define how it impacts intrinsic size contributions of text.
(Issue 3440)
* Generalize the logic that handles spaces at the end of lines to
sequences of Unicode Zs spaces (except NBSP), not just of Space U+0020.
* Clarify the interaction of collapsible spaces at the end of a line and bidi.
Changes between 6 December 2018 and 12 December 2018 consist of
some minor cleanup in the line breaking section
and deferring <<percentage>> values of 'word-spacing' to Level 4
with the expectation that they will be redefined
(see discussion in issue 2165).