
[css-text-3] Insufficient normative reference to UAX14 for the ID line breaking class #567

Closed
@frivoal

Description


css-text-3 refers normatively to UAX14 in a few places, including:

  • “[...] BK, CR, LF, CM, NL, and SG line breaking classes in [UAX14] must be honored.”
  • “[...] WJ, ZW, and GL line-breaking classes in [UAX14] must be honored”
  • “The line breaking behavior of a replaced element or other atomic inline is equivalent to an ideographic character (Unicode linebreaking class ID [UAX14]) [...]”
  • “[...] any typographic character units resolving to the NU (“numeric”), AL (“alphabetic”), or SA (“Southeast Asian”) line breaking classes [UAX14] are instead treated as ID (“ideographic characters”) for the purpose of line-breaking.”
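The last two quoted rules work as a remapping of line breaking classes, applied before any pairwise breaking rules. The 4th one is the rule css-text-3 applies under `word-break: break-all`. Below is a minimal sketch, assuming each item already carries its UAX14 class as a string (Python's `unicodedata` does not expose the Line_Break property). It also uses a hypothetical `"ATOMIC"` marker, which is not a UAX14 class, for replaced elements and other atomic inlines.

```python
# Hypothetical marker for a replaced element or other atomic inline.
# This is NOT a real UAX14 line breaking class.
ATOMIC = "ATOMIC"

# Classes that the break-all rule quoted above resolves to ID.
BREAK_ALL_TO_ID = {"NU", "AL", "SA"}


def resolve_classes(classes, break_all=False):
    """Return a new list of line breaking classes with the quoted remappings applied.

    - Atomic inlines behave like ideographic characters (ID).
    - When break_all is True, NU, AL and SA are treated as ID.
    All other classes are passed through unchanged.
    """
    resolved = []
    for cls in classes:
        if cls == ATOMIC:
            resolved.append("ID")
        elif break_all and cls in BREAK_ALL_TO_ID:
            resolved.append("ID")
        else:
            resolved.append(cls)
    return resolved
```

For example, `resolve_classes(["AL", "ATOMIC", "NU"])` gives `["AL", "ID", "NU"]`, and with `break_all=True` all three become `"ID"`. Both remappings only make sense if ID itself has a defined breaking behavior, which is the point of this issue.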

However, I cannot find any normative reference that requires the line-breaking behavior for characters with the line breaking class ID in UAX14 (Ideographic characters) to be honored, either directly or as part of a broader claim.

The 3rd and 4th bullets above suggest that it is expected, since something else is expected to behave like characters with that line breaking class, which doesn't make much sense if no particular behavior is expected of that class. Also, the design of word-break: normal implicitly depends on this behavior being honored.

The following paragraph in section 5 also indicates that this behavior is expected, but it reads like informative prose, or at least seems too vague to be effectively testable:

In several other writing systems, (including Chinese, Japanese, Yi, and sometimes also Korean) a soft wrap opportunity is based on syllable boundaries, not word boundaries. In these systems a line can break anywhere except between certain character combinations. Additionally the level of strictness in these restrictions can vary with the typesetting style.

The spec does (normatively) state that

CSS does not fully define where soft wrap opportunities occur

and (informatively) that

Further information on line breaking conventions can be found in [JLREQ] and [JIS4051] for Japanese, [ZHMARK] for Chinese, and in [UAX14] for all scripts in Unicode.

Admittedly, the full logic of where soft wrap opportunities should go is complex and impractical to specify. But even without going into the full gory details of opening and closing punctuation, non-starter characters, etc., it should be possible to ensure that at least the general case works out as expected.

I think we should add a bullet point to section 5.1, "Line breaking details". Maybe something like:

  • When the white-space property allows wrapping, there is a soft wrap opportunity between pairs of characters with the ID line breaking class (see [!UAX14]). Additionally, there is a soft wrap opportunity before (respectively after) characters with the ID line breaking class, unless the preceding (respectively following) character has the WJ or GL line breaking class (see [!UAX14]), or otherwise forbids breaks as determined by the line-break property.
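The proposed baseline can be read as a check on each pair of adjacent line breaking classes. Here is a minimal sketch, assuming that white-space allows wrapping, that classes are already resolved to UAX14 class names (the lookup itself is out of scope), and that line-break adds no further restrictions. The function name `id_wrap_opportunities` is hypothetical.

```python
# Neighbors that suppress the soft wrap opportunity next to an ID character.
NO_BREAK_NEIGHBORS = {"WJ", "GL"}


def id_wrap_opportunities(classes):
    """Return the boundary indices i (the boundary between classes[i-1] and
    classes[i]) that the proposed rule makes soft wrap opportunities."""
    opportunities = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if before == "ID" and after == "ID":
            # Between two ID characters (also covered by the next branch,
            # kept separate to mirror the first sentence of the proposal).
            opportunities.append(i)
        elif after == "ID" and before not in NO_BREAK_NEIGHBORS:
            # Before an ID character, unless preceded by WJ or GL.
            opportunities.append(i)
        elif before == "ID" and after not in NO_BREAK_NEIGHBORS:
            # After an ID character, unless followed by WJ or GL.
            opportunities.append(i)
    return opportunities
```

For example, `id_wrap_opportunities(["ID", "ID", "GL", "ID", "AL", "AL"])` returns `[1, 4]`: the breaks on either side of the GL character are suppressed, and the rule creates none between the two AL characters. Cases such as closing punctuation (CL) or non-starters (NS) after an ID character would still need line-break, which is the wiggle room mentioned below.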

This still leaves some wiggle room since the line-break property itself doesn't define exhaustive rules, but I think this should give a decent baseline requirement.

EDIT: typos
