
[css-text-3] Insufficient normative reference to UAX14 for the ID line breaking class #567

Closed
frivoal opened this issue Oct 4, 2016 · 4 comments
Labels: Closed Rejected as Wontfix by CSSWG Resolution; Commenter Satisfied (Commenter has indicated satisfaction with the resolution / edits.); css-text-3 (Current Work); i18n-needs-resolution (Issue the Internationalization Group has raised and looks for a response on.); Tested (Memory aid - issue has WPT tests); Tracked in DoC

Comments

@frivoal
Collaborator

frivoal commented Oct 4, 2016

Css-text-3 refers normatively to UAX14 in a few places, including:

  • “[...] BK, CR, LF, CM, NL, and SG line breaking classes in [UAX14] must be honored.”
  • “[...] WJ, ZW, and GL line-breaking classes in [UAX14] must be honored”
  • “The line breaking behavior of a replaced element or other atomic inline is equivalent to an ideographic character (Unicode linebreaking class ID [UAX14]) [...]”
  • “[...] any typographic character units resolving to the NU (“numeric”), AL (“alphabetic”), or SA (“Southeast Asian”) line breaking classes [UAX14] are instead treated as ID (“ideographic characters”) for the purpose of line-breaking.”

However, I cannot find any normative reference that requires the line-breaking behavior for characters with the line breaking class ID in UAX14 (Ideographic characters) to be honored, either directly or as part of a broader claim.

The 3rd and 4th bullets above suggest that it is expected: they require something else to behave like characters with that line breaking class, which doesn't make much sense if no particular behavior is required of the class itself. Also, the design of word-break: normal implicitly depends on this behavior being honored.

The following paragraph in section 5 also indicates that this behavior is expected, but this sentence reads like informative prose, or at least seems too vague to be effectively testable.

In several other writing systems, (including Chinese, Japanese, Yi, and sometimes also Korean) a soft wrap opportunity is based on syllable boundaries, not word boundaries. In these systems a line can break anywhere except between certain character combinations. Additionally the level of strictness in these restrictions can vary with the typesetting style.

The spec does (normatively) state that

CSS does not fully define where soft wrap opportunities occur

and (informatively) that

Further information on line breaking conventions can be found in [JLREQ] and [JIS4051] for Japanese, [ZHMARK] for Chinese, and in [UAX14] for all scripts in Unicode.

and for sure, the full logic of where soft wrap opportunities should go is complex and impractical to specify. But without going into the full gory details of opening and closing punctuation, non-starter characters, etc., it should be possible to ensure that at least the general case works out as expected.

I think we should add a bullet point to section 5.1 "Line breaking details". Maybe something like:

  • When the white-space property allows wrapping, there is a soft wrap opportunity between pairs of characters with the ID line breaking class (see [!UAX14]). Additionally, there is a soft wrap opportunity before (and respectively after) characters with the ID line breaking class, unless the preceding (respectively following) character has the WJ or GL line breaking class (see [!UAX14]), or otherwise forbids breaks as determined by the line-break property.

This still leaves some wiggle room since the line-break property itself doesn't define exhaustive rules, but I think this should give a decent baseline requirement.
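
To make the proposed bullet concrete, here is a minimal sketch of the rule in Python. It is an illustration under stated assumptions, not spec text or a real implementation: `line_break_class` is a hypothetical, heavily simplified stand-in for the full UAX14 class lookup (it only knows a handful of code points), `wrapping_allowed` stands in for the white-space property, and the restrictions imposed by the line-break property (for example, no break before closing punctuation) are deliberately not modeled.

```python
from typing import List

# Classes that suppress a break next to an ID character in the proposed rule.
NO_BREAK_CLASSES = {"WJ", "GL"}


def line_break_class(ch: str) -> str:
    """Hypothetical, simplified stand-in for the UAX14 class lookup.

    A real implementation would consult the full Unicode line breaking
    data; only a few code points are classified here.
    """
    cp = ord(ch)
    if cp == 0x2060:            # WORD JOINER
        return "WJ"
    if cp in (0x00A0, 0x202F):  # NO-BREAK SPACE, NARROW NO-BREAK SPACE
        return "GL"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
        return "ID"
    return "XX"                 # placeholder for every class not modeled


def soft_wrap_opportunities(text: str, wrapping_allowed: bool = True) -> List[int]:
    """Return each index i where a break is allowed between text[i-1] and text[i].

    Implements only the proposed bullet: a break is allowed next to an ID
    character unless the neighbor is WJ or GL. Restrictions coming from the
    line-break property are not modeled.
    """
    if not wrapping_allowed:
        return []
    classes = [line_break_class(ch) for ch in text]
    result = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        touches_id = before == "ID" or after == "ID"
        blocked = before in NO_BREAK_CLASSES or after in NO_BREAK_CLASSES
        if touches_id and not blocked:
            result.append(i)
    return result


print(soft_wrap_opportunities("日本語"))       # breaks between each pair of kanji
print(soft_wrap_opportunities("日\u2060本"))   # WORD JOINER suppresses both breaks
```

Because the sketch ignores the line-break property, it would also report an opportunity before characters such as 。, which is exactly the part the proposal defers to that property.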

EDIT: typos

@frivoal frivoal added the css-text-3 Current Work label Oct 4, 2016
@r12a r12a added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label Oct 14, 2016
@frivoal
Collaborator Author

frivoal commented Oct 31, 2016

Agenda+ing because this should be addressed in the disposition of comments for css-text-3.

@kojiishi
Contributor

kojiishi commented Oct 31, 2016

The proposed text looks like a partial copy of UAX 14. I think we should avoid copying the logic.

I'm fine with saying "ID must be honored", which I suppose should satisfy your request, but I think @fantasai had a strong opinion on which classes to include and which not to, so I'd like to see what she says.

@frivoal
Collaborator Author

frivoal commented Oct 31, 2016

If "ID must be honored" is acceptable, I'd be OK with that.

I was worried that it wouldn't be, because the rules in UAX14 that pertain to ID characters are not well contained. Defining the interaction between ID characters and adjacent characters of other classes pulls in a large part of the rest of the spec, yet that still doesn't resolve all ambiguities. It also defines a specific behavior for kinsoku-shori, which is something we were trying to leave partly undefined and partly influenced by the line-break property.

The text I proposed was an attempt to extract from UAX14 the key part of what makes line breaking for ID characters work the way it should, and defer the rest to the line-break property (which itself leaves large parts intentionally undefined).

But if you think that's overkill or fragile, and that we can just point to UAX14 and require that ID line breaking be honored, then great.

@frivoal
Collaborator Author

frivoal commented Nov 2, 2016

From the teleconf:

Not normatively requiring UAX14 is intentional, because the rules are not only complex, but also not ideal. They are a good baseline, but not necessarily something that must be followed to the letter.

The argument against inlining into our spec some simplified statements like the one I proposed is that including such rules would be a significant scope increase to css-text, and not one we are well equipped to handle.

However, the spec does have a requirement that customary line breaking rules be followed. This is vague, but it is normative, and by group consensus can be used to justify certain obvious line breaking behaviors in tests, even if they are not explicitly spelled out individually in the spec. For instance, this is sufficient to expect wrapping opportunities between two kanji, and to depend on this in tests.

I'll use this to simplify the tests I proposed in w3c/csswg-test#1135.

@frivoal frivoal added Commenter Satisfied Commenter has indicated satisfaction with the resolution / edits. Closed Rejected as Wontfix by CSSWG Resolution and removed Agenda+ labels Nov 2, 2016
@frivoal frivoal closed this as completed Nov 2, 2016
frivoal added a commit to frivoal/csswg-test that referenced this issue Nov 3, 2016
Assuming that there are wrapping opportunities between ideographic characters
allows a few tests to be simplified. Although this is not explicitly
mentioned in the specification, it is still normatively required.

See
w3c/csswg-drafts#567 (comment)
for rationale.
@frivoal frivoal added the Tested Memory aid - issue has WPT tests label Apr 25, 2019

4 participants