[css-text] Providing alternative breaking behaviours for Ethiopic #4765

Closed
r12a opened this issue Feb 10, 2020 · 11 comments
Labels: Closed Accepted as Editorial; Commenter Timed Out (Assumed Satisfied); css-text-3 (Current Work); i18n-elreq (Ethiopic language enablement); i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response); Tested (Memory aid - issue has WPT tests); Tracked in DoC

Comments

@r12a
Contributor

r12a commented Feb 10, 2020

https://www.w3.org/TR/css-text-3/#word-break-property
https://www.w3.org/TR/css-text-3/#line-break-property

Modern Ethiopic text is generally wrapped word by word. If wordspace separators are used, they are wrapped with the word, and should not appear alone at the beginning of a line.

However, older Ethiopic text is generally wrapped wherever it hits the right margin, whether a wordspace or a space is used to separate words, and no hyphenation occurs.

Observation: It's possible that a rule is sometimes applied to letter-based wrapping that requires a minimum of 2 letters at the end of a line for printed text (as opposed to handwritten manuscripts). This was observed by Daniel Yacob in the book, "ዜናዊ ፓርልማ" from 1953 (1946EC).

Whatever style of wrapping is used, however, punctuation wrapping rules apply, which means that a wordspace separator should not appear at the start of a line, nor various other punctuation, even when letter-by-letter wrapping occurs.

So my question is: how can an Ethiopic author apply the different wrapping styles to Ethiopic?

My best guess is that line-break:anywhere is not appropriate, since it doesn't respect punctuation rules. However, word-break:break-all may be the right thing, although it doesn't specifically mention punctuation-specific rules. Am i correct? It wasn't abundantly clear from reading the spec.

@frivoal
Collaborator

frivoal commented Feb 11, 2020

> My best guess is that line-break:anywhere is not appropriate, since it doesn't respect punctuation rules. However, word-break:break-all may be the right thing, although it doesn't specifically mention punctuation-specific rules. Am i correct?

Yes. word-break: break-all will do what you want here, and does not affect punctuation: UAX14 rules define which punctuation characters can / cannot be separated from the previous character by line breaking, and word-break:break-all does not change that. The Ethiopic wordspace will also have the right behavior: It has the BA line-breaking class in UAX 14, making it inseparable from its preceding letter, and it will therefore not be placed at the beginning of a line.
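
As a rough sketch of the above, the two Ethiopic wrapping styles could be selected as follows. The class names are hypothetical and chosen only for illustration; they do not come from the spec or from this thread.

```css
/* Hypothetical class names, for illustration only. */

/* Modern style: lines break only at word separators (initial value). */
.ethiopic-modern {
  word-break: normal;
}

/* Older style: a soft wrap opportunity exists between any two letters.
   UAX14 punctuation rules are unaffected, so U+1361 ETHIOPIC WORDSPACE
   (class BA) still stays with the letter that precedes it. */
.ethiopic-traditional {
  word-break: break-all;
}
```

By contrast, line-break: anywhere also allows breaks around punctuation, so it could place the wordspace at the start of a line.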

Requiring a minimum of X letters on a line is not addressed in css-text-3, but it is expected to be addressed in css-text-4. See https://drafts.csswg.org/css-text-4/#last-line-limits

> It wasn't abundantly clear from reading the spec.

This bit of text, to be found under the definition of the word-break property, should be providing enough context to make the definition of break-all unambiguous in that respect:

> This property specifies soft wrap opportunities between letters, i.e. where it is “normal” and permissible to break lines of text. Specifically it controls whether a soft wrap opportunity generally exists between adjacent typographic letter units (and/or non-letter typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes [UAX14]). It does not affect rules governing the soft wrap opportunities created by white space (as well as by other space separators) and around punctuation.

I think this is clear enough, but if you have a suggestion for improvement, or a concern about some of it, feedback is very much welcome.

On the other hand, an example about using word-break:break-all to switch between the two Ethiopic behaviors may be a more productive way of illustrating this. Happy to include one if someone can provide me with the right text.

@r12a
Contributor Author

r12a commented Feb 11, 2020

Thanks @frivoal. The 'around punctuation' text is a little vague for me. Note also that none of the major browser engines does what you'd expect here. Try changing the width of the bounding box in this test: you'll see that the wordspace wraps to the next line alone. It shouldn't do that. Note in particular that none of them wraps the wordspace to the next line by default (try this test), so i'm assuming that the browser implementers all misunderstood the point here too, since they made a change that does the wrong thing.

Here's some suggested text (inline markup showing here, but just for C&P convenience):

As a final example, in modern use of the Ethiopic script words are surrounded by spaces and usually wrap, unbroken, to the next line. Sometimes, however, Ethiopic may be written with <span class="codepoint" translate="no"><span lang="am">&#x1361;</span> [<span class="uname">U+1361 ETHIOPIC WORDSPACE</span>]</span> rather than a space, and split words while wrapping, with no hyphenation. word-break: break-all can be used for this. Note that applying word-break:break-all doesn't affect the Ethiopic rules for punctuation, which require that there is no line-break opportunity before an Ethiopic wordspace.

I can provide a screen shot of some Amharic text, if you like.

@dyacob
Member

dyacob commented Feb 12, 2020

A bichromatic scan of the work that @r12a referenced is here: https://drive.google.com/open?id=1wJm53QevGzAZGBiMHBnGB6oPfhP-sA8B (the copyright declaration is no longer applicable). The work also presents an example of justification in the presence of the wordspace.

frivoal added the css-text-3 Current Work label Feb 12, 2020
frivoal self-assigned this Feb 12, 2020
himorin added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Feb 13, 2020
@r12a
Contributor Author

r12a commented Feb 26, 2020

> Requiring a minimum of X letters on a line is not addressed in css-text-3, but it is expected to be addressed in css-text-4. See https://drafts.csswg.org/css-text-4/#last-line-limits

Clarification on this point: it's not about the last line in a para, but rather about the last word on a (any) line, i.e. no word is broken such that only the last character in the word appears at the start of a line. (I vaguely remember hearing about a similar rule recently, but i can't remember which script/language was the context. So this might possibly be a rule that affects other languages than those that use Ethiopic script.)

[Image: Screenshot 2020-02-26 at 13 17 04]

@astearns
Member

I wonder if there's a way we could tie this in to the hyphenate-limit-chars property. The three-value version of the property allows you to say hyphenate-limit-chars: auto 2 auto to avoid a single character before a hyphen. This seems like the same thing, just without the hyphen. Perhaps we need a break-limit-chars property?
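
For reference, a minimal sketch of the existing hyphenation control mentioned above. The selector is hypothetical, and break-limit-chars is not shown because it is only a suggestion in this comment, not an existing property.

```css
/* Hypothetical selector, for illustration only. */
.hyphenated {
  /* The limits only matter when automatic hyphenation is enabled. */
  hyphens: auto;
  /* Minimum word length: auto; minimum characters before the hyphen: 2;
     minimum characters after the hyphen: auto. */
  hyphenate-limit-chars: auto 2 auto;
}
```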

@fantasai
Collaborator

fantasai commented Apr 1, 2020

@r12a Breaking before Ethiopic Word Space in that test case looks like a mistake. For example, UAs don't break before commas and periods and colons. I think this might just need a WPT testcase and some bugs filed. We can also put an example in the spec, given some sample text.

@astearns Good point, though I think this is a little different than hyphenation in that you can break between any two characters, not only at particular lexically allowed points in the word. It would be nice to re-use the same controls, though.

@fantasai
Collaborator

fantasai commented Apr 1, 2020

@r12a While we're on the topic, can you get ELREQ updated? The last publication was in 2016.

@fantasai
Collaborator

fantasai commented Apr 1, 2020

Alright, edited in Ethiopic as one of the writing samples for word-break. @r12a Let me know if the text looks correct. I suggest we file break limits as a separate issue.

@r12a
Contributor Author

r12a commented May 21, 2020

Thanks @fantasai. I have a small suggestion: add a link to https://www.w3.org/TR/css-text-3/#word-separator on the following text.

> Ethiopic similarly has two styles of line-breaking, either only breaking at word separators

That will clarify that we're talking about both spaces and Ethiopic wordspace characters.

frivoal closed this as completed in 1feb256 Jun 5, 2020
@frivoal
Collaborator

frivoal commented Jun 5, 2020

> have a small suggestion: add a link

done: 1feb256

frivoal reopened this Jun 5, 2020
@frivoal
Collaborator

frivoal commented Jun 5, 2020

@r12a, if you agree to file the break limits as a separate issue (against level 4), I think we're done here and can close. Can you confirm?

frivoal added a commit to frivoal/wpt that referenced this issue Dec 29, 2022
frivoal added Tested Memory aid - issue has WPT tests and removed Needs Testcase (WPT) labels Dec 29, 2022
frivoal added a commit to web-platform-tests/wpt that referenced this issue Dec 29, 2022
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jan 5, 2023
… a=testonly

Automatic update from web-platform-tests
Add tests for word breaking in Ethiopic

See w3c/csswg-drafts#4765

--

wpt-commits: 6555517ac5179b32a26705f2e27f0c8daadc59ec
wpt-pr: 37697
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Jan 13, 2023