
[css-text] Ability to control hyphenation of proper nouns/capitalised words #5157


Open · buttercookie42 opened this issue on Jun 3, 2020 · 13 comments


buttercookie42 commented on Jun 3, 2020

As of #3927, the css-text spec has gained the guidance that

> In some languages (such as English but not German), it may [be] appropriate to avoid having hyphenation opportunities in mixed case words, as those may indicate proper nouns

and Gecko has since implemented something like that.
That's all very nice, but now I've run precisely into the concerns voiced here and here, namely that the default behaviour of the browser doesn't fit my own requirements [1], but I currently have no way of overriding it.

So as per

> Note that systems such as TeX (the \uchyph parameter) and InDesign (the "Hyphenate Capitalized Words" option in paragraph formatting) do expose this question to authors, recognizing that there is not a simple "correct" behavior that the application can universally use.

I'd like to have the same ability in CSS as well.

[1] I have a very narrow column of text, so

  1. I am prepared to accept hyphenation even of proper nouns as preferable to the behaviour of overflow-wrap: break-word
  2. For my content, automatic hyphenation seems to work well enough, so enabling automatic hyphenation and fixing up some possible instances of over-eager hyphenation with a Word Joiner (Edit: or a hyphenation-disabling span) is less work than having to sprinkle around soft hyphens everywhere.
  3. That column is mostly written in Title Case, so any implementation basing its decisions on capitalisation will misdetect all those words as proper nouns and disable hyphenation without me actually wanting to.
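To make point 3 concrete, here is a minimal TypeScript sketch of the kind of heuristic the spec note describes. It assumes a word counts as "mixed case" when it differs from both its all-lowercase and all-uppercase forms; the function names are hypothetical, and this is not Gecko's actual implementation. It only illustrates why Title Case text defeats a capitalisation-based guess at proper nouns.

```typescript
// Hypothetical heuristic (not Gecko's code): a word is "mixed case" when it
// contains both uppercase and lowercase letters, i.e. it differs from both
// its all-lowercase and all-uppercase forms.
function isMixedCase(word: string): boolean {
  return word !== word.toLowerCase() && word !== word.toUpperCase();
}

// Following the spec note for languages like English: mixed-case words are
// treated as possible proper nouns and get no automatic hyphenation.
function mayAutoHyphenate(word: string): boolean {
  return !isMixedCase(word);
}

// A Title Case column: every capitalised word is flagged, even though
// none of them is a proper noun.
const titleCase = "Understanding Automatic Hyphenation Heuristics";
const suppressed = titleCase.split(" ").filter((w) => !mayAutoHyphenate(w));
console.log(suppressed.join(", "));
// prints "Understanding, Automatic, Hyphenation, Heuristics"

// The same words in sentence case: only the sentence-initial word is flagged.
const sentenceCase = "Understanding automatic hyphenation heuristics";
console.log(sentenceCase.split(" ").filter((w) => !mayAutoHyphenate(w)).join(", "));
// prints "Understanding"
```

The sketch has no way to tell a genuine proper noun from a capitalised heading word, which is why an author-facing override is being requested.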
frivoal added the css-text-3, Current Work and css-text-4 labels and removed the css-text-3 and Current Work labels on Jun 3, 2020
frivoal (Collaborator) commented on Jun 4, 2020

Level 3 doesn't force the user agent to behave either way; it merely notes that capitalization is one of the things a UA may consider when deciding whether to hyphenate.

Given that TeX and InDesign offer controls to authors over this, I think it is reasonable to consider the possibility for level 4.

I think that the default will necessarily have to be an "auto" value, to let browsers use the best heuristics they know of, but what should the other values be?

buttercookie42 (Author) commented

Something along the lines of auto | always | never possibly?
Certainly always (or some equivalent naming) to always allow automatic hyphenation of words containing capital letters, since that is what I was asking for, but a case could also be made for explicit control in the other direction, i.e. never automatically hyphenate words containing capital letters:

  1. For some languages (like German, where all nouns are capitalised), the default behaviour of auto would probably correspond more closely to always, so for manual control you'd also need the opposite option.
  2. @jfkthame mentioned that one possible future refinement of Gecko's default behaviour (corresponding to a future auto value) might be to vary the decision based on line length, i.e. be less strict about not hyphenating capitalised words for short lines. In that case full manual control again would require both always and never values.

jfkthame (Contributor) commented on Jun 4, 2020

Something like hyphenate-capitalized: auto | always | never sounds good to me. Or maybe shorten it to hyphenate-caps, noting that the form caps is already used in font-variant-caps.

frivoal (Collaborator) commented on Jun 4, 2020

always is probably the wrong term. If the UA wouldn't allow hyphenation in the word for other reasons (too short, not in the dictionary…), this value won't create a hyphenation opportunity.
Maybe auto | allow | forbid?

jfkthame (Contributor) commented on Jun 4, 2020

Sure, I'd be fine with that too (although I didn't feel that always implies hyphenation will suddenly happen to capitalized words that couldn't normally be hyphenated even without caps).

jfkthame (Contributor) commented on Jun 4, 2020

Maybe even simpler: auto | yes | no, or auto | on | off?

buttercookie42 (Author) commented

Good point, yes.

And, for completeness, to make it explicit and put it up for discussion: my understanding is that this should only affect the behaviour of hyphens: auto. That is, forbid + hyphens: auto (or manual, for that matter, though that combination would make little sense) would still honour hyphenation opportunities introduced by explicit (soft) hyphens.

jfkthame (Contributor) commented on Jun 4, 2020

Agreed, this is only about whether hyphens: auto gets to attempt hyphenation. Explicit hyphens or &shy; should not be affected. If the author put them there, presumably they're intended to be used.
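The scope agreed on here (the new property gates only the breaks that hyphens: auto would find, never explicit soft hyphens) can be modelled as follows. This is a hypothetical TypeScript sketch, not spec text: it borrows the auto | allow | forbid names floated earlier in the thread, treats the UA's own default under auto as a boolean input, and takes the dictionary's proposed break positions as a parameter instead of running a real hyphenation dictionary.

```typescript
type Hyphens = "none" | "manual" | "auto";
// Hypothetical property values floated in this thread; not in any CSS spec.
type HyphenateCapitalized = "auto" | "allow" | "forbid";

const SOFT_HYPHEN = "\u00AD";

function isMixedCase(word: string): boolean {
  return word !== word.toLowerCase() && word !== word.toUpperCase();
}

// Returns the positions in `word` where a hyphenated break may occur.
// `dictionaryBreaks` stands in for what a UA's hyphenation dictionary would
// propose; `uaAvoidsCaps` is the UA's own default behaviour under `auto`.
function hyphenationOpportunities(
  word: string,
  hyphens: Hyphens,
  caps: HyphenateCapitalized,
  uaAvoidsCaps: boolean,
  dictionaryBreaks: number[],
): number[] {
  if (hyphens === "none") return [];

  // Explicit soft hyphens count under both `manual` and `auto`,
  // regardless of the capitalisation setting.
  const manual: number[] = [];
  for (let i = 0; i < word.length; i++) {
    if (word.charAt(i) === SOFT_HYPHEN) manual.push(i);
  }
  if (hyphens === "manual") return manual;

  // hyphens: auto. The capitalisation setting only filters dictionary breaks;
  // `allow` never adds breaks the dictionary did not propose.
  const skipCaps = caps === "forbid" || (caps === "auto" && uaAvoidsCaps);
  const automatic: number[] = skipCaps && isMixedCase(word) ? [] : dictionaryBreaks;

  const merged = manual.concat(automatic.filter((i) => manual.indexOf(i) < 0));
  return merged.sort((a, b) => a - b);
}

console.log(JSON.stringify(hyphenationOpportunities("Hyphenation", "auto", "forbid", false, [2, 6])));
// prints "[]"
console.log(JSON.stringify(hyphenationOpportunities("Hyphenation", "auto", "allow", true, [2, 6])));
// prints "[2,6]"
console.log(JSON.stringify(hyphenationOpportunities("Hy\u00ADphenation", "auto", "forbid", true, [7])));
// prints "[2]"
```

In the last call, forbid removes the dictionary break, but the author's soft hyphen at index 2 survives. Whether forbid should also suppress soft hyphens is the point disputed in the next comment.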

mnater commented on Jun 4, 2020

> this is only about whether hyphens: auto gets to attempt hyphenation. Explicit hyphens or &shy; should not be affected. If the author put them there, presumably they're intended to be used.

I beg to differ.
a) There are JavaScript tools that use soft hyphens everywhere, so they are not necessarily put there with intention.
b) The other hyphenation control properties seem to apply to both types of hyphenation (automatic or with soft hyphens, see #5090).

In my opinion, the new property should apply in both cases.

buttercookie42 (Author) commented

Then for completeness you'd have to have some sort of manual value (never automatically hyphenate capitalised words, but still respect soft-hyphens), too, wouldn't you?

jfkthame (Contributor) commented on Jun 5, 2020

> this is only about whether hyphens: auto gets to attempt hyphenation. Explicit hyphens or &shy; should not be affected. If the author put them there, presumably they're intended to be used.

> I beg to differ.
> a) There are JavaScript tools that use soft hyphens everywhere, so they are not necessarily put there with intention.

JavaScript tools that insert soft-hyphens have access to the text, and can make their own decisions whether to insert them in capitalized words or not. That (IMO) is out of scope for what the CSS hyphenation properties are seeking to control.

> b) The other hyphenation control properties seem to apply to both types of hyphenation (automatic or with soft hyphens, see #5090).

For the (proposed/draft) properties that provide control over where hyphenation is used within the overall layout, yes. For properties that fine-tune where the hyphenation algorithm may insert potential breaks, that's less clear; I think such properties should not affect manual soft hyphens. (See #5090 (comment).)

HeikkiYlipaavalniemi commented

It would be really good to also be able to control this for words that start a sentence with a capital letter.

Both Finnish and Swedish can have really long words, and in a mobile layout such words can easily exceed the screen width when they start a sentence. The word then either wraps to the next line without hyphenation or breaks the layout.

arknu commented on Apr 21, 2022

(cross-posting from issue #3927, since I find it unclear where we should actually discuss this)

Why would you not want to hyphenate capitalized words? That decision makes absolutely no sense. It seems that, as usual, decisions are taken looking only at English and not taking into account that other languages may have different needs.

For English hyphenation may be a luxury, but for many languages with longer words (Danish, Norwegian, Swedish, Finnish, German and lots more) hyphenation is an absolute necessity for proper text layout, especially on mobile where lines are quite short. You just broke text layout for a large number of languages.

Take this example:
[Screenshot: a single-word headline that is not hyphenated]

A single-word headline which, as you would expect, starts with a capital letter. Not getting hyphenated because of this stupid argument. Did no-one stop to think that not hyphenating the first word in a sentence might be a bad idea?

You absolutely CANNOT use capital letters to detect proper nouns. German capitalizes every single noun and those can be quite long. You cannot make random exceptions for different languages. The web is supposed to work for all languages, yet we are once again seeing the one-sided American view that "every language must work like English".

And why would you not want to hyphenate proper nouns in the first place? They are words like any other and they need to be hyphenated when they would protrude out of their box.

The current situation makes CSS hyphenation pretty much useless, forcing us to use bloated JS libraries for what should be something that just works in the browser. Word processors have been doing automatic hyphenation for decades, it can't be that hard.

aarongable pushed a commit to chromium/chromium that referenced this issue on Apr 25, 2022
A request was made not to hyphenate capitalized words in
English at crbug.com/963039 because it is likely that they are
proper nouns. The CSS WG discussion[1] concluded to do so for
languages except German. No objections were made and Gecko
shipped in stable. Blink followed to match at r895487
crrev.com/c/2982497.

A following discussion[2] was raised to give the control to
authors. Unfortunately the WG has not concluded yet, but a
good number of opinions not to do so for other languages than
English were raised, in the CSS WG discussion[2] and
crbug.com/1318385.

This patch changes the logic applicable only to English.

[1] w3c/csswg-drafts#3927
[2] w3c/csswg-drafts#5157

Bug: 1318385, 963039
Change-Id: Ifd04b596ee5457e51bff848e7e4b8798bc4a0ffe
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3602055
Auto-Submit: Koji Ishii <kojii@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Commit-Queue: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/main@{#995561}
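The language gate described in this commit can be sketched as a predicate on the content language. This is a hypothetical TypeScript model, not Blink's code: it assumes the decision is keyed on the primary subtag of a BCP 47 language tag (for example the value of the lang attribute), and it only contrasts the rule before the patch (all languages except German) with the rule after it (English only).

```typescript
// Primary language subtag of a BCP 47 tag, e.g. "en-GB" -> "en".
function primarySubtag(lang: string): string {
  return lang.toLowerCase().split("-")[0];
}

// Before the patch: avoid hyphenating capitalized words in every language
// except German.
function avoidCapsBefore(lang: string): boolean {
  return primarySubtag(lang) !== "de";
}

// After the patch: only English avoids hyphenating capitalized words.
function avoidCapsAfter(lang: string): boolean {
  return primarySubtag(lang) === "en";
}

for (const lang of ["en-US", "de", "da", "fi", "sv"]) {
  console.log(`${lang}: before=${avoidCapsBefore(lang)} after=${avoidCapsAfter(lang)}`);
}
// en-US: before=true after=true
// de: before=false after=false
// da: before=true after=false
// fi: before=true after=false
// sv: before=true after=false
```

Under this model, Danish, Finnish and Swedish text goes back to ordinary automatic hyphenation of capitalized words, which addresses the complaints about those languages earlier in this thread, while English keeps the proper-noun heuristic with no author override.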