[css-text] Need additional value of word-break for Korean #4285

@frivoal

In regular situations, word-break: normal is expected to pick the right kind of word breaking for various scripts, keeping the letters of a word together in languages that have word-based line breaking, while allowing wraps between the letters of a word in languages where that's the normal behavior.

However, Korean typography has been evolving. The normal value corresponds to what used to be normal (allowing wraps in the middle of words) and needs to keep this behavior for compat reasons, but the preferred behavior is increasingly the one achieved by keep-all.

In a document where all parts are properly language tagged, * { word-break: normal; } :lang(ko) { word-break: keep-all; } achieves the desired behavior.

However, this is not quite enough to solve the problem for documents with user-generated content: when a user types content in a textarea or a contenteditable (or if user-generated content is retrieved from a database), the author of the page does not generally know what the language is, and cannot tag it in the markup. The following options are available to them, none of them great:

  • Use word-break: normal on elements accepting user input: this will do "the right thing" for all languages, except for that style of Korean, which will break too often.
  • Use word-break: keep-all on elements accepting user input: this will do "the right thing" for space-separated languages, including that style of Korean, but will badly break languages like Japanese or Chinese, by disabling wrapping opportunities and causing potential overflow.
  • Use word-break: normal on elements accepting user input, but also add a piece of JavaScript that monitors the content for changes, and switches the whole element to word-break: keep-all if any Hangul text is detected:
    • This breaks if the content input by the user contains a mixture of Korean and languages like Japanese or Chinese, as it would apply keep-all to them as well.
    • This isn't a purely declarative solution, so it fails if JavaScript is disabled.
  • Use * { word-break: normal; } :lang(ko) { word-break: keep-all; } together with a piece of JavaScript that adds the lang=ko attribute (and creates spans/divs as necessary to apply it) on the parts of the text input by the user that contain Hangul, and lang="" (or lang=somethingelse, if the somethingelse can be detected reliably) on parts that don't:
    • Getting this script right is very difficult, not merely because of how it must analyse the content and adjust the markup accordingly, but also because of how it would need to integrate with editing operations. How can it make these DOM modifications inside a contenteditable in a way that is compatible with the browser's undo stack? In a way that doesn't interfere with ongoing IME operations? In a way that is compatible with the hodgepodge of markup that different browsers may generate inside a contenteditable element?
    • Getting this script to be correct AND performant is even harder. But performance is important: not all user input is tweet-sized. Think for instance of an online document editor, which may contain multiple pages of (multilingual) rich text.
    • This isn't a purely declarative solution, so it fails if JavaScript is disabled.
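
The text-analysis step behind the last two options can be sketched in JavaScript as follows. This is a minimal sketch, not a recommended solution: the names wordBreakForElement and splitByHangul are hypothetical, the code only classifies text, and it does none of the DOM, undo-stack, or IME handling that makes the real script hard. It assumes an engine that supports Unicode property escapes in regular expressions.

```javascript
// Hypothetical helper names; nothing here comes from a spec or a library.

// Matches one character whose Unicode script is Hangul (syllables and jamo).
const HANGUL = /\p{Script=Hangul}/u;

// Third option: choose a single word-break value for the whole element.
// Any Hangul at all switches everything to keep-all, which is exactly why
// mixed Korean + Japanese/Chinese content is handled badly.
function wordBreakForElement(text) {
  return HANGUL.test(text) ? "keep-all" : "normal";
}

// Fourth option, analysis step only: split text into maximal runs that are
// either all Hangul (tagged "ko") or contain no Hangul (tagged "").
// Spaces between Korean words end up in their own "" runs, which is harmless
// for line breaking but would produce noisy markup in practice.
function splitByHangul(text) {
  const runs = [];
  for (const m of text.matchAll(/\p{Script=Hangul}+|\P{Script=Hangul}+/gu)) {
    runs.push({ lang: HANGUL.test(m[0]) ? "ko" : "", text: m[0] });
  }
  return runs;
}
```

In a page, the first helper would typically be called from an input event listener and its result assigned to the element's style.wordBreak. The second helper only yields the runs; turning them into spans inside a live contenteditable is the part described above as very difficult.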

So, to solve this, I propose that we add a keep-all-hangul value (or just keep-hangul) that behaves the same as keep-all for the Unicode characters that correspond to Hangul, and as normal for everything else.
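
To make the intended difference concrete, here is a toy model of whether a wrap is allowed between two adjacent letters under each value. It is a sketch under strong assumptions, not how browsers implement line breaking: it covers only Latin, Han, kana, and Hangul letters, it ignores spaces, punctuation, and the real UAX #14 rules, and the function name canBreakBetweenLetters is hypothetical. It also assumes that the proposed value suppresses a break only when both neighbouring characters are Hangul, which the proposal itself does not spell out.

```javascript
// Toy model only: real line breaking follows UAX #14 and CSS Text.
const CJK_LETTER = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;
const HANGUL_LETTER = /\p{Script=Hangul}/u;

// Is a wrap allowed between adjacent letters a and b (both assumed to be letters)?
function canBreakBetweenLetters(a, b, wordBreak) {
  // Simplification of "normal": a wrap is possible next to a CJK letter,
  // but not between two Latin letters.
  const normalAllows = CJK_LETTER.test(a) || CJK_LETTER.test(b);
  switch (wordBreak) {
    case "normal":
      return normalAllows;
    case "keep-all":
      // No wraps between letters at all.
      return false;
    case "keep-all-hangul":
      // Assumption: suppress only when both neighbours are Hangul.
      return HANGUL_LETTER.test(a) && HANGUL_LETTER.test(b) ? false : normalAllows;
    default:
      throw new RangeError("unknown word-break value: " + wordBreak);
  }
}
```

Under this model, a wrap between 한 and 국 is allowed with normal and suppressed with both keep-all and keep-all-hangul, while a wrap between 日 and 本 is suppressed only with keep-all.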

Labels

  • Agenda+ Later: Lower-priority items that are flagged for CSSWG discussion
  • css-text-4
  • i18n-klreq: Korean language enablement
  • i18n-tracker: Group bringing to attention of Internationalization, or tracked by i18n but not needing response.