@@ -5499,6 +5499,45 @@ Space-Discarding Unicode Characters</h2>
54995499 Han, Hiragana, Katakana, or Yi script
55005500 shall also be considered part of the [=space-discarding character set=] .
55015501
5502+ <details class="note">
5503+ <summary> Wherefore this table of “space-discarding characters”?</summary>
5504+
5505+ The purpose of the [[#line-break-transform|segment break transformation rules]]
5506+ is to “unbreak” text that has been formatted
5507+ with extra white space for source code readability;
5508+ see [[#line-break-transform]] .
5509+
5510+ In most cases, “unbreaking” lines of text requires joining them with a space,
5511+ but some writing systems don't use spaces
5512+ so such texts need to be joined without any space.
5513+ CSS uses the characters before and after to determine
5514+ whether to join lines with or without a space.
5515+
5516+ For simplicity and for ease of implementation,
5517+ the classification of characters as space-discarding or space-preserving
5518+ is done by Unicode code block.
5519+ Ideally, such a list would be maintained in [[UNICODE]] ,
5520+ but the Unicode Technical Committee has yet
5521+ to express any intention of taking on this task.
5522+ In the meantime, in the interest of bringing
5523+ more of the text-processing facilities of CSS and HTML
5524+ that are available to Western writing systems
5525+ to Eastern writing systems as well,
5526+ the CSSWG is maintaining this appendix
5527+ and refining the rules in [[#line-break-transform]] ,
5528+ and hopes that in the future,
5529+ once CSS has demonstrated its viability,
5530+ the Unicode Consortium will recognize the need for an “unbreaking” algorithm
5531+ and take over its maintenance.
5532+
5533+ <!-- things that could use an unbreaking algorithm:
5534+ * HTML/CSS
5535+ * Markdown
5536+ * TeX
5537+ * text editors' “unbreak lines” commands
5538+ -->
5539+ </details>
5540+
55025541<h2 id="script-tagging" class="no-num">Appendix G.
55035542Tagging Content by Writing System</h2>
55045543
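Illustration of the rule described in the added note (joining lines with or without a space, depending on the characters before and after the break). This is a minimal Python sketch, not the spec's algorithm. The helper names `is_space_discarding` and `unbreak` are hypothetical, and the classification is an assumed approximation: it uses Unicode East Asian Width (F, W, H) with Hangul excluded, rather than the block-based table in this appendix. It also assumes the break is dropped only when the characters on both sides qualify.

```python
import unicodedata


def is_space_discarding(ch: str) -> bool:
    """Hypothetical stand-in for the spec's space-discarding character set.

    Assumption: approximate the set with East Asian Width F/W/H,
    excluding Hangul, which separates words with spaces.
    """
    if "HANGUL" in unicodedata.name(ch, ""):
        return False
    return unicodedata.east_asian_width(ch) in ("F", "W", "H")


def unbreak(lines):
    """Join source lines, choosing a space or nothing at each break."""
    out = ""
    for line in lines:
        line = line.strip()  # drop white space around the line break
        if not line:
            continue
        # Join without a space only if both neighbours are space-discarding.
        if out and not (is_space_discarding(out[-1]) and is_space_discarding(line[0])):
            out += " "
        out += line
    return out
```

Under these assumptions, `unbreak(["Hello", "world"])` yields `"Hello world"`, while `unbreak(["日本", "語"])` yields `"日本語"` with no space inserted.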