Description
In #337 we decided to key line-break transformation behavior by Unicode block. Most blocks are straightforward: the Han, Kana, Yi, and CJK punctuation blocks discard the line break, and everything else converts it to a space. But there are a few interesting cases...
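A minimal Python sketch of keying the behavior by Unicode block, as decided in #337. The block table is a hand-picked subset (a real implementation would presumably load the full Unicode `Blocks.txt` data), and `block_of`, `break_behavior`, and `BEHAVIOR_BY_BLOCK` are hypothetical names, not taken from any spec or engine. It classifies a single character only; whether a given break is actually discarded also depends on the character on the other side of it.

```python
# Hypothetical, illustrative subset of Unicode blocks as (first, last, name).
# Not exhaustive: a real table would cover every block in Blocks.txt.
BLOCKS = [
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xA000, 0xA48F, "Yi Syllables"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]

DISCARD = "discard"
SPACE = "space"

# Behavior keyed by block name; any block not listed here converts to a space.
BEHAVIOR_BY_BLOCK = {
    "CJK Symbols and Punctuation": DISCARD,
    "Hiragana": DISCARD,
    "Katakana": DISCARD,
    "CJK Unified Ideographs": DISCARD,
    "Yi Syllables": DISCARD,
}


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def break_behavior(ch):
    """Look up the line-break behavior for ch by its Unicode block."""
    return BEHAVIOR_BY_BLOCK.get(block_of(ch), SPACE)
```

For example, `break_behavior("字")` gives `"discard"`, while `break_behavior("한")` (Hangul Syllables, deliberately not keyed) and `break_behavior("a")` give `"space"`.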
One interesting case is a group of symbol blocks that seem to originate primarily in CJK usage:
https://en.wikipedia.org/wiki/Yijing_Hexagram_Symbols_(Unicode_block)
https://en.wikipedia.org/wiki/Taixuanjing
https://en.wikipedia.org/wiki/Counting_Rod_Numerals_(Unicode_block)
Our intent is to discard the line break where it is safe to do so (Chinese or Japanese context), but not otherwise (Korean, English, etc.). Note that we discard only if the characters on both sides of the line break (before and after) belong to the space-discarding character set.
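The both-sides rule can be sketched in Python as follows, under stated assumptions: the block table is an illustrative subset, `transform_breaks` and its `discard_cjk_symbol_blocks` flag are hypothetical names, and the flag simply stands in for the open question of whether the Yijing Hexagram Symbols (U+4DC0..U+4DFF), Tai Xuan Jing Symbols (U+1D300..U+1D35F), and Counting Rod Numerals (U+1D360..U+1D37F) blocks should join the discard set. Only single `\n` segment breaks are handled; other white space processing is ignored.

```python
# Hypothetical, illustrative subset of the blocks that discard line breaks.
DISCARD_BLOCKS = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana, Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xA000, 0xA4CF),  # Yi Syllables, Yi Radicals
]

# The blocks this issue asks about.
CJK_SYMBOL_BLOCKS = [
    (0x4DC0, 0x4DFF),    # Yijing Hexagram Symbols
    (0x1D300, 0x1D35F),  # Tai Xuan Jing Symbols
    (0x1D360, 0x1D37F),  # Counting Rod Numerals
]


def in_blocks(ch, blocks):
    """True if the single character ch falls inside any (first, last) range."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in blocks)


def transform_breaks(text, discard_cjk_symbol_blocks=False):
    """Drop a "\n" only when the characters on both sides are in the
    discard set; otherwise replace it with a single space."""
    blocks = DISCARD_BLOCKS + (CJK_SYMBOL_BLOCKS if discard_cjk_symbol_blocks else [])
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before and after and in_blocks(before, blocks) and in_blocks(after, blocks):
            continue  # both neighbors discard: remove the line break
        out.append(" ")  # at least one neighbor does not: convert to a space
    return "".join(out)
```

With this sketch, `transform_breaks("漢字\nかな")` yields `"漢字かな"`, while `"漢字\nabc"` and `"한\n국"` keep a space. A hexagram next to a Han character (`"\u4dc0\n漢"`) keeps its space by default and loses it only when `discard_cjk_symbol_blocks=True`, which is exactly the choice this issue is asking about.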
What should we do with these blocks?