Description
The first rule for collapsing segment breaks is:
If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space.
It is not clear to me what should happen when multiple consecutive segment breaks are involved. For example, if I have ZWSP LF LF LF x, would this rule produce:
ZWSP LF LF x (with only the first LF removed), or ZWSP x (with all LFs removed because the rule is applied recursively)?
(In the first case, the remaining LFs would be converted to spaces by the last rule there, and the second space would be removed by step 4 of Phase I, so the final result would be ZWSP WS x.)
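To make the two readings concrete, here is a minimal Python sketch. The function names (`remove_single_break`, `remove_break_run`, `finish`) are hypothetical and not taken from the spec, and `finish` only approximates the later steps (converting remaining segment breaks to spaces and collapsing consecutive spaces).

```python
import re

ZWSP = "\u200b"
LF = "\n"

def remove_single_break(text: str) -> str:
    """Reading 1: drop only a break that directly touches a ZWSP."""
    out = []
    for i, ch in enumerate(text):
        if ch == LF:
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            if prev_ch == ZWSP or next_ch == ZWSP:
                continue  # this break is removed
        out.append(ch)
    return "".join(out)

def remove_break_run(text: str) -> str:
    """Reading 2: drop a whole run of breaks if either end touches a ZWSP."""
    def repl(m: re.Match) -> str:
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        return "" if ZWSP in (before, after) else m.group(0)
    return re.sub(r"\n+", repl, text)

def finish(text: str) -> str:
    """Assumed stand-in for the later steps: LF becomes a space, runs of spaces collapse."""
    return re.sub(r" +", " ", text.replace(LF, " "))

sample = ZWSP + "\n\n\nx"
print(repr(finish(remove_single_break(sample))))  # ZWSP WS x
print(repr(finish(remove_break_run(sample))))     # ZWSP x
```

The two readings differ in the rendered result: the first leaves a space between the ZWSP and x, the second does not.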
This may also affect the second rule:
Otherwise, if the East Asian Width property of both the character before and after the line feed is F, W, or H (not A), and neither side is Hangul, then the segment break is removed.
If I have W LF LF W, should the two LFs be removed by this rule?
It seems to me that removing all consecutive segment breaks together would be easier to implement, so I would propose defining the rules that way if there are no other concerns.
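A rough sketch of what I mean by treating a run of segment breaks as a single unit, in Python. The helper names are hypothetical, and the Hangul check is an approximation based on Unicode character names, since the standard library does not expose the Script property. Replacing a non-removed run with a single space is also a simplification of the later collapsing steps.

```python
import re
import unicodedata

ZWSP = "\u200b"

def is_hangul(ch: str) -> bool:
    # Approximation: treat any character whose name starts with HANGUL as Hangul.
    return unicodedata.name(ch, "").startswith("HANGUL")

def is_wide_non_hangul(ch: str) -> bool:
    # East Asian Width F, W, or H (not A), and not Hangul.
    return (
        ch != ""
        and unicodedata.east_asian_width(ch) in ("F", "W", "H")
        and not is_hangul(ch)
    )

def collapse_break_runs(text: str) -> str:
    """Apply the removal rules once per run of LFs, using the run's outer neighbours."""
    def repl(m: re.Match) -> str:
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if ZWSP in (before, after):
            return ""  # first rule: ZWSP on either side
        if is_wide_non_hangul(before) and is_wide_non_hangul(after):
            return ""  # second rule: wide characters on both sides
        return " "     # otherwise the run becomes a single space
    return re.sub(r"\n+", repl, text)

print(repr(collapse_break_runs("漢\n\n字")))         # both LFs removed
print(repr(collapse_break_runs(ZWSP + "\n\n\nx")))  # all three LFs removed
print(repr(collapse_break_runs("a\n\nb")))          # run becomes one space
```

With this approach an implementation only needs to inspect the characters on either side of the whole run, rather than re-applying the rules after each removal.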