[css-text] Should zero width space break Arabic shaping? #3861
Labels: Closed Rejected as OutOfScope, Commenter Satisfied, css-text-4, i18n-afrlreq, i18n-alreq, i18n-mlreq, i18n-tracker
This is probably more of a Unicode issue than a CSS issue, but quite a few people involved with text layout and i18n are active here, so I am filing it here first to figure out whether we should take it to Unicode.
When writing web-platform-tests/wpt#14673, I misread the Unicode standard and thought that ZERO WIDTH SPACE was supposed to break Arabic shaping, based on a table that said "all spacing characters" do so. But there is a distinction between "spacing characters" and "space characters", and ZERO WIDTH SPACE is one of the latter, not the former.
https://www.unicode.org/Public/UCD/latest/ucd/ArabicShaping.txt gives further details about how each character affects shaping, and classifies ZERO WIDTH SPACE as T (transparent): it neither forces nor breaks shaping, and behaves as if it were not there for shaping purposes.
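To make the classification concrete, here is a minimal Python sketch of how the joining type falls out, assuming the default rule given in the header of ArabicShaping.txt (unlisted code points are T when their General_Category is Mn, Me, or Cf, and U otherwise). The `EXPLICIT_JOINING_TYPE` table is a tiny hand-picked excerpt, and `joining_type` and `joins_across` are hypothetical helper names; this is not how any particular shaping engine is implemented.

```python
import unicodedata

# Hand-picked, deliberately incomplete excerpt of explicit entries from
# ArabicShaping.txt. The real file lists many more code points.
EXPLICIT_JOINING_TYPE = {
    0x0627: "R",  # ARABIC LETTER ALEF (right-joining)
    0x0628: "D",  # ARABIC LETTER BEH (dual-joining)
    0x200C: "U",  # ZERO WIDTH NON-JOINER (non-joining)
    0x200D: "C",  # ZERO WIDTH JOINER (join-causing)
}


def joining_type(ch: str) -> str:
    """Return the joining type of `ch`, using the excerpt plus the default rule."""
    cp = ord(ch)
    if cp in EXPLICIT_JOINING_TYPE:
        return EXPLICIT_JOINING_TYPE[cp]
    # Default for unlisted code points: T for Mn, Me, Cf; U for everything else.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return "T"
    return "U"


def joins_across(prev: str, between: str, nxt: str) -> bool:
    """True if `prev` and `nxt` (logical order) still join despite `between`."""
    if joining_type(prev) not in ("D", "L", "C"):
        return False  # prev cannot connect to a following character
    if joining_type(nxt) not in ("D", "R", "C"):
        return False  # nxt cannot connect to a preceding character
    # Transparent characters are skipped; anything else breaks the join.
    return all(joining_type(ch) == "T" for ch in between)


BEH, ALEF = "\u0628", "\u0627"
for name, sep in [("ZWSP", "\u200b"), ("SPACE", " "), ("ZWNJ", "\u200c")]:
    print(name, joining_type(sep), joins_across(BEH, sep, ALEF))
```

Under these assumptions ZERO WIDTH SPACE (General_Category Cf, not listed explicitly) comes out as T and leaves the join between BEH and ALEF intact, while SPACE and ZERO WIDTH NON-JOINER both come out as U and break it.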
So Unicode has a definite answer as to what is supposed to happen, but several people in the thread about my tests were surprised by that answer (including @behdad, @r12a, and myself), because ZERO WIDTH SPACE is used as a word divider, which suggests it ought to break shaping. @r12a brought up Nastaliq as a reasonable use case, because:
So, what do we collectively think? Is Unicode likely enough to be mistaken that we should raise this issue with them? Is there a known good reason why things are the way they are?