[css-text] Line breaking for ambiguous characters; e.g., U+2010, U+2013 · Issue #4419 · w3c/csswg-drafts

@litherum found that Gecko handles U+2010 very nicely, and I'd like to consider using their idea.

Currently, the line-break property requires:

The following breaks are allowed for normal and loose line breaking if the writing system is Chinese or Japanese, and are otherwise forbidden:
breaks before hyphens:
‐ U+2010, – U+2013, 〜 U+301C, ゠ U+30A0
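To make the current rule concrete, here is a rough sketch of the quoted condition as a predicate. It is only an illustration: the function name and the `writing_system` labels ("zh", "ja", "en") are hypothetical and stand in for whatever a UA derives from the content language.

```python
# Hypothetical sketch of the quoted css-text rule: a break before one of these
# hyphens is allowed only for line-break: normal/loose, and only when the
# writing system (derived from the content language) is Chinese or Japanese.
HYPHENS_BREAK_BEFORE = {"\u2010", "\u2013", "\u301C", "\u30A0"}


def rule_allows_break_before(ch: str, line_break: str, writing_system: str) -> bool:
    """Return True if this rule allows a break before `ch`.

    `line_break` is the CSS line-break value ("strict", "normal", "loose", ...).
    `writing_system` is an assumed label such as "zh", "ja", or "en".
    Characters outside the listed hyphens are not covered by this rule,
    so the sketch simply returns False for them.
    """
    if ch not in HYPHENS_BREAK_BEFORE:
        return False
    return line_break in ("normal", "loose") and writing_system in ("zh", "ja")
```

Because the decision depends only on the content language, an English word such as `e‐mail` (with U+2010) inside `lang=ja` text would get a break opportunity before the hyphen.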

U+2010 and U+2013 are unified code points (shared by CJK and Latin text), so this rule may affect English words in undesired ways. I'm not sure whether this is intentional, but Gecko allows these breaks only when the hyphen follows Japanese characters, not when it follows Latin letters, regardless of the content language.

jsbin test

This looks like a very good idea to me. It may not apply to all cases, but at least these two code points (a) are unified and ambiguous, and (b) prohibit breaks before them, so looking at the preceding character makes sense to me.
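The previous-character idea could be approximated roughly as follows. This is a hypothetical sketch, not Gecko's actual code: the "Japanese character" test uses a short, assumed list of Unicode blocks (a real engine would use script properties), and keeping the existing normal/loose condition is also an assumption; only the writing-system test is replaced by a check on the preceding character.

```python
# Hypothetical approximation of the behavior described for Gecko: allow a break
# before U+2010 / U+2013 only when the preceding character is Japanese,
# regardless of the content language.
AMBIGUOUS_HYPHENS = {"\u2010", "\u2013"}

# Assumed, simplified ranges standing in for "Japanese characters".
JAPANESE_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def is_japanese(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def break_before_allowed(text: str, i: int, line_break: str = "normal") -> bool:
    """Return True if the heuristic allows a break before text[i]."""
    if i == 0 or text[i] not in AMBIGUOUS_HYPHENS:
        return False
    # Assumption: the existing normal/loose condition still applies.
    if line_break not in ("normal", "loose"):
        return False
    # Decide from the preceding character instead of the content language.
    return is_japanese(text[i - 1])


def break_positions(text: str, line_break: str = "normal") -> list[int]:
    """Indices i where a break before text[i] is allowed by this heuristic."""
    return [i for i in range(len(text)) if break_before_allowed(text, i, line_break)]
```

With this approach, `break_positions("日本語\u2010テスト")` yields `[3]`, while `break_positions("e\u2010mail")` yields `[]`, even if both strings appear in the same `lang=ja` element.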

Note that the jsbin test renders U+2010 and U+2013 in common CJK fonts; the fonts appear to disagree about which of these code points get a full-width CJK glyph and which get a Latin glyph.

Thoughts?

/cc @fantasai @frivoal @emilio @jfkthame @drott