BIDI in Vertical Context

by fantasai

This page has been copied to http://fantasai.inkedblade.net/style/discuss/vertical-bidi. Also, a more detailed discussion of this topic can be found at http://www.unicode.org/notes/tn22/.

Comments on CSS3 refer to the last working draft of the CSS3 Text module.

There are two major shortcomings in CSS3 Text's handling of vertical inline progression. The first is that the interaction with the BIDI algorithm is poorly and incorrectly expressed. The second is that it cannot handle what I've found to be the most common style of incorporating horizontal scripts into vertical layout -- that is, having the horizontal script read top to bottom regardless of its inherent direction -- without creating a mess of bidi overrides and unnecessary markup.

The BIDI Problem

'China' diagram

Suppose I have the following sequence:

  [start] CHINA (zhong1 guo2) [end] 

If I render it horizontally, everything's fine. [diagram, top]

Now, I decide to lay this out vertically, in a left-to-right block progression. (Left-to-right text goes from bottom to top in a left-to-right block progression.) The block's 'direction' is 'ltr', and all characters are L, so there's no reordering before 'auto' rotation takes effect. I get [A], which is wrong, because going from top to bottom it should read "zhong1 guo2" as in [B], not "guo2 zhong1".

I decide, well, I'd rather have all the characters upright. So I apply "glyph-orientation-vertical: 0deg", which forces the Latin upright. Again, 'direction' is 'ltr', and all characters are L, so there is no reordering. I get [C], which isn't quite what I wanted [D]. I could apply a BIDI override, of course, to make 'direction' and all the characters go right-to-left. However, if the block progression doesn't stay left-to-right (which it won't in most browsers due to lack of support, or maybe a vagrant user stylesheet), I'll get [E], which is just useless.

Conclusion:

To make upright vertical text go in the correct direction, all upright characters must behave as left-to-right characters in an 'rl' block progression and as right-to-left characters in an 'lr' block progression.
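
As a rough illustration of this conclusion, the Python sketch below models which way an inline run reads in a vertical block. The function names and the list-of-units model are hypothetical and are not part of CSS3 Text. It also assumes, by symmetry with the cases described above, that right-to-left text in an 'rl' block progression runs bottom to top.

```python
def upright_direction(block_progression):
    """Direction that upright characters should take, per the conclusion above."""
    if block_progression == "rl":
        return "ltr"
    if block_progression == "lr":
        return "rtl"
    raise ValueError("block_progression must be 'rl' or 'lr'")


def top_to_bottom_order(units, block_progression, direction):
    """Return the units in the order a reader meets them going top to bottom.

    Assumption: 'ltr' in an 'rl' block and 'rtl' in an 'lr' block run
    downward; the other two combinations run upward.
    """
    if block_progression not in ("rl", "lr"):
        raise ValueError("block_progression must be 'rl' or 'lr'")
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    runs_downward = (block_progression, direction) in {("rl", "ltr"), ("lr", "rtl")}
    return list(units) if runs_downward else list(reversed(units))


pinyin = ["zhong1", "guo2"]
# Plain 'ltr' in an 'lr' block reads in the wrong order from the top.
wrong = top_to_bottom_order(pinyin, "lr", "ltr")
# Treating the upright run as right-to-left restores the intended order.
fixed = top_to_bottom_order(pinyin, "lr", upright_direction("lr"))
```

In this model, `wrong` comes out in the "guo2 zhong1" order that the example complains about, while `fixed` keeps the logical order.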

Vertical Writing Styles

Scripts can be classified into three categories:

Vertical scripts don't read bottom to top. Similarly, most horizontal scripts don't read backwards. (Since CSS doesn't support scripts that alternate directions, we can assume this is the case for all scripts.)

Given a block progression, there are three ways of orienting scripts that don't match the block's orientation:

natural
where the text orients itself wrt the block progression. For example, English text in a left-to-right block progression naturally reads bottom-to-top. (Think table headers.)
Examples:
context
where the text takes on the inline progression of the containing block. For example, Latin text in Mongolian will often read top-to-bottom (rotated 90deg clockwise), even though its natural direction for left-to-right block progression is from bottom-to-top.
Examples:
upright
where the text takes on the inline progression of the containing block but also forces the glyphs to be set upright. You see this in motel signs, book covers, and the like. (It's not used with cursive scripts afaik.)
Examples:

The "Natural" Orientation Style

Text is laid out with respect to the block progression. Horizontal scripts simply behave as if the 'before' edge was the top edge, and orient and reorder their glyphs accordingly. Vertical scripts are laid out from top to bottom, with the top edge of each glyph towards the top of the block.

BIDI reordering is applied to all text. However, all directional characters in vertical scripts are treated as

  • L (left-to-right) if 'block-progression' is 'rl'
  • R (right-to-left) if 'block-progression' is 'lr'
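
One possible reading of this rule, sketched in Python. The function name and the `in_vertical_script` flag are hypothetical; the standard library has no script lookup, so the caller is assumed to know whether a character belongs to a vertical script. "Directional characters" is interpreted here as characters with a strong bidi class (L, R, or AL), which is an assumption.

```python
import unicodedata

STRONG_CLASSES = {"L", "R", "AL"}


def natural_bidi_class(char, in_vertical_script, block_progression):
    """Bidi class to feed into reordering under the "natural" style."""
    if block_progression not in ("rl", "lr"):
        raise ValueError("block_progression must be 'rl' or 'lr'")
    inherent = unicodedata.bidirectional(char)
    # Characters outside vertical scripts, and non-strong characters,
    # keep their inherent Unicode bidi class.
    if not in_vertical_script or inherent not in STRONG_CLASSES:
        return inherent
    # Strong characters in vertical scripts are overridden by block progression.
    return "L" if block_progression == "rl" else "R"
```

For example, a Han character passed with `in_vertical_script=True` resolves to R in an 'lr' block progression, while a Hebrew letter passed with `in_vertical_script=False` keeps its inherent class.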

If the element's dominant script is HAN, HIRAGANA, KATAKANA, HANGUL, BOPOMOFO, or MONGOLIAN, then any available vertical glyph variants should be used for punctuation characters. Otherwise, horizontal punctuation glyphs should be used, rotated so the top edge faces the 'before' edge of the block.
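
The punctuation rule could be expressed roughly as follows. This is only a sketch: the function name, the string return values, and the `has_vertical_variant` flag are hypothetical. The text does not say what happens when the dominant script is vertical but the font has no vertical variant; falling back to a rotated horizontal glyph is an assumption.

```python
VERTICAL_DOMINANT_SCRIPTS = frozenset(
    {"HAN", "HIRAGANA", "KATAKANA", "HANGUL", "BOPOMOFO", "MONGOLIAN"}
)


def punctuation_glyph_choice(dominant_script, has_vertical_variant):
    """Pick which kind of punctuation glyph to use for an element."""
    if dominant_script.upper() in VERTICAL_DOMINANT_SCRIPTS and has_vertical_variant:
        return "vertical-variant"
    # Assumed fallback: a horizontal glyph rotated toward the 'before' edge.
    return "rotated-horizontal"
```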

The "Context" Orientation Style

Text is laid out from top to bottom regardless of inherent direction. The BIDI algorithm is applied. However, reordering is not applied to "context"-styled text. Instead,

  • glyphs for all characters in even embedding levels are rotated 90 degrees clockwise
  • glyphs for all characters in odd embedding levels are rotated 90 degrees counter-clockwise
  • all vertical script glyphs are oriented with their bottoms toward the bottom of the block
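
The three rotation rules above can be sketched as a single function. The name and the `is_vertical_script` flag are hypothetical, and the return value uses a convention chosen for this sketch: degrees clockwise, with negative values meaning counter-clockwise.

```python
def context_glyph_rotation(embedding_level, is_vertical_script):
    """Rotation in degrees clockwise for a glyph under the "context" style."""
    if embedding_level < 0:
        raise ValueError("embedding_level must be non-negative")
    if is_vertical_script:
        # Vertical script glyphs stay upright, bottoms toward the block's bottom.
        return 0
    # Even levels rotate clockwise, odd levels counter-clockwise.
    return 90 if embedding_level % 2 == 0 else -90
```

So Latin text at embedding level 0 would be rotated 90 degrees clockwise, and Hebrew text at level 1 would be rotated 90 degrees counter-clockwise, with both running top to bottom.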

Also, the boundaries of "context"-styled text have fixed directionality; they are:

  • L (left-to-right) if 'block-progression' is 'rl'
  • R (right-to-left) if 'block-progression' is 'lr'

If the element's dominant script is HAN, HIRAGANA, KATAKANA, HANGUL, BOPOMOFO, or MONGOLIAN, then any available vertical glyph variants are used for punctuation. Otherwise, horizontal punctuation glyphs are used, rotated in the appropriate direction.

The "Upright" Orientation Style

Text is laid out from top to bottom regardless of inherent direction. The BIDI algorithm does not internally affect "upright"-styled text. However, the boundaries of "upright"-styled text have fixed directionality and the entire run of text behaves as if embedded at an infinitely high embedding level when interacting with BIDI reordering applied to surrounding text. The text behaves as if it had

  • an even embedding level and L (left-to-right) directionality at the boundaries if 'block-progression' is 'rl'
  • an odd embedding level and R (right-to-left) directionality at the boundaries if 'block-progression' is 'lr'

All grapheme clusters are oriented with their bottom towards the bottom of the block and laid out each below the previous.
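
A minimal sketch of the stacking step, in Python. The function name is hypothetical, and the cluster detection is a deliberate simplification: it only attaches combining marks to the preceding character and does not implement full Unicode grapheme cluster segmentation.

```python
import unicodedata


def stack_upright(text):
    """Split text into approximate grapheme clusters, listed top to bottom."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # A combining mark stays with the base character above it.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

Each returned item would occupy one position in the column, in logical order, with no reordering applied inside the run.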

Vertical alternates of the glyphs are used. Enclosing punctuation such as parentheses should thus be rotated to face in to the text they enclose, em dashes should be vertical lines, and exclamation points should be upright.

CSS3 Text currently only provides for "natural" styling, really. It can be forced to do "context" or "upright", but getting correct results requires awkward overrides and, in many cases, extra markup. To keep authors from using such overrides and related markup and/or scripting, I propose that CSS3 Text provide for all three orientation styles.

I'd have posted this as a Last Call comment. However, it's taken days of research, reading, and (mostly) thinking to sort this all out.

For a full list of multi-script scans and text direction diagrams, see https://fantasai.tripod.com/www-style/2003/directions

Acknowledgements

Many thanks go to Martin Heijdra for taking the time to explain various scripts, their typing, and their typography, and for letting me borrow books from his private collection. (The Mongolian texts are his.)

Thanks also go to Ian Hickson for offering to host my scans. :)

Thanks also go to the Library! Hurray! The other books were borrowed from Firestone and Gest.