Wide Character Support in NetBSD Curses Library
Name: Ruibiao Qiu
Contact: [email protected]
School: Washington University
Major: Doctoral Candidate, Computer Science and Engineering
Project: Wide Character Support in Curses
Project Page: http://netbsd-soc.sourceforge.net/projects/wcurses/
Mentors: Julian Coleman and Brett Lymn
Mentoring Organization: The NetBSD Project (http://www.netbsd.org/)
The current NetBSD curses library doesn't support wide characters, which limits the use of NetBSD in countries with wide-character locales. The "Wide Character Support in curses" project adds wide-character support to the NetBSD curses library, complying with the X/Open Curses Reference to provide internationalization and localization.
The difficulty of adding wide-character support to NetBSD curses lies in its internal character storage data structure and related functions, which assume an 8-bit character in each display cell. Adding wide-character support means adding a new character storage data structure to hold wide-character information. This structure holds not only the character but also the attributes, including any nonspacing characters associated with the display cell.
The internal character storage data structure adds two linked lists for foreground/background nonspacing characters and uses spare bits in the attribute field for the character width, which are required for multicolumn characters. There is one storage cell per column, but the width fields are set differently for a multicolumn character. For an m-column-wide character, the first cell holds the width of the character, and the other m-1 cells hold the position information in their width fields. This offset is negative, making it easy to detect a cell belonging to a multicolumn character.
To read a wide character from a keyboard, a distinction must be made between a function key sequence and a wide-character sequence. The keymap routines for narrow character input are used to detect function keys, and the stateful wide-character conversion routine mbrtowc() is used to assemble input bytes into a valid wide character.
Some existing narrow character routines have been modified to work with wide characters. The new storage data structure makes screen-refreshing code more complicated because the NetBSD curses library uses a hash function to determine if a screen needs to be refreshed. For wide-character support, the hash function must include the nonspacing characters as well to capture the changes in rendition. Another issue is when a character is added or deleted, a check must be made to detect if that character was part of a multicolumn character. All parts of the multicolumn character are removed in this case.
The modified curses library was tested with three wide-character localesSimplified Chinese, Traditional Chinese, and Japanese. Test results show that twice the memory is generally required to support wide characters.
DDJ