INTERNAL/REVIEW: Displaying 2-Byte Unicode Characters In IDL Graphics Windows
Anonym
[Needs to be reviewed for Compliance and IP issues (i.e. .pro file included)]
Topic:
This Tech Tip shows the code steps required to display 2-byte Unicode characters in IDL "draw windows". It demonstrates approaches in both Direct and Object Graphics. Two downloadable example programs are attached to this Tech Tip, and a shorter code sample is displayed at the bottom, which you can copy and paste into your IDL editor.
It is important to note that Unicode display is available only in IDL Graphics Windows, and only for TrueType fonts that support Unicode encoding. (Most publicly available TrueType fonts do support Unicode encoding.) Widget text display, other than inside a WIDGET_DRAW, is controlled strictly by your operating system, using the fonts that IDL calls "Device" or "Hardware" fonts. IDL is not able to address these fonts with more than 1 byte per character.
The examples in this file demonstrate display on computer monitors, but the same approaches will work with PostScript file creation in IDL or with the Printer Device, if users choose TrueType as their FONT mode. For Object Graphics, Unicode encoding is supported on all destination objects (including windows, buffers, clipboards, and printers) that are used to display text objects using a TrueType font.
Discussion:
The main issue in displaying 2-byte Unicode characters in IDL is getting IDL to recognize that the characters are encoded with a 2-byte index rather than a 1-byte index. Typically, characters use the standard ASCII encoding, with a single byte per character. An IDL string variable can represent these directly, with each character in the string corresponding to a character to be displayed. For Unicode, each character corresponds to a 2-byte index, and a special formatting code is required to tell IDL to honor this type of encoding.
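As a small illustration of the difference between the two encodings, the sketch below prints the 1-byte values stored in an ordinary IDL string and then builds the text form of a 2-byte index. The index 4E2D and the variable names are arbitrary choices made for this illustration; whether a glyph exists at that index depends on the font.

```idl
; One byte per character: BYTE() exposes the values stored in an ASCII string.
ascii_bytes = byte('IDL')
print, ascii_bytes              ; the byte values 73, 68, 76

; A 2-byte index does not fit in one of those slots, so it is written as
; hexadecimal text inside the !Z(...) formatting command instead.
code = '4E2D'x                  ; arbitrary example index (an assumption)
zstr = '!Z(' + string(code, FORMAT='(Z4.4)') + ')'
print, zstr                     ; the literal text !Z(4E2D)
```

PRINT shows the formatting command as plain text. It is only interpreted as a glyph request when the string is drawn with a TrueType font in a graphics window.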
Within IDL, the only FONT mode that is able to interpret a 2-byte mapping is the TrueType FONT mode. IDL code works in this font mode under the following conditions:
In Direct Graphics:
- the value of !p.font is set to 1 (one), or
- the value of the FONT keyword in an IDL plotting command (e.g. PLOT, XYOUTS, etc.) is 1 (one),
- and, in either case, DEVICE, SET_FONT is set to the name of your 2-byte font set
In Object Graphics:
- the 'Fontname' of your IDLgrFont object is a 2-byte TrueType (i.e. non-Hershey) font
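The conditions listed above can be sketched as follows. This is only an outline: it assumes that 'Helvetica', one of the TrueType fonts normally bundled with IDL, is available, and you would substitute the name of your own 2-byte font.

```idl
; Assumption: 'Helvetica' stands in for the name of your TrueType font.
window, XSIZE=300, YSIZE=100

; Direct Graphics, option 1: make TrueType the session-wide font mode.
saved_font = !p.font
!p.font = 1
device, SET_FONT='Helvetica', /TT_FONT
xyouts, 0.5, 0.6, 'Global TrueType', /NORMAL, ALIGNMENT=0.5
!p.font = saved_font            ; put the previous font mode back

; Direct Graphics, option 2: request TrueType for a single call only.
; The font selected with DEVICE, SET_FONT above is still the one used.
xyouts, 0.5, 0.2, 'Per-call TrueType', /NORMAL, ALIGNMENT=0.5, FONT=1

; Object Graphics: the font name is given to the IDLgrFont object itself.
oFont = obj_new('IDLgrFont', 'Helvetica')
obj_destroy, oFont
```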
Choosing a TrueType font gives users access to an embedded format command called !Z, which is not available with Device or Hershey fonts. The !Z format command takes a comma-separated list of 16-bit (2-byte) hexadecimal values as its argument. For each value in the list, it searches the TrueType .ttf resource file for the drawing instructions associated with that 2-byte index. The following commands, for example:
device, set_font='AR PL KaitiM Big5', /tt_font
plot, findgen(10), font=1, $
title="Chinese Nonsense Title !Z(4eff,4e9f,4eaf)", $
charsize=2, color=0, background=255
would produce a plot whose title ends with the three glyphs stored at those indices.
For an even better result, try the TrueType-rendering algorithms implemented in Object Graphics starting with IDL version 6.0:
oFont = obj_new('IDLgrFont', 'AR PL KaitiM Big5')
oText = obj_new('IDLgrText', $
"Chinese Nonsense Title !Z(4e01,4e9f,4eff)", $
font=oFont, /enable_formatting)
xobjview, oText
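When the indices are held in a variable rather than typed into the string, the !Z command can be assembled at run time. The function below is a hypothetical helper written for this illustration (UNICODE_TO_Z is not part of IDL). It assumes each index is a positive integer no larger than 'ffff'x.

```idl
; Hypothetical helper (not part of IDL): turn an array of Unicode
; indices into a single !Z(...) formatting command.
FUNCTION unicode_to_z, codes
  ; One hexadecimal string per index, with padding blanks removed
  hex = strtrim(string(codes, FORMAT='(Z)'), 2)
  ; Join the pieces with commas and wrap them in !Z( )
  RETURN, '!Z(' + strjoin(hex, ',') + ')'
END
```

Once the function is compiled, a call such as title = 'Chinese Nonsense Title ' + unicode_to_z(['4e01'x, '4e9f'x, '4eff'x]) builds a string that can be passed to PLOT, XYOUTS, or IDLgrText in the same way as a hand-written one.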
IDL Widgets do not have access to any font other than the Operating System device fonts, and are coded in such a way that they can only create strings of 1-byte ASCII characters. That is, string arguments used in IDL widget calls can only map to the first 256 chars in any operating system device fontset. Hershey fonts are not relevant for Unicode characters, because IDL has not packaged any Hershey font tables that have more than 128 elements. For this reason, TrueType fonts are the user's only option for Unicode mapping, and IDL "draw windows" are the only place where they can be displayed.
The example code shown in this Tech Tip and in the downloadable .pro files is based on a Chinese TrueType font set, "AR PL KaitiM Big5", which we downloaded from the Free Software Foundation's www.gnu.org website. We have provided a link to that download page for those who want to practice. The above-named font is in the 'bkai00mp.ttf.gz' file.
On Windows, many users will probably find TrueType fonts already installed that include Unicode character mappings beyond index 255 (the end of the 1-byte range). Windows has a utility at 'Start->Program Files->Accessories->System Tools->Character Map' that displays all the fontsets in the '...\WINDOWS\Fonts\' (or '...\WINNT\Fonts') directory, and any character in any fontset labeled with the "TT" TrueType icon is accessible to IDL. If you scroll to the end of any table in the 'Character Map' utility and highlight a character, the status bar shows the Unicode numeric index of that character.
If you are trying out GNU's 'AR PL KaitiM Big5' font file, we recommend that you download and unzip it to IDL's '.../resource/fonts/tt/' directory. If you take this path, you will also have to edit the 'ttfont.map' file in that directory with a text editor. Add the following line to the end of that configuration file:
"AR PL KaitiM Big5" bkai00mp.ttf 1.0 1.0
with 2 tabs after "AR PL KaitiM Big5" and single tabs between the other columns. Windows users could alternatively install this in their OS \Fonts\ directory, where this fontset could be viewed with the 'Character Map' utility. No modification to IDL's 'ttfont.map' file would be required in that case. (You might need the 'Character Map' utility just to help you find the name identifier for the font; it does not appear to be mentioned in the README file at this GNU site.)
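After either installation route, one way to check from within IDL that the font is visible is to ask the graphics device for its list of TrueType font names. The sketch below assumes that DEVICE accepts the GET_FONTNAMES keyword together with SET_FONT='*' and /TT_FONT on your platform, and that a graphics device is available in the session. The variable names are illustrative.

```idl
; Ask the current graphics device for every TrueType font name it knows.
device, SET_FONT='*', GET_FONTNAMES=tt_names, /TT_FONT

; Look for the KaitiM font, ignoring differences in letter case.
hits = where(strmatch(tt_names, '*KaitiM*', /FOLD_CASE), n_hits)
if n_hits gt 0 then print, tt_names[hits] $
else print, 'No matching TrueType font found'
```

If the name is not listed, recheck the 'ttfont.map' entry (including its tab separators) or the OS font installation before running the examples.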
The example code displayed below shows how IDL might display a single Unicode character in a plot window in IDL Direct Graphics. The downloadable files are similar, but each provides a viewer for multiple Unicode characters that can be invoked with just one IDL command. One of them demonstrates Unicode in IDL Object Graphics; the other is its Direct Graphics equivalent. Syntax instructions for those procedures are provided in the header comments of the files.
GNU Font Downloads ex_obj_grafx_unicode.pro ex_direct_grafx_unicode.pro
Solution:
; File: display_unicode_char.pro
; Syntax: e.g. DISPLAY_UNICODE_CHAR, 'AR PL KaitiM Big5', '9eff'x
; The above call would get its glyphs (i.e. drawing instructions) from
; the 'bkai00mp.ttf' file, and display the character with code number
; 0x9eff (= 40703L) in the center of a 100x100 IDL direct graphics
; window.
PRO display_unicode_char, fontname, unicode_id
  ; Validate the arguments. The short-circuit || operator (IDL 6.0 and
  ; later) stops at the first failed test, so LONG() is never applied
  ; to an undefined argument.
  if n_params() ne 2 $
    || size(fontname, /TYPE) ne 7 $ ; 7 = String
    || long(unicode_id) gt 'ffff'xL || long(unicode_id) lt 1L $ ; 2-byte unsigned range
  then begin
    result = dialog_message(["Usage:", "DISPLAY_UNICODE_CHAR, " + $
      "'fontname', unicode_short_int_value"], /ERROR)
    return
  endif
  ; Don't interrupt user's current IDL session state
  ; Set the device to TrueType fonts
  oldFontType = !p.font
  if oldFontType eq 1 then device, GET_CURRENT_FONT=oldFont $
  else !p.font = 1
  ; Set the font
  device, SET_FONT=fontname, /TT_FONT
  old_x_ch_size = !d.x_ch_size
  old_y_ch_size = !d.y_ch_size
  ; 32x40 pixels is just a guess at a valid ratio
  device, SET_CHARACTER_SIZE=[32,40]
  ; Hexadecimal text form of the index, e.g. '9EFF'
  index_string = strtrim(string(unicode_id, FORMAT='(Z)'), 2)
  window, XSIZE=100, YSIZE=100, TITLE='Unicode index ' + $
    index_string + ' in Direct Graphics'
  ; Below are the critical steps required to use Unicode chars in IDL.
  ; The embedded format command '!Z' ensures that IDL sees 'unicode_id'
  ; as a 2-byte Unicode char. STRING(..., FORMAT='(Z)') above ensures
  ; that 'unicode_id' is translated into its hexadecimal text form.
  str = '!Z(' + index_string + ')'
  xyouts, 0.5, 0.5, str, ALIGNMENT=0.5, /NORMAL
  ; Restore previous IDL session state
  device, SET_CHARACTER_SIZE=[old_x_ch_size,old_y_ch_size]
  if oldFontType eq 1 then device, SET_FONT=oldFont, /TT_FONT $
  else !p.font = oldFontType
END