Dr. Dobb's is part of the Informa Tech Division of Informa PLC




A Simple HTML Display Class


August 2000



Many programs use a multi-line edit control to display information, such as copyright and license information. For a recent project, I wanted to be able to use color and styles (bold, italics) in the text and also supply hypertext links that the user could click on to go to my company’s website. The rich edit control that comes with Windows can do all this, but the complexity of the rich edit interface makes it a chore to even use basic text formatting, let alone hypertext links. I therefore designed SimpleHTMLDisplay, a straight C++ (no MFC required) class that adds basic HTML handling capabilities to a Win32 rich edit control.

Rich edit controls come in several flavors, and SimpleHTMLDisplay supports them all. To use the hypertext link capabilities, you need at least Rich Edit 2.0 (riched20.dll). Rich Edit 1.0 (riched32.dll) supports the style and color tags, but not hypertext links. SimpleHTMLDisplay also works with plain vanilla edit controls: it displays only the plain text and strips out any HTML tags.

The source code for SimpleHTMLDisplay is in simphtml.hpp (Listing 1) and simphtml.cpp (Listing 2). Complete source code is in this month’s code archive.

The Constructor

The SimpleHTMLDisplay constructor is the only function in this class that you call directly, passing it the window handle of the rich edit control. The constructor uses SetWindowLong() to subclass the control and stores a pointer to its data structure in a window property tag using SetProp(). It also queries the control’s window class information to determine exactly what type of edit control has been supplied.

The Subclass Procedure

SubProc() is the replacement window procedure. It uses GetProp() to retrieve the address of the SimpleHTMLDisplay data structure. It then looks for and intercepts several different messages. The most important of these is WM_SETTEXT. When this message arrives, SubProc() passes the text string to SimpleHTMLDisplay::PrintText(), which performs the actual HTML parsing. SubProc() also intercepts WM_DESTROY and deletes the SimpleHTMLDisplay data structure at that point. This means you don’t have to keep a pointer to the SimpleHTMLDisplay class and free it yourself — it is automatically disposed of when the rich edit control is destroyed.

SimpleHTMLDisplay uses two custom messages that are also handled in SubProc(): SHDMSG_GETURL and SHDMSG_HIGHLIGHTURL. They are described later. Finally, SubProc() uses CallWindowProc() to call the original window procedure for the edit control.

Text Processing

SimpleHTMLDisplay::PrintText() handles most of the HTML interpretation. First, PrintText() clears the control’s current contents; if a NULL string was supplied in the WM_SETTEXT message, PrintText() returns at this point. Note that the existing text is cleared by passing a WM_SETTEXT message directly to the original window procedure with CallWindowProc(). This is very important: if I used SendMessage() to send WM_SETTEXT, the message would be routed straight back to SubProc(), which would intercept it and call PrintText() again, and so on, resulting in an infinite loop.

PrintText() then initializes a CHARFORMAT2 structure. This is used to set the text color and style within the rich edit control. PrintText() initializes the cbSize field of the CHARFORMAT2 structure differently depending on the version of the control. If a Rich Edit 1.0 control is in use, it is set to the size of the old CHARFORMAT structure.

PrintText() treats the rich edit control as a kind of sequential output device. The user-supplied string is “printed” to the control in stages, using the EM_REPLACESEL message. The function iterates through the incoming text string, looking for embedded HTML tags, and uses the iStart variable to keep track of the start of the last block of plain text.

When PrintText() finds a left angle bracket, it prints to the control any plain text that has been encountered (from iStart to the current position). Because EM_REPLACESEL doesn’t let you specify the length of the string, I have to allocate a temporary buffer to copy the plain-text substring into before printing it to the control. I also increment a count in the variable iInTag, to track whether I am within a pair of angle brackets. A weakness in the current implementation is that a lone left angle bracket that is not part of a tag is not handled correctly; however, this could easily be rectified.
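A minimal, portable sketch of this kind of sequential scan follows. It is not the code from Listing 2: it collects plain-text runs and tag bodies into a list instead of printing them, it searches for the next right angle bracket rather than keeping an iInTag-style counter, and it treats a left angle bracket with no closing bracket as literal text, which is one possible fix for the weakness just mentioned. The names Token and Tokenize are hypothetical.

```cpp
#include <string>
#include <vector>

// One piece of the input: either a run of plain text or the inside of a tag.
struct Token {
    bool isTag;        // true for "<...>", false for plain text
    std::string text;  // tag body without the brackets, or the plain text itself
};

// Splits html into plain-text runs and tags, in the spirit of the iStart scan.
// A '<' with no later '>' is treated as literal text (assumed fix).
std::vector<Token> Tokenize(const std::string& html) {
    std::vector<Token> out;
    std::string::size_type start = 0;  // start of the pending plain-text run (like iStart)
    std::string::size_type i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::string::size_type close = html.find('>', i + 1);
            if (close == std::string::npos) break;  // lone '<': the rest is plain text
            if (i > start)                          // flush text seen before the tag
                out.push_back({false, html.substr(start, i - start)});
            out.push_back({true, html.substr(i + 1, close - i - 1)});
            i = close + 1;
            start = i;  // the next plain-text run begins after '>'
        } else {
            ++i;
        }
    }
    if (start < html.size())  // trailing text, including any lone '<'
        out.push_back({false, html.substr(start)});
    return out;
}
```

In the real class, each plain-text token would be printed with EM_REPLACESEL and each tag would update the CHARFORMAT2 structure.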

PrintText() replaces the line break (<br>) and paragraph tags (<p>) with a CR/LF pair. These are the only tags that are supported by a plain edit control. If the control is a rich edit control, PrintText() then looks for the style tags for bold, underline, italic, and font. If it finds a match, PrintText() sets the appropriate flags in the CHARFORMAT2 structure. For example, for <b>, the CFM_BOLD and CFE_BOLD flags are set to turn on bold style. The <font> tag is slightly more complex and is handled by a separate function (HandleFont()) described later.
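The tag-to-style mapping can be sketched as follows, assuming tag names arrive in lower case and without the brackets. The flag constants and the CharFormat struct are hypothetical stand-ins for the CFM_/CFE_ values in richedit.h and for the dwMask/dwEffects fields of CHARFORMAT2; turning a style off on the matching closing tag is likewise an assumption about how end tags are handled.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the CFM_/CFE_ flags; the real values are in richedit.h.
enum StyleFlag : std::uint32_t {
    kBold = 1u << 0,
    kItalic = 1u << 1,
    kUnderline = 1u << 2,
};

// Minimal model of the two CHARFORMAT2 fields a style change touches.
struct CharFormat {
    std::uint32_t mask = 0;     // which attributes are being changed (like dwMask)
    std::uint32_t effects = 0;  // on/off state of those attributes (like dwEffects)
};

// Applies one tag to cf. Returns "\r\n" for <br> and <p>, otherwise "".
std::string ApplyTag(const std::string& tag, CharFormat& cf) {
    if (tag == "br" || tag == "p") return "\r\n";  // line-break tags become CR/LF
    bool closing = !tag.empty() && tag[0] == '/';
    std::string name = closing ? tag.substr(1) : tag;
    std::uint32_t flag = 0;
    if (name == "b") flag = kBold;
    else if (name == "i") flag = kItalic;
    else if (name == "u") flag = kUnderline;
    if (flag != 0) {
        cf.mask |= flag;                   // this attribute is being changed...
        if (closing) cf.effects &= ~flag;  // ...to off for </b>, </i>, </u>
        else cf.effects |= flag;           // ...to on for <b>, <i>, <u>
    }
    return "";  // unknown tags are ignored
}
```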

If the control is a Rich Edit 2.0 or later control, PrintText() also looks for the <a> tag to support hypertext links. If it encounters a hypertext link tag, PrintText() extracts the URL and adds it to an internal list using AddLink(). I then set the text styles for a link (CFE_LINK and CFE_UNDERLINE) and set the text color to blue (which I define as URL_NORMALCOLOR).

Rich Edit 2.0 uses the CFE_LINK style to mark a section of text as a hypertext link. When the mouse cursor is moved over text marked with this style, the control generates EN_LINK notification messages. Unfortunately, Rich Edit 2.0 doesn’t offer any way to actually store the URL of the hypertext link and instead supplies only the character position of the hypertext link text in the EN_LINK message. In keeping with the “simple” philosophy, I wanted an easy way to retrieve the URL when I received an EN_LINK message. I therefore turned to another field in the CHARFORMAT2 structure: lcid.

The Windows SDK describes the lcid field as a 32-bit locale identifier and says that “this member has no effect on the text displayed by a rich edit control, but spelling and grammar checkers can use it to deal with language-dependent problems.” In effect, then, it is a 32-bit user data field, which the rich edit control itself makes no use of. I decided to use this field to store a pointer to the URL (or more precisely, to the node returned by AddLink()).

After parsing the HTML tag, PrintText() sends any style changes to the control with the EM_SETCHARFORMAT message. When it finds a right angle bracket, PrintText() decrements iInTag — if it reaches zero, I know that I am no longer within an angle-bracket pair, and the iStart value is updated to mark the beginning of the next block of plain text. The loop then continues until reaching the end of the string.

AddLink()

When PrintText() encounters an <a> tag, it calls AddLink() to add the URL to an internally maintained list. The URL is extracted from the text string, copied into a SimpleHTML_UrlNode structure, and then linked into the list. AddLink() then returns the pointer to the new node, which is stored by PrintText() in the lcid field as described previously.
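One caveat with storing a node pointer in lcid is that an LCID is a 32-bit value, so the trick only works in a 32-bit build. The sketch below is a hypothetical alternative, not the article’s method: it copies each URL into a table and returns a 1-based index, which always fits in 32 bits, with 0 reserved for “not a link”. A std::deque is used because appending to it does not move existing elements, so a string pointer handed out earlier stays valid, much like a node in a linked list. LinkTable, Add(), and Lookup() are illustrative names.

```cpp
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical link table: Add() plays the role of AddLink(), and Lookup()
// plays the role of the final step of RetrieveURL().
class LinkTable {
public:
    // Copies the URL and returns the 32-bit value to store in lcid.
    std::uint32_t Add(const std::string& url) {
        urls_.push_back(url);
        return static_cast<std::uint32_t>(urls_.size());  // 1-based id
    }

    // Turns an lcid value back into the URL string, or nullptr if it names no link.
    const char* Lookup(std::uint32_t id) const {
        if (id == 0 || id > urls_.size()) return nullptr;
        return urls_[id - 1].c_str();
    }

private:
    std::deque<std::string> urls_;  // freed automatically when the table is destroyed
};
```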

Color Changes

HandleFont() handles the <font> tag. It currently only supports color changes, but this could be extended to handle the other aspects of the <font> tag. The function searches for the “color=” substring and extracts the RGB color value using an inline function called hexcharval() to turn the hexadecimal color value into the COLORREF that the rich edit control uses.
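HTML writes colors as #RRGGBB, whereas a COLORREF is packed as 0x00BBGGRR (the layout produced by the RGB() macro), so the bytes have to be reassembled rather than read as a single hex number. The following is a sketch under stated assumptions: HexCharVal() is a guess at what a helper like hexcharval() does, ColorRef and MakeRGB() are hypothetical portable stand-ins for COLORREF and RGB(), and only the lower-case color= form with an optional quote and # is recognized.

```cpp
#include <cstdint>
#include <string>

typedef std::uint32_t ColorRef;  // stand-in for Win32 COLORREF (0x00BBGGRR)

// Same packing as the Win32 RGB() macro: red in the low byte.
inline ColorRef MakeRGB(unsigned r, unsigned g, unsigned b) {
    return r | (g << 8) | (b << 16);
}

// Value of one hex digit, or -1 if c is not a hex digit.
inline int HexCharVal(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Finds color="#RRGGBB" (quote and '#' optional) in a <font ...> tag body.
// Returns false if no valid six-digit value follows "color=".
bool ParseFontColor(const std::string& tag, ColorRef& out) {
    std::string::size_type pos = tag.find("color=");
    if (pos == std::string::npos) return false;
    pos += 6;  // skip "color="
    if (pos < tag.size() && tag[pos] == '"') ++pos;
    if (pos < tag.size() && tag[pos] == '#') ++pos;
    if (pos + 6 > tag.size()) return false;
    unsigned channel[3];  // red, green, blue, in HTML order
    for (int k = 0; k < 3; ++k) {
        int hi = HexCharVal(tag[pos + 2 * k]);
        int lo = HexCharVal(tag[pos + 2 * k + 1]);
        if (hi < 0 || lo < 0) return false;
        channel[k] = static_cast<unsigned>(hi * 16 + lo);
    }
    out = MakeRGB(channel[0], channel[1], channel[2]);  // reorder to 0x00BBGGRR
    return true;
}
```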

Retrieving URL Nodes

When the user moves the mouse over or clicks on a hypertext link, the rich edit control sends an EN_LINK notification to its parent window. The custom message SHDMSG_GETURL is then used to retrieve the URL associated with the hypertext link: the parent passes the ENLINK structure from the notification back to the control as a parameter of SHDMSG_GETURL, and SubProc() hands it to RetrieveURL(). The ENLINK structure contains the character range of the hypertext link in its chrg member, and RetrieveURL() uses the EM_GETCHARFORMAT message to retrieve the value of the lcid field for this range of characters. As described above, this field holds the address of the URL node.

EM_GETCHARFORMAT does not allow you to specify a character range, but instead operates on the current selection. I therefore have to use EM_EXSETSEL to set the selection to the range specified in the EN_LINK message. I also use EM_HIDESELECTION to hide the selection marker, to avoid the selection being highlighted.

After sending the EM_GETCHARFORMAT message, I position the caret at the end of the hypertext link and re-enable the selection marker by sending EM_HIDESELECTION with a FALSE parameter. I then return the address of the actual URL string (not the node), which the caller receives as the result of the SHDMSG_GETURL message.

Highlighting Hypertext Link Text

When SubProc() receives a SHDMSG_HIGHLIGHTURL message, it calls HighlightURL(). Just as in a web browser, the goal is to use a different color for hypertext links that the user has clicked on. This gives the user visible feedback that clicking on the hypertext link actually did something.

HighlightURL() works in the same manner as RetrieveURL(); it first hides the selection marker and then selects the range of text surrounding the hypertext link. It then uses the EM_SETCHARFORMAT message to change the color of the text to either blue or purple (URL_NORMALCOLOR or URL_CLICKCOLOR) depending on the value of the fState flag, which is passed as the wParam parameter for the SHDMSG_HIGHLIGHTURL message.

Destructor

You may notice that the class destructor is not publicly accessible. As described previously, it is not designed to be called by the user, but instead is called by SubProc() when the edit control receives a WM_DESTROY message. The destructor handles freeing the memory used to store the linked list of URLs. It also removes the subclass from the edit control and calls RemoveProp() to remove the window property used to store the data pointer.

An Example Program

The example program in wdjtest.cpp (Listing 3) and wdjtest.rc (Listing 4) shows how to use SimpleHTMLDisplay. Using a rich edit control in a program is not quite as straightforward as using the other Windows common controls. You have to load the rich edit DLL manually, a task complicated by the fact that the Rich Edit 1.0 and Rich Edit 2.0 DLLs have different names. This example program only looks for riched20.dll; however, if you want your program to run on a standard Windows 95 machine (which includes only Rich Edit 1.0), you will need to load riched32.dll if riched20.dll is not present.

An additional complication is that the two versions of the control have different window class names. The Rich Edit 1.0 window class is called “RICHEDIT,” whereas the 2.0 window class is called either “RichEdit20A” or “RichEdit20W” depending on whether the ANSI or Unicode version is used. In a real-world program, you therefore cannot use a resource file (as I have done in this example) to create the rich edit control, but instead will need to create it at runtime using CreateWindowEx(), specifying the appropriate class name. See the Windows SDK for more information about this.
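The DLL fallback and class-name choice can be expressed as a small function. This is a sketch, assuming a hypothetical tryLoad callback that stands in for LoadLibrary() so the logic can be exercised without Windows; in a real program you would pass the resulting class name to CreateWindowEx(). The names RichEditChoice and ChooseRichEdit are likewise illustrative.

```cpp
#include <functional>
#include <string>

// Result of picking a rich edit version at runtime.
struct RichEditChoice {
    int version;            // 2 for riched20.dll, 1 for riched32.dll, 0 if neither loaded
    std::string className;  // window class name to pass to CreateWindowEx()
};

// Prefers Rich Edit 2.0 and falls back to Rich Edit 1.0.
// tryLoad reports whether the named DLL could be loaded.
RichEditChoice ChooseRichEdit(const std::function<bool(const std::string&)>& tryLoad,
                              bool unicode) {
    if (tryLoad("riched20.dll"))
        return {2, unicode ? "RichEdit20W" : "RichEdit20A"};
    if (tryLoad("riched32.dll"))  // e.g. a stock Windows 95 machine
        return {1, "RICHEDIT"};
    return {0, ""};  // no rich edit DLL: caller could fall back to a plain edit control
}
```

Keeping the version number alongside the class name also gives the caller what it needs to decide whether hypertext links are available.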

The example program opens a dialog box, which displays a Rich Edit 2.0 control. (If you do not have “riched20.dll” on your system, it will not run.) As you can see, the dialog box procedure, TestDlgProc(), is quite small. Very little actual work on the user’s part is needed to use SimpleHTMLDisplay.

When the dialog procedure receives a WM_INITDIALOG message, it first creates a new instance of the SimpleHTMLDisplay class, passing it the window handle of the rich edit control. It then calls SetDlgItemText() to set the actual text in the control. (SetDlgItemText() sends a WM_SETTEXT message internally, which SubProc() traps and handles as described before.) WM_CLOSE and WM_COMMAND messages are handled simply to allow the user to close the dialog.

The only other message handled is WM_NOTIFY, and this is used to support hypertext links. (There is one hypertext link in the example text.) At this point, I call HandleURL(), which returns a Boolean value to indicate whether the message was processed or not. This value is returned in DWL_MSGRESULT using SetWindowLong() — this is only necessary in a dialog box. If you were using SimpleHTMLDisplay in a normal window, you would simply return this value as the result of the message. If FALSE is returned, the rich edit control will carry out the default processing for this message. This is useful because it handles things like changing the mouse cursor to a hand image automatically.

HandleURL() examines the NMHDR parameter of the WM_NOTIFY message to see if it is an EN_LINK notification. If so, the NMHDR parameter actually points to an ENLINK structure, which contains information about the link and the action that occurred. If the user clicked the left mouse button on the link, the msg field will be set to WM_LBUTTONDOWN. In this case, I send the rich edit control the custom SHDMSG_GETURL message, passing the address of the ENLINK structure back to it. This message returns zero on failure, or otherwise the address of the URL string. If a valid string is returned, I send SHDMSG_HIGHLIGHTURL with wParam set to TRUE to highlight the hypertext link. I then call ShellExecute() to actually open the URL, passing the URL string as the lpFile parameter. Finally, I send SHDMSG_HIGHLIGHTURL again to remove the highlighting and then return TRUE to indicate that the message was processed.
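The order of operations in HandleURL() can be modeled without any Windows headers. In this sketch the LinkHooks callbacks are hypothetical stand-ins for sending SHDMSG_GETURL, sending SHDMSG_HIGHLIGHTURL, and calling ShellExecute(); returning false when SHDMSG_GETURL fails, so that the control does its default processing, is an assumption, since the text does not say what happens in that case.

```cpp
#include <functional>
#include <string>

// Hypothetical hooks standing in for the Win32 calls described above.
struct LinkHooks {
    std::function<const char*()> getUrl;             // ~ SHDMSG_GETURL (nullptr on failure)
    std::function<void(bool)> highlight;             // ~ SHDMSG_HIGHLIGHTURL, wParam = fState
    std::function<void(const std::string&)> open;    // ~ ShellExecute() with lpFile = url
};

// Mirrors HandleURL() for one EN_LINK notification. Returns true if the
// click was handled, false to let the control do its default processing.
bool HandleLinkClick(bool isLeftButtonDown, const LinkHooks& hooks) {
    if (!isLeftButtonDown) return false;  // e.g. mouse moves: keep default hand cursor
    const char* url = hooks.getUrl();
    if (url == nullptr) return false;     // assumed behavior on failure
    hooks.highlight(true);                // show the "clicked" color while opening
    hooks.open(url);
    hooks.highlight(false);               // then remove the highlighting again
    return true;
}
```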

HandleURL() could be extended to handle other messages; for example, WM_SETCURSOR could be handled to change the mouse cursor to a custom image. (By default, the rich edit control will set it to a hand image when the mouse is over a hypertext link.)

Conclusion

While SimpleHTMLDisplay is far from being a complete HTML parser, it does add a more usable and familiar interface to what is actually a very powerful common control. The rich edit control supports many more effects that have HTML equivalents, so SimpleHTMLDisplay could certainly be enhanced. However, I believe in its current form it provides adequate functionality for most purposes. This code should also be a good example of the things that can be accomplished through window subclassing.

Jonathan Potter lives in Sydney, Australia, and runs his own Windows software development business. He can be reached at [email protected].


