Al's most recent book is MFC Black Book (Coriolis, 1997). He can be contacted at http://www.al-williams.com/.
Remember the movie 2001: A Space Odyssey? One of the most memorable characters in that movie was, of course, "Hal" the computer. Like most movie computers (the exception being the Macintosh in Star Trek IV), Hal was smart and spoke with humans. Well, it isn't quite 2001, but it's getting pretty darned close, and still, real computers don't speak and understand as well as their movie counterparts. However, recent developments have brought a plethora of voice-recognition tools for Windows. Though these tools are mainly for dictating into word processors, as a developer you can certainly think of other applications. Wouldn't it be nice to piggyback your program on an existing voice recognition system? You can, using toolkits such as Dragon Systems' DragonXTools.
DragonXTools visual controls make it possible for you to speech-enable Windows applications with up to 60,000-word dictation using Visual Basic, Visual C++, and other development tools that support VBX/OCX controls or C-callable libraries. Users then use DragonDictate for Windows to enter text, data, and commands into Windows apps simply by speaking. You can distribute the DragonXTools custom controls royalty free. However, you'll need run-time licenses to distribute DragonDictate for Windows.
In this article, I'll show how to use DragonXTools custom controls to add speech recognition to programs. In doing so, I'll use Visual Basic 5 to write a voice-activated autodialer (available electronically; see "Resource Center," page 3). Since the controls are ActiveX controls, however, you can use most any language with them.
How Does It Work?
Even if you do not take special action, DragonDictate still works with your program, although it may not work well, depending on how your program is written. When DragonDictate sees your program running, it scans your menu items and control captions. Since DragonDictate generates its own speech models, it can respond to users speaking your menu items and control captions. Of course, some captions work better than others. Sometimes DragonDictate can't determine an appropriate speech model. Also, if you use custom controls for graphical buttons, DragonDictate can't tell what they mean.
DragonDictate operates in either command or dictate mode. Usually, DragonDictate is in command mode, which lets you speak menu commands or key names. In command mode, you can operate the computer without using a mouse or a keyboard. However, entering text in command mode is a chore, because you have to speak each letter individually. Instead, when you want to enter text you say "dictate mode." From then on, DragonDictate interprets user speech as words until you say "command mode." (One problem is that users must remember to enter dictate mode. A speech-aware program might handle this automatically.)
In short, it isn't strictly necessary to alter your programs in any way to make them work with DragonDictate. However, with DragonXTools, you can get significant benefits. For example, you can set your own pronunciations for words that DragonDictate might misinterpret (or graphics that DragonDictate can't read at all). You can also respond to words with your own actions, use DragonDictate's macro language, and share the sound system with DragonDictate (many sound cards can only listen or talk at one time).
The DragonXTools Toolkit
DragonXTools includes a control that lets you recognize speech and interact with the Dragon engine, and another that converts plain text into speech. The text-to-speech control works about as you would expect: the speech sounds computer-generated, which is not always pleasing.
Using the components isn't difficult and the manual provides examples to help you get started. The examples are for VB, but the manual includes advice on how to use the components in C++, Delphi, and Java as well.
Dragon separates the words it will recognize into vocabularies and groups, and it scans different vocabularies depending on the current situation. The Dragon speech control lets you manage vocabularies and groups: You can create new words and control which groups Dragon examines for speech recognition.
Like all ActiveX controls, the Dragon control has properties, methods, and events. Table 1 is a list of the members used in the VB program I present here. Many functions that you'll need to use require you to access Dragon's scripting language (via the Script property).
Using the controls is straightforward once you get the hang of it. You first have to make sure DragonDictate is running. If it isn't, you can start it before your program proceeds using something like Example 1. Once DragonDictate is running, you have to attach your speech control to the speech recognition engine. You can do this with the Attach property. Then you are ready to begin creating your own vocabularies. You can set the currently active vocabulary and group using the Vocabulary and Group properties. When Dragon recognizes any of the words in your control's group, it sends an event so that you can act on the word.
Problems
Although the tools are generally easy to use, I did find a few things I wish had been different. First, DragonDictate doesn't work well under Windows NT. The Dragon web site (http://www.dragonsystems.com/) has a FAQ about this. You can use Dragon and the tools under NT, but the behavior is quirky. Occasionally, you'll be thrown into another window, for example. Worse, during development, the system frequently crashes and hangs VB. This seemed to have something to do with breakpoints, so I don't expect it would be a problem in the shipped program, but it sure made writing software a chore.
Another thing I thought odd is the way Dragon handles sleeping. If you are like me, you can't really leave your microphone on all the time for DragonDictate to listen. There are phones ringing, dogs barking, and all manner of other noises in my office. Suppose you are dictating text into a program and the phone rings. You can say "go to sleep," which puts Dragon in a dormant state. It still listens, but it doesn't do anything until you say "wake up." The catch is that DragonDictate still notifies your program when it recognizes any of your words, so your program has to know whether Dragon is asleep, and there is no way to ask Dragon. If you want to handle this problem, you have to define your own "go to sleep" and "wake up" commands and track the state yourself. You'll still get the events for other words, but your program will know it should ignore them. The Dragon manual has examples of several ways to do this.
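The do-it-yourself sleep tracking described above can be sketched independently of the Dragon control. The following is a minimal illustration in Python (used here only for brevity; the article's program is VB). The class name SleepGate and its method names are hypothetical and are not part of DragonXTools; the sketch assumes the program has registered its own "go to sleep" and "wake up" words and passes every recognized word through the gate.

```python
class SleepGate:
    """Tracks an application-level asleep flag and filters recognized words.

    Hypothetical helper: the speech engine keeps sending recognition events
    while dormant, so the program remembers the state itself.
    """

    def __init__(self, sleep_phrase="go to sleep", wake_phrase="wake up"):
        self.sleep_phrase = sleep_phrase
        self.wake_phrase = wake_phrase
        self.asleep = False

    def filter(self, word):
        """Return the word if the program should act on it, else None."""
        spoken = word.strip().lower()
        if spoken == self.sleep_phrase:
            self.asleep = True
            return None
        if spoken == self.wake_phrase:
            self.asleep = False
            return None
        # Ordinary words are passed through only while awake.
        return None if self.asleep else word
```

A recognition handler would call filter on each incoming word and act only when the result is not None.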
The Design
I've always thought the voice-activated autodialers on some high-end telephones are a great idea. You just speak a name into the phone and it dials away. (And before you ask, yes, you can get a headset that works with Dragon and your phone.) For the first cut, I tried to make the program understand the name I was saying and I figured I would store the phone numbers in a flat file or database.
As I got the speech part working, I realized I really didn't need a database, since I could store the phone number and name along with the speech model. Of course, you don't want to have to say the name and the number to dial on the phone, right? That defeats the whole purpose. However, Dragon lets you specify alternate pronunciations for words, as I'll show you in a bit. Since users have their own vocabulary, that means individual users also have a private phone book.
Figure 1 shows a completed dialer application. You don't need any special code or controls to handle the three buttons -- Dragon takes care of them automatically. But you do need some special work to make the phone automatically dial when you speak a name.
When you say "add" or click the Add button, the program brings up a simple form that lets you make a new entry. The form has two fields, Name and Number, that you can jump to by simply saying the appropriate word. Also, when the Name field has the focus, the program automatically places Dragon in dictation mode. It is impressive how many common names Dragon correctly interprets.
Saying "number" (or clicking the button with that name) brings up an input box. From here you can just say numbers aloud to dial them. Say "okay" when you are done, or "cancel" to abort. Dialing the phone is easy with an MSCOMM ActiveX control. The dialer assumes you have a Hayes-compatible modem on COM1 (although that's easy to change).
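As a rough illustration of what gets sent to the modem, here is a small Python sketch (Python is used only for brevity; the article's program is VB). The ATV1E0DT prefix and the trailing carriage return are taken from Listing One: V1 asks for verbose result codes, E0 turns off command echo, and DT tone-dials the digits that follow. The function name dial_command and the stripping of punctuation are assumptions added for the example; the listing sends the number exactly as entered.

```python
# Characters kept in the dial string: digits, the tone keys * and #,
# and the comma, which Hayes-compatible modems treat as a pause.
ALLOWED = set("0123456789*#,")


def dial_command(number):
    """Build the Hayes command string that tone-dials the given number."""
    cleaned = "".join(ch for ch in number if ch in ALLOWED)
    if not cleaned:
        raise ValueError("no dialable characters in %r" % (number,))
    # The command is terminated with a carriage return, as in Listing One.
    return "ATV1E0DT" + cleaned + "\r"
```

Writing the returned string to the serial port is then the only modem-specific step.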
Implementation
The trick to this program is setting up word recognition. When you add a new name, the program constructs a string to "teach" Dragon. The string consists of the name, a tab character, and the phone number. However, this would be awkward to pronounce, so the program also appends an opening square bracket, the name alone, and a closing bracket. By placing alternate text in brackets, you are telling Dragon that the text between the brackets is the correct pronunciation for the preceding word. The program then feeds this string to the AddWord method of the speech control. If Dragon can't deduce a speech model for the word, AddWord reports the failure (Listing One checks the return value against EXP_ERR_WORD_HAS_NO_MODEL) and the program lets the user train the word in question. Listing One includes this logic (see the Add_Click subroutine).
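The string format is easy to see in isolation. This is a minimal Python sketch (Python is used only for brevity; the article's program is VB) of the same concatenation Add_Click performs. The function name make_vocab_word is hypothetical, and the check for reserved characters is an addition that the listing does not make.

```python
def make_vocab_word(name, number):
    """Return 'name<TAB>number[name]', the string taught to the recognizer.

    The bracketed suffix carries the pronunciation, so the user speaks
    only the name even though the stored word includes the number.
    """
    for reserved in ("\t", "[", "]"):
        if reserved in name or reserved in number:
            raise ValueError("tab and square brackets are reserved")
    entry = name + "\t" + number      # what the listbox displays
    return entry + "[" + name + "]"   # what the vocabulary stores
```

For example, make_vocab_word("Al Williams", "555-1212") produces the name, a tab, the number, and then [Al Williams].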
Interestingly, the listbox holds names in the same format (but no pronunciation in square brackets). This makes it easy to create a single Dial routine that handles a string from the voice recognition or listbox. It also makes it easy to reconstruct the listbox from the vocabulary data on startup (see the Form_Load subroutine in Listing One).
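Because both sources share one format, a single parser can serve the listbox and the recognizer. The sketch below is in Python (used only for brevity; the article's program is VB) and mirrors the splitting done in the Dial routine of Listing One. The function name parse_entry is hypothetical.

```python
def parse_entry(word):
    """Split 'name<TAB>number' or 'name<TAB>number[pron]' into (name, number)."""
    bracket = word.find("[")
    if bracket != -1:
        word = word[:bracket]          # drop the pronunciation, if present
    name, tab, number = word.partition("\t")
    if not tab:
        raise ValueError("entry has no tab separator: %r" % (word,))
    return name, number
```

A listbox string and a recognized vocabulary word for the same person yield the same pair, which is what lets one Dial routine handle both.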
The DDSpeech1_SpeechRecognized routine handles the voice dialing. The only reason there is more than one line of code in this routine is that I wanted to change the listbox selection to reflect the dialed number. Visual feedback is important when you are dealing with voice command, because voice is not 100 percent accurate. When you delete a name, for example, the program is careful to prompt you before taking action. It might be a good idea to add a similar safeguard to the dialing routine, too.
The form used to add names is available online. There, you can find the code that sets Dragon's mode when each text box receives the focus. This allows users to dictate names without having to explicitly set the dictation mode.
Other Possibilities
Once you have the ability to work with voice commands and dictation, there are many other ways you can make your application more voice friendly. For example, by using the SetHomeGroup script command, you could restrict the phone-number fields to accept only words that make sense for phone numbers. You can also use the DgnTTS control to convert words back to voice (although for simple uses, you might be better off just playing prerecorded wave files).
Although DragonXTools has some problems (poor NT compatibility and a difficult-to-manage sleep mode), it is exciting to watch a program respond to spoken words. Probably the biggest disadvantage is that users must already have one of the Dragon products that provide the actual speech processing. Of course, if you are building a dedicated system, or you are willing to license the product from Dragon, this may not be a problem. Just try to resist the urge to speak into your mouse.
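Restricting the vocabulary inside Dragon is the job of SetHomeGroup. As a different and much cruder technique, a program could also filter on its own side whatever words arrive for a phone-number field. The Python sketch below (Python is used only for brevity; the article's program is VB) shows that application-side filter. The word table and the function name words_to_number are assumptions made for the example and have nothing to do with Dragon's script language.

```python
# Spoken words accepted in a phone-number field (an illustrative table).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_number(words):
    """Return (digits, rejected): the dialable digits and the words ignored."""
    digits = []
    rejected = []
    for word in words:
        key = word.strip().lower()
        if key in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[key])
        else:
            rejected.append(word)   # not meaningful in a phone number
    return "".join(digits), rejected
```

Reporting the rejected words back to the user fits the article's point about visual feedback, since voice input is not perfectly accurate.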
For More Information
Dragon Systems Inc.
320 Nevada Street
Newton, MA 02160
617-965-5200
http://www.dragonsystems.com/
DDJ
Listing One
VERSION 5.00
Object = "{C9F1DD69-49F9-11D0-B5C5-444553540000}#1.0#0"; "dd32.ocx"
Object = "{648A5603-2C6E-101B-82B6-000000000014}#1.1#0"; "MSCOMM32.OCX"
Begin VB.Form MainForm
   Caption         =   "Voice Dialer"
   ClientHeight    =   3195
   ClientLeft      =   60
   ClientTop       =   345
   ClientWidth     =   4680
   LinkTopic       =   "Form1"
   ScaleHeight     =   3195
   ScaleWidth      =   4680
   StartUpPosition =   3  'Windows Default
   Begin VB.CommandButton ManDial
      Caption         =   "Number"
      Height          =   495
      Left            =   120
      TabIndex        =   3
      Top             =   1560
      Width           =   975
   End
   Begin MSCommLib.MSComm MSComm1
      Left            =   720
      Top             =   2520
      _ExtentX        =   1005
      _ExtentY        =   1005
      _Version        =   327680
      DTREnable       =   0   'False
   End
   Begin VB.CommandButton Delete
      Caption         =   "Remove"
      Height          =   495
      Left            =   120
      TabIndex        =   2
      Top             =   840
      Width           =   975
   End
   Begin VB.CommandButton Add
      Caption         =   "Add"
      Height          =   495
      Left            =   120
      TabIndex        =   1
      Top             =   120
      Width           =   975
   End
   Begin VB.ListBox List1
      Height          =   2790
      Left            =   1320
      Sorted          =   -1  'True
      TabIndex        =   0
      Top             =   120
      Width           =   3135
   End
   Begin DDSpeechLib.DDSpeech DDSpeech1
      Left            =   120
      Top             =   2640
      _Version        =   65536
      _ExtentX        =   741
      _ExtentY        =   741
      _StockProps     =   0
   End
End
Attribute VB_Name = "MainForm"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit

Private Sub Add_Click()
    ' Name, number, and generic string
    Dim n As String, nm As String, s As String, word As String
    AddForm.Show vbModal
    If AddForm.Cancelled <> True Then
        n = AddForm.NewName
        nm = AddForm.NewNumber
        s = n & Chr(9) & nm          ' Listbox entry: name (tab) number
        List1.AddItem s
        word = s & "[" & n & "]"     ' Pronunciation goes in brackets
        If DDSpeech1.AddWord("PhBook", "TelNum", word, "'") = EXP_ERR_WORD_HAS_NO_MODEL Then
            DDSpeech1.TrainWord = word   ' Let the user train the word
        End If
        Unload AddForm
    End If
End Sub

' Dial a number in the format of name (tab) number [xxx]
' The brackets, if present at all, are ignored
Sub Dial(ByVal word As String)
    Dim n As Integer
    Dim t0 As Date
    Dim dn As String, nam As String ' Dial number, name
    n = InStr(word, Chr(9))
    dn = Right(word, Len(word) - n)
    If n > 0 Then nam = Left(word, n - 1)   ' Guard: no tab means no name
    n = InStr(dn, "[")
    If n <> 0 Then dn = Left(dn, n - 1)
    MSComm1.PortOpen = True
    MSComm1.Output = "ATV1E0DT" & dn & Chr(13)
    t0 = DateAdd("s", 5, Now)
    Do
        DoEvents
    Loop Until Now > t0 ' Wait 5 seconds
    MSComm1.PortOpen = False
    MsgBox dn, vbOKOnly, "Dialed " & nam
End Sub

' Delete entry
Private Sub Delete_Click()
    Dim n As Integer
    Dim word As String, nam As String
    n = List1.ListIndex
    If n <> -1 Then
        If MsgBox("Delete this entry?", vbYesNo) = vbNo Then Exit Sub
        word = List1.Text
        nam = Left(word, InStr(word, Chr(9)) - 1)
        word = word & "[" & nam & "]"
        ' Delete word from dragon dictionary
        If DDSpeech1.DeleteWord("PhBook", "TelNum", word) Then
            List1.RemoveItem n
        Else
            MsgBox "Can't remove name"
        End If
    Else
        MsgBox "Please select a name first"
    End If
End Sub

' Manual dial a number
Private Sub ManDial_Click()
    Dim nr As String
    nr = InputBox("Enter or say the number to dial")
    If nr <> "" Then Dial ("Manual Dial" & Chr(9) & nr)
End Sub

Private Sub DDSpeech1_SpeechRecognized(word As String, WordValue As String)
    Dim SearchWord As String
    Dim i As Integer
    Dim n As Integer
    ' Find string in listbox so we can highlight it
    n = InStr(word, "[")
    If n > 0 Then SearchWord = Left(word, n - 1) Else SearchWord = word
    List1.ListIndex = -1
    For i = 0 To List1.ListCount - 1
        If SearchWord = List1.List(i) Then
            List1.ListIndex = i
            Exit For
        End If
    Next i
    Dial word ' Do it
End Sub

Private Sub Form_Load()
    Dim s As String
    Dim n As Integer
    ' Start Dragon if not already started
    If Not IsDDWinRunning() Then
        If Not StartDDWin() Then
            MsgBox "Can't start Dragon Dictate", vbExclamation
            End
        End If
    End If
    DDSpeech1.Attach = True
    DDSpeech1.AddVocabulary "PhBook"
    DDSpeech1.AddGroup "PhBook", "TelNum"
    DDSpeech1.Vocabulary = "PhBook"
    DDSpeech1.Group = "TelNum"
    ' Load phone numbers already in vocabulary
    s = DDSpeech1.WordFirst
    Do While s <> ""
        n = InStr(s, "[")
        If n > 0 Then s = Left(s, n - 1)   ' Strip the pronunciation
        List1.AddItem s
        s = DDSpeech1.WordNext
    Loop
End Sub

' Double click for those who are speechless!
Private Sub List1_DblClick()
    Dial List1.Text
End Sub
Copyright © 1998, Dr. Dobb's Journal