Examining the Dragon Speech-Recognition System


Dr. Dobb's Journal July 1998: Examining the Dragon Speech-Recognition System

Al's most recent book is MFC Black Book (Coriolis, 1997). He can be contacted at http://www.al-williams.com/.


Remember the movie 2001: A Space Odyssey? One of the most memorable characters in that movie was, of course, HAL the computer. Like most movie computers (the exception being the Macintosh in Star Trek IV), HAL was smart and spoke with humans. Well, it isn't quite 2001, but it's getting pretty darned close, and still, real computers don't speak and understand as well as their movie counterparts. However, recent developments have brought a plethora of voice-recognition tools for Windows. Though these tools are mainly for dictating into word processors, as a developer you can certainly think of other applications. Wouldn't it be nice to piggyback your program on an existing voice-recognition system? You can, using toolkits such as Dragon Systems' DragonXTools.

DragonXTools visual controls make it possible for you to speech-enable Windows applications with up to 60,000-word dictation using Visual Basic, Visual C++, and other development tools that support VBX/OCX controls or C-callable libraries. Users then use DragonDictate for Windows to enter text, data, and commands into Windows apps simply by speaking. You can distribute the DragonXTools custom controls royalty free. However, you'll need run-time licenses to distribute DragonDictate for Windows.

In this article, I'll show how to use DragonXTools custom controls to add speech recognition to programs. In doing so, I'll use Visual Basic 5 to write a voice-activated autodialer (available electronically; see "Resource Center," page 3). Since the controls are ActiveX controls, however, you can use almost any language with them.

How Does It Work?

Even if you do not take special action, DragonDictate still works with your program, although it may not work well, depending on how your program is written. When DragonDictate sees your program running, it scans your menu items and control captions. Since DragonDictate generates its own speech models, it can respond to users speaking your menu items and control captions. Of course, some captions work better than others. Sometimes DragonDictate can't determine an appropriate speech model. Also, if you use custom controls for graphical buttons, DragonDictate has no caption to read and can't determine what the buttons mean.

DragonDictate operates in either command or dictate mode. Usually, DragonDictate is in command mode, which lets you speak menu commands or key names. In command mode, you can operate the computer without using a mouse or a keyboard. However, entering text in command mode is a chore: you'd have to speak each letter individually. Instead, when you want to enter text you say "dictate mode." From then on, DragonDictate interprets user speech as words until you say the command "command mode." (One problem is that users must remember to enter dictate mode. A speech-aware program might handle this automatically.)

In short, it isn't strictly necessary to alter your programs in any way to make them work with DragonDictate. However, with DragonXTools, you can get significant benefits. For example, you can set your own pronunciations for words that DragonDictate might misinterpret (or graphics that DragonDictate can't read at all). You can also respond to words with your own actions, use DragonDictate's macro language, and share the sound system with DragonDictate (many sound cards can only listen or talk at one time).

The DragonXTools Toolkit

DragonXTools includes a control that lets you recognize speech and interact with the Dragon engine, and another that converts plain text into speech. The text-to-speech control works about as you would expect; the speech sounds computer-generated, which is not always pleasing.

Using the components isn't difficult and the manual provides examples to help you get started. The examples are for VB, but the manual includes advice on how to use the components in C++, Delphi, and Java as well.

Dragon separates words it will recognize into vocabularies and groups. It scans different vocabularies depending on the current situation. The Dragon speech control lets you manage vocabularies and groups. You can create new words, and control which groups Dragon examines for speech recognition.

Like all ActiveX controls, the Dragon control has properties, methods, and events. Table 1 is a list of the members used in the VB program I present here. Many functions that you'll need to use require you to access Dragon's scripting language (via the Script property).

Using the controls is straightforward once you get the hang of it. You first have to make sure DragonDictate is running. If it isn't, you can start it before your program proceeds using something like Example 1. Once DragonDictate is running, you have to attach your speech control to the speech recognition engine. You can do this with the Attach property. Then you are ready to begin creating your own vocabularies. You can set the currently active vocabulary and group using the Vocabulary and Group properties. When Dragon recognizes any of the words in your control's group, it sends an event so that you can act on the word.

Problems

Although the tools are generally easy to use, I did find a few things I wish had been different. First, DragonDictate doesn't work well under Windows NT. The Dragon web site (http://www.dragonsystems.com/) has a FAQ about this. You can use Dragon and the tools under NT, but the behavior is quirky. Occasionally, you'll be thrown into another window, for example. Worse, during development, the system frequently crashes and hangs VB. This seemed to have something to do with breakpoints, so I don't expect it would be a problem in the shipped program, but it sure made writing software a chore.

Another thing I thought odd is the way Dragon handles sleeping. If you are like me, you can't really leave your microphone on all the time for DragonDictate to listen. There are phones ringing, dogs barking, and all manner of other noises in my office. Suppose you are dictating text into a program and the phone rings. You can say "go to sleep," which puts Dragon in a dormant state. It still listens, but it doesn't do anything until you say "wake up." However, DragonDictate still notifies you when it recognizes any of your words. That means you have to know if Dragon is asleep or not. However, you can't ask Dragon if it is asleep. If you want to handle this problem, you have to define your own "go to sleep" and "wake up" commands and do the work yourself. Then you'll still get the events for other words, but your program will know it should be asleep. The Dragon manual has examples of several ways to do this.
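One way to "do the work yourself" is to keep a flag in your program and pass every recognition event through it. The following is only a minimal sketch, written in Python rather than VB so that it stands alone. The class name SleepGate and the assumption that recognized phrases arrive as plain strings are illustrative and are not part of DragonXTools.

```python
class SleepGate:
    """Tracks an application-level asleep/awake flag.

    Hypothetical helper: the recognizer is assumed to hand every
    recognized phrase to handle() as a plain string.
    """

    SLEEP = "go to sleep"
    WAKE = "wake up"

    def __init__(self):
        self.asleep = False

    def handle(self, phrase):
        """Return the phrase if the program should act on it, else None."""
        text = phrase.strip().lower()
        if text == self.SLEEP:
            self.asleep = True
            return None
        if text == self.WAKE:
            self.asleep = False
            return None
        # Events still arrive while asleep; drop them here.
        return None if self.asleep else phrase
```

The event handler would call handle() first and act only on a non-None result, so the rest of the program never needs to ask whether it should be asleep.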

The Design

I've always thought the voice-activated autodialers on some high-end telephones are a great idea. You just speak a name into the phone and it dials away. (And before you ask, yes, you can get a headset that works with Dragon and your phone.) For the first cut, I tried to make the program understand the name I was saying and I figured I would store the phone numbers in a flat file or database.

As I got the speech part working, I realized I really didn't need a database, since I could store the phone number and name along with the speech model. Of course, you don't want to have to say the name and the number to dial on the phone, right? That defeats the whole purpose. However, Dragon lets you specify alternate pronunciations for words, as I'll show you in a bit. Since users have their own vocabulary, that means individual users also have a private phone book.

Figure 1 shows a completed dialer application. You don't need any special code or controls to handle the three buttons -- Dragon takes care of them automatically. But you do need some special work to make the phone automatically dial when you speak a name.

When you say "add" or click the Add button, the program brings up a simple form that lets you make a new entry. The form has two fields, Name and Number, that you can jump to by simply saying the appropriate word. Also, when the Name field has the focus, the program automatically places Dragon in dictation mode. It is impressive how many common names Dragon correctly interprets.

Saying "number" (or clicking the button with that name) brings up an input box. From here you can just say numbers aloud to dial them. Say "okay" when you are done, or "cancel" to abort. Dialing the phone is easy with an MSCOMM ActiveX control. The dialer assumes you have a Hayes-compatible modem on COM1 (although that's easy to change).
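Dialing a Hayes-compatible modem amounts to writing a single AT command string to the serial port. As a rough sketch (in Python so that it is self-contained), the function below builds that string. The function name dial_command and the character-filtering rule are assumptions for illustration; the ATV1E0DT prefix and the trailing carriage return mirror what Listing One sends.

```python
def dial_command(number):
    """Build the AT command that tone-dials `number`.

    Keeps digits, the keypad symbols * and #, and the Hayes pause
    character ","; drops spaces, hyphens, and parentheses.
    """
    allowed = set("0123456789*#,")
    digits = "".join(ch for ch in number if ch in allowed)
    if not digits:
        raise ValueError("no dialable characters in %r" % number)
    # V1 = verbose result codes, E0 = echo off, DT = dial using tones
    return "ATV1E0DT" + digits + "\r"
```

Filtering the number before it reaches the modem matters more with voice input than with typed input, since a misrecognized word could otherwise end up in the command string.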

Implementation

The trick to this program is setting up word recognition. When you add a new name, the program constructs a string to "teach" Dragon. The string consists of the name, a tab character, and the phone number. However, this would be awkward to pronounce, so the program also appends an opening square bracket, the name alone, and a closing bracket. By placing alternate text in brackets, you are telling Dragon that the text between the brackets is the correct pronunciation for the preceding word. The program then feeds this string to the AddWord method of the speech control. If Dragon can't deduce a speech model for the word, AddWord reports this (Listing One checks for the EXP_ERR_WORD_HAS_NO_MODEL result) and the program lets the user train the word in question. Listing One includes this logic (see the Add_Click subroutine).
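The string format is easy to get wrong by one character, so it may help to see it built and taken apart in isolation. This sketch is in Python rather than VB to keep it self-contained; the helper names make_word and parse_word and the sample name and number are made up for illustration.

```python
TAB = "\t"

def make_word(name, number):
    """Return 'name<TAB>number[name]', the form described above."""
    return name + TAB + number + "[" + name + "]"

def parse_word(word):
    """Split a vocabulary word back into (name, number).

    The bracketed pronunciation, if present, is ignored.
    """
    entry = word.split("[", 1)[0]
    name, _, number = entry.partition(TAB)
    return name, number
```

For example, make_word("Pat", "555-0100") yields the name, a tab, the number, and then "[Pat]", and parse_word recovers the name and number from either the bracketed or the unbracketed form.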

Interestingly, the listbox holds names in the same format (but no pronunciation in square brackets). This makes it easy to create a single Dial routine that handles a string from the voice recognition or listbox. It also makes it easy to reconstruct the listbox from the vocabulary data on startup (see the Form_Load subroutine in Listing One).

The DDSpeech1_SpeechRecognized routine handles the voice dialing. The only reason there is more than one line of code in this routine is that I wanted to change the listbox selection to reflect the dialed number. Visual feedback is important when you are dealing with voice command, because voice is not 100 percent accurate. When you delete a name, for example, the program is careful to prompt you before taking action. It might be a good idea to add a similar safeguard to the dialing routine, too.
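A safeguard of that kind could look roughly like the following. This is a sketch only, in Python for self-containment, and not the code from Listing One: the function name handle_recognized and the confirm and dial callbacks are hypothetical stand-ins for a message box and the Dial routine.

```python
def handle_recognized(word, entries, confirm, dial):
    """Highlight-and-dial logic with a confirmation step.

    word    -- recognized string, 'name<TAB>number[name]'
    entries -- list of 'name<TAB>number' strings (the listbox contents)
    confirm -- callable(entry) -> bool, asks the user before dialing
    dial    -- callable(entry), places the call
    Returns the index of the matching entry, or -1 if none matched.
    """
    # Drop the bracketed pronunciation so the text matches the listbox
    entry = word.split("[", 1)[0]
    index = entries.index(entry) if entry in entries else -1
    if confirm(entry):
        dial(entry)
    return index
```

The caller would use the returned index to update the listbox selection, so the user sees which entry was recognized whether or not the call goes ahead.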

The form used to add names is available online. There, you can find the code that sets Dragon's mode when each text box receives the focus. This allows users to dictate names without having to explicitly set the dictation mode.

Other Possibilities

Once you have the ability to work with voice commands and dictation, there are many other ways you can make your application more voice friendly. For example, by using the SetHomeGroup script command, you could restrict the phone-number fields to accept only words that make sense for phone numbers. You can also use DgnTTS control to convert words back to voice (although for simple uses, you might be better off just playing prerecorded wave files).
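Restricting the vocabulary is Dragon's job, but a program can also check on its own side that what it received makes sense as a phone number. The sketch below illustrates that application-side check; it is not the SetHomeGroup mechanism, and the names DIGIT_WORDS and words_to_number are made up for this example. It is written in Python to keep it self-contained.

```python
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

def words_to_number(spoken):
    """Convert a run of spoken digit names to a digit string.

    Raises ValueError on any word that is neither a digit name nor
    already numeric, so stray words never reach the dialer.
    """
    digits = []
    for word in spoken.lower().split():
        if word.isdigit():
            digits.append(word)
        elif word in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[word])
        else:
            raise ValueError("not a digit word: %r" % word)
    return "".join(digits)
```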

Although DragonXTools has some problems (poor NT compatibility and a difficult-to-manage sleep mode), it is exciting to watch a program respond to spoken words. Probably the biggest disadvantage is that users must already have one of the Dragon products that provide the actual speech processing. Of course, if you are building a dedicated system, or you are willing to license the product from Dragon, this may not be a problem. Just try to resist the urge to speak into your mouse.

For More Information

Dragon Systems Inc.
320 Nevada Street
Newton, MA 02160
617-965-5200
http://www.dragonsystems.com/

DDJ

Listing One

VERSION 5.00
Object = "{C9F1DD69-49F9-11D0-B5C5-444553540000}#1.0#0"; "dd32.ocx"
Object = "{648A5603-2C6E-101B-82B6-000000000014}#1.1#0"; "MSCOMM32.OCX"
Begin VB.Form MainForm 
   Caption         =   "Voice Dialer"
   ClientHeight    =   3195
   ClientLeft      =   60
   ClientTop       =   345
   ClientWidth     =   4680
   LinkTopic       =   "Form1"
   ScaleHeight     =   3195
   ScaleWidth      =   4680
   StartUpPosition =   3  'Windows Default
   Begin VB.CommandButton ManDial 
      Caption         =   "Number"
      Height          =   495
      Left            =   120
      TabIndex        =   3
      Top             =   1560
      Width           =   975
   End
   Begin MSCommLib.MSComm MSComm1 
      Left            =   720
      Top             =   2520
      _ExtentX        =   1005
      _ExtentY        =   1005
      _Version        =   327680
      DTREnable       =   0   'False
   End
   Begin VB.CommandButton Delete 
      Caption         =   "Remove"
      Height          =   495
      Left            =   120
      TabIndex        =   2
      Top             =   840
      Width           =   975
   End
   Begin VB.CommandButton Add
      Caption         =   "Add"
      Height          =   495
      Left            =   120
      TabIndex        =   1
      Top             =   120
      Width           =   975
   End
   Begin VB.ListBox List1 
      Height          =   2790
      Left            =   1320
      Sorted          =   -1  'True
      TabIndex        =   0
      Top             =   120
      Width           =   3135
   End
   Begin DDSpeechLib.DDSpeech DDSpeech1 
      Left            =   120
      Top             =   2640
      _Version        =   65536
      _ExtentX        =   741
      _ExtentY        =   741
      _StockProps     =   0
   End
End
Attribute VB_Name = "MainForm"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Option Explicit


Private Sub Add_Click()
' Name, number, and generic string
Dim n As String, nm As String, s As String, word As String
AddForm.Show vbModal
If AddForm.Cancelled <> True Then
  n = AddForm.NewName
  nm = AddForm.NewNumber
  s = n & Chr(9) & nm
  List1.AddItem s
  word = s & "[" & n & "]"
  ' If Dragon has no speech model for the new word, let the user train it
  If DDSpeech1.AddWord("PhBook", "TelNum", word, "'") _
                                       = EXP_ERR_WORD_HAS_NO_MODEL Then
    DDSpeech1.TrainWord = word
  End If
Unload AddForm
End If
End Sub


' Dial a number in the format of name (tab) number [xxx]
' The brackets, if present at all, are ignored
Sub Dial(ByVal word As String)
Dim n As Integer
Dim t0 As Date
Dim dn As String, nam As String  ' Dial number, name
n = InStr(word, Chr(9))
dn = Right(word, Len(word) - n)
nam = Left(word, n - 1)
n = InStr(dn, "[")
If n <> 0 Then dn = Left(dn, n - 1)
MSComm1.PortOpen = True
MSComm1.Output = "ATV1E0DT" & dn & Chr(13)
t0 = DateAdd("s", 5, Now)
Do
    DoEvents
Loop Until Now > t0  ' Wait 5 seconds
MSComm1.PortOpen = False
MsgBox dn, vbOKOnly, "Dialed " & nam
End Sub


'Delete Entry
Private Sub Delete_Click()
Dim n As Integer
Dim word As String, nam As String
n = List1.ListIndex
If n <> -1 Then
  If MsgBox("Delete this entry?", vbYesNo) = vbNo Then Exit Sub
  word = List1.Text
  nam = Left(word, InStr(word, Chr(9)) - 1)
  word = word & "[" & nam & "]"
' Delete word from dragon dictionary
  If DDSpeech1.DeleteWord("PhBook", "TelNum", word) Then
    List1.RemoveItem n
  Else
    MsgBox "Can't remove name"
  End If
Else
  MsgBox "Please select a name first"
End If
End Sub


' Manual dial a number
Private Sub ManDial_Click()
Dim nr As String
nr = InputBox("Enter or say the number to dial")
If nr <> "" Then Dial ("Manual Dial" & Chr(9) & nr)
End Sub


Private Sub DDSpeech1_SpeechRecognized(word As String, WordValue As String)
Dim SearchWord As String
Dim i As Integer
' Find string in listbox so we can highlight it
SearchWord = Left(word, InStr(word, "[") - 1)
List1.ListIndex = -1
For i = 0 To List1.ListCount - 1
  If SearchWord = List1.List(i) Then
    List1.ListIndex = i
    Exit For
  End If
Next i
Dial word  ' Do it
End Sub


Private Sub Form_Load()
Dim s As String
Dim n As Integer
' Start Dragon if not already started
If Not IsDDWinRunning() Then
  If Not StartDDWin() Then
    MsgBox "Can't start Dragon Dictate", vbExclamation
    End
  End If
End If
DDSpeech1.Attach = True
DDSpeech1.AddVocabulary "PhBook"
DDSpeech1.AddGroup "PhBook", "TelNum"
DDSpeech1.Vocabulary = "PhBook"
DDSpeech1.Group = "TelNum"
' Load phone numbers already in vocabulary
s = DDSpeech1.WordFirst
Do While s <> ""
  n = InStr(s, "[")
  ' Skip any word that lacks the bracketed pronunciation
  If n > 0 Then List1.AddItem Left(s, n - 1)
  s = DDSpeech1.WordNext
Loop
End Sub


' Double click for those who are speechless!
Private Sub List1_DblClick()
Dial List1.Text
End Sub



Copyright © 1998, Dr. Dobb's Journal
