Visual Cryptography
Dear DDJ,
Concerning visual threshold schemes, as discussed by Doug Stinson in his article "Visual Cryptography and Threshold Schemes" (DDJ, April 1998): In the special case of a two-out-of-two visual threshold scheme, there is a simple and obvious scheme that doesn't require subpixels, provided a stereoscopic viewer is allowed for decoding. The black or white pixels of one share are assigned by coin flips or other random or pseudorandom means. The other share is obtained by XORing the first share with the simple black-and-white picture to be encrypted (meaning, of course, that it uses only black or white pixels and no grays). The result is something like a Bell random-dot stereogram (not to be confused with the popular single-image stereograms). It works because the brain can detect the differences between corresponding left and right pixels.
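As a rough illustration of the construction (the one-byte-per-pixel storage format and the make_shares() name are assumptions, and rand() merely stands in for whatever random or pseudorandom source is actually used), the two shares might be generated like this:

#include <stdlib.h>

/* Sketch of the two-out-of-two XOR scheme described above. The image is
   assumed to be stored one byte per pixel (0 = white, 1 = black); a real
   scheme would use a cryptographically strong random source, not rand(). */
void make_shares(const unsigned char *image, unsigned char *share1,
                 unsigned char *share2, size_t npixels)
{
    size_t i;
    for (i = 0; i < npixels; i++) {
        share1[i] = (unsigned char)(rand() & 1);  /* random share       */
        share2[i] = share1[i] ^ (image[i] & 1);   /* XOR with the image */
    }
}
/* Where an image pixel is black, the two shares differ; where it is white,
   they agree. Fusing the shares in a stereoscopic viewer makes the differing
   regions stand out, and XORing the shares recovers the picture exactly. */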
August Pamplona
[email protected]
EccoFan
Dear DDJ,
Bravo! I'm so glad to see Dennis Shasha's Dr. Ecco appear as a regular column in Dr. Dobb's Journal. As a fan of Mr. Shasha's books, I'm looking forward to wasting much time working through Dr. Ecco's new challenges each month.
Mike Kurtinitis
[email protected]
Help Wanted
Dear DDJ,
Following up on Jonathan Erickson's April 1998 "Editorial," in some respects I tend to doubt the alleged shortage of programmers. Recruiters seem to be inundated with résumés, which indicates that people are seeking jobs, and suggests that their present employers aren't trying very hard to keep them.
And it's not only the recruiters. Check out http://www.scientific.com/ and look at the "resume workbook." A well-known agency states that when hiring, employers skim résumés not to find the best applicant, but to eliminate anyone with something wrong. That's not the situation you'd expect with a dearth of programmers.
Then, too, companies suffer from hardening of the categories. They look for a "Windows developer" or a "UNIX developer" as though these were two species that don't interbreed. Again, if there were a shortage of programmers, companies wouldn't have the luxury of seeking only candidates who have spent the last three years doing exactly what they are being hired to do.
On the other hand, thinking about how buggy so much software is, it may be that there are few good programmers. That might also explain the pigeonholing, if many programmers lack flexibility. But I don't hear about the shortage of good programmers, just the shortage of programmers.
Stuart Ambler
[email protected]
Ternary Searches
Dear DDJ,
I was pleased that your April 1998 issue included an article by two of my favorite algorithmists -- Jon Bentley and Robert Sedgewick. The article was well written and represented the original research in "Fast Algorithms for Sorting and Searching Strings" (which you can download at http://www.cs.princeton.edu/~rs/strings/).
All the same, I do have some comments about their presentation of the Ternary Search Tree (TST).
- Comparison of TSTs to Hash Tables. First, the comparison of TSTs to hash tables seems a bit of a "straw man" argument, since hash tables were never intended to provide the kind of string-matching functionality possible with trees/tries. However, the arguments against hash tables would have been somewhat stronger if closed hashing (open addressing) were used instead of open hashing (separate chaining) with linked lists. Closed hashing is somewhat more inflexible in that a new table has to be generated when the table becomes full, whereas open hashing can keep adding entries -- with obvious performance penalties. However, properly designed closed hashing is often faster than open hashing.
Furthermore, a relatively naive hash function was used (a) without any of the consideration for performance optimization given to the TSTs, and (b) where better, widely used algorithms exist (for instance, the 32-bit ELF hash function; a sketch appears at the end of these comments). This is important because the performance of both open and closed hashing deteriorates if there are too many collisions.
Also, it should be evident that hash table algorithms are substantially less complex for all operations (insertion, deletion, and even table resizing) than the TST algorithms -- especially the optimized TST variants. This is an important consideration for real-world programming.
- Static versus Dynamic Algorithm/Data Structure. Within the context of the examples discussed, the issues of table growth and shrinkage are essentially irrelevant, since the data presented is fixed and table sizes are readily computed in advance. This makes the argument against hash tables somewhat weaker. Indeed, TSTs seem, at least in the algorithmic form presented, to be appropriate only for static rather than dynamic data sets. First, there is the omission of a "delete" function, an apparent oversight. Second, the construction of TSTs following the DDJ article can be performed only for a completely specified set.
However, there are two references in the original academic article to papers by Vaishnavi and Sleator/Tarjan for techniques to balance ternary search trees. It would have been useful to have presented TST variants of these algorithms in the DDJ article.
- TSTs versus FSAs. A comparison of TSTs to FSAs would have been very useful. FSAs can offer all the functional advantages of TSTs, including all the string-matching functions (although this requires careful implementation of the automata). Furthermore, in their static variants, FSAs can be constructed as DAGs, which allows a substantial reduction in the size of the automaton because of shared "tails." Moreover, FSA variants such as Aho-Corasick automata allow additional matching functionality (that is, "parallel" and multiple substring matching). It is not for nothing that FSAs have achieved such popularity in natural language processing (NLP) applications, including excellent work done at AT&T by Mohri, Pereira, and Riley. (See, for example, E. Roche and Y. Schabes [eds.], Finite-State Language Processing [MIT Press, 1997].)
Some Specific Observations. It appears that the "rsearch" function matches the input string only as a PREFIX of some string in the TST. The function does not verify that the search string fully matches an original null-terminated string inserted into the TST. It is not difficult to modify the code to correct this behavior.
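For illustration only, one way such a full-match search might look, assuming the node layout used in the article (splitchar, lokid, eqkid, and hikid) and a hypothetical fullsearch() name:

/* Sketch: report a hit only when the terminating NUL of the query is
   matched, so mere prefixes of stored strings are not reported as hits. */
int fullsearch(Tptr p, const char *s)
{
    while (p) {
        if (*s < p->splitchar)
            p = p->lokid;
        else if (*s > p->splitchar)
            p = p->hikid;
        else {
            if (*s == 0)
                return 1;   /* matched the stored NUL: full string */
            s++;
            p = p->eqkid;
        }
    }
    return 0;
}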
Finally, a number of additional string functions -- which are clearly relevant to OCR processing -- are possible: (a) character-set, optional-character, and Kleene-star matching; and (b) most-probable character matching.
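Returning to the hash-function point above, here is a sketch of the 32-bit ELF hash (the string hash specified for symbol tables in the System V/ELF ABI); reduce the result modulo the table size to get a bucket index:

/* Classic 32-bit ELF hash for NUL-terminated strings. */
unsigned long elf_hash(const char *name)
{
    unsigned long h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xF0000000UL;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}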
Win Carus, Chief Technologist
Inso Corp.
[email protected]
Dear DDJ,
I really liked the April 1998 issue on algorithms, a topic that is too often ignored in this day of class libraries and APIs. The article "Ternary Search Trees" by Jon Bentley and Robert Sedgewick was particularly interesting -- especially since it is similar to the method I used to implement an LZW compressor several years ago. The one trick the authors missed was to use a different comparison method for searching the tree. Instead of a simple magnitude comparison, test a single bit of the character being searched for. I've converted their Listing Three to the following, which uses this method.
int search(char *s)
{
    Tptr p = root;
    unsigned mask = 1;

    while (p) {
        if (*s == p->splitchar) {
            if (*s++ == 0)
                return 1;       /* matched the terminating NUL  */
            p = p->eqkid;
            mask = 1;           /* start over at the low bit    */
        } else {
            p = (*s & mask) ? p->hikid : p->lokid;
            mask <<= 1;         /* test the next bit next time  */
        }
    }
    return 0;
}
The advantage of this method is that the number of nodes you have to examine for any particular code will never be more than 1 plus the number of bits needed to represent the code. For 26 letters mapped to the integers 0-25, you will never have to test more than six nodes to find a match, regardless of the order in which things are put into the tree.
The disadvantage of this method is that it is difficult to generate a sorted list. That might be easier if the high bit were used first instead, but it tends to make the search speed suffer somewhat. My application didn't need that particular feature, so I haven't worried about it.
Tim McCaffrey
[email protected]
Window Sizes and the Registry
Dear DDJ,
Upon reading Al Stevens' April 1998 column, it occurred to me that the problem his friend was experiencing with the application that would not restore was similar to (maybe the same as) a problem we experienced with our product. As Al stated, the application was storing the window coordinates in the registry. When the application is run again, it restores itself to where it was.
We encountered a similar symptom during testing when taking our application from Windows NT 3.51 to Windows 95. When an application is minimized and closed on Windows 95 or NT 4.0, very strange window coordinates are returned, unlike 3.51, which returned the coordinates of the icon. We made a simple change to not record the coordinates if the application was minimized.
I'm going to look back into that class and add some checking to make sure that the window coordinates are valid and on the screen. Changing the screen resolution could give similar results.
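A minimal sketch of the kind of check I have in mind, using standard Win32 calls (IsIconic, GetWindowRect, GetSystemMetrics); the SaveWindowPos() name and the commented-out persistence step are placeholders:

#include <windows.h>

/* Sketch only: skip saving while minimized, and reject a rectangle that
   lies entirely off the visible screen (for example, after a change of
   screen resolution). */
void SaveWindowPos(HWND hwnd)
{
    RECT rc;
    int cx = GetSystemMetrics(SM_CXSCREEN);
    int cy = GetSystemMetrics(SM_CYSCREEN);

    if (IsIconic(hwnd))        /* don't record the bogus minimized rect */
        return;
    if (!GetWindowRect(hwnd, &rc))
        return;
    if (rc.left >= cx || rc.top >= cy || rc.right <= 0 || rc.bottom <= 0)
        return;                /* window is entirely off-screen */

    /* write rc to the registry here */
}

Another option worth noting is GetWindowPlacement(), whose rcNormalPosition member reports the restored coordinates even while the window is minimized, which sidesteps the strange-coordinates problem entirely.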
Pete Sage
[email protected]
DDJ
Copyright © 1998, Dr. Dobb's Journal