brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].
If I can detect a wireless access point, I can figure out who made the hardware. Plenty of people like to make maps of wireless hot spots, and now that every coffee shop, bookstore, and McDonalds seems to have wireless access, the maps are pretty well covered. I'm interested in which hardware people are using, though.
In a cab on the way from from the airport last week, I had a lot of juice left in my Powerbook, so I opened it and started doing some work. I was immediately prompted to join a wireless network.
I declined the invitation, of course, but I decided to see what was available. I had a half-hour until I would get home, and I would cover about 20 miles of a main surface street along the way. I fired up MacStumbler to see what was around. This is usually called "war driving," and although I'm not a war driver (I don't even have a car, a basic requirement), I gave it a try from the cab.
I got a lot of hits. Every time I passed a school, whether elementary school or college, MacStumbler would start pinging wildly, and within a couple of blocks the list of access points had scrolled the screen.
Every hit reveals a lot of detail, and I could get even more if I connected a GPS receiver. I can export the information to a text file in wiscan format. There are several columns in the text format, but I've pulled out the first several columns because the full record is too long to fit here. The first two columns are earth coordinates: I've left out the actual coordinates. The next column, contained in parentheses, is the access point name (SSID in wireless parlance). The next column contains the mode: The BSS says that this access point is set up as a Base Station Subsystem. The last column that I show is the one I want because it has information about the hardware manufacturer.
N 0 E 0 (Otter Office) BSS (00:11:24:00:c5:a1) ... N 0 E 0 (Otter Base) BSS (00:0d:93:8b:7b:3d) ...
That last column is the MAC address, which uniquely identifies the hardware that handles the network interface (NIC). The first 3 bytes identify the manufacturer, and come from the Institute of Electrical and Electronics Engineers (IEEE). This is also known as the Organizational Unit Identifier (OUI). The IEEE maintains a list of all of the OUIs, along with the manufacturer and address.
I don't have to worry about the IEEE listing, though, because I have a module, Net::MAC::Vendor, that handles it for me. I originally made this module to explore my home network and identify machines by their NIC manufacturers. Now I can use it to identify the vendors of wireless access points I came across on the way home.
First, I need to get the MAC out of the text file. I could write a full program to do this, but I want to do this from the command-line because it's more fun that way. I saved the data in a file I named wiscan.txt:
% perl -nle 'm/\s([0-9a-f:]{17})\s/i and \ print $1' wiscan.txt
I just want the MAC portion. I could create a full parser to get me every column, but I'm in a cab and I have a half hour before I get home. I want to write the program and run it several times before I have to get out of the cab. I cheat in the regular expression a bit because I know that the 17-character sequence of hexadecimal digits and colons is almost always going to be the MAC. I certainly hope no one has named their access point with 17 colons in a row. If the match works, I continue through the and short circuit and print the value of $1, which is the part of the regular expression wrapped in parentheses. The -n switch wraps a while(<>){ } loop around the program I specified on the command line with -e, and the -l switch adds a newline to the end of my print statement (it also does a chomp on the input line, but I don't care about that).
My output is a list of MAC addresses. I'll want to use these to get the OUI information, and then I can count the different manufacturers:
00:11:24:08:5e:25 00:11:24:00:c5:a1 00:0d:93:8b:7b:3d 00:0c:41:79:04:95
For the next part, I use the lookup() function from Net::MAC::Vendor. I'll pipe a list of MACs into another program to translate them to vendor names. The lookup() function uses the IEEE web site to get the OUI for the MAC, then returns an anonymous array of that vendor's information. The first element in that array is the vendor name. For those four MACs, I get these vendor names. Three of those access points belong to Apple, but that's no big deal: Those are the access points I saw right before I got out of the cab, and they all live in my apartment. That Linksys is downstairs.
Apple Computer Apple Computer Apple Computer The Linksys Group, Inc.
I start with a simple program that takes a line of input and passes it to the lookup() function then prints the first element in the anonymous array it gets back. Before I do any of that, I use the load_cache() function, which fetches the entire list of OUIs and parses it so lookup() doesn't have to use the network every time I call it. I pay the network penalty just once with load_cache():
#!/usr/bin/perl use Net::MAC::Vendor; package Net::MAC::Vendor; load_cache(); while( <> ) { my $oui = lookup( $_ ); print $oui->[0], "\n"; }
I save this program as "oui" and can add it to my command pipeline to get that list of vendor names:
% perl -nle 'm/\s([0-9a-f:]{17})\s/i and \ print $1' wiscan.txt | oui
Before I get too far, I should state this caveat: The vendor name does not necessarily tell you whose name is on the outer case of the product. I'll ignore that in this article. I want to count the number of times each vendor appears. That's a bit tricky, though, since the same vendor can have several different blocks of numbers, and it seems that each block has a different form of the vendor name. Here are a few examples:
2WIRE, INC. 2Wire, Inc Apple Computer Apple Computer, Inc. Cisco Systems Cisco-Linksys Cisco-Linksys, LLC D-Link Corporation D-Link Systems, Inc.
I modified my OUI program to try to clean up the data a bit. I lowercase everything then uppercase letters after a word boundary, and I get rid of anything after a comma or anything that looks like INC, LTD, and so on. It's not going to be perfect, because I can already see I can't handle something like D-Link's different representations.
#!/usr/bin/perl use Net::MAC::Vendor; package Net::MAC::Vendor; load_cache(); while( <> ) { my $oui = lookup( $_ ); my $vendor = lc $oui->[0]; $vendor =~ s/,.*//; $vendor =~ s/\b(llc|inc|ltd|co|corp)(\.|\b)//i; $vendor =~ s/\s+$//i; $vendor =~ s/\b(\w)/\U$1/g; print $vendor, "\n"; }
That cleans up that list quite nicely, save for a few entries such as D-Link. Now I need to count them. I modified my program again to accumulate counts with a hash, then report the results after it has seen all the lines. In the foreach loop, I sort the hash keys by their values so I get a descending list of keys based on the number of times I saw an access point from that vendor. I also keep track of the total number of access points that I saw so I can compute a percentage:
#!/usr/bin/perl use Net::MAC::Vendor; package Net::MAC::Vendor; load_cache(); my %count; my $total; while( <> ) { my $oui = lookup( $_ ); my $vendor = lc $oui->[0]; $vendor =~ s/,.*//; $vendor =~ s/\b(llc|inc|ltd|co|corp)(\.|\b)//i; $vendor =~ s/\s+$//i; $vendor =~ s/\b(\w)/\U$1/g; $count{ $vendor }++; $total++; } foreach my $key ( sort { $count{$b} <=> $count{$a} } keys %count ) { printf "%3d %2d%% %s\n", $count{$key}, $count{$key} / $total * 100, $key; }
Finally, I get my results, and the top entries don't surprise me: LinkSys has the most routers, and its name shows up in the first two entries. Chicago public schools use those, and there are a lot of schools between my apartment and the airport. Even if I subtract the several Apple Airports I have, Apple is still doing pretty well, and much better than I expected, considering they are probably the most expensive (even if they are the easiest on the eyes).
25 19% The Linksys Group 22 17% Cisco-Linksys 21 16% 2wire 16 12% Apple Computer 8 6% Cisco Systems 7 5% D-Link Corporation
If I cared enough, I could do the same sort of thing with the mapping work other war drivers have done and locate clusters of different sorts of access points, but I'll leave that up to you. Drive safely.
References
Net::MAC::Vendor: http://search.cpan.org/dist/Net-MAC-Vendor/
OUI: http://standards.ieee.org/regauth/oui/oui.txt
War driving: http://faq.wardrive.net/
MacStumbler: http://www.macstumbler.com/
TPJ