Thomas has authored four books on C, and coauthored Efficient C (with Jim Brodie) and C++ Programming Guidelines (with Daniel Saks). Plum Hall (his company) provides test suites for C, C++, Java, and C#. Thomas can be contacted at [email protected].
For any number of reasons, people always want to know the relative popularity of various programming languages. To provide one measure of language popularity, we focused on one publicly available and objective measurement: the number of web-based job offers that specify requirements for different programming languages. Our most recent analysis covered the 12-month period from July 2002 to June 2003. We scanned job offers in the category "software" from several employment web sites, eliminating duplicates. Table 1 presents our results.
There were several technical issues. We eliminated duplicate offers on a monthly basis. We used case-insensitive matches. In counting "Java" requirements, we had to avoid false hits on "JAVASCRIPT" (obviously). We counted "J2EE," "J2SE," and "J2ME" as equivalent to "JAVA." We added "JAVASCRIPT," "JSCRIPT," and "ECMASCRIPT" together to make a "J*script" total. To exclude "VBA" and "VBSCRIPT" from the "Vbasic" total, we matched "VB followed by any letter except A or S," adding that to the matches for "VISUAL BASIC." The total indicated as "Vbasic.net" counts job offers that matched "Vbasic" and also ".NET." We noticed false hits on "PASCAL" when the name "Pascale" appeared in the job offer, so we counted only "PASCAL followed by a nonletter." Several web sites could not properly handle "C#" as a lookup keyword, so we scanned the full text of all "software" offers and performed our own keyword search. The number of (nonduplicate) job offers per month varied, but was never fewer than 4000.
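The matching rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the survey software itself: the names PATTERNS, DOTNET, and languages_in are hypothetical, and the "VB" rule is read literally as "VB followed by a letter other than A or S."

```python
import re

# Hypothetical patterns approximating the rules described in the text.
PATTERNS = {
    # "JAVA" not followed by "SCRIPT", or any of J2EE / J2SE / J2ME.
    "Java": re.compile(r"JAVA(?!SCRIPT)|J2[ESM]E", re.IGNORECASE),
    # All script dialects are summed into one "J*script" total.
    "J*script": re.compile(r"JAVASCRIPT|JSCRIPT|ECMASCRIPT", re.IGNORECASE),
    # "VISUAL BASIC", or "VB" followed by any letter except A or S
    # (a literal reading of the rule; it excludes VBA and VBSCRIPT).
    "Vbasic": re.compile(r"VISUAL BASIC|VB[B-RT-Z]", re.IGNORECASE),
    # "PASCAL" not followed by a letter, so "Pascale" is rejected.
    # Assumption: "PASCAL" at the very end of the text also counts.
    "Pascal": re.compile(r"PASCAL(?![A-Z])", re.IGNORECASE),
}
DOTNET = re.compile(r"\.NET", re.IGNORECASE)


def languages_in(offer_text):
    """Return the set of language categories matched by one job offer."""
    found = {name for name, pat in PATTERNS.items() if pat.search(offer_text)}
    # "Vbasic.net" is a Vbasic offer that also mentions ".NET".
    if "Vbasic" in found and DOTNET.search(offer_text):
        found.add("Vbasic.net")
    return found


print(languages_in("Senior JavaScript developer"))
print(languages_in("Contact Pascale Dupont"))
```

Each offer contributes at most once to a category, which matches counting offers rather than keyword occurrences.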
Determining the percentages for "C" was the most challenging. Our first attempts used a simple regular expression like the other matches: "[^A-Za-z0-9]C[^.A-Za-z0-9#+]" (which means "the letter C, preceded by a character that isn't a letter or digit, and followed by a character that isn't a letter or a digit or a period or a sharp-sign or a plus"). We then visually scanned a month's data, finding that about 5 percent of the "C" hits were clearly not C programming jobs: "Bldg C," "Suite C," "Unit C," "A/C" (air conditioning?), "C-Level" ("C-level sales," "C-level executive"), "4.6.C," "C-1426," and so on. Therefore, we prefiltered "C-LANG" and "C-CODE" into plain "C," we prefiltered "C-SHARP" or "C SHARP" into "C#," we prefiltered "C ++" or "C PLUS PLUS" into "C++," then we excluded "period before or after C," and we excluded "hyphen after C."
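As a rough sketch of the refined "C" match, the prefilters and exclusions might be combined as follows. The function names normalize and mentions_c are hypothetical, and several details are assumptions: the text is upper-cased to get case-insensitive matching, "C-LANG" and "C-CODE" are rewritten by replacing the hyphen with a space, and the text is padded with spaces so that a "C" at the very start or end can still match.

```python
import re

# The first-attempt pattern, tightened with the two exclusions from the text:
# no period before C, and no period or hyphen after C.
C_PATTERN = re.compile(r"[^A-Z0-9.]C[^.A-Z0-9#+\-]")


def normalize(text):
    """Apply the prefilters described in the text (details are assumptions)."""
    text = text.upper()
    # "C-LANG..." and "C-CODE..." become plain "C" followed by a space.
    text = re.sub(r"\bC-(LANG|CODE)", r"C \1", text)
    # "C-SHARP" or "C SHARP" becomes "C#".
    text = re.sub(r"\bC[- ]SHARP\b", "C#", text)
    # "C ++" or "C PLUS PLUS" becomes "C++".
    text = re.sub(r"\bC PLUS PLUS\b|\bC \+\+", "C++", text)
    # Pad so the pattern can match at the start or end of the text.
    return f" {text} "


def mentions_c(text):
    """True if the offer appears to require plain C."""
    return C_PATTERN.search(normalize(text)) is not None


print(mentions_c("Experience in C, Unix"))
print(mentions_c("C-level executive"))
```

Hits such as "Suite C" or "A/C" still pass this filter, since the text describes no rule that removes them.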
We added another category, "C/C++," which contained all the "C" cases that also contain "C++" somewhere (usually, but not always, the keyword "C/C++"). The "C/C++" percentage was usually more than half of the "C" percentage, and about half of the "C++" percentage.
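To make the relationship between the three categories concrete, here is a small tally over made-up offers. The function tally, the two patterns, and the sample data are all hypothetical; the "C" pattern is a simplified stand-in for the refined match, and the percentages printed describe only this toy input, not the survey.

```python
import re

# Simplified, assumed patterns: plain "C" (not C++, C#, or C-something)
# and any occurrence of "C++".
C_HIT = re.compile(r"[^A-Z0-9.]C[^.A-Z0-9#+\-]")
CPP_HIT = re.compile(r"C\+\+")


def tally(offers):
    """Return the percentage of offers in the C, C++, and C/C++ categories."""
    counts = {"C": 0, "C++": 0, "C/C++": 0}
    for offer in offers:
        text = f" {offer.upper()} "  # pad so C can match at either end
        has_c = C_HIT.search(text) is not None
        has_cpp = CPP_HIT.search(text) is not None
        counts["C"] += has_c
        counts["C++"] += has_cpp
        # "C/C++" holds the C hits that also mention C++ somewhere.
        counts["C/C++"] += has_c and has_cpp
    total = len(offers) or 1  # avoid dividing by zero on an empty month
    return {name: 100.0 * n / total for name, n in counts.items()}


sample = [
    "C/C++ developer",
    "C++ and Java",
    "Embedded C, assembler",
    "Perl scripting",
]
print(tally(sample))  # {'C': 50.0, 'C++': 50.0, 'C/C++': 25.0}
```

Because "C/C++" is a subset of both other categories, an offer in it is counted three times across the columns, so the percentages are not expected to sum to 100.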
From early reviewers, we received some comments on our methodology. One reader cautioned that percentages based on published job offers overlook those offers filled internally within the organization and that the percentages of jobs filled internally might be significantly different. We agree, but can't study internally filled jobs with our methodology. Other readers requested more languages. We've added all the languages requested so far; if you want more languages or more detailed analysis, just ask us.
If you believe that some publicly available job sites are biased in favor of, or against, any particular programming language, we would be grateful for your information. (As of today, we are unaware of any such biases.)
My special thanks to Doug Teeple ([email protected]) and John Breeden ([email protected]), for valuable assistance with the survey software.
TPJ