I certainly do not have any expertise in Web search. I can claim some, however, in another area that is highly relevant to it: trying to optimize the performance of statistical algorithms.
In my case, the algorithms are designed to predict the outcome of elections. FiveThirtyEight’s algorithm to forecast U.S. House races, for instance, considers several different “ingredients” (like polls, fund-raising data, and demographic data) and mixes them together in such a way as to maximize the accuracy of the forecasts. Although there are a few nuances one needs to consider, for the most part this is based on which combination of ingredients would have produced the best forecasts in past elections.
Search is similar. The search engines are trying to predict the relevance of the results generated by each query. (How is relevance determined? Basically just by what a human being finds most relevant, which can be measured either through objective data, like how many clicks each link generates, or through testing that asks users for a subjective evaluation of the quality of different results.) There are different inputs that the search engines can consider to rank results (for instance, the text contained on each Web site, its traffic statistics, and the behavior of previous users of the search engine), and they can mix these in different ratios depending on what seems to be doing the best job of maximizing performance.
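To make the idea of mixing inputs in different ratios concrete, here is a minimal sketch in Python. It is not how any real search engine works: the two signals, the scores, and the function name `best_mix_weight` are all hypothetical, and a real system would use far more inputs and a far more sophisticated fitting method. The sketch simply tries a range of weights and keeps the one whose blended score comes closest to a set of human relevance ratings.

```python
def best_mix_weight(text_scores, click_scores, human_ratings, steps=20):
    """Grid-search the weight on the text signal (0 to 1) that minimizes
    squared error between the blended score and the human ratings."""
    best_weight, best_error = 0.0, float("inf")
    for i in range(steps + 1):
        weight = i / steps
        error = sum(
            (weight * t + (1 - weight) * c - r) ** 2
            for t, c, r in zip(text_scores, click_scores, human_ratings)
        )
        if error < best_error:
            best_weight, best_error = weight, error
    return best_weight


# Invented scores for four results; none of this is real search data.
text_scores = [0.9, 0.2, 0.6, 0.4]      # hypothetical text-match signal
click_scores = [0.5, 0.4, 0.8, 0.0]     # hypothetical click-behavior signal
human_ratings = [0.8, 0.25, 0.65, 0.3]  # hypothetical human relevance judgments

weight = best_mix_weight(text_scores, click_scores, human_ratings)
print(f"weight on text signal: {weight:.2f}")
```

The same logic applies to election forecasts: the "ratings" would be past results, and the weights would be whatever mix of ingredients would have forecast them best.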
A big controversy erupted Tuesday in search: Google accused Microsoft’s search engine, Bing, of cheating off of it. Google determined this in a rather clever way: by manually inserting nonsensical results in response to nonsensical queries. For instance, in response to the gibberish string of characters “uegosdeben1ogrande,” Google rigged its algorithm to return a link to a site selling hip-hop jewelry as the “best” response. Another search engine could not plausibly arrive at this result unless it were using Google’s results as one of its ingredients. But for several of these queries, lo and behold, Bing returned exactly the same results that Google did.
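The logic of this test can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Google's actual procedure: the function name, the second query, and the URLs are hypothetical placeholders. The idea is to plant an arbitrary result for each nonsense query, then check which planted pairings show up as the rival's top result.

```python
def copied_honeypots(planted, rival_top_results):
    """Return the planted queries whose rival top result matches the
    deliberately planted (and otherwise implausible) URL."""
    return [
        query
        for query, planted_url in planted.items()
        if rival_top_results.get(query) == planted_url
    ]


# Nonsense query -> planted URL. The URLs are placeholders, and the
# second query is made up for illustration.
planted = {
    "uegosdeben1ogrande": "https://example.com/hip-hop-jewelry",
    "hypotheticalgibberish2": "https://example.com/unrelated-page",
}

# What the rival engine returned as its top result (None = no result).
rival_top_results = {
    "uegosdeben1ogrande": "https://example.com/hip-hop-jewelry",
    "hypotheticalgibberish2": None,
}

matches = copied_honeypots(planted, rival_top_results)
print(f"{len(matches)} of {len(planted)} planted results reappeared: {matches}")
```

Because the planted pairings are arbitrary, even a handful of matches is hard to explain by coincidence.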
Microsoft has not really disputed the claim; instead, it has said that it’s not such a big deal. Yes, Microsoft says, Google search results (which Microsoft obtains by tracking the behavior of people who use its Internet Explorer 8 toolbar, perhaps along with other means) are one of the inputs that Bing uses. But Bing also uses more than 1,000 others. The reason that Google’s results appear verbatim in response to nonsense queries like “uegosdeben1ogrande” is that in such a case, only the Google results are “relevant”; the other 999 variables don’t shed any light on the problem.
Let me try to referee this dispute. I’m not going to weigh in on the legality or even necessarily the ethics of this — I’m just trying to consider what makes for better search.
Microsoft’s defense boils down to this: Google results are just one of the many ingredients that we use. For two reasons, this argument is not necessarily convincing.
First, not all of the inputs are necessarily equal. It could be, for instance, that the Google results are weighted so heavily that they are as important as the other 999 inputs combined.
And it may also be that Google’s results account for an even larger fraction of what creates value for Bing users. Bing might consider hundreds of other variables, but these might produce little overall improvement in the quality of its search, or might actually detract from it. (Microsoft might or might not recognize this, since measuring relevance is tricky: it could be that features that its engineers think are improving the relevance of their results actually aren’t helping very much.)
Second, it is problematic for Microsoft to describe Google results as just one of many “signals and features”. Google results are not any ordinary kind of input; instead, they are more of a finished (albeit ever-evolving) product, one which itself is made up of hundreds of different “signals and features”, which have already been combined together in such a way as to maximize performance in response to the exact same queries that Bing is getting.
Imagine that you opened an Italian restaurant across the street from Mario Batali’s Lupa. It would be one thing if you merely took inspiration from Lupa’s spaghetti carbonara — if you tried to use some of the same ingredients and some of the same techniques. Maybe you’d even go so far as to track down the butcher who sells Mr. Batali his pancetta. All of this would be in the spirit, most of us would think, of good ol’ American competition.
But this is more like, when a customer orders the carbonara, sending a runner across the street to order a plate of it at Lupa, reheating it, and then maybe adding some mushrooms or snow peas. The alterations you made to the dish, whether slight or substantial, might not be all that likely to make it better: it would be hard to improve on Mr. Batali’s carbonara (if peas really made the dish yummier, wouldn’t he already have included them in the recipe?), just as it would be hard to improve on Google’s search results.
There are certainly cases, of course, in which a finished product can be made better. In the algorithms that we use to forecast U.S. House races, for instance, one of the inputs is the race ratings (such as “toss-up” or “leans Republican”) generated by outside agencies like Cook Political and CQ Politics. These, essentially, already constitute finished products, since these groups are already trying to weigh a number of different factors together to optimize them.
What we’ve found is that the ratings generated by these groups are very good, and that it would be hard to improve on them by using statistical inputs alone. (This is not the case for Senate or gubernatorial races, where the polling is much more abundant and much more reliable, but it is the case for House races.) However, a combination of these race ratings and statistical inputs (not necessarily weighted equally — the race ratings make up something like 20 percent of the FiveThirtyEight forecasts) does slightly better than either one taken alone.
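As a rough illustration of how such a blend can work, here is a short Python sketch. It is not the FiveThirtyEight model: the probabilities and outcomes are invented, they were chosen so that the blend comes out ahead, and the function names are hypothetical. Only the 20 percent weight on the race ratings comes from the description above. Accuracy is measured with the Brier score (mean squared error of the probabilities, lower is better).

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def blend(rating_probs, stat_probs, rating_weight=0.2):
    """Weighted average of two forecasts; rating_weight goes to the ratings."""
    return [
        rating_weight * r + (1 - rating_weight) * s
        for r, s in zip(rating_probs, stat_probs)
    ]


# Invented data for five races (1 = the forecasted party won).
outcomes = [1, 0, 1, 0, 1]
rating_probs = [0.7, 0.4, 0.6, 0.2, 0.5]  # hypothetical expert race ratings
stat_probs = [0.8, 0.2, 0.4, 0.3, 0.9]    # hypothetical statistical inputs

blended_probs = blend(rating_probs, stat_probs)

for name, probs in [
    ("ratings only", rating_probs),
    ("statistics only", stat_probs),
    ("blend", blended_probs),
]:
    print(f"{name}: {brier_score(probs, outcomes):.4f}")
```

The blend helps here because the two forecasts err on different races, so averaging them cancels part of each one's mistakes. With other data, the blend could just as easily land between the two.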
There are undoubtedly many improvements that Microsoft has in fact made to search; there are a lot of very, very smart engineers there, after all, and I’m sure they have come across a few tricks that Google hasn’t. Also, there are some cases in which Microsoft has access to proprietary data that Google does not. For instance, because Microsoft makes software products, it probably has a good understanding of what a user does when (as has been known to happen on occasion) Windows Vista crashes. It may therefore do better than Google in responding to queries for technical support, something that search engines are often quite poor at. Bing also runs a travel site (which is very good), whereas Google does not, so it might do better in responding to a query like “best fare JFK LAX”.
How much value Microsoft’s engineers are ultimately adding is hard to say. Both they and Google are extremely circumspect about revealing any detail about their algorithms. And in contrast to election forecasts — where it’s fairly straightforward to evaluate performance (who called more races right?) — it is hard to measure the efficacy of a set of search results.
Still, Google has spent many, many years trying to perfect search — probably an order of magnitude more man-hours than Microsoft has. To the extent that Bing’s results are competitive with Google’s across the broad spectrum of potential searches, one wonders how much this is because they are simply using Google’s data.