Using Behavioral Data to Improve Search

Since 2006, “Best Match” has been the default sort order for users on eBay. It is designed to balance the many objectives that users have on the site: for example, finding items relevant to their query, scoring great deals, finding timely auctions, and locating items from reliable sellers. For users who want to emphasize one of these areas over the others, there are other sort options and eBay’s Advanced Search function, but we feel it’s important to help users have a good experience even without a lot of effort or customization. As Best Match has evolved over time, it has incorporated more and more information to help it distinguish relevant items (more interesting, more sales-worthy, more attractive) from irrelevant ones. In the 18 months since we founded the Search Science team, we’ve made substantial strides. For example, here are snapshots of the results for the query “plasma tv” using the version of Best Match that was used on the site 18 months ago versus the same query using today’s Best Match:

The differences are striking, with no actual TVs in the top 10 results in the old version, compared to a number of TVs at good price points in the new version. So what is the driver behind these improvements? One of the biggest differences between these two versions is in how we are using behavioral data. By behavioral data, I mean observations about how our users behave when shown a listing. Do they click on it? Do they place a bid? Do they Buy It Now? All of us have seen the “wisdom of crowds” in action. The best restaurants are frequently crowded and always have a wait, and we’re suspicious if a restaurant is empty in a hot neighborhood on Friday evening. Similarly, when users have found an item interesting in the past, by clicking on it, bidding on it, or purchasing it, we expect that tells us something about how relevant that item might be to other users.

Best Match has used behavioral data for several years, but in the last 18 months, we’ve learned a lot about how to interpret and clean that data before using it for ranking. For sales and click data, there is one obvious bias that search engines have already studied well: position bias. Position bias means the items that appear in the top positions of a search result page are far more likely to be clicked and purchased by users. Left unchecked, this bias makes it appear that the listings at the top of the search results are more relevant, because they’re clicked more frequently than lower results. This can result in a feedback loop when you use clicks and sales in your ranking function: an item ranks highly, so it gets more clicks, so it ranks even higher, and gets still more clicks, and . . . well, you get the point.

There are several ways to reduce this bias, the simplest of which is to normalize a listing’s click rate by the Expected Clicks for the position in which it was shown. Let’s look at an example. Assume we live in a world where users on average click on the top-ranked listing 10% of the time, the second-ranked listing 9% of the time, and so on down to the 10th-ranked listing, which is clicked on 1% of the time. Now assume we want to rank a listing that has been clicked on 20% of the times it was shown. If all of those clicks came while the item was in the top position, where Expected Clicks were 10%, we’d say this item performed 2x (20% actual / 10% expected) better than expected. If, on the other hand, the item got all its clicks while it was in the 10th position, we’d say it performed 20x (20% actual / 1% expected) better than expected. You’ll notice that items that are ranked lower get more credit for their clicks. This counterbalances the feedback loop we described above and makes sure all items have an opportunity to improve their ranking if they perform better than expected.
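To make the arithmetic concrete, here is a minimal Python sketch of that normalization. It is only an illustration of the example above, not the actual Best Match code: the EXPECTED_CTR table simply encodes the hypothetical 10%-down-to-1% click rates, and the function name and the (position, clicked) record shape are assumptions made for this sketch.

```python
# Hypothetical expected click rate per position, taken from the example:
# 10% at position 1, dropping one point per position to 1% at position 10.
EXPECTED_CTR = {pos: (11 - pos) / 100 for pos in range(1, 11)}


def position_normalized_score(impressions):
    """Return actual clicks divided by the clicks expected from position alone.

    `impressions` is an iterable of (position, clicked) pairs, one pair per
    time the listing was shown. A score above 1.0 means the listing drew
    more clicks than its positions would predict.
    """
    actual = 0
    expected = 0.0
    for position, clicked in impressions:
        actual += 1 if clicked else 0
        expected += EXPECTED_CTR[position]
    if expected == 0:
        return 0.0
    return actual / expected


# A listing shown 100 times and clicked 20 times (a 20% click rate).
shown_at_top = [(1, i < 20) for i in range(100)]
shown_at_tenth = [(10, i < 20) for i in range(100)]

top_score = position_normalized_score(shown_at_top)      # roughly 2x expected
tenth_score = position_normalized_score(shown_at_tenth)  # roughly 20x expected
print(round(top_score, 2), round(tenth_score, 2))
```

Summing expected clicks over every impression, rather than dividing click by click, also handles the more realistic case where a listing has been shown in several different positions.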

Another bias in behavioral data, one that is less well studied in academic literature, is price bias. Basically, it’s easier for people to buy cheap things than it is to buy expensive things. If I showed you a fantastic #2 pencil and offered it to you for the great price of 5 cents, how long would it take for you to consider buying it? Maybe 10 seconds? Now if I showed you a fantastic plasma TV and offered it to you for the great price of $500, how long would it take to consider buying it? Well, that’s a lot of money… maybe you should look at some reviews, or talk to that friend of yours who knows a lot about TVs, or do some price comparisons. It might take you days, weeks, even months to reassure yourself that you’re using your money wisely.

Let’s get back to behavioral data. Since pencils sell faster than plasma TVs, is the pencil a better item than the TV? No, we said above that they’re both fantastic and they’re both offered at a great price. So we don’t want to penalize the TV just because it has a naturally slower selling rate. And we definitely don’t want to always show pencils before TVs in our rankings. What if the user searched specifically for “plasma tv”? We need to correct for this price bias in our sales data. We can apply the same principle we did for position bias: figure out the Expected Sales for an item of price X based on historical data, then judge each item by how its actual sales compare with that expectation.
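The same idea can be sketched in Python for sales. Everything specific here is an assumption made for illustration: the post does not say how Expected Sales are estimated, so this sketch groups historical listings into one bucket per order of magnitude of price, and the (price, impressions, sales) record shape and all the numbers are invented.

```python
import math
from collections import defaultdict


def price_bucket(price):
    """Assumed bucketing: one bucket per order of magnitude (price must be > 0)."""
    return math.floor(math.log10(price))


def expected_sale_rates(history):
    """Average sales per impression in each price bucket.

    `history` is an iterable of (price, impressions, sales) tuples, a
    made-up record shape used only for this sketch.
    """
    impressions = defaultdict(int)
    sales = defaultdict(int)
    for price, shown, sold in history:
        bucket = price_bucket(price)
        impressions[bucket] += shown
        sales[bucket] += sold
    return {b: sales[b] / n for b, n in impressions.items() if n > 0}


def price_normalized_score(price, shown, sold, rates):
    """Return actual sales divided by the sales expected at this price level."""
    expected = rates.get(price_bucket(price), 0.0) * shown
    return sold / expected if expected else 0.0


# Invented history: cheap items sell on 10% of impressions, pricey ones on 0.5%.
history = [
    (0.05, 1000, 100), (0.08, 1000, 100),  # pencil-priced listings
    (500, 1000, 5), (650, 1000, 5),        # TV-priced listings
]
rates = expected_sale_rates(history)

pencil_score = price_normalized_score(0.05, shown=200, sold=20, rates=rates)
tv_score = price_normalized_score(500, shown=200, sold=2, rates=rates)
print(round(pencil_score, 2), round(tv_score, 2))
```

In this invented data the TV sells ten times fewer units than the pencil, yet it scores higher once each is compared with what is normal at its price: the pencil sells at about its expected rate, while the TV sells at about twice its expected rate.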

These two changes (correcting position bias and correcting price bias) combined have led to significant improvements in the relevance of Best Match, and more items are selling on eBay as a result. We’re experimenting and learning every day about how to better use the feedback our users are already giving us–what they choose to click on and what they choose to buy. Stay tuned for more information about our efforts to enhance eBay’s search relevance.

Tags: Machine Learning, Search Science