eBay’s Transformation to a Modern AI Platform

How the AI transformation at eBay was powered by a modern AI platform with a unified and open approach.

Have you ever wanted to find an item and struggled to describe it in words? Now, computer vision powered by eBay’s modern AI platform helps you find items with a click of your camera or an existing image. Users can open the eBay app, take a photo of what they are looking for, and within milliseconds the platform surfaces items that match the image. In doing so, the user has not only activated computer vision technology but also tapped into advanced AI capabilities, including deep learning, distributed training and inferencing. The computer vision algorithm sifts through more than half a billion images and eBay’s 1.4 billion listings to find and show you the most relevant listings that are visually similar.

A primary reason this can be done so effectively, at scale and with precision, is Krylov, eBay’s modern, state-of-the-art AI platform. Krylov is designed to boost eBay’s productivity with AI and accelerate the time to market of AI models at scale.

AI platforms are having a huge impact in leading companies across all industries. Public cloud providers like Google use AI platforms to provide many of their products and services. The AI platform at Facebook, called FBLearner Flow, personalizes news feeds and filters out offensive content. At Uber, the machine-learning platform Michelangelo powers the ability to give customers an accurate prediction of when a restaurant meal they’ve ordered through UberEats will be delivered.

Similarly, eBay built Krylov from the ground up as a scalable and multi-tenant, cloud-based AI platform that powers a diverse set of AI use cases at scale. In 2019 alone, data scientists at eBay used Krylov to run thousands of model training experiments per month spanning AI use cases, such as computer vision, natural language processing (NLP), merchandising recommendations, buyer personalization, seller price guidance, risk, trust, shipping estimates, and more.


Figure 1. eBay AI strategy.

 

From Months to Minutes

Before Krylov, data scientists building models needed weeks, sometimes even months, to become productive. They had to procure and manage infrastructure, move data to the machines, and install frameworks, and they could still run into issues that cost further time. Training models on large data sets could not be scaled across nodes.

Now that infrastructure is available on demand on the AI cloud, data scientists have access to the latest software, hardware, models and runtimes, such as Notebooks, TensorFlow, PyTorch and H2O. Through these runtimes we can train models like BERT (for language understanding) or ResNet (for computer vision) at scale on our inventory of 1.4 billion listings.

Data scientists can train models on large data sets using distributed training. They can run experiments and hyper-parameter tuning in parallel, record and visualize the experiments, and deploy the best-performing models. For example, our AI researchers have used Krylov to train neural machine translation models, deep and wide models for recommendations, and computer vision models to power image search. This is key to improving model precision as well as time to market for eBay’s machine translation technology. Machine translation is a significant contributor to enabling cross-border trade, which makes up 59% of eBay’s international revenue.
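To make the idea of running hyper-parameter experiments in parallel, recording them, and picking the best one more concrete, here is a minimal Python sketch. It is not Krylov's API: the function names, the parameter grid and the toy scoring function are all assumptions made for illustration, and a local thread pool stands in for what would be jobs on a training cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in for a training job; returns a made-up score."""
    lr, depth = params["learning_rate"], params["depth"]
    # Toy objective that peaks at learning_rate=0.1, depth=6.
    return 1.0 - abs(lr - 0.1) - 0.01 * abs(depth - 6)


def run_experiments(grid, max_workers=4):
    """Run one experiment per grid point in parallel and record each result."""
    names = sorted(grid)
    candidates = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with candidates.
        scores = list(pool.map(train_and_score, candidates))
    return [{"params": p, "score": s} for p, s in zip(candidates, scores)]


def best_experiment(records):
    """Pick the recorded experiment with the highest score."""
    return max(records, key=lambda record: record["score"])


# Illustrative grid; the values are invented for this example.
grid = {"learning_rate": [0.01, 0.1, 0.3], "depth": [4, 6, 8]}
records = run_experiments(grid)
best = best_experiment(records)
print(best["params"])
```

Keeping every record, not just the winner, is what makes later comparison and visualization of experiments possible.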

Krylov allows our AI teams to maximize the power of the vast repositories of data, both batch and real time, that eBay has. If you think of data as the fuel for artificial intelligence and machine learning, Krylov is the sophisticated vehicle being powered by that fuel.

And it’s a fast-moving vehicle. Today, data scientists can spin up an AI workspace with popular software frameworks (TensorFlow, scikit-learn, math libraries, Jupyter notebooks, etc.) on compute configurations of their choice (GPU, high-memory, high-cores) in less than a minute. Previously, this process could take days.

Data scientists can also run automated AI workflows (pipelines) using Python, Java or Scala interfaces to experiment with various approaches (hyper-parameters), then record the experiments and compare their output. The ability to do hyper-parameter tuning and run distributed training on large datasets and models has resulted in marked improvements in the accuracy of models.
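As a rough sketch of what an automated workflow amounts to, the Python example below runs a small set of tasks in dependency order and tracks a status for each. It uses only the standard library's `graphlib`; the task names, the `run_workflow` helper and the status strings are assumptions for illustration and do not reflect Krylov's actual interfaces.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run tasks in dependency order and record a status per task.

    tasks: name -> callable that receives the results of earlier tasks.
    dependencies: name -> set of task names that must succeed first.
    """
    status, results = {}, {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(status.get(dep) != "succeeded" for dep in upstream):
            status[name] = "skipped"  # an upstream task did not succeed
            continue
        try:
            results[name] = tasks[name](results)
            status[name] = "succeeded"
        except Exception:
            status[name] = "failed"
    return status, results


# Hypothetical four-step pipeline; the "model" is just the mean of the data.
tasks = {
    "prepare_data": lambda r: [1.0, 2.0, 3.0, 4.0],
    "train": lambda r: sum(r["prepare_data"]) / len(r["prepare_data"]),
    "evaluate": lambda r: {"mean": r["train"]},
    "deploy": lambda r: "deployed",
}
dependencies = {
    "train": {"prepare_data"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

status, results = run_workflow(tasks, dependencies)
print(status)
```

Expressing the pipeline as a graph rather than a script means a failed step stops only the steps that depend on it, and the per-task status is available for display.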

eBay designed and built its own specialized servers to better manage the vast amounts of data that move through its system. The new servers allow eBay data scientists and engineers to accelerate the production of new features, reducing development time from weeks to hours.

The business impact is a dramatic improvement in time to deployment. eBay can now automate model training and deploy the models on individualized or common inference platforms in days, compared to the months that were once required. This has led to improvements to important functionality like Image Search, which allows shoppers to browse for an item they want by uploading a picture of a similar item.

Building a Three-Pronged, Unified Team for a Unified AI Platform

While Krylov is highly innovative, so was the way in which it was developed.  

A unified platform for eBay needs to scale across a diverse set of use cases, such as computer vision, natural language processing (NLP) and recommendations. Consequently, developers and data scientists had a diverse set of needs. This was a multi-year platform transformation. Implementing Krylov was an exercise in breaking down varied silos and coming together across functions and geographies to develop and execute on a common unified vision. 

To guide the project, we put together the Unified AI Initiative Core Team (ICT). The ICT included representatives from the AI platform team, which is the provider of the service, the owner and builder of the platform. Also represented were AI platform dependencies: hardware, compute, network, storage and data services.

The third component of the ICT was the AI domain teams, the internal customers of the platform, such as AI research and engineering in ads, computer vision, NLP, risk, trust and marketing. These AI teams have a vested interest in defining, shaping and adopting the platform for their everyday AI lifecycle management.

Together, these experts created a unified AI vision for eBay – the strategy, roadmap and key tenets of the platform. This was a hands-on process. At various points, researchers and engineers from the domain teams either contributed or embedded themselves in an internal open source manner to build parts of the platform. Since these engineers and researchers were closer to the domain problems (AI lifecycle) or had built frameworks/platforms for their specific needs in the past, they were able to provide critical input. In some cases, existing frameworks or platforms were absorbed into the “Unified AI Platform,” because they solved a specific problem really well and could help accelerate the evolution of the platform for the broader eBay AI community.

We also instituted an eBay Machine Learning (ML) Engineering Fellowship program, in which any engineer at eBay could embed themselves in the AI platform team, much like an internship, to help build platform features from the product backlog. This fellowship program aims to familiarize eBay engineers with ML concepts and technologies. Participants are mentored on ML engineering concepts by senior domain experts.

The internal open source model and the ML Engineering Fellowship program not only produced code contributions but also served as a feedback mechanism for the development of the platform as we scaled up our scientists’ and engineers’ skill sets.

Understanding Pain Points

In the discovery phase of building Krylov, global eBay teams across different geographies worked together to better understand the pain points and challenges of building eBay’s AI. This included understanding needs and wants; showing empathy to and appreciating the day-in-the-life of the AI researchers and developers; and researching the existing approaches in the industry.

The phased strategy to build, adopt and transform AI over multiple years required:

  • AI training cluster with easy, secure and performant access to data with powerful compute (GPUs, high-memory and high-cores)
  • Training platform: automatable training workflows and interactive workspaces, SDK and clients (Python, Java, Scala, REST)
  • AI model lifecycle management: model experiment management, model management service, deployment services, AI Hub (web-based UI)
  • Model serving platform and feedback loop: deploy AI models as a service tied to the experimentation framework and monitoring systems (operational as well as model performance)
  • Data lifecycle abstraction for modeling, deployment and inferencing lifecycle that consisted of data discovery, preparation, feature store and serving, and feedback loop

In addition, the platform had to be built with a few key tenets to address the diverse AI use cases and operational patterns of data scientists and engineering teams at eBay. The key tenets we established were:

  • Support for heterogeneous software frameworks: TensorFlow, PyTorch, Caffe, Notebooks, or any framework of choice
  • Support for heterogeneous hardware architectures: GPUs and high-memory, CPU-based machines
  • Built for scale
  • Using open source technologies, in an open source manner
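
The model lifecycle management requirement above (experiment management, a model management service, deployment services) can be pictured with a small Python sketch. Everything here is hypothetical: the class names, the stage names, the allowed transitions and the metric values are invented for illustration and are not taken from Krylov.

```python
from dataclasses import dataclass, field

# Allowed stage transitions; the stage names are illustrative assumptions.
TRANSITIONS = {
    "experiment": {"registered"},
    "registered": {"deployed", "archived"},
    "deployed": {"archived"},
    "archived": set(),
}


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    stage: str = "experiment"
    history: list = field(default_factory=list)

    def promote(self, new_stage):
        """Move to a new stage, rejecting transitions that skip a step."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move {self.stage} -> {new_stage}")
        self.history.append((self.stage, new_stage))
        self.stage = new_stage


class ModelRegistry:
    """In-memory stand-in for experiment and model management services."""

    def __init__(self):
        self._versions = {}

    def log_experiment(self, name, metrics):
        versions = self._versions.setdefault(name, [])
        model = ModelVersion(name, len(versions) + 1, dict(metrics))
        versions.append(model)
        return model

    def best(self, name, metric):
        return max(self._versions[name], key=lambda m: m.metrics[metric])


registry = ModelRegistry()
# Metric values below are made up for the example.
registry.log_experiment("image-search", {"recall_at_10": 0.81})
registry.log_experiment("image-search", {"recall_at_10": 0.86})
winner = registry.best("image-search", "recall_at_10")
winner.promote("registered")
winner.promote("deployed")
print(winner.version, winner.stage)
```

Making the transitions explicit is one way to tie deployment to a recorded experiment, so that a model cannot reach serving without first passing through registration.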

Figure 2. End-to-end AI model lifecycle management using the AI Platform.

  

Figure 3: AI Hub (UI for end-to-end lifecycle management of models) showing a model training experiment in an AI project with collaborators.

 
Figure 4: AI Hub showing comparisons between metrics for two model training experiments in an AI project.

 

Figure 5: AI Hub showing visualization of an ML model training workflow (DAG), where the user can see the status as well as more details of each task in the workflow. Users can also attach logs and assets, specify configurations, and view deployment status.

 

As the platform was built, we gave the AI ICT teams preview, alpha and beta access so they could test the platform early. This iterative and collaborative engagement with a unified vision and execution helped build a unified platform by the eBay AI community for the eBay AI community.

While the early results are promising, we are by no means finished. AI is an evolutionary journey with no final destination. With Krylov by our side, we are in a great position to evolve our use of AI as customer needs change and opportunities arise.

Looking ahead, we will continue down the path of innovation through eBay’s AI-managed marketplace, knowing that the scope of what’s possible with AI expands every day. We’ll continue to share what we’re discovering and how we’re incorporating AI on our platform to create the most fulfilling commerce experience for our customers.