An Automatic Mirror Testing Solution for Messaging Systems at eBay

A look at how we developed a solution for automatic mirror testing to overcome the challenges of migrating eBay’s proprietary messaging applications to a more modern technical stack.

Business event stream (BES) is a reliable messaging framework developed at eBay and widely used by nearly all domain teams across the company. It provides a standard mechanism for propagating and asynchronously processing events.

A typical BES application workflow is shown below. Events published through PL/SQL or an API are written to the database. BES consumers then retrieve the events from the database and process them with application-specific business logic.

[Figure: Typical BES application workflow]
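To make the publish-then-consume flow concrete, here is a minimal in-memory Python sketch. The `EventStore` class, its methods and the `NEW`/`PROCESSED`/`FAILED` statuses are hypothetical stand-ins for the BES event table and consumer loop; the article does not describe the real BES schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical status values; the real BES schema is not described here.
NEW, PROCESSED, FAILED = "NEW", "PROCESSED", "FAILED"


@dataclass
class Event:
    event_id: int
    event_type: str
    payload: str
    status: str = NEW


@dataclass
class EventStore:
    """In-memory stand-in for the database table that publishers write to."""
    events: Dict[int, Event] = field(default_factory=dict)
    next_id: int = 1

    def publish(self, event_type: str, payload: str) -> int:
        # Publishing only inserts a row; processing happens asynchronously.
        event_id = self.next_id
        self.events[event_id] = Event(event_id, event_type, payload)
        self.next_id += 1
        return event_id

    def fetch_new(self, limit: int = 10) -> List[Event]:
        return [e for e in self.events.values() if e.status == NEW][:limit]


def consume_once(store: EventStore, handler: Callable[[Event], None]) -> int:
    """One polling cycle of a consumer: fetch pending events and process them."""
    batch = store.fetch_new()
    for event in batch:
        try:
            handler(event)  # application-specific business logic
            event.status = PROCESSED
        except Exception:
            event.status = FAILED
    return len(batch)
```

In this model several consumers could call `consume_once` against the same store, which is exactly why a specific event cannot be steered to a specific consumer without an extra routing rule.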

At eBay, several BES applications still run on the legacy V3 platform, which is no longer supported. These applications are being migrated to a new technical stack (a newer JDK, app server, and OS). After migration, each application is expected to retain the same event processing semantics as before.

Requirements

The BES application migration must be transparent to end users: all event processing logic must behave the same after migration, without introducing new issues. Achieving this requires a well-defined testing strategy.

The typical approach is to build test events and check their status once they are consumed. However, this traditional approach faces four major challenges during migration:

  1. In a messaging framework, consumers are horizontally scaled to consume available events in parallel, so it is hard to ensure that specific events are consumed by migrated consumers rather than legacy ones.
  2. Legacy BES applications lack enough end-to-end test cases, so there is no baseline of existing behavior against which to validate behavior on the new platform.
  3. There is no automated way to test large numbers of events of different types and generate test results.
  4. There is no way to detect the performance impact of BES application changes.

Solution

The BES mirror system is a new verification solution designed as a centralized, independent system. It mirrors the same events to two instances running on separate platforms (V3 and Raptor) and automatically performs an end-to-end comparison. It provides four capabilities that address these testing challenges.

Affinity Support

BES offers an affinity feature that allows specific events to be consumed by a designated instance. Enabling affinity requires changing the centralized configuration and pushing the change to both the publisher and the consumer. Events from that publisher then carry an affinity value and can be consumed only by consumers that recognize this value.
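The affinity rule can be expressed as a small filter. This is a sketch, not the BES implementation: `can_consume`, `claim` and the hostnames are hypothetical, and treating events without an affinity value as consumable by any consumer is an assumption the article does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    event_id: int
    affinity: Optional[str] = None  # set when the publisher has affinity enabled


def can_consume(consumer_affinity: Optional[str], event: Event) -> bool:
    """An event carrying an affinity value may only be consumed by a consumer
    that recognizes that value.

    Assumption (not stated in the article): events without an affinity value
    remain consumable by any consumer.
    """
    if event.affinity is None:
        return True
    return event.affinity == consumer_affinity


def claim(consumer_affinity: Optional[str], pending: List[Event]) -> List[Event]:
    """Return the pending events this consumer is allowed to pick up."""
    return [e for e in pending if can_consume(consumer_affinity, e)]
```

With hostnames used as affinity values, an event stamped `v3-host-01` is invisible to every consumer except the one configured with `v3-host-01`.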

The mirror system leverages this feature to provide a non-invasive way for BES applications to test specific events. It sets the affinity value on target events directly, with no publisher-side code changes.

Existing Event Propagation

The mirror system reads existing events directly from the database and republishes them with an affinity value. This fills the gap left by the lack of test cases for legacy applications.
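Reusing existing events amounts to copying stored rows and stamping each copy with an affinity value, without touching the publisher. The sketch below assumes each copy gets a fresh event ID from a hypothetical `id_source` iterator; how BES actually allocates IDs and writes rows is not covered in the article.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Event:
    event_id: int
    event_type: str
    payload: str
    affinity: Optional[str] = None


def republish_with_affinity(existing: Iterable[Event], affinity: str,
                            id_source: Iterator[int]) -> List[Event]:
    """Copy already-stored events, stamping each copy with an affinity value.

    The original rows are left untouched and no publisher code is involved:
    the mirror side writes the copies itself. `id_source` is a hypothetical
    stand-in for however new event IDs are allocated.
    """
    return [replace(e, event_id=next(id_source), affinity=affinity) for e in existing]
```

Because the copies keep the original type and payload, they exercise the same consumer logic that production traffic did.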

Event Level Comparison

The solution tracks the whole event processing flow and collects all related information (e.g., event status, detailed logs, and exceptions) for comparison. This approach clearly exposes any behavioral change.
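An event-level comparison can be sketched as a field-by-field diff of what each box recorded for the same source event. The `EventResult` fields are assumptions chosen to mirror the status, exceptions and logs listed above; log messages are assumed to be normalized beforehand (for example, with timestamps and hostnames stripped) so that only behavioral differences remain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EventResult:
    """What is collected for one processed event copy (assumed fields)."""
    status: str
    exceptions: List[str] = field(default_factory=list)  # exception class names
    log_keys: List[str] = field(default_factory=list)    # normalized log messages


def compare_events(primary: Dict[int, EventResult],
                   candidate: Dict[int, EventResult]) -> List[Tuple[int, str, object, object]]:
    """Compare results keyed by the ID of the original (source) event.

    Returns (source_event_id, field_name, primary_value, candidate_value) for
    each mismatch. A copy missing on one side is reported as
    (source_event_id, "missing", present_on_primary, present_on_candidate).
    An empty list means no behavioral difference was observed.
    """
    diffs: List[Tuple[int, str, object, object]] = []
    for source_id in sorted(primary.keys() | candidate.keys()):
        p, c = primary.get(source_id), candidate.get(source_id)
        if p is None or c is None:
            diffs.append((source_id, "missing", p is not None, c is not None))
            continue
        for name in ("status", "exceptions", "log_keys"):
            if getattr(p, name) != getattr(c, name):
                diffs.append((source_id, name, getattr(p, name), getattr(c, name)))
    return diffs
```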

Performance Metrics Aggregation and Comparison

By aggregating and comparing performance metrics after mirroring large numbers of events, the mirror system indicates whether the migration affects performance.
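For performance, one simple approach is to summarize per-event processing latency on each box and flag metrics where the candidate is noticeably slower. The choice of mean and nearest-rank p95, and the 10% tolerance, are illustrative assumptions rather than the metrics or thresholds the BES mirror system uses.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in (0, 100])."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    return {"mean": mean(latencies_ms), "p95": percentile(latencies_ms, 95)}


def regressions(primary: List[float], candidate: List[float],
                tolerance: float = 0.10) -> Dict[str, float]:
    """Return {metric: relative slowdown} for metrics where the candidate is
    slower than the primary by more than `tolerance`.

    The 10% default is an illustrative assumption, not a BES setting.
    """
    p, c = summarize(primary), summarize(candidate)
    return {name: c[name] / p[name] - 1.0
            for name in p
            if p[name] > 0 and c[name] > p[name] * (1 + tolerance)}
```

Comparing a tail percentile alongside the mean helps catch slowdowns that affect only a subset of events, which an average alone can hide.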

BES Mirror System Workflow

The workflow of the BES mirror system is shown below.

[Figure: BES mirror system workflow]

The detailed mirror steps are:

  1. Select two boxes: one from the legacy V3 BES pool and one from the migrated Raptor BES pool. Enable affinity on both and set each box's hostname as its affinity value.
  2. Retrieve existing events from the database using filters. Republish each event once with the V3 box's hostname as its affinity value and once with the Raptor box's hostname. The V3 and Raptor consumers then each consume the events carrying their own affinity value (hostname).
  3. Compare the processed event status, metrics, and exceptions recorded in the log system. Every behavior should be the same before and after migration.
  4. Generate mirror reports.
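The four steps can be simulated end to end in a toy Python model. Everything here is an assumption for illustration: each box is a hypothetical handler function keyed by hostname, affinity routing is reduced to a dictionary lookup, and only the final status (including the exception type) is compared.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], None]  # processes an event payload; raises on failure


@dataclass
class MirrorReport:
    total: int
    mismatches: List[Tuple[int, str, str]]  # (source_event_id, v3_status, raptor_status)

    @property
    def passed(self) -> bool:
        return not self.mismatches


def process(handler: Handler, payload: str) -> str:
    try:
        handler(payload)
        return "PROCESSED"
    except Exception as exc:  # the exception type is part of the observed behavior
        return f"FAILED:{type(exc).__name__}"


def run_mirror(events: Dict[int, str], boxes: Dict[str, Handler],
               v3_host: str, raptor_host: str) -> MirrorReport:
    # Step 1: both boxes have affinity enabled, keyed by their hostnames.
    # Step 2: republish every existing event once per hostname; each copy is
    # delivered only to the box whose hostname matches its affinity value.
    results: Dict[Tuple[int, str], str] = {}
    for source_id, payload in events.items():
        for host in (v3_host, raptor_host):
            results[(source_id, host)] = process(boxes[host], payload)
    # Step 3: compare the outcome of each event pair.
    mismatches = [(sid, results[(sid, v3_host)], results[(sid, raptor_host)])
                  for sid in sorted(events)
                  if results[(sid, v3_host)] != results[(sid, raptor_host)]]
    # Step 4: generate the report.
    return MirrorReport(total=len(events), mismatches=mismatches)
```

Comparing exception types rather than only pass/fail means a migrated box that fails "differently" from the legacy box is still reported as a mismatch.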

Because publishers write all events to the database, the BES mirror system can reuse existing events and republish them. Instead of changing the publisher, the system simulates a publisher with affinity enabled, so each republished event is consumed only by the instance that has affinity enabled with the same value. This enables an instance-to-instance comparison in which both instances consume the same events. Reusing existing events also compensates for the lack of test cases for the legacy V3 BES pool.

Duplicated Event Processing

Some BES consumers include a duplication check in their event processing logic. If an event with the same payload has already been processed, the republished copy is skipped, so the main processing flow is never exercised during testing. To solve this, the BES mirror system can amend the payload of existing events before republishing them. A sample event payload looks like this:

userId:12345|eventTimeStamp=2020-05-01|siteId=0

Because an event with a duplicate payload may be skipped, the BES mirror system can overwrite part of the payload, such as the user ID, with a new value before republishing the event. This ensures that the main event processing logic is exercised during mirror testing.
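Assuming payloads use the `|`-delimited format of the sample above, amending a field before republishing could look like the hypothetical `amend_payload` helper below. The sample mixes `:` and `=` as separators, so the sketch accepts both and preserves whichever one each field uses. Picking a replacement value that has never been processed (such as a dedicated test user ID) is left to the caller.

```python
import re
from typing import Dict

# Matches "name<sep>value" where the separator is ':' or '='.
FIELD = re.compile(r"^(?P<name>[^:=|]+)(?P<sep>[:=])(?P<value>.*)$")


def amend_payload(payload: str, overrides: Dict[str, str]) -> str:
    """Overwrite selected fields of a '|'-delimited payload, keeping the rest as-is."""
    parts = []
    for part in payload.split("|"):
        m = FIELD.match(part)
        if m and m.group("name") in overrides:
            name = m.group("name")
            part = f"{name}{m.group('sep')}{overrides[name]}"
        parts.append(part)
    return "|".join(parts)
```

For example, `amend_payload("userId:12345|eventTimeStamp=2020-05-01|siteId=0", {"userId": "99999"})` changes only the user ID, so the duplication check sees a new payload while the rest of the event stays identical.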

Inside the BES Mirror System

The BES mirror system can mirror events filtered by event type, time window, and consumer. When mirroring completes, it generates a report that shows comparison results for the related metrics and indicates whether the mirror test passed.
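Selecting the events for a mirror job can be sketched as a filter over stored events. `MirrorJobConfig`, `StoredEvent` and their fields are hypothetical, and the sketch assumes each stored event records the consumer it is destined for; the real BES schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List


@dataclass(frozen=True)
class StoredEvent:
    event_id: int
    event_type: str
    consumer: str  # assumed: the consumer this event is destined for
    created_at: datetime


@dataclass(frozen=True)
class MirrorJobConfig:
    """Hypothetical job definition; field names are illustrative only."""
    primary_host: str
    candidate_host: str
    consumer: str
    event_types: FrozenSet[str]
    window_start: datetime  # inclusive
    window_end: datetime    # exclusive


def select_events(config: MirrorJobConfig, events: List[StoredEvent]) -> List[StoredEvent]:
    """Pick existing events matching the job's consumer, event types and time window."""
    return [e for e in events
            if e.consumer == config.consumer
            and e.event_type in config.event_types
            and config.window_start <= e.created_at < config.window_end]
```

A half-open time window keeps consecutive jobs from mirroring the same boundary event twice.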

The BES mirror system runs each mirror test as a job in an internal job framework, which lets it scale to mirror testing of large event volumes. The diagram below shows the architecture of the BES mirror system.

[Figure: BES mirror system architecture]

Before testing, the system creates a job configuration that specifies the primary and candidate boxes, the target consumer, and the target event types for comparison. Once a target time window is selected, a corresponding job is triggered to enable affinity for the consumers, retrieve existing events within the time window, and set the affinity value on those events. The job then polls the event processing status on the two mirror boxes. Once all events are processed, the related status, metrics, and thrown exceptions are collected and compared. Any difference in test results between the primary and candidate boxes indicates a behavioral change introduced by the migration that must be fixed. Testing is re-triggered until all tests pass.
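The polling stage can be sketched as a loop that waits until every republished event reaches a terminal status, with a timeout so a stuck box does not block the job indefinitely. `get_status`, the terminal statuses and the default timings are assumptions; in practice the loop would run once per mirror box. The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Dict, Iterable

TERMINAL = {"PROCESSED", "FAILED"}  # assumed terminal statuses


def wait_for_completion(event_ids: Iterable[int],
                        get_status: Callable[[int], str],
                        timeout_s: float = 600.0,
                        interval_s: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> Dict[int, str]:
    """Poll until every event reaches a terminal status or the timeout expires.

    `get_status` is a hypothetical lookup of an event's processing status on
    one mirror box. Raises TimeoutError listing the events still pending.
    """
    pending = set(event_ids)
    final: Dict[int, str] = {}
    deadline = clock() + timeout_s
    while pending:
        for event_id in list(pending):
            status = get_status(event_id)
            if status in TERMINAL:
                final[event_id] = status
                pending.discard(event_id)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"still pending: {sorted(pending)}")
        sleep(interval_s)
    return final
```

Once both boxes return, their final statuses feed the comparison and report stages.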

With the BES mirror testing system, several V3 BES applications have been migrated to the Raptor platform without any site issues. The system safeguards migration quality and also supports regression testing for ongoing BES application feature development, platform upgrades, and performance testing.

Summary

The BES mirror testing system provides a proven testing solution for eBay's messaging system and has already benefited the legacy V3 BES migration. It overcomes two unprecedented challenges: the lack of end-to-end test cases for legacy BES applications and the difficulty of instance-to-instance comparison. It also establishes a sustainable testing practice that enables all eBay developers to verify the quality of their BES applications.