eBay Data Challenge


High Accuracy Recall Task


About


The eBay SIGIR 2019 eCommerce Search Challenge: High Accuracy Recall Task is organized by the eBay Search group. The challenge targets a common problem in eCommerce search: identifying the items to show when using non-relevance sorts. Users of eCommerce search applications often sort by dimensions other than relevance, such as popularity, review score, price, distance, or recency. This is a notable difference from traditional information-oriented search, including web search, where documents are surfaced in relevance order.

Relevance ordering obviates the need for explicit relevant-or-not decisions on individual documents, and many well-studied search methodologies take advantage of this. Non-relevance sort orders are less well studied, but they raise a number of interesting research topics: evaluation metrics, ranking formulas, performance optimization, user experience, and more. These topics are discussed in the High Accuracy Recall Task paper, published at the SIGIR 2018 Workshop on eCommerce.

This search challenge focuses on the most basic aspect of this problem: identifying the items to include in the recall set when using non-relevance sorts. This is already a difficult problem, and it includes typical search challenges such as ambiguity and multiple query intents.


Dataset


The challenge data consists of a set of popular search queries and a sizable set of candidate documents. Challenge participants make a boolean relevant-or-not decision for each query-document pair. Human judgments are used to create labeled training and evaluation data for a subset of the query-document pairs. Evaluation of submissions will be based on the traditional F1 metric, which incorporates components of both recall and precision.
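To make the scoring concrete, here is a minimal sketch of F1 over binary relevance predictions. The function and the toy numbers are purely illustrative; the official evaluation code is not published on this page.

def f1_score(labels, predictions):
    """F1 = 2 * precision * recall / (precision + recall) over 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy example: 3 relevant pairs, 2 predicted relevant, 1 of them correct.
print(f1_score([1, 1, 1, 0, 0], [1, 0, 0, 1, 0]))  # 0.4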

The challenge task and the evaluation basis are only a first step. The hope is to have a challenge that is interesting and useful by itself, while at the same time fostering additional insight into the problem space. Participants are encouraged to share their learnings in the form of system description papers submitted to the 2019 SIGIR Workshop on eCommerce. Alternative evaluation metrics (other than F1) are an obvious area to pursue, and one the challenge organizers hope to examine during the challenge. Item price is included in the dataset, enabling exploration of sort-type-specific evaluation metrics.

The challenge dataset consists of selected fields from approximately 900K listings from eBay's Collectibles categories, plus 150 popular search queries relevant to collectibles listings. Information provided for each listing includes the Title, Price, Category Breadcrumb, and an image URL. For example, a listing for a pinball machine would have fields like
  1. Title: Bally Twilight Zone Pinball Machine
  2. Price: 3995.00
  3. Category: Collectibles > Arcade, Jukeboxes & Pinball > Pinball > Machines
plus an image URL. Queries are simply the text of the query, for example, "bally pinball machine". Query-document pairs selected for human judgment include a wide mix of both relevant and non-relevant pairs.
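For illustration, a single listing and query might be held in memory roughly as follows. The field names below are assumptions based on the description above, not the official column names of the released files.

# Illustrative in-memory representation of one listing and one query.
# Field names are assumed from the prose above, not taken from the data files.
listing = {
    "title": "Bally Twilight Zone Pinball Machine",
    "price": 3995.00,
    "category": "Collectibles > Arcade, Jukeboxes & Pinball > Pinball > Machines",
    "image_url": "https://...",  # placeholder; each listing carries an image URL
}
query = "bally pinball machine"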

Phases

The first phase of the challenge (the Unsupervised Phase) will start on May 17, 2019. Participants will be able to access the data and can start working on the given task; however, submissions will not open until May 27, 2019. Starting May 27, 2019, participants will be able to submit their predictions and a leaderboard will be generated. More details about the phases of the challenge are given below:

Unsupervised Phase

This is the first phase of the challenge and it will be fully unsupervised. No training data will be released, and participants are expected to predict the relevance of all query-item pairs. Each submission will be judged using a held-out evaluation set and a leaderboard will be generated.
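As an example of what an unsupervised prediction could look like, the sketch below marks a query-item pair relevant when every query token appears in the listing title. This is a naive illustrative baseline, not a method suggested or endorsed by the organizers.

import re

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def predict_relevant(query, title):
    # Naive unsupervised rule: relevant iff every query token occurs in the title.
    title_tokens = set(tokenize(title))
    return all(tok in title_tokens for tok in tokenize(query))

print(predict_relevant("bally pinball machine",
                       "Bally Twilight Zone Pinball Machine"))  # True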

Supervised Phase

In the second phase of the challenge, we will release some training data that can be used to train a model to predict relevance or the lack thereof. As in the first phase, we will judge submissions using a held-out evaluation set, and the leaderboard will be updated.
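For illustration, one simple supervised setup is to join each query and title into a single text, featurize it with TF-IDF, and fit a binary classifier on the released labels. The scikit-learn sketch below, with toy data standing in for the real training pairs, is one possible baseline under that assumption, not the organizers' reference model.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data standing in for the released training pairs:
# (query, title) pairs with 0/1 relevance labels.
train_pairs = [
    ("bally pinball machine", "Bally Twilight Zone Pinball Machine"),
    ("bally pinball machine", "Star Wars Trading Card Set"),
]
train_labels = [1, 0]

def combine(query, title):
    # Join query and title so a single text vectorizer can be used.
    return query + " [SEP] " + title

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit([combine(q, t) for q, t in train_pairs], train_labels)

# Predict relevance for an unseen query-title pair.
print(model.predict([combine("bally pinball machine",
                             "Bally Addams Family Pinball Machine")]))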

Final Phase

This phase will start in the final week of the challenge. No additional training data will be released; a separate held-out evaluation set will be used to generate a second and final leaderboard.

Timeline and Important Dates
When                                What
May 17, 2019                        Challenge Starts (Unsupervised Phase)
May 27, 2019                        Submissions Open on EvalAI
June 02, 2019 - 11:59 PM (PDT)      Unsupervised Phase Ends
June 03, 2019                       Supervised Phase Starts
July 02, 2019 - 11:59 PM (PDT)      Supervised Phase Ends
July 03, 2019                       Final Phase Starts
July 18, 2019 - 11:59 PM (PDT)      Challenge Ends
July 25, 2019                       eCom Full Day Workshop
All the deadlines above are based on Pacific Daylight Time (PDT).

Data Access

To participate in this Data Challenge, please email DL-eBay-SigIR-Ecom2019-Data-Challenge@ebay.com for further instructions. The Data Challenge is governed by the rules and announcements on this web site and by a Data Challenge Agreement that is sent to each team to review and sign before participation.

We will be hosting the data challenge on EvalAI; the current leaderboard and submission instructions are available there.
eBay Data Challenge on EvalAI: https://evalai.cloudcv.org/web/challenges/challenge-page/361/overview

Data Challenge Posters and Papers

All participants in the data challenge are invited to present a poster with a short description at the workshop explaining their system and methodology. The due date to submit posters with their short descriptions is July 10, 2019 (11:59 PM AoE). The workshop chairs reserve the right to reject any posters that are off topic or of insufficient quality. Participants can use any template they like and can send their posters with short descriptions to DL-eBay-SigIR-Ecom2019-Data-Challenge@ebay.com. Participants are expected to bring their posters to the workshop on July 25, 2019.

Participants are also encouraged to submit papers about their approach, system, architecture, and findings to the workshop. Papers will be peer reviewed (single-blind) and must be formatted according to the latest ACM SIG proceedings template (LaTeX users should use sample-sigconf.tex as a template). Papers are due after the workshop, on August 17, 2019 (11:59 PM AoE). Notification of acceptance will be sent on August 30, 2019. Data challenge papers from last year can be found here for reference. Papers can be submitted through EasyChair: https://easychair.org/conferences/?conf=ecom2019dc.

Contact Us

For any questions or concerns about the challenge, please contact us at DL-eBay-SigIR-Ecom2019-Data-Challenge@ebay.com.