Coveo Data Challenge

In-session prediction for purchase intent and recommendations

Welcome to the Data Challenge leaderboard page for the 2021 SIGIR Workshop on eCommerce! Training data, evaluation scripts and rules can be found in the official challenge repository; relevant literature and background information about the challenge and relevant industry use cases can be found in the challenge paper pre-print.

How to participate

To participate, you just have to:

  1. Sign up (below);
  2. Make a submission for one of the two tasks in the challenge (or both).

Please note that participation in the challenge implies acceptance of the Terms and Conditions.


If you have problems signing up, questions about your submission or general inquiries about the challenge, please join the Data Challenge Slack or send us an email. For more info about the dataset and Coveo, and to get the latest from the workshop organizers, check the links below regularly:

Challenge Overview

This challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can be different depending on the occasion. In the context of e-commerce technology, the feedback loop determined by behavioural signals spans from hours to a few seconds and machine learning models need to adapt as fast as possible to the continuously changing nature of the customer journey.

The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers.

We release a new session-based dataset including fine-grained browsing events (detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog metadata (image, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems:

  1. a recommendation task, where a model is shown k events at the start of a session, and it is asked to predict future product interactions in the same session;
  2. an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

Please refer to the public repository for details on rules, evaluations and everything related to the dataset.
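As an informal illustration of how the two tasks frame a session, the sketch below splits a toy session into the first k events and the later product interactions, and derives a binary purchase label for the first add-to-cart event. The field names `action` and `product_id`, the helper names and the toy session are assumptions made for this example only; the actual schema, splits and evaluation are defined in the challenge repository.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event record: the real dataset schema lives in the challenge
# repository and may use different field names.
Event = Dict[str, str]


def split_for_recommendation(session: List[Event], k: int) -> Tuple[List[Event], List[str]]:
    """Return the first k events (model input) and the product ids of the
    interactions that follow them in the same session (prediction targets)."""
    seen, future = session[:k], session[k:]
    targets = [e["product_id"] for e in future if e.get("product_id")]
    return seen, targets


def purchase_label(session: List[Event]) -> Optional[Tuple[str, int]]:
    """For the first add-to-cart event, return (product_id, label), where
    label is 1 if that product is purchased later in the session, else 0.
    Returns None when the session contains no add-to-cart event."""
    for i, event in enumerate(session):
        if event["action"] == "add":
            product = event["product_id"]
            bought = any(
                e["action"] == "purchase" and e["product_id"] == product
                for e in session[i + 1:]
            )
            return product, int(bought)
    return None


# Toy session using the three event types named above (detail, add, purchase).
session = [
    {"action": "detail", "product_id": "p1"},
    {"action": "detail", "product_id": "p2"},
    {"action": "add", "product_id": "p2"},
    {"action": "detail", "product_id": "p3"},
    {"action": "purchase", "product_id": "p2"},
]

seen, targets = split_for_recommendation(session, k=2)
label = purchase_label(session)
```

Here `seen` holds the two events a recommendation model would observe, `targets` lists the products it would try to predict, and `label` pairs the added product with whether it was bought. Which events an intent model is actually allowed to see is governed by the official rules, not by this sketch.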


Please note that registration for the data challenge is now closed. To sign up for the challenge, please fill in the form below with your official e-mail, your organization (university or company), the first and last name of the team lead and a nickname for the leaderboard. You will receive a confirmation e-mail including your user id and AWS write-only credentials to upload your JSON files to the challenge bucket; please use the information responsibly. After submitting the form, you will be sent back to this page: please wait a few minutes and check your inbox again, including the spam folder, before re-submitting. We suggest adding the sender's address to your list of trusted senders.

This web-app is hosted on a dedicated AWS account and all data will be destroyed at the end of the Data Challenge: if you want to use the same serverless back-end to run a lightweight leaderboard website, please get in touch.

For the submission format and the general rules of the contest, please consult the relevant section in the README; if you want a ready-made script to upload your submission, check the one provided in the repository.

Organizing Committee


The organizers wish to thank Luca Bigon for his outstanding support in data collection, and Surya Kallumadi, Massimo Quadrana, Dietmar Jannach and Ajinkya Kale for valuable feedback on a previous version of this paper. Finally, special thanks to Richard Tessier and Coveo's legal team for believing in this data sharing initiative.

System Description Paper

We solicit the submission of system papers which describe in detail modelling choices, data insights and interesting findings. System description papers will be peer reviewed (single-blind) by the program committee (we do not accept anonymized submissions): we accept contributions up to 4 pages (plus references and appendix if needed).

Typically, a system paper would include sections on data analysis, related work, architecture and experiments (with baselines) - a good example from last year can be found here. Please refer to the challenge paper for a list of interesting questions in the target domain.

Paper submissions can be made between June 10th and June 25th. All submissions should be made here and must be formatted according to the latest ACM SIG proceedings template. Please note that at least one author of each accepted paper must register for the workshop and present the paper.

Timeline (UTC)

April 21    Data Challenge registration opens; Stage 1 opens
June 5      Data Challenge registration deadline
June 10     Stage 1 closes
June 11     Stage 2 opens; Paper submission opens
June 17     Stage 2 closes
June 25     Paper submission closes
July 7      Paper Accept/Reject
July 10     Camera ready paper submission deadline
July 15     Workshop

Leaderboards (Stage 2)

Next Item Prediction Leaderboard

Subsequent Items Prediction Leaderboard

Purchase Intent Prediction Leaderboard

Leaderboards (Stage 1)

Next Item Prediction

Subsequent Items Prediction

Purchase Intent Prediction