Coveo Data Challenge


In-session prediction for purchase intent and recommendations


    Click here for the data challenge overview paper

    Click here for the Leaderboards

    Join the data challenge slack channel: DC Slack


The 2021 edition of the SIGIR eCom Data Challenge, hosted by Coveo, ran from April 21 to June 12. More than 20 teams from industry and academia participated, and 6 final design papers, in which teams shared their insights and methods, were accepted. The final results were presented on July 15, 2021, during the SIGIR eCom'21 Workshop, which also featured an invited talk from the NVIDIA team and a round-table discussion with several participating teams.

Training data, evaluation scripts and rules can be found in the official challenge repository; the challenge paper pre-print covers the relevant literature, background information about the challenge and related industry use cases.

Contacts

For questions regarding the dataset, please contact Jacopo Tagliabue.


Challenge Overview

This challenge addresses the growing need for reliable predictions within the boundaries of a shopping session, as customer intentions can differ depending on the occasion. In e-commerce, the feedback loop determined by behavioural signals ranges from hours to a few seconds, and machine learning models need to adapt as fast as possible to the continuously changing nature of the customer journey.

The need for efficient procedures for personalization is even clearer if we consider the e-commerce landscape more broadly: outside of giant digital retailers, the constraints of the problem are stricter, due to smaller user bases and the realization that most users are not frequently returning customers.

We release a new session-based dataset including fine-grained browsing events (detail, add, purchase), enriched by linguistic behavior (queries made by shoppers, with items clicked and items not clicked after the query) and catalog meta-data (image, text, pricing information). On this dataset, we ask participants to showcase innovative solutions for two open problems:

  1. a recommendation task, where a model is shown k events at the start of a session, and it is asked to predict future product interactions in the same session;
  2. an intent prediction task, where a model is shown a session containing an add-to-cart event, and it is asked to predict whether the item will be bought before the end of the session.

Please refer to the public repository for details on rules, evaluations and everything related to the dataset.
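
As a rough illustration of the recommendation task described above, the sketch below splits a session into its first k events and the remaining ones, then scores a ranked list of predictions with reciprocal rank (averaged over sessions, this gives MRR) and a set-based F1. The function names, the product identifiers and the exact metric definitions are assumptions made for illustration only; the official evaluation scripts in the challenge repository remain the reference.

```python
from typing import List, Sequence, Tuple


def split_session(events: Sequence[str], k: int) -> Tuple[List[str], List[str]]:
    """Split a session into the first k events (model input) and the rest (targets)."""
    return list(events[:k]), list(events[k:])


def reciprocal_rank(predictions: Sequence[str], next_item: str) -> float:
    """Return 1 / rank of the true next item in the ranked predictions, or 0 if absent."""
    for rank, item in enumerate(predictions, start=1):
        if item == next_item:
            return 1.0 / rank
    return 0.0


def f1_score(predictions: Sequence[str], future_items: Sequence[str]) -> float:
    """Set-based F1 between predicted items and all items seen later in the session."""
    predicted, actual = set(predictions), set(future_items)
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical session of product identifiers (not taken from the dataset).
session = ["sku_a", "sku_b", "sku_c", "sku_d"]
seen, future = split_session(session, k=2)

# Hypothetical model output, best guess first.
ranked = ["sku_d", "sku_c", "sku_x"]

print(reciprocal_rank(ranked, future[0]))  # the true next item is sku_c, ranked second
print(f1_score(ranked, future))
```

Averaging `reciprocal_rank` over all test sessions would give a score comparable in spirit to the MRR column of the leaderboards, while `f1_score` corresponds to the subsequent-items setting, where every later interaction in the session counts as a target.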


Organizing Committee

Acknowledgements

The organizers wish to thank Luca Bigon for his outstanding support in data collection, and Surya Kallumadi, Massimo Quadrana, Dietmar Jannach and Ajinkya Kale for valuable feedback on a previous version of this paper. Finally, special thanks to Richard Tessier and Coveo's legal team for believing in this data sharing initiative.


System Description Papers

The following system description papers were accepted:

    1. Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation  [PDF]
       Gabriel Moreira, Sara Rabhi, Ronay Ak, Md Yasin Kabir and Even Oldridge.

    2. Comparison of Transformer-Based Sequential Product Recommendation Models for the Coveo Data Challenge  [PDF]
       Elisabeth Fischer, Daniel Zoller and Andreas Hotho.

    3. Utilizing Graph Neural Network to Predict Next Items in Large-sized Session-based Recommendation Industry Data  [PDF]
       Tianqi Wang, Zhongfen Deng, Houwei Chou, Lei Chen and Wei-Te Chen.

    4. Session-based Recommender System Using an Ensemble of Multiple NN Models with LSTM and Matrix Factorization  [PDF]
       Yoshihiro Sakatani.

    5. Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction  [PDF]
       Shotaro Ishihara, Shuhei Goda and Hidehisa Arai.

    6. A Session-aware DeepWalk Model for Session-based Recommendation  [PDF]
       Kaiyuan Li, Pengfei Wang and Long Xia.


Timeline (UTC)

April 21   Data Challenge registration opens; Stage 1 opens
June 5     Data Challenge registration deadline
June 10    Stage 1 closes
June 11    Stage 2 opens; Paper submission opens
June 17    Stage 2 closes
June 29    Paper submission closes
July 7     Paper Accept/Reject
July 10    Camera ready paper submission deadline
July 15    Workshop

Leaderboards (Stage 2)

Next Item Prediction Leaderboard
Position  Nickname       Score (MRR)           Timestamp (UTC)
1         DeepBlueAI     0.277256856863352     2021-06-16 15:00:24.054893
2         NVIDIA Merlin  0.277150722994993     2021-06-17 23:47:39.824805
3         tsotfsk        0.26171723662781      2021-06-17 23:51:05.140458
4         scitator       0.228414620791257     2021-06-15 21:00:07.655583
5         louis          0.223834529799278     2021-06-16 17:04:01.073889
6         Yoshi          0.214855530913555     2021-06-17 23:32:29.575103
7         old            0.191871401529567     2021-06-16 15:56:11.615827
8         busdriver      0.18341056004716      2021-06-16 13:23:46.65911
9         DSWue          0.139330140115961     2021-06-16 21:26:42.031644
10        Beantown       0.116077675397221     2021-06-11 23:44:25.022806
11        eggie5         0.0311874109181821    2021-06-14 05:09:44.15369

Subsequent Items Prediction Leaderboard
Position  Nickname       Score (F1)            Timestamp (UTC)
1         NVIDIA Merlin  0.0744066480874766    2021-06-17 23:47:39.824805
2         Yoshi          0.071323143782563     2021-06-17 23:32:29.575103
3         DeepBlueAI     0.0712670036197682    2021-06-16 15:00:24.054893
4         louis          0.0691723442031248    2021-06-16 17:04:01.073889
5         tsotfsk        0.0676956513778051    2021-06-17 23:51:05.140458
6         scitator       0.0622307564654324    2021-06-15 21:00:07.655583
7         DSWue          0.0515263362020548    2021-06-17 21:28:58.418355
8         Beantown       0.0476388656948051    2021-06-11 23:44:25.022806
9         old            0.0475242159500783    2021-06-16 15:56:11.615827
10        busdriver      0.0458035709730356    2021-06-16 13:23:46.65911
11        eggie5         0.014190389913474     2021-06-14 05:09:44.15369

Purchase Intent Prediction Leaderboard
Position  Nickname       Score (Weighted Micro-F1)  Timestamp (UTC)
1         DeepBlueAI     3.63439774529335           2021-06-13 09:06:15.335559
2         NVIDIA Merlin  3.6340530847665            2021-06-14 23:29:52.034462
3         hakubishin3    3.63031595108722           2021-06-14 04:18:15.102488
4         Shawn          3.63006620083747           2021-06-11 10:21:15.292887
5         Yoshi          3.5829108962637            2021-06-14 22:50:19.714592
6         busdriver      3.51563694420912           2021-06-12 00:13:41.575177

Leaderboards (Stage 1)

Next Item Prediction
Position  Nickname       Score (MRR)           Timestamp (UTC)
1         DeepBlueAI     0.259567372588436     2021-06-10 07:06:38.503671
2         NVIDIA Merlin  0.257838846924539     2021-06-10 14:21:19.516696
3         tsotfsk        0.254900668665202     2021-06-09 15:51:53.606362
4         gspmoreira     0.246658879747372     2021-06-02 04:43:46.826176
5         scitator       0.227762621529733     2021-06-09 17:15:01.304097
6         louis          0.219813275834822     2021-06-10 16:50:29.592666
7         hakubishin3    0.21850783896367      2021-06-07 13:55:34.853107
8         Yoshi          0.203271422177873     2021-06-10 02:55:04.23637
9         Wanna          0.174838662853926     2021-05-27 02:20:01.378616
10        eggie5         0.169853537423224     2021-04-28 01:33:58.986311
11        old            0.168984566055026     2021-06-10 12:54:39.244947
12        busdriver      0.164626909973333     2021-06-10 12:50:38.254958
13        ECNU_DM        0.12317866259579      2021-06-06 02:43:27.514704
14        Beantown       0.115919547270715     2021-06-10 23:58:17.605326
15        learner        0.113309491523042     2021-05-01 11:02:27.474228
16        ECNU_Rec       0.112677144884667     2021-05-13 11:40:32.372782
17        KonigsbergGuy  0.0892720412689585    2021-06-07 16:08:09.70735
18        Rick           0.0470746941901396    2021-05-04 18:31:44.117046
19        DSWue          0.001780158105201     2021-05-28 08:52:29.361252
20        Nastya         0                     2021-05-19 12:45:44.220495

Subsequent Items Prediction
Position  Nickname       Score (F1)            Timestamp (UTC)
1         NVIDIA Merlin  0.0704473890923575    2021-06-10 14:21:19.516696
2         Yoshi          0.0683150961432571    2021-06-04 06:18:47.454721
3         louis          0.0675671489008072    2021-05-29 14:12:16.80038
4         gspmoreira     0.0671580029228143    2021-06-02 04:43:46.826176
5         tsotfsk        0.065942425395436     2021-06-10 11:14:27.903234
6         DeepBlueAI     0.0652533072711102    2021-06-10 07:06:38.503671
7         hakubishin3    0.0627382093680529    2021-06-07 13:55:34.853107
8         scitator       0.0605172072958347    2021-06-09 16:15:12.179543
9         Wanna          0.0534842389144261    2021-05-27 03:36:59.695761
10        ECNU_DM        0.051821044811731     2021-05-26 02:41:42.600018
11        eggie5         0.0513360414256127    2021-06-04 02:09:42.642162
12        old            0.0420626146480762    2021-06-10 12:54:39.244947
13        Beantown       0.0413345157161334    2021-06-04 23:54:19.308318
14        busdriver      0.0409778820267947    2021-06-10 12:50:38.254958
15        KonigsbergGuy  0.0243042924848845    2021-06-07 16:08:09.70735
16        learner        0.0126461115078898    2021-05-01 11:02:27.474228
17        ECNU_Rec       0.0125607859842969    2021-05-16 13:29:12.389733
18        Rick           0.00551070580065692   2021-05-04 18:31:44.117046
19        DSWue          0.000578046028940261  2021-05-28 08:52:29.361252
20        Nastya         0                     2021-05-19 12:45:44.220495

Purchase Intent Prediction
Position  Nickname       Score (Weighted Micro-F1)  Timestamp (UTC)
1         NVIDIA Merlin  3.63142202394556           2021-06-10 22:19:39.439075
2         DeepBlueAI     3.62898581869512           2021-06-10 12:47:23.969359
3         ronai          3.62223442288064           2021-06-07 16:28:00.728037
4         hakubishin3    3.62147689588072           2021-06-10 12:14:35.016894
5         Shawn          3.62034182210093           2021-06-01 02:30:38.991516
6         SunnySideUp    3.61663879085421           2021-06-03 00:28:11.293968
7         Yoshi          3.61485959169783           2021-06-04 10:33:04.753004
8         busdriver      3.59379628246656           2021-06-09 05:05:34.153941