Participation and Data
As part of this challenge, Rakuten will be releasing 1M product listings in tsv format, split into a training set (0.8M) and a test set (0.2M). Each training listing consists of a product title and its corresponding category ID path. The following are some examples from the training set:
Title | CategoryIdPath
Replacement Viewsonic VG710 LCD Monitor 48Watt AC Adapter 12V 4A | 3292>114>1231
Ka-Bar Desert MULE Serrated Folding Knife | 4238>321>753>3121
5.11 TACTICAL 74280 Taclite TDU Pants, R/M, Dark Navy | 4015>3285>1443>20
Skechers 4lb S Grip Jogging Weight set of 2- Black | 2075>945>2183>3863
The test set contains only the title field, and the goal is to predict the CategoryIdPath for each title. To participate, please fill out the sign-up form (sign up here). Accepted contributions will be presented during the eCom Workshop at SIGIR 2018.
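The format described above can be read with a few lines of Python. The following is a minimal sketch, assuming each line holds a title and a CategoryIdPath separated by a single tab and that there is no header row (the announcement does not say whether one is present). The function name `read_listings` is hypothetical, and the sample rows are copied from the examples above in place of a real file.

```python
import csv
import io


def read_listings(handle):
    """Yield (title, category_path) pairs from a tab-separated file object.

    category_path is a list of category ID strings, root first.
    """
    # QUOTE_NONE keeps quote characters in titles (e.g. inch marks) untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or title-only lines
        title, path = row[0], row[1]
        yield title, path.split(">")


# Two rows taken from the examples above, standing in for the real file.
sample = (
    "Ka-Bar Desert MULE Serrated Folding Knife\t4238>321>753>3121\n"
    "Replacement Viewsonic VG710 LCD Monitor 48Watt AC Adapter 12V 4A\t3292>114>1231\n"
)
listings = list(read_listings(io.StringIO(sample)))
print(listings[0][1])  # ['4238', '321', '753', '3121']
```

Splitting the path on `>` gives one ID per taxonomy level, which can be convenient for hierarchical models, but only the full path string counts for scoring.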
Submission
Submission is team-based, so only the team leader can submit the prediction file. We will send you details about file submission after we receive your sign-up. There is no limit on team size. The prediction file must be in the same tsv format as the training file: the first column is the product title from the test set and the second column is your predicted CategoryIdPath. Please DO NOT change the order of the test titles in your submission file.
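One way to keep the title order intact is to write predictions in a single pass over the test titles. The sketch below is only an illustration: `write_predictions` and the `predict` callable are hypothetical names, and the constant predictor stands in for a real model.

```python
import io


def write_predictions(handle, titles, predict):
    """Write one 'title<TAB>CategoryIdPath' line per title, in input order."""
    for title in titles:
        handle.write(f"{title}\t{predict(title)}\n")


# Test titles in their original order (taken from the examples in this page).
test_titles = [
    "Skechers 4lb S Grip Jogging Weight set of 2- Black",
    "Ka-Bar Desert MULE Serrated Folding Knife",
]


def constant_predictor(title):
    # Placeholder model: always predicts the same path.
    return "3292>114>1231"


out = io.StringIO()  # a real run would use open(path, "w", encoding="utf-8")
write_predictions(out, test_titles, constant_predictor)
submission_lines = out.getvalue().splitlines()
```

Iterating over the titles exactly as they were read, rather than sorting or grouping them first, keeps each line of the output aligned with the corresponding line of the test file.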
Evaluation (Updated!)
The evaluation metrics will be weighted-{precision, recall, F1} (reference) on the test set, based on EXACT CategoryIdPath match. Since the product distribution over the taxonomy tree is highly imbalanced, weighted-{precision, recall, F1} are more appropriate than macro- or micro-{precision, recall, F1}. Please note that a partial path match does not count as a correct prediction. The evaluation script is provided here and its usage is shown below. Both PREDICTION_FILE and GOLD_FILE must be in the same tsv format as the training file, where the first column is the product title and the second column is the CategoryIdPath. The title order must be the same in both files.
$ python eval.py -pred $PREDICTION_FILE -gold $GOLD_FILE
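For intuition, the weighted metrics can be sketched in plain Python. This is not the official eval.py. It assumes the common support-weighted definition, in which each class's precision, recall, and F1 are averaged with weights proportional to the number of gold examples of that class, and a class that is never predicted gets precision 0. The function name `weighted_prf` is hypothetical. Each full CategoryIdPath string is treated as one class, so a partial path such as `3292>114` is simply a different, wrong label.

```python
from collections import Counter


def weighted_prf(gold, pred):
    """Support-weighted precision, recall and F1 over exact path labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)      # gold examples per path
    predicted = Counter(pred)    # predictions per path
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    total = len(gold)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


gold = ["3292>114>1231", "3292>114>1231", "4238>321>753>3121", "4015>3285>1443>20"]
pred = ["3292>114>1231", "3292>114", "4238>321>753>3121", "4238>321>753>3121"]
scores = weighted_prf(gold, pred)
print([round(s, 3) for s in scores])  # [0.625, 0.5, 0.5]
```

In this toy example the second prediction is a prefix of the gold path and still scores as an error, which is the exact-match rule stated above. Under this definition weighted recall equals plain accuracy (2 of 4 correct here).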
Stage 1 - Model Building (April 9 - June 23)
Participants build and test models on the training data. The leaderboard only shows model performance on a SUBSET of the test set, based on your LATEST submission. Each team can submit at most 3 times per day (UTC) in this stage, and the leaderboard will update every 15 minutes.
Stage 2 - Model Evaluation (June 24)
The final leaderboard will freeze on June 24 and show model performance on the ENTIRE test set, based on your LATEST submission.
System Description Paper (New!)
System description papers will be peer reviewed (single-blind) by the program committee. All submissions must be formatted according to the latest ACM SIG proceedings template, available at http://www.acm.org/publications/proceedings-template (LaTeX users should use sample-sigconf.tex as a template). There is no specific constraint on the content, but the paper should cover implementation details such as data preprocessing (including token normalization and feature extraction), any additional data used from external sources, model descriptions (including specific implementations, parameter tuning, etc.), and error analysis, if any.
System description papers should be submitted at https://easychair.org/conferences/?conf=ecom18dc. The deadline for paper submission is June 8, 2018 (11:59 P.M. UTC).
Timeline (Updated!)
When? | What?
April 09 | Evaluation Stage 1 Starts!
May 15 | Data Challenge Registration Deadline
June 08 | System Description Paper Submission (Suggested paper length 4-8 pages)
June 15 | Paper Acceptance Notification
June 24 | Evaluation Stage 2 (Final Leaderboard)
July 06 | Camera Ready Version of Papers Due
July 12 | eCom Full Day Workshop
Note: The timeline is subject to slight modifications.
If you have any questions, please contact Yiu-Chang Lin (yiuchang.lin@rakuten.com).
Data Challenge Checklist
Register for the data challenge: Registration now open!
Get download links for the dataset - 1 link each for training and testing data (You will receive an email with these links within 24 hrs of registration)
Send team details to Yiu-Chang Lin - yiuchang.lin@rakuten.com (this is necessary to receive a team specific submission link)
Receive the test set submission link for your team (You will receive an email with the link within 24 hrs of sending team details)
Join the data challenge slack channel (email with details will be sent after you register)