Information Retrieval Datasets

MS MARCO

Publication: Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In CoCo@NIPS.

See https://github.com/microsoft/MSMARCO-Passage-Ranking for more details.

Dataset com.microsoft.msmarco.passage.collection.etc

datamaestro.data.Folder

The documents, together with additional auxiliary files

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.collection

datamaestro_text.data.ir.csv.Documents

MS MARCO documents

This file contains every passage in the larger MS MARCO dataset.

Format: TSV with two columns, the passage ID (pid) and the passage text.
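A minimal sketch of reading this two-column TSV into a pid-to-text mapping, using only the Python standard library (not the datamaestro API). It assumes one passage per line with a single tab between the pid and the text; the function name is hypothetical.

```python
from typing import Dict, Iterable


def read_collection(lines: Iterable[str]) -> Dict[str, str]:
    """Map each passage ID (pid) to its passage text.

    Assumes one passage per line: pid, a tab, then the passage text.
    """
    passages: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so a stray tab in the text is kept
        pid, text = line.split("\t", 1)
        passages[pid] = text
    return passages
```

Because an open file object is an iterable of lines, it can be passed directly, e.g. `read_collection(open(path, encoding="utf-8"))`, where `path` points to the downloaded collection file. Keeping the whole collection in a dict is simple but memory-hungry; a streaming reader may be preferable for large collections.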

Dataset com.microsoft.msmarco.passage.train.run

datamaestro_text.data.ir.csv.AdhocRunWithText

Format: TSV with columns qid, pid, query, passage
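The four-column run format can be parsed line by line as sketched below. This is plain Python rather than the `AdhocRunWithText` class itself, and it assumes tab-separated fields in the order listed above; `RunEntry` and `read_run_with_text` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class RunEntry(NamedTuple):
    qid: str
    pid: str
    query: str
    passage: str


def read_run_with_text(lines: Iterable[str]) -> Iterator[RunEntry]:
    """Parse lines assumed to hold: qid TAB pid TAB query TAB passage."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # maxsplit=3 keeps any extra tab inside the passage text
        qid, pid, query, passage = line.split("\t", 3)
        yield RunEntry(qid, pid, query, passage)
```

Using a generator keeps memory use flat, which matters for runs with many query/passage pairs.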

Dataset com.microsoft.msmarco.passage.train.queries

datamaestro_text.data.ir.csv.Topics

Dataset com.microsoft.msmarco.passage.train.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset com.microsoft.msmarco.passage.train

datamaestro_text.data.ir.Adhoc

MS MARCO training dataset

Tasks: information retrieval, passage retrieval

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.train.withrun

datamaestro_text.data.ir.RerankAdhoc

MS MARCO training dataset, including the top-1000 documents to re-rank

Tasks: information retrieval, passage retrieval

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
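In the re-ranking setting, each query comes with a list of up to 1,000 candidate passages that a model reorders. The sketch below groups (qid, pid) pairs by query and sorts one candidate list by a score function. It is an illustration in plain Python, not the `RerankAdhoc` API; the helper names and the `score` callable (standing in for a ranking model) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def candidates_by_query(
    pairs: Iterable[Tuple[str, str]], k: int = 1000
) -> Dict[str, List[str]]:
    """Group (qid, pid) pairs by query, keeping at most k candidates each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for qid, pid in pairs:
        # Keep the first k candidates in the order they appear in the run
        if len(groups[qid]) < k:
            groups[qid].append(pid)
    return dict(groups)


def rerank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Reorder candidates by decreasing score; `score` is a hypothetical model."""
    return sorted(candidates, key=score, reverse=True)
```

The (qid, pid) pairs could come from the first two fields of each run line.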

Dataset com.microsoft.msmarco.passage.train.idtriples

datamaestro_text.data.ir.TrainingTripletsLines

Full training triples (query, positive passage, negative passage) with IDs

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
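The ID triples refer to queries and passages by identifier, so training needs their texts from the queries and collection files. A minimal sketch, assuming each line holds three tab-separated IDs (query, positive passage, negative passage) and that the texts have already been loaded into dicts; all names here are hypothetical and independent of the `TrainingTripletsLines` class.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class IdTriple(NamedTuple):
    qid: str
    positive_pid: str
    negative_pid: str


def read_id_triples(lines: Iterable[str]) -> Iterator[IdTriple]:
    """Parse lines assumed to be: qid TAB positive pid TAB negative pid."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        qid, positive, negative = line.split("\t")
        yield IdTriple(qid, positive, negative)


def resolve_text(
    triple: IdTriple, queries: Dict[str, str], passages: Dict[str, str]
) -> Tuple[str, str, str]:
    """Look up the query and passage texts for one ID triple."""
    return (
        queries[triple.qid],
        passages[triple.positive_pid],
        passages[triple.negative_pid],
    )
```

Storing IDs rather than text keeps the triples file compact, at the cost of a lookup per triple.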

Dataset com.microsoft.msmarco.passage.train.texttriples.small

datamaestro_text.data.ir.TrainingTripletsLines

Small training triples (query, positive passage, negative passage) with text

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.train.texttriples.full

datamaestro_text.data.ir.TrainingTripletsLines

Full training triples (query, positive passage, negative passage) with text

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
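Text triples can be fed directly to a pairwise loss, or split into labeled (query, passage, label) pairs for a pointwise one. The sketch below shows the latter conversion, assuming each line holds the query, positive passage and negative passage separated by tabs; the function name and the 1/0 labelling are illustrative choices, not part of the dataset definition.

```python
from typing import Iterable, Iterator, Tuple


def triples_to_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Turn each "query TAB positive TAB negative" line into two labeled pairs."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        query, positive, negative = line.split("\t")
        yield (query, positive, 1)  # relevant passage
        yield (query, negative, 0)  # non-relevant passage
```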

Dataset com.microsoft.msmarco.passage.dev.queries

datamaestro_text.data.ir.csv.Topics

Dataset com.microsoft.msmarco.passage.dev.run

datamaestro_text.data.ir.csv.AdhocRunWithText

Dataset com.microsoft.msmarco.passage.dev.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset com.microsoft.msmarco.passage.dev

datamaestro_text.data.ir.Adhoc

MS MARCO development dataset

Tasks: information retrieval, passage retrieval

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.dev.withrun

datamaestro_text.data.ir.RerankAdhoc

MS MARCO development dataset, including the top-1000 documents to re-rank

Tasks: information retrieval, passage retrieval

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.eval.withrun

datamaestro_text.data.ir.csv.AdhocRunWithText

Dataset com.microsoft.msmarco.passage.dev.small.queries

datamaestro_text.data.ir.csv.Topics

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.dev.small.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.dev.small

datamaestro_text.data.ir.Adhoc

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.eval.queries.small

datamaestro_text.data.ir.csv.Topics

External link: https://github.com/microsoft/MSMARCO-Passage-Ranking

Dataset com.microsoft.msmarco.passage.trec2019.test.queries

datamaestro_text.data.ir.csv.Topics

Dataset com.microsoft.msmarco.passage.trec2019.test.run

datamaestro_text.data.ir.csv.AdhocRunWithText

Dataset com.microsoft.msmarco.passage.trec2019.test.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset com.microsoft.msmarco.passage.trec2019.test

datamaestro_text.data.ir.Adhoc

TREC Deep Learning (2019)

Tasks: information retrieval, passage retrieval

External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019.html

Dataset com.microsoft.msmarco.passage.trec2019.test.withrun

datamaestro_text.data.ir.RerankAdhoc

TREC Deep Learning (2019), including the top-1000 documents to re-rank

Tasks: information retrieval, passage retrieval

External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019.html

Dataset com.microsoft.msmarco.passage.trec2020.test.queries

datamaestro_text.data.ir.csv.Topics

TREC Deep Learning 2020 (topics)

Topics of the TREC 2020 MS MARCO Deep Learning track

Dataset com.microsoft.msmarco.passage.trec2020.test.run

datamaestro_text.data.ir.csv.AdhocRunWithText

TREC Deep Learning (2020)

Tags: reranking

Tasks: information retrieval, passage retrieval

External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2020.html

Set of query/passage pairs for the passage re-ranking task (TREC 2020)

TREC Adhoc

See https://trec.nist.gov/data/test_coll.html for more details.

Dataset gov.nist.trec.adhoc.1.documents

datamaestro_text.data.ir.trec.TipsterCollection

TREC-1 to TREC-3 documents (TIPSTER volumes 1 and 2)

Dataset gov.nist.trec.adhoc.1.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.1.assessments

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.1

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 1 (1992)

Dataset gov.nist.trec.adhoc.2.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.2.assessments

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.2

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 2 (1993)

Dataset gov.nist.trec.adhoc.3.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.3.assessments

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.3

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 3 (1994)

Dataset gov.nist.trec.adhoc.4.documents

datamaestro_text.data.ir.trec.TipsterCollection

TREC-4 documents

Dataset gov.nist.trec.adhoc.4.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.4.assessments

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.4

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 4 (1995)

Dataset gov.nist.trec.adhoc.5.documents

datamaestro_text.data.ir.trec.TipsterCollection

TREC-5 documents

Dataset gov.nist.trec.adhoc.5.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.5.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.5

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 5 (1996)

Dataset gov.nist.trec.adhoc.6.documents

datamaestro_text.data.ir.trec.TipsterCollection

TREC-6 documents

Dataset gov.nist.trec.adhoc.6.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.6.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.6

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 6 (1997)

Dataset gov.nist.trec.adhoc.7.documents

datamaestro_text.data.ir.trec.TipsterCollection

TREC-7 documents

Dataset gov.nist.trec.adhoc.7.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.7.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.7

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 7 (1998)

Dataset gov.nist.trec.adhoc.8.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.8.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.8

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC 8 (1999)

Dataset gov.nist.trec.adhoc.robust.2004.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.robust.2004.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.robust.2004

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC Robust (2004)

Dataset gov.nist.trec.adhoc.robust.2005.topics

datamaestro_text.data.ir.trec.TrecTopics

Dataset gov.nist.trec.adhoc.robust.2005.qrels

datamaestro_text.data.ir.trec.TrecAdhocAssessments

Dataset gov.nist.trec.adhoc.robust.2005

datamaestro_text.data.ir.Adhoc

Ad-hoc task of TREC Robust (2005)