Information Retrieval Datasets
MS Marco
Publication: Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. In CoCo@NIPS.
See https://github.com/microsoft/MSMARCO-Passage-Ranking for more details
-
Dataset com.microsoft.msmarco.passage.collection.etc
datamaestro.data.Folder
Documents and some more files
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.collection
datamaestro_text.data.ir.csv.Documents
MS-Marco documents
This file contains each passage in the larger MSMARCO dataset.
Format is TSV (PID Passage)
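The two-column layout above can be read with the standard library alone. The following is a minimal sketch, assuming one passage per line, a tab separator, and no header row; the function name `iter_passages` and the sample rows are hypothetical illustrations, not part of datamaestro or of the real collection.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_passages(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (pid, passage) pairs from a two-column TSV stream."""
    # QUOTE_NONE keeps quote characters inside passages as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip lines that do not have exactly two fields
        pid, passage = row
        yield pid, passage


# Made-up sample rows standing in for the real file.
sample = io.StringIO("0\tfirst passage text\n1\tsecond passage text\n")
passages = list(iter_passages(sample))
```

Passage identifiers are kept as strings here so that they can be compared directly with the identifiers found in run and qrels files.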
-
Dataset com.microsoft.msmarco.passage.train.run
datamaestro_text.data.ir.csv.AdhocRunWithText
TSV format: qid, pid, query, passage
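As an illustration of the four-column layout, the sketch below groups candidate passages by query identifier. It assumes tab-separated lines without a header; `load_run` and the sample rows are hypothetical and are not the datamaestro_text API.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def load_run(stream: TextIO) -> Dict[str, List[Tuple[str, str]]]:
    """Map each qid to its (pid, passage) candidates, in file order."""
    candidates = defaultdict(list)
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip lines that do not match qid, pid, query, passage
        qid, pid, _query, passage = row
        candidates[qid].append((pid, passage))
    return dict(candidates)


# Made-up sample rows standing in for the real run file.
sample = io.StringIO(
    "q1\tp10\tsample query one\tsample passage A\n"
    "q1\tp11\tsample query one\tsample passage B\n"
    "q2\tp20\tsample query two\tsample passage C\n"
)
run = load_run(sample)
```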
-
Dataset com.microsoft.msmarco.passage.train.queries
-
Dataset com.microsoft.msmarco.passage.train.qrels
-
Dataset com.microsoft.msmarco.passage.train
datamaestro_text.data.ir.Adhoc
MS-Marco train dataset
Tasks: passage retrieval, information retrieval
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.train.withrun
datamaestro_text.data.ir.RerankAdhoc
MS-Marco train dataset, including the top-1000 documents to re-rank
Tasks: passage retrieval, information retrieval
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.train.idtriples
datamaestro_text.data.ir.TrainingTripletsLines
Full training triples (query, positive passage, negative passage) with IDs
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.train.texttriples.small
datamaestro_text.data.ir.TrainingTripletsLines
Small training triples (query, positive passage, negative passage) with text
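A text triple can be represented as a small named tuple. This is a minimal sketch that assumes one triple per line with tab-separated fields in the order given above (the separator is an assumption, since it is not stated here); `TextTriple` and `iter_triples` are hypothetical names.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class TextTriple(NamedTuple):
    query: str
    positive: str
    negative: str


def iter_triples(stream: TextIO) -> Iterator[TextTriple]:
    """Yield one TextTriple per well-formed line of the stream."""
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that are not (query, positive, negative)
        yield TextTriple(*fields)


# Made-up sample row standing in for the real triples file.
sample = io.StringIO("sample query\trelevant text\tnon-relevant text\n")
triples = list(iter_triples(sample))
```

Iterating lazily matters for the full triples file, which is much larger than the small one and is better streamed than loaded into memory.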
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.train.texttriples.full
datamaestro_text.data.ir.TrainingTripletsLines
Full training triples (query, positive passage, negative passage) with text
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.dev.queries
-
Dataset com.microsoft.msmarco.passage.dev.run
-
Dataset com.microsoft.msmarco.passage.dev.qrels
-
Dataset com.microsoft.msmarco.passage.dev
datamaestro_text.data.ir.Adhoc
MS-Marco dev dataset
Tasks: passage retrieval, information retrieval
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.dev.withrun
datamaestro_text.data.ir.RerankAdhoc
MS-Marco dev dataset, including the top-1000 documents to re-rank
Tasks: passage retrieval, information retrieval
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.eval.withrun
-
Dataset com.microsoft.msmarco.passage.dev.small.queries
datamaestro_text.data.ir.csv.Topics
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.dev.small.qrels
datamaestro_text.data.ir.trec.TrecAdhocAssessments
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.dev.small
datamaestro_text.data.ir.Adhoc
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.eval.queries.small
datamaestro_text.data.ir.csv.Topics
External link: https://github.com/microsoft/MSMARCO-Passage-Ranking
-
Dataset com.microsoft.msmarco.passage.trec2019.test.queries
-
Dataset com.microsoft.msmarco.passage.trec2019.test.run
-
Dataset com.microsoft.msmarco.passage.trec2019.test.qrels
-
Dataset com.microsoft.msmarco.passage.trec2019.test
datamaestro_text.data.ir.Adhoc
TREC Deep Learning (2019)
Tasks: passage retrieval, information retrieval
External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019.html
-
Dataset com.microsoft.msmarco.passage.trec2019.test.withrun
datamaestro_text.data.ir.RerankAdhoc
TREC Deep Learning (2019), including the top-1000 documents to re-rank
Tasks: passage retrieval, information retrieval
External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2019.html
-
Dataset com.microsoft.msmarco.passage.trec2020.test.queries
datamaestro_text.data.ir.csv.Topics
TREC Deep Learning 2020 (topics)
Topics of the TREC 2020 MS-Marco Deep Learning track
-
Dataset com.microsoft.msmarco.passage.trec2020.test.run
datamaestro_text.data.ir.csv.AdhocRunWithText
TREC Deep Learning (2020)
Tags: reranking
Tasks: passage retrieval, information retrieval
External link: https://microsoft.github.io/msmarco/TREC-Deep-Learning-2020.html
Set of query/passage pairs for the passage re-ranking task (TREC 2020)
TREC Adhoc
See https://trec.nist.gov/data/test_coll.html
-
Dataset gov.nist.trec.adhoc.1.documents
datamaestro_text.data.ir.trec.TipsterCollection
TREC-1 to TREC-3 documents (TIPSTER volumes 1 and 2)
-
Dataset gov.nist.trec.adhoc.1.topics
-
Dataset gov.nist.trec.adhoc.1.assessments
-
Dataset gov.nist.trec.adhoc.1
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 1 (1992)
-
Dataset gov.nist.trec.adhoc.2.topics
-
Dataset gov.nist.trec.adhoc.2.assessments
-
Dataset gov.nist.trec.adhoc.2
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 2 (1993)
-
Dataset gov.nist.trec.adhoc.3.topics
-
Dataset gov.nist.trec.adhoc.3.assessments
-
Dataset gov.nist.trec.adhoc.3
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 3 (1994)
-
Dataset gov.nist.trec.adhoc.4.documents
datamaestro_text.data.ir.trec.TipsterCollection
TREC-4 documents
-
Dataset gov.nist.trec.adhoc.4.topics
-
Dataset gov.nist.trec.adhoc.4.assessments
-
Dataset gov.nist.trec.adhoc.4
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 4 (1995)
-
Dataset gov.nist.trec.adhoc.5.documents
datamaestro_text.data.ir.trec.TipsterCollection
TREC-5 documents
-
Dataset gov.nist.trec.adhoc.5.topics
-
Dataset gov.nist.trec.adhoc.5.qrels
-
Dataset gov.nist.trec.adhoc.5
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 5 (1996)
-
Dataset gov.nist.trec.adhoc.6.documents
datamaestro_text.data.ir.trec.TipsterCollection
TREC-6 documents
-
Dataset gov.nist.trec.adhoc.6.topics
-
Dataset gov.nist.trec.adhoc.6.qrels
-
Dataset gov.nist.trec.adhoc.6
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 6 (1997)
-
Dataset gov.nist.trec.adhoc.7.documents
datamaestro_text.data.ir.trec.TipsterCollection
TREC-7 documents
-
Dataset gov.nist.trec.adhoc.7.topics
-
Dataset gov.nist.trec.adhoc.7.qrels
-
Dataset gov.nist.trec.adhoc.7
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 7 (1998)
-
Dataset gov.nist.trec.adhoc.8.topics
-
Dataset gov.nist.trec.adhoc.8.qrels
-
Dataset gov.nist.trec.adhoc.8
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC 8 (1999)
-
Dataset gov.nist.trec.adhoc.robust.2004.topics
-
Dataset gov.nist.trec.adhoc.robust.2004.qrels
-
Dataset gov.nist.trec.adhoc.robust.2004
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC Robust (2004)
-
Dataset gov.nist.trec.adhoc.robust.2005.topics
-
Dataset gov.nist.trec.adhoc.robust.2005.qrels
-
Dataset gov.nist.trec.adhoc.robust.2005
datamaestro_text.data.ir.Adhoc
Ad-hoc task of TREC Robust (2005)