IR-Datasets Integration
Datamaestro-text provides an interface to the ir-datasets library, giving access to hundreds of IR benchmarks through a unified API.
Install ir-datasets:
pip install ir-datasets
Usage:
from datamaestro import prepare_dataset
# Load any ir-datasets collection via the irds namespace
dataset = prepare_dataset("irds.msmarco-passage")
# Same API as native datasets
for doc in dataset.documents.iter_documents():
    print(doc)
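The IDs in the list below appear to follow a simple pattern. Each one is built by prefixing `irds.`, replacing every `/` in the ir_datasets ID with `.`, and optionally appending `.documents`, `.queries` or `.qrels` to select a single component. The helper below is a hypothetical sketch of that inferred convention and is not part of datamaestro_text. The mapping cannot be reliably reversed, because ir_datasets IDs such as `argsme/1.0` already contain dots.

```python
from typing import Optional

# Component suffixes that appear in the dataset list (inferred, not an official API)
PARTS = ("documents", "queries", "qrels")


def irds_id(irds_dataset_id: str, part: Optional[str] = None) -> str:
    """Hypothetical helper: map an ir_datasets ID to the datamaestro ID pattern."""
    # Prefix with the irds namespace and turn path separators into dots
    name = "irds." + irds_dataset_id.replace("/", ".")
    if part is not None:
        if part not in PARTS:
            raise ValueError(f"unknown part {part!r}, expected one of {PARTS}")
        name += "." + part
    return name


print(irds_id("antique/train/split200-valid"))
# irds.antique.train.split200-valid
print(irds_id("argsme/2020-04-01/touche-2020-task-1", "qrels"))
# irds.argsme.2020-04-01.touche-2020-task-1.qrels
```

You can pass the result to `prepare_dataset` as shown above. The helper does not check whether the ID is actually registered. For example, in this listing `.documents` entries only appear at the collection level, such as `irds.antique.documents`.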
The list below is auto-generated and may not reflect the exact version of ir-datasets installed on your system.
Data Types
These wrapper types provide the datamaestro interface for ir-datasets data:
- XPM Config datamaestro_text.datasets.irds.data.Topics(*, irds, id)
Bases: TopicsStore, IRDSId
- irds: str
The ID of the dataset to load from ir_datasets
- id: str
The unique (sub-)dataset ID
- XPM Config datamaestro_text.datasets.irds.data.Documents(*, irds, id, count, file_access)
Bases: DocumentStore, IRDSId
- irds: str
The ID of the dataset to load from ir_datasets
- id: str
The unique (sub-)dataset ID
- count: int
Number of documents
- file_access: FileAccess = FileAccess.MMAP
How to access the file collection (this may have no effect, depending on the docstore)
- XPM Config datamaestro_text.datasets.irds.data.AdhocAssessments(*, irds, id)
Bases: AdhocAssessments, IRDSId
- irds: str
The ID of the dataset to load from ir_datasets
- id: str
The unique (sub-)dataset ID
See also LZ4DocumentStore in the Information Retrieval API section.
Available Datasets
ANTIQUE
"ANTIQUE is a non-factoid question answering dataset based on the questions and answers of Yahoo! Webscope L6."
- Documents: Short answer passages (from Yahoo Answers)
- Queries: Natural language questions (from Yahoo Answers)
- Dataset Paper
-
Dataset irds.antique.documents
datamaestro_text.datasets.irds.data.Documents
"ANTIQUE is a non-factoid question answering dataset based on the questions and answers of Yahoo! Webscope L6."
- Documents: Short answer passages (from Yahoo Answers)
- Queries: Natural language questions (from Yahoo Answers)
- Dataset Paper
-
Dataset irds.antique.test.queries
datamaestro_text.datasets.irds.data.Topics
Official test set of the ANTIQUE dataset.
-
Dataset irds.antique.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set of the ANTIQUE dataset.
-
Dataset irds.antique.test
datamaestro_text.datasets.irds.data.Adhoc
Official test set of the ANTIQUE dataset.
-
Dataset irds.antique.test.non-offensive.queries
datamaestro_text.datasets.irds.data.Topics
antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."
-
Dataset irds.antique.test.non-offensive.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."
-
Dataset irds.antique.test.non-offensive
datamaestro_text.datasets.irds.data.Adhoc
antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."
-
Dataset irds.antique.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set of the ANTIQUE dataset.
-
Dataset irds.antique.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set of the ANTIQUE dataset.
-
Dataset irds.antique.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set of the ANTIQUE dataset.
-
Dataset irds.antique.train.split200-train.queries
datamaestro_text.datasets.irds.data.Topics
antique/train without the 200 queries used by antique/train/split200-valid.
-
Dataset irds.antique.train.split200-train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
antique/train without the 200 queries used by antique/train/split200-valid.
-
Dataset irds.antique.train.split200-train
datamaestro_text.datasets.irds.data.Adhoc
antique/train without the 200 queries used by antique/train/split200-valid.
-
Dataset irds.antique.train.split200-valid.queries
datamaestro_text.datasets.irds.data.Topics
A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.
-
Dataset irds.antique.train.split200-valid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.
-
Dataset irds.antique.train.split200-valid
datamaestro_text.datasets.irds.data.Adhoc
A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.
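The split200 datasets partition antique/train by query. split200-valid holds 200 held-out queries, and split200-train is antique/train without them. antique/test/non-offensive is described the same way, as antique/test minus an excluded set of queries. The sketch below shows that relationship on toy data. The real list of held-out query IDs comes from ir_datasets and is not reproduced here, and the qrels layout (query ID mapped to per-document relevance) is an assumption for illustration.

```python
from typing import Dict, Iterable, Tuple

# query_id -> {doc_id: relevance}
Qrels = Dict[str, Dict[str, int]]


def split_by_queries(qrels: Qrels, held_out: Iterable[str]) -> Tuple[Qrels, Qrels]:
    """Return (remaining, held_out) qrels, partitioned by query ID."""
    held = set(held_out)
    remaining = {q: docs for q, docs in qrels.items() if q not in held}
    selected = {q: docs for q, docs in qrels.items() if q in held}
    return remaining, selected


# Toy judgments; the real 200 held-out IDs come from ir_datasets
train_qrels = {"q1": {"d1": 4}, "q2": {"d2": 3}, "q3": {"d3": 1, "d4": 2}}
split_train, split_valid = split_by_queries(train_qrels, ["q2"])
print(sorted(split_train), sorted(split_valid))  # ['q1', 'q3'] ['q2']
```

The two outputs are disjoint and together cover every query of the input, which is the property "use in conjunction with" relies on.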
AOL-IA (Internet Archive)
This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.
The query log does not include document or query IDs; these are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log, and query IDs using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (with the iteration value set to the user ID). Full information (including the original query) is available from qlogs.
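The ID scheme described above can be sketched as follows. This is a hypothetical illustration only. The normalization (trim, lowercase, collapse whitespace) and the hash (truncated MD5) are assumptions, and ir_datasets' actual choices may differ. The qrels line uses the standard TREC column order `query_id iteration doc_id relevance`, with the iteration column holding the user ID as the description states. Scoring a click as relevance 1 is also an assumption.

```python
import hashlib
import re


def normalize_query(query: str) -> str:
    # Assumed normalization: trim, lowercase, collapse whitespace
    return re.sub(r"\s+", " ", query.strip().lower())


def hash_id(text: str) -> str:
    # Assumed hash: truncated MD5 hex digest (ir_datasets may differ)
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:12]


def doc_id(url: str) -> str:
    # Document ID derived from the clicked URL
    return hash_id(url)


def query_id(query: str) -> str:
    # Query ID derived from the normalized query text
    return hash_id(normalize_query(query))


def click_to_qrel(query: str, user_id: str, url: str, relevance: int = 1) -> str:
    # TREC qrels line "query_id iteration doc_id relevance"; the iteration
    # column carries the user ID; relevance=1 for a click is an assumption
    return f"{query_id(query)} {user_id} {doc_id(url)} {relevance}"


print(query_id("  Cheap Flights ") == query_id("cheap   flights"))  # True
print(click_to_qrel("cheap flights", "12345", "http://example.com/"))
```

Because IDs are hashes, the same URL or the same normalized query always maps to the same ID, regardless of which log entry it came from.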
-
Dataset irds.aol-ia.documents
datamaestro_text.datasets.irds.data.Documents
This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.
The query log does not include document or query IDs; these are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log, and query IDs using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (with the iteration value set to the user ID). Full information (including the original query) is available from qlogs.
-
Dataset irds.aol-ia.queries
datamaestro_text.datasets.irds.data.Topics
This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.
The query log does not include document or query IDs; these are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log, and query IDs using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (with the iteration value set to the user ID). Full information (including the original query) is available from qlogs.
-
Dataset irds.aol-ia.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.
The query log does not include document or query IDs; these are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log, and query IDs using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (with the iteration value set to the user ID). Full information (including the original query) is available from qlogs.
-
Dataset irds.aol-ia
datamaestro_text.datasets.irds.data.Adhoc
This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.
The query log does not include document or query IDs; these are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log, and query IDs using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (with the iteration value set to the user ID). Full information (including the original query) is available from qlogs.
AQUAINT
A document collection of about 1M English newswire texts. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
-
Dataset irds.aquaint.documents
datamaestro_text.datasets.irds.data.Documents
A document collection of about 1M English newswire texts. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.
-
Dataset irds.aquaint.trec-robust-2005.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Shared task site
- Task overview paper
- See also: trec-robust04
-
Dataset irds.aquaint.trec-robust-2005.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Shared task site
- Task overview paper
- See also: trec-robust04
-
Dataset irds.aquaint.trec-robust-2005
datamaestro_text.datasets.irds.data.Adhoc
The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Shared task site
- Task overview paper
- See also: trec-robust04
args.me version 1.0
Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on Zenodo on July 9, 2019. The cleaned version, argsme/1.0-cleaned, should be preferred.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.argsme.1.0.documents
datamaestro_text.datasets.irds.data.Documents
Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. It was released on Zenodo on July 9, 2019. The cleaned version, argsme/1.0-cleaned, should be preferred.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected.queries
datamaestro_text.datasets.irds.data.Topics
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
-
Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
-
Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected
datamaestro_text.datasets.irds.data.Adhoc
Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
args.me version 1.0 cleaned
Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. This version contains the same arguments as argsme/1.0, but was cleaned as described in the corresponding publication. It was released on Zenodo on October 27, 2020.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.argsme.1.0-cleaned.documents
datamaestro_text.datasets.irds.data.Documents
Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org. This version contains the same arguments as argsme/1.0, but was cleaned as described in the corresponding publication. It was released on Zenodo on October 27, 2020.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
argsme/2020-04-01/debateorg
Subset of the 338 620 arguments from argsme/2020-04-01 that were crawled from the debate portal Debate.org.
-
Dataset irds.argsme.2020-04-01.debateorg.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the 338 620 arguments from argsme/2020-04-01 that were crawled from the debate portal Debate.org.
argsme/2020-04-01/debatepedia
Subset of the 21 197 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatepedia.
-
Dataset irds.argsme.2020-04-01.debatepedia.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the 21 197 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatepedia.
argsme/2020-04-01/debatewise
Subset of the 14 353 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatewise.
-
Dataset irds.argsme.2020-04-01.debatewise.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the 14 353 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatewise.
argsme/2020-04-01/idebate
Subset of the 13 522 arguments from argsme/2020-04-01 that were crawled from the debate portal IDebate.org.
-
Dataset irds.argsme.2020-04-01.idebate.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the 13 522 arguments from argsme/2020-04-01 that were crawled from the debate portal IDebate.org.
argsme/2020-04-01/parliamentary
Subset of the 48 arguments from argsme/2020-04-01 that were crawled from Canadian Parliament discussions.
-
Dataset irds.argsme.2020-04-01.parliamentary.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the 48 arguments from argsme/2020-04-01 that were crawled from Canadian Parliament discussions.
argsme/2020-04-01/processed
Pre-processed version of argsme/2020-04-01 where each argument is split into sentences.
-
Dataset irds.argsme.2020-04-01.processed.documents
datamaestro_text.datasets.irds.data.Documents
Pre-processed version of argsme/2020-04-01 where each argument is split into sentences.
-
Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022, featuring three tasks.
Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
-
Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022, featuring three tasks.
Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
-
Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022, featuring three tasks.
Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
args.me
Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org, as well as from Canadian Parliament discussions. It was released on Zenodo on April 1, 2020.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.argsme.2020-04-01.documents
datamaestro_text.datasets.irds.data.Documents
Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, and Debate.org, as well as from Canadian Parliament discussions. It was released on Zenodo on April 1, 2020.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
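The five per-portal subsets listed earlier (debateorg, debatepedia, debatewise, idebate, parliamentary) add up exactly to the 387 740 arguments of argsme/2020-04-01. Comparing the two 1.0 releases also shows how much smaller the cleaned corpus is. The numbers below are taken directly from the descriptions in this listing.

```python
# Subset sizes of argsme/2020-04-01, as listed in the sections above
subsets = {
    "debateorg": 338_620,
    "debatepedia": 21_197,
    "debatewise": 14_353,
    "idebate": 13_522,
    "parliamentary": 48,
}
total = sum(subsets.values())
print(total)  # 387740, the size of argsme/2020-04-01

# argsme/1.0 versus argsme/1.0-cleaned
removed_by_cleaning = 387_606 - 382_545
print(removed_by_cleaning)  # 5061
```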
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise to argument retrieval for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise to argument retrieval for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise to argument retrieval for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance.
-
Dataset irds.argsme.2020-04-01.touche-2021-task-1.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
-
Dataset irds.argsme.2020-04-01.touche-2021-task-1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
-
Dataset irds.argsme.2020-04-01.touche-2021-task-1
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021, featuring two tasks.
Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).
Documents are judged based on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected.queries
datamaestro_text.datasets.irds.data.Topics
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
-
Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected
datamaestro_text.datasets.irds.data.Adhoc
Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.
beir/arguana
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.beir.arguana.documents
datamaestro_text.datasets.irds.data.Documents
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.beir.arguana.queries
datamaestro_text.datasets.irds.data.Topics
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.beir.arguana.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.beir.arguana
datamaestro_text.datasets.irds.data.Adhoc
A version of the ArguAna Counterargs dataset, for argument retrieval.
beir/climate-fever
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.beir.climate-fever.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.beir.climate-fever.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.beir.climate-fever.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.beir.climate-fever
datamaestro_text.datasets.irds.data.Adhoc
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
beir/cqadupstack/android
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
-
Dataset irds.beir.cqadupstack.android.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
-
Dataset irds.beir.cqadupstack.android.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
-
Dataset irds.beir.cqadupstack.android.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
-
Dataset irds.beir.cqadupstack.android
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.
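Every CQADupStack subforum in this listing uses the same ID layout as android above. There is a combined Adhoc entry plus `.documents`, `.queries` and `.qrels` components. The sketch below builds those IDs for each subforum that has a section here. The subforum list only covers what appears in this excerpt, and the full benchmark may include others. The dictionary keys are hypothetical labels chosen for illustration.

```python
from typing import Dict

# Subforums that have sections in this listing (may be incomplete)
SUBFORUMS = ["android", "english", "gaming", "gis", "mathematica", "physics", "programmers", "stats"]


def cqadupstack_ids(subforum: str) -> Dict[str, str]:
    """IDs for one subforum, following the pattern shown for android."""
    base = f"irds.beir.cqadupstack.{subforum}"
    return {
        "adhoc": base,  # combined Adhoc entry
        "documents": f"{base}.documents",
        "queries": f"{base}.queries",
        "qrels": f"{base}.qrels",
    }


all_ids = {name: cqadupstack_ids(name) for name in SUBFORUMS}
print(all_ids["gis"]["qrels"])  # irds.beir.cqadupstack.gis.qrels
```

Each Adhoc ID could then be passed to `prepare_dataset`, as in the usage example at the top, to run the same experiment on every subforum in turn.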
beir/cqadupstack/english
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
-
Dataset irds.beir.cqadupstack.english.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
-
Dataset irds.beir.cqadupstack.english.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
-
Dataset irds.beir.cqadupstack.english.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
-
Dataset irds.beir.cqadupstack.english
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.
beir/cqadupstack/gaming
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gaming.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gaming.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gaming.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gaming
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.
beir/cqadupstack/gis
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gis.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gis.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gis.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
-
Dataset irds.beir.cqadupstack.gis
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.
beir/cqadupstack/mathematica
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
-
Dataset irds.beir.cqadupstack.mathematica.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
-
Dataset irds.beir.cqadupstack.mathematica.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
-
Dataset irds.beir.cqadupstack.mathematica.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
-
Dataset irds.beir.cqadupstack.mathematica
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.
beir/cqadupstack/physics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
-
Dataset irds.beir.cqadupstack.physics.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
-
Dataset irds.beir.cqadupstack.physics.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
-
Dataset irds.beir.cqadupstack.physics.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
-
Dataset irds.beir.cqadupstack.physics
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.
beir/cqadupstack/programmers
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
-
Dataset irds.beir.cqadupstack.programmers.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
-
Dataset irds.beir.cqadupstack.programmers.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
-
Dataset irds.beir.cqadupstack.programmers.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
-
Dataset irds.beir.cqadupstack.programmers
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.
beir/cqadupstack/stats
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
-
Dataset irds.beir.cqadupstack.stats.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
-
Dataset irds.beir.cqadupstack.stats.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
-
Dataset irds.beir.cqadupstack.stats.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
-
Dataset irds.beir.cqadupstack.stats
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.
beir/cqadupstack/tex
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
-
Dataset irds.beir.cqadupstack.tex.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
-
Dataset irds.beir.cqadupstack.tex.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
-
Dataset irds.beir.cqadupstack.tex.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
-
Dataset irds.beir.cqadupstack.tex
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.
beir/cqadupstack/unix
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
-
Dataset irds.beir.cqadupstack.unix.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
-
Dataset irds.beir.cqadupstack.unix.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
-
Dataset irds.beir.cqadupstack.unix.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
-
Dataset irds.beir.cqadupstack.unix
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.
beir/cqadupstack/webmasters
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
-
Dataset irds.beir.cqadupstack.webmasters.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
-
Dataset irds.beir.cqadupstack.webmasters.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
-
Dataset irds.beir.cqadupstack.webmasters.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
-
Dataset irds.beir.cqadupstack.webmasters
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.
beir/cqadupstack/wordpress
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
-
Dataset irds.beir.cqadupstack.wordpress.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
-
Dataset irds.beir.cqadupstack.wordpress.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
-
Dataset irds.beir.cqadupstack.wordpress.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
-
Dataset irds.beir.cqadupstack.wordpress
datamaestro_text.datasets.irds.data.Adhoc
A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.
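The datamaestro IDs in this listing appear to follow a simple pattern. Each one is the ir-datasets ID with `/` replaced by `.` and prefixed with `irds.`, with a `.documents`, `.queries` or `.qrels` suffix for the individual components. The sketch below derives these IDs for the CQADupStack subforums listed here. `to_datamaestro_id` and `cqadupstack_ids` are hypothetical helpers inferred from the listing; they are not part of the datamaestro API.

```python
from __future__ import annotations

# Hypothetical helpers: the mapping is inferred from the dataset listing,
# not taken from datamaestro itself.
SUBFORUMS = [
    "gis", "mathematica", "physics", "programmers", "stats",
    "tex", "unix", "webmasters", "wordpress",
]


def to_datamaestro_id(irds_id: str, component: str | None = None) -> str:
    """Map an ir-datasets ID (e.g. "beir/cqadupstack/tex") to a datamaestro ID."""
    base = "irds." + irds_id.replace("/", ".")
    return f"{base}.{component}" if component else base


def cqadupstack_ids(subforum: str) -> dict[str, str]:
    """Return the IDs of one subforum's Adhoc dataset and its three components."""
    irds_id = f"beir/cqadupstack/{subforum}"
    ids = {"adhoc": to_datamaestro_id(irds_id)}
    for component in ("documents", "queries", "qrels"):
        ids[component] = to_datamaestro_id(irds_id, component)
    return ids


if __name__ == "__main__":
    for name in SUBFORUMS:
        print(cqadupstack_ids(name)["adhoc"])
```

Each resulting ID is a string of the kind passed to `prepare_dataset` in the usage example at the top of this page.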
beir/dbpedia-entity
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.beir.dbpedia-entity.documents
datamaestro_text.datasets.irds.data.Documents
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.beir.dbpedia-entity.queries
datamaestro_text.datasets.irds.data.Topics
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.beir.dbpedia-entity.dev.queries
datamaestro_text.datasets.irds.data.Topics
A random sample of 67 queries from the official test set, used as a dev set.
-
Dataset irds.beir.dbpedia-entity.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A random sample of 67 queries from the official test set, used as a dev set.
-
Dataset irds.beir.dbpedia-entity.dev
datamaestro_text.datasets.irds.data.Adhoc
A random sample of 67 queries from the official test set, used as a dev set.
-
Dataset irds.beir.dbpedia-entity.test.queries
datamaestro_text.datasets.irds.data.Topics
The official test set, without the 67 queries used as the dev set.
-
Dataset irds.beir.dbpedia-entity.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official test set, without the 67 queries used as the dev set.
-
Dataset irds.beir.dbpedia-entity.test
datamaestro_text.datasets.irds.data.Adhoc
The official test set, without the 67 queries used as the dev set.
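The dev/test split described above draws 67 queries at random from the official test set to form the dev set and keeps the rest as the test set. It can be sketched as follows. The function name, the seed and the toy query IDs are illustrative assumptions. This is not the code BEIR or ir-datasets used, so it will not reproduce their exact split.

```python
import random
from typing import List, Sequence, Tuple


def split_dev_test(
    query_ids: Sequence[str], dev_size: int, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Randomly move `dev_size` queries into a dev set; the rest stay in test."""
    if dev_size > len(query_ids):
        raise ValueError("dev_size exceeds the number of queries")
    rng = random.Random(seed)  # local RNG, so the split is reproducible
    dev = rng.sample(list(query_ids), dev_size)
    dev_set = set(dev)
    # Remaining queries keep their original order in the test set
    test = [qid for qid in query_ids if qid not in dev_set]
    return sorted(dev), test


if __name__ == "__main__":
    toy_ids = [f"q{i}" for i in range(200)]  # placeholder IDs, not real queries
    dev, test = split_dev_test(toy_ids, dev_size=67)
    print(len(dev), len(test))  # 67 133
```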
beir/fever
A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.
-
Dataset irds.beir.fever.documents
datamaestro_text.datasets.irds.data.Documents
A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.
-
Dataset irds.beir.fever.queries
datamaestro_text.datasets.irds.data.Topics
A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.
-
Dataset irds.beir.fever.dev.queries
datamaestro_text.datasets.irds.data.Topics
The official dev set.
-
Dataset irds.beir.fever.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official dev set.
-
Dataset irds.beir.fever.dev
datamaestro_text.datasets.irds.data.Adhoc
The official dev set.
-
Dataset irds.beir.fever.test.queries
datamaestro_text.datasets.irds.data.Topics
The official test set.
-
Dataset irds.beir.fever.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official test set.
-
Dataset irds.beir.fever.test
datamaestro_text.datasets.irds.data.Adhoc
The official test set.
-
Dataset irds.beir.fever.train.queries
datamaestro_text.datasets.irds.data.Topics
The official train set.
-
Dataset irds.beir.fever.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official train set.
-
Dataset irds.beir.fever.train
datamaestro_text.datasets.irds.data.Adhoc
The official train set.
beir/fiqa
A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.
-
Dataset irds.beir.fiqa.documents
datamaestro_text.datasets.irds.data.Documents
A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.
-
Dataset irds.beir.fiqa.queries
datamaestro_text.datasets.irds.data.Topics
A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.
-
Dataset irds.beir.fiqa.dev.queries
datamaestro_text.datasets.irds.data.Topics
Random sample of 500 queries from the official dataset.
-
Dataset irds.beir.fiqa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Random sample of 500 queries from the official dataset.
-
Dataset irds.beir.fiqa.dev
datamaestro_text.datasets.irds.data.Adhoc
Random sample of 500 queries from the official dataset.
-
Dataset irds.beir.fiqa.test.queries
datamaestro_text.datasets.irds.data.Topics
Random sample of 648 queries from the official dataset.
-
Dataset irds.beir.fiqa.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Random sample of 648 queries from the official dataset.
-
Dataset irds.beir.fiqa.test
datamaestro_text.datasets.irds.data.Adhoc
Random sample of 648 queries from the official dataset.
-
Dataset irds.beir.fiqa.train.queries
datamaestro_text.datasets.irds.data.Topics
Official dataset without the 1148 queries sampled for /dev and /test.
-
Dataset irds.beir.fiqa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dataset without the 1148 queries sampled for /dev and /test.
-
Dataset irds.beir.fiqa.train
datamaestro_text.datasets.irds.data.Adhoc
Official dataset without the 1148 queries sampled for /dev and /test.
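The query counts above are consistent with one another. The /dev and /test samples together account for the 1148 queries removed from the official set, assuming the two samples do not overlap (which their sum implies). The /train split is whatever remains:

```latex
\begin{aligned}
|Q_{\mathrm{dev}}| + |Q_{\mathrm{test}}| &= 500 + 648 = 1148,\\
|Q_{\mathrm{train}}| &= |Q_{\mathrm{official}}| - 1148.
\end{aligned}
```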
beir/hotpotqa
A version of the HotpotQA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.
-
Dataset irds.beir.hotpotqa.documents
datamaestro_text.datasets.irds.data.Documents
A version of the HotpotQA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.
-
Dataset irds.beir.hotpotqa.queries
datamaestro_text.datasets.irds.data.Topics
A version of the HotpotQA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.
-
Dataset irds.beir.hotpotqa.dev.queries
datamaestro_text.datasets.irds.data.Topics
A random selection of 5447 queries from the official train set.
-
Dataset irds.beir.hotpotqa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A random selection of 5447 queries from the official train set.
-
Dataset irds.beir.hotpotqa.dev
datamaestro_text.datasets.irds.data.Adhoc
A random selection of 5447 queries from the official train set.
-
Dataset irds.beir.hotpotqa.test.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set from HotpotQA, here used as a test set.
-
Dataset irds.beir.hotpotqa.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set from HotpotQA, here used as a test set.
-
Dataset irds.beir.hotpotqa.test
datamaestro_text.datasets.irds.data.Adhoc
Official dev set from HotpotQA, here used as a test set.
-
Dataset irds.beir.hotpotqa.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set, without the random selection of the 5447 queries used for /dev.
-
Dataset irds.beir.hotpotqa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set, without the random selection of the 5447 queries used for /dev.
-
Dataset irds.beir.hotpotqa.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set, without the random selection of the 5447 queries used for /dev.
beir/msmarco
A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
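The encoding problems mentioned above are typically "mojibake". This happens when text stored as UTF-8 is decoded with a Windows-1252 codec, so that `’` shows up as `â€™`. The sketch below applies a common round-trip repair. It is a generic technique given for illustration and is not necessarily the correction that msmarco-passage applies. `fix_mojibake` is a hypothetical name.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 text that was mis-decoded as Windows-1252.

    If the string cannot be round-tripped (for example because it already
    contains correctly decoded non-ASCII characters), it is returned unchanged.
    """
    try:
        # Re-create the original bytes, then decode them with the right codec
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:  # covers both UnicodeEncodeError and UnicodeDecodeError
        return text


if __name__ == "__main__":
    print(fix_mojibake("donâ€™t"))  # don’t
    print(fix_mojibake("café"))     # café (left as is)
```

This heuristic can misfire on text that legitimately contains sequences such as `Ã©`, so any repair of this kind should be checked on a sample of the corpus.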
-
Dataset irds.beir.msmarco.documents
datamaestro_text.datasets.irds.data.Documents
A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
-
Dataset irds.beir.msmarco.queries
datamaestro_text.datasets.irds.data.Topics
A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
-
Dataset irds.beir.msmarco.dev.queries
datamaestro_text.datasets.irds.data.Topics
A version of the MS MARCO passage ranking dev set.
- See also: msmarco-passage/dev
- Dataset Paper
-
Dataset irds.beir.msmarco.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the MS MARCO passage ranking dev set.
- See also: msmarco-passage/dev
- Dataset Paper
-
Dataset irds.beir.msmarco.dev
datamaestro_text.datasets.irds.data.Adhoc
A version of the MS MARCO passage ranking dev set.
- See also: msmarco-passage/dev
- Dataset Paper
-
Dataset irds.beir.msmarco.test.queries
datamaestro_text.datasets.irds.data.Topics
A version of the TREC Deep Learning 2019 set.
-
Dataset irds.beir.msmarco.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the TREC Deep Learning 2019 set.
-
Dataset irds.beir.msmarco.test
datamaestro_text.datasets.irds.data.Adhoc
A version of the TREC Deep Learning 2019 set.
-
Dataset irds.beir.msmarco.train.queries
datamaestro_text.datasets.irds.data.Topics
A version of the MS MARCO passage ranking train set.
- See also: msmarco-passage/train
-
Dataset irds.beir.msmarco.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the MS MARCO passage ranking train set.
- See also: msmarco-passage/train
-
Dataset irds.beir.msmarco.train
datamaestro_text.datasets.irds.data.Adhoc
A version of the MS MARCO passage ranking train set.
- See also: msmarco-passage/train
beir/nfcorpus
A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.beir.nfcorpus.documents
datamaestro_text.datasets.irds.data.Documents
A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.beir.nfcorpus.queries
datamaestro_text.datasets.irds.data.Topics
A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.beir.nfcorpus.dev.queries
datamaestro_text.datasets.irds.data.Topics
Combined dev set of NFCorpus.
- See also: nfcorpus/dev
-
Dataset irds.beir.nfcorpus.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Combined dev set of NFCorpus.
- See also: nfcorpus/dev
-
Dataset irds.beir.nfcorpus.dev
datamaestro_text.datasets.irds.data.Adhoc
Combined dev set of NFCorpus.
- See also: nfcorpus/dev
-
Dataset irds.beir.nfcorpus.test.queries
datamaestro_text.datasets.irds.data.Topics
Combined test set of NFCorpus.
- See also: nfcorpus/test
-
Dataset irds.beir.nfcorpus.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Combined test set of NFCorpus.
- See also: nfcorpus/test
-
Dataset irds.beir.nfcorpus.test
datamaestro_text.datasets.irds.data.Adhoc
Combined test set of NFCorpus.
- See also: nfcorpus/test
-
Dataset irds.beir.nfcorpus.train.queries
datamaestro_text.datasets.irds.data.Topics
Combined train set of NFCorpus.
- See also: nfcorpus/train
-
Dataset irds.beir.nfcorpus.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Combined train set of NFCorpus.
- See also: nfcorpus/train
-
Dataset irds.beir.nfcorpus.train
datamaestro_text.datasets.irds.data.Adhoc
Combined train set of NFCorpus.
- See also: nfcorpus/train
beir/nq
A version of the Natural Questions dev dataset.
Data pre-processing differs from that of both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the BEIR paper for details.
-
Dataset irds.beir.nq.documents
datamaestro_text.datasets.irds.data.Documents
A version of the Natural Questions dev dataset.
Data pre-processing differs from that of both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the BEIR paper for details.
-
Dataset irds.beir.nq.queries
datamaestro_text.datasets.irds.data.Topics
A version of the Natural Questions dev dataset.
Data pre-processing differs from that of both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the BEIR paper for details.
-
Dataset irds.beir.nq.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the Natural Questions dev dataset.
Data pre-processing differs from that of both natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and the filtering applied to the queries. See the BEIR paper for details.
-
Dataset irds.beir.nq
datamaestro_text.datasets.irds.data.Adhoc
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
beir/quora
A version of the Quora duplicate question detection dataset (QQP). Includes queries from the /dev and /test sets.
-
Dataset irds.beir.quora.documents
datamaestro_text.datasets.irds.data.Documents
A version of the Quora duplicate question detection dataset (QQP). Includes queries from the /dev and /test sets.
-
Dataset irds.beir.quora.queries
datamaestro_text.datasets.irds.data.Topics
A version of the Quora duplicate question detection dataset (QQP). Includes queries from the /dev and /test sets.
-
Dataset irds.beir.quora.dev.queries
datamaestro_text.datasets.irds.data.Topics
A 5,000-question subset of the original dataset, with no overlap with the other subsets.
-
Dataset irds.beir.quora.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A 5,000-question subset of the original dataset, with no overlap with the other subsets.
-
Dataset irds.beir.quora.dev
datamaestro_text.datasets.irds.data.Adhoc
A 5,000-question subset of the original dataset, with no overlap with the other subsets.
-
Dataset irds.beir.quora.test.queries
datamaestro_text.datasets.irds.data.Topics
A 10,000-question subset of the original dataset, with no overlap with the other subsets.
-
Dataset irds.beir.quora.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A 10,000-question subset of the original dataset, with no overlap with the other subsets.
-
Dataset irds.beir.quora.test
datamaestro_text.datasets.irds.data.Adhoc
A 10,000-question subset of the original dataset, with no overlap with the other subsets.
beir/scidocs
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.beir.scidocs.documents
datamaestro_text.datasets.irds.data.Documents
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.beir.scidocs.queries
datamaestro_text.datasets.irds.data.Topics
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.beir.scidocs.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.beir.scidocs
datamaestro_text.datasets.irds.data.Adhoc
A version of the SciDocs dataset, used for citation retrieval.
beir/scifact
A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.
-
Dataset irds.beir.scifact.documents
datamaestro_text.datasets.irds.data.Documents
A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.
-
Dataset irds.beir.scifact.queries
datamaestro_text.datasets.irds.data.Topics
A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.
-
Dataset irds.beir.scifact.test.queries
datamaestro_text.datasets.irds.data.Topics
The official dev set, used here as the test set.
-
Dataset irds.beir.scifact.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official dev set, used here as the test set.
-
Dataset irds.beir.scifact.test
datamaestro_text.datasets.irds.data.Adhoc
The official dev set, used here as the test set.
-
Dataset irds.beir.scifact.train.queries
datamaestro_text.datasets.irds.data.Topics
The official train set.
-
Dataset irds.beir.scifact.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The official train set.
-
Dataset irds.beir.scifact.train
datamaestro_text.datasets.irds.data.Adhoc
The official train set.
beir/trec-covid
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may be different than what is done in cord19/trec-covid.
-
Dataset irds.beir.trec-covid.documents
datamaestro_text.datasets.irds.data.Documents
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may be different than what is done in cord19/trec-covid.
-
Dataset irds.beir.trec-covid.queries
datamaestro_text.datasets.irds.data.Topics
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may be different than what is done in cord19/trec-covid.
-
Dataset irds.beir.trec-covid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may be different than what is done in cord19/trec-covid.
-
Dataset irds.beir.trec-covid
datamaestro_text.datasets.irds.data.Adhoc
A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.
Data pre-processing may be different than what is done in cord19/trec-covid.
beir/webis-touche2020
Original version of the Touché-2020 dataset, for argument retrieval.
-
Dataset irds.beir.webis-touche2020.documents
datamaestro_text.datasets.irds.data.Documents
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.beir.webis-touche2020.queries
datamaestro_text.datasets.irds.data.Topics
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.beir.webis-touche2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.beir.webis-touche2020
datamaestro_text.datasets.irds.data.Adhoc
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
beir/webis-touche2020/v2
Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.
-
Dataset irds.beir.webis-touche2020.v2.documents
datamaestro_text.datasets.irds.data.Documents
Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.
-
Dataset irds.beir.webis-touche2020.v2.queries
datamaestro_text.datasets.irds.data.Topics
Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.
-
Dataset irds.beir.webis-touche2020.v2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.
-
Dataset irds.beir.webis-touche2020.v2
datamaestro_text.datasets.irds.data.Adhoc
Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.
c4/en-noclean-tr
The "en-noclean" train subset of the corpus, consisting of ~1B documents written in English. Document IDs are assigned as proposed by the TREC Health Misinformation 2021 track.
-
Dataset irds.c4.en-noclean-tr.documents
datamaestro_text.datasets.irds.data.Documents
The "en-noclean" train subset of the corpus, consisting of ~1B documents written in English. Document IDs are assigned as proposed by the TREC Health Misinformation 2021 track.
-
Dataset irds.c4.en-noclean-tr.trec-misinfo-2021.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Health Misinformation 2021 track.
car/v1.5
Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.
-
Dataset irds.car.v1.5.documents
datamaestro_text.datasets.irds.data.Documents
Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.
-
Dataset irds.car.v1.5.test200.queries
datamaestro_text.datasets.irds.data.Topics
Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.
-
Dataset irds.car.v1.5.test200.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.
-
Dataset irds.car.v1.5.test200
datamaestro_text.datasets.irds.data.Adhoc
Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.
-
Dataset irds.car.v1.5.train.fold0.queries
datamaestro_text.datasets.irds.data.Topics
Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold0.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold0
datamaestro_text.datasets.irds.data.Adhoc
Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold1.queries
datamaestro_text.datasets.irds.data.Topics
Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold1
datamaestro_text.datasets.irds.data.Adhoc
Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold2.queries
datamaestro_text.datasets.irds.data.Topics
Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold2
datamaestro_text.datasets.irds.data.Adhoc
Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold3.queries
datamaestro_text.datasets.irds.data.Topics
Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold3
datamaestro_text.datasets.irds.data.Adhoc
Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold4.queries
datamaestro_text.datasets.irds.data.Topics
Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.train.fold4
datamaestro_text.datasets.irds.data.Adhoc
Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
-
Dataset irds.car.v1.5.trec-y1.queries
datamaestro_text.datasets.irds.data.Topics
Official test set of TREC CAR 2017 (year 1).
-
Dataset irds.car.v1.5.trec-y1.auto.queries
datamaestro_text.datasets.irds.data.Topics
Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from the hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant).
-
Dataset irds.car.v1.5.trec-y1.auto.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from the hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant).
-
Dataset irds.car.v1.5.trec-y1.auto
datamaestro_text.datasets.irds.data.Adhoc
Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from the hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant).
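The automatic judgments described above can be sketched as follows: every paragraph sitting under a heading is marked relevant to the query formed by the page title and its heading path. This is a simplified illustration, not the TREC CAR tooling. The function name `hierarchical_qrels`, the input layout and the "/"-joined query IDs are all hypothetical and do not follow the official CAR ID format.

```python
def hierarchical_qrels(page_title, sections):
    """Derive automatic relevance judgments from a page outline (illustrative sketch).

    `sections` is a list of (heading_path, paragraph_ids) pairs, where
    heading_path is a tuple of nested headings and paragraph_ids are the
    paragraphs that appear directly under the deepest heading.
    Returns {query_id: set_of_relevant_paragraph_ids}.
    """
    qrels = {}
    for heading_path, paragraph_ids in sections:
        # Hypothetical query ID: page title followed by the heading path
        query_id = "/".join((page_title, *heading_path))
        # Every paragraph under this heading is assumed relevant to it
        qrels.setdefault(query_id, set()).update(paragraph_ids)
    return qrels


# Example outline of a made-up page
outline = [
    (("History",), ["p1", "p2"]),
    (("History", "Origins"), ["p3"]),
    (("Preparation",), ["p4"]),
]
judgments = hierarchical_qrels("Coffee", outline)
```

Only paragraphs directly under the deepest heading are counted here. A variant that also propagates nested paragraphs up to parent headings would be an equally plausible reading of "under a header".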
-
Dataset irds.car.v1.5.trec-y1.manual.queries
datamaestro_text.datasets.irds.data.Topics
Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.
-
Dataset irds.car.v1.5.trec-y1.manual.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.
-
Dataset irds.car.v1.5.trec-y1.manual
datamaestro_text.datasets.irds.data.Adhoc
Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.
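Each benchmark in this list appears as three entries: `<id>.queries` (Topics), `<id>.qrels` (AdhocAssessments), and `<id>` itself (Adhoc). The following minimal sketch derives the three IDs from a base ID. The helper name `benchmark_ids` and the dictionary keys are hypothetical, and the suffix convention is inferred from the entries above rather than taken from the datamaestro API.

```python
def benchmark_ids(base_id: str) -> dict:
    """Return the IDs of the three entries listed for one benchmark.

    Hypothetical helper: the suffixes follow the naming visible in this list.
    """
    return {
        "topics": f"{base_id}.queries",      # ...irds.data.Topics entry
        "assessments": f"{base_id}.qrels",   # ...irds.data.AdhocAssessments entry
        "adhoc": base_id,                    # ...irds.data.Adhoc entry (full benchmark)
    }


ids = benchmark_ids("irds.car.v1.5.trec-y1.manual")
```

Each resulting string can then be passed to `prepare_dataset`, as shown in the Usage section, to load only the part you need.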
car/v2.0
Version 2.0 of the TREC CAR dataset.
-
Dataset irds.car.v2.0.documents
datamaestro_text.datasets.irds.data.Documents
Version 2.0 of the TREC CAR dataset.
Highwire (TREC Genomics 2006-07)
Medical document collection from Highwire Press. Includes 162,259 scientific articles from 49 journals.
This dataset is used for the TREC 2006-07 Genomics track.
Note that these documents are split into passages based on paragraph tags in the HTML.
- Documents: Biomedical journal articles
- Information about document collection
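The note above says documents are split into passages based on HTML paragraph tags. The sketch below shows that kind of splitting with the standard library's `html.parser`. It is an illustration of the idea only, not the ir-datasets implementation, and the names `ParagraphSplitter` and `split_into_passages` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Collects the text found inside each <p>...</p> element as one passage."""

    def __init__(self):
        super().__init__()
        self.passages = []
        self._buffer = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes an unterminated previous paragraph
            self.flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        # Text outside paragraphs (titles, headings, ...) is ignored
        if self._buffer is not None:
            self._buffer.append(data)

    def flush(self):
        if self._buffer is not None:
            # Normalize whitespace and drop empty paragraphs
            text = " ".join("".join(self._buffer).split())
            if text:
                self.passages.append(text)
        self._buffer = None


def split_into_passages(html):
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # handle a trailing <p> that was never closed
    return parser.passages


page = "<html><body><h1>Title</h1><p>First <b>bold</b> para.</p><p>Second\n  para.</p></body></html>"
passages = split_into_passages(page)
```

Inline markup such as `<b>` is flattened into the surrounding passage, so each passage corresponds to exactly one paragraph element.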
-
Dataset irds.highwire.documents
datamaestro_text.datasets.irds.data.Documents
Medical document collection from Highwire Press. Includes 162,259 scientific articles from 49 journals.
This dataset is used for the TREC 2006-07 Genomics track.
Note that these documents are split into passages based on paragraph tags in the HTML.
- Documents: Biomedical journal articles
- Information about document collection
-
Dataset irds.highwire.trec-genomics-2006.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
-
Dataset irds.highwire.trec-genomics-2006.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
-
Dataset irds.highwire.trec-genomics-2006
datamaestro_text.datasets.irds.data.Adhoc
The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
-
Dataset irds.highwire.trec-genomics-2007.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
-
Dataset irds.highwire.trec-genomics-2007.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
-
Dataset irds.highwire.trec-genomics-2007
datamaestro_text.datasets.irds.data.Adhoc
The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.
- Documents: Biomedical journal articles
- Queries: Natural language questions
- Qrels: deep, by passage
- Shared task data site
- Shared task paper
medline/2004
3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.
- Documents: Biomedical article titles and abstracts
- Information about document collection
-
Dataset irds.medline.2004.documents
datamaestro_text.datasets.irds.data.Documents
3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.
- Documents: Biomedical article titles and abstracts
- Information about document collection
-
Dataset irds.medline.2004.trec-genomics-2004.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2004.trec-genomics-2004.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2004.trec-genomics-2004
datamaestro_text.datasets.irds.data.Adhoc
The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
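Because these qrels are graded, evaluation often needs a threshold that turns graded labels into binary ones. The following minimal sketch assumes the standard four-column TREC qrels text format (`query_id iteration doc_id relevance`). The sample lines are illustrative and not taken from the collection, and the function names are hypothetical. The `AdhocAssessments` objects above may expose judgments differently.

```python
from collections import defaultdict


def parse_trec_qrels(lines):
    """Parse TREC-format qrels lines: '<query_id> <iteration> <doc_id> <relevance>'."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        query_id, _iteration, doc_id, relevance = parts
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


def binarize(qrels, min_relevance=1):
    """Keep, per query, only the documents whose graded label reaches min_relevance."""
    return {
        qid: {doc for doc, rel in docs.items() if rel >= min_relevance}
        for qid, docs in qrels.items()
    }


sample = ["100 0 12345 2", "100 0 23456 1", "100 0 34567 0", "", "101 0 45678 2"]
graded = parse_trec_qrels(sample)
```

Raising `min_relevance` (for example to 2) gives a stricter notion of relevance. Reporting which threshold was used makes graded-qrels results easier to compare.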
-
Dataset irds.medline.2004.trec-genomics-2005.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2004.trec-genomics-2005.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2004.trec-genomics-2005
datamaestro_text.datasets.irds.data.Adhoc
The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.
- Documents: Biomedical article titles and abstracts
- Queries: Natural language questions
- Qrels: deep, graded
- Shared task data site
- Shared task paper
medline/2017
26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.
- Documents: Biomedical article titles and abstracts
- Information about document collection
-
Dataset irds.medline.2017.documents
datamaestro_text.datasets.irds.data.Documents
26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.
- Documents: Biomedical article titles and abstracts
- Information about document collection
-
Dataset irds.medline.2017.trec-pm-2017.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
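TREC PM queries are structured rather than free text. The sketch below shows one way to hold such a topic and flatten it into a keyword query for a lexical retriever. It is a minimal illustration under assumptions: the class name and the field names `query_id`, `disease`, `gene` and `demographic` only mirror the description above and are not the confirmed schema of the Topics records. The sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class PrecisionMedicineTopic:
    # Hypothetical container: field names mirror the description above
    query_id: str
    disease: str
    gene: str
    demographic: str

    def to_keyword_query(self) -> str:
        """Concatenate the non-empty structured fields into one flat keyword query."""
        fields = (self.disease, self.gene, self.demographic)
        return " ".join(part for part in fields if part)


# Illustrative values, not an actual TREC PM topic
topic = PrecisionMedicineTopic("t1", "melanoma", "BRAF (V600E)", "64-year-old male")
flat_query = topic.to_keyword_query()
```

Flattening discards which field each term came from. Systems that weight disease and gene terms differently would keep the fields separate instead.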
-
Dataset irds.medline.2017.trec-pm-2017.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2017.trec-pm-2017
datamaestro_text.datasets.irds.data.Adhoc
The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2017.trec-pm-2018.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2017.trec-pm-2018.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
-
Dataset irds.medline.2017.trec-pm-2018
datamaestro_text.datasets.irds.data.Adhoc
The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries specifying disease, gene, and target demographic information.
- Documents: Biomedical article titles and abstracts
- Queries: Specific to TREC PM information need
- Qrels: deep, graded
- Shared task data site
- Shared task paper
clinicaltrials/2017
A snapshot of ClinicalTrials.gov from April 2017 for use with the clinicaltrials/2017/trec-pm-2017 and clinicaltrials/2017/trec-pm-2018 Clinical Trials subtasks.
-
Dataset irds.clinicaltrials.2017.documents
datamaestro_text.datasets.irds.data.Documents
A snapshot of ClinicalTrials.gov from April 2017 for use with the clinicaltrials/2017/trec-pm-2017 and clinicaltrials/2017/trec-pm-2018 Clinical Trials subtasks.
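The description above refers to ir-datasets IDs such as `clinicaltrials/2017/trec-pm-2017`, while this list uses datamaestro IDs such as `irds.clinicaltrials.2017.trec-pm-2017`. The following sketch converts one into the other. The convention (an `irds.` prefix, with `/` replaced by `.`) is inferred from the entries in this list, and the helper name `datamaestro_id` is hypothetical.

```python
def datamaestro_id(irds_id: str) -> str:
    """Map an ir-datasets ID to the datamaestro ID used in this list.

    Assumed convention: prefix with 'irds.' and replace every '/' with '.'.
    """
    return "irds." + irds_id.replace("/", ".")


pm_2017 = datamaestro_id("clinicaltrials/2017/trec-pm-2017")
```

The mapping only works in this direction. IDs such as `car/v1.5` already contain dots, so a datamaestro ID cannot be split back unambiguously.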
-
Dataset irds.clinicaltrials.2017.trec-pm-2017.queries
datamaestro_text.datasets.irds.data.Topics
The TREC 2017 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2017.trec-pm-2017.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC 2017 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2017.trec-pm-2017
datamaestro_text.datasets.irds.data.Adhoc
The TREC 2017 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2017.trec-pm-2018.queries
datamaestro_text.datasets.irds.data.Topics
The TREC 2018 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2017.trec-pm-2018.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC 2018 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2017.trec-pm-2018
datamaestro_text.datasets.irds.data.Adhoc
The TREC 2018 Precision Medicine clinical trials subtask.
clinicaltrials/2019
A snapshot of ClinicalTrials.gov from May 2019 for use with the clinicaltrials/2019/trec-pm-2019 Clinical Trials subtask.
-
Dataset irds.clinicaltrials.2019.documents
datamaestro_text.datasets.irds.data.Documents
A snapshot of ClinicalTrials.gov from May 2019 for use with the clinicaltrials/2019/trec-pm-2019 Clinical Trials subtask.
-
Dataset irds.clinicaltrials.2019.trec-pm-2019.queries
datamaestro_text.datasets.irds.data.Topics
The TREC 2019 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2019.trec-pm-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC 2019 Precision Medicine clinical trials subtask.
-
Dataset irds.clinicaltrials.2019.trec-pm-2019
datamaestro_text.datasets.irds.data.Adhoc
The TREC 2019 Precision Medicine clinical trials subtask.
clinicaltrials/2021
A snapshot of ClinicalTrials.gov from April 2021 for use with the TREC Clinical Trials 2021 Track.
-
Dataset irds.clinicaltrials.2021.documents
datamaestro_text.datasets.irds.data.Documents
A snapshot of ClinicalTrials.gov from April 2021 for use with the TREC Clinical Trials 2021 Track.
-
Dataset irds.clinicaltrials.2021.trec-ct-2021.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Clinical Trials 2021 track.
-
Dataset irds.clinicaltrials.2021.trec-ct-2021.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Clinical Trials 2021 track.
-
Dataset irds.clinicaltrials.2021.trec-ct-2021
datamaestro_text.datasets.irds.data.Adhoc
The TREC Clinical Trials 2021 track.
-
Dataset irds.clinicaltrials.2021.trec-ct-2022.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Clinical Trials 2022 track.
ClueWeb09
ClueWeb 2009 web document collection. Contains over 1B web pages in 10 languages.
The dataset must be obtained from CMU for a fee and is shipped on hard drives.
-
Dataset irds.clueweb09.documents
datamaestro_text.datasets.irds.data.Documents
ClueWeb 2009 web document collection. Contains over 1B web pages in 10 languages.
The dataset must be obtained from CMU for a fee and is shipped on hard drives.
-
Dataset irds.clueweb09.trec-mq-2009.queries
datamaestro_text.datasets.irds.data.Topics
TREC 2009 Million Query track.
-
Dataset irds.clueweb09.trec-mq-2009.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
TREC 2009 Million Query track.
-
Dataset irds.clueweb09.trec-mq-2009
datamaestro_text.datasets.irds.data.Adhoc
TREC 2009 Million Query track.
clueweb09/ar
Subset of ClueWeb09 with only Arabic-language documents.
-
Dataset irds.clueweb09.ar.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Arabic-language documents.
clueweb09/catb
Subset of ClueWeb09 with the first ~50 million English-language documents. Used as a smaller collection for TREC Web Track tasks.
-
Dataset irds.clueweb09.catb.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with the first ~50 million English-language documents. Used as a smaller collection for TREC Web Track tasks.
-
Dataset irds.clueweb09.catb.trec-web-2009.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2009.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2009
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2009.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2009.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2009.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2010.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2011.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.catb.trec-web-2012.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
clueweb09/de
Subset of ClueWeb09 with only German-language documents.
-
Dataset irds.clueweb09.de.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only German-language documents.
clueweb09/en
Subset of ClueWeb09 with only English-language documents.
-
Dataset irds.clueweb09.en.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only English-language documents.
-
Dataset irds.clueweb09.en.trec-web-2009.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2009.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2009
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2009.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2009.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2009.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2010.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2011.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb09.en.trec-web-2012.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
clueweb09/es
Subset of ClueWeb09 with only Spanish-language documents.
-
Dataset irds.clueweb09.es.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Spanish-language documents.
clueweb09/fr
Subset of ClueWeb09 with only French-language documents.
-
Dataset irds.clueweb09.fr.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only French-language documents.
clueweb09/it
Subset of ClueWeb09 with only Italian-language documents.
-
Dataset irds.clueweb09.it.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Italian-language documents.
clueweb09/ja
Subset of ClueWeb09 with only Japanese-language documents.
-
Dataset irds.clueweb09.ja.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Japanese-language documents.
clueweb09/ko
Subset of ClueWeb09 with only Korean-language documents.
-
Dataset irds.clueweb09.ko.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Korean-language documents.
clueweb09/pt
Subset of ClueWeb09 with only Portuguese-language documents.
-
Dataset irds.clueweb09.pt.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Portuguese-language documents.
clueweb09/zh
Subset of ClueWeb09 with only Chinese-language documents.
-
Dataset irds.clueweb09.zh.documents
datamaestro_text.datasets.irds.data.Documents
Subset of ClueWeb09 with only Chinese-language documents.
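The ten single-language subsets listed above match the "10 languages" of the full collection. The following sketch picks the documents ID for one language. The table is built from the entries in this section. The helper name `clueweb09_documents_id` is hypothetical. `catb` is a size-based English subset rather than a language subset, so it is deliberately left out.

```python
# Language subsets listed in this section (the codes are those used in the IDs)
CLUEWEB09_SUBSETS = {
    "ar": "Arabic", "de": "German", "en": "English", "es": "Spanish",
    "fr": "French", "it": "Italian", "ja": "Japanese", "ko": "Korean",
    "pt": "Portuguese", "zh": "Chinese",
}


def clueweb09_documents_id(lang=None):
    """Return the documents ID of the full ClueWeb09 collection or of one language subset."""
    if lang is None:
        return "irds.clueweb09.documents"
    if lang not in CLUEWEB09_SUBSETS:
        raise ValueError(f"no ClueWeb09 subset for language {lang!r}")
    return f"irds.clueweb09.{lang}.documents"


japanese = clueweb09_documents_id("ja")
```

Raising an error for unknown codes avoids silently building an ID that does not appear in this list.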
ClueWeb12
ClueWeb 2012 web document collection. Contains 733M web pages.
The dataset must be obtained from CMU for a fee and is shipped on hard drives.
-
Dataset irds.clueweb12.documents
datamaestro_text.datasets.irds.data.Documents
ClueWeb 2012 web document collection. Contains 733M web pages.
The dataset must be obtained from CMU for a fee and is shipped on hard drives.
-
Dataset irds.clueweb12.trec-web-2013.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2013.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2013
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2013.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb12.trec-web-2013.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb12.trec-web-2013.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
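Subtopic judgments label each document per subtopic of a query, so a diversity measure can reward covering many subtopics. The following minimal sketch computes subtopic recall (S-recall), a standard diversity measure, named plainly here. It is not necessarily the official metric of the track. The input layout (subtopic ID mapped to `{doc_id: relevance}`) and the sample data are hypothetical.

```python
def subtopic_recall(ranking, subtopic_qrels, k):
    """Fraction of a query's subtopics covered by at least one relevant document in the top k.

    ranking: list of doc IDs, best first.
    subtopic_qrels: {subtopic_id: {doc_id: relevance}} for a single query.
    """
    if not subtopic_qrels:
        return 0.0
    top_k = set(ranking[:k])
    covered = sum(
        1
        for docs in subtopic_qrels.values()
        if any(rel > 0 and doc in top_k for doc, rel in docs.items())
    )
    return covered / len(subtopic_qrels)


# Made-up judgments for one query with three subtopics
qrels = {"1": {"d1": 1, "d2": 0}, "2": {"d3": 2}, "3": {"d4": 1}}
run = ["d1", "d2", "d3", "d5"]
s_recall_at_2 = subtopic_recall(run, qrels, 2)
s_recall_at_4 = subtopic_recall(run, qrels, 4)
```

Unlike ad-hoc recall, retrieving a second document for an already covered subtopic does not increase the score. This is why diversity runs favour novel documents.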
-
Dataset irds.clueweb12.trec-web-2014.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2014.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2014
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.clueweb12.trec-web-2014.diversity.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb12.trec-web-2014.diversity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb12.trec-web-2014.diversity
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.
-
Dataset irds.clueweb12.touche-2020-task-2.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, whether at the societal or at the personal level, eventually reach a point where one side challenges the other with a why-question, i.e., a prompt to justify its stance. As technologies for argument mining and argumentation processing mature rapidly, argument retrieval is becoming feasible for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020 and features two tasks.
Given a comparative question, retrieve and rank documents from ClueWeb12 that help answer the question.
Documents are judged on their general topical relevance.
-
Dataset irds.clueweb12.touche-2020-task-2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, whether at the societal or at the personal level, eventually reach a point where one side challenges the other with a why-question, i.e., a prompt to justify its stance. As technologies for argument mining and argumentation processing mature rapidly, argument retrieval is becoming feasible for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020 and features two tasks.
Given a comparative question, retrieve and rank documents from ClueWeb12 that help answer the question.
Documents are judged on their general topical relevance.
-
Dataset irds.clueweb12.touche-2020-task-2
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, whether at the societal or at the personal level, eventually reach a point where one side challenges the other with a why-question, i.e., a prompt to justify its stance. As technologies for argument mining and argumentation processing mature rapidly, argument retrieval is becoming feasible for the first time. Touché 2020 is the first lab on argument retrieval at CLEF 2020 and features two tasks.
Given a comparative question, retrieve and rank documents from ClueWeb12 that help answer the question.
Documents are judged on their general topical relevance.
-
Dataset irds.clueweb12.touche-2021-task-2.queries
datamaestro_text.datasets.irds.data.Topics
Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, also ad-hoc argument retrieval becomes a feasible task in reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.
Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.
Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.
-
Dataset irds.clueweb12.touche-2021-task-2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is now within reach as well. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.
Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes other detrimental style choices.
-
Dataset irds.clueweb12.touche-2021-task-2
datamaestro_text.datasets.irds.data.Adhoc
Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is now within reach as well. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.
Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes other detrimental style choices.
clueweb12/b13
Official subset of the ClueWeb12 datasets with 52M web pages.
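The datamaestro IDs on this page appear to mirror ir_datasets IDs with slashes replaced by dots and an `irds.` prefix, plus an optional `.documents`, `.queries` or `.qrels` suffix for the individual parts (compare `irds.clueweb12.b13.clef-ehealth` with `clueweb12/b13/clef-ehealth`). The sketch below inverts that observed pattern; the helper name `to_ir_datasets_id` is hypothetical, and it assumes no component of the ir_datasets ID itself contains a dot.

```python
# Sub-dataset suffixes that appear in the datamaestro IDs listed on this page
PARTS = ("documents", "queries", "qrels")


def to_ir_datasets_id(datamaestro_id):
    """Split a datamaestro irds ID into (ir_datasets ID, sub-dataset part).

    Hypothetical helper based on the naming pattern of this page, e.g.
    "irds.clueweb12.b13.clef-ehealth.cs.qrels"
    -> ("clueweb12/b13/clef-ehealth/cs", "qrels").
    Assumes no component of the ir_datasets ID contains a dot.
    """
    prefix, _, rest = datamaestro_id.partition(".")
    if prefix != "irds" or not rest:
        raise ValueError(f"not an irds dataset ID: {datamaestro_id!r}")
    components = rest.split(".")
    part = None
    # Only strip a known part suffix; the full Adhoc dataset has none
    if len(components) > 1 and components[-1] in PARTS:
        part = components.pop()
    return "/".join(components), part


for dataset_id in (
    "irds.clueweb12.b13.documents",
    "irds.clueweb12.b13.clef-ehealth.cs.qrels",
    "irds.clueweb12.b13.ntcir-www-2",
):
    print(dataset_id, "->", to_ir_datasets_id(dataset_id))
```

A `None` part denotes the full ad-hoc dataset (documents, queries and qrels together).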
-
Dataset irds.clueweb12.b13.documents
datamaestro_text.datasets.irds.data.Documents
Official subset of the ClueWeb12 datasets with 52M web pages.
-
Dataset irds.clueweb12.b13.clef-ehealth.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.
This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished by the iteration field (2016 is iteration 0, 2017 is iteration 1).
-
Dataset irds.clueweb12.b13.clef-ehealth.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.
This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished by the iteration field (2016 is iteration 0, 2017 is iteration 1).
-
Dataset irds.clueweb12.b13.clef-ehealth
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.
This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished by the iteration field (2016 is iteration 0, 2017 is iteration 1).
-
Dataset irds.clueweb12.b13.clef-ehealth.cs.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.cs.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.cs
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.de.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.de.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.de
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.fr.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.fr.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.fr
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.hu.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.hu.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.hu
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.pl.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.pl.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.pl
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.sv.queries
datamaestro_text.datasets.irds.data.Topics
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.sv.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.clef-ehealth.sv
datamaestro_text.datasets.irds.data.Adhoc
The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.
-
Dataset irds.clueweb12.b13.ntcir-www-1.queries
datamaestro_text.datasets.irds.data.Topics
The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.
-
Dataset irds.clueweb12.b13.ntcir-www-1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.
-
Dataset irds.clueweb12.b13.ntcir-www-1
datamaestro_text.datasets.irds.data.Adhoc
The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.
-
Dataset irds.clueweb12.b13.ntcir-www-2.queries
datamaestro_text.datasets.irds.data.Topics
The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.
-
Dataset irds.clueweb12.b13.ntcir-www-2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.
-
Dataset irds.clueweb12.b13.ntcir-www-2
datamaestro_text.datasets.irds.data.Adhoc
The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.
-
Dataset irds.clueweb12.b13.ntcir-www-3.queries
datamaestro_text.datasets.irds.data.Topics
The NTCIR-15 We Want Web (WWW) 3 ad-hoc ranking benchmark. Contains 160 queries with deep relevance judgments (to be released). 80 of the queries are from clueweb12/b13/ntcir-www-2.
-
Dataset irds.clueweb12.b13.trec-misinfo-2019.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Medical Misinformation 2019 dataset.
-
Dataset irds.clueweb12.b13.trec-misinfo-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Medical Misinformation 2019 dataset.
-
Dataset irds.clueweb12.b13.trec-misinfo-2019
datamaestro_text.datasets.irds.data.Adhoc
The TREC Medical Misinformation 2019 dataset.
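Because the clueweb12/b13/clef-ehealth qrels (and those of its translated variants) combine both assessment years, a per-year evaluation has to split the judgments on the iteration value (0 for 2016, 1 for 2017). The sketch below illustrates this under loudly stated assumptions: it parses simplified four-column TREC-style lines (`query_id iteration doc_id relevance`), whereas the real qrels also carry trustworthiness and understandability scores, and the sample lines are made up.

```python
from collections import defaultdict

# Iteration value -> assessment year, as documented for clef-ehealth
ITERATION_TO_YEAR = {"0": 2016, "1": 2017}


def split_qrels_by_year(lines):
    """Group simplified qrels lines ("query_id iteration doc_id relevance") by year.

    Assumes exactly four whitespace-separated columns per line.
    """
    by_year = defaultdict(list)
    for line in lines:
        query_id, iteration, doc_id, relevance = line.split()
        by_year[ITERATION_TO_YEAR[iteration]].append((query_id, doc_id, int(relevance)))
    return dict(by_year)


# Made-up sample lines; real query and document IDs will differ
sample = [
    "q1 0 doc-a 1",
    "q1 1 doc-b 2",
    "q2 0 doc-c 0",
]
print(split_qrels_by_year(sample))
# {2016: [('q1', 'doc-a', 1), ('q2', 'doc-c', 0)], 2017: [('q1', 'doc-b', 2)]}
```

Evaluating each year separately then only requires passing the corresponding list of judgments to the evaluation tool of your choice.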
CODEC
CODEC Document Ranking sub-task.
- Documents: curated web articles
- Queries: challenging, entity-focused queries
- Task Repository
- See also: kilt/codec, the entity ranking subtask
-
Dataset irds.codec.documents
datamaestro_text.datasets.irds.data.Documents
CODEC Document Ranking sub-task.
- Documents: curated web articles
- Queries: challenging, entity-focused queries
- Task Repository
- See also: kilt/codec, the entity ranking subtask
-
Dataset irds.codec.queries
datamaestro_text.datasets.irds.data.Topics
CODEC Document Ranking sub-task.
- Documents: curated web articles
- Queries: challenging, entity-focused queries
- Task Repository
- See also: kilt/codec, the entity ranking subtask
-
Dataset irds.codec.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
CODEC Document Ranking sub-task.
- Documents: curated web articles
- Queries: challenging, entity-focused queries
- Task Repository
- See also: kilt/codec, the entity ranking subtask
-
Dataset irds.codec
datamaestro_text.datasets.irds.data.Adhoc
CODEC Document Ranking sub-task.
- Documents: curated web articles
- Queries: challenging, entity-focused queries
- Task Repository
- See also: kilt/codec, the entity ranking subtask
-
Dataset irds.codec.economics.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about economics.
-
Dataset irds.codec.economics.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about economics.
-
Dataset irds.codec.economics
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about economics.
-
Dataset irds.codec.history.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about history.
-
Dataset irds.codec.history.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about history.
-
Dataset irds.codec.history
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about history.
-
Dataset irds.codec.politics.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about politics.
-
Dataset irds.codec.politics.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about politics.
-
Dataset irds.codec.politics
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about politics.
CORD-19
Collection of scientific articles related to COVID-19.
Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.
Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.
-
Dataset irds.cord19.documents
datamaestro_text.datasets.irds.data.Documents
Collection of scientific articles related to COVID-19.
Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.
Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.
-
Dataset irds.cord19.trec-covid.queries
datamaestro_text.datasets.irds.data.Topics
The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.
-
Dataset irds.cord19.trec-covid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.
-
Dataset irds.cord19.trec-covid
datamaestro_text.datasets.irds.data.Adhoc
The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.
-
Dataset irds.cord19.trec-covid.round5.queries
datamaestro_text.datasets.irds.data.Topics
Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round5
datamaestro_text.datasets.irds.data.Adhoc
Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
cord19/fulltext
Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
-
Dataset irds.cord19.fulltext.documents
datamaestro_text.datasets.irds.data.Documents
Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
-
Dataset irds.cord19.fulltext.trec-covid.queries
datamaestro_text.datasets.irds.data.Topics
Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.
-
Dataset irds.cord19.fulltext.trec-covid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.
-
Dataset irds.cord19.fulltext.trec-covid
datamaestro_text.datasets.irds.data.Adhoc
Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.
Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.
cord19/trec-covid/round1
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
-
Dataset irds.cord19.trec-covid.round1.documents
datamaestro_text.datasets.irds.data.Documents
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
-
Dataset irds.cord19.trec-covid.round1.queries
datamaestro_text.datasets.irds.data.Topics
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
-
Dataset irds.cord19.trec-covid.round1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
-
Dataset irds.cord19.trec-covid.round1
datamaestro_text.datasets.irds.data.Adhoc
Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.
cord19/trec-covid/round2
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round2.documents
datamaestro_text.datasets.irds.data.Documents
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round2.queries
datamaestro_text.datasets.irds.data.Topics
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round2
datamaestro_text.datasets.irds.data.Adhoc
Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
cord19/trec-covid/round3
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round3.documents
datamaestro_text.datasets.irds.data.Documents
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round3.queries
datamaestro_text.datasets.irds.data.Topics
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round3
datamaestro_text.datasets.irds.data.Adhoc
Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
cord19/trec-covid/round4
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round4.documents
datamaestro_text.datasets.irds.data.Documents
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round4.queries
datamaestro_text.datasets.irds.data.Topics
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
-
Dataset irds.cord19.trec-covid.round4
datamaestro_text.datasets.irds.data.Adhoc
Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.
Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
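Choosing between a per-round dataset and the "complete" cord19/trec-covid dataset determines which judgments you evaluate against: per-round qrels exclude earlier rounds, while the complete version covers all of them. The sketch below encodes the round details stated above (query counts and collection versions) in a small lookup; the helper name `trec_covid_setting` and the description strings are hypothetical, and it only returns dataset IDs rather than loading anything.

```python
# Round -> (number of queries, CORD-19 collection version), as listed above
TREC_COVID_ROUNDS = {
    1: (30, "2020-04-10"),
    2: (35, "2020-05-01"),
    3: (40, "2020-05-19"),
    4: (45, "2020-06-19"),
    5: (50, "2020-07-16"),
}


def trec_covid_setting(round_number=None):
    """Return (dataset ID, description) for a TREC COVID evaluation setting.

    round_number=None selects the "complete" dataset; a round number
    selects the per-round dataset, whose qrels exclude earlier rounds.
    """
    if round_number is None:
        return "irds.cord19.trec-covid", "complete collection, qrels from all rounds"
    if round_number not in TREC_COVID_ROUNDS:
        raise ValueError(f"TREC COVID has rounds 1-5, got {round_number!r}")
    n_queries, version = TREC_COVID_ROUNDS[round_number]
    return (
        f"irds.cord19.trec-covid.round{round_number}",
        f"{n_queries} queries, collection {version}, round {round_number} qrels only",
    )


print(trec_covid_setting())
print(trec_covid_setting(3))
```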
Cranfield
A small corpus of 1,400 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language questions
- Dataset Information
-
Dataset irds.cranfield.documents
datamaestro_text.datasets.irds.data.Documents
A small corpus of 1,400 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language questions
- Dataset Information
-
Dataset irds.cranfield.queries
datamaestro_text.datasets.irds.data.Topics
A small corpus of 1,400 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language questions
- Dataset Information
-
Dataset irds.cranfield.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A small corpus of 1,400 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language questions
- Dataset Information
-
Dataset irds.cranfield
datamaestro_text.datasets.irds.data.Adhoc
A small corpus of 1,400 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language questions
- Dataset Information
CSL
The CSL dataset, used for the TREC NeuCLIR technical document task.
-
Dataset irds.csl.documents
datamaestro_text.datasets.irds.data.Documents
The CSL dataset, used for the TREC NeuCLIR technical document task.
-
Dataset irds.csl.trec-2023.queries
datamaestro_text.datasets.irds.data.Topics
The TREC NeuCLIR 2023 technical document task.
-
Dataset irds.csl.trec-2023.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC NeuCLIR 2023 technical document task.
-
Dataset irds.csl.trec-2023
datamaestro_text.datasets.irds.data.Adhoc
The TREC NeuCLIR 2023 technical document task.
disks45/nocr
A version of disks45 without the Congressional Record. This is the typical setting for tasks like TREC 7, TREC 8, and TREC Robust 2004.
-
Dataset irds.disks45.nocr.documents
datamaestro_text.datasets.irds.data.Documents
A version of disks45 without the Congressional Record. This is the typical setting for tasks like TREC 7, TREC 8, and TREC Robust 2004.
-
Dataset irds.disks45.nocr.trec-robust-2004.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."
The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Task Overview Paper
- See also: aquaint/trec-robust-2005
-
Dataset irds.disks45.nocr.trec-robust-2004.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."
The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Task Overview Paper
- See also: aquaint/trec-robust-2005
-
Dataset irds.disks45.nocr.trec-robust-2004
datamaestro_text.datasets.irds.data.Adhoc
The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."
The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.
- Documents: News articles
- Queries: keyword queries, descriptions, narratives
- Relevance: Deep judgments
- Task Overview Paper
- See also: aquaint/trec-robust-2005
-
Dataset irds.disks45.nocr.trec-robust-2004.fold1.queries
datamaestro_text.datasets.irds.data.Topics
Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold1
datamaestro_text.datasets.irds.data.Adhoc
Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold2.queries
datamaestro_text.datasets.irds.data.Topics
Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold2
datamaestro_text.datasets.irds.data.Adhoc
Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold3.queries
datamaestro_text.datasets.irds.data.Topics
Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold3
datamaestro_text.datasets.irds.data.Adhoc
Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold4.queries
datamaestro_text.datasets.irds.data.Topics
Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold4
datamaestro_text.datasets.irds.data.Adhoc
Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold5.queries
datamaestro_text.datasets.irds.data.Topics
Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec-robust-2004.fold5
datamaestro_text.datasets.irds.data.Adhoc
Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works
-
Dataset irds.disks45.nocr.trec7.queries
datamaestro_text.datasets.irds.data.Topics
The TREC 7 Adhoc Retrieval track.
-
Dataset irds.disks45.nocr.trec7.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC 7 Adhoc Retrieval track.
-
Dataset irds.disks45.nocr.trec7
datamaestro_text.datasets.irds.data.Adhoc
The TREC 7 Adhoc Retrieval track.
-
Dataset irds.disks45.nocr.trec8.queries
datamaestro_text.datasets.irds.data.Topics
The TREC 8 Adhoc Retrieval track.
-
Dataset irds.disks45.nocr.trec8.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC 8 Adhoc Retrieval track.
-
Dataset irds.disks45.nocr.trec8
datamaestro_text.datasets.irds.data.Adhoc
The TREC 8 Adhoc Retrieval track.
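The five Robust04 fold datasets listed above (Huston & Croft, 2014) are typically used for cross-validation. One common protocol tunes parameters on four folds, evaluates on the held-out fifth, and averages over the five rotations. The sketch below only builds that rotation over the fold dataset IDs; loading, tuning and evaluation are left out, and the helper name `cross_validation_splits` is hypothetical.

```python
ROBUST04 = "irds.disks45.nocr.trec-robust-2004"

# The five Huston & Croft (2014) fold datasets listed above
FOLDS = [f"{ROBUST04}.fold{i}" for i in range(1, 6)]


def cross_validation_splits(folds):
    """Yield (train_folds, test_fold) pairs, holding out each fold exactly once."""
    for i, test_fold in enumerate(folds):
        yield folds[:i] + folds[i + 1 :], test_fold


for train_folds, test_fold in cross_validation_splits(FOLDS):
    held_out = test_fold.rsplit(".", 1)[-1]
    print(held_out, "held out; tune on", [f.rsplit(".", 1)[-1] for f in train_folds])
```

Some works instead split the four remaining folds into training and validation folds; the rotation above adapts to that by slicing `train_folds` further.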
DPR Wiki100
A Wikipedia dump from 20 December 2018, split into passages of 100 words. Used in the DPR paper (and subsequent works) for retrieval experiments over Q&A collections.
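To make "split into passages of 100 words" concrete, the sketch below chunks a text into consecutive, non-overlapping 100-word passages. It is an illustration only, under the assumption of plain whitespace tokenization; it is not the exact preprocessing used to build this dump and does not attempt to reproduce details such as title handling.

```python
def split_into_passages(text, passage_len=100):
    """Split text into consecutive, non-overlapping passages of `passage_len` words.

    Words are whitespace-separated tokens; the last passage may be shorter.
    """
    words = text.split()
    return [
        " ".join(words[i : i + passage_len])
        for i in range(0, len(words), passage_len)
    ]


# A synthetic 250-word "article"
article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
print([len(p.split()) for p in passages])  # [100, 100, 50]
```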
-
Dataset irds.dpr-w100.documents
datamaestro_text.datasets.irds.data.Documents
A Wikipedia dump from 20 December 2018, split into passages of 100 words. Used in the DPR paper (and subsequent works) for retrieval experiments over Q&A collections.
-
Dataset irds.dpr-w100.natural-questions.dev.queries
datamaestro_text.datasets.irds.data.Topics
Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.natural-questions.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.natural-questions.dev
datamaestro_text.datasets.irds.data.Adhoc
Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.natural-questions.train.queries
datamaestro_text.datasets.irds.data.Topics
Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.natural-questions.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.natural-questions.train
datamaestro_text.datasets.irds.data.Adhoc
Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and applies additional filtering (described in the DPR paper).
- See also: natural-questions
-
Dataset irds.dpr-w100.trivia-qa.dev.queries
datamaestro_text.datasets.irds.data.Topics
Dev subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
-
Dataset irds.dpr-w100.trivia-qa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Dev subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
-
Dataset irds.dpr-w100.trivia-qa.dev
datamaestro_text.datasets.irds.data.Adhoc
Dev subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
-
Dataset irds.dpr-w100.trivia-qa.train.queries
datamaestro_text.datasets.irds.data.Topics
Training subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
-
Dataset irds.dpr-w100.trivia-qa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
-
Dataset irds.dpr-w100.trivia-qa.train
datamaestro_text.datasets.irds.data.Adhoc
Training subset from the Trivia QA dataset. Unlike the official Trivia QA collection, it uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.
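Each datamaestro ID in this listing mirrors an ir_datasets ID: the descriptions above refer to natural-questions/dev, and later entries refer to gov2/trec-tb-2006/efficiency, which is listed as irds.gov2.trec-tb-2006.efficiency. The sketch below converts between the two forms under an assumption inferred from this listing only (not from datamaestro's source): the `irds.` prefix is added, "/" becomes ".", and a trailing `.documents`, `.queries`, or `.qrels` selects a component. Both helper functions are hypothetical.

```python
from typing import Optional, Tuple

# Component suffixes observed in this listing (assumed to be the complete set).
COMPONENTS = ("documents", "queries", "qrels")


def to_irds_id(dm_id: str) -> Tuple[str, Optional[str]]:
    """Split e.g. 'irds.dpr-w100.trivia-qa.dev.qrels' into
    ('dpr-w100/trivia-qa/dev', 'qrels'); the component is None for Adhoc IDs."""
    prefix = "irds."
    if not dm_id.startswith(prefix):
        raise ValueError(f"not an irds dataset ID: {dm_id!r}")
    parts = dm_id[len(prefix):].split(".")
    component = None
    if len(parts) > 1 and parts[-1] in COMPONENTS:
        component = parts.pop()  # drop the component suffix
    return "/".join(parts), component


def to_dm_id(irds_id: str, component: Optional[str] = None) -> str:
    """Inverse mapping: ('gov2/trec-tb-2004', 'queries') -> 'irds.gov2.trec-tb-2004.queries'."""
    dm_id = "irds." + irds_id.replace("/", ".")
    return f"{dm_id}.{component}" if component else dm_id
```

An ir_datasets ID that itself contained a literal "." would not round-trip through this sketch.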
CodeSearchNet
A benchmark for semantic code search. It uses:
- Documents: Code functions in Python, Java, Go, PHP, Ruby, and JavaScript
- Queries: Inferred from docstrings, or keyword queries for the challenge set
- Dataset Paper
- Challenge Task Leaderboard
-
Dataset irds.codesearchnet.documents
datamaestro_text.datasets.irds.data.Documents
A benchmark for semantic code search. It uses:
- Documents: Code functions in Python, Java, Go, PHP, Ruby, and JavaScript
- Queries: Inferred from docstrings, or keyword queries for the challenge set
- Dataset Paper
- Challenge Task Leaderboard
-
Dataset irds.codesearchnet.challenge.queries
datamaestro_text.datasets.irds.data.Topics
Official challenge set, with keyword queries and deep relevance assessments.
-
Dataset irds.codesearchnet.challenge.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official challenge set, with keyword queries and deep relevance assessments.
-
Dataset irds.codesearchnet.challenge
datamaestro_text.datasets.irds.data.Adhoc
Official challenge set, with keyword queries and deep relevance assessments.
-
Dataset irds.codesearchnet.test.queries
datamaestro_text.datasets.irds.data.Topics
Official test set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.test
datamaestro_text.datasets.irds.data.Adhoc
Official test set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.valid.queries
datamaestro_text.datasets.irds.data.Topics
Official validation set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.valid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official validation set, using queries inferred from docstrings.
-
Dataset irds.codesearchnet.valid
datamaestro_text.datasets.irds.data.Adhoc
Official validation set, using queries inferred from docstrings.
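The CodeSearchNet entries follow the naming pattern used throughout this listing: a benchmark ID `X` is wrapped as Adhoc, `X.queries` as Topics, `X.qrels` as AdhocAssessments, and a collection's `.documents` entry as Documents. The hypothetical helpers below encode that convention as observed here; they are not part of datamaestro_text.

```python
from typing import Dict

# Wrapper classes as they appear in this listing, keyed by the last ID segment.
WRAPPERS: Dict[str, str] = {
    "documents": "datamaestro_text.datasets.irds.data.Documents",
    "queries": "datamaestro_text.datasets.irds.data.Topics",
    "qrels": "datamaestro_text.datasets.irds.data.AdhocAssessments",
}
ADHOC = "datamaestro_text.datasets.irds.data.Adhoc"


def wrapper_for(dm_id: str) -> str:
    """Return the wrapper class name this listing uses for a dataset ID."""
    last = dm_id.rsplit(".", 1)[-1]
    return WRAPPERS.get(last, ADHOC)


def benchmark_ids(base: str) -> Dict[str, str]:
    """IDs of the three entries usually listed for one benchmark."""
    return {
        "adhoc": base,
        "topics": f"{base}.queries",
        "assessments": f"{base}.qrels",
    }
```

Not every benchmark has all three entries: for example, irds.gov2.trec-tb-2006.efficiency.stream1 is listed with queries only, so check an ID against the listing before loading it.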
GOV
GOV web document collection. Used for early TREC Web Tracks. Not to be confused with gov2.
The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.
-
Dataset irds.gov.documents
datamaestro_text.datasets.irds.data.Documents
GOV web document collection. Used for early TREC Web Tracks. Not to be confused with gov2.
The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.
-
Dataset irds.gov.trec-web-2002.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2002 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2002.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2002 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2002
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2002 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2002.named-page.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2002 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2002.named-page.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2002 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2002.named-page
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2002 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2003.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2003 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2003.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2003 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2003
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2003 ad-hoc ranking benchmark.
-
Dataset irds.gov.trec-web-2003.named-page.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2003 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2003.named-page.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2003 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2003.named-page
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2003 named page ranking benchmark.
-
Dataset irds.gov.trec-web-2004.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Web Track 2004 ad-hoc ranking benchmark.
Queries include a combination of topic distillation, homepage finding, and named page finding.
-
Dataset irds.gov.trec-web-2004.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Web Track 2004 ad-hoc ranking benchmark.
Queries include a combination of topic distillation, homepage finding, and named page finding.
-
Dataset irds.gov.trec-web-2004
datamaestro_text.datasets.irds.data.Adhoc
The TREC Web Track 2004 ad-hoc ranking benchmark.
Queries include a combination of topic distillation, homepage finding, and named page finding.
GOV2
GOV2 web document collection. Used for the TREC Terabyte Track.
The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.
-
Dataset irds.gov2.documents
datamaestro_text.datasets.irds.data.Documents
GOV2 web document collection. Used for the TREC Terabyte Track.
The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.
-
Dataset irds.gov2.trec-mq-2007.queries
datamaestro_text.datasets.irds.data.Topics
TREC 2007 Million Query track.
-
Dataset irds.gov2.trec-mq-2007.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
TREC 2007 Million Query track.
-
Dataset irds.gov2.trec-mq-2007
datamaestro_text.datasets.irds.data.Adhoc
TREC 2007 Million Query track.
-
Dataset irds.gov2.trec-mq-2008.queries
datamaestro_text.datasets.irds.data.Topics
TREC 2008 Million Query track.
-
Dataset irds.gov2.trec-mq-2008.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
TREC 2008 Million Query track.
-
Dataset irds.gov2.trec-mq-2008
datamaestro_text.datasets.irds.data.Adhoc
TREC 2008 Million Query track.
-
Dataset irds.gov2.trec-tb-2004.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2004.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
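TREC distributes relevance judgments ("qrels") such as these as whitespace-separated lines of the form `topic iteration docno relevance`. The sketch below parses that raw format into a topic → {document: relevance} mapping, which makes it easy to check, for example, how many topics are judged. It works on raw text and does not use the AdhocAssessments wrapper; the sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC-style qrels lines: 'topic iteration docno relevance'."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        topic, _iteration, docno, rel = fields
        qrels[topic][docno] = int(rel)
    return dict(qrels)


# Made-up sample in the GOV2 document-ID style.
sample = """\
701 0 GX000-00-0000000 0
701 0 GX000-00-0000001 2
702 0 GX000-00-0000002 1
"""
qrels = parse_qrels(sample.splitlines())
print(len(qrels))  # 2 judged topics in this toy sample
```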
-
Dataset irds.gov2.trec-tb-2004
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2005.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2005.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2005
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2005.efficiency.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.
-
Dataset irds.gov2.trec-tb-2005.efficiency.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.
-
Dataset irds.gov2.trec-tb-2005.efficiency
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.
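Because only the 50 topics in the efficiency query set have judgments, one natural way to use it is to time retrieval over the whole stream but compute effectiveness only on the judged subset. A minimal sketch, assuming queries are `(query_id, text)` pairs, qrels are a topic → {document: relevance} mapping, and both use the same query-ID namespace; the data is made up.

```python
from typing import Dict, Iterable, List, Tuple


def judged_only(
    queries: Iterable[Tuple[str, str]],
    qrels: Dict[str, Dict[str, int]],
) -> List[Tuple[str, str]]:
    """Keep only the (query_id, text) pairs whose ID has relevance judgments."""
    return [(qid, text) for qid, text in queries if qid in qrels]


# Toy data (made up): three stream queries, one of which is judged.
queries = [("q1", "first query"), ("q2", "second query"), ("q3", "judged query")]
qrels = {"q3": {"doc-a": 1}}
print(judged_only(queries, qrels))  # [('q3', 'judged query')]
```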
-
Dataset irds.gov2.trec-tb-2005.named-page.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
-
Dataset irds.gov2.trec-tb-2005.named-page.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
-
Dataset irds.gov2.trec-tb-2005.named-page
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
-
Dataset irds.gov2.trec-tb-2006.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2006.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2006
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.
-
Dataset irds.gov2.trec-tb-2006.efficiency.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.
-
Dataset irds.gov2.trec-tb-2006.efficiency.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.
-
Dataset irds.gov2.trec-tb-2006.efficiency
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.
-
Dataset irds.gov2.trec-tb-2006.efficiency.10k.queries
datamaestro_text.datasets.irds.data.Topics
Small stream from gov2/trec-tb-2006/efficiency, with 10,000 queries.
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream1.queries
datamaestro_text.datasets.irds.data.Topics
Stream 1 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream2.queries
datamaestro_text.datasets.irds.data.Topics
Stream 2 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream3.queries
datamaestro_text.datasets.irds.data.Topics
Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream3
datamaestro_text.datasets.irds.data.Adhoc
Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.efficiency.stream4.queries
datamaestro_text.datasets.irds.data.Topics
Stream 4 of gov2/trec-tb-2006/efficiency (25,000 queries).
-
Dataset irds.gov2.trec-tb-2006.named-page.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
-
Dataset irds.gov2.trec-tb-2006.named-page.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
-
Dataset irds.gov2.trec-tb-2006.named-page
datamaestro_text.datasets.irds.data.Adhoc
The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
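Since the named-page judgments also accept near-duplicates and other pages that satisfy the bookmark label, a system should get credit for the first of any acceptable page it retrieves. Reciprocal rank is a common measure for this kind of known-item task; the sketch below assumes that measure (this listing does not name an official metric) and a ranking given as a list of document IDs. The IDs are made up.

```python
from typing import Iterable, Set


def reciprocal_rank(ranking: Iterable[str], acceptable: Set[str]) -> float:
    """1 / rank of the first acceptable document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in acceptable:
            return 1.0 / rank
    return 0.0


# The target page is at rank 3, but a near-duplicate that was also
# judged acceptable appears at rank 2, so the score is 1/2.
ranking = ["d7", "d_dup", "d_target", "d9"]
acceptable = {"d_target", "d_dup"}
print(reciprocal_rank(ranking, acceptable))  # 0.5
```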
Istella22
The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (not included in ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
-
Dataset irds.istella22.documents
datamaestro_text.datasets.irds.data.Documents
The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (not included in ir_datasets).
Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.
-
Dataset irds.istella22.test.queries
datamaestro_text.datasets.irds.data.Topics
Official test query set.
-
Dataset irds.istella22.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test query set.
-
Dataset irds.istella22.test
datamaestro_text.datasets.irds.data.Adhoc
Official test query set.
-
Dataset irds.istella22.test.fold1.queries
datamaestro_text.datasets.irds.data.Topics
Fold 1 of the official test query set.
-
Dataset irds.istella22.test.fold1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 1 of the official test query set.
-
Dataset irds.istella22.test.fold1
datamaestro_text.datasets.irds.data.Adhoc
Fold 1 of the official test query set.
-
Dataset irds.istella22.test.fold2.queries
datamaestro_text.datasets.irds.data.Topics
Fold 2 of the official test query set.
-
Dataset irds.istella22.test.fold2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 2 of the official test query set.
-
Dataset irds.istella22.test.fold2
datamaestro_text.datasets.irds.data.Adhoc
Fold 2 of the official test query set.
-
Dataset irds.istella22.test.fold3.queries
datamaestro_text.datasets.irds.data.Topics
Fold 3 of the official test query set.
-
Dataset irds.istella22.test.fold3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 3 of the official test query set.
-
Dataset irds.istella22.test.fold3
datamaestro_text.datasets.irds.data.Adhoc
Fold 3 of the official test query set.
-
Dataset irds.istella22.test.fold4.queries
datamaestro_text.datasets.irds.data.Topics
Fold 4 of the official test query set.
-
Dataset irds.istella22.test.fold4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 4 of the official test query set.
-
Dataset irds.istella22.test.fold4
datamaestro_text.datasets.irds.data.Adhoc
Fold 4 of the official test query set.
-
Dataset irds.istella22.test.fold5.queries
datamaestro_text.datasets.irds.data.Topics
Fold 5 of the official test query set.
-
Dataset irds.istella22.test.fold5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 5 of the official test query set.
-
Dataset irds.istella22.test.fold5
datamaestro_text.datasets.irds.data.Adhoc
Fold 5 of the official test query set.
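The Istella22 test set is listed both as a whole and as five folds (fold1 to fold5). The listing does not say how the folds are meant to be combined; one plausible pattern is to evaluate each fold separately and report the mean. In this sketch the `evaluate` callback is hypothetical: it stands for whatever loads a dataset ID (for example with `prepare_dataset`) and scores a run on it.

```python
from statistics import mean
from typing import Callable, Dict, List

FOLDS = range(1, 6)  # fold1 ... fold5, as listed above


def fold_ids(base: str = "irds.istella22.test") -> List[str]:
    """Adhoc dataset IDs of the five test folds."""
    return [f"{base}.fold{i}" for i in FOLDS]


def average_over_folds(evaluate: Callable[[str], float]) -> float:
    """Score each fold with a caller-supplied function and average the results."""
    scores: Dict[str, float] = {fid: evaluate(fid) for fid in fold_ids()}
    return mean(scores.values())
```

Reporting the per-fold scores alongside the mean makes variance across folds visible.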
KILT
KILT is a corpus used for various "knowledge intensive language tasks".
- Documents: Wikipedia articles
- Repository
- Paper
- Leaderboard
-
Dataset irds.kilt.documents
datamaestro_text.datasets.irds.data.Documents
KILT is a corpus used for various "knowledge intensive language tasks".
- Documents: Wikipedia articles
- Repository
- Paper
- Leaderboard
-
Dataset irds.kilt.codec.queries
datamaestro_text.datasets.irds.data.Topics
CODEC Entity Ranking sub-task.
- Task Repository
- See also: codec, the document ranking subtask
-
Dataset irds.kilt.codec.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
CODEC Entity Ranking sub-task.
- Task Repository
- See also: codec, the document ranking subtask
-
Dataset irds.kilt.codec
datamaestro_text.datasets.irds.data.Adhoc
CODEC Entity Ranking sub-task.
- Task Repository
- See also: codec, the document ranking subtask
-
Dataset irds.kilt.codec.economics.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about economics.
-
Dataset irds.kilt.codec.economics.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about economics.
-
Dataset irds.kilt.codec.economics
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about economics.
-
Dataset irds.kilt.codec.history.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about history.
-
Dataset irds.kilt.codec.history.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about history.
-
Dataset irds.kilt.codec.history
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about history.
-
Dataset irds.kilt.codec.politics.queries
datamaestro_text.datasets.irds.data.Topics
Subset of codec that only contains topics about politics.
-
Dataset irds.kilt.codec.politics.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of codec that only contains topics about politics.
-
Dataset irds.kilt.codec.politics
datamaestro_text.datasets.irds.data.Adhoc
Subset of codec that only contains topics about politics.
lotte/lifestyle/dev
Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
-
Dataset irds.lotte.lifestyle.dev.documents
datamaestro_text.datasets.irds.data.Documents
Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
-
Dataset irds.lotte.lifestyle.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/lifestyle/dev.
-
Dataset irds.lotte.lifestyle.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/lifestyle/dev.
-
Dataset irds.lotte.lifestyle.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/lifestyle/dev.
-
Dataset irds.lotte.lifestyle.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/lifestyle/dev.
-
Dataset irds.lotte.lifestyle.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/lifestyle/dev.
-
Dataset irds.lotte.lifestyle.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/lifestyle/dev.
lotte/lifestyle/test
Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
-
Dataset irds.lotte.lifestyle.test.documents
datamaestro_text.datasets.irds.data.Documents
Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.
-
Dataset irds.lotte.lifestyle.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/lifestyle/test.
-
Dataset irds.lotte.lifestyle.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/lifestyle/test.
-
Dataset irds.lotte.lifestyle.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/lifestyle/test.
-
Dataset irds.lotte.lifestyle.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/lifestyle/test.
-
Dataset irds.lotte.lifestyle.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/lifestyle/test.
-
Dataset irds.lotte.lifestyle.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/lifestyle/test.
lotte/pooled/dev
Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.
-
Dataset irds.lotte.pooled.dev.documents
datamaestro_text.datasets.irds.data.Documents
Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.
-
Dataset irds.lotte.pooled.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/pooled/dev.
-
Dataset irds.lotte.pooled.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/pooled/dev.
-
Dataset irds.lotte.pooled.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/pooled/dev.
-
Dataset irds.lotte.pooled.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/pooled/dev.
-
Dataset irds.lotte.pooled.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/pooled/dev.
-
Dataset irds.lotte.pooled.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/pooled/dev.
lotte/pooled/test
Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.
-
Dataset irds.lotte.pooled.test.documents
datamaestro_text.datasets.irds.data.Documents
Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.
-
Dataset irds.lotte.pooled.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/pooled/test.
-
Dataset irds.lotte.pooled.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/pooled/test.
-
Dataset irds.lotte.pooled.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/pooled/test.
-
Dataset irds.lotte.pooled.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/pooled/test.
-
Dataset irds.lotte.pooled.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/pooled/test.
-
Dataset irds.lotte.pooled.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/pooled/test.
lotte/recreation/dev
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
-
Dataset irds.lotte.recreation.dev.documents
datamaestro_text.datasets.irds.data.Documents
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
-
Dataset irds.lotte.recreation.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/recreation/dev.
-
Dataset irds.lotte.recreation.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/recreation/dev.
-
Dataset irds.lotte.recreation.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/recreation/dev.
-
Dataset irds.lotte.recreation.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/recreation/dev.
-
Dataset irds.lotte.recreation.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/recreation/dev.
-
Dataset irds.lotte.recreation.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/recreation/dev.
lotte/recreation/test
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
-
Dataset irds.lotte.recreation.test.documents
datamaestro_text.datasets.irds.data.Documents
Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.
-
Dataset irds.lotte.recreation.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/recreation/test.
-
Dataset irds.lotte.recreation.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/recreation/test.
-
Dataset irds.lotte.recreation.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/recreation/test.
-
Dataset irds.lotte.recreation.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/recreation/test.
-
Dataset irds.lotte.recreation.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/recreation/test.
-
Dataset irds.lotte.recreation.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/recreation/test.
lotte/science/dev
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
-
Dataset irds.lotte.science.dev.documents
datamaestro_text.datasets.irds.data.Documents
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
-
Dataset irds.lotte.science.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/science/dev.
-
Dataset irds.lotte.science.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/science/dev.
-
Dataset irds.lotte.science.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/science/dev.
-
Dataset irds.lotte.science.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/science/dev.
-
Dataset irds.lotte.science.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/science/dev.
-
Dataset irds.lotte.science.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/science/dev.
lotte/science/test
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
-
Dataset irds.lotte.science.test.documents
datamaestro_text.datasets.irds.data.Documents
Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.
-
Dataset irds.lotte.science.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/science/test.
-
Dataset irds.lotte.science.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/science/test.
-
Dataset irds.lotte.science.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/science/test.
-
Dataset irds.lotte.science.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/science/test.
-
Dataset irds.lotte.science.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/science/test.
-
Dataset irds.lotte.science.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/science/test.
lotte/technology/dev
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
-
Dataset irds.lotte.technology.dev.documents
datamaestro_text.datasets.irds.data.Documents
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
-
Dataset irds.lotte.technology.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/technology/dev.
-
Dataset irds.lotte.technology.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/technology/dev.
-
Dataset irds.lotte.technology.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/technology/dev.
-
Dataset irds.lotte.technology.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/technology/dev.
-
Dataset irds.lotte.technology.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/technology/dev.
-
Dataset irds.lotte.technology.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/technology/dev.
lotte/technology/test
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
-
Dataset irds.lotte.technology.test.documents
datamaestro_text.datasets.irds.data.Documents
Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.
-
Dataset irds.lotte.technology.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/technology/test.
-
Dataset irds.lotte.technology.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/technology/test.
-
Dataset irds.lotte.technology.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/technology/test.
-
Dataset irds.lotte.technology.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/technology/test.
-
Dataset irds.lotte.technology.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/technology/test.
-
Dataset irds.lotte.technology.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/technology/test.
lotte/writing/dev
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
-
Dataset irds.lotte.writing.dev.documents
datamaestro_text.datasets.irds.data.Documents
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
-
Dataset irds.lotte.writing.dev.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/writing/dev.
-
Dataset irds.lotte.writing.dev.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/writing/dev.
-
Dataset irds.lotte.writing.dev.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/writing/dev.
-
Dataset irds.lotte.writing.dev.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/writing/dev.
-
Dataset irds.lotte.writing.dev.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/writing/dev.
-
Dataset irds.lotte.writing.dev.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/writing/dev.
lotte/writing/test
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
-
Dataset irds.lotte.writing.test.documents
datamaestro_text.datasets.irds.data.Documents
Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.
-
Dataset irds.lotte.writing.test.forum.queries
datamaestro_text.datasets.irds.data.Topics
Forum queries for lotte/writing/test.
-
Dataset irds.lotte.writing.test.forum.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Forum queries for lotte/writing/test.
-
Dataset irds.lotte.writing.test.forum
datamaestro_text.datasets.irds.data.Adhoc
Forum queries for lotte/writing/test.
-
Dataset irds.lotte.writing.test.search.queries
datamaestro_text.datasets.irds.data.Topics
Search queries for lotte/writing/test.
-
Dataset irds.lotte.writing.test.search.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Search queries for lotte/writing/test.
-
Dataset irds.lotte.writing.test.search
datamaestro_text.datasets.irds.data.Adhoc
Search queries for lotte/writing/test.
miracl/ar
The Arabic corpus.
-
Dataset irds.miracl.ar.documents
datamaestro_text.datasets.irds.data.Documents
The Arabic corpus.
-
Dataset irds.miracl.ar.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Arabic.
-
Dataset irds.miracl.ar.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Arabic.
-
Dataset irds.miracl.ar.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Arabic.
-
Dataset irds.miracl.ar.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Arabic.
-
Dataset irds.miracl.ar.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Arabic.
-
Dataset irds.miracl.ar.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Arabic.
-
Dataset irds.miracl.ar.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Arabic.
-
Dataset irds.miracl.ar.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Arabic.
miracl/bn
The Bengali corpus.
-
Dataset irds.miracl.bn.documents
datamaestro_text.datasets.irds.data.Documents
The Bengali corpus.
-
Dataset irds.miracl.bn.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Bengali.
-
Dataset irds.miracl.bn.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Bengali.
-
Dataset irds.miracl.bn.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Bengali.
-
Dataset irds.miracl.bn.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Bengali.
-
Dataset irds.miracl.bn.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Bengali.
-
Dataset irds.miracl.bn.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Bengali.
-
Dataset irds.miracl.bn.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Bengali.
-
Dataset irds.miracl.bn.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Bengali.
miracl/de
The German corpus.
-
Dataset irds.miracl.de.documents
datamaestro_text.datasets.irds.data.Documents
The German corpus.
-
Dataset irds.miracl.de.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for German.
-
Dataset irds.miracl.de.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for German.
-
Dataset irds.miracl.de.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for German.
-
Dataset irds.miracl.de.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for German.
miracl/en
The English corpus.
-
Dataset irds.miracl.en.documents
datamaestro_text.datasets.irds.data.Documents
The English corpus.
-
Dataset irds.miracl.en.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for English.
-
Dataset irds.miracl.en.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for English.
-
Dataset irds.miracl.en.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for English.
-
Dataset irds.miracl.en.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for English.
-
Dataset irds.miracl.en.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for English.
-
Dataset irds.miracl.en.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for English.
-
Dataset irds.miracl.en.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for English.
-
Dataset irds.miracl.en.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for English.
miracl/es
The Spanish corpus.
-
Dataset irds.miracl.es.documents
datamaestro_text.datasets.irds.data.Documents
The Spanish corpus.
-
Dataset irds.miracl.es.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Spanish.
-
Dataset irds.miracl.es.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Spanish.
-
Dataset irds.miracl.es.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Spanish.
-
Dataset irds.miracl.es.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Spanish.
-
Dataset irds.miracl.es.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Spanish.
-
Dataset irds.miracl.es.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Spanish.
-
Dataset irds.miracl.es.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Spanish.
miracl/fa
The Persian corpus.
-
Dataset irds.miracl.fa.documents
datamaestro_text.datasets.irds.data.Documents
The Persian corpus.
-
Dataset irds.miracl.fa.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Persian.
-
Dataset irds.miracl.fa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Persian.
-
Dataset irds.miracl.fa.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Persian.
-
Dataset irds.miracl.fa.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Persian.
-
Dataset irds.miracl.fa.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Persian.
-
Dataset irds.miracl.fa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Persian.
-
Dataset irds.miracl.fa.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Persian.
miracl/fi
The Finnish corpus.
-
Dataset irds.miracl.fi.documents
datamaestro_text.datasets.irds.data.Documents
The Finnish corpus.
-
Dataset irds.miracl.fi.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Finnish.
-
Dataset irds.miracl.fi.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Finnish.
-
Dataset irds.miracl.fi.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Finnish.
-
Dataset irds.miracl.fi.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Finnish.
-
Dataset irds.miracl.fi.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Finnish.
-
Dataset irds.miracl.fi.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Finnish.
-
Dataset irds.miracl.fi.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Finnish.
-
Dataset irds.miracl.fi.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Finnish.
miracl/fr
The French corpus.
-
Dataset irds.miracl.fr.documents
datamaestro_text.datasets.irds.data.Documents
The French corpus.
-
Dataset irds.miracl.fr.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for French.
-
Dataset irds.miracl.fr.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for French.
-
Dataset irds.miracl.fr.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for French.
-
Dataset irds.miracl.fr.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for French.
-
Dataset irds.miracl.fr.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for French.
-
Dataset irds.miracl.fr.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for French.
-
Dataset irds.miracl.fr.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for French.
miracl/hi
The Hindi corpus.
-
Dataset irds.miracl.hi.documents
datamaestro_text.datasets.irds.data.Documents
The Hindi corpus.
-
Dataset irds.miracl.hi.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Hindi.
-
Dataset irds.miracl.hi.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Hindi.
-
Dataset irds.miracl.hi.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Hindi.
-
Dataset irds.miracl.hi.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Hindi.
-
Dataset irds.miracl.hi.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Hindi.
-
Dataset irds.miracl.hi.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Hindi.
-
Dataset irds.miracl.hi.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Hindi.
miracl/id
The Indonesian corpus.
-
Dataset irds.miracl.id.documents
datamaestro_text.datasets.irds.data.Documents
The Indonesian corpus.
-
Dataset irds.miracl.id.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Indonesian.
-
Dataset irds.miracl.id.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Indonesian.
-
Dataset irds.miracl.id.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Indonesian.
-
Dataset irds.miracl.id.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Indonesian.
-
Dataset irds.miracl.id.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Indonesian.
-
Dataset irds.miracl.id.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Indonesian.
-
Dataset irds.miracl.id.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Indonesian.
-
Dataset irds.miracl.id.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Indonesian.
miracl/ja
The Japanese corpus.
-
Dataset irds.miracl.ja.documents
datamaestro_text.datasets.irds.data.Documents
The Japanese corpus.
-
Dataset irds.miracl.ja.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Japanese.
-
Dataset irds.miracl.ja.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Japanese.
-
Dataset irds.miracl.ja.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Japanese.
-
Dataset irds.miracl.ja.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Japanese.
-
Dataset irds.miracl.ja.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Japanese.
-
Dataset irds.miracl.ja.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Japanese.
-
Dataset irds.miracl.ja.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Japanese.
-
Dataset irds.miracl.ja.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Japanese.
miracl/ko
The Korean corpus.
-
Dataset irds.miracl.ko.documents
datamaestro_text.datasets.irds.data.Documents
The Korean corpus.
-
Dataset irds.miracl.ko.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Korean.
-
Dataset irds.miracl.ko.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Korean.
-
Dataset irds.miracl.ko.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Korean.
-
Dataset irds.miracl.ko.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Korean.
-
Dataset irds.miracl.ko.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Korean.
-
Dataset irds.miracl.ko.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Korean.
-
Dataset irds.miracl.ko.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Korean.
-
Dataset irds.miracl.ko.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Korean.
miracl/ru
The Russian corpus.
-
Dataset irds.miracl.ru.documents
datamaestro_text.datasets.irds.data.Documents
The Russian corpus.
-
Dataset irds.miracl.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Russian.
-
Dataset irds.miracl.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Russian.
-
Dataset irds.miracl.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Russian.
-
Dataset irds.miracl.ru.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Russian.
-
Dataset irds.miracl.ru.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Russian.
-
Dataset irds.miracl.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Russian.
-
Dataset irds.miracl.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Russian.
-
Dataset irds.miracl.ru.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Russian.
miracl/sw
The Swahili corpus.
-
Dataset irds.miracl.sw.documents
datamaestro_text.datasets.irds.data.Documents
The Swahili corpus.
-
Dataset irds.miracl.sw.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Swahili.
-
Dataset irds.miracl.sw.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Swahili.
-
Dataset irds.miracl.sw.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Swahili.
-
Dataset irds.miracl.sw.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Swahili.
-
Dataset irds.miracl.sw.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Swahili.
-
Dataset irds.miracl.sw.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Swahili.
-
Dataset irds.miracl.sw.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Swahili.
-
Dataset irds.miracl.sw.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Swahili.
miracl/te
The Telugu corpus.
-
Dataset irds.miracl.te.documents
datamaestro_text.datasets.irds.data.Documents
The Telugu corpus.
-
Dataset irds.miracl.te.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Telugu.
-
Dataset irds.miracl.te.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Telugu.
-
Dataset irds.miracl.te.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Telugu.
-
Dataset irds.miracl.te.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Telugu.
-
Dataset irds.miracl.te.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Telugu.
-
Dataset irds.miracl.te.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Telugu.
-
Dataset irds.miracl.te.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Telugu.
-
Dataset irds.miracl.te.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Telugu.
miracl/th
The Thai corpus.
-
Dataset irds.miracl.th.documents
datamaestro_text.datasets.irds.data.Documents
The Thai corpus.
-
Dataset irds.miracl.th.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Thai.
-
Dataset irds.miracl.th.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Thai.
-
Dataset irds.miracl.th.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Thai.
-
Dataset irds.miracl.th.test-a.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version a) for Thai.
-
Dataset irds.miracl.th.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Thai.
-
Dataset irds.miracl.th.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Thai.
-
Dataset irds.miracl.th.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Thai.
-
Dataset irds.miracl.th.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Thai.
miracl/yo
The Yoruba corpus.
-
Dataset irds.miracl.yo.documents
datamaestro_text.datasets.irds.data.Documents
The Yoruba corpus.
-
Dataset irds.miracl.yo.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Yoruba.
-
Dataset irds.miracl.yo.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Yoruba.
-
Dataset irds.miracl.yo.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Yoruba.
-
Dataset irds.miracl.yo.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Yoruba.
miracl/zh
The Chinese corpus.
-
Dataset irds.miracl.zh.documents
datamaestro_text.datasets.irds.data.Documents
The Chinese corpus.
-
Dataset irds.miracl.zh.dev.queries
datamaestro_text.datasets.irds.data.Topics
The dev set for Chinese.
-
Dataset irds.miracl.zh.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The dev set for Chinese.
-
Dataset irds.miracl.zh.dev
datamaestro_text.datasets.irds.data.Adhoc
The dev set for Chinese.
-
Dataset irds.miracl.zh.test-b.queries
datamaestro_text.datasets.irds.data.Topics
The held-out test set (version b) for Chinese.
-
Dataset irds.miracl.zh.train.queries
datamaestro_text.datasets.irds.data.Topics
The train set for Chinese.
-
Dataset irds.miracl.zh.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The train set for Chinese.
-
Dataset irds.miracl.zh.train
datamaestro_text.datasets.irds.data.Adhoc
The train set for Chinese.
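The MIRACL splits above differ by language. In this listing, test-a and test-b provide queries only. test-a is absent for de, es, fa, fr, hi, yo and zh, and de and yo have no train split. The sketch below encodes that table as transcribed from this page, which may lag behind your installed ir-datasets version, and builds the matching datamaestro IDs. `MIRACL_SPLITS` and `miracl_dataset_ids` are hypothetical names and are not part of datamaestro-text.

```python
# Splits per MIRACL language, transcribed from this page (hypothetical table).
ALL_SPLITS = frozenset({"dev", "test-a", "test-b", "train"})
NO_TEST_A = frozenset({"dev", "test-b", "train"})
NO_TRAIN = frozenset({"dev", "test-b"})

MIRACL_SPLITS = {
    **{lang: ALL_SPLITS for lang in ("ar", "bn", "en", "fi", "id", "ja", "ko", "ru", "sw", "te", "th")},
    **{lang: NO_TEST_A for lang in ("es", "fa", "fr", "hi", "zh")},
    **{lang: NO_TRAIN for lang in ("de", "yo")},
}

# Only these splits come with qrels (and a combined Adhoc dataset) in this listing.
JUDGED_SPLITS = frozenset({"dev", "train"})


def miracl_dataset_ids(lang: str, split: str) -> dict:
    """Return the datamaestro IDs listed for one MIRACL language and split."""
    if split not in MIRACL_SPLITS.get(lang, frozenset()):
        raise KeyError(f"miracl/{lang}/{split} is not listed")
    base = f"irds.miracl.{lang}.{split}"
    ids = {
        "documents": f"irds.miracl.{lang}.documents",
        "queries": f"{base}.queries",
    }
    if split in JUDGED_SPLITS:
        ids["qrels"] = f"{base}.qrels"
        ids["adhoc"] = base
    return ids
```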
MSMARCO (passage)
A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.
Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are corrected automatically (the examples above become "公" and "🇺🇸").
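MRR@10 scores each query by the reciprocal rank of the first relevant passage within the top 10 results, or 0 if none appears, and averages over queries. The function below is a minimal sketch of that definition, not the official MS MARCO evaluation script. It assumes binary relevance and averages over the queries present in the qrels, which is one of several conventions used by evaluation tools.

```python
from typing import Dict, List, Set


def mrr_at_k(qrels: Dict[str, Set[str]], run: Dict[str, List[str]], k: int = 10) -> float:
    """Mean reciprocal rank at cutoff k.

    qrels: query ID -> set of relevant document IDs
    run:   query ID -> document IDs ranked from best to worst
    Queries in qrels that are missing from the run contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, doc_id in enumerate(run.get(qid, [])[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(qrels)
```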
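The double-encoding error arises when UTF-8 bytes are decoded as if they were Latin-1, turning each multi-byte character into several single-byte ones. Some of those (such as U+0085) are invisible control characters, which is likely why the examples above look shorter than expected. Below is a minimal sketch of the reverse transformation. It assumes a Latin-1 misdecoding, although real data may also involve cp1252. It illustrates the idea and is not the correction code used by ir-datasets.

```python
def make_double_encoded(text: str) -> str:
    """Reproduce the error: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def fix_double_encoding(text: str) -> str:
    """Undo it: recover the original bytes, then decode them as UTF-8.

    Text that is not double-encoded UTF-8 is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = make_double_encoded("公")  # three characters: "å", U+0085, "¬"
print(fix_double_encoding(broken))
```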
- See also: msmarco-document
- Documents: Short passages (from web)
- Queries: Natural language questions (from query log)
- Leaderboard
- Dataset Paper
-
Dataset irds.msmarco-passage.documents
datamaestro_text.datasets.irds.data.Documents
A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.
Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are corrected automatically (the examples above become "公" and "🇺🇸").
- See also: msmarco-document
- Documents: Short passages (from web)
- Queries: Natural language questions (from query log)
- Leaderboard
- Dataset Paper
-
Dataset irds.msmarco-passage.dev.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
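In the re-ranking setting, another model re-scores each query's BM25 candidates. The BM25 scores in these scoreddocs are all 0, so the records are best treated as a candidate set whose scores must come from your own model. The sketch below assumes records shaped like `(query_id, doc_id, score)` and in-memory `queries`/`documents` dictionaries. `term_overlap` is a hypothetical toy scorer standing in for a real re-ranker.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed record shape for scoreddocs: (query_id, doc_id, score)
ScoredDoc = Tuple[str, str, float]


def rerank(
    scoreddocs: Iterable[ScoredDoc],
    queries: Dict[str, str],
    documents: Dict[str, str],
    score_fn: Callable[[str, str], float],
    k: int = 10,
) -> Dict[str, List[Tuple[str, float]]]:
    """Re-score each query's candidates with score_fn and keep the top k.

    The scores stored in the records are ignored (all 0 here); only the
    candidate set is used.
    """
    candidates: Dict[str, List[str]] = defaultdict(list)
    for qid, doc_id, _bm25_score in scoreddocs:
        candidates[qid].append(doc_id)

    results = {}
    for qid, doc_ids in candidates.items():
        scored = [(d, score_fn(queries[qid], documents[d])) for d in doc_ids]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[qid] = scored[:k]
    return results


def term_overlap(query: str, doc: str) -> float:
    """Toy scorer: number of distinct shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))
```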
-
Dataset irds.msmarco-passage.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
-
Dataset irds.msmarco-passage.dev
datamaestro_text.datasets.irds.data.Adhoc
Official dev set.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
-
Dataset irds.msmarco-passage.dev.2.queries
datamaestro_text.datasets.irds.data.Topics
"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.
-
Dataset irds.msmarco-passage.dev.2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.
-
Dataset irds.msmarco-passage.dev.2
datamaestro_text.datasets.irds.data.Adhoc
"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.
-
Dataset irds.msmarco-passage.dev.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.dev.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.dev.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
-
Dataset irds.msmarco-passage.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
-
Dataset irds.msmarco-passage.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
-
Dataset irds.msmarco-passage.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).
-
Dataset irds.msmarco-passage.eval.queries
datamaestro_text.datasets.irds.data.Topics
Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available eval queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
-
Dataset irds.msmarco-passage.eval.small.queries
datamaestro_text.datasets.irds.data.Topics
Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).
-
Dataset irds.msmarco-passage.eval.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).
-
Dataset irds.msmarco-passage.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
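Selecting the queries that have at least one qrel, as described for msmarco-passage/train/judged, amounts to filtering on the query IDs that occur in the relevance judgments. The minimal sketch below assumes qrels are `(query_id, doc_id, relevance)` tuples and queries are a dictionary from ID to text. Both are hypothetical in-memory shapes.

```python
from typing import Dict, Iterable, Tuple

# Assumed qrel record shape: (query_id, doc_id, relevance)
Qrel = Tuple[str, str, int]


def judged_queries(queries: Dict[str, str], qrels: Iterable[Qrel]) -> Dict[str, str]:
    """Keep only the queries whose ID appears in at least one qrel."""
    judged_ids = {qid for qid, _doc_id, _rel in qrels}
    return {qid: text for qid, text in queries.items() if qid in judged_ids}
```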
-
Dataset irds.msmarco-passage.train.docpairs
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
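For pairwise training, each docpair is resolved into texts while the order of the sequence is kept intact. The sketch below assumes pairs shaped like `(query_id, doc_id_a, doc_id_b)`, with the first document preferred over the second, and in-memory lookup dictionaries. `pairs_to_triples` is a hypothetical helper.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed pair shape: (query_id, doc_id_a, doc_id_b), doc a preferred over doc b
DocPair = Tuple[str, str, str]


def pairs_to_triples(
    docpairs: Iterable[DocPair],
    queries: Dict[str, str],
    documents: Dict[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (query, preferred passage, other passage) texts.

    Pairs are consumed lazily and never reordered, so the given
    sequence is preserved.
    """
    for qid, doc_a, doc_b in docpairs:
        yield queries[qid], documents[doc_a], documents[doc_b]
```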
-
Dataset irds.msmarco-passage.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
-
Dataset irds.msmarco-passage.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
-
Dataset irds.msmarco-passage.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set.
Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.
scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
docpairs provides access to the "official" sequence for pairwise training.
-
Dataset irds.msmarco-passage.train.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.train.judged.docpairs
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.train.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.train.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.train.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/train that only includes queries that have at least one qrel.
-
Dataset irds.msmarco-passage.train.medical.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
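Term-based selection explains the false matches. A query is kept whenever it contains a listed term, whatever sense the term is used in. The sketch below uses a tiny hypothetical lexicon; the actual term list behind this subset is not reproduced here. "how cold is antarctica" is selected alongside "how long does a cold last" because "cold" has both a medical and a weather sense.

```python
import re
from typing import Dict, Iterable

# Hypothetical, tiny lexicon for illustration only.
MEDICAL_TERMS = {"cold", "fever", "insulin", "migraine"}


def medical_queries(queries: Dict[str, str], terms: Iterable[str] = MEDICAL_TERMS) -> Dict[str, str]:
    """Keep queries containing at least one term (whole word, case-insensitive)."""
    term_set = {t.lower() for t in terms}
    selected = {}
    for qid, text in queries.items():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        if tokens & term_set:
            selected[qid] = text
    return selected
```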
-
Dataset irds.msmarco-passage.train.medical.docpairs
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
-
Dataset irds.msmarco-passage.train.medical.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
-
Dataset irds.msmarco-passage.train.medical.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
-
Dataset irds.msmarco-passage.train.medical
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.
-
Dataset irds.msmarco-passage.train.split200-train.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (a split used in various works).
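The split200 datasets partition the train queries into a 200-query validation set and the remaining training queries. The sketch below shows how such a hold-out split can be made reproducible with a seeded random generator. `holdout_split` is hypothetical and will not reproduce the specific query IDs of split200-valid, so use the datasets listed here when comparing with published results.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(
    query_ids: Sequence[str], n_valid: int = 200, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Split query IDs into (train, valid), holding out n_valid queries.

    The same seed and input give the same split; input order is kept
    within each part.
    """
    if n_valid > len(query_ids):
        raise ValueError("n_valid exceeds the number of queries")
    valid = set(random.Random(seed).sample(list(query_ids), n_valid))
    train = [q for q in query_ids if q not in valid]
    valid_list = [q for q in query_ids if q in valid]
    return train, valid_list
```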
-
Dataset irds.msmarco-passage.train.split200-train.docpairs
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-train
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/train that excludes the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-valid.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-valid.docpairs
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-valid.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-valid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.split200-valid
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/train containing only the 200 queries held out as a small validation set (a split used in various works).
-
Dataset irds.msmarco-passage.train.triples-small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).
Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
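The mapping saves space because each row stores three IDs instead of three full texts. Below is a minimal sketch of that mapping. It assumes reverse lookups from text to ID have been built once from the queries and passages. `triples_to_ids` is hypothetical, and passages with identical text would need extra handling.

```python
from typing import Dict, Iterable, Iterator, Tuple


def triples_to_ids(
    text_triples: Iterable[Tuple[str, str, str]],
    query_ids: Dict[str, str],
    doc_ids: Dict[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Map (query text, positive text, negative text) rows to ID triples.

    query_ids and doc_ids are reverse lookups (text -> ID), e.g. built with
    {text: doc_id for doc_id, text in documents.items()}.
    """
    for query, pos, neg in text_triples:
        yield query_ids[query], doc_ids[pos], doc_ids[neg]
```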
-
Dataset irds.msmarco-passage.train.triples-small.docpairs
-
Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).
Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
-
Dataset irds.msmarco-passage.train.triples-small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).
Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
-
Dataset irds.msmarco-passage.train.triples-small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).
Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
-
Dataset irds.msmarco-passage.train.triples-small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).
Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
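The ID mapping mentioned above amounts to replacing each text field of a triple with the ID of the query or passage it came from, so that only short identifiers need to be stored. A minimal in-memory sketch of that idea, assuming you already have text-to-ID dictionaries (for the real collection these lookups are large, which is consistent with the first run taking a few minutes); `map_triples` and the toy data are hypothetical and do not reproduce the actual ir-datasets code.

```python
from typing import Iterable, Iterator


def map_triples(
    triples: Iterable[tuple[str, str, str]],
    query_ids: dict[str, str],
    doc_ids: dict[str, str],
) -> Iterator[tuple[str, str, str]]:
    """Turn (query text, positive text, negative text) rows into
    (query_id, positive_doc_id, negative_doc_id) rows."""
    for query, pos, neg in triples:
        # A KeyError here means the text does not exactly match the collection
        yield query_ids[query], doc_ids[pos], doc_ids[neg]


queries = {"what is ir": "q1"}
docs = {"IR is information retrieval.": "d1", "Bananas are yellow.": "d2"}
rows = [("what is ir", "IR is information retrieval.", "Bananas are yellow.")]
print(list(map_triples(rows, queries, docs)))  # [('q1', 'd1', 'd2')]
```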
-
Dataset irds.msmarco-passage.train.triples-v2.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, but with version 2 of the triples file.
This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).
Note that this file is sorted by the IDs it contains, so you will probably want to shuffle it before use. We opened an issue suggesting that a third, shuffled version of the file be provided so that the order is consistent across groups using the data, but at this time no such file exists in an official capacity.
-
Dataset irds.msmarco-passage.train.triples-v2.docpairs
-
Version of msmarco-passage/train, but with version 2 of the triples file.
This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).
Note that this file is sorted by the IDs it contains, so you will probably want to shuffle it before use. We opened an issue suggesting that a third, shuffled version of the file be provided so that the order is consistent across groups using the data, but at this time no such file exists in an official capacity.
-
Dataset irds.msmarco-passage.train.triples-v2.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/train, but with version 2 of the triples file.
This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).
Note that this file is sorted by the IDs it contains, so you will probably want to shuffle it before use. We opened an issue suggesting that a third, shuffled version of the file be provided so that the order is consistent across groups using the data, but at this time no such file exists in an official capacity.
-
Dataset irds.msmarco-passage.train.triples-v2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, but with version 2 of the triples file.
This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).
Note that this file is sorted by the IDs it contains, so you will probably want to shuffle it before use. We opened an issue suggesting that a third, shuffled version of the file be provided so that the order is consistent across groups using the data, but at this time no such file exists in an official capacity.
-
Dataset irds.msmarco-passage.train.triples-v2
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, but with version 2 of the triples file.
This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).
Note that this file is sorted by the IDs it contains, so you will probably want to shuffle it before use. We opened an issue suggesting that a third, shuffled version of the file be provided so that the order is consistent across groups using the data, but at this time no such file exists in an official capacity.
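Because the v2 triples are sorted by ID and the file is large, a full in-memory shuffle may not be practical. One generic option is a shuffle buffer: hold `buffer_size` items and, for each new item, emit a random buffered one and put the new item in its slot. This is only an approximate shuffle (an item can move at most about `buffer_size` positions earlier), and it is a standard streaming technique, not a feature of ir-datasets or datamaestro. A fixed seed keeps the order reproducible, which is one way to address the consistency concern above. `buffered_shuffle` is a hypothetical helper.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using O(buffer_size) memory."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)  # seeded, so the output order is reproducible
    buffer: list[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered item and store the new one in its slot
        j = rng.randrange(buffer_size)
        yield buffer[j]
        buffer[j] = item
    # Drain the remaining items in random order
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=42))
print(sorted(shuffled) == list(range(10)))  # True: same items, new order
```

Larger buffers give a better approximation of a uniform shuffle at the cost of memory.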
-
Dataset irds.msmarco-passage.trec-dl-2019.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).
-
Dataset irds.msmarco-passage.trec-dl-2019.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).
-
Dataset irds.msmarco-passage.trec-dl-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).
-
Dataset irds.msmarco-passage.trec-dl-2019
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).
-
Dataset irds.msmarco-passage.trec-dl-2019.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2019.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2019.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2019.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
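Conceptually, a "judged" subset is a filter: keep only the queries that have at least one relevance assessment. A sketch on plain Python data, assuming queries as an ID-to-text dict and qrels as `(query_id, doc_id, relevance)` tuples; it mirrors the description above rather than the actual ir-datasets implementation, and `judged_queries` is hypothetical.

```python
def judged_queries(
    queries: dict[str, str],
    qrels: list[tuple[str, str, int]],
) -> dict[str, str]:
    """Keep only the queries that have at least one qrel entry."""
    # A qrel with relevance 0 still means the query was judged
    judged_ids = {qid for qid, _doc_id, _rel in qrels}
    return {qid: text for qid, text in queries.items() if qid in judged_ids}


queries = {"q1": "first query", "q2": "second query", "q3": "third query"}
qrels = [("q1", "d7", 2), ("q1", "d9", 0), ("q3", "d4", 1)]
print(judged_queries(queries, qrels))  # {'q1': 'first query', 'q3': 'third query'}
```

Evaluating only on judged queries avoids counting unjudged queries as zero-scoring, which would otherwise drag down averaged metrics.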
-
Dataset irds.msmarco-passage.trec-dl-2020.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).
-
Dataset irds.msmarco-passage.trec-dl-2020.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).
-
Dataset irds.msmarco-passage.trec-dl-2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).
-
Dataset irds.msmarco-passage.trec-dl-2020
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).
-
Dataset irds.msmarco-passage.trec-dl-2020.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2020.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2020.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-2020.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-passage.trec-dl-hard.queries
datamaestro_text.datasets.irds.data.Topics
A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.
- data website
- See Also: msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.
- data website
- See Also: msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard
datamaestro_text.datasets.irds.data.Adhoc
A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.
- data website
- See Also: msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold1.queries
datamaestro_text.datasets.irds.data.Topics
Fold 1 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 1 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold1
datamaestro_text.datasets.irds.data.Adhoc
Fold 1 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold2.queries
datamaestro_text.datasets.irds.data.Topics
Fold 2 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 2 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold2
datamaestro_text.datasets.irds.data.Adhoc
Fold 2 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold3.queries
datamaestro_text.datasets.irds.data.Topics
Fold 3 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 3 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold3
datamaestro_text.datasets.irds.data.Adhoc
Fold 3 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold4.queries
datamaestro_text.datasets.irds.data.Topics
Fold 4 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 4 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold4
datamaestro_text.datasets.irds.data.Adhoc
Fold 4 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold5.queries
datamaestro_text.datasets.irds.data.Topics
Fold 5 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 5 of msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-passage.trec-dl-hard.fold5
datamaestro_text.datasets.irds.data.Adhoc
Fold 5 of msmarco-passage/trec-dl-hard
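Folds like these are usually meant for k-fold cross-validation: each fold serves once as the test set while the remaining folds are used for training or tuning. The sketch below only assembles the fold dataset IDs listed above for each round; loading and combining the folds is left to you, and `cv_rounds` is a hypothetical helper.

```python
BASE = "irds.msmarco-passage.trec-dl-hard"


def cv_rounds(n_folds: int = 5) -> list[tuple[str, list[str]]]:
    """Return (test_fold_id, train_fold_ids) for each cross-validation round."""
    folds = [f"{BASE}.fold{i}" for i in range(1, n_folds + 1)]
    return [(test, [f for f in folds if f != test]) for test in folds]


for test, train in cv_rounds():
    print("test:", test, "| train/tune on", len(train), "folds")
```

Each ID could then be passed to `prepare_dataset`, as in the usage example at the top of this page.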
mmarco/de
Version of msmarco-passage, with documents translated into German.
-
Dataset irds.mmarco.de.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into German.
-
Dataset irds.mmarco.de.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.de.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.de.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.de.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.de.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.de.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into German.
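The dataset IDs in this list appear to follow a simple pattern: the ir_datasets ID (for example `mmarco/de/dev/small`) with slashes turned into dots and an `irds.` prefix. The sketch below reproduces only that observed pattern; it is an assumption, not a documented datamaestro function. The reverse direction is unreliable because some ID parts contain dots (such as the version suffix `v1.1`), so when you need the original ir_datasets ID it is safer to read the `irds` attribute of the data types documented earlier. Both helper names are hypothetical.

```python
def irds_to_datamaestro(irds_id: str) -> str:
    """Map an ir_datasets ID such as "mmarco/de/dev/small" to the
    naming pattern seen in this list ("irds.mmarco.de.dev.small")."""
    return "irds." + irds_id.replace("/", ".")


def naive_datamaestro_to_irds(dm_id: str) -> str:
    """Naive inverse: NOT reliable, because some ID parts contain dots."""
    return dm_id.removeprefix("irds.").replace(".", "/")


print(irds_to_datamaestro("mmarco/de/dev/small"))  # irds.mmarco.de.dev.small
# The version suffix "v1.1" gets split in two by the naive inverse:
print(naive_datamaestro_to_irds("irds.mmarco.pt.dev.small.v1.1"))  # mmarco/pt/dev/small/v1/1
```

Component suffixes such as `.queries`, `.qrels`, `.scoreddocs` and `.docpairs` seem to name parts of a single ir_datasets dataset and have no slash-separated counterpart of their own.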
mmarco/es
Version of msmarco-passage, with documents translated into Spanish.
-
Dataset irds.mmarco.es.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.es.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Spanish.
mmarco/fr
Version of msmarco-passage, with documents translated into French.
-
Dataset irds.mmarco.fr.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into French.
-
Dataset irds.mmarco.fr.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.fr.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into French.
mmarco/id
Version of msmarco-passage, with documents translated into Indonesian.
-
Dataset irds.mmarco.id.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.id.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
mmarco/it
Version of msmarco-passage, with documents translated into Italian.
-
Dataset irds.mmarco.it.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Italian.
-
Dataset irds.mmarco.it.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.it.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Italian.
mmarco/pt
Version of msmarco-passage, with documents translated into Portuguese.
-
Dataset irds.mmarco.pt.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.dev.small.v1.1.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.small.v1.1.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.small.v1.1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.small.v1.1
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.v1.1.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.v1.1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.dev.v1.1
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.pt.train.v1.1.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.train.v1.1.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.train.v1.1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
-
Dataset irds.mmarco.pt.train.v1.1
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.
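If you work with the earlier (non-v1.1) Portuguese query files, duplicated query IDs can silently overwrite each other when loaded into a dict. A small sketch for spotting them, assuming queries arrive as `(query_id, text)` pairs. It keeps the first occurrence of each ID, which is an arbitrary choice and may not match how version 1.1 resolved the duplicates; `dedup_queries` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def dedup_queries(
    pairs: Iterable[tuple[str, str]],
) -> tuple[dict[str, str], dict[str, int]]:
    """Return (queries, duplicates): the first text seen per query ID, and
    a map from each repeated ID to how many times it occurred."""
    queries: dict[str, str] = {}
    counts: Counter[str] = Counter()
    for qid, text in pairs:
        counts[qid] += 1
        queries.setdefault(qid, text)  # first occurrence wins
    duplicates = {qid: n for qid, n in counts.items() if n > 1}
    return queries, duplicates


queries, dups = dedup_queries(
    [("1", "o que é recuperação"), ("2", "segunda consulta"), ("1", "o que é IR?")]
)
print(dups)  # {'1': 2}
```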
mmarco/ru
Version of msmarco-passage, with documents translated into Russian.
-
Dataset irds.mmarco.ru.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.ru.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Russian.
mmarco/v2/ar
Version of msmarco-passage, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Arabic.
-
Dataset irds.mmarco.v2.ar.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Arabic.
mmarco/v2/de
Version of msmarco-passage, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into German.
-
Dataset irds.mmarco.v2.de.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into German.
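The `dev.small.scoreddocs` entries are `AdhocRun` objects, that is, a first-stage ranking that a re-ranker can start from. Assuming each record carries `query_id`, `doc_id` and `score` (the field names ir-datasets uses for scored documents), a minimal sketch that groups such a run by query and keeps the `k` best-scored documents could look like this. `ScoredDoc` and `top_k_per_query` are hypothetical names, not part of datamaestro-text.

```python
import heapq
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, List

# Hypothetical stand-in with the field names used by ir-datasets scored documents.
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])


def top_k_per_query(run: Iterable[ScoredDoc], k: int) -> Dict[str, List[str]]:
    """Return, for each query, the ids of its k highest-scored documents, best first."""
    by_query: Dict[str, List[ScoredDoc]] = defaultdict(list)
    for record in run:
        by_query[record.query_id].append(record)
    # heapq.nlargest returns its results already sorted in descending order.
    return {
        qid: [r.doc_id for r in heapq.nlargest(k, scored, key=lambda x: x.score)]
        for qid, scored in by_query.items()
    }


run = [
    ScoredDoc("q1", "d1", 3.2),
    ScoredDoc("q1", "d2", 7.5),
    ScoredDoc("q1", "d3", 5.1),
    ScoredDoc("q2", "d4", 1.0),
]
print(top_k_per_query(run, k=2))  # {'q1': ['d2', 'd3'], 'q2': ['d4']}
```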
mmarco/v2/dt
Version of msmarco-passage, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Dutch.
-
Dataset irds.mmarco.v2.dt.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Dutch.
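The `dev.small.qrels` assessments are typically used to score a ranking of the `dev.small` queries; MS MARCO passage ranking reports MRR@10. A self-contained sketch of that metric follows. It assumes qrels given as a `{query_id: {doc_id: relevance}}` dict and rankings as ordered lists of document ids; both layouts and the `mrr_at_k` name are hypothetical, not the datamaestro `AdhocAssessments` API.

```python
from typing import Dict, List


def mrr_at_k(
    rankings: Dict[str, List[str]],
    qrels: Dict[str, Dict[str, int]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for qid, relevant in qrels.items():
        ranking = rankings.get(qid, [])
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if relevant.get(doc_id, 0) > 0:
                total += 1.0 / rank
                break
    # Judged queries with no relevant document in the top k contribute 0.
    return total / len(qrels) if qrels else 0.0


qrels = {"q1": {"d2": 1}, "q2": {"d9": 1}}
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
print(mrr_at_k(rankings, qrels))  # 0.25
```

Averaging over the judged queries (rather than the ranked ones) means a query missing from the ranking is counted as a miss.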
mmarco/v2/es
Version of msmarco-passage, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Spanish.
-
Dataset irds.mmarco.v2.es.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Spanish.
mmarco/v2/fr
Version of msmarco-passage, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into French.
-
Dataset irds.mmarco.v2.fr.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into French.
mmarco/v2/hi
Version of msmarco-passage, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Hindi.
-
Dataset irds.mmarco.v2.hi.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Hindi.
mmarco/v2/id
Version of msmarco-passage, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
-
Dataset irds.mmarco.v2.id.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Indonesian.
mmarco/v2/it
Version of msmarco-passage, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Italian.
-
Dataset irds.mmarco.v2.it.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Italian.
mmarco/v2/ja
Version of msmarco-passage, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Japanese.
-
Dataset irds.mmarco.v2.ja.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Japanese.
mmarco/v2/pt
Version of msmarco-passage, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
-
Dataset irds.mmarco.v2.pt.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Portuguese.
mmarco/v2/ru
Version of msmarco-passage, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Russian.
-
Dataset irds.mmarco.v2.ru.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Russian.
mmarco/v2/vi
Version of msmarco-passage, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
-
Dataset irds.mmarco.v2.vi.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Vietnamese.
mmarco/v2/zh
Version of msmarco-passage, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.small.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.v2.zh.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Chinese.
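The per-language mMARCO v2 entries above share one naming pattern, so a multilingual experiment can build its dataset ids programmatically. The sketch below only assembles id strings for the 13 language codes listed in this section; note that Dutch uses `dt`, not the ISO 639-1 code `nl`. `MMARCO_V2_LANGS` and `mmarco_v2_ids` are hypothetical names.

```python
from typing import Dict

# Language codes as they appear in the irds.mmarco.v2.* ids of this catalog.
# Dutch is "dt" here, not the ISO 639-1 code "nl".
MMARCO_V2_LANGS = {
    "ar": "Arabic", "de": "German", "dt": "Dutch", "es": "Spanish",
    "fr": "French", "hi": "Hindi", "id": "Indonesian", "it": "Italian",
    "ja": "Japanese", "pt": "Portuguese", "ru": "Russian",
    "vi": "Vietnamese", "zh": "Chinese",
}


def mmarco_v2_ids(split: str = "dev.small") -> Dict[str, str]:
    """Map each language name to its datamaestro dataset id for the given split."""
    return {name: f"irds.mmarco.v2.{code}.{split}" for code, name in MMARCO_V2_LANGS.items()}


ids = mmarco_v2_ids()
print(ids["Dutch"])  # irds.mmarco.v2.dt.dev.small
```

Each resulting id could then be passed to `prepare_dataset`, as in the usage example at the top of this page.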
mmarco/zh
Version of msmarco-passage, with documents translated into Chinese.
-
Dataset irds.mmarco.zh.documents
datamaestro_text.datasets.irds.data.Documents
Version of msmarco-passage, with documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.small
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.dev.small.v1.1.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.small.v1.1.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.small.v1.1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.small.v1.1
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.v1.1.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.v1.1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.dev.v1.1
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/dev, with queries and documents translated into Chinese.
Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.
-
Dataset irds.mmarco.zh.train.queries
datamaestro_text.datasets.irds.data.Topics
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.train.docpairs
-
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Version of msmarco-passage/train, with queries and documents translated into Chinese.
-
Dataset irds.mmarco.zh.train
datamaestro_text.datasets.irds.data.Adhoc
Version of msmarco-passage/train, with queries and documents translated into Chinese.
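Each datamaestro id in this list appears to mirror an ir-datasets id (for instance, the group `mmarco/v2/de` shows up as `irds.mmarco.v2.de`), with a trailing component such as `.queries` or `.qrels` selecting one part. The `v1.1` entries show that a naive dot-to-slash replacement is not enough. Below is a sketch of a hypothetical `split_irds_id` helper that recovers the ir-datasets id and the component, assuming only the naming pattern visible in this catalog; it is not part of datamaestro-text.

```python
import re
from typing import Optional, Tuple

# Trailing components used by the entries in this catalog.
COMPONENTS = {"documents", "queries", "qrels", "scoreddocs", "docpairs"}
VERSION_TOKEN = re.compile(r"v\d+")


def split_irds_id(dm_id: str) -> Tuple[str, Optional[str]]:
    """'irds.mmarco.zh.dev.small.v1.1.qrels' -> ('mmarco/zh/dev/small/v1.1', 'qrels')."""
    prefix = "irds."
    if not dm_id.startswith(prefix):
        raise ValueError(f"not an irds id: {dm_id!r}")
    tokens = dm_id[len(prefix):].split(".")
    component = tokens.pop() if tokens[-1] in COMPONENTS else None
    parts = []
    for token in tokens:
        # Re-join version numbers split on the dot, e.g. "v1" + "1" -> "v1.1".
        if parts and token.isdigit() and VERSION_TOKEN.fullmatch(parts[-1]):
            parts[-1] += "." + token
        else:
            parts.append(token)
    return "/".join(parts), component


print(split_irds_id("irds.mmarco.zh.dev.small.v1.1.scoreddocs"))
# ('mmarco/zh/dev/small/v1.1', 'scoreddocs')
```

The recovered id could then be passed to `ir_datasets.load` to inspect the raw data outside datamaestro.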
mr-tydi/ar
Complete Arabic dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ar.documents
datamaestro_text.datasets.irds.data.Documents
Complete Arabic dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ar.queries
datamaestro_text.datasets.irds.data.Topics
Complete Arabic dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ar.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Arabic dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ar
datamaestro_text.datasets.irds.data.Adhoc
Complete Arabic dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ar.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Arabic
-
Dataset irds.mr-tydi.ar.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Arabic
-
Dataset irds.mr-tydi.ar.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Arabic
-
Dataset irds.mr-tydi.ar.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Arabic
-
Dataset irds.mr-tydi.ar.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Arabic
-
Dataset irds.mr-tydi.ar.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Arabic
-
Dataset irds.mr-tydi.ar.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Arabic
-
Dataset irds.mr-tydi.ar.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Arabic
-
Dataset irds.mr-tydi.ar.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Arabic
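Mr. TyDi provides a complete per-language set alongside separate train, dev, and test splits. When combining splits yourself (for example, train and dev for a final training run), it is worth checking that no query id appears in two splits. A minimal sketch follows, assuming qrels in a `{query_id: {doc_id: relevance}}` layout; this in-memory form and the `merge_splits` helper are hypothetical, not the datamaestro `AdhocAssessments` API.

```python
from typing import Dict

Qrels = Dict[str, Dict[str, int]]


def merge_splits(**splits: Qrels) -> Qrels:
    """Merge per-split qrels, refusing query ids shared by two splits."""
    merged: Qrels = {}
    origin: Dict[str, str] = {}
    for split_name, qrels in splits.items():
        for qid, judgments in qrels.items():
            if qid in origin:
                raise ValueError(
                    f"query {qid!r} is in both {origin[qid]!r} and {split_name!r}"
                )
            origin[qid] = split_name
            merged[qid] = dict(judgments)  # copy so the inputs stay untouched
    return merged


train = {"1": {"a": 1}, "2": {"b": 1}}
dev = {"3": {"c": 1}}
combined = merge_splits(train=train, dev=dev)
print(sorted(combined))  # ['1', '2', '3']
```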
mr-tydi/bn
Complete Bengali dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.bn.documents
datamaestro_text.datasets.irds.data.Documents
Complete Bengali dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.bn.queries
datamaestro_text.datasets.irds.data.Topics
Complete Bengali dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.bn.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Bengali dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.bn
datamaestro_text.datasets.irds.data.Adhoc
Complete Bengali dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.bn.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Bengali
-
Dataset irds.mr-tydi.bn.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Bengali
-
Dataset irds.mr-tydi.bn.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Bengali
-
Dataset irds.mr-tydi.bn.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Bengali
-
Dataset irds.mr-tydi.bn.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Bengali
-
Dataset irds.mr-tydi.bn.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Bengali
-
Dataset irds.mr-tydi.bn.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Bengali
-
Dataset irds.mr-tydi.bn.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Bengali
-
Dataset irds.mr-tydi.bn.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Bengali
mr-tydi/en
Complete English dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.en.documents
datamaestro_text.datasets.irds.data.Documents
Complete English dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.en.queries
datamaestro_text.datasets.irds.data.Topics
Complete English dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.en.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete English dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.en
datamaestro_text.datasets.irds.data.Adhoc
Complete English dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.en.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for English
-
Dataset irds.mr-tydi.en.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for English
-
Dataset irds.mr-tydi.en.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for English
-
Dataset irds.mr-tydi.en.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for English
-
Dataset irds.mr-tydi.en.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for English
-
Dataset irds.mr-tydi.en.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for English
-
Dataset irds.mr-tydi.en.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for English
-
Dataset irds.mr-tydi.en.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for English
-
Dataset irds.mr-tydi.en.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for English
mr-tydi/fi
Complete Finnish dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.fi.documents
datamaestro_text.datasets.irds.data.Documents
Complete Finnish dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.fi.queries
datamaestro_text.datasets.irds.data.Topics
Complete Finnish dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.fi.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Finnish dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.fi
datamaestro_text.datasets.irds.data.Adhoc
Complete Finnish dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.fi.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Finnish
-
Dataset irds.mr-tydi.fi.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Finnish
-
Dataset irds.mr-tydi.fi.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Finnish
-
Dataset irds.mr-tydi.fi.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Finnish
-
Dataset irds.mr-tydi.fi.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Finnish
-
Dataset irds.mr-tydi.fi.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Finnish
-
Dataset irds.mr-tydi.fi.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Finnish
-
Dataset irds.mr-tydi.fi.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Finnish
-
Dataset irds.mr-tydi.fi.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Finnish
mr-tydi/id
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.id.documents
datamaestro_text.datasets.irds.data.Documents
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.id.queries
datamaestro_text.datasets.irds.data.Topics
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.id.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.id
datamaestro_text.datasets.irds.data.Adhoc
Complete Indonesian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.id.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Indonesian
-
Dataset irds.mr-tydi.id.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Indonesian
-
Dataset irds.mr-tydi.id.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Indonesian
-
Dataset irds.mr-tydi.id.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Indonesian
-
Dataset irds.mr-tydi.id.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Indonesian
-
Dataset irds.mr-tydi.id.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Indonesian
-
Dataset irds.mr-tydi.id.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Indonesian
-
Dataset irds.mr-tydi.id.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Indonesian
-
Dataset irds.mr-tydi.id.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Indonesian
mr-tydi/ja
Complete Japanese dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ja.documents
datamaestro_text.datasets.irds.data.Documents
Complete Japanese dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ja.queries
datamaestro_text.datasets.irds.data.Topics
Complete Japanese dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ja.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Japanese dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ja
datamaestro_text.datasets.irds.data.Adhoc
Complete Japanese dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ja.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Japanese
-
Dataset irds.mr-tydi.ja.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Japanese
-
Dataset irds.mr-tydi.ja.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Japanese
-
Dataset irds.mr-tydi.ja.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Japanese
-
Dataset irds.mr-tydi.ja.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Japanese
-
Dataset irds.mr-tydi.ja.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Japanese
-
Dataset irds.mr-tydi.ja.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Japanese
-
Dataset irds.mr-tydi.ja.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Japanese
-
Dataset irds.mr-tydi.ja.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Japanese
mr-tydi/ko
Complete Korean dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ko.documents
datamaestro_text.datasets.irds.data.Documents
Complete Korean dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ko.queries
datamaestro_text.datasets.irds.data.Topics
Complete Korean dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ko.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Korean dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ko
datamaestro_text.datasets.irds.data.Adhoc
Complete Korean dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ko.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Korean
-
Dataset irds.mr-tydi.ko.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Korean
-
Dataset irds.mr-tydi.ko.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Korean
-
Dataset irds.mr-tydi.ko.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Korean
-
Dataset irds.mr-tydi.ko.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Korean
-
Dataset irds.mr-tydi.ko.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Korean
-
Dataset irds.mr-tydi.ko.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Korean
-
Dataset irds.mr-tydi.ko.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Korean
-
Dataset irds.mr-tydi.ko.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Korean
mr-tydi/ru
Complete Russian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ru.documents
datamaestro_text.datasets.irds.data.Documents
Complete Russian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ru.queries
datamaestro_text.datasets.irds.data.Topics
Complete Russian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ru.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Russian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ru
datamaestro_text.datasets.irds.data.Adhoc
Complete Russian dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Russian
-
Dataset irds.mr-tydi.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Russian
-
Dataset irds.mr-tydi.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Russian
-
Dataset irds.mr-tydi.ru.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Russian
-
Dataset irds.mr-tydi.ru.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Russian
-
Dataset irds.mr-tydi.ru.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Russian
-
Dataset irds.mr-tydi.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Russian
-
Dataset irds.mr-tydi.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Russian
-
Dataset irds.mr-tydi.ru.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Russian
mr-tydi/sw
Complete Swahili dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.sw.documents
datamaestro_text.datasets.irds.data.Documents
Complete Swahili dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.sw.queries
datamaestro_text.datasets.irds.data.Topics
Complete Swahili dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.sw.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Swahili dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.sw
datamaestro_text.datasets.irds.data.Adhoc
Complete Swahili dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.sw.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Swahili
-
Dataset irds.mr-tydi.sw.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Swahili
-
Dataset irds.mr-tydi.sw.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Swahili
-
Dataset irds.mr-tydi.sw.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Swahili
-
Dataset irds.mr-tydi.sw.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Swahili
-
Dataset irds.mr-tydi.sw.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Swahili
-
Dataset irds.mr-tydi.sw.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Swahili
-
Dataset irds.mr-tydi.sw.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Swahili
-
Dataset irds.mr-tydi.sw.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Swahili
mr-tydi/te
Complete Telugu dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.te.documents
datamaestro_text.datasets.irds.data.Documents
Complete Telugu dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.te.queries
datamaestro_text.datasets.irds.data.Topics
Complete Telugu dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.te.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Telugu dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.te
datamaestro_text.datasets.irds.data.Adhoc
Complete Telugu dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.te.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Telugu
-
Dataset irds.mr-tydi.te.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Telugu
-
Dataset irds.mr-tydi.te.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Telugu
-
Dataset irds.mr-tydi.te.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Telugu
-
Dataset irds.mr-tydi.te.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Telugu
-
Dataset irds.mr-tydi.te.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Telugu
-
Dataset irds.mr-tydi.te.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Telugu
-
Dataset irds.mr-tydi.te.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Telugu
-
Dataset irds.mr-tydi.te.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Telugu
mr-tydi/th
Complete Thai dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.th.documents
datamaestro_text.datasets.irds.data.Documents
Complete Thai dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.th.queries
datamaestro_text.datasets.irds.data.Topics
Complete Thai dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.th.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Complete Thai dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.th
datamaestro_text.datasets.irds.data.Adhoc
Complete Thai dataset, including all train, dev, and test queries and qrels.
-
Dataset irds.mr-tydi.th.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development set for Thai
-
Dataset irds.mr-tydi.th.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development set for Thai
-
Dataset irds.mr-tydi.th.dev
datamaestro_text.datasets.irds.data.Adhoc
Development set for Thai
-
Dataset irds.mr-tydi.th.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set for Thai
-
Dataset irds.mr-tydi.th.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set for Thai
-
Dataset irds.mr-tydi.th.test
datamaestro_text.datasets.irds.data.Adhoc
Test set for Thai
-
Dataset irds.mr-tydi.th.train.queries
datamaestro_text.datasets.irds.data.Topics
Train set for Thai
-
Dataset irds.mr-tydi.th.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train set for Thai
-
Dataset irds.mr-tydi.th.train
datamaestro_text.datasets.irds.data.Adhoc
Train set for Thai
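Every Mr. TyDi language listed above follows the same naming pattern: irds.mr-tydi.LANG.SPLIT for the Adhoc bundle, plus .queries and .qrels for its Topics and AdhocAssessments parts. For multilingual experiments, you can generate the identifiers instead of typing them. This is a sketch that assumes the pattern shown in this listing. The helper name mr_tydi_ids is hypothetical, and the resulting strings are the kind of ids you would pass to prepare_dataset.

```python
# Language codes as they appear in the Mr. TyDi dataset ids listed above.
MR_TYDI_LANGUAGES = ("ar", "bn", "en", "fi", "id", "ja", "ko", "ru", "sw", "te", "th")


def mr_tydi_ids(split, part=None):
    """Build datamaestro ids for one split across all Mr. TyDi languages.

    split: "train", "dev" or "test".
    part:  None for the Adhoc bundle, "queries" for Topics, "qrels" for AdhocAssessments.
    """
    if split not in ("train", "dev", "test"):
        raise ValueError(f"unknown split: {split}")
    if part not in (None, "queries", "qrels"):
        raise ValueError(f"unknown part: {part}")
    suffix = f".{part}" if part else ""
    return [f"irds.mr-tydi.{lang}.{split}{suffix}" for lang in MR_TYDI_LANGUAGES]


for dataset_id in mr_tydi_ids("test"):
    print(dataset_id)  # e.g. irds.mr-tydi.ar.test
```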
MSMARCO (document)
"Based the questions in the [MS-MARCO] Question Answering Dataset and the documents which answered the questions a document ranking task was formulated. There are 3.2 million documents and the goal is to rank based on their relevance. Relevance labels are derived from what passages was marked as having the answer in the QnA dataset."
- See also: msmarco-passage
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Leaderboard
- Dataset Paper
-
Dataset irds.msmarco-document.documents
datamaestro_text.datasets.irds.data.Documents
"Based the questions in the [MS-MARCO] Question Answering Dataset and the documents which answered the questions a document ranking task was formulated. There are 3.2 million documents and the goal is to rank based on their relevance. Relevance labels are derived from what passages was marked as having the answer in the QnA dataset."
- See also: msmarco-passage
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Leaderboard
- Dataset Paper
-
Dataset irds.msmarco-document.dev.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.dev.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.dev
datamaestro_text.datasets.irds.data.Adhoc
Official dev set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
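In the dev set, every query has exactly one positive judgment, so a query's reciprocal rank is simply 1 / (rank of that document), or 0 if the document is missing from the cutoff. The sketch below computes mean reciprocal rank at a cutoff k from plain dictionaries. The data layout and the function name are assumptions, not datamaestro's API. To the best of my knowledge, the MS MARCO document leaderboard reports MRR@100, which is why k defaults to 100 here.

```python
def mrr_at_k(run, qrels, k=100):
    """Mean reciprocal rank at cutoff k over the queries that appear in qrels.

    run:   {query_id: [doc_id, ...]} ranked best first
    qrels: {query_id: {doc_id: relevance}}; relevance > 0 counts as relevant
    Queries with no relevant document in the top k contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for query_id, judged in qrels.items():
        for rank, doc_id in enumerate(run.get(query_id, [])[:k], start=1):
            if judged.get(doc_id, 0) > 0:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(qrels)


qrels = {"q1": {"d3": 1}, "q2": {"d9": 1}}
run = {"q1": ["d1", "d3", "d2"], "q2": ["d5"]}
print(mrr_at_k(run, qrels))  # (1/2 + 0) / 2
```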
-
Dataset irds.msmarco-document.eval.queries
datamaestro_text.datasets.irds.data.Topics
Official eval set for submission to the MS MARCO leaderboard. Relevance judgments are hidden.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.eval.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official eval set for submission to the MS MARCO leaderboard. Relevance judgments are hidden.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
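In the "re-ranking" setting, a model only rescores the first-stage candidates (here, the Indri QL top 100 from the AdhocRun entries) instead of searching the full collection. The sketch below shows this flow with stdlib code. It assumes the scoreddocs are available as (query_id, doc_id, score) tuples, and score_fn stands in for any hypothetical re-ranking model. The output uses the standard six-column TREC run format.

```python
from collections import defaultdict


def rerank(scoreddocs, score_fn, depth=100):
    """Re-rank each query's first-stage candidates with score_fn.

    scoreddocs: iterable of (query_id, doc_id, first_stage_score)
    score_fn:   callable(query_id, doc_id) -> float (a hypothetical re-ranker)
    Returns {query_id: [(doc_id, new_score), ...]} sorted best first.
    """
    candidates = defaultdict(list)
    for query_id, doc_id, first_score in scoreddocs:
        candidates[query_id].append((first_score, doc_id))
    reranked = {}
    for query_id, cands in candidates.items():
        # Keep only the first-stage top `depth`, then rescore those documents.
        top = sorted(cands, key=lambda c: c[0], reverse=True)[:depth]
        rescored = [(doc_id, score_fn(query_id, doc_id)) for _, doc_id in top]
        rescored.sort(key=lambda pair: pair[1], reverse=True)
        reranked[query_id] = rescored
    return reranked


def to_trec_run(reranked, tag="rerank"):
    """Format as TREC run lines: 'query_id Q0 doc_id rank score tag'."""
    return [
        f"{query_id} Q0 {doc_id} {rank} {score} {tag}"
        for query_id, ranking in reranked.items()
        for rank, (doc_id, score) in enumerate(ranking, start=1)
    ]
```

Truncating by first-stage score before rescoring keeps the cost bounded by depth per query, which is the point of the re-ranking setting.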
-
Dataset irds.msmarco-document.orcas.queries
datamaestro_text.datasets.irds.data.Topics
"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."
- Queries: From query log
- Relevance Data: User clicks
- Scored docs: Indri Query Likelihood model
- Dataset Paper
-
Dataset irds.msmarco-document.orcas.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."
- Queries: From query log
- Relevance Data: User clicks
- Scored docs: Indri Query Likelihood model
- Dataset Paper
-
Dataset irds.msmarco-document.orcas.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."
- Queries: From query log
- Relevance Data: User clicks
- Scored docs: Indri Query Likelihood model
- Dataset Paper
-
Dataset irds.msmarco-document.orcas
datamaestro_text.datasets.irds.data.Adhoc
"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."
- Queries: From query log
- Relevance Data: User clicks
- Scored docs: Indri Query Likelihood model
- Dataset Paper
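Because ORCAS relevance data comes from user clicks, each clicked (query, document) connection can be viewed as a positive judgment. The sketch below turns a click log into binary qrels and counts distinct queries and connections, the two quantities quoted in the description above. The input format (an iterable of (query_id, doc_id) pairs) and the choice of relevance 1 per click are assumptions made for illustration.

```python
def clicks_to_qrels(click_pairs):
    """Turn (query_id, doc_id) click pairs into binary qrels.

    Every clicked connection gets relevance 1 (an assumption of this sketch);
    repeated clicks on the same pair collapse into a single judgment.
    """
    qrels = {}
    for query_id, doc_id in click_pairs:
        qrels.setdefault(query_id, {})[doc_id] = 1
    return qrels


def click_stats(qrels):
    """Return (distinct queries, distinct query-document connections)."""
    return len(qrels), sum(len(docs) for docs in qrels.values())


pairs = [("q1", "D1"), ("q1", "D2"), ("q1", "D1"), ("q2", "D1")]
print(click_stats(clicks_to_qrels(pairs)))
```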
-
Dataset irds.msmarco-document.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
-
Dataset irds.msmarco-document.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set. All queries have exactly 1 (positive) relevance judgment.
scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
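The train split pairs one positive judgment per query with a top-100 candidate list. A common way to build training data for neural rankers is to sample "hard" negatives from those candidates. This is one popular approach, not something this listing prescribes. The sketch below assumes plain dictionaries for qrels and candidates. Sampled negatives are only unjudged, so some of them may actually be relevant.

```python
import random


def training_triples(qrels, scoreddocs, negatives_per_query=1, seed=0):
    """Yield (query_id, positive_doc_id, negative_doc_id) triples.

    qrels:      {query_id: {doc_id: relevance}}
    scoreddocs: {query_id: [doc_id, ...]} first-stage candidates (e.g. the top 100)
    Negatives are drawn from candidates without a positive judgment.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    for query_id, judged in sorted(qrels.items()):
        positives = [d for d, rel in judged.items() if rel > 0]
        pool = [d for d in scoreddocs.get(query_id, []) if judged.get(d, 0) <= 0]
        if not positives or not pool:
            continue  # nothing to pair up for this query
        for positive in positives:
            for negative in rng.sample(pool, min(negatives_per_query, len(pool))):
                yield query_id, positive, negative


qrels = {"q1": {"d1": 1}}
candidates = {"q1": ["d1", "d2", "d3"]}
print(list(training_triples(qrels, candidates, negatives_per_query=2)))
```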
-
Dataset irds.msmarco-document.trec-dl-2019.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).
-
Dataset irds.msmarco-document.trec-dl-2019.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).
-
Dataset irds.msmarco-document.trec-dl-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).
-
Dataset irds.msmarco-document.trec-dl-2019
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).
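The descriptions refer to ir-datasets ids with slashes, such as msmarco-document/trec-dl-2019/judged, while the datamaestro ids use an irds. prefix and dots, such as irds.msmarco-document.trec-dl-2019.judged. Component suffixes like .queries or .qrels then select one part of a split. The sketch below converts between the two forms. It infers the mapping from the names in this listing and is not a datamaestro function. It would also mis-handle any ir-datasets id that itself contains a dot.

```python
# Component suffixes used by the datamaestro ids in this listing.
COMPONENTS = ("documents", "queries", "qrels", "scoreddocs")


def irds_to_datamaestro(irds_id, component=None):
    """'msmarco-document/dev', 'qrels' -> 'irds.msmarco-document.dev.qrels'."""
    dm_id = "irds." + irds_id.replace("/", ".")
    return f"{dm_id}.{component}" if component else dm_id


def datamaestro_to_irds(dm_id):
    """'irds.msmarco-document.dev.qrels' -> ('msmarco-document/dev', 'qrels').

    The component is None when the id names a whole Adhoc bundle.
    """
    prefix = "irds."
    if not dm_id.startswith(prefix):
        raise ValueError(f"not an irds id: {dm_id}")
    parts = dm_id[len(prefix):].split(".")
    component = parts.pop() if len(parts) > 1 and parts[-1] in COMPONENTS else None
    return "/".join(parts), component


print(datamaestro_to_irds("irds.msmarco-document.trec-dl-2019.judged"))
```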
-
Dataset irds.msmarco-document.trec-dl-2019.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-document/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2019.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-document/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2019.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-document/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2019.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-document/trec-dl-2019, only including queries with qrels.
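The judged subsets keep only the queries that have at least one relevance judgment. This prevents queries that were never assessed from dragging averaged metrics down to zero. The sketch below applies the same filtering idea to in-memory topics and runs. It is not how ir-datasets builds its judged lists internally, and the dictionary layouts are assumptions.

```python
def judged_subset(topics, qrels):
    """Keep only topics that have at least one relevance judgment.

    topics: {query_id: query_text}; qrels: {query_id: {doc_id: relevance}}
    """
    return {qid: text for qid, text in topics.items() if qrels.get(qid)}


def judged_run(run, qrels):
    """Apply the same filter to a run {query_id: [doc_id, ...]}."""
    return {qid: docs for qid, docs in run.items() if qrels.get(qid)}


topics = {"1": "first query", "2": "second query"}
qrels = {"1": {"D10": 2}}
print(judged_subset(topics, qrels))  # only query "1" survives
```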
-
Dataset irds.msmarco-document.trec-dl-2020.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).
-
Dataset irds.msmarco-document.trec-dl-2020.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).
-
Dataset irds.msmarco-document.trec-dl-2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).
-
Dataset irds.msmarco-document.trec-dl-2020
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).
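Unlike the single binary label in the MS MARCO dev set, the NIST judgments for the TREC DL tracks are graded, and the tracks report nDCG@10 as their headline metric. The sketch below computes nDCG@k for a single query. It uses linear gain (gain = relevance grade), which to my understanding matches trec_eval's ndcg_cut. Some toolkits use 2^rel - 1 instead, so scores can differ between tools. The data layout is an assumption.

```python
import math


def ndcg_at_k(ranking, judged, k=10):
    """nDCG@k for one query with graded judgments and linear gain.

    ranking: [doc_id, ...] best first
    judged:  {doc_id: relevance}; unjudged documents count as relevance 0
    """
    def dcg(gains):
        # Position 1 is discounted by log2(2) = 1, position 2 by log2(3), ...
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    gains = [max(judged.get(doc_id, 0), 0) for doc_id in ranking[:k]]
    ideal = sorted((r for r in judged.values() if r > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judged = {"a": 3, "b": 2, "c": 0}
print(ndcg_at_k(["b", "a", "c"], judged))  # below 1.0: the best document is ranked second
```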
-
Dataset irds.msmarco-document.trec-dl-2020.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-document/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2020.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Subset of msmarco-document/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2020.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-document/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-2020.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-document/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document.trec-dl-hard.queries
datamaestro_text.datasets.irds.data.Topics
A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.
- data website
- See Also: msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.
- data website
- See Also: msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard
datamaestro_text.datasets.irds.data.Adhoc
A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.
- data website
- See Also: msmarco-passage/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold1.queries
datamaestro_text.datasets.irds.data.Topics
Fold 1 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 1 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold1
datamaestro_text.datasets.irds.data.Adhoc
Fold 1 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold2.queries
datamaestro_text.datasets.irds.data.Topics
Fold 2 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 2 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold2
datamaestro_text.datasets.irds.data.Adhoc
Fold 2 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold3.queries
datamaestro_text.datasets.irds.data.Topics
Fold 3 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 3 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold3
datamaestro_text.datasets.irds.data.Adhoc
Fold 3 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold4.queries
datamaestro_text.datasets.irds.data.Topics
Fold 4 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 4 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold4
datamaestro_text.datasets.irds.data.Adhoc
Fold 4 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold5.queries
datamaestro_text.datasets.irds.data.Topics
Fold 5 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Fold 5 of msmarco-document/trec-dl-hard
-
Dataset irds.msmarco-document.trec-dl-hard.fold5
datamaestro_text.datasets.irds.data.Adhoc
Fold 5 of msmarco-document/trec-dl-hard
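The five trec-dl-hard folds lend themselves to k-fold cross-validation. One common protocol, assumed here rather than stated by the listing, is to hold out each fold in turn for testing, tune on the other four, and average the held-out scores. In the sketch below, tune and evaluate are hypothetical callables that you would implement on top of the fold datasets listed above.

```python
def trec_dl_hard_folds(k_folds=5):
    """Return [(test_fold_id, [tuning_fold_ids...]), ...] for each held-out fold."""
    base = "irds.msmarco-document.trec-dl-hard"
    folds = [f"{base}.fold{i}" for i in range(1, k_folds + 1)]
    return [(test, [f for f in folds if f != test]) for test in folds]


def cross_validate(tune, evaluate, k_folds=5):
    """Average held-out scores.

    tune(list_of_fold_ids) -> params and evaluate(fold_id, params) -> float
    are hypothetical callables supplied by the caller.
    """
    scores = []
    for test_id, tuning_ids in trec_dl_hard_folds(k_folds):
        params = tune(tuning_ids)  # never sees the held-out fold
        scores.append(evaluate(test_id, params))
    return sum(scores) / len(scores)


for test_id, tuning_ids in trec_dl_hard_folds():
    print(test_id, "<- tuned on", len(tuning_ids), "folds")
```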
Anchor Text for Version 1 of MS MARCO
For version 1 of MS MARCO, the anchor text collection enriches 1,703,834 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (even with this sampling, all anchor text is included for 94% of the documents). The text field contains the concatenated anchor texts, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.
-
Dataset irds.msmarco-document.anchor-text.documents
datamaestro_text.datasets.irds.data.Documents
For version 1 of MS MARCO, the anchor text collection enriches 1,703,834 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (even with this sampling, all anchor text is included for 94% of the documents). The text field contains the concatenated anchor texts, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.
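The capping rule described above works as follows: keep every anchor text when a document has at most 1,000 of them, otherwise draw a random sample of 1,000. The sketch below illustrates that rule together with a record holding both the text and anchors fields. The sampling procedure the collection's authors actually used is not specified here. The single-space separator for text, the seed, and the function names are all assumptions.

```python
import random

MAX_ANCHORS = 1000  # cap quoted in the description above


def cap_anchors(anchors, limit=MAX_ANCHORS, seed=0):
    """Return all anchors if there are at most `limit`, else a random sample of `limit`."""
    if len(anchors) <= limit:
        return list(anchors)
    return random.Random(seed).sample(anchors, limit)  # sampling without replacement


def anchor_document(doc_id, anchors, sep=" "):
    """Hypothetical record: `anchors` as a list, `text` as their concatenation."""
    kept = cap_anchors(anchors)
    return {"doc_id": doc_id, "text": sep.join(kept), "anchors": kept}


print(anchor_document("D1", ["home page", "official site"]))
```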
MSMARCO (document, version 2)
Version 2 of the MS MARCO document ranking dataset. The corpus contains 12M documents (roughly 3x as many as version 1).
- Version 1 of dataset: msmarco-document
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Dataset Paper
-
Dataset irds.msmarco-document-v2.documents
datamaestro_text.datasets.irds.data.Documents
Version 2 of the MS MARCO document ranking dataset. The corpus contains 12M documents (roughly 3x as many as version 1).
- Version 1 of dataset: msmarco-document
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Dataset Paper
-
Dataset irds.msmarco-document-v2.dev1.queries
datamaestro_text.datasets.irds.data.Topics
Official dev1 set with 4,552 queries.
-
Dataset irds.msmarco-document-v2.dev1.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev1 set with 4,552 queries.
-
Dataset irds.msmarco-document-v2.dev1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev1 set with 4,552 queries.
-
Dataset irds.msmarco-document-v2.dev1
datamaestro_text.datasets.irds.data.Adhoc
Official dev1 set with 4,552 queries.
-
Dataset irds.msmarco-document-v2.dev2.queries
datamaestro_text.datasets.irds.data.Topics
Official dev2 set with 5,000 queries.
-
Dataset irds.msmarco-document-v2.dev2.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev2 set with 5,000 queries.
-
Dataset irds.msmarco-document-v2.dev2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev2 set with 5,000 queries.
-
Dataset irds.msmarco-document-v2.dev2
datamaestro_text.datasets.irds.data.Adhoc
Official dev2 set with 5,000 queries.
-
Dataset irds.msmarco-document-v2.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set with 322,196 queries.
-
Dataset irds.msmarco-document-v2.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set with 322,196 queries.
-
Dataset irds.msmarco-document-v2.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set with 322,196 queries.
-
Dataset irds.msmarco-document-v2.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set with 322,196 queries.
-
Dataset irds.msmarco-document-v2.trec-dl-2019.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2019
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2019.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2019.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2019.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.
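The "judged" variants keep only the queries that have at least one relevance judgment. A minimal sketch of that filtering, assuming queries as (query_id, text) pairs and qrels as (query_id, doc_id, relevance) triples; this illustrates the idea and is not ir-datasets' own code. Note that a query judged only as non-relevant (relevance 0) still counts as judged.

```python
from typing import Iterable, List, Tuple

Query = Tuple[str, str]        # (query_id, text), assumed layout
Qrel = Tuple[str, str, int]    # (query_id, doc_id, relevance), assumed layout


def judged_queries(queries: Iterable[Query], qrels: Iterable[Qrel]) -> List[Query]:
    """Keep only the queries whose ID appears in at least one qrel."""
    judged_ids = {query_id for query_id, _doc_id, _relevance in qrels}
    return [(qid, text) for qid, text in queries if qid in judged_ids]


queries = [("1", "what is a lobster roll"), ("2", "define bmi"), ("3", "who is robert gray")]
qrels = [("1", "D1", 2), ("3", "D7", 0)]
print(judged_queries(queries, qrels))
```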
-
Dataset irds.msmarco-document-v2.trec-dl-2020.queries
datamaestro_text.datasets.irds.data.Topics
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors, (filtered list available in msmarco-document-v2/trec-dl-2020/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors, (filtered list available in msmarco-document-v2/trec-dl-2020/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2020
datamaestro_text.datasets.irds.data.Adhoc
Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors, (filtered list available in msmarco-document-v2/trec-dl-2020/judged).
-
Dataset irds.msmarco-document-v2.trec-dl-2020.judged.queries
datamaestro_text.datasets.irds.data.Topics
Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2020.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2020.judged
datamaestro_text.datasets.irds.data.Adhoc
Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2021 shared task.
Note that at this time, qrels are only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2021 shared task.
Note that at this time, qrels are only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official topics for the TREC Deep Learning (DL) 2021 shared task.
Note that at this time, qrels are only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021
datamaestro_text.datasets.irds.data.Adhoc
Official topics for the TREC Deep Learning (DL) 2021 shared task.
Note that at this time, qrels are only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.judged.queries
datamaestro_text.datasets.irds.data.Topics
msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.
Note that at this time, this is only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.
Note that at this time, this is only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.
Note that at this time, this is only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2021.judged
datamaestro_text.datasets.irds.data.Adhoc
msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.
Note that at this time, this is only available to those with TREC active participant login credentials.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.
-
Dataset irds.msmarco-document-v2.trec-dl-2022
datamaestro_text.datasets.irds.data.Adhoc
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.
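The inference rule above (a document's label is the maximum of its passages' labels) can be sketched as follows. The passage-to-document mapping is a hypothetical input here, shown as a plain dict; the listing only states that msmarco-passage-v2 passages can be linked to msmarco-document-v2 documents, not how to obtain the mapping.

```python
from typing import Dict, Iterable, Tuple


def document_qrels(
    passage_qrels: Iterable[Tuple[str, str, int]],  # (query_id, passage_id, label)
    passage_to_doc: Dict[str, str],                 # hypothetical passage -> document map
) -> Dict[Tuple[str, str], int]:
    """Lift passage labels to documents by taking the maximum per (query, document)."""
    doc_labels: Dict[Tuple[str, str], int] = {}
    for query_id, passage_id, label in passage_qrels:
        key = (query_id, passage_to_doc[passage_id])
        doc_labels[key] = max(label, doc_labels.get(key, label))
    return doc_labels


passage_qrels = [("q1", "p1", 1), ("q1", "p2", 3), ("q1", "p3", 0), ("q2", "p1", 2)]
passage_to_doc = {"p1": "d1", "p2": "d1", "p3": "d2"}
print(document_qrels(passage_qrels, passage_to_doc))
```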
-
Dataset irds.msmarco-document-v2.trec-dl-2022.judged.queries
datamaestro_text.datasets.irds.data.Topics
msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2022.judged
datamaestro_text.datasets.irds.data.Adhoc
msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-document-v2.trec-dl-2023.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2023 shared task.
-
Dataset irds.msmarco-document-v2.trec-dl-2023.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2023 shared task.
Anchor Text for version 2 of MS MARCO
For version 2 of MS MARCO, the anchor text collection enriches 4,821,244 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (with this sampling, all anchor text is still included for 97% of the documents). The text field contains the anchor texts concatenated, and the anchors field contains them as a list. The raw dataset with additional information (roughly 100GB) is available online.
-
Dataset irds.msmarco-document-v2.anchor-text.documents
datamaestro_text.datasets.irds.data.Documents
For version 2 of MS MARCO, the anchor text collection enriches 4,821,244 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (with this sampling, all anchor text is still included for 97% of the documents). The text field contains the anchor texts concatenated, and the anchors field contains them as a list. The raw dataset with additional information (roughly 100GB) is available online.
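To make the capping rule and the two fields concrete, here is a minimal sketch that keeps at most 1,000 anchors per document and produces both a concatenated text and the list. Uniform random sampling, the fixed seed, and the single-space separator are assumptions for illustration; the description does not say how the collection's authors sampled or joined the anchors.

```python
import random
from typing import List, Tuple

MAX_ANCHORS = 1000  # cap stated in the description above


def sample_anchors(anchors: List[str], seed: int = 0, limit: int = MAX_ANCHORS) -> Tuple[str, List[str]]:
    """Return (text, anchors): anchors capped at `limit`, text as their concatenation.

    Assumptions: uniform sampling without replacement and a " " separator.
    """
    if len(anchors) > limit:
        anchors = random.Random(seed).sample(anchors, limit)
    kept = list(anchors)
    return " ".join(kept), kept


text, kept = sample_anchors(["home page", "contact us"])
print(text)  # short lists are kept unchanged
```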
MSMARCO (passage, version 2)
Version 2 of the MS MARCO passage ranking dataset. The corpus contains 138M passages, which can be linked up with documents in msmarco-document-v2.
- Version 1 of dataset: msmarco-passage
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Dataset Paper
Change Log
- On July 21, 2021, the task organizers updated the train, dev1, and dev2 qrels to remove duplicate entries from the files. This should not change results from evaluation tools, but may result in non-repeatable results if these files were used in another process (e.g., model training). The original qrels file for msmarco-passage-v2/train can be found here to aid in result repeatability.
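Removing duplicate qrels lines leaves metrics unchanged because evaluation tools treat each (query, document) judgment once, but anything that iterates over the raw lines (such as sampling training pairs) sees a different input. A minimal sketch of order-preserving deduplication, assuming TREC-style whitespace-separated qrels lines and treating lines with identical fields as duplicates; the organizers' exact procedure is not described here.

```python
from typing import Iterable, List


def dedup_qrels(lines: Iterable[str]) -> List[str]:
    """Drop repeated qrels lines, keeping the first occurrence and the original order.

    Lines are compared field by field, so differences in whitespace are ignored.
    """
    seen = set()
    unique = []
    for line in lines:
        key = tuple(line.split())
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


raw = ["1 0 d1 1", "1 0 d1 1", "2 0 d5 1", "1 0  d1 1"]
print(dedup_qrels(raw))
```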
-
Dataset irds.msmarco-passage-v2.documents
datamaestro_text.datasets.irds.data.Documents
Version 2 of the MS MARCO passage ranking dataset. The corpus contains 138M passages, which can be linked up with documents in msmarco-document-v2.
- Version 1 of dataset: msmarco-passage
- Documents: Text extracted from web pages
- Queries: Natural language questions (from query log)
- Dataset Paper
Change Log
- On July 21, 2021, the task organizers updated the train, dev1, and dev2 qrels to remove duplicate entries from the files. This should not change results from evaluation tools, but may result in non-repeatable results if these files were used in another process (e.g., model training). The original qrels file for msmarco-passage-v2/train can be found here to aid in result repeatability.
-
Dataset irds.msmarco-passage-v2.dev1.queries
datamaestro_text.datasets.irds.data.Topics
Official dev1 set with 3,903 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
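The "best approximate match" step can be pictured as picking, among the new passages of a re-passaged document, the one whose text is most similar to the originally labeled passage. The sketch below uses Python's difflib similarity ratio purely as a stand-in; the similarity measure actually used to build these qrels is not specified here.

```python
import difflib
from typing import List, Optional


def best_matching_passage(old_text: str, candidates: List[str]) -> Optional[int]:
    """Return the index of the candidate most similar to old_text (None if no candidates).

    difflib.SequenceMatcher.ratio() is an illustrative choice, not the official method.
    """
    best_index, best_ratio = None, -1.0
    for i, candidate in enumerate(candidates):
        ratio = difflib.SequenceMatcher(None, old_text, candidate).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index


candidates = [
    "The cat sat on the mat.",
    "Dogs bark loudly at night.",
    "Completely unrelated text here.",
]
print(best_matching_passage("The cat sat on a mat", candidates))
```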
-
Dataset irds.msmarco-passage-v2.dev1.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev1 set with 3,903 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev1.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev1 set with 3,903 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev1
datamaestro_text.datasets.irds.data.Adhoc
Official dev1 set with 3,903 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev2.queries
datamaestro_text.datasets.irds.data.Topics
Official dev2 set with 4,281 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev2.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev2 set with 4,281 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev2 set with 4,281 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.dev2
datamaestro_text.datasets.irds.data.Adhoc
Official dev2 set with 4,281 queries.
Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.
-
Dataset irds.msmarco-passage-v2.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set with 277,144 queries.
-
Dataset irds.msmarco-passage-v2.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set with 277,144 queries.
-
Dataset irds.msmarco-passage-v2.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set with 277,144 queries.
-
Dataset irds.msmarco-passage-v2.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set with 277,144 queries.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2021 shared task.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2021 shared task.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official topics for the TREC Deep Learning (DL) 2021 shared task.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021
datamaestro_text.datasets.irds.data.Adhoc
Official topics for the TREC Deep Learning (DL) 2021 shared task.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.queries
datamaestro_text.datasets.irds.data.Topics
msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2021.judged
datamaestro_text.datasets.irds.data.Adhoc
msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.
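One way to evaluate in the "duplicates removed" setting is to keep only the highest-ranked passage from each group of duplicates before scoring a run. A minimal sketch, assuming a hypothetical dup_group dict that maps each passage ID to a canonical representative of its duplicate group; where such groups come from, and the exact procedure the notebook papers used, are not stated here.

```python
from typing import Dict, List


def drop_duplicate_passages(ranking: List[str], dup_group: Dict[str, str]) -> List[str]:
    """Keep the first (highest-ranked) passage of each duplicate group, preserving order.

    Passages missing from dup_group are treated as their own singleton group.
    """
    seen_groups = set()
    kept = []
    for passage_id in ranking:
        group = dup_group.get(passage_id, passage_id)
        if group not in seen_groups:
            seen_groups.add(group)
            kept.append(passage_id)
    return kept


ranking = ["p1", "p2", "p3", "p4"]
dup_group = {"p1": "p1", "p3": "p1"}  # hypothetical: p3 duplicates p1
print(drop_duplicate_passages(ranking, dup_group))
```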
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022
datamaestro_text.datasets.irds.data.Adhoc
Official topics for the TREC Deep Learning (DL) 2022 shared task.
Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.queries
datamaestro_text.datasets.irds.data.Topics
msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2022.judged
datamaestro_text.datasets.irds.data.Adhoc
msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.
-
Dataset irds.msmarco-passage-v2.trec-dl-2023.queries
datamaestro_text.datasets.irds.data.Topics
Official topics for the TREC Deep Learning (DL) 2023 shared task.
-
Dataset irds.msmarco-passage-v2.trec-dl-2023.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official topics for the TREC Deep Learning (DL) 2023 shared task.
msmarco-passage-v2/dedup
-
Dataset irds.msmarco-passage-v2.dedup.documents
MSMARCO (QnA)
The MS MARCO Question Answering dataset. This is the source collection of msmarco-passage and msmarco-document.
Query IDs in this collection align with those found in msmarco-passage and msmarco-document. The collection does not provide doc_ids, so these are assigned in the following format: [msmarco_passage_id]-[url_seq], where [msmarco_passage_id] is the document from msmarco-passage that has matching contents and [url_seq] is assigned sequentially for each URL encountered. In other words, all documents with the same prefix have the same text; they only differ in the originating document.
Doc msmarco_passage_id fields are assigned by matching passage contents in msmarco-passage, and this field is provided for every document. Doc msmarco_document_id fields are assigned by matching the URL to the one found in msmarco-document. Due to how msmarco-document was constructed, there is not necessarily a match (value will be None if no match).
- Documents: Short passages (from web)
- Queries: Natural language questions (from query log), including type and natural-language answers.
- Leaderboard
- Dataset Paper
- More information
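Because doc IDs follow the [msmarco_passage_id]-[url_seq] format described above, documents sharing a prefix carry identical text. A minimal sketch that splits an ID at its last hyphen and groups IDs by passage prefix, assuming msmarco-passage IDs themselves contain no hyphen; the url_seq part is kept as a string since its exact form is not specified beyond being sequential. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_qna_doc_id(doc_id: str) -> Tuple[str, str]:
    """Split "[msmarco_passage_id]-[url_seq]" into its two parts (split at the last hyphen)."""
    passage_id, url_seq = doc_id.rsplit("-", 1)
    return passage_id, url_seq


def group_by_passage(doc_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group QnA doc IDs by passage prefix; members of a group share the same text."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id in doc_ids:
        passage_id, _url_seq = split_qna_doc_id(doc_id)
        groups[passage_id].append(doc_id)
    return dict(groups)


print(group_by_passage(["7-0", "7-1", "8-2"]))
```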
-
Dataset irds.msmarco-qna.documents
datamaestro_text.datasets.irds.data.Documents
The MS MARCO Question Answering dataset. This is the source collection of msmarco-passage and msmarco-document.
It is prohibited to use information from this dataset for submissions to the MS MARCO passage and document leaderboards or the TREC DL shared task.
Query IDs in this collection align with those found in msmarco-passage and msmarco-document. The collection does not provide doc_ids, so these are assigned in the following format: [msmarco_passage_id]-[url_seq], where [msmarco_passage_id] is the document from msmarco-passage that has matching contents and [url_seq] is assigned sequentially for each URL encountered. In other words, all documents with the same prefix have the same text; they only differ in the originating document.
Doc msmarco_passage_id fields are assigned by matching passage contents in msmarco-passage, and this field is provided for every document. Doc msmarco_document_id fields are assigned by matching the URL to the one found in msmarco-document. Due to how msmarco-document was constructed, there is not necessarily a match (value will be None if no match).
- Documents: Short passages (from web)
- Queries: Natural language questions (from query log), including type and natural-language answers.
- Leaderboard
- Dataset Paper
- More information
-
Dataset irds.msmarco-qna.dev.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
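Since the score here encodes presentation order rather than a retrieval score, recovering the order means grouping scoreddocs by query and sorting by score. A minimal sketch over (query_id, doc_id, score) triples; whether a higher score means "shown earlier" is not stated in this description, so the direction is left as the `higher_first` parameter, an assumption you should check against the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def presentation_order(
    scoreddocs: Iterable[Tuple[str, str, float]],  # (query_id, doc_id, score)
    higher_first: bool = True,                     # assumed direction; verify on real data
) -> Dict[str, List[str]]:
    """Return, per query, the doc IDs sorted by score in the assumed presentation order."""
    per_query: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for query_id, doc_id, score in scoreddocs:
        per_query[query_id].append((score, doc_id))
    return {
        query_id: [doc_id for _score, doc_id in sorted(items, key=lambda x: x[0], reverse=higher_first)]
        for query_id, items in per_query.items()
    }


run = [("q", "a", 1.0), ("q", "b", 3.0), ("q", "c", 2.0)]
print(presentation_order(run))
```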
-
Dataset irds.msmarco-qna.dev.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.dev
datamaestro_text.datasets.irds.data.Adhoc
Official dev set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.eval.queries
datamaestro_text.datasets.irds.data.Topics
Official eval set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.eval.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official eval set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
-
Dataset irds.msmarco-qna.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set.
The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order in which they were presented.
nano-beir/arguana
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.nano-beir.arguana.documents
datamaestro_text.datasets.irds.data.Documents
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.nano-beir.arguana.queries
datamaestro_text.datasets.irds.data.Topics
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.nano-beir.arguana.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the ArguAna Counterargs dataset, for argument retrieval.
-
Dataset irds.nano-beir.arguana
datamaestro_text.datasets.irds.data.Adhoc
A version of the ArguAna Counterargs dataset, for argument retrieval.
nano-beir/climate-fever
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.nano-beir.climate-fever.documents
datamaestro_text.datasets.irds.data.Documents
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.nano-beir.climate-fever.queries
datamaestro_text.datasets.irds.data.Topics
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.nano-beir.climate-fever.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
-
Dataset irds.nano-beir.climate-fever
datamaestro_text.datasets.irds.data.Adhoc
A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.
nano-beir/dbpedia-entity
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.nano-beir.dbpedia-entity.documents
datamaestro_text.datasets.irds.data.Documents
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.nano-beir.dbpedia-entity.queries
datamaestro_text.datasets.irds.data.Topics
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.nano-beir.dbpedia-entity.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
-
Dataset irds.nano-beir.dbpedia-entity
datamaestro_text.datasets.irds.data.Adhoc
A version of the DBPedia-Entity-v2 dataset for entity retrieval.
nano-beir/fever
A version of the FEVER dataset for fact verification.
-
Dataset irds.nano-beir.fever.documents
datamaestro_text.datasets.irds.data.Documents
A version of the FEVER dataset for fact verification.
-
Dataset irds.nano-beir.fever.queries
datamaestro_text.datasets.irds.data.Topics
A version of the FEVER dataset for fact verification.
-
Dataset irds.nano-beir.fever.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the FEVER dataset for fact verification.
-
Dataset irds.nano-beir.fever
datamaestro_text.datasets.irds.data.Adhoc
A version of the FEVER dataset for fact verification.
nano-beir/fiqa
A version of the FIQA-2018 dataset (financial opinion question answering).
-
Dataset irds.nano-beir.fiqa.documents
datamaestro_text.datasets.irds.data.Documents
A version of the FIQA-2018 dataset (financial opinion question answering).
-
Dataset irds.nano-beir.fiqa.queries
datamaestro_text.datasets.irds.data.Topics
A version of the FIQA-2018 dataset (financial opinion question answering).
-
Dataset irds.nano-beir.fiqa.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the FIQA-2018 dataset (financial opinion question answering).
-
Dataset irds.nano-beir.fiqa
datamaestro_text.datasets.irds.data.Adhoc
A version of the FIQA-2018 dataset (financial opinion question answering).
nano-beir/hotpotqa
A version of the Hotpot QA dataset for multi-hop question answering.
-
Dataset irds.nano-beir.hotpotqa.documents
datamaestro_text.datasets.irds.data.Documents
A version of the Hotpot QA dataset for multi-hop question answering.
-
Dataset irds.nano-beir.hotpotqa.queries
datamaestro_text.datasets.irds.data.Topics
A version of the Hotpot QA dataset for multi-hop question answering.
-
Dataset irds.nano-beir.hotpotqa.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the Hotpot QA dataset for multi-hop question answering.
-
Dataset irds.nano-beir.hotpotqa
datamaestro_text.datasets.irds.data.Adhoc
A version of the Hotpot QA dataset for multi-hop question answering.
nano-beir/msmarco
A version of the MS MARCO passage ranking dataset.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
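The description does not spell out what the encoding problems are. One common kind in web-derived text is mojibake, where UTF-8 bytes were decoded as Windows-1252 (so "café" appears as "cafÃ©"). The sketch below reverses that single pattern and leaves any text it cannot round-trip unchanged; it is an illustration only, not the correction ir-datasets applies to msmarco-passage. Dedicated libraries such as ftfy cover many more cases.

```python
def undo_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was mis-decoded as Windows-1252, if that round-trips cleanly.

    Text that cannot be re-encoded as cp1252, or whose bytes are not valid UTF-8,
    is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text


print(undo_mojibake("cafÃ©"))
```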
-
Dataset irds.nano-beir.msmarco.documents
datamaestro_text.datasets.irds.data.Documents
A version of the MS MARCO passage ranking dataset.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
-
Dataset irds.nano-beir.msmarco.queries
datamaestro_text.datasets.irds.data.Topics
A version of the MS MARCO passage ranking dataset.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
-
Dataset irds.nano-beir.msmarco.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the MS MARCO passage ranking dataset.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
-
Dataset irds.nano-beir.msmarco
datamaestro_text.datasets.irds.data.Adhoc
A version of the MS MARCO passage ranking dataset.
Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.
- Leaderboard
- Dataset Paper
- See also: msmarco-passage
nano-beir/nfcorpus
A version of the NF Corpus (Nutrition Facts).
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.nano-beir.nfcorpus.documents
datamaestro_text.datasets.irds.data.Documents
A version of the NF Corpus (Nutrition Facts).
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.nano-beir.nfcorpus.queries
datamaestro_text.datasets.irds.data.Topics
A version of the NF Corpus (Nutrition Facts).
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.nano-beir.nfcorpus.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the NF Corpus (Nutrition Facts).
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
-
Dataset irds.nano-beir.nfcorpus
datamaestro_text.datasets.irds.data.Adhoc
A version of the NF Corpus (Nutrition Facts).
Data pre-processing may be different than what is done in nfcorpus.
- Dataset website
- Dataset paper
- See also: nfcorpus
nano-beir/nq
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
-
Dataset irds.nano-beir.nq.documents
datamaestro_text.datasets.irds.data.Documents
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
-
Dataset irds.nano-beir.nq.queries
datamaestro_text.datasets.irds.data.Topics
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
-
Dataset irds.nano-beir.nq.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
-
Dataset irds.nano-beir.nq
datamaestro_text.datasets.irds.data.Adhoc
A version of the Natural Questions dev dataset.
Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.
nano-beir/quora
A version of the Quora duplicate question detection dataset (QQP).
-
Dataset irds.nano-beir.quora.documents
datamaestro_text.datasets.irds.data.Documents
A version of the Quora duplicate question detection dataset (QQP).
-
Dataset irds.nano-beir.quora.queries
datamaestro_text.datasets.irds.data.Topics
A version of the Quora duplicate question detection dataset (QQP).
-
Dataset irds.nano-beir.quora.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the Quora duplicate question detection dataset (QQP).
-
Dataset irds.nano-beir.quora
datamaestro_text.datasets.irds.data.Adhoc
A version of the Quora duplicate question detection dataset (QQP).
nano-beir/scidocs
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.nano-beir.scidocs.documents
datamaestro_text.datasets.irds.data.Documents
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.nano-beir.scidocs.queries
datamaestro_text.datasets.irds.data.Topics
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.nano-beir.scidocs.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the SciDocs dataset, used for citation retrieval.
-
Dataset irds.nano-beir.scidocs
datamaestro_text.datasets.irds.data.Adhoc
A version of the SciDocs dataset, used for citation retrieval.
nano-beir/scifact
A version of the SciFact dataset, for fact verification.
-
Dataset irds.nano-beir.scifact.documents
datamaestro_text.datasets.irds.data.Documents
A version of the SciFact dataset, for fact verification.
-
Dataset irds.nano-beir.scifact.queries
datamaestro_text.datasets.irds.data.Topics
A version of the SciFact dataset, for fact verification.
-
Dataset irds.nano-beir.scifact.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of the SciFact dataset, for fact verification.
-
Dataset irds.nano-beir.scifact
datamaestro_text.datasets.irds.data.Adhoc
A version of the SciFact dataset, for fact verification.
nano-beir/webis-touche2020
Original version of the Touché-2020 dataset, for argument retrieval.
-
Dataset irds.nano-beir.webis-touche2020.documents
datamaestro_text.datasets.irds.data.Documents
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.nano-beir.webis-touche2020.queries
datamaestro_text.datasets.irds.data.Topics
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.nano-beir.webis-touche2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
-
Dataset irds.nano-beir.webis-touche2020
datamaestro_text.datasets.irds.data.Adhoc
Original version of the Touché-2020 dataset, for argument retrieval.
Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.
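The IDs listed on this page follow a regular pattern: the `irds.` prefix, the ir_datasets ID with `/` written as `.`, and an optional trailing component (`documents`, `queries`, `qrels`, `scoreddocs`, `docpairs`). As a minimal sketch, assuming that pattern, the hypothetical helper `split_irds_id` below recovers the ir_datasets ID and the wrapper type this page documents for each component. It is not part of datamaestro-text, and it cannot disambiguate ir_datasets IDs that themselves contain a dot.

```python
from typing import Optional, Tuple

# Trailing components seen in the IDs on this page, mapped to the wrapper
# class documented for them (None: no wrapper type is documented here).
COMPONENT_TYPES = {
    "documents": "Documents",
    "queries": "Topics",
    "qrels": "AdhocAssessments",
    "scoreddocs": "AdhocRun",
    "docpairs": None,
}


def split_irds_id(dataset_id: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical helper: return (ir_datasets ID, component, wrapper type).

    Assumes the ir_datasets ID itself contains no '.' characters.
    """
    prefix = "irds."
    if not dataset_id.startswith(prefix):
        raise ValueError(f"not an irds dataset id: {dataset_id!r}")
    parts = dataset_id[len(prefix):].split(".")
    if parts[-1] in COMPONENT_TYPES:
        component = parts.pop()
        return "/".join(parts), component, COMPONENT_TYPES[component]
    # IDs without a trailing component are documented as Adhoc datasets
    return "/".join(parts), None, "Adhoc"


for ds in ("irds.nano-beir.scifact.qrels", "irds.nano-beir.webis-touche2020"):
    print(ds, "->", split_irds_id(ds))
```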
neumarco/fa
The msmarco-passage corpus, translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.documents
datamaestro_text.datasets.irds.data.Documents
The msmarco-passage corpus, translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.dev.small
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.docpairs
-
A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.judged.docpairs
-
A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).
-
Dataset irds.neumarco.fa.train.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).
neumarco/ru
The msmarco-passage corpus, translated to Russian.
-
Dataset irds.neumarco.ru.documents
datamaestro_text.datasets.irds.data.Documents
The msmarco-passage corpus, translated to Russian.
-
Dataset irds.neumarco.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/small, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/small, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.dev.small
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/small, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.docpairs
-
A version of msmarco-passage/train, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.judged.docpairs
-
A version of msmarco-passage/train/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train/judged, with the corpus translated to Russian.
-
Dataset irds.neumarco.ru.train.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train/judged, with the corpus translated to Russian.
neumarco/zh
The msmarco-passage corpus, translated to Chinese.
-
Dataset irds.neumarco.zh.documents
datamaestro_text.datasets.irds.data.Documents
The msmarco-passage corpus, translated to Chinese.
-
Dataset irds.neumarco.zh.dev.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.small.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/dev/small, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.small.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/dev/small, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.dev.small
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/dev/small, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.docpairs
-
A version of msmarco-passage/train, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.judged.queries
datamaestro_text.datasets.irds.data.Topics
A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.judged.docpairs
-
A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
-
Dataset irds.neumarco.zh.train.judged
datamaestro_text.datasets.irds.data.Adhoc
A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
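The three neuMARCO sections share one layout: a translated `documents` collection per language, plus `queries`, `qrels` and a combined dataset for each msmarco-passage split, with `docpairs` added for the training splits. The sketch below, assuming exactly the layout listed above, generates those IDs, for example to loop over languages in an experiment; `neumarco_ids` is a hypothetical helper, not a datamaestro-text function.

```python
from typing import List

LANGUAGES = ("fa", "ru", "zh")  # Persian (Farsi), Russian, Chinese
DEV_SPLITS = ("dev", "dev.judged", "dev.small")
TRAIN_SPLITS = ("train", "train.judged")


def neumarco_ids(lang: str) -> List[str]:
    """Hypothetical helper: list the datamaestro IDs documented above
    for one neuMARCO language."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown neuMARCO language: {lang!r}")
    base = f"irds.neumarco.{lang}"
    ids = [f"{base}.documents"]
    for split in DEV_SPLITS:
        # Dev splits: queries, qrels, and the combined Adhoc dataset
        ids += [f"{base}.{split}.queries", f"{base}.{split}.qrels", f"{base}.{split}"]
    for split in TRAIN_SPLITS:
        # Train splits also expose docpairs (document pairs for training)
        ids += [
            f"{base}.{split}.queries",
            f"{base}.{split}.docpairs",
            f"{base}.{split}.qrels",
            f"{base}.{split}",
        ]
    return ids
```

Any generated ID can then be passed to `prepare_dataset`, as in the usage example at the top of this page.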
NFCorpus (NutritionFacts)
"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."
-
Dataset irds.nfcorpus.documents
datamaestro_text.datasets.irds.data.Documents
"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."
-
Dataset irds.nfcorpus.dev.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.dev
datamaestro_text.datasets.irds.data.Adhoc
Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.dev.nontopic.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.dev.nontopic.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.dev.nontopic
datamaestro_text.datasets.irds.data.Adhoc
Official dev set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.dev.video.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.dev.video.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.dev.video
datamaestro_text.datasets.irds.data.Adhoc
Official dev set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.test.queries
datamaestro_text.datasets.irds.data.Topics
Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.test
datamaestro_text.datasets.irds.data.Adhoc
Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.test.nontopic.queries
datamaestro_text.datasets.irds.data.Topics
Official test set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.test.nontopic.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.test.nontopic
datamaestro_text.datasets.irds.data.Adhoc
Official test set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.test.video.queries
datamaestro_text.datasets.irds.data.Topics
Official test set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.test.video.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official test set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.test.video
datamaestro_text.datasets.irds.data.Adhoc
Official test set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts and comments).
-
Dataset irds.nfcorpus.train.nontopic.queries
datamaestro_text.datasets.irds.data.Topics
Official train set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.train.nontopic.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.train.nontopic
datamaestro_text.datasets.irds.data.Adhoc
Official train set, filtered to exclude queries from topic pages.
-
Dataset irds.nfcorpus.train.video.queries
datamaestro_text.datasets.irds.data.Topics
Official train set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.train.video.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set, filtered to only include queries from video pages.
-
Dataset irds.nfcorpus.train.video
datamaestro_text.datasets.irds.data.Adhoc
Official train set, filtered to only include queries from video pages.
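The NFCorpus variants differ only in which queries they keep and in which query field is used as text. The sketch below illustrates that filtering, assuming a hypothetical `NFQuery` record whose `source` label names the page type the query was harvested from; the real ir_datasets query types, and how they encode the source page, may differ. It shows how the subsets relate to each other (every video query is also a non-topic query), not how datamaestro-text builds them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NFQuery:
    # Hypothetical record; ir_datasets defines its own NFCorpus query types
    query_id: str
    title: str
    all: str     # combined field: titles, descriptions, topics, transcripts, comments
    source: str  # hypothetical page-type label: "topic", "video" or "other"


def nontopic(queries: List[NFQuery]) -> List[NFQuery]:
    """Mirror the .nontopic subsets: drop queries harvested from topic pages."""
    return [q for q in queries if q.source != "topic"]


def video(queries: List[NFQuery]) -> List[NFQuery]:
    """Mirror the .video subsets: keep only queries harvested from video pages."""
    return [q for q in queries if q.source == "video"]


def query_text(q: NFQuery, field: str = "title") -> str:
    """Choose the query text: the short title or the combined "all" field."""
    if field not in ("title", "all"):
        raise ValueError(f"unknown field: {field!r}")
    return getattr(q, field)
```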
Natural Questions
Google Natural Questions is a Q&A dataset containing long, short, and Yes/No answers from Wikipedia. ir_datasets frames this around an ad-hoc ranking setting by building a collection of all long answer candidate passages. However, short and Yes/No annotations are also available in the qrels, as are the passages presented to the annotators (via scoreddocs).
Importantly, the document collection does not consist of all Wikipedia passages, but of the union of the candidate passages presented to the annotators (akin to MS MARCO). dpr-w100/natural-questions/train and dpr-w100/natural-questions/dev contain a filtered set of the questions in this dataset together with a full Wikipedia dump (a more realistic retrieval setting).
- Dataset website
- Dataset paper
- See also: dpr-w100
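To make the contrast with a full-Wikipedia setting concrete, here is a toy sketch of how a collection built from annotator candidates relates to the per-question candidate lists exposed as scoreddocs. The data and the helper names are hypothetical; this illustrates the construction described above and is not the ir_datasets implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: question ID -> IDs of the long-answer candidate
# passages shown to the annotator for that question.
candidates: Dict[str, List[str]] = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p3", "p4"],
}


def build_collection(cands: Dict[str, List[str]]) -> Set[str]:
    """The collection is the union of all candidate passages,
    not every passage in Wikipedia."""
    collection: Set[str] = set()
    for passages in cands.values():
        collection.update(passages)
    return collection


def candidate_pairs(cands: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """(query_id, doc_id) pairs: the passages each annotator was shown."""
    return [(qid, pid) for qid, passages in cands.items() for pid in passages]
```

A passage shown for several questions (here `p3`) appears once in the collection but once per question in the candidate pairs.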
-
Dataset irds.natural-questions.documents
datamaestro_text.datasets.irds.data.Documents
Google Natural Questions is a Q&A dataset containing long, short, and Yes/No answers from Wikipedia. ir_datasets frames this around an ad-hoc ranking setting by building a collection of all long answer candidate passages. However, short and Yes/No annotations are also available in the qrels, as are the passages presented to the annotators (via scoreddocs).
Importantly, the document collection does not consist of all Wikipedia passages, but of the union of the candidate passages presented to the annotators (akin to MS MARCO). dpr-w100/natural-questions/train and dpr-w100/natural-questions/dev contain a filtered set of the questions in this dataset together with a full Wikipedia dump (a more realistic retrieval setting).
- Dataset website
- Dataset paper
- See also: dpr-w100
-
Dataset irds.natural-questions.dev.queries
datamaestro_text.datasets.irds.data.Topics
Official dev set.
-
Dataset irds.natural-questions.dev.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official dev set.
-
Dataset irds.natural-questions.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official dev set.
-
Dataset irds.natural-questions.dev
datamaestro_text.datasets.irds.data.Adhoc
Official dev set.
-
Dataset irds.natural-questions.train.queries
datamaestro_text.datasets.irds.data.Topics
Official train set.
-
Dataset irds.natural-questions.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official train set.
-
Dataset irds.natural-questions.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official train set.
-
Dataset irds.natural-questions.train
datamaestro_text.datasets.irds.data.Adhoc
Official train set.
NYT
The New York Times Annotated Corpus, consisting of articles published between 1987 and 2007. It is used in TREC Core 2017 and is also useful for transferring relevance signals in cases where training data is in short supply.
Uses data from LDC2008T19. The source collection can be downloaded from the LDC.
-
Dataset irds.nyt.documents
datamaestro_text.datasets.irds.data.Documents
The New York Times Annotated Corpus, consisting of articles published between 1987 and 2007. It is used in TREC Core 2017 and is also useful for transferring relevance signals in cases where training data is in short supply.
Uses data from LDC2008T19. The source collection can be downloaded from the LDC.
-
Dataset irds.nyt.trec-core-2017.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Common Core 2017 benchmark.
Note that this dataset only contains the 50 queries assessed by NIST.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
- Shared Task Paper
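The queries follow the classic TREC layout with keyword (title), description and narrative fields. In datamaestro-text you would normally obtain parsed topics through the `Topics` wrapper, so the parser below is only a sketch, assuming the common SGML-like `<top>`/`<num>`/`<title>`/`<desc>`/`<narr>` markup (with or without closing tags), to show how the three fields are laid out; it is not the parser ir_datasets uses, and `parse_trec_topics` is a hypothetical name.

```python
import re
from dataclasses import dataclass
from typing import List

_TOP = re.compile(r"<top>(.*?)</top>", re.S)
_CLOSING = re.compile(r"</(?:num|title|desc|narr)>")
# A field runs from its opening tag to the next field tag (or the block end)
_FIELD = re.compile(r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|\Z)", re.S)
# Leading labels such as "Number:" or "Description:" are dropped
_LABEL = re.compile(r"^\s*(?:Number:|Description:|Narrative:)", re.I)


@dataclass
class TrecTopic:
    number: str
    title: str        # keyword query
    description: str  # one-sentence statement of the information need
    narrative: str    # what makes a document relevant


def parse_trec_topics(text: str) -> List[TrecTopic]:
    topics = []
    for block in _TOP.findall(text):
        block = _CLOSING.sub("", block)  # tolerate optional closing tags
        fields = {
            tag: " ".join(_LABEL.sub("", body).split())  # collapse whitespace
            for tag, body in _FIELD.findall(block)
        }
        topics.append(TrecTopic(
            fields.get("num", ""), fields.get("title", ""),
            fields.get("desc", ""), fields.get("narr", ""),
        ))
    return topics
```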
-
Dataset irds.nyt.trec-core-2017.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Common Core 2017 benchmark.
Note that this dataset only contains the 50 queries assessed by NIST.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
- Shared Task Paper
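"Deeply annotated" means many documents are judged per query. As a rough sketch, assuming the standard four-column TREC qrels text format (`query_id iteration doc_id relevance`), the hypothetical helpers below parse such lines and count judged and relevant documents per query; within datamaestro-text the `AdhocAssessments` wrapper is the intended access path.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC-style qrels lines into {query_id: {doc_id: relevance}}.

    Assumes exactly four whitespace-separated columns per non-empty line.
    """
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        qid, _iteration, docid, rel = line.split()
        qrels[qid][docid] = int(rel)
    return dict(qrels)


def judged_counts(qrels: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of judged documents per query (the annotation depth)."""
    return {qid: len(docs) for qid, docs in qrels.items()}


def relevant_counts(qrels: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of documents judged relevant (grade > 0) per query."""
    return {qid: sum(1 for r in docs.values() if r > 0) for qid, docs in qrels.items()}
```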
-
Dataset irds.nyt.trec-core-2017
datamaestro_text.datasets.irds.data.Adhoc
The TREC Common Core 2017 benchmark.
Note that this dataset only contains the 50 queries assessed by NIST.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
- Shared Task Paper
-
Dataset irds.nyt.wksup.queries
datamaestro_text.datasets.irds.data.Topics
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup
datamaestro_text.datasets.irds.data.Adhoc
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup.train.queries
datamaestro_text.datasets.irds.data.Topics
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup.train
datamaestro_text.datasets.irds.data.Adhoc
Training set (excluding the held-out nyt/wksup/valid split) for transferring relevance signals from the NYT corpus.
-
Dataset irds.nyt.wksup.valid.queries
datamaestro_text.datasets.irds.data.Topics
Held-out validation set for transferring relevance signals from the NYT corpus (see nyt/wksup/train).
-
Dataset irds.nyt.wksup.valid.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Held-out validation set for transferring relevance signals from the NYT corpus (see nyt/wksup/train).
-
Dataset irds.nyt.wksup.valid
datamaestro_text.datasets.irds.data.Adhoc
Held-out validation set for transferring relevance signals from the NYT corpus (see nyt/wksup/train).
pmc/v1
Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts and full text. Collected from the open access subset on January 21, 2014.
- Information on documents
-
Dataset irds.pmc.v1.documents
datamaestro_text.datasets.irds.data.Documents
Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts and full text. Collected from the open access subset on January 21, 2014.
-
Dataset irds.pmc.v1.trec-cds-2014.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Clinical Decision Support (CDS) track from 2014.
-
Dataset irds.pmc.v1.trec-cds-2014.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Clinical Decision Support (CDS) track from 2014.
-
Dataset irds.pmc.v1.trec-cds-2014
datamaestro_text.datasets.irds.data.Adhoc
The TREC Clinical Decision Support (CDS) track from 2014.
-
Dataset irds.pmc.v1.trec-cds-2015.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Clinical Decision Support (CDS) track from 2015.
-
Dataset irds.pmc.v1.trec-cds-2015.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Clinical Decision Support (CDS) track from 2015.
-
Dataset irds.pmc.v1.trec-cds-2015
datamaestro_text.datasets.irds.data.Adhoc
The TREC Clinical Decision Support (CDS) track from 2015.
pmc/v2
Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts and full text. Collected from the open access subset on March 28, 2016.
- Information on documents
-
Dataset irds.pmc.v2.documents
datamaestro_text.datasets.irds.data.Documents
Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts and full text. Collected from the open access subset on March 28, 2016.
-
Dataset irds.pmc.v2.trec-cds-2016.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Clinical Decision Support (CDS) track from 2016.
-
Dataset irds.pmc.v2.trec-cds-2016.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Clinical Decision Support (CDS) track from 2016.
-
Dataset irds.pmc.v2.trec-cds-2016
datamaestro_text.datasets.irds.data.Adhoc
The TREC Clinical Decision Support (CDS) track from 2016.
Touché Image Search
Corpus version 2022-06-13 with 23,841 images, released on Zenodo on June 13, 2022.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.touche-image.2022-06-13.documents
datamaestro_text.datasets.irds.data.Documents
Corpus version 2022-06-13 with 23,841 images, released on Zenodo on June 13, 2022.
This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
-
Dataset irds.touche-image.2022-06-13.touche-2022-task-3.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.
Systems are evaluated on Touché topics 1-50 by the fraction of the 20 images retrieved per topic (10 per stance) that satisfy all three criteria: they are relevant to the topic, argumentative, and support the associated stance.
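The per-topic measure described above can be sketched as follows. The sketch assumes that each judged image carries three hypothetical labels (topical relevance, argumentativeness, and the stance it supports), that unjudged or missing images count as misses, and that the denominator is always the full 20 slots; a system-level score would then aggregate over topics 1-50 (for example by averaging, also an assumption). It illustrates the definition and is not the official Touché evaluation script.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ImageJudgment:
    # Hypothetical per-image labels
    relevant: bool
    argumentative: bool
    stance: str  # stance the image actually supports: "pro", "con" or "none"


def topic_score(
    retrieved: Dict[str, List[str]],
    judgments: Dict[str, ImageJudgment],
    k_per_stance: int = 10,
) -> float:
    """Fraction of the 2 * k_per_stance slots filled with images that are
    relevant, argumentative, and support the stance they were retrieved for."""
    hits = 0
    for stance in ("pro", "con"):
        for image_id in retrieved.get(stance, [])[:k_per_stance]:
            j = judgments.get(image_id)  # unjudged images count as misses
            if j is not None and j.relevant and j.argumentative and j.stance == stance:
                hits += 1
    return hits / (2 * k_per_stance)
```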
-
Dataset irds.touche-image.2022-06-13.touche-2022-task-3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.
Systems are evaluated on Touché topics 1-50 by the fraction of the 20 images retrieved per topic (10 per stance) that satisfy all three criteria: they are relevant to the topic, argumentative, and support the associated stance.
-
Dataset irds.touche-image.2022-06-13.touche-2022-task-3
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.
Systems are evaluated on Touché topics 1-50 by the fraction of the 20 images retrieved per topic (10 per stance) that satisfy all three criteria: they are relevant to the topic, argumentative, and support the associated stance.
Touché 2022 Task 2: Argument Retrieval for Comparative Questions
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both, and to detect their respective stances towards the object they talk about.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether it has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
Additionally, systems classify the stance of the retrieved passages towards the objects compared in the question. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the cat object, while "Cats are less faithful than dogs" should be classified as supporting the dog object.
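The stance part is a multi-class classification problem over retrieved passages. A sketch of scoring it with macro-averaged F1 follows; the four labels (pro first object, pro second object, neutral, no stance) and the use of macro-F1 are assumptions for illustration, not a statement of the official Touché 2022 evaluation. In the cat/dog example, the first answer candidate would be labeled `first` and the second one `second`.

```python
from typing import List, Sequence

# Assumed label set: stance towards the first/second compared object
LABELS = ("first", "second", "neutral", "no")


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-label F1 scores.

    A label with no gold and no predicted instances contributes F1 = 0,
    so including unused labels lowers the score (a convention choice).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    f1s: List[float] = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```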
-
Dataset irds.clueweb12.touche-2022-task-2.documents
datamaestro_text.datasets.irds.data.Documents
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both, and to detect their respective stances towards the object they talk about.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether it has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
Additionally, systems classify the stance of the retrieved passages towards the objects compared in the question. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the cat object, while "Cats are less faithful than dogs" should be classified as supporting the dog object.
-
Dataset irds.clueweb12.touche-2022-task-2.queries
datamaestro_text.datasets.irds.data.Topics
Decision-making processes, whether at the societal or the personal level, often reach a point where one side challenges the other with a why-question, that is, a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 and features three tasks.
Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both, and to detect their respective stances towards the object they talk about.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether it has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes use of other detrimental style choices.
Additionally, systems classify the stance of the retrieved passages towards the objects compared in the question. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the cat object, while "Cats are less faithful than dogs" should be classified as supporting the dog object.
-
Dataset irds.clueweb12.touche-2022-task-2.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022, featuring three tasks.
Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either or both of the compared objects and to detect their stance towards the object they discuss.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether it has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes other detrimental style choices.
Additionally, the stance of the retrieved passages towards the compared objects in question must be classified. For instance, in the question "Who is a better friend, a cat or a dog?" the terms cat and dog are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as supporting the cat object, while "Cats are less faithful than dogs" should be classified as supporting the dog object.
-
Dataset irds.clueweb12.touche-2022-task-2
datamaestro_text.datasets.irds.data.Adhoc
Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022, featuring three tasks.
Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either or both of the compared objects and to detect their stance towards the object they discuss.
Documents are judged on their general topical relevance and on their rhetorical quality, i.e., the "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether it has a proper sentence structure and is easy to read, and (3) whether it includes profanity, has typos, or makes other detrimental style choices.
Additionally, the stance of the retrieved passages towards the compared objects in question must be classified. For instance, in the question "Who is a better friend, a cat or a dog?" the terms cat and dog are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as supporting the cat object, while "Cats are less faithful than dogs" should be classified as supporting the dog object.
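To make the notion of comparison objects concrete, the sketch below extracts them from a question that ends in "X or Y?". It is a deliberately naive heuristic and an assumption on our part: `comparison_objects` is a hypothetical helper, not part of datamaestro or ir_datasets, and it only handles single-word objects optionally preceded by an article.

```python
import re
from typing import Optional, Tuple

# Naive heuristic (hypothetical): "X or Y?" at the end of the question, where
# X and Y are single words optionally preceded by "a", "an" or "the".
_COMPARISON = re.compile(
    r"(?:\b(?:a|an|the)\s+)?\b([\w-]+)\s+or\s+(?:(?:a|an|the)\s+)?([\w-]+)\s*\?\s*$",
    re.IGNORECASE,
)


def comparison_objects(question: str) -> Optional[Tuple[str, str]]:
    """Return the two compared objects (lower-cased), or None if no "X or Y?" ending is found."""
    match = _COMPARISON.search(question)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


print(comparison_objects("Who is a better friend, a cat or a dog?"))
```

Real comparative questions often involve multi-word objects ("electric cars or hybrid cars"), so a learned extractor or the topic's own metadata would be needed in practice.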
Touché 2022 Task 2: Argument Retrieval for Comparative Questions (Expanded)
Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.
-
Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.documents
datamaestro_text.datasets.irds.data.Documents
Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.
-
Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.queries
datamaestro_text.datasets.irds.data.Topics
Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.
-
Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.
-
Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query
datamaestro_text.datasets.irds.data.Adhoc
Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.
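The document expansion behind these subsets can be sketched as follows. This is an illustration under stated assumptions: the query generator is a stub standing in for the T5 model, `expand_passage` and `dummy_generator` are hypothetical names (not ir_datasets functions), and the way the pre-processed files actually join generated queries to passages may differ.

```python
from typing import Callable, List

# A query generator maps (passage text, k) to k predicted queries.
# In docT5query this role is played by a fine-tuned T5 model; here it is only a type alias.
QueryGenerator = Callable[[str, int], List[str]]


def expand_passage(text: str, generate: QueryGenerator, k: int = 3) -> str:
    """Doc2Query-style expansion: append k generated queries to the passage text."""
    queries = generate(text, k)
    return " ".join([text, *queries])


def dummy_generator(text: str, k: int) -> List[str]:
    """Hypothetical stand-in for the T5 model: returns growing prefixes of the passage."""
    words = text.split()
    return [" ".join(words[: i + 2]) for i in range(k)]


expanded = expand_passage("cats are affectionate pets", dummy_generator, k=2)
print(expanded)
```

Appending, rather than replacing, keeps the original terms, so a lexical ranker such as BM25 can match both the passage text and the predicted queries.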
TREC Arabic
A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.
Document collection from LDC2001T55.
-
Dataset irds.trec-arabic.documents
datamaestro_text.datasets.irds.data.Documents
A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.
Document collection from LDC2001T55.
-
Dataset irds.trec-arabic.ar2001.queries
datamaestro_text.datasets.irds.data.Topics
Arabic benchmark from TREC 2001.
-
Dataset irds.trec-arabic.ar2001.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Arabic benchmark from TREC 2001.
-
Dataset irds.trec-arabic.ar2001
datamaestro_text.datasets.irds.data.Adhoc
Arabic benchmark from TREC 2001.
-
Dataset irds.trec-arabic.ar2002.queries
datamaestro_text.datasets.irds.data.Topics
Arabic benchmark from TREC 2002.
-
Dataset irds.trec-arabic.ar2002.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Arabic benchmark from TREC 2002.
-
Dataset irds.trec-arabic.ar2002
datamaestro_text.datasets.irds.data.Adhoc
Arabic benchmark from TREC 2002.
TREC Mandarin
A collection of news articles in Mandarin (Simplified Chinese), used for multi-lingual evaluation in TREC 5 and TREC 6.
Document collection from LDC2000T52.
-
Dataset irds.trec-mandarin.documents
datamaestro_text.datasets.irds.data.Documents
A collection of news articles in Mandarin (Simplified Chinese), used for multi-lingual evaluation in TREC 5 and TREC 6.
Document collection from LDC2000T52.
-
Dataset irds.trec-mandarin.trec5.queries
datamaestro_text.datasets.irds.data.Topics
Mandarin Chinese benchmark from TREC 5.
-
Dataset irds.trec-mandarin.trec5.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Mandarin Chinese benchmark from TREC 5.
-
Dataset irds.trec-mandarin.trec5
datamaestro_text.datasets.irds.data.Adhoc
Mandarin Chinese benchmark from TREC 5.
-
Dataset irds.trec-mandarin.trec6.queries
datamaestro_text.datasets.irds.data.Topics
Mandarin Chinese benchmark from TREC 6.
-
Dataset irds.trec-mandarin.trec6.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Mandarin Chinese benchmark from TREC 6.
-
Dataset irds.trec-mandarin.trec6
datamaestro_text.datasets.irds.data.Adhoc
Mandarin Chinese benchmark from TREC 6.
TREC Spanish
A collection of news articles in Spanish, used for multi-lingual evaluation in TREC 3 and TREC 4.
Document collection from LDC2000T51.
-
Dataset irds.trec-spanish.documents
datamaestro_text.datasets.irds.data.Documents
A collection of news articles in Spanish, used for multi-lingual evaluation in TREC 3 and TREC 4.
Document collection from LDC2000T51.
-
Dataset irds.trec-spanish.trec3.queries
datamaestro_text.datasets.irds.data.Topics
Spanish benchmark from TREC 3.
-
Dataset irds.trec-spanish.trec3.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Spanish benchmark from TREC 3.
-
Dataset irds.trec-spanish.trec3
datamaestro_text.datasets.irds.data.Adhoc
Spanish benchmark from TREC 3.
-
Dataset irds.trec-spanish.trec4.queries
datamaestro_text.datasets.irds.data.Topics
Spanish benchmark from TREC 4.
-
Dataset irds.trec-spanish.trec4.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Spanish benchmark from TREC 4.
-
Dataset irds.trec-spanish.trec4
datamaestro_text.datasets.irds.data.Adhoc
Spanish benchmark from TREC 4.
trec-tot/2023
Corpus for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.documents
datamaestro_text.datasets.irds.data.Documents
Corpus for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.train.queries
datamaestro_text.datasets.irds.data.Topics
Training query set for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training query set for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.train
datamaestro_text.datasets.irds.data.Adhoc
Training query set for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.dev.queries
datamaestro_text.datasets.irds.data.Topics
Dev query set for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Dev query set for the TREC 2023 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2023.dev
datamaestro_text.datasets.irds.data.Adhoc
Dev query set for the TREC 2023 tip-of-the-tongue search track.
trec-tot/2024
Corpus for the TREC 2024 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2024.documents
datamaestro_text.datasets.irds.data.Documents
Corpus for the TREC 2024 tip-of-the-tongue search track.
-
Dataset irds.trec-tot.2024.test.queries
datamaestro_text.datasets.irds.data.Topics
Test query set for the TREC 2024 tip-of-the-tongue search track.
TripClick
TripClick is a large collection from the Trip Database. Relevance is inferred from click signals.
A copy of this dataset can be obtained from the Trip Database through the process described here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".
- Documents: Medline article titles and abstracts
- Queries: user queries issued to the Trip Database
- Qrels: Inferred from clicks
- Dataset request form
- Dataset website
- Dataset paper
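As a rough illustration of what "relevance inferred from clicks" means, the sketch below turns a click log into binary qrels by treating any sufficiently clicked document as relevant. This is a simplified assumption, not the TripClick labelling procedure (the `dctr` subsets, as the name suggests, presumably rely on a click-through-rate model instead); the log format and the `qrels_from_clicks` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical click-log entry: (query_id, clicked_doc_id)
Click = Tuple[str, str]


def qrels_from_clicks(clicks: Iterable[Click], min_clicks: int = 1) -> Dict[str, Dict[str, int]]:
    """Binary qrels: a document is relevant (1) to a query if clicked at least min_clicks times."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for qid, did in clicks:
        counts[(qid, did)] += 1

    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (qid, did), n in counts.items():
        if n >= min_clicks:
            qrels[qid][did] = 1
    return dict(qrels)


log = [("q1", "d1"), ("q1", "d1"), ("q1", "d2"), ("q2", "d3")]
print(qrels_from_clicks(log, min_clicks=2))
```

Raising `min_clicks` trades recall for precision: fewer documents are labelled relevant, but accidental clicks matter less.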
-
Dataset irds.tripclick.documents
datamaestro_text.datasets.irds.data.Documents
TripClick is a large collection from the Trip Database. Relevance is inferred from click signals.
A copy of this dataset can be obtained from the Trip Database through the process described here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".
- Documents: Medline article titles and abstracts
- Queries: user queries issued to the Trip Database
- Qrels: Inferred from clicks
- Dataset request form
- Dataset website
- Dataset paper
-
Dataset irds.tripclick.test.queries
datamaestro_text.datasets.irds.data.Topics
Test subset of tripclick, including all queries from tripclick/test/head, tripclick/test/torso, and tripclick/test/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test subset of tripclick, including all queries from tripclick/test/head, tripclick/test/torso, and tripclick/test/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.test.head.queries
datamaestro_text.datasets.irds.data.Topics
The most frequent queries in the test set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.test.head.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The most frequent queries in the test set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.test.tail.queries
datamaestro_text.datasets.irds.data.Topics
The least frequent queries in the test set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.test.tail.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The least frequent queries in the test set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.test.torso.queries
datamaestro_text.datasets.irds.data.Topics
The moderately frequent queries in the test set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.test.torso.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The moderately frequent queries in the test set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.train.queries
datamaestro_text.datasets.irds.data.Topics
Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.
The dataset provides docpairs in full-text form; we map this text back to the query and document IDs. A small number of docpairs could not be mapped back and are skipped.
-
Dataset irds.tripclick.train.docpairs
-
Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.
The dataset provides docpairs in full-text form; we map this text back to the query and document IDs. A small number of docpairs could not be mapped back and are skipped.
-
Dataset irds.tripclick.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.
The dataset provides docpairs in full-text form; we map this text back to the query and document IDs. A small number of docpairs could not be mapped back and are skipped.
-
Dataset irds.tripclick.train
datamaestro_text.datasets.irds.data.Adhoc
Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.
The dataset provides docpairs in full-text form; we map this text back to the query and document IDs. A small number of docpairs could not be mapped back and are skipped.
-
Dataset irds.tripclick.train.head.queries
datamaestro_text.datasets.irds.data.Topics
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.head.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.head
datamaestro_text.datasets.irds.data.Adhoc
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.head.dctr.queries
datamaestro_text.datasets.irds.data.Topics
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.head.dctr.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.head.dctr
datamaestro_text.datasets.irds.data.Adhoc
The most frequent queries in the train set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.train.hofstaetter-triples.queries
datamaestro_text.datasets.irds.data.Topics
A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).
-
Dataset irds.tripclick.train.hofstaetter-triples.docpairs
-
A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).
-
Dataset irds.tripclick.train.hofstaetter-triples.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).
-
Dataset irds.tripclick.train.hofstaetter-triples
datamaestro_text.datasets.irds.data.Adhoc
A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).
-
Dataset irds.tripclick.train.tail.queries
datamaestro_text.datasets.irds.data.Topics
The least frequent queries in the train set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.train.tail.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The least frequent queries in the train set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.train.tail
datamaestro_text.datasets.irds.data.Adhoc
The least frequent queries in the train set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.train.torso.queries
datamaestro_text.datasets.irds.data.Topics
The moderately frequent queries in the train set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.train.torso.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The moderately frequent queries in the train set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.train.torso
datamaestro_text.datasets.irds.data.Adhoc
The moderately frequent queries in the train set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.val.queries
datamaestro_text.datasets.irds.data.Topics
Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.val.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.val.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.val
datamaestro_text.datasets.irds.data.Adhoc
Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.
The scoreddocs are the official BM25 results from Anserini.
-
Dataset irds.tripclick.val.head.queries
datamaestro_text.datasets.irds.data.Topics
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head
datamaestro_text.datasets.irds.data.Adhoc
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.dctr.queries
datamaestro_text.datasets.irds.data.Topics
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.dctr.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.dctr.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.head.dctr
datamaestro_text.datasets.irds.data.Adhoc
The most frequent queries in the validation set. This represents 20% of the search engine traffic.
-
Dataset irds.tripclick.val.tail.queries
datamaestro_text.datasets.irds.data.Topics
The least frequent queries in the validation set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.val.tail.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The least frequent queries in the validation set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.val.tail.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The least frequent queries in the validation set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.val.tail
datamaestro_text.datasets.irds.data.Adhoc
The least frequent queries in the validation set. This represents 50% of the search engine traffic.
-
Dataset irds.tripclick.val.torso.queries
datamaestro_text.datasets.irds.data.Topics
The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.val.torso.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.val.torso.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.
-
Dataset irds.tripclick.val.torso
datamaestro_text.datasets.irds.data.Adhoc
The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.
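The head, torso, and tail subsets above partition queries by frequency so that they cover roughly 20%, 30%, and 50% of the search traffic. A minimal sketch of such a split, assuming a plain list of logged query strings; the `split_by_traffic` helper and its boundary rule are hypothetical and not necessarily how TripClick was built.

```python
from collections import Counter
from typing import Dict, Iterable


def split_by_traffic(query_log: Iterable[str], head: float = 0.2, torso: float = 0.3) -> Dict[str, str]:
    """Assign each distinct query to 'head', 'torso' or 'tail' by cumulative traffic share."""
    freq = Counter(query_log)
    total = sum(freq.values())
    buckets: Dict[str, str] = {}
    cumulative = 0
    # Most frequent queries first; a query's bucket depends on the traffic share seen before it
    for query, count in freq.most_common():
        share_before = cumulative / total
        if share_before < head:
            buckets[query] = "head"
        elif share_before < head + torso:
            buckets[query] = "torso"
        else:
            buckets[query] = "tail"
        cumulative += count
    return buckets


log = ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]
print(split_by_traffic(log))
```

The boundaries are only approximate: a single very frequent query can overshoot a threshold (in the toy log, "a" alone accounts for 40% of the traffic), which is why the percentages above describe traffic rather than query counts.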
tripclick/logs
Raw query logs from TripClick.
Note that this subset includes a broader set of documents than the main collection, but only their titles and URLs are provided.
-
Dataset irds.tripclick.logs.documents
datamaestro_text.datasets.irds.data.Documents
Raw query logs from TripClick.
Note that this subset includes a broader set of documents than the main collection, but only their titles and URLs are provided.
Tweets 2013 (Internet Archive)
A collection of tweets from a 2-month window archived by the Internet Archive. This collection can serve as a stand-in document collection for the TREC Microblog 2013-14 tasks. (Even though it is not exactly the same collection, Sequiera and Lin show that it is close enough.)
This collection is automatically downloaded from the Internet Archive, though download speeds are often slow, so it can take some time. ir_datasets constructs a new directory hierarchy during the download process to facilitate fast lookups and slices.
- Documents: Tweets
- Information about dataset (paper)
- Information about dataset (repository)
-
Dataset irds.tweets2013-ia.documents
datamaestro_text.datasets.irds.data.Documents
A collection of tweets from a 2-month window archived by the Internet Archive. This collection can serve as a stand-in document collection for the TREC Microblog 2013-14 tasks. (Even though it is not exactly the same collection, Sequiera and Lin show that it is close enough.)
This collection is automatically downloaded from the Internet Archive, though download speeds are often slow, so it can take some time. ir_datasets constructs a new directory hierarchy during the download process to facilitate fast lookups and slices.
- Documents: Tweets
- Information about dataset (paper)
- Information about dataset (repository)
-
Dataset irds.tweets2013-ia.trec-mb-2013.queries
datamaestro_text.datasets.irds.data.Topics
TREC Microblog 2013 test collection.
-
Dataset irds.tweets2013-ia.trec-mb-2013.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
TREC Microblog 2013 test collection.
-
Dataset irds.tweets2013-ia.trec-mb-2013
datamaestro_text.datasets.irds.data.Adhoc
TREC Microblog 2013 test collection.
-
Dataset irds.tweets2013-ia.trec-mb-2014.queries
datamaestro_text.datasets.irds.data.Topics
TREC Microblog 2014 test collection.
-
Dataset irds.tweets2013-ia.trec-mb-2014.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
TREC Microblog 2014 test collection.
-
Dataset irds.tweets2013-ia.trec-mb-2014
datamaestro_text.datasets.irds.data.Adhoc
TREC Microblog 2014 test collection.
Vaswani
A small corpus of roughly 11,000 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language keywords
- Dataset Information
-
Dataset irds.vaswani.documents
datamaestro_text.datasets.irds.data.Documents
A small corpus of roughly 11,000 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language keywords
- Dataset Information
-
Dataset irds.vaswani.queries
datamaestro_text.datasets.irds.data.Topics
A small corpus of roughly 11,000 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language keywords
- Dataset Information
-
Dataset irds.vaswani.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A small corpus of roughly 11,000 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language keywords
- Dataset Information
-
Dataset irds.vaswani
datamaestro_text.datasets.irds.data.Adhoc
A small corpus of roughly 11,000 scientific abstracts.
- Documents: Scientific abstracts
- Queries: Natural language keywords
- Dataset Information
wapo/v2
Version 2 of the Washington Post collection, consisting of articles published between 2012 and 2017.
The collection is obtained by requesting it from NIST.
body contains all body text in plain text format, including paragraphs and multimedia captions. body_paras_html contains only the source paragraphs, with HTML markup. body_media contains images, videos, tweets, and galleries, along with a link to the content and a textual caption.
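If you need plain text per paragraph rather than the flattened body, one option is to strip the markup from body_paras_html yourself. The sketch below uses only the standard library's `html.parser`; it assumes `body_paras_html` is a sequence of HTML strings, one per paragraph (check the actual document type in your ir_datasets version), and `paras_to_text` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" becomes "&"
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def paras_to_text(body_paras_html: Iterable[str]) -> str:
    """Strip markup from each paragraph and join non-empty paragraphs with blank lines."""
    paragraphs = []
    for fragment in body_paras_html:
        parser = _TextExtractor()
        parser.feed(fragment)
        parser.close()  # flush any buffered text
        text = "".join(parser.parts).strip()
        if text:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)


print(paras_to_text(["<p>Hello <b>world</b></p>", "<p>Tom &amp; Jerry</p>"]))
```

Keeping paragraph boundaries can be useful for passage-level indexing, which the single body field does not directly support.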
-
Dataset irds.wapo.v2.documents
datamaestro_text.datasets.irds.data.Documents
Version 2 of the Washington Post collection, consisting of articles published between 2012 and 2017.
The collection is obtained by requesting it from NIST.
body contains all body text in plain text format, including paragraphs and multimedia captions. body_paras_html contains only the source paragraphs, with HTML markup. body_media contains images, videos, tweets, and galleries, along with a link to the content and a textual caption.
-
Dataset irds.wapo.v2.trec-core-2018.queries
datamaestro_text.datasets.irds.data.Topics
The TREC Common Core 2018 benchmark.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
-
Dataset irds.wapo.v2.trec-core-2018.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC Common Core 2018 benchmark.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
-
Dataset irds.wapo.v2.trec-core-2018
datamaestro_text.datasets.irds.data.Adhoc
The TREC Common Core 2018 benchmark.
- Queries: TREC-style (keyword, description, narrative)
- Relevance: Deeply-annotated
- Shared Task Website
-
Dataset irds.wapo.v2.trec-news-2018.queries
datamaestro_text.datasets.irds.data.Topics
The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
-
Dataset irds.wapo.v2.trec-news-2018.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
-
Dataset irds.wapo.v2.trec-news-2018
datamaestro_text.datasets.irds.data.Adhoc
The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
-
Dataset irds.wapo.v2.trec-news-2019.queries
datamaestro_text.datasets.irds.data.Topics
The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
-
Dataset irds.wapo.v2.trec-news-2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
-
Dataset irds.wapo.v2.trec-news-2019
datamaestro_text.datasets.irds.data.Adhoc
The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.
- Queries: Articles via the doc_id field
- Shared Task Website
- Shared task paper
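In the background linking tasks the query is itself an article, referenced via its doc_id. One simple approach is to turn that article into a keyword query; the sketch below does this with raw term frequency only, assuming a plain `doc_id -> text` mapping. `background_query` and the tiny stopword list are hypothetical, and this is not the official TREC News baseline.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stopword list (an assumption; real systems use a fuller one)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "that", "with"}


def background_query(doc_id: str, docstore: Dict[str, str], k: int = 5) -> List[str]:
    """Turn a query article (given by its doc_id) into its k most frequent content terms."""
    text = docstore[doc_id]  # look up the full article text by its ID
    terms = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    return [term for term, _ in Counter(terms).most_common(k)]


docstore = {"d1": "The senate passed the budget. The budget vote in the senate was close."}
print(background_query("d1", docstore, k=2))
```

The resulting terms can then be issued against the same collection, excluding the query article itself from the ranking.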
wapo/v4
-
Dataset irds.wapo.v4.documents
wikiclir/ar
WikiCLIR with Arabic documents.
-
Dataset irds.wikiclir.ar.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Arabic documents.
-
Dataset irds.wikiclir.ar.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Arabic documents.
-
Dataset irds.wikiclir.ar.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Arabic documents.
-
Dataset irds.wikiclir.ar
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Arabic documents.
wikiclir/ca
WikiCLIR with Catalan documents.
-
Dataset irds.wikiclir.ca.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Catalan documents.
-
Dataset irds.wikiclir.ca.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Catalan documents.
-
Dataset irds.wikiclir.ca.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Catalan documents.
-
Dataset irds.wikiclir.ca
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Catalan documents.
wikiclir/cs
WikiCLIR with Czech documents.
-
Dataset irds.wikiclir.cs.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Czech documents.
-
Dataset irds.wikiclir.cs.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Czech documents.
-
Dataset irds.wikiclir.cs.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Czech documents.
-
Dataset irds.wikiclir.cs
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Czech documents.
wikiclir/de
WikiCLIR with German documents.
-
Dataset irds.wikiclir.de.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with German documents.
-
Dataset irds.wikiclir.de.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with German documents.
-
Dataset irds.wikiclir.de.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with German documents.
-
Dataset irds.wikiclir.de
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with German documents.
wikiclir/en-simple
WikiCLIR with Simple English documents.
-
Dataset irds.wikiclir.en-simple.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Simple English documents.
-
Dataset irds.wikiclir.en-simple.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Simple English documents.
-
Dataset irds.wikiclir.en-simple.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Simple English documents.
-
Dataset irds.wikiclir.en-simple
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Simple English documents.
wikiclir/es
WikiCLIR with Spanish documents.
-
Dataset irds.wikiclir.es.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Spanish documents.
-
Dataset irds.wikiclir.es.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Spanish documents.
-
Dataset irds.wikiclir.es.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Spanish documents.
-
Dataset irds.wikiclir.es
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Spanish documents.
wikiclir/fi
WikiCLIR with Finnish documents.
-
Dataset irds.wikiclir.fi.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Finnish documents.
-
Dataset irds.wikiclir.fi.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Finnish documents.
-
Dataset irds.wikiclir.fi.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Finnish documents.
-
Dataset irds.wikiclir.fi
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Finnish documents.
wikiclir/fr
WikiCLIR with French documents.
-
Dataset irds.wikiclir.fr.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with French documents.
-
Dataset irds.wikiclir.fr.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with French documents.
-
Dataset irds.wikiclir.fr.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with French documents.
-
Dataset irds.wikiclir.fr
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with French documents.
wikiclir/it
WikiCLIR with Italian documents.
-
Dataset irds.wikiclir.it.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Italian documents.
-
Dataset irds.wikiclir.it.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Italian documents.
-
Dataset irds.wikiclir.it.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Italian documents.
-
Dataset irds.wikiclir.it
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Italian documents.
wikiclir/ja
WikiCLIR with Japanese documents.
-
Dataset irds.wikiclir.ja.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Japanese documents.
-
Dataset irds.wikiclir.ja.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Japanese documents.
-
Dataset irds.wikiclir.ja.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Japanese documents.
-
Dataset irds.wikiclir.ja
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Japanese documents.
wikiclir/ko
WikiCLIR with Korean documents.
-
Dataset irds.wikiclir.ko.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Korean documents.
-
Dataset irds.wikiclir.ko.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Korean documents.
-
Dataset irds.wikiclir.ko.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Korean documents.
-
Dataset irds.wikiclir.ko
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Korean documents.
wikiclir/nl
WikiCLIR with Dutch documents.
-
Dataset irds.wikiclir.nl.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Dutch documents.
-
Dataset irds.wikiclir.nl.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Dutch documents.
-
Dataset irds.wikiclir.nl.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Dutch documents.
-
Dataset irds.wikiclir.nl
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Dutch documents.
wikiclir/nn
WikiCLIR with Norwegian (Nynorsk) documents.
-
Dataset irds.wikiclir.nn.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Norwegian (Nynorsk) documents.
-
Dataset irds.wikiclir.nn.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Norwegian (Nynorsk) documents.
-
Dataset irds.wikiclir.nn.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Norwegian (Nynorsk) documents.
-
Dataset irds.wikiclir.nn
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Norwegian (Nynorsk) documents.
wikiclir/no
WikiCLIR with Norwegian (Bokmål) documents.
-
Dataset irds.wikiclir.no.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Norwegian (Bokmål) documents.
-
Dataset irds.wikiclir.no.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Norwegian (Bokmål) documents.
-
Dataset irds.wikiclir.no.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Norwegian (Bokmål) documents.
-
Dataset irds.wikiclir.no
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Norwegian (Bokmål) documents.
wikiclir/pl
WikiCLIR with Polish documents.
-
Dataset irds.wikiclir.pl.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Polish documents.
-
Dataset irds.wikiclir.pl.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Polish documents.
-
Dataset irds.wikiclir.pl.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Polish documents.
-
Dataset irds.wikiclir.pl
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Polish documents.
wikiclir/pt
WikiCLIR with Portuguese documents.
-
Dataset irds.wikiclir.pt.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Portuguese documents.
-
Dataset irds.wikiclir.pt.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Portuguese documents.
-
Dataset irds.wikiclir.pt.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Portuguese documents.
-
Dataset irds.wikiclir.pt
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Portuguese documents.
wikiclir/ro
WikiCLIR with Romanian documents.
-
Dataset irds.wikiclir.ro.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Romanian documents.
-
Dataset irds.wikiclir.ro.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Romanian documents.
-
Dataset irds.wikiclir.ro.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Romanian documents.
-
Dataset irds.wikiclir.ro
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Romanian documents.
wikiclir/ru
WikiCLIR with Russian documents.
-
Dataset irds.wikiclir.ru.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Russian documents.
-
Dataset irds.wikiclir.ru.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Russian documents.
-
Dataset irds.wikiclir.ru.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Russian documents.
-
Dataset irds.wikiclir.ru
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Russian documents.
wikiclir/sv
WikiCLIR with Swedish documents.
-
Dataset irds.wikiclir.sv.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Swedish documents.
-
Dataset irds.wikiclir.sv.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Swedish documents.
-
Dataset irds.wikiclir.sv.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Swedish documents.
-
Dataset irds.wikiclir.sv
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Swedish documents.
wikiclir/sw
WikiCLIR with Swahili documents.
-
Dataset irds.wikiclir.sw.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Swahili documents.
-
Dataset irds.wikiclir.sw.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Swahili documents.
-
Dataset irds.wikiclir.sw.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Swahili documents.
-
Dataset irds.wikiclir.sw
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Swahili documents.
wikiclir/tl
WikiCLIR with Tagalog documents.
-
Dataset irds.wikiclir.tl.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Tagalog documents.
-
Dataset irds.wikiclir.tl.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Tagalog documents.
-
Dataset irds.wikiclir.tl.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Tagalog documents.
-
Dataset irds.wikiclir.tl
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Tagalog documents.
wikiclir/tr
WikiCLIR with Turkish documents.
-
Dataset irds.wikiclir.tr.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Turkish documents.
-
Dataset irds.wikiclir.tr.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Turkish documents.
-
Dataset irds.wikiclir.tr.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Turkish documents.
-
Dataset irds.wikiclir.tr
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Turkish documents.
wikiclir/uk
WikiCLIR with Ukrainian documents.
-
Dataset irds.wikiclir.uk.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Ukrainian documents.
-
Dataset irds.wikiclir.uk.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Ukrainian documents.
-
Dataset irds.wikiclir.uk.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Ukrainian documents.
-
Dataset irds.wikiclir.uk
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Ukrainian documents.
wikiclir/vi
WikiCLIR with Vietnamese documents.
-
Dataset irds.wikiclir.vi.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Vietnamese documents.
-
Dataset irds.wikiclir.vi.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Vietnamese documents.
-
Dataset irds.wikiclir.vi.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Vietnamese documents.
-
Dataset irds.wikiclir.vi
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Vietnamese documents.
wikiclir/zh
WikiCLIR with Chinese documents.
-
Dataset irds.wikiclir.zh.documents
datamaestro_text.datasets.irds.data.Documents
WikiCLIR with Chinese documents.
-
Dataset irds.wikiclir.zh.queries
datamaestro_text.datasets.irds.data.Topics
WikiCLIR with Chinese documents.
-
Dataset irds.wikiclir.zh.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
WikiCLIR with Chinese documents.
-
Dataset irds.wikiclir.zh
datamaestro_text.datasets.irds.data.Adhoc
WikiCLIR with Chinese documents.
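Every WikiCLIR language in this listing follows the same naming pattern: the ir-datasets ID `wikiclir/<lang>` appears as `irds.wikiclir.<lang>`, with `.documents`, `.queries` and `.qrels` sub-datasets. The sketch below builds those IDs from a language code. The "/" to "." mapping is inferred from the entries above rather than taken from a documented API, and `irds_to_datamaestro` and `wikiclir_ids` are hypothetical helpers.

```python
# Hypothetical helpers: the "/" -> "." convention is inferred from this listing.
def irds_to_datamaestro(irds_id: str) -> str:
    """Map an ir-datasets ID (e.g. "wikiclir/es") to its datamaestro ID."""
    return "irds." + irds_id.replace("/", ".")


def wikiclir_ids(lang: str) -> dict:
    """Return the datamaestro IDs of the WikiCLIR components for one language."""
    base = irds_to_datamaestro(f"wikiclir/{lang}")
    return {
        "adhoc": base,  # Adhoc view: documents + queries + qrels
        "documents": f"{base}.documents",
        "queries": f"{base}.queries",
        "qrels": f"{base}.qrels",
    }


if __name__ == "__main__":
    ids = wikiclir_ids("es")
    print(ids["qrels"])  # irds.wikiclir.es.qrels
    # With datamaestro and ir-datasets installed, an ID can then be prepared:
    # from datamaestro import prepare_dataset
    # docs = prepare_dataset(ids["documents"])
```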
wikir/en1k
A small version of WikIR for English.
-
Dataset irds.wikir.en1k.documents
datamaestro_text.datasets.irds.data.Documents
A small version of WikIR for English.
-
Dataset irds.wikir.en1k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en1k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
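Since each wikir/en1k split ships a BM25 run as its scoreddocs alongside the qrels, a natural first check is to score that run. The sketch below computes mean precision at k over plain Python dicts. How to fill those dicts from the `AdhocRun` and `AdhocAssessments` objects is not covered here, since their iteration API is not described in this section, and the query and document IDs in the example are invented.

```python
from typing import Dict

# run:   query_id -> {doc_id: score}, e.g. the provided BM25 scoreddocs
# qrels: query_id -> {doc_id: relevance}; relevance > 0 counts as relevant
Run = Dict[str, Dict[str, float]]
Qrels = Dict[str, Dict[str, int]]


def mean_precision_at_k(run: Run, qrels: Qrels, k: int = 10) -> float:
    """Average P@k over the queries that have judgments."""
    scores = []
    for qid, judged in qrels.items():
        # Rank this query's documents by descending score
        ranked = sorted(run.get(qid, {}).items(), key=lambda x: x[1], reverse=True)
        top = [doc_id for doc_id, _ in ranked[:k]]
        hits = sum(1 for doc_id in top if judged.get(doc_id, 0) > 0)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    toy_run = {"q1": {"d1": 12.0, "d2": 9.5, "d3": 7.1}}
    toy_qrels = {"q1": {"d1": 2, "d3": 1}}
    print(mean_precision_at_k(toy_run, toy_qrels, k=2))  # 0.5
```

Judged queries missing from the run contribute a score of 0, so a run that skips queries is penalized rather than silently ignored.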
wikir/en59k
WikIR for English.
-
Dataset irds.wikir.en59k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for English.
-
Dataset irds.wikir.en59k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en59k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.
wikir/en78k
WikIR for English. This is one of the two versions used in Frej2020Wikir.
-
Dataset irds.wikir.en78k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for English. This is one of the two versions used in Frej2020Wikir.
-
Dataset irds.wikir.en78k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.en78k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.
wikir/ens78k
WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.
-
Dataset irds.wikir.ens78k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.
-
Dataset irds.wikir.ens78k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.ens78k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.
wikir/es13k
WikIR for Spanish.
-
Dataset irds.wikir.es13k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for Spanish.
-
Dataset irds.wikir.es13k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.es13k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.
wikir/fr14k
WikIR for French.
-
Dataset irds.wikir.fr14k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for French.
-
Dataset irds.wikir.fr14k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.fr14k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.
wikir/it16k
WikIR for Italian.
-
Dataset irds.wikir.it16k.documents
datamaestro_text.datasets.irds.data.Documents
WikIR for Italian.
-
Dataset irds.wikir.it16k.test.queries
datamaestro_text.datasets.irds.data.Topics
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.test.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.test
datamaestro_text.datasets.irds.data.Adhoc
Test set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.training.queries
datamaestro_text.datasets.irds.data.Topics
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.training.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.training.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.training
datamaestro_text.datasets.irds.data.Adhoc
Training set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.validation.queries
datamaestro_text.datasets.irds.data.Topics
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.validation.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.validation.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
-
Dataset irds.wikir.it16k.validation
datamaestro_text.datasets.irds.data.Adhoc
Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.
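All WikIR variants listed here (en1k, en59k, en78k, ens78k, es13k, fr14k, it16k) share one layout: a `documents` sub-dataset plus `training`, `validation` and `test` splits, each with `queries`, `scoreddocs` and `qrels`. The hypothetical helper below enumerates those IDs for one variant, assuming the `irds.<ID with dots>` convention observed in this listing; the variant list may differ in other ir-datasets versions.

```python
from typing import List

# Variants and splits as they appear in this listing.
WIKIR_VARIANTS = ["en1k", "en59k", "en78k", "ens78k", "es13k", "fr14k", "it16k"]
WIKIR_SPLITS = ["training", "validation", "test"]
SPLIT_PARTS = ["queries", "scoreddocs", "qrels"]


def wikir_ids(variant: str) -> List[str]:
    """List every datamaestro ID exposed for one WikIR variant."""
    base = f"irds.wikir.{variant}"
    ids = [f"{base}.documents"]
    for split in WIKIR_SPLITS:
        ids.append(f"{base}.{split}")  # Adhoc view of the whole split
        ids.extend(f"{base}.{split}.{part}" for part in SPLIT_PARTS)
    return ids


if __name__ == "__main__":
    for variant in WIKIR_VARIANTS:
        print(variant, len(wikir_ids(variant)))  # 13 IDs per variant
```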
TREC Fair Ranking
The TREC Fair Ranking track evaluates systems according to how well they fairly rank documents.
-
Dataset irds.trec-fair.2021.documents
datamaestro_text.datasets.irds.data.Documents
The TREC Fair Ranking track evaluates systems according to how well they fairly rank documents.
-
Dataset irds.trec-fair.2021.train.queries
datamaestro_text.datasets.irds.data.Topics
Official TREC Fair Ranking 2021 train set.
-
Dataset irds.trec-fair.2021.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official TREC Fair Ranking 2021 train set.
-
Dataset irds.trec-fair.2021.train
datamaestro_text.datasets.irds.data.Adhoc
Official TREC Fair Ranking 2021 train set.
-
Dataset irds.trec-fair.2021.eval.queries
datamaestro_text.datasets.irds.data.Topics
Official TREC Fair Ranking 2021 evaluation set.
-
Dataset irds.trec-fair.2021.eval.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official TREC Fair Ranking 2021 evaluation set.
-
Dataset irds.trec-fair.2021.eval
datamaestro_text.datasets.irds.data.Adhoc
Official TREC Fair Ranking 2021 evaluation set.
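To make "fairly rank" concrete, fair-ranking evaluation usually looks at how much attention each group of documents receives across rank positions. The sketch below computes each group's share of position-discounted exposure in a single ranking. It is an illustration only: it is not the official TREC Fair Ranking 2021 metric, and the logarithmic discount, group labels and document IDs are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List


def group_exposure(ranking: List[str], groups: Dict[str, str]) -> Dict[str, float]:
    """Share of total position-discounted exposure received by each group.

    ranking: doc IDs in rank order (best first).
    groups:  doc ID -> group label (hypothetical labels for illustration).
    """
    exposure = defaultdict(float)
    for rank, doc_id in enumerate(ranking, start=1):
        # Assumed DCG-style attention model: 1 / log2(rank + 1)
        exposure[groups[doc_id]] += 1.0 / math.log2(rank + 1)
    total = sum(exposure.values())
    return {group: e / total for group, e in exposure.items()}


if __name__ == "__main__":
    ranking = ["a1", "b1", "a2", "b2"]
    groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    # Group A holds ranks 1 and 3, so it receives more than half the exposure
    print(group_exposure(ranking, groups))
```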
trec-fair/2022
The TREC Fair Ranking 2022 track focuses on fairly prioritising Wikimedia articles for editing to provide a fair exposure to articles from different groups.
-
Dataset irds.trec-fair.2022.documents
datamaestro_text.datasets.irds.data.Documents
The TREC Fair Ranking 2022 track focuses on fairly prioritising Wikimedia articles for editing to provide a fair exposure to articles from different groups.
-
Dataset irds.trec-fair.2022.train.queries
datamaestro_text.datasets.irds.data.Topics
Official TREC Fair Ranking 2022 train set.
-
Dataset irds.trec-fair.2022.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official TREC Fair Ranking 2022 train set.
-
Dataset irds.trec-fair.2022.train
datamaestro_text.datasets.irds.data.Adhoc
Official TREC Fair Ranking 2022 train set.
trec-cast/v0
Version 0 of the TREC CAsT corpus. This version uses documents from the Washington Post (version 2), TREC CAR (version 2), and MS MARCO passage (version 1).
This corpus was originally meant to be used for evaluation of the 2019 task, but the Washington Post corpus was not included for scoring in the final version due to "an error in the process led to ambiguous document ids," and Washington Post documents were removed from participating systems. As such, trec-cast/v1 (which does not include the Washington Post) should be used for the 2019 version of the task. However, this version can still be used for the training set (trec-cast/v0/train) or for replicating the original submissions to the track (prior to the removal of Washington Post documents).
-
Dataset irds.trec-cast.v0.documents
datamaestro_text.datasets.irds.data.Documents
Version 0 of the TREC CAsT corpus. This version uses documents from the Washington Post (version 2), TREC CAR (version 2), and MS MARCO passage (version 1).
This corpus was originally meant to be used for evaluation of the 2019 task, but the Washington Post corpus was not included for scoring in the final version due to "an error in the process led to ambiguous document ids," and Washington Post documents were removed from participating systems. As such, trec-cast/v1 (which does not include the Washington Post) should be used for the 2019 version of the task. However, this version can still be used for the training set (trec-cast/v0/train) or for replicating the original submissions to the track (prior to the removal of Washington Post documents).
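The removal of Washington Post documents from participating systems, described above, can be mimicked on a run you produce against trec-cast/v0 by filtering it before scoring. A minimal sketch; it assumes Washington Post document IDs carry a `WAPO_` prefix (check this against the document IDs in your copy), and the query and document IDs in the example are made up.

```python
from typing import Dict

# Assumption: Washington Post document IDs start with this prefix.
WAPO_PREFIX = "WAPO_"


def drop_wapo(run: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Remove Washington Post documents from a run (query_id -> {doc_id: score})."""
    return {
        qid: {
            doc_id: score
            for doc_id, score in docs.items()
            if not doc_id.startswith(WAPO_PREFIX)
        }
        for qid, docs in run.items()
    }


if __name__ == "__main__":
    run = {"31_1": {"MARCO_12": 3.2, "WAPO_abc-1": 2.9, "CAR_xyz": 2.5}}
    print(drop_wapo(run))  # {'31_1': {'MARCO_12': 3.2, 'CAR_xyz': 2.5}}
```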
-
Dataset irds.trec-cast.v0.train.queries
datamaestro_text.datasets.irds.data.Topics
Training set provided by TREC CAsT 2019.
-
Dataset irds.trec-cast.v0.train.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Training set provided by TREC CAsT 2019.
-
Dataset irds.trec-cast.v0.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Training set provided by TREC CAsT 2019.
-
Dataset irds.trec-cast.v0.train
datamaestro_text.datasets.irds.data.Adhoc
Training set provided by TREC CAsT 2019.
-
Dataset irds.trec-cast.v0.train.judged.queries
datamaestro_text.datasets.irds.data.Topics
trec-cast/v0/train, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v0.train.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
trec-cast/v0/train, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v0.train.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
trec-cast/v0/train, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v0.train.judged
datamaestro_text.datasets.irds.data.Adhoc
trec-cast/v0/train, but with queries that do not appear in the qrels removed.
trec-cast/v1
Version 1 of the TREC CAsT corpus. This version uses documents from the TREC CAR (version 2) and MS MARCO passage (version 1). This version of the corpus was used for TREC CAsT 2019 and 2020.
-
Dataset irds.trec-cast.v1.documents
datamaestro_text.datasets.irds.data.Documents
Version 1 of the TREC CAsT corpus. This version uses documents from the TREC CAR (version 2) and MS MARCO passage (version 1). This version of the corpus was used for TREC CAsT 2019 and 2020.
-
Dataset irds.trec-cast.v1.2019.queries
datamaestro_text.datasets.irds.data.Topics
Official evaluation set for TREC CAsT 2019.
-
Dataset irds.trec-cast.v1.2019.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
Official evaluation set for TREC CAsT 2019.
-
Dataset irds.trec-cast.v1.2019.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official evaluation set for TREC CAsT 2019.
-
Dataset irds.trec-cast.v1.2019
datamaestro_text.datasets.irds.data.Adhoc
Official evaluation set for TREC CAsT 2019.
-
Dataset irds.trec-cast.v1.2019.judged.queries
datamaestro_text.datasets.irds.data.Topics
trec-cast/v1/2019, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2019.judged.scoreddocs
datamaestro_text.datasets.irds.data.AdhocRun
trec-cast/v1/2019, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2019.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
trec-cast/v1/2019, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2019.judged
datamaestro_text.datasets.irds.data.Adhoc
trec-cast/v1/2019, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2020.queries
datamaestro_text.datasets.irds.data.Topics
Official evaluation set for TREC CAsT 2020.
-
Dataset irds.trec-cast.v1.2020.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Official evaluation set for TREC CAsT 2020.
-
Dataset irds.trec-cast.v1.2020
datamaestro_text.datasets.irds.data.Adhoc
Official evaluation set for TREC CAsT 2020.
-
Dataset irds.trec-cast.v1.2020.judged.queries
datamaestro_text.datasets.irds.data.Topics
trec-cast/v1/2020, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2020.judged.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
trec-cast/v1/2020, but with queries that do not appear in the qrels removed.
-
Dataset irds.trec-cast.v1.2020.judged
datamaestro_text.datasets.irds.data.Adhoc
trec-cast/v1/2020, but with queries that do not appear in the qrels removed.
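The `judged` variants of TREC CAsT follow one rule: a query is kept only if it appears in the qrels, whatever its relevance labels. Over plain Python structures that rule reads as below. This illustrates the filtering rule only; it is not the code ir-datasets uses, and the IDs are invented.

```python
from typing import Dict, Iterable, List


def judged_only(query_ids: Iterable[str], qrels: Dict[str, Dict[str, int]]) -> List[str]:
    """Keep only the queries with at least one judgment in the qrels."""
    # A query judged entirely non-relevant (label 0) still appears in the qrels
    return [qid for qid in query_ids if qrels.get(qid)]


if __name__ == "__main__":
    qrels = {"1_1": {"MARCO_5": 2}, "1_3": {"CAR_9": 0}}
    print(judged_only(["1_1", "1_2", "1_3"], qrels))  # ['1_1', '1_3']
```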
hc4/fa
The Persian collection contains English queries and Persian documents for retrieval. Human and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine translation of the queries into Persian is available.
-
Dataset irds.hc4.fa.documents
datamaestro_text.datasets.irds.data.Documents
The Persian collection contains English queries and Persian documents for retrieval. Human and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine translation of the queries into Persian is available.
-
Dataset irds.hc4.fa.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development split of hc4/fa.
-
Dataset irds.hc4.fa.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development split of hc4/fa.
-
Dataset irds.hc4.fa.dev
datamaestro_text.datasets.irds.data.Adhoc
Development split of hc4/fa.
-
Dataset irds.hc4.fa.test.queries
datamaestro_text.datasets.irds.data.Topics
Test split of hc4/fa.
-
Dataset irds.hc4.fa.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test split of hc4/fa.
-
Dataset irds.hc4.fa.test
datamaestro_text.datasets.irds.data.Adhoc
Test split of hc4/fa.
-
Dataset irds.hc4.fa.train.queries
datamaestro_text.datasets.irds.data.Topics
Train split of hc4/fa.
-
Dataset irds.hc4.fa.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train split of hc4/fa.
-
Dataset irds.hc4.fa.train
datamaestro_text.datasets.irds.data.Adhoc
Train split of hc4/fa.
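The hc4/fa description implies several ways to run the same topic: the English query against the Persian documents (cross-language), or a human or machine translation of it against the same documents (monolingual). The sketch below picks the query text for each setting. The `CLIRQuery` record, its field names and the mode strings are hypothetical stand-ins, not the actual ir-datasets query class, so map them to the fields of the query objects you actually receive.

```python
from dataclasses import dataclass


@dataclass
class CLIRQuery:
    """Hypothetical query record; the field names are illustrative only."""
    query_id: str
    title: str     # original English query
    ht_title: str  # human translation into the document language
    mt_title: str  # machine translation into the document language


def query_text(q: CLIRQuery, mode: str) -> str:
    """Pick the query text for a retrieval setting.

    "clir":    English query against non-English documents
    "mono-ht": human-translated query (monolingual retrieval)
    "mono-mt": machine-translated query (monolingual retrieval)
    """
    options = {"clir": q.title, "mono-ht": q.ht_title, "mono-mt": q.mt_title}
    if mode not in options:
        raise ValueError(f"unknown mode: {mode!r}")
    return options[mode]


if __name__ == "__main__":
    q = CLIRQuery("q1", "climate policy", "<human translation>", "<machine translation>")
    for mode in ("clir", "mono-ht", "mono-mt"):
        print(mode, query_text(q, mode))
```

The human-translated setting is typically treated as an upper bound for the machine-translated one, since it removes query translation errors from the comparison.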
hc4/ru
The Russian collection contains English queries and Russian documents for retrieval. Human and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine translation of the queries into Russian is available.
-
Dataset irds.hc4.ru.documents
datamaestro_text.datasets.irds.data.Documents
The Russian collection contains English queries and Russian documents for retrieval. Human- and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Russian is available.
-
Dataset irds.hc4.ru.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development split of hc4/ru.
-
Dataset irds.hc4.ru.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development split of hc4/ru.
-
Dataset irds.hc4.ru.dev
datamaestro_text.datasets.irds.data.Adhoc
Development split of hc4/ru.
-
Dataset irds.hc4.ru.test.queries
datamaestro_text.datasets.irds.data.Topics
Test split of hc4/ru.
-
Dataset irds.hc4.ru.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test split of hc4/ru.
-
Dataset irds.hc4.ru.test
datamaestro_text.datasets.irds.data.Adhoc
Test split of hc4/ru.
-
Dataset irds.hc4.ru.train.queries
datamaestro_text.datasets.irds.data.Topics
Train split of hc4/ru.
-
Dataset irds.hc4.ru.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train split of hc4/ru.
-
Dataset irds.hc4.ru.train
datamaestro_text.datasets.irds.data.Adhoc
Train split of hc4/ru.
hc4/zh
The Chinese collection contains English queries and Chinese documents for retrieval. Human- and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.
-
Dataset irds.hc4.zh.documents
datamaestro_text.datasets.irds.data.Documents
The Chinese collection contains English queries and Chinese documents for retrieval. Human- and machine-translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.
-
Dataset irds.hc4.zh.dev.queries
datamaestro_text.datasets.irds.data.Topics
Development split of hc4/zh.
-
Dataset irds.hc4.zh.dev.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Development split of hc4/zh.
-
Dataset irds.hc4.zh.dev
datamaestro_text.datasets.irds.data.Adhoc
Development split of hc4/zh.
-
Dataset irds.hc4.zh.test.queries
datamaestro_text.datasets.irds.data.Topics
Test split of hc4/zh.
-
Dataset irds.hc4.zh.test.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Test split of hc4/zh.
-
Dataset irds.hc4.zh.test
datamaestro_text.datasets.irds.data.Adhoc
Test split of hc4/zh.
-
Dataset irds.hc4.zh.train.queries
datamaestro_text.datasets.irds.data.Topics
Train split of hc4/zh.
-
Dataset irds.hc4.zh.train.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Train split of hc4/zh.
-
Dataset irds.hc4.zh.train
datamaestro_text.datasets.irds.data.Adhoc
Train split of hc4/zh.
neuclir/1/fa
The Persian collection contains English queries (to be released) and Persian documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Persian is available.
-
Dataset irds.neuclir.1.fa.documents
datamaestro_text.datasets.irds.data.Documents
The Persian collection contains English queries (to be released) and Persian documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Persian is available.
-
Dataset irds.neuclir.1.fa.trec-2022.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).
-
Dataset irds.neuclir.1.fa.trec-2022.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).
-
Dataset irds.neuclir.1.fa.trec-2022
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).
-
Dataset irds.neuclir.1.fa.trec-2023.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).
-
Dataset irds.neuclir.1.fa.trec-2023.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).
-
Dataset irds.neuclir.1.fa.trec-2023
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).
neuclir/1/fa/hc4-filtered
Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
-
Dataset irds.neuclir.1.fa.hc4-filtered.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
-
Dataset irds.neuclir.1.fa.hc4-filtered.queries
datamaestro_text.datasets.irds.data.Topics
Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
-
Dataset irds.neuclir.1.fa.hc4-filtered.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
-
Dataset irds.neuclir.1.fa.hc4-filtered
datamaestro_text.datasets.irds.data.Adhoc
Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
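Each hc4-filtered entry combines two operations: restricting a NeuCLIR collection to the documents it shares with HC4, and merging the HC4 dev and test topic sets into a single set. The following is a conceptual sketch with toy placeholder data, assuming that overlap is decided by shared document IDs; the actual subsets are built by ir_datasets, and filter_to_shared and combine_topics are hypothetical helpers.

```python
# Conceptual sketch with toy data; the real hc4-filtered subsets come from ir_datasets.
# Assumption: "intersects with HC4" means the two collections share a document ID.


def filter_to_shared(docs: dict[str, str], other_ids: set[str]) -> dict[str, str]:
    """Keep only the documents whose ID also appears in the other collection."""
    return {doc_id: text for doc_id, text in docs.items() if doc_id in other_ids}


def combine_topics(*splits: dict[str, str]) -> dict[str, str]:
    """Merge several topic sets keyed by query ID, rejecting duplicate IDs."""
    combined: dict[str, str] = {}
    for split in splits:
        for qid, text in split.items():
            if qid in combined:
                raise ValueError(f"query {qid!r} appears in more than one split")
            combined[qid] = text
    return combined


if __name__ == "__main__":
    neuclir_docs = {"d1": "...", "d2": "...", "d3": "..."}  # toy placeholders
    hc4_doc_ids = {"d2", "d3", "d9"}
    print(sorted(filter_to_shared(neuclir_docs, hc4_doc_ids)))  # ['d2', 'd3']

    dev = {"q1": "toy dev query"}
    test = {"q2": "toy test query", "q3": "another toy test query"}
    print(len(combine_topics(dev, test)))  # 3
```

Rejecting duplicate query IDs is a design choice of this sketch: if dev and test were ever to share an ID, silently overwriting one topic would change the query count without warning.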
neuclir/1/multi
A combined corpus of NeuCLIR v1 including all Persian, Russian, and Chinese documents.
-
Dataset irds.neuclir.1.multi.documents
datamaestro_text.datasets.irds.data.Documents
A combined corpus of NeuCLIR v1 including all Persian, Russian, and Chinese documents.
-
Dataset irds.neuclir.1.multi.trec-2023.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.
-
Dataset irds.neuclir.1.multi.trec-2023.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.
-
Dataset irds.neuclir.1.multi.trec-2023
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.
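The multi corpus presents the three NeuCLIR language collections as one document stream. A conceptual sketch, assuming each collection can be iterated as (doc_id, text) pairs and that the source language should be kept alongside each document; combined_corpus is a hypothetical helper, not the ir_datasets implementation.

```python
from typing import Iterable, Iterator, Tuple


# Hypothetical helper: concatenates per-language collections into one stream,
# tagging every document with the language of the collection it came from.
def combined_corpus(
    per_language: dict[str, Iterable[Tuple[str, str]]],
) -> Iterator[Tuple[str, str, str]]:
    for lang, docs in per_language.items():
        for doc_id, text in docs:
            yield lang, doc_id, text


if __name__ == "__main__":
    # Toy placeholder documents, not real NeuCLIR content.
    toy = {
        "fa": [("fa-1", "...")],
        "ru": [("ru-1", "..."), ("ru-2", "...")],
        "zh": [("zh-1", "...")],
    }
    for lang, doc_id, _ in combined_corpus(toy):
        print(lang, doc_id)
```

Using a generator keeps the combined stream lazy, so no language collection has to be loaded into memory in full before iteration starts.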
neuclir/1/ru
The Russian collection contains English queries (to be released) and Russian documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Russian is available.
-
Dataset irds.neuclir.1.ru.documents
datamaestro_text.datasets.irds.data.Documents
The Russian collection contains English queries (to be released) and Russian documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Russian is available.
-
Dataset irds.neuclir.1.ru.trec-2022.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).
-
Dataset irds.neuclir.1.ru.trec-2022.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).
-
Dataset irds.neuclir.1.ru.trec-2022
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).
-
Dataset irds.neuclir.1.ru.trec-2023.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).
-
Dataset irds.neuclir.1.ru.trec-2023.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).
-
Dataset irds.neuclir.1.ru.trec-2023
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).
neuclir/1/ru/hc4-filtered
Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.
-
Dataset irds.neuclir.1.ru.hc4-filtered.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.
-
Dataset irds.neuclir.1.ru.hc4-filtered.queries
datamaestro_text.datasets.irds.data.Topics
Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.
-
Dataset irds.neuclir.1.ru.hc4-filtered.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.
-
Dataset irds.neuclir.1.ru.hc4-filtered
datamaestro_text.datasets.irds.data.Adhoc
Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.
neuclir/1/zh
The Chinese collection contains English queries (to be released) and Chinese documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.
-
Dataset irds.neuclir.1.zh.documents
datamaestro_text.datasets.irds.data.Documents
The Chinese collection contains English queries (to be released) and Chinese documents for retrieval. Human- and machine-translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.
-
Dataset irds.neuclir.1.zh.trec-2022.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).
-
Dataset irds.neuclir.1.zh.trec-2022.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).
-
Dataset irds.neuclir.1.zh.trec-2022
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).
-
Dataset irds.neuclir.1.zh.trec-2023.queries
datamaestro_text.datasets.irds.data.Topics
Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).
-
Dataset irds.neuclir.1.zh.trec-2023.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).
-
Dataset irds.neuclir.1.zh.trec-2023
datamaestro_text.datasets.irds.data.Adhoc
Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).
neuclir/1/zh/hc4-filtered
Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.
-
Dataset irds.neuclir.1.zh.hc4-filtered.documents
datamaestro_text.datasets.irds.data.Documents
Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.
-
Dataset irds.neuclir.1.zh.hc4-filtered.queries
datamaestro_text.datasets.irds.data.Topics
Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.
-
Dataset irds.neuclir.1.zh.hc4-filtered.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.
-
Dataset irds.neuclir.1.zh.hc4-filtered
datamaestro_text.datasets.irds.data.Adhoc
Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.
SARA
A set of sensitivity-aware relevance assessments. More information is available here:
-
Dataset irds.sara.documents
datamaestro_text.datasets.irds.data.Documents
A set of sensitivity-aware relevance assessments. More information is available here:
-
Dataset irds.sara.queries
datamaestro_text.datasets.irds.data.Topics
A set of sensitivity-aware relevance assessments. More information is available here:
-
Dataset irds.sara.qrels
datamaestro_text.datasets.irds.data.AdhocAssessments
A set of sensitivity-aware relevance assessments. More information is available here:
-
Dataset irds.sara
datamaestro_text.datasets.irds.data.Adhoc
A set of sensitivity-aware relevance assessments. More information is available here:
trec-tot/2025
-
Dataset irds.trec-tot.2025.documents
trec-tot/2025/train
-
Dataset irds.trec-tot.2025.train.documents
-
Dataset irds.trec-tot.2025.train.queries
-
Dataset irds.trec-tot.2025.train.qrels
-
Dataset irds.trec-tot.2025.train
trec-tot/2025/dev1
-
Dataset irds.trec-tot.2025.dev1.documents
-
Dataset irds.trec-tot.2025.dev1.queries
-
Dataset irds.trec-tot.2025.dev1.qrels
-
Dataset irds.trec-tot.2025.dev1
trec-tot/2025/dev2
-
Dataset irds.trec-tot.2025.dev2.documents
-
Dataset irds.trec-tot.2025.dev2.queries
-
Dataset irds.trec-tot.2025.dev2.qrels
-
Dataset irds.trec-tot.2025.dev2
trec-tot/2025/dev3
-
Dataset irds.trec-tot.2025.dev3.documents
-
Dataset irds.trec-tot.2025.dev3.queries
-
Dataset irds.trec-tot.2025.dev3.qrels
-
Dataset irds.trec-tot.2025.dev3