IR-Datasets Integration

Datamaestro-text provides an interface to the ir-datasets library, giving access to hundreds of IR benchmarks through a unified API.

Install ir-datasets:

pip install ir-datasets

Usage:

from datamaestro import prepare_dataset

# Load any ir-datasets collection via the irds namespace
dataset = prepare_dataset("irds.msmarco-passage")

# Same API as native datasets
for doc in dataset.documents.iter_documents():
    print(doc)
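
Judging from the listing further down this page, the identifiers passed to prepare_dataset seem to follow a simple convention: the ir-datasets ID has its slashes replaced by dots and is prefixed with irds, and an optional suffix (documents, queries, qrels) selects one component. The helper below is a hypothetical illustration of that convention and is not part of datamaestro; the generated list remains the reference for which IDs actually exist.

```python
from typing import Optional

# Component suffixes that appear in the dataset listing on this page.
COMPONENTS = ("documents", "queries", "qrels")


def to_datamaestro_id(irds_id: str, component: Optional[str] = None) -> str:
    """Build a datamaestro-style ID from an ir-datasets ID.

    Hypothetical helper (not a datamaestro function): it only mirrors the
    naming pattern visible in the listing, e.g. "antique/test" becomes
    "irds.antique.test".
    """
    if component is not None and component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    parts = ["irds", *irds_id.strip("/").split("/")]
    if component is not None:
        parts.append(component)
    return ".".join(parts)


print(to_datamaestro_id("antique/test"))
print(to_datamaestro_id("beir/cqadupstack/android", "qrels"))
```

The resulting string can then be checked against the "Dataset irds..." entries below before calling prepare_dataset.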

The list below is auto-generated and may not reflect the exact version of ir-datasets installed on your system.

Data Types

These wrapper types provide the datamaestro interface for ir-datasets data:

XPM Config datamaestro_text.datasets.irds.data.Topics(*, irds, id)

Bases: TopicsStore, IRDSId

irds: str

The ID used to load the dataset from ir_datasets

id: str

The unique (sub-)dataset ID

XPM Config datamaestro_text.datasets.irds.data.Documents(*, irds, id, count, file_access)

Bases: DocumentStore, IRDSId

irds: str

The ID used to load the dataset from ir_datasets

id: str

The unique (sub-)dataset ID

count: int

Number of documents

file_access: FileAccess = FileAccess.MMAP

How to access the file collection (may have no effect, depending on the docstore)

XPM Config datamaestro_text.datasets.irds.data.AdhocAssessments(*, irds, id)

Bases: AdhocAssessments, IRDSId

irds: str

The ID used to load the dataset from ir_datasets

id: str

The unique (sub-)dataset ID

See also LZ4DocumentStore in the Information Retrieval API section.
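
In the listing that follows, the last segment of a dataset ID appears to determine which wrapper type is used: documents maps to Documents, queries to Topics, qrels to AdhocAssessments, and an ID without such a suffix is listed as Adhoc. The sketch below encodes that observation as a small lookup; wrapper_type is a hypothetical helper for illustration, not a datamaestro API.

```python
# Suffix -> wrapper class name, as observed in the dataset listing on this page.
SUFFIX_TO_TYPE = {
    "documents": "Documents",
    "queries": "Topics",
    "qrels": "AdhocAssessments",
}


def wrapper_type(dataset_id: str) -> str:
    """Guess the wrapper class name for a datamaestro dataset ID.

    Hypothetical helper: IDs without a known component suffix are assumed
    to be full ad-hoc tasks, which the listing shows as Adhoc.
    """
    suffix = dataset_id.rsplit(".", 1)[-1]
    return SUFFIX_TO_TYPE.get(suffix, "Adhoc")


for dataset_id in (
    "irds.antique.documents",
    "irds.antique.test.queries",
    "irds.antique.test.qrels",
    "irds.antique.test",
):
    print(f"{dataset_id} -> {wrapper_type(dataset_id)}")
```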

Available Datasets

ANTIQUE

"ANTIQUE is a non-factoid question answering dataset based on the questions and answers of Yahoo! Webscope L6."

  • Documents: Short answer passages (from Yahoo Answers)
  • Queries: Natural language questions (from Yahoo Answers)
  • Dataset Paper

Dataset irds.antique.documents

datamaestro_text.datasets.irds.data.Documents

"ANTIQUE is a non-factoid question answering dataset based on the questions and answers of Yahoo! Webscope L6."

  • Documents: Short answer passages (from Yahoo Answers)
  • Queries: Natural language questions (from Yahoo Answers)
  • Dataset Paper

Dataset irds.antique.test.queries

datamaestro_text.datasets.irds.data.Topics

Official test set of the ANTIQUE dataset.

Dataset irds.antique.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set of the ANTIQUE dataset.

Dataset irds.antique.test

datamaestro_text.datasets.irds.data.Adhoc

Official test set of the ANTIQUE dataset.

Dataset irds.antique.test.non-offensive.queries

datamaestro_text.datasets.irds.data.Topics

antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."

Dataset irds.antique.test.non-offensive.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."

Dataset irds.antique.test.non-offensive

datamaestro_text.datasets.irds.data.Adhoc

antique/test without a set of queries deemed by the authors of ANTIQUE to be "offensive (and noisy)."

Dataset irds.antique.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set of the ANTIQUE dataset.

Dataset irds.antique.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set of the ANTIQUE dataset.

Dataset irds.antique.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set of the ANTIQUE dataset.

Dataset irds.antique.train.split200-train.queries

datamaestro_text.datasets.irds.data.Topics

antique/train without the 200 queries used by antique/train/split200-valid.

Dataset irds.antique.train.split200-train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

antique/train without the 200 queries used by antique/train/split200-valid.

Dataset irds.antique.train.split200-train

datamaestro_text.datasets.irds.data.Adhoc

antique/train without the 200 queries used by antique/train/split200-valid.

Dataset irds.antique.train.split200-valid.queries

datamaestro_text.datasets.irds.data.Topics

A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.

Dataset irds.antique.train.split200-valid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.

Dataset irds.antique.train.split200-valid

datamaestro_text.datasets.irds.data.Adhoc

A held-out subset of 200 queries from antique/train. Use in conjunction with antique/train/split200-train.
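
The two split200 subsets are complementary: split200-valid holds out 200 training queries and split200-train keeps the rest. The sketch below shows that relationship on made-up query IDs. Both the IDs and the choice of held-out queries are purely illustrative and do not correspond to the actual ANTIQUE split.

```python
from typing import Iterable, List, Set, Tuple


def split_holdout(query_ids: Iterable[str], valid_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition query IDs into (train, valid) given a fixed held-out set."""
    train: List[str] = []
    valid: List[str] = []
    for qid in query_ids:
        (valid if qid in valid_ids else train).append(qid)
    return train, valid


# Toy data: 1000 fake query IDs, of which an arbitrary 200 are held out.
all_queries = [f"q{i}" for i in range(1000)]
held_out = {f"q{i}" for i in range(0, 1000, 5)}

train, valid = split_holdout(all_queries, held_out)
print(len(train), len(valid))  # 800 200
```

Because the held-out set is fixed in advance, the two parts never overlap and together cover the original training queries.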

AOL-IA (Internet Archive)

This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.

The query log does not include document or query IDs. These are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log. Query IDs are assigned using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (iteration value set to the user ID). Full information (including the original query) is available from qlogs.
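
To make the ID scheme concrete, here is a minimal sketch of hash-derived identifiers. The hash function (MD5), the 12-character truncation, and the normalization rule (lowercasing and collapsing whitespace) are assumptions chosen for illustration; the description above does not specify them, and ir_datasets may do this differently.

```python
import hashlib


def normalize_query(query: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def hashed_id(text: str, length: int = 12) -> str:
    """Derive a short, stable ID from text (assumed: truncated MD5 hex digest)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


# Two surface forms of the same query map to a single query ID.
qid_a = hashed_id(normalize_query("Cheap  Flights"))
qid_b = hashed_id(normalize_query("cheap flights"))

# A document ID derived from a (made-up) clicked URL.
doc_id = hashed_id("http://www.example.com")

print(qid_a == qid_b)
print(len(doc_id))
```

Hashing the normalized form is what lets several raw log entries share one query ID, while the original strings stay recoverable only from the full log records.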

Dataset irds.aol-ia.documents

datamaestro_text.datasets.irds.data.Documents

This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.

The query log does not include document or query IDs. These are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log. Query IDs are assigned using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (iteration value set to the user ID). Full information (including the original query) is available from qlogs.

Dataset irds.aol-ia.queries

datamaestro_text.datasets.irds.data.Topics

This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.

The query log does not include document or query IDs. These are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log. Query IDs are assigned using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (iteration value set to the user ID). Full information (including the original query) is available from qlogs.

Dataset irds.aol-ia.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.

The query log does not include document or query IDs. These are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log. Query IDs are assigned using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (iteration value set to the user ID). Full information (including the original query) is available from qlogs.

Dataset irds.aol-ia

datamaestro_text.datasets.irds.data.Adhoc

This is a version of the AOL Query Log. Documents use versions that appeared around the time of the query log (early 2006) via the Internet Archive.

The query log does not include document or query IDs. These are instead created by ir_datasets. Document IDs are assigned using a hash of the URL that appears in the query log. Query IDs are assigned using a hash of the normalized query. All unique normalized queries are available from queries, and all clicked documents are available from qrels (iteration value set to the user ID). Full information (including the original query) is available from qlogs.

AQUAINT

A collection of about 1M English newswire documents. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.

Dataset irds.aquaint.documents

datamaestro_text.datasets.irds.data.Documents

A collection of about 1M English newswire documents. Sources are the Xinhua News Service (People's Republic of China), the New York Times News Service, and the Associated Press Worldstream News Service.

Dataset irds.aquaint.trec-robust-2005.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.

Dataset irds.aquaint.trec-robust-2005.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.

Dataset irds.aquaint.trec-robust-2005

datamaestro_text.datasets.irds.data.Adhoc

The TREC Robust 2005 dataset. Contains a subset of 50 "hard" queries from trec-robust04.

args.me version 1.0

Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. It was released on July 9, 2019 on Zenodo. The cleaned version argsme/1.0-cleaned should be preferred.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.argsme.1.0.documents

datamaestro_text.datasets.irds.data.Documents

Corpus version 1.0 with 387 606 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. It was released on July 9, 2019 on Zenodo. The cleaned version argsme/1.0-cleaned should be preferred.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected.queries

datamaestro_text.datasets.irds.data.Topics

Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

Dataset irds.argsme.1.0.touche-2020-task-1.uncorrected

datamaestro_text.datasets.irds.data.Adhoc

Version of argsme/2020-04-01/touche-2020-task-1 that uses the argsme/1.0 corpus with uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

args.me version 1.0 cleaned

Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. This version contains the same arguments as argsme/1.0, but was cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.argsme.1.0-cleaned.documents

datamaestro_text.datasets.irds.data.Documents

Corpus version 1.0-cleaned with 382 545 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org. This version contains the same arguments as argsme/1.0, but was cleaned as described in the corresponding publication. It was released on October 27, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

argsme/2020-04-01/debateorg

Subset of the 338 620 arguments from argsme/2020-04-01 that were crawled from the debate portal Debate.org.

Dataset irds.argsme.2020-04-01.debateorg.documents

datamaestro_text.datasets.irds.data.Documents

Subset of the 338 620 arguments from argsme/2020-04-01 that were crawled from the debate portal Debate.org.

argsme/2020-04-01/debatepedia

Subset of the 21 197 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatepedia.

Dataset irds.argsme.2020-04-01.debatepedia.documents

datamaestro_text.datasets.irds.data.Documents

Subset of the 21 197 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatepedia.

argsme/2020-04-01/debatewise

Subset of the 14 353 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatewise.

Dataset irds.argsme.2020-04-01.debatewise.documents

datamaestro_text.datasets.irds.data.Documents

Subset of the 14 353 arguments from argsme/2020-04-01 that were crawled from the debate portal Debatewise.

argsme/2020-04-01/idebate

Subset of the 13 522 arguments from argsme/2020-04-01 that were crawled from the debate portal IDebate.org.

Dataset irds.argsme.2020-04-01.idebate.documents

datamaestro_text.datasets.irds.data.Documents

Subset of the 13 522 arguments from argsme/2020-04-01 that were crawled from the debate portal IDebate.org.

argsme/2020-04-01/parliamentary

Subset of the 48 arguments from argsme/2020-04-01 that were crawled from Canadian Parliament discussions.

Dataset irds.argsme.2020-04-01.parliamentary.documents

datamaestro_text.datasets.irds.data.Documents

Subset of the 48 arguments from argsme/2020-04-01 that were crawled from Canadian Parliament discussions.

argsme/2020-04-01/processed

Pre-processed version of argsme/2020-04-01 where each argument is split into sentences.

Dataset irds.argsme.2020-04-01.processed.documents

datamaestro_text.datasets.irds.data.Documents

Pre-processed version of argsme/2020-04-01 where each argument is split into sentences.

Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1.queries

datamaestro_text.datasets.irds.data.Topics

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.argsme.2020-04-01.processed.touche-2022-task-1

datamaestro_text.datasets.irds.data.Adhoc

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a query about a controversial topic, retrieve and rank a relevant pair of sentences from a collection of arguments (argsme/2020-04-01/processed).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

args.me

Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
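
As a quick consistency check, the five per-source subsets described on this page (debateorg, debatepedia, debatewise, idebate, parliamentary) should add up to the size of the full 2020-04-01 corpus. The snippet below uses only the counts quoted in this document.

```python
# Argument counts per source, as quoted in the subset descriptions on this page.
subset_sizes = {
    "debateorg": 338_620,
    "debatepedia": 21_197,
    "debatewise": 14_353,
    "idebate": 13_522,
    "parliamentary": 48,
}

total = sum(subset_sizes.values())
print(total)  # 387740, the size stated for argsme/2020-04-01
```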

Dataset irds.argsme.2020-04-01.documents

datamaestro_text.datasets.irds.data.Documents

Corpus version 2020-04-01 with 387 740 arguments crawled from Debatewise, IDebate.org, Debatepedia, Debate.org, and from Canadian Parliament discussions. It was released on April 1, 2020 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.argsme.2020-04-01.touche-2020-task-1.queries

datamaestro_text.datasets.irds.data.Topics

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance.

Dataset irds.argsme.2020-04-01.touche-2020-task-1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance.

Dataset irds.argsme.2020-04-01.touche-2020-task-1

datamaestro_text.datasets.irds.data.Adhoc

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance.

Dataset irds.argsme.2020-04-01.touche-2021-task-1.queries

datamaestro_text.datasets.irds.data.Topics

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.argsme.2020-04-01.touche-2021-task-1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.argsme.2020-04-01.touche-2021-task-1

datamaestro_text.datasets.irds.data.Adhoc

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals (argsme/2020-04-01).

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected.queries

datamaestro_text.datasets.irds.data.Topics

Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

Dataset irds.argsme.2020-04-01.touche-2020-task-1.uncorrected

datamaestro_text.datasets.irds.data.Adhoc

Version of argsme/2020-04-01/touche-2020-task-1 that uses uncorrected relevance judgements derived from crowdworkers. This dataset's relevance judgements should not be used without preprocessing.

beir/arguana

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.beir.arguana.documents

datamaestro_text.datasets.irds.data.Documents

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.beir.arguana.queries

datamaestro_text.datasets.irds.data.Topics

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.beir.arguana.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.beir.arguana

datamaestro_text.datasets.irds.data.Adhoc

A version of the ArguAna Counterargs dataset, for argument retrieval.

beir/climate-fever

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.beir.climate-fever.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.beir.climate-fever.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.beir.climate-fever.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.beir.climate-fever

datamaestro_text.datasets.irds.data.Adhoc

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

beir/cqadupstack/android

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

Dataset irds.beir.cqadupstack.android.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

Dataset irds.beir.cqadupstack.android.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

Dataset irds.beir.cqadupstack.android.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

Dataset irds.beir.cqadupstack.android

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the android StackExchange subforum.

beir/cqadupstack/english

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

Dataset irds.beir.cqadupstack.english.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

Dataset irds.beir.cqadupstack.english.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

Dataset irds.beir.cqadupstack.english.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

Dataset irds.beir.cqadupstack.english

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the english StackExchange subforum.

beir/cqadupstack/gaming

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

Dataset irds.beir.cqadupstack.gaming.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

Dataset irds.beir.cqadupstack.gaming.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

Dataset irds.beir.cqadupstack.gaming.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

Dataset irds.beir.cqadupstack.gaming

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gaming StackExchange subforum.

beir/cqadupstack/gis

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

Dataset irds.beir.cqadupstack.gis.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

Dataset irds.beir.cqadupstack.gis.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

Dataset irds.beir.cqadupstack.gis.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

Dataset irds.beir.cqadupstack.gis

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the gis StackExchange subforum.

beir/cqadupstack/mathematica

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

Dataset irds.beir.cqadupstack.mathematica.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

Dataset irds.beir.cqadupstack.mathematica.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

Dataset irds.beir.cqadupstack.mathematica.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

Dataset irds.beir.cqadupstack.mathematica

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the mathematica StackExchange subforum.

beir/cqadupstack/physics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

Dataset irds.beir.cqadupstack.physics.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

Dataset irds.beir.cqadupstack.physics.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

Dataset irds.beir.cqadupstack.physics.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

Dataset irds.beir.cqadupstack.physics

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the physics StackExchange subforum.

beir/cqadupstack/programmers

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

Dataset irds.beir.cqadupstack.programmers.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

Dataset irds.beir.cqadupstack.programmers.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

Dataset irds.beir.cqadupstack.programmers.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

Dataset irds.beir.cqadupstack.programmers

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the programmers StackExchange subforum.

beir/cqadupstack/stats

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

Dataset irds.beir.cqadupstack.stats.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

Dataset irds.beir.cqadupstack.stats.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

Dataset irds.beir.cqadupstack.stats.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

Dataset irds.beir.cqadupstack.stats

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the stats StackExchange subforum.

beir/cqadupstack/tex

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

Dataset irds.beir.cqadupstack.tex.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

Dataset irds.beir.cqadupstack.tex.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

Dataset irds.beir.cqadupstack.tex.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

Dataset irds.beir.cqadupstack.tex

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the tex StackExchange subforum.

beir/cqadupstack/unix

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

Dataset irds.beir.cqadupstack.unix.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

Dataset irds.beir.cqadupstack.unix.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

Dataset irds.beir.cqadupstack.unix.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

Dataset irds.beir.cqadupstack.unix

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the unix StackExchange subforum.

beir/cqadupstack/webmasters

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

Dataset irds.beir.cqadupstack.webmasters.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

Dataset irds.beir.cqadupstack.webmasters.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

Dataset irds.beir.cqadupstack.webmasters.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

Dataset irds.beir.cqadupstack.webmasters

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the webmasters StackExchange subforum.

beir/cqadupstack/wordpress

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

Dataset irds.beir.cqadupstack.wordpress.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

Dataset irds.beir.cqadupstack.wordpress.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

Dataset irds.beir.cqadupstack.wordpress.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

Dataset irds.beir.cqadupstack.wordpress

datamaestro_text.datasets.irds.data.Adhoc

A version of the CQADupStack dataset, for duplicate question retrieval. This subset is from the wordpress StackExchange subforum.

beir/dbpedia-entity

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.beir.dbpedia-entity.documents

datamaestro_text.datasets.irds.data.Documents

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.beir.dbpedia-entity.queries

datamaestro_text.datasets.irds.data.Topics

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.beir.dbpedia-entity.dev.queries

datamaestro_text.datasets.irds.data.Topics

A random sample of 67 queries from the official test set, used as a dev set.

Dataset irds.beir.dbpedia-entity.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A random sample of 67 queries from the official test set, used as a dev set.

Dataset irds.beir.dbpedia-entity.dev

datamaestro_text.datasets.irds.data.Adhoc

A random sample of 67 queries from the official test set, used as a dev set.

Dataset irds.beir.dbpedia-entity.test.queries

datamaestro_text.datasets.irds.data.Topics

The official test set, without the 67 queries used as a dev set.

Dataset irds.beir.dbpedia-entity.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official test set, without the 67 queries used as a dev set.

Dataset irds.beir.dbpedia-entity.test

datamaestro_text.datasets.irds.data.Adhoc

The official test set, without the 67 queries used as a dev set.

beir/fever

A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.

Dataset irds.beir.fever.documents

datamaestro_text.datasets.irds.data.Documents

A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.

Dataset irds.beir.fever.queries

datamaestro_text.datasets.irds.data.Topics

A version of the FEVER dataset for fact verification. Includes queries from the /train, /dev, and /test subsets.

Dataset irds.beir.fever.dev.queries

datamaestro_text.datasets.irds.data.Topics

The official dev set.

Dataset irds.beir.fever.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official dev set.

Dataset irds.beir.fever.dev

datamaestro_text.datasets.irds.data.Adhoc

The official dev set.

Dataset irds.beir.fever.test.queries

datamaestro_text.datasets.irds.data.Topics

The official test set.

Dataset irds.beir.fever.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official test set.

Dataset irds.beir.fever.test

datamaestro_text.datasets.irds.data.Adhoc

The official test set.

Dataset irds.beir.fever.train.queries

datamaestro_text.datasets.irds.data.Topics

The official train set.

Dataset irds.beir.fever.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official train set.

Dataset irds.beir.fever.train

datamaestro_text.datasets.irds.data.Adhoc

The official train set.

beir/fiqa

A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.

Dataset irds.beir.fiqa.documents

datamaestro_text.datasets.irds.data.Documents

A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.

Dataset irds.beir.fiqa.queries

datamaestro_text.datasets.irds.data.Topics

A version of the FIQA-2018 dataset (financial opinion question answering). Queries include those in the /train, /dev, and /test subsets.

Dataset irds.beir.fiqa.dev.queries

datamaestro_text.datasets.irds.data.Topics

Random sample of 500 queries from the official dataset.

Dataset irds.beir.fiqa.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Random sample of 500 queries from the official dataset.

Dataset irds.beir.fiqa.dev

datamaestro_text.datasets.irds.data.Adhoc

Random sample of 500 queries from the official dataset.

Dataset irds.beir.fiqa.test.queries

datamaestro_text.datasets.irds.data.Topics

Random sample of 648 queries from the official dataset.

Dataset irds.beir.fiqa.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Random sample of 648 queries from the official dataset.

Dataset irds.beir.fiqa.test

datamaestro_text.datasets.irds.data.Adhoc

Random sample of 648 queries from the official dataset.

Dataset irds.beir.fiqa.train.queries

datamaestro_text.datasets.irds.data.Topics

Official dataset without the 1148 queries sampled for /dev and /test.

Dataset irds.beir.fiqa.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dataset without the 1148 queries sampled for /dev and /test.

Dataset irds.beir.fiqa.train

datamaestro_text.datasets.irds.data.Adhoc

Official dataset without the 1148 queries sampled for /dev and /test.
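
The three FiQA split descriptions fit together arithmetically: 500 dev queries plus 648 test queries give the 1148 queries held out of /train. The following is a minimal sketch of such a disjoint random partition. It is not the procedure BEIR actually used (the sampling method and seed are not stated here), and the function name and the total of 2000 query IDs are placeholders chosen for illustration.

```python
import random


def split_queries(query_ids, n_dev, n_test, seed=0):
    """Partition query IDs into disjoint train/dev/test lists.

    Hypothetical helper: dev and test are sampled at random, and train
    is whatever remains.
    """
    ids = list(query_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test


# 2000 is a placeholder total, not the real FiQA query count.
train, dev, test = split_queries(range(2000), n_dev=500, n_test=648)
held_out = len(dev) + len(test)  # 500 + 648 = 1148
```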

beir/hotpotqa

A version of the Hotpot QA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.

Dataset irds.beir.hotpotqa.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Hotpot QA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.

Dataset irds.beir.hotpotqa.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Hotpot QA dataset for multi-hop question answering. Queries include all those in /train, /dev, and /test.

Dataset irds.beir.hotpotqa.dev.queries

datamaestro_text.datasets.irds.data.Topics

Random selection of 5447 queries from /train.

Dataset irds.beir.hotpotqa.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Random selection of 5447 queries from /train.

Dataset irds.beir.hotpotqa.dev

datamaestro_text.datasets.irds.data.Adhoc

Random selection of 5447 queries from /train.

Dataset irds.beir.hotpotqa.test.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set from HotpotQA, here used as a test set.

Dataset irds.beir.hotpotqa.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set from HotpotQA, here used as a test set.

Dataset irds.beir.hotpotqa.test

datamaestro_text.datasets.irds.data.Adhoc

Official dev set from HotpotQA, here used as a test set.

Dataset irds.beir.hotpotqa.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set, without the random selection of the 5447 queries used for /dev.

Dataset irds.beir.hotpotqa.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set, without the random selection of the 5447 queries used for /dev.

Dataset irds.beir.hotpotqa.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set, without the random selection of the 5447 queries used for /dev.

beir/msmarco

A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.beir.msmarco.documents

datamaestro_text.datasets.irds.data.Documents

A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.beir.msmarco.queries

datamaestro_text.datasets.irds.data.Topics

A version of the MS MARCO passage ranking dataset. Includes queries from the /train, /dev, and /test sub-datasets.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.beir.msmarco.dev.queries

datamaestro_text.datasets.irds.data.Topics

A version of the MS MARCO passage ranking dev set.

Dataset irds.beir.msmarco.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the MS MARCO passage ranking dev set.

Dataset irds.beir.msmarco.dev

datamaestro_text.datasets.irds.data.Adhoc

A version of the MS MARCO passage ranking dev set.

Dataset irds.beir.msmarco.test.queries

datamaestro_text.datasets.irds.data.Topics

A version of the TREC Deep Learning 2019 set.

Dataset irds.beir.msmarco.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the TREC Deep Learning 2019 set.

Dataset irds.beir.msmarco.test

datamaestro_text.datasets.irds.data.Adhoc

A version of the TREC Deep Learning 2019 set.

Dataset irds.beir.msmarco.train.queries

datamaestro_text.datasets.irds.data.Topics

A version of the MS MARCO passage ranking train set.

Dataset irds.beir.msmarco.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the MS MARCO passage ranking train set.

Dataset irds.beir.msmarco.train

datamaestro_text.datasets.irds.data.Adhoc

A version of the MS MARCO passage ranking train set.

beir/nfcorpus

A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.beir.nfcorpus.documents

datamaestro_text.datasets.irds.data.Documents

A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.beir.nfcorpus.queries

datamaestro_text.datasets.irds.data.Topics

A version of the NF Corpus (Nutrition Facts). Queries use the "title" variant of the query, which here are often natural language questions. Queries include all those from /train, /dev, and /test.

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.beir.nfcorpus.dev.queries

datamaestro_text.datasets.irds.data.Topics

Combined dev set of NFCorpus.

Dataset irds.beir.nfcorpus.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Combined dev set of NFCorpus.

Dataset irds.beir.nfcorpus.dev

datamaestro_text.datasets.irds.data.Adhoc

Combined dev set of NFCorpus.

Dataset irds.beir.nfcorpus.test.queries

datamaestro_text.datasets.irds.data.Topics

Combined test set of NFCorpus.

Dataset irds.beir.nfcorpus.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Combined test set of NFCorpus.

Dataset irds.beir.nfcorpus.test

datamaestro_text.datasets.irds.data.Adhoc

Combined test set of NFCorpus.

Dataset irds.beir.nfcorpus.train.queries

datamaestro_text.datasets.irds.data.Topics

Combined train set of NFCorpus.

Dataset irds.beir.nfcorpus.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Combined train set of NFCorpus.

Dataset irds.beir.nfcorpus.train

datamaestro_text.datasets.irds.data.Adhoc

Combined train set of NFCorpus.

beir/nq

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.beir.nq.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.beir.nq.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.beir.nq.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.beir.nq

datamaestro_text.datasets.irds.data.Adhoc

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

beir/quora

A version of the Quora duplicate question detection dataset (QQP). Includes queries from /dev and /test sets.

Dataset irds.beir.quora.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Quora duplicate question detection dataset (QQP). Includes queries from /dev and /test sets.

Dataset irds.beir.quora.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Quora duplicate question detection dataset (QQP). Includes queries from /dev and /test sets.

Dataset irds.beir.quora.dev.queries

datamaestro_text.datasets.irds.data.Topics

A 5,000 question subset of the original dataset, without overlaps in the other subsets.

Dataset irds.beir.quora.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A 5,000 question subset of the original dataset, without overlaps in the other subsets.

Dataset irds.beir.quora.dev

datamaestro_text.datasets.irds.data.Adhoc

A 5,000 question subset of the original dataset, without overlaps in the other subsets.

Dataset irds.beir.quora.test.queries

datamaestro_text.datasets.irds.data.Topics

A 10,000 question subset of the original dataset, without overlaps in the other subsets.

Dataset irds.beir.quora.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A 10,000 question subset of the original dataset, without overlaps in the other subsets.

Dataset irds.beir.quora.test

datamaestro_text.datasets.irds.data.Adhoc

A 10,000 question subset of the original dataset, without overlaps in the other subsets.

beir/scidocs

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.beir.scidocs.documents

datamaestro_text.datasets.irds.data.Documents

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.beir.scidocs.queries

datamaestro_text.datasets.irds.data.Topics

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.beir.scidocs.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.beir.scidocs

datamaestro_text.datasets.irds.data.Adhoc

A version of the SciDocs dataset, used for citation retrieval.

beir/scifact

A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.

Dataset irds.beir.scifact.documents

datamaestro_text.datasets.irds.data.Documents

A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.

Dataset irds.beir.scifact.queries

datamaestro_text.datasets.irds.data.Topics

A version of the SciFact dataset, for fact verification. Queries include those from the /train and /test sets.

Dataset irds.beir.scifact.test.queries

datamaestro_text.datasets.irds.data.Topics

The official dev set.

Dataset irds.beir.scifact.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official dev set.

Dataset irds.beir.scifact.test

datamaestro_text.datasets.irds.data.Adhoc

The official dev set.

Dataset irds.beir.scifact.train.queries

datamaestro_text.datasets.irds.data.Topics

The official train set.

Dataset irds.beir.scifact.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The official train set.

Dataset irds.beir.scifact.train

datamaestro_text.datasets.irds.data.Adhoc

The official train set.

beir/trec-covid

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may be different than what is done in cord19/trec-covid.

Dataset irds.beir.trec-covid.documents

datamaestro_text.datasets.irds.data.Documents

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may be different than what is done in cord19/trec-covid.

Dataset irds.beir.trec-covid.queries

datamaestro_text.datasets.irds.data.Topics

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may be different than what is done in cord19/trec-covid.

Dataset irds.beir.trec-covid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may be different than what is done in cord19/trec-covid.

Dataset irds.beir.trec-covid

datamaestro_text.datasets.irds.data.Adhoc

A version of the TREC COVID (complete) dataset, with titles and abstracts as documents. Queries are the question variant.

Data pre-processing may be different than what is done in cord19/trec-covid.

beir/webis-touche2020

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.beir.webis-touche2020.documents

datamaestro_text.datasets.irds.data.Documents

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.beir.webis-touche2020.queries

datamaestro_text.datasets.irds.data.Topics

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.beir.webis-touche2020.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.beir.webis-touche2020

datamaestro_text.datasets.irds.data.Adhoc

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

beir/webis-touche2020/v2

Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.

Dataset irds.beir.webis-touche2020.v2.documents

datamaestro_text.datasets.irds.data.Documents

Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.

Dataset irds.beir.webis-touche2020.v2.queries

datamaestro_text.datasets.irds.data.Topics

Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.

Dataset irds.beir.webis-touche2020.v2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.

Dataset irds.beir.webis-touche2020.v2

datamaestro_text.datasets.irds.data.Adhoc

Version 2 of the Touché-2020 dataset, for argument retrieval. This version uses the "corrected" version of the qrels, mapped to version 1 of the corpus.

c4/en-noclean-tr

The "en-noclean" train subset of the corpus, consisting of ~1B documents written in English. Document IDs are assigned as proposed by the TREC Health Misinformation 2021 track.

Dataset irds.c4.en-noclean-tr.documents

datamaestro_text.datasets.irds.data.Documents

The "en-noclean" train subset of the corpus, consisting of ~1B documents written in English. Document IDs are assigned as proposed by the TREC Health Misinformation 2021 track.

Dataset irds.c4.en-noclean-tr.trec-misinfo-2021.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Health Misinformation 2021 track.

car/v1.5

Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.

Dataset irds.car.v1.5.documents

datamaestro_text.datasets.irds.data.Documents

Version 1.5 of the TREC CAR dataset. This version is used for year 1 (2017) of the TREC CAR shared task.

Dataset irds.car.v1.5.test200.queries

datamaestro_text.datasets.irds.data.Topics

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

Dataset irds.car.v1.5.test200.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

Dataset irds.car.v1.5.test200

datamaestro_text.datasets.irds.data.Adhoc

Unofficial test set consisting of manually selected articles. Sometimes used as a validation set.

Dataset irds.car.v1.5.train.fold0.queries

datamaestro_text.datasets.irds.data.Topics

Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold0.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold0

datamaestro_text.datasets.irds.data.Adhoc

Fold 0 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)
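
To make the phrase "relevance assumed from hierarchical structure" concrete, here is a toy sketch of how such automatic judgments could be derived: every paragraph that sits directly under a heading is marked relevant to the query formed from that heading. The Section class, the slash-joined query string, and the decision to count only direct paragraphs are all assumptions made for illustration; this is not the TREC CAR tooling or its query ID format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    """Hypothetical page section: a heading, its own paragraphs, and subsections."""
    heading: str
    paragraph_ids: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def automatic_qrels(
    page_title: str, sections: List[Section], path: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, str, int]]:
    """Yield (query, paragraph_id, relevance) triples from the page hierarchy."""
    for section in sections:
        heading_path = path + (section.heading,)
        query = "/".join((page_title,) + heading_path)
        # Paragraphs directly under this heading are assumed relevant (grade 1).
        for paragraph_id in section.paragraph_ids:
            yield (query, paragraph_id, 1)
        # Recurse into nested headings, extending the heading path.
        yield from automatic_qrels(page_title, section.children, heading_path)


page = [Section("History", ["p1", "p2"], [Section("Early years", ["p3"])])]
qrels = list(automatic_qrels("Example page", page))
```

Because no human assessor is involved, judgments of this kind are cheap to produce at scale, which is why they are used for the large training folds.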

Dataset irds.car.v1.5.train.fold1.queries

datamaestro_text.datasets.irds.data.Topics

Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold1

datamaestro_text.datasets.irds.data.Adhoc

Fold 1 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold2.queries

datamaestro_text.datasets.irds.data.Topics

Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold2

datamaestro_text.datasets.irds.data.Adhoc

Fold 2 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold3.queries

datamaestro_text.datasets.irds.data.Topics

Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold3

datamaestro_text.datasets.irds.data.Adhoc

Fold 3 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold4.queries

datamaestro_text.datasets.irds.data.Topics

Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.train.fold4

datamaestro_text.datasets.irds.data.Adhoc

Fold 4 of the official large training set for TREC CAR 2017. Relevance assumed from hierarchical structure of pages (i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.trec-y1.queries

datamaestro_text.datasets.irds.data.Topics

Official test set of TREC CAR 2017 (year 1).

Dataset irds.car.v1.5.trec-y1.auto.queries

datamaestro_text.datasets.irds.data.Topics

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.trec-y1.auto.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.trec-y1.auto

datamaestro_text.datasets.irds.data.Adhoc

Official test set of TREC CAR 2017 (year 1), using automatic relevance judgments (assumed from hierarchical structure of pages, i.e., paragraphs under a header are assumed relevant.)

Dataset irds.car.v1.5.trec-y1.manual.queries

datamaestro_text.datasets.irds.data.Topics

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.

Dataset irds.car.v1.5.trec-y1.manual.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.

Dataset irds.car.v1.5.trec-y1.manual

datamaestro_text.datasets.irds.data.Adhoc

Official test set of TREC CAR 2017 (year 1), using manual graded relevance judgments.

car/v2.0

Version 2.0 of the TREC CAR dataset.

Dataset irds.car.v2.0.documents

datamaestro_text.datasets.irds.data.Documents

Version 2.0 of the TREC CAR dataset.

Highwire (TREC Genomics 2006-07)

Medical document collection from Highwire Press. Includes 162,259 scientific articles from 49 journals.

This dataset is used for the TREC 2006-07 Genomics track.

Note that these documents are split into passages based on paragraph tags in the HTML.

Dataset irds.highwire.documents

datamaestro_text.datasets.irds.data.Documents

Medical document collection from Highwire Press. Includes 162,259 scientific articles from 49 journals.

This dataset is used for the TREC 2006-07 Genomics track.

Note that these documents are split into passages based on paragraph tags in the HTML.
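As a rough illustration of what splitting on paragraph tags means, the sketch below collects the text of each `<p>` element with Python's standard `html.parser`. It is not the code ir-datasets uses for Highwire; the class and function names are hypothetical and the handling of nested or unclosed tags is deliberately simplistic.

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Collects the text found inside each <p> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.passages = []   # finished passages, in document order
        self._buffer = None  # text chunks of the open <p>, None outside one

    def _flush(self):
        if self._buffer is not None:
            # Join the chunks and normalise runs of whitespace
            text = " ".join("".join(self._buffer).split())
            if text:
                self.passages.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def split_passages(html):
    """Return the list of paragraph texts found in an HTML string."""
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing, unclosed paragraph
    return parser.passages


sample = "<html><body><h1>T</h1><p>First  para.</p><p>Second <b>bold</b> para.</p></body></html>"
print(split_passages(sample))
```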

Dataset irds.highwire.trec-genomics-2006.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.

Dataset irds.highwire.trec-genomics-2006.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.

Dataset irds.highwire.trec-genomics-2006

datamaestro_text.datasets.irds.data.Adhoc

The TREC Genomics Track 2006 benchmark. Contains 28 queries with passage-level relevance judgments.

Dataset irds.highwire.trec-genomics-2007.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.

Dataset irds.highwire.trec-genomics-2007.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.

Dataset irds.highwire.trec-genomics-2007

datamaestro_text.datasets.irds.data.Adhoc

The TREC Genomics Track 2007 benchmark. Contains 36 queries with passage-level relevance judgments.

medline/2004

3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.

Dataset irds.medline.2004.documents

datamaestro_text.datasets.irds.data.Documents

3M Medline articles including titles and abstracts, used for the TREC 2004-05 Genomics track.

Dataset irds.medline.2004.trec-genomics-2004.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

Dataset irds.medline.2004.trec-genomics-2004.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

Dataset irds.medline.2004.trec-genomics-2004

datamaestro_text.datasets.irds.data.Adhoc

The TREC Genomics Track 2004 benchmark. Contains 50 queries with article-level relevance judgments.

Dataset irds.medline.2004.trec-genomics-2005.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

Dataset irds.medline.2004.trec-genomics-2005.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

Dataset irds.medline.2004.trec-genomics-2005

datamaestro_text.datasets.irds.data.Adhoc

The TREC Genomics Track 2005 benchmark. Contains 50 queries with article-level relevance judgments.

medline/2017

26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.

Dataset irds.medline.2017.documents

datamaestro_text.datasets.irds.data.Documents

26M Medline and AACR/ASCO Proceedings articles including titles and abstracts. This collection is used for the TREC 2017-18 Precision Medicine track.

Dataset irds.medline.2017.trec-pm-2017.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries with disease, gene, and target demographic information.

Dataset irds.medline.2017.trec-pm-2017.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries with disease, gene, and target demographic information.

Dataset irds.medline.2017.trec-pm-2017

datamaestro_text.datasets.irds.data.Adhoc

The TREC Precision Medicine (PM) Track 2017 benchmark. Contains 30 queries with disease, gene, and target demographic information.

Dataset irds.medline.2017.trec-pm-2018.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries with disease, gene, and target demographic information.

Dataset irds.medline.2017.trec-pm-2018.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries with disease, gene, and target demographic information.

Dataset irds.medline.2017.trec-pm-2018

datamaestro_text.datasets.irds.data.Adhoc

The TREC Precision Medicine (PM) Track 2018 benchmark. Contains 50 queries with disease, gene, and target demographic information.

clinicaltrials/2017

A snapshot of ClinicalTrials.gov from April 2017 for use with the clinicaltrials/2017/trec-pm-2017 and clinicaltrials/2017/trec-pm-2018 Clinical Trials subtasks.

Dataset irds.clinicaltrials.2017.documents

datamaestro_text.datasets.irds.data.Documents

A snapshot of ClinicalTrials.gov from April 2017 for use with the clinicaltrials/2017/trec-pm-2017 and clinicaltrials/2017/trec-pm-2018 Clinical Trials subtasks.
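Descriptions on this page refer to collections by their ir-datasets identifier (for example clinicaltrials/2017/trec-pm-2017), while the entries themselves use dotted identifiers under the irds namespace. Judging from the entries listed here, the mapping appears to be a simple replacement of slashes by dots. The helper below is a sketch of that inferred convention; `to_datamaestro_id` is a hypothetical name, not a datamaestro function.

```python
def to_datamaestro_id(irds_id, suffix=None):
    """Map an ir-datasets ID to the dotted ID style used on this page.

    This mirrors a convention inferred from the listing, not a documented API.
    `suffix` is an optional component such as "documents", "queries" or "qrels".
    """
    parts = ["irds"] + irds_id.strip("/").split("/")
    if suffix:
        parts.append(suffix)
    return ".".join(parts)


print(to_datamaestro_id("clinicaltrials/2017/trec-pm-2017"))
print(to_datamaestro_id("clinicaltrials/2017/trec-pm-2017", "qrels"))
print(to_datamaestro_id("car/v2.0", "documents"))
```

Because version components such as v2.0 already contain a dot, the mapping is not reliably reversible; only the forward direction is sketched.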

Dataset irds.clinicaltrials.2017.trec-pm-2017.queries

datamaestro_text.datasets.irds.data.Topics

The TREC 2017 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2017.trec-pm-2017.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC 2017 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2017.trec-pm-2017

datamaestro_text.datasets.irds.data.Adhoc

The TREC 2017 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2017.trec-pm-2018.queries

datamaestro_text.datasets.irds.data.Topics

The TREC 2018 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2017.trec-pm-2018.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC 2018 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2017.trec-pm-2018

datamaestro_text.datasets.irds.data.Adhoc

The TREC 2018 Precision Medicine clinical trials subtask.

clinicaltrials/2019

A snapshot of ClinicalTrials.gov from May 2019 for use with the clinicaltrials/2019/trec-pm-2019 Clinical Trials subtask.

Dataset irds.clinicaltrials.2019.documents

datamaestro_text.datasets.irds.data.Documents

A snapshot of ClinicalTrials.gov from May 2019 for use with the clinicaltrials/2019/trec-pm-2019 Clinical Trials subtask.

Dataset irds.clinicaltrials.2019.trec-pm-2019.queries

datamaestro_text.datasets.irds.data.Topics

The TREC 2019 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2019.trec-pm-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC 2019 Precision Medicine clinical trials subtask.

Dataset irds.clinicaltrials.2019.trec-pm-2019

datamaestro_text.datasets.irds.data.Adhoc

The TREC 2019 Precision Medicine clinical trials subtask.

clinicaltrials/2021

A snapshot of ClinicalTrials.gov from April 2021 for use with the TREC Clinical Trials 2021 Track.

Dataset irds.clinicaltrials.2021.documents

datamaestro_text.datasets.irds.data.Documents

A snapshot of ClinicalTrials.gov from April 2021 for use with the TREC Clinical Trials 2021 Track.

Dataset irds.clinicaltrials.2021.trec-ct-2021.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Clinical Trials 2021 track.

Dataset irds.clinicaltrials.2021.trec-ct-2021.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Clinical Trials 2021 track.

Dataset irds.clinicaltrials.2021.trec-ct-2021

datamaestro_text.datasets.irds.data.Adhoc

The TREC Clinical Trials 2021 track.

Dataset irds.clinicaltrials.2021.trec-ct-2022.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Clinical Trials 2022 track.

ClueWeb09

ClueWeb 2009 web document collection. Contains over 1B web pages, in 10 languages.

The dataset is obtained for a fee from CMU, and is shipped as hard drives. More information is provided here.

Dataset irds.clueweb09.documents

datamaestro_text.datasets.irds.data.Documents

ClueWeb 2009 web document collection. Contains over 1B web pages, in 10 languages.

The dataset is obtained for a fee from CMU, and is shipped as hard drives. More information is provided here.

Dataset irds.clueweb09.trec-mq-2009.queries

datamaestro_text.datasets.irds.data.Topics

TREC 2009 Million Query track.

Dataset irds.clueweb09.trec-mq-2009.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

TREC 2009 Million Query track.

Dataset irds.clueweb09.trec-mq-2009

datamaestro_text.datasets.irds.data.Adhoc

TREC 2009 Million Query track.

clueweb09/ar

Subset of ClueWeb09 with only Arabic-language documents.

Dataset irds.clueweb09.ar.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Arabic-language documents.

clueweb09/catb

Subset of ClueWeb09 with the first ~50 million English-language documents. Used as a smaller collection for TREC Web Track tasks.

Dataset irds.clueweb09.catb.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with the first ~50 million English-language documents. Used as a smaller collection for TREC Web Track tasks.

Dataset irds.clueweb09.catb.trec-web-2009.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2009.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2009

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2009.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2009.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2009.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2010.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2011.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.catb.trec-web-2012.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

clueweb09/de

Subset of ClueWeb09 with only German-language documents.

Dataset irds.clueweb09.de.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only German-language documents.

clueweb09/en

Subset of ClueWeb09 with only English-language documents.

Dataset irds.clueweb09.en.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only English-language documents.

Dataset irds.clueweb09.en.trec-web-2009.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2009.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2009

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2009 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2009.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2009.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2009.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2009 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2010 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2010.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2010 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2011 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2011.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2011 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2012 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb09.en.trec-web-2012.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2012 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

clueweb09/es

Subset of ClueWeb09 with only Spanish-language documents.

Dataset irds.clueweb09.es.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Spanish-language documents.

clueweb09/fr

Subset of ClueWeb09 with only French-language documents.

Dataset irds.clueweb09.fr.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only French-language documents.

clueweb09/it

Subset of ClueWeb09 with only Italian-language documents.

Dataset irds.clueweb09.it.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Italian-language documents.

clueweb09/ja

Subset of ClueWeb09 with only Japanese-language documents.

Dataset irds.clueweb09.ja.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Japanese-language documents.

clueweb09/ko

Subset of ClueWeb09 with only Korean-language documents.

Dataset irds.clueweb09.ko.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Korean-language documents.

clueweb09/pt

Subset of ClueWeb09 with only Portuguese-language documents.

Dataset irds.clueweb09.pt.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Portuguese-language documents.

clueweb09/zh

Subset of ClueWeb09 with only Chinese-language documents.

Dataset irds.clueweb09.zh.documents

datamaestro_text.datasets.irds.data.Documents

Subset of ClueWeb09 with only Chinese-language documents.

ClueWeb12

ClueWeb 2012 web document collection. Contains 733M web pages.

The dataset is obtained for a fee from CMU, and is shipped as hard drives. More information is provided here.

Dataset irds.clueweb12.documents

datamaestro_text.datasets.irds.data.Documents

ClueWeb 2012 web document collection. Contains 733M web pages.

The dataset is obtained for a fee from CMU, and is shipped as hard drives. More information is provided here.

Dataset irds.clueweb12.trec-web-2013.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2013.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2013

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2013 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2013.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.trec-web-2013.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.trec-web-2013.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2013 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.trec-web-2014.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2014.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2014

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2014 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.clueweb12.trec-web-2014.diversity.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.trec-web-2014.diversity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.trec-web-2014.diversity

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2014 diverse ranking benchmark. Contains 50 queries with deep subtopic relevance judgments.

Dataset irds.clueweb12.touche-2020-task-2.queries

datamaestro_text.datasets.irds.data.Topics

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance.

Dataset irds.clueweb12.touche-2020-task-2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance.

Dataset irds.clueweb12.touche-2020-task-2

datamaestro_text.datasets.irds.data.Adhoc

Decision making processes, be it at the societal or at the personal level, eventually come to a point where one side will challenge the other with a why-question, which is a prompt to justify one's stance. Thus, technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. Touché 2020 is the first lab on Argument Retrieval at CLEF 2020 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance.

Dataset irds.clueweb12.touche-2021-task-2.queries

datamaestro_text.datasets.irds.data.Topics

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, also ad-hoc argument retrieval becomes a feasible task in reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.clueweb12.touche-2021-task-2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, also ad-hoc argument retrieval becomes a feasible task in reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Dataset irds.clueweb12.touche-2021-task-2

datamaestro_text.datasets.irds.data.Adhoc

Decision making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, also ad-hoc argument retrieval becomes a feasible task in reach. Touché 2021 is the second lab on argument retrieval at CLEF 2021 featuring two tasks.

Given a comparative question, retrieve and rank documents from the ClueWeb12 that help to answer the comparative question.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

clueweb12/b13

Official subset of the ClueWeb12 datasets with 52M web pages.

Dataset irds.clueweb12.b13.documents

datamaestro_text.datasets.irds.data.Documents

Official subset of the ClueWeb12 datasets with 52M web pages.

Dataset irds.clueweb12.b13.clef-ehealth.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.

This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished using iteration (2016 is iteration 0, 2017 is iteration 1).

Dataset irds.clueweb12.b13.clef-ehealth.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.

This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished using iteration (2016 is iteration 0, 2017 is iteration 1).
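Separating the two assessment years by iteration could look like the sketch below. The `Qrel` record and its field names are a hypothetical stand-in for whatever the qrels iterator yields on your installation, and the sample judgments are invented; only the iteration-to-year mapping (0 is 2016, 1 is 2017) comes from the description above.

```python
from collections import namedtuple

# Hypothetical record layout, used only for this illustration
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance", "iteration"])

# Mapping stated in the dataset description: iteration 0 is 2016, 1 is 2017
ITERATION_TO_YEAR = {"0": 2016, "1": 2017}


def qrels_for_year(qrels, year):
    """Keep only the judgments made in the given assessment year."""
    # str() lets the filter accept iterations stored as int or as str
    return [q for q in qrels if ITERATION_TO_YEAR.get(str(q.iteration)) == year]


# Invented sample judgments
sample_qrels = [
    Qrel("101001", "doc-a", 1, "0"),
    Qrel("101001", "doc-b", 0, "1"),
    Qrel("102003", "doc-c", 2, "1"),
]

print(qrels_for_year(sample_qrels, 2016))
print(qrels_for_year(sample_qrels, 2017))
```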

Dataset irds.clueweb12.b13.clef-ehealth

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset. Contains consumer health queries and judgments containing trustworthiness and understandability scores, in addition to the normal relevance assessments.

This dataset contains the combined 2016 and 2017 relevance judgments, since the same queries were used in both years. The assessment year can be distinguished using iteration (2016 is iteration 0, 2017 is iteration 1).

Dataset irds.clueweb12.b13.clef-ehealth.cs.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.cs.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.cs

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Czech. See clueweb12/b13/clef-ehealth for more details.
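
Descriptions in this list refer to datasets by their ir-datasets ID (slash-separated, e.g. clueweb12/b13/clef-ehealth), while the entries themselves use dotted IDs under the irds namespace. The sketch below shows the conversion as it can be inferred from this listing. The helper name `to_datamaestro_id` is hypothetical, and the rule is an assumption that may not hold for every dataset.

```python
from typing import Optional

# Component suffixes that appear in this listing.
COMPONENTS = ("documents", "queries", "qrels")


def to_datamaestro_id(irds_id: str, component: Optional[str] = None) -> str:
    """Turn an ir-datasets ID into the dotted ID used in this listing.

    Assumption: slashes map to dots and the result is prefixed with "irds.".
    """
    base = "irds." + irds_id.strip("/").replace("/", ".")
    if component is None:
        return base
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    return f"{base}.{component}"


parent_id = to_datamaestro_id("clueweb12/b13/clef-ehealth")
czech_qrels_id = to_datamaestro_id("clueweb12/b13/clef-ehealth/cs", "qrels")
```

The resulting string can then be passed to `prepare_dataset`, as in the usage example at the top of this page.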

Dataset irds.clueweb12.b13.clef-ehealth.de.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.de.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.de

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to German. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.fr.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.fr.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.fr

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to French. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.hu.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.hu.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.hu

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Hungarian. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.pl.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.pl.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.pl

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Polish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.sv.queries

datamaestro_text.datasets.irds.data.Topics

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.sv.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.clef-ehealth.sv

datamaestro_text.datasets.irds.data.Adhoc

The CLEF eHealth 2016-17 IR dataset, with queries professionally translated to Swedish. See clueweb12/b13/clef-ehealth for more details.

Dataset irds.clueweb12.b13.ntcir-www-1.queries

datamaestro_text.datasets.irds.data.Topics

The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.

Dataset irds.clueweb12.b13.ntcir-www-1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.

Dataset irds.clueweb12.b13.ntcir-www-1

datamaestro_text.datasets.irds.data.Adhoc

The NTCIR-13 We Want Web (WWW) 1 ad-hoc ranking benchmark. Contains 100 queries with deep relevance judgments (avg 255 per query). Judgments aggregated from two assessors. Note that the qrels contain additional judgments from the NTCIR-14 CENTRE track.

Dataset irds.clueweb12.b13.ntcir-www-2.queries

datamaestro_text.datasets.irds.data.Topics

The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.

Dataset irds.clueweb12.b13.ntcir-www-2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.

Dataset irds.clueweb12.b13.ntcir-www-2

datamaestro_text.datasets.irds.data.Adhoc

The NTCIR-14 We Want Web (WWW) 2 ad-hoc ranking benchmark. Contains 80 queries with deep relevance judgments (avg 345 per query). Judgments aggregated from two assessors.

Dataset irds.clueweb12.b13.ntcir-www-3.queries

datamaestro_text.datasets.irds.data.Topics

The NTCIR-15 We Want Web (WWW) 3 ad-hoc ranking benchmark. Contains 160 queries with deep relevance judgments (to be released). 80 of the queries are from clueweb12/b13/ntcir-www-2.

Dataset irds.clueweb12.b13.trec-misinfo-2019.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Medical Misinformation 2019 dataset.

Dataset irds.clueweb12.b13.trec-misinfo-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Medical Misinformation 2019 dataset.

Dataset irds.clueweb12.b13.trec-misinfo-2019

datamaestro_text.datasets.irds.data.Adhoc

The TREC Medical Misinformation 2019 dataset.

CODEC

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

Dataset irds.codec.documents

datamaestro_text.datasets.irds.data.Documents

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

Dataset irds.codec.queries

datamaestro_text.datasets.irds.data.Topics

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

Dataset irds.codec.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

Dataset irds.codec

datamaestro_text.datasets.irds.data.Adhoc

CODEC Document Ranking sub-task.

  • Documents: curated web articles
  • Queries: challenging, entity-focused queries
  • Task Repository
  • See also: kilt/codec, the entity ranking subtask

Dataset irds.codec.economics.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about economics.

Dataset irds.codec.economics.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about economics.

Dataset irds.codec.economics

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about economics.

Dataset irds.codec.history.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about history.

Dataset irds.codec.history.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about history.

Dataset irds.codec.history

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about history.

Dataset irds.codec.politics.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about politics.

Dataset irds.codec.politics.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about politics.

Dataset irds.codec.politics

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about politics.

CORD-19

Collection of scientific articles related to COVID-19.

Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.

Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.

Dataset irds.cord19.documents

datamaestro_text.datasets.irds.data.Documents

Collection of scientific articles related to COVID-19.

Uses the 2020-07-16 version of the dataset, corresponding to the "complete" collection used for TREC COVID.

Note that this version of the document collection only provides article meta-data. To get the full text, use cord19/fulltext.

Dataset irds.cord19.trec-covid.queries

datamaestro_text.datasets.irds.data.Topics

The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.

Dataset irds.cord19.trec-covid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.

Dataset irds.cord19.trec-covid

datamaestro_text.datasets.irds.data.Adhoc

The Complete TREC COVID collection. Queries related to COVID-19, including deep relevance judgments.

Dataset irds.cord19.trec-covid.round5.queries

datamaestro_text.datasets.irds.data.Topics

Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round5

datamaestro_text.datasets.irds.data.Adhoc

Round 5 of the TREC COVID task. Includes 50 queries related to COVID-19. This uses the "2020-07-16" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

cord19/fulltext

Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Dataset irds.cord19.fulltext.documents

datamaestro_text.datasets.irds.data.Documents

Version of the cord19 dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Dataset irds.cord19.fulltext.trec-covid.queries

datamaestro_text.datasets.irds.data.Topics

Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.

Dataset irds.cord19.fulltext.trec-covid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.

Dataset irds.cord19.fulltext.trec-covid

datamaestro_text.datasets.irds.data.Adhoc

Version of the cord19/trec-covid dataset that includes article full texts. This dataset takes longer to load than the version that only includes article meta-data.

Queries and qrels are the same as cord19/trec-covid; it just uses the extended documents from cord19/fulltext.

cord19/trec-covid/round1

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

Dataset irds.cord19.trec-covid.round1.documents

datamaestro_text.datasets.irds.data.Documents

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

Dataset irds.cord19.trec-covid.round1.queries

datamaestro_text.datasets.irds.data.Topics

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

Dataset irds.cord19.trec-covid.round1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

Dataset irds.cord19.trec-covid.round1

datamaestro_text.datasets.irds.data.Adhoc

Round 1 of the TREC COVID task. Includes 30 queries related to COVID-19. This uses the "2020-04-10" version of the collection.

cord19/trec-covid/round2

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round2.documents

datamaestro_text.datasets.irds.data.Documents

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round2.queries

datamaestro_text.datasets.irds.data.Topics

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round2

datamaestro_text.datasets.irds.data.Adhoc

Round 2 of the TREC COVID task. Includes 35 queries related to COVID-19. This uses the "2020-05-01" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

cord19/trec-covid/round3

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round3.documents

datamaestro_text.datasets.irds.data.Documents

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round3.queries

datamaestro_text.datasets.irds.data.Topics

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round3

datamaestro_text.datasets.irds.data.Adhoc

Round 3 of the TREC COVID task. Includes 40 queries related to COVID-19. This uses the "2020-05-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

cord19/trec-covid/round4

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round4.documents

datamaestro_text.datasets.irds.data.Documents

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round4.queries

datamaestro_text.datasets.irds.data.Topics

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).

Dataset irds.cord19.trec-covid.round4

datamaestro_text.datasets.irds.data.Adhoc

Round 4 of the TREC COVID task. Includes 45 queries related to COVID-19. This uses the "2020-06-19" version of the collection.

Note that the qrels do not contain results from the prior round(s). Use the "complete" version for this setting (cord19/trec-covid).
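
The per-round facts listed for TREC COVID (query counts and collection versions) can be gathered into one lookup table, which makes it easier to pick the right dataset ID. This is a sketch only: the `CovidRound` class and the `dataset_for` helper are hypothetical names introduced here, and the values are copied from the descriptions in this listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CovidRound:
    """One TREC COVID round as described in this listing."""

    number: int
    num_queries: int
    collection_version: str

    @property
    def dataset_id(self) -> str:
        return f"irds.cord19.trec-covid.round{self.number}"


ROUNDS = (
    CovidRound(1, 30, "2020-04-10"),
    CovidRound(2, 35, "2020-05-01"),
    CovidRound(3, 40, "2020-05-19"),
    CovidRound(4, 45, "2020-06-19"),
    CovidRound(5, 50, "2020-07-16"),
)

# The "complete" collection, whose qrels cover all rounds.
COMPLETE_ID = "irds.cord19.trec-covid"


def dataset_for(round_number: Optional[int] = None) -> str:
    """Return the per-round ID, or the complete collection when None."""
    if round_number is None:
        return COMPLETE_ID
    for covid_round in ROUNDS:
        if covid_round.number == round_number:
            return covid_round.dataset_id
    raise ValueError(f"no such round: {round_number}")
```

Because per-round qrels for rounds 2 to 5 exclude earlier judgments, `dataset_for()` without an argument is the choice when cumulative judgments are needed.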

Cranfield

A small corpus of 1,400 scientific abstracts.

Dataset irds.cranfield.documents

datamaestro_text.datasets.irds.data.Documents

A small corpus of 1,400 scientific abstracts.

Dataset irds.cranfield.queries

datamaestro_text.datasets.irds.data.Topics

A small corpus of 1,400 scientific abstracts.

Dataset irds.cranfield.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A small corpus of 1,400 scientific abstracts.

Dataset irds.cranfield

datamaestro_text.datasets.irds.data.Adhoc

A small corpus of 1,400 scientific abstracts.
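
Because Cranfield is small, it is a convenient collection for a quick smoke test. The sketch below follows the usage pattern shown at the top of this page (`prepare_dataset` followed by `documents.iter_documents()`). The helper name `preview_documents` is hypothetical, and the import is deferred so that the snippet can be loaded even where datamaestro is not installed.

```python
from itertools import islice
from typing import Any, List

CRANFIELD_ID = "irds.cranfield"


def preview_documents(dataset_id: str = CRANFIELD_ID, limit: int = 3) -> List[Any]:
    """Return the first `limit` documents of a collection.

    Assumes the dataset exposes `documents.iter_documents()`, as in the
    usage example at the top of this page.
    """
    # Deferred import: only needed when the function is actually called.
    from datamaestro import prepare_dataset

    dataset = prepare_dataset(dataset_id)
    return list(islice(dataset.documents.iter_documents(), limit))
```

Calling `preview_documents()` requires both datamaestro-text and ir-datasets to be installed, and it may download the collection on first use.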

CSL

The CSL dataset, used for the TREC NeuCLIR technical document task.

Dataset irds.csl.documents

datamaestro_text.datasets.irds.data.Documents

The CSL dataset, used for the TREC NeuCLIR technical document task.

Dataset irds.csl.trec-2023.queries

datamaestro_text.datasets.irds.data.Topics

The TREC NeuCLIR 2023 technical document task.

Dataset irds.csl.trec-2023.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC NeuCLIR 2023 technical document task.

Dataset irds.csl.trec-2023

datamaestro_text.datasets.irds.data.Adhoc

The TREC NeuCLIR 2023 technical document task.

disks45/nocr

A version of disks45 without the Congressional Record. This is the typical setting for tasks like TREC 7, TREC 8, and TREC Robust 2004.

Dataset irds.disks45.nocr.documents

datamaestro_text.datasets.irds.data.Documents

A version of disks45 without the Congressional Record. This is the typical setting for tasks like TREC 7, TREC 8, and TREC Robust 2004.

Dataset irds.disks45.nocr.trec-robust-2004.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."

The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.

Dataset irds.disks45.nocr.trec-robust-2004.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."

The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.

Dataset irds.disks45.nocr.trec-robust-2004

datamaestro_text.datasets.irds.data.Adhoc

The TREC Robust retrieval task focuses on "improving the consistency of retrieval technology by focusing on poorly performing topics."

The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here.

Dataset irds.disks45.nocr.trec-robust-2004.fold1.queries

datamaestro_text.datasets.irds.data.Topics

Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold1

datamaestro_text.datasets.irds.data.Adhoc

Robust04 Fold 1 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold2.queries

datamaestro_text.datasets.irds.data.Topics

Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold2

datamaestro_text.datasets.irds.data.Adhoc

Robust04 Fold 2 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold3.queries

datamaestro_text.datasets.irds.data.Topics

Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold3

datamaestro_text.datasets.irds.data.Adhoc

Robust04 Fold 3 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold4.queries

datamaestro_text.datasets.irds.data.Topics

Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold4

datamaestro_text.datasets.irds.data.Adhoc

Robust04 Fold 4 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold5.queries

datamaestro_text.datasets.irds.data.Topics

Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works

Dataset irds.disks45.nocr.trec-robust-2004.fold5

datamaestro_text.datasets.irds.data.Adhoc

Robust04 Fold 5 (Title) proposed by Huston & Croft (2014) and used in numerous works
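
The five Robust04 folds are typically used for cross-validation. The sketch below enumerates leave-one-fold-out splits over the dataset IDs listed above. The listing does not say how the folds should be combined, so the splitting scheme (four folds for training, one held out) is an assumption, and `cross_validation_splits` is a hypothetical helper name.

```python
from typing import Iterator, List, Tuple

FOLD_ID = "irds.disks45.nocr.trec-robust-2004.fold{n}"
FOLDS = (1, 2, 3, 4, 5)


def cross_validation_splits() -> Iterator[Tuple[List[str], str]]:
    """Yield (training fold IDs, held-out fold ID) for each fold in turn."""
    for held_out in FOLDS:
        train_ids = [FOLD_ID.format(n=n) for n in FOLDS if n != held_out]
        test_id = FOLD_ID.format(n=held_out)
        yield train_ids, test_id


splits = list(cross_validation_splits())
```

Each ID can be loaded with `prepare_dataset`; if a separate validation fold is needed, one of the training folds can be set aside inside the loop.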

Dataset irds.disks45.nocr.trec7.queries

datamaestro_text.datasets.irds.data.Topics

The TREC 7 Adhoc Retrieval track.

Dataset irds.disks45.nocr.trec7.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC 7 Adhoc Retrieval track.

Dataset irds.disks45.nocr.trec7

datamaestro_text.datasets.irds.data.Adhoc

The TREC 7 Adhoc Retrieval track.

Dataset irds.disks45.nocr.trec8.queries

datamaestro_text.datasets.irds.data.Topics

The TREC 8 Adhoc Retrieval track.

Dataset irds.disks45.nocr.trec8.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC 8 Adhoc Retrieval track.

Dataset irds.disks45.nocr.trec8

datamaestro_text.datasets.irds.data.Adhoc

The TREC 8 Adhoc Retrieval track.

DPR Wiki100

A Wikipedia dump from 20 December 2018, split into passages of 100 words. Used in the DPR paper (and subsequent works) for retrieval experiments over Q&A collections.

Dataset irds.dpr-w100.documents

datamaestro_text.datasets.irds.data.Documents

A Wikipedia dump from 20 December 2018, split into passages of 100 words. Used in the DPR paper (and subsequent works) for retrieval experiments over Q&A collections.
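
To make "passages of 100 words" concrete, here is a minimal sketch of fixed-size word chunking. It is not the DPR preprocessing code: whitespace tokenization and the handling of the final, shorter passage are assumptions, and `split_into_passages` is a hypothetical helper.

```python
from typing import List


def split_into_passages(text: str, words_per_passage: int = 100) -> List[str]:
    """Split text into consecutive passages of at most N whitespace-separated words."""
    if words_per_passage <= 0:
        raise ValueError("words_per_passage must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]


# A synthetic 250-word article, only to show the chunk sizes.
article = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(article)
```

With this scheme a 250-word article yields two full passages and one shorter trailing passage.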

Dataset irds.dpr-w100.natural-questions.dev.queries

datamaestro_text.datasets.irds.data.Topics

Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.natural-questions.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.natural-questions.dev

datamaestro_text.datasets.irds.data.Adhoc

Dev subset from the Natural Questions Q&A collection. This differs from the natural-questions/dev dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.natural-questions.train.queries

datamaestro_text.datasets.irds.data.Topics

Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.natural-questions.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.natural-questions.train

datamaestro_text.datasets.irds.data.Adhoc

Training subset from the Natural Questions Q&A collection. This differs from the natural-questions/train dataset in that it uses the full Wikipedia dump and additional filtering (described in the DPR paper) was applied.

Dataset irds.dpr-w100.trivia-qa.dev.queries

datamaestro_text.datasets.irds.data.Topics

Dev subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

Dataset irds.dpr-w100.trivia-qa.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Dev subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

Dataset irds.dpr-w100.trivia-qa.dev

datamaestro_text.datasets.irds.data.Adhoc

Dev subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

Dataset irds.dpr-w100.trivia-qa.train.queries

datamaestro_text.datasets.irds.data.Topics

Training subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

Dataset irds.dpr-w100.trivia-qa.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Training subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

Dataset irds.dpr-w100.trivia-qa.train

datamaestro_text.datasets.irds.data.Adhoc

Training subset from the Trivia QA dataset. Differing from the official Trivia QA collection, this uses the DPR Wikipedia dump as the source collection. Refer to the DPR paper for more details.

CodeSearchNet

A benchmark for semantic code search. Uses

Dataset irds.codesearchnet.documents

datamaestro_text.datasets.irds.data.Documents

A benchmark for semantic code search. Uses

Dataset irds.codesearchnet.challenge.queries

datamaestro_text.datasets.irds.data.Topics

Official challenge set, with keyword queries and deep relevance assessments.

Dataset irds.codesearchnet.challenge.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official challenge set, with keyword queries and deep relevance assessments.

Dataset irds.codesearchnet.challenge

datamaestro_text.datasets.irds.data.Adhoc

Official challenge set, with keyword queries and deep relevance assessments.

Dataset irds.codesearchnet.test.queries

datamaestro_text.datasets.irds.data.Topics

Official test set, using queries inferred from docstrings.

Dataset irds.codesearchnet.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set, using queries inferred from docstrings.

Dataset irds.codesearchnet.test

datamaestro_text.datasets.irds.data.Adhoc

Official test set, using queries inferred from docstrings.

Dataset irds.codesearchnet.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set, using queries inferred from docstrings.

Dataset irds.codesearchnet.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set, using queries inferred from docstrings.

Dataset irds.codesearchnet.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set, using queries inferred from docstrings.

Dataset irds.codesearchnet.valid.queries

datamaestro_text.datasets.irds.data.Topics

Official validation set, using queries inferred from docstrings.

Dataset irds.codesearchnet.valid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official validation set, using queries inferred from docstrings.

Dataset irds.codesearchnet.valid

datamaestro_text.datasets.irds.data.Adhoc

Official validation set, using queries inferred from docstrings.

GOV

GOV web document collection. Used for early TREC Web Tracks. Not to be confused with gov2.

The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.

Dataset irds.gov.documents

datamaestro_text.datasets.irds.data.Documents

GOV web document collection. Used for early TREC Web Tracks. Not to be confused with gov2.

The dataset is obtained for a fee from UoG, and is shipped as a hard drive. More information is provided here.

Dataset irds.gov.trec-web-2002.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2002 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2002.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2002 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2002

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2002 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2002.named-page.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2002 named page ranking benchmark.

Dataset irds.gov.trec-web-2002.named-page.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2002 named page ranking benchmark.

Dataset irds.gov.trec-web-2002.named-page

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2002 named page ranking benchmark.

Dataset irds.gov.trec-web-2003.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2003 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2003.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2003 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2003

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2003 ad-hoc ranking benchmark.

Dataset irds.gov.trec-web-2003.named-page.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2003 named page ranking benchmark.

Dataset irds.gov.trec-web-2003.named-page.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2003 named page ranking benchmark.

Dataset irds.gov.trec-web-2003.named-page

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2003 named page ranking benchmark.

Dataset irds.gov.trec-web-2004.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Web Track 2004 ad-hoc ranking benchmark.

Queries include a combination of topic distillation, homepage finding, and named page finding.

Dataset irds.gov.trec-web-2004.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Web Track 2004 ad-hoc ranking benchmark.

Queries include a combination of topic distillation, homepage finding, and named page finding.

Dataset irds.gov.trec-web-2004

datamaestro_text.datasets.irds.data.Adhoc

The TREC Web Track 2004 ad-hoc ranking benchmark.

Queries include a combination of topic distillation, homepage finding, and named page finding.

GOV2

GOV2 web document collection. Used for the TREC Terabyte Track.

The dataset must be obtained for a fee from UoG (the University of Glasgow) and is shipped on a hard drive. More information is provided here.

Dataset irds.gov2.documents

datamaestro_text.datasets.irds.data.Documents

GOV2 web document collection. Used for the TREC Terabyte Track.

The dataset must be obtained for a fee from UoG (the University of Glasgow) and is shipped on a hard drive. More information is provided here.

Dataset irds.gov2.trec-mq-2007.queries

datamaestro_text.datasets.irds.data.Topics

TREC 2007 Million Query track.

Dataset irds.gov2.trec-mq-2007.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

TREC 2007 Million Query track.

Dataset irds.gov2.trec-mq-2007

datamaestro_text.datasets.irds.data.Adhoc

TREC 2007 Million Query track.

Dataset irds.gov2.trec-mq-2008.queries

datamaestro_text.datasets.irds.data.Topics

TREC 2008 Million Query track.

Dataset irds.gov2.trec-mq-2008.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

TREC 2008 Million Query track.

Dataset irds.gov2.trec-mq-2008

datamaestro_text.datasets.irds.data.Adhoc

TREC 2008 Million Query track.

Dataset irds.gov2.trec-tb-2004.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2004.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2004

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2004 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2005.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2005.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2005

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2005 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2005.efficiency.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2005.efficiency.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2005.efficiency

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2005 efficiency ranking benchmark. Contains 50,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2005. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2005.named-page.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

Dataset irds.gov2.trec-tb-2005.named-page.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

Dataset irds.gov2.trec-tb-2005.named-page

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2005 named page ranking benchmark. Contains 252 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

Dataset irds.gov2.trec-tb-2006.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2006.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2006

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2006 ad-hoc ranking benchmark. Contains 50 queries with deep relevance judgments.

Dataset irds.gov2.trec-tb-2006.efficiency.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2006.efficiency.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2006.efficiency

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2006 efficiency ranking benchmark. Contains 100,000 queries from a search engine, including the 50 topics from gov2/trec-tb-2006. Only the 50 topics have judgments.

Dataset irds.gov2.trec-tb-2006.efficiency.10k.queries

datamaestro_text.datasets.irds.data.Topics

Small stream from gov2/trec-tb-2006/efficiency, with 10,000 queries.

Dataset irds.gov2.trec-tb-2006.efficiency.stream1.queries

datamaestro_text.datasets.irds.data.Topics

Stream 1 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.efficiency.stream2.queries

datamaestro_text.datasets.irds.data.Topics

Stream 2 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.efficiency.stream3.queries

datamaestro_text.datasets.irds.data.Topics

Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.efficiency.stream3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.efficiency.stream3

datamaestro_text.datasets.irds.data.Adhoc

Stream 3 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.efficiency.stream4.queries

datamaestro_text.datasets.irds.data.Topics

Stream 4 of gov2/trec-tb-2006/efficiency (25,000 queries).

Dataset irds.gov2.trec-tb-2006.named-page.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

Dataset irds.gov2.trec-tb-2006.named-page.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.

Dataset irds.gov2.trec-tb-2006.named-page

datamaestro_text.datasets.irds.data.Adhoc

The TREC Terabyte Track 2006 named page ranking benchmark. Contains 181 queries with titles that resemble bookmark labels. Relevance judgments include near-duplicate pages and other pages that may satisfy the bookmark label.
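Several descriptions above refer to collections by their ir_datasets path (for example gov2/trec-tb-2006/efficiency), while the entries themselves use dotted IDs. The sketch below shows the naming pattern as it appears in this listing: prefix with irds., replace slashes with dots, and optionally append documents, queries, or qrels. The helper names to_datamaestro_id and load are hypothetical and not part of datamaestro; the mapping rule is inferred from the listing rather than taken from the library.

```python
from typing import Optional

# Suffixes that appear on entries in this listing.
PARTS = {"documents", "queries", "qrels"}


def to_datamaestro_id(irds_path: str, part: Optional[str] = None) -> str:
    """Turn an ir_datasets path into the dotted ID style used in this listing.

    Hypothetical helper: the rule (prefix + '/' -> '.') is inferred from the
    entries above, not from the datamaestro source code.
    """
    if not irds_path or irds_path.startswith("/") or irds_path.endswith("/"):
        raise ValueError(f"not a valid ir_datasets path: {irds_path!r}")
    dotted = "irds." + irds_path.replace("/", ".")
    if part is not None:
        if part not in PARTS:
            raise ValueError(f"unknown part: {part!r}")
        dotted += "." + part
    return dotted


def load(irds_path: str, part: Optional[str] = None):
    """Load a dataset through datamaestro (imported lazily).

    Requires datamaestro-text and ir-datasets; GOV and GOV2 additionally
    require the purchased data to be available locally.
    """
    from datamaestro import prepare_dataset

    return prepare_dataset(to_datamaestro_id(irds_path, part))


print(to_datamaestro_id("gov2/trec-tb-2006/efficiency", "queries"))
```

Not every combination exists. For instance, the listing above shows only a queries entry for stream1, stream2, and stream4, so check the listing before requesting a qrels ID.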

Istella22

The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (not included in ir_datasets).

Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.

Dataset irds.istella22.documents

datamaestro_text.datasets.irds.data.Documents

The Istella22 dataset facilitates comparisons between traditional and neural learning-to-rank by including query and document text along with LTR features (not included in ir_datasets).

Note that to use the dataset, you must read and accept the Istella22 License Agreement. By using the dataset, you agree to be bound by the terms of the license: the Istella dataset is solely for non-commercial use.

Dataset irds.istella22.test.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.

Dataset irds.istella22.test.fold1.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.fold1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test.fold1

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.

Dataset irds.istella22.test.fold2.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.fold2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test.fold2

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.

Dataset irds.istella22.test.fold3.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.fold3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test.fold3

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.

Dataset irds.istella22.test.fold4.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.fold4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test.fold4

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.

Dataset irds.istella22.test.fold5.queries

datamaestro_text.datasets.irds.data.Topics

Official test query set.

Dataset irds.istella22.test.fold5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test query set.

Dataset irds.istella22.test.fold5

datamaestro_text.datasets.irds.data.Adhoc

Official test query set.
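The five Istella22 folds follow a regular naming scheme, so their IDs can be generated rather than typed out. This is a minimal sketch that only builds the ID strings listed above; the function name istella22_fold_ids is hypothetical, and actually loading a fold still requires accepting the Istella22 license.

```python
from typing import List, Optional


def istella22_fold_ids(part: Optional[str] = None) -> List[str]:
    """Return the IDs of the five Istella22 test folds listed above.

    `part` may be "queries" or "qrels" to select that component; with
    None the ID of the combined Adhoc dataset is returned.
    """
    if part not in (None, "queries", "qrels"):
        raise ValueError(f"unknown part: {part!r}")
    suffix = "" if part is None else "." + part
    # Folds are numbered 1 to 5 in the listing.
    return [f"irds.istella22.test.fold{i}{suffix}" for i in range(1, 6)]


for dataset_id in istella22_fold_ids("qrels"):
    print(dataset_id)
```

Each returned string can be passed to prepare_dataset in the same way as the irds.msmarco-passage example at the top of this page.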

KILT

KILT is a corpus used for various "knowledge intensive language tasks".

Dataset irds.kilt.documents

datamaestro_text.datasets.irds.data.Documents

KILT is a corpus used for various "knowledge intensive language tasks".

Dataset irds.kilt.codec.queries

datamaestro_text.datasets.irds.data.Topics

CODEC Entity Ranking sub-task.

Dataset irds.kilt.codec.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

CODEC Entity Ranking sub-task.

Dataset irds.kilt.codec

datamaestro_text.datasets.irds.data.Adhoc

CODEC Entity Ranking sub-task.

Dataset irds.kilt.codec.economics.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about economics.

Dataset irds.kilt.codec.economics.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about economics.

Dataset irds.kilt.codec.economics

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about economics.

Dataset irds.kilt.codec.history.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about history.

Dataset irds.kilt.codec.history.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about history.

Dataset irds.kilt.codec.history

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about history.

Dataset irds.kilt.codec.politics.queries

datamaestro_text.datasets.irds.data.Topics

Subset of codec that only contains topics about politics.

Dataset irds.kilt.codec.politics.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of codec that only contains topics about politics.

Dataset irds.kilt.codec.politics

datamaestro_text.datasets.irds.data.Adhoc

Subset of codec that only contains topics about politics.

lotte/lifestyle/dev

Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

Dataset irds.lotte.lifestyle.dev.documents

datamaestro_text.datasets.irds.data.Documents

Answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

Dataset irds.lotte.lifestyle.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/lifestyle/dev.

Dataset irds.lotte.lifestyle.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/lifestyle/dev.

Dataset irds.lotte.lifestyle.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/lifestyle/dev.

Dataset irds.lotte.lifestyle.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/lifestyle/dev.

Dataset irds.lotte.lifestyle.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/lifestyle/dev.

Dataset irds.lotte.lifestyle.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/lifestyle/dev.

lotte/lifestyle/test

Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

Dataset irds.lotte.lifestyle.test.documents

datamaestro_text.datasets.irds.data.Documents

Queries and answers from lifestyle-focused forums, including bicycles, coffee, crafts, diy, gardening, lifehacks, mechanics, music, outdoors, parenting, pets, sports, and travel.

Dataset irds.lotte.lifestyle.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/lifestyle/test.

Dataset irds.lotte.lifestyle.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/lifestyle/test.

Dataset irds.lotte.lifestyle.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/lifestyle/test.

Dataset irds.lotte.lifestyle.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/lifestyle/test.

Dataset irds.lotte.lifestyle.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/lifestyle/test.

Dataset irds.lotte.lifestyle.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/lifestyle/test.

lotte/pooled/dev

Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.

Dataset irds.lotte.pooled.dev.documents

datamaestro_text.datasets.irds.data.Documents

Combined version of lotte/lifestyle/dev, lotte/recreation/dev, lotte/science/dev, lotte/technology/dev, and lotte/writing/dev.

Dataset irds.lotte.pooled.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/pooled/dev.

Dataset irds.lotte.pooled.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/pooled/dev.

Dataset irds.lotte.pooled.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/pooled/dev.

Dataset irds.lotte.pooled.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/pooled/dev.

Dataset irds.lotte.pooled.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/pooled/dev.

Dataset irds.lotte.pooled.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/pooled/dev.

lotte/pooled/test

Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.

Dataset irds.lotte.pooled.test.documents

datamaestro_text.datasets.irds.data.Documents

Combined version of lotte/lifestyle/test, lotte/recreation/test, lotte/science/test, lotte/technology/test, and lotte/writing/test.

Dataset irds.lotte.pooled.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/pooled/test.

Dataset irds.lotte.pooled.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/pooled/test.

Dataset irds.lotte.pooled.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/pooled/test.

Dataset irds.lotte.pooled.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/pooled/test.

Dataset irds.lotte.pooled.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/pooled/test.

Dataset irds.lotte.pooled.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/pooled/test.

lotte/recreation/dev

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

Dataset irds.lotte.recreation.dev.documents

datamaestro_text.datasets.irds.data.Documents

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

Dataset irds.lotte.recreation.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/recreation/dev.

Dataset irds.lotte.recreation.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/recreation/dev.

Dataset irds.lotte.recreation.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/recreation/dev.

Dataset irds.lotte.recreation.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/recreation/dev.

Dataset irds.lotte.recreation.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/recreation/dev.

Dataset irds.lotte.recreation.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/recreation/dev.

lotte/recreation/test

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

Dataset irds.lotte.recreation.test.documents

datamaestro_text.datasets.irds.data.Documents

Answers from recreation-focused forums, including anime, boardgames, gaming, movies, photo, rpg, and scifi.

Dataset irds.lotte.recreation.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/recreation/test.

Dataset irds.lotte.recreation.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/recreation/test.

Dataset irds.lotte.recreation.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/recreation/test.

Dataset irds.lotte.recreation.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/recreation/test.

Dataset irds.lotte.recreation.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/recreation/test.

Dataset irds.lotte.recreation.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/recreation/test.

lotte/science/dev

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

Dataset irds.lotte.science.dev.documents

datamaestro_text.datasets.irds.data.Documents

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

Dataset irds.lotte.science.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/science/dev.

Dataset irds.lotte.science.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/science/dev.

Dataset irds.lotte.science.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/science/dev.

Dataset irds.lotte.science.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/science/dev.

Dataset irds.lotte.science.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/science/dev.

Dataset irds.lotte.science.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/science/dev.

lotte/science/test

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

Dataset irds.lotte.science.test.documents

datamaestro_text.datasets.irds.data.Documents

Answers from science-focused forums, including academia, astronomy, biology, chemistry, datascience, earthscience, engineering, math, philosophy, physics, and stats.

Dataset irds.lotte.science.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/science/test.

Dataset irds.lotte.science.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/science/test.

Dataset irds.lotte.science.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/science/test.

Dataset irds.lotte.science.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/science/test.

Dataset irds.lotte.science.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/science/test.

Dataset irds.lotte.science.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/science/test.

lotte/technology/dev

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

Dataset irds.lotte.technology.dev.documents

datamaestro_text.datasets.irds.data.Documents

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

Dataset irds.lotte.technology.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/technology/dev.

Dataset irds.lotte.technology.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/technology/dev.

Dataset irds.lotte.technology.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/technology/dev.

Dataset irds.lotte.technology.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/technology/dev.

Dataset irds.lotte.technology.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/technology/dev.

Dataset irds.lotte.technology.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/technology/dev.

lotte/technology/test

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

Dataset irds.lotte.technology.test.documents

datamaestro_text.datasets.irds.data.Documents

Answers from technology-focused forums, including android, apple, askubuntu, electronics, networkengineering, security, serverfault, softwareengineering, superuser, unix, and webapps.

Dataset irds.lotte.technology.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/technology/test.

Dataset irds.lotte.technology.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/technology/test.

Dataset irds.lotte.technology.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/technology/test.

Dataset irds.lotte.technology.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/technology/test.

Dataset irds.lotte.technology.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/technology/test.

Dataset irds.lotte.technology.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/technology/test.

lotte/writing/dev

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

Dataset irds.lotte.writing.dev.documents

datamaestro_text.datasets.irds.data.Documents

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

Dataset irds.lotte.writing.dev.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/writing/dev.

Dataset irds.lotte.writing.dev.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/writing/dev.

Dataset irds.lotte.writing.dev.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/writing/dev.

Dataset irds.lotte.writing.dev.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/writing/dev.

Dataset irds.lotte.writing.dev.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/writing/dev.

Dataset irds.lotte.writing.dev.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/writing/dev.

lotte/writing/test

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

Dataset irds.lotte.writing.test.documents

datamaestro_text.datasets.irds.data.Documents

Answers from writing-focused forums, including ell, english, linguistics, literature, worldbuilding, and writing.

Dataset irds.lotte.writing.test.forum.queries

datamaestro_text.datasets.irds.data.Topics

Forum queries for lotte/writing/test.

Dataset irds.lotte.writing.test.forum.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Forum queries for lotte/writing/test.

Dataset irds.lotte.writing.test.forum

datamaestro_text.datasets.irds.data.Adhoc

Forum queries for lotte/writing/test.

Dataset irds.lotte.writing.test.search.queries

datamaestro_text.datasets.irds.data.Topics

Search queries for lotte/writing/test.

Dataset irds.lotte.writing.test.search.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Search queries for lotte/writing/test.

Dataset irds.lotte.writing.test.search

datamaestro_text.datasets.irds.data.Adhoc

Search queries for lotte/writing/test.
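The LoTTE entries above form a grid of domain, split, and query type. The sketch below enumerates the Adhoc IDs in that grid, which can be handy when evaluating over every LoTTE subset. The function name lotte_adhoc_ids is hypothetical; the domains, splits, and query types are taken from the listing above.

```python
from itertools import product
from typing import List

# Values as they appear in the listing above.
DOMAINS = ["lifestyle", "pooled", "recreation", "science", "technology", "writing"]
SPLITS = ["dev", "test"]
QUERY_TYPES = ["forum", "search"]


def lotte_adhoc_ids() -> List[str]:
    """Return every LoTTE Adhoc dataset ID listed above."""
    return [
        f"irds.lotte.{domain}.{split}.{query_type}"
        for domain, split, query_type in product(DOMAINS, SPLITS, QUERY_TYPES)
    ]


ids = lotte_adhoc_ids()
print(len(ids), "LoTTE Adhoc datasets")
print(ids[0])
```

Appending .queries or .qrels to one of these IDs gives the corresponding Topics or AdhocAssessments entry, and the document collection for a domain and split is listed as irds.lotte.<domain>.<split>.documents.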

miracl/ar

The Arabic corpus.

Dataset irds.miracl.ar.documents

datamaestro_text.datasets.irds.data.Documents

The Arabic corpus.

Dataset irds.miracl.ar.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Arabic.

Dataset irds.miracl.ar.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Arabic.

Dataset irds.miracl.ar.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Arabic.

Dataset irds.miracl.ar.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Arabic.

Dataset irds.miracl.ar.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Arabic.

Dataset irds.miracl.ar.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Arabic.

Dataset irds.miracl.ar.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Arabic.

Dataset irds.miracl.ar.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Arabic.

miracl/bn

The Bengali corpus.

Dataset irds.miracl.bn.documents

datamaestro_text.datasets.irds.data.Documents

The Bengali corpus.

Dataset irds.miracl.bn.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Bengali.

Dataset irds.miracl.bn.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Bengali.

Dataset irds.miracl.bn.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Bengali.

Dataset irds.miracl.bn.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Bengali.

Dataset irds.miracl.bn.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Bengali.

Dataset irds.miracl.bn.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Bengali.

Dataset irds.miracl.bn.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Bengali.

Dataset irds.miracl.bn.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Bengali.

miracl/de

The German corpus.

Dataset irds.miracl.de.documents

datamaestro_text.datasets.irds.data.Documents

The German corpus.

Dataset irds.miracl.de.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for German.

Dataset irds.miracl.de.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for German.

Dataset irds.miracl.de.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for German.

Dataset irds.miracl.de.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for German.

miracl/en

The English corpus.

Dataset irds.miracl.en.documents

datamaestro_text.datasets.irds.data.Documents

The English corpus.

Dataset irds.miracl.en.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for English.

Dataset irds.miracl.en.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for English.

Dataset irds.miracl.en.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for English.

Dataset irds.miracl.en.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for English.

Dataset irds.miracl.en.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for English.

Dataset irds.miracl.en.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for English.

Dataset irds.miracl.en.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for English.

Dataset irds.miracl.en.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for English.

miracl/es

The Spanish corpus.

Dataset irds.miracl.es.documents

datamaestro_text.datasets.irds.data.Documents

The Spanish corpus.

Dataset irds.miracl.es.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Spanish.

Dataset irds.miracl.es.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Spanish.

Dataset irds.miracl.es.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Spanish.

Dataset irds.miracl.es.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Spanish.

Dataset irds.miracl.es.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Spanish.

Dataset irds.miracl.es.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Spanish.

Dataset irds.miracl.es.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Spanish.

miracl/fa

The Persian corpus.

Dataset irds.miracl.fa.documents

datamaestro_text.datasets.irds.data.Documents

The Persian corpus.

Dataset irds.miracl.fa.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Persian.

Dataset irds.miracl.fa.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Persian.

Dataset irds.miracl.fa.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Persian.

Dataset irds.miracl.fa.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Persian.

Dataset irds.miracl.fa.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Persian.

Dataset irds.miracl.fa.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Persian.

Dataset irds.miracl.fa.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Persian.

miracl/fi

The Finnish corpus.

Dataset irds.miracl.fi.documents

datamaestro_text.datasets.irds.data.Documents

The Finnish corpus.

Dataset irds.miracl.fi.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Finnish.

Dataset irds.miracl.fi.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Finnish.

Dataset irds.miracl.fi.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Finnish.

Dataset irds.miracl.fi.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Finnish.

Dataset irds.miracl.fi.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Finnish.

Dataset irds.miracl.fi.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Finnish.

Dataset irds.miracl.fi.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Finnish.

Dataset irds.miracl.fi.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Finnish.

miracl/fr

The French corpus.

Dataset irds.miracl.fr.documents

datamaestro_text.datasets.irds.data.Documents

The French corpus.

Dataset irds.miracl.fr.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for French.

Dataset irds.miracl.fr.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for French.

Dataset irds.miracl.fr.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for French.

Dataset irds.miracl.fr.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for French.

Dataset irds.miracl.fr.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for French.

Dataset irds.miracl.fr.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for French.

Dataset irds.miracl.fr.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for French.

miracl/hi

The Hindi corpus.

Dataset irds.miracl.hi.documents

datamaestro_text.datasets.irds.data.Documents

The Hindi corpus.

Dataset irds.miracl.hi.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Hindi.

Dataset irds.miracl.hi.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Hindi.

Dataset irds.miracl.hi.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Hindi.

Dataset irds.miracl.hi.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Hindi.

Dataset irds.miracl.hi.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Hindi.

Dataset irds.miracl.hi.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Hindi.

Dataset irds.miracl.hi.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Hindi.

miracl/id

The Indonesian corpus.

Dataset irds.miracl.id.documents

datamaestro_text.datasets.irds.data.Documents

The Indonesian corpus.

Dataset irds.miracl.id.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Indonesian.

Dataset irds.miracl.id.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Indonesian.

Dataset irds.miracl.id.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Indonesian.

Dataset irds.miracl.id.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Indonesian.

Dataset irds.miracl.id.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Indonesian.

Dataset irds.miracl.id.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Indonesian.

Dataset irds.miracl.id.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Indonesian.

Dataset irds.miracl.id.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Indonesian.

miracl/ja

The Japanese corpus.

Dataset irds.miracl.ja.documents

datamaestro_text.datasets.irds.data.Documents

The Japanese corpus.

Dataset irds.miracl.ja.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Japanese.

Dataset irds.miracl.ja.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Japanese.

Dataset irds.miracl.ja.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Japanese.

Dataset irds.miracl.ja.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Japanese.

Dataset irds.miracl.ja.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Japanese.

Dataset irds.miracl.ja.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Japanese.

Dataset irds.miracl.ja.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Japanese.

Dataset irds.miracl.ja.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Japanese.

miracl/ko

The Korean corpus.

Dataset irds.miracl.ko.documents

datamaestro_text.datasets.irds.data.Documents

The Korean corpus.

Dataset irds.miracl.ko.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Korean.

Dataset irds.miracl.ko.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Korean.

Dataset irds.miracl.ko.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Korean.

Dataset irds.miracl.ko.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Korean.

Dataset irds.miracl.ko.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Korean.

Dataset irds.miracl.ko.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Korean.

Dataset irds.miracl.ko.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Korean.

Dataset irds.miracl.ko.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Korean.

miracl/ru

The Russian corpus.

Dataset irds.miracl.ru.documents

datamaestro_text.datasets.irds.data.Documents

The Russian corpus.

Dataset irds.miracl.ru.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Russian.

Dataset irds.miracl.ru.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Russian.

Dataset irds.miracl.ru.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Russian.

Dataset irds.miracl.ru.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Russian.

Dataset irds.miracl.ru.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Russian.

Dataset irds.miracl.ru.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Russian.

Dataset irds.miracl.ru.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Russian.

Dataset irds.miracl.ru.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Russian.

miracl/sw

The Swahili corpus.

Dataset irds.miracl.sw.documents

datamaestro_text.datasets.irds.data.Documents

The Swahili corpus.

Dataset irds.miracl.sw.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Swahili.

Dataset irds.miracl.sw.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Swahili.

Dataset irds.miracl.sw.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Swahili.

Dataset irds.miracl.sw.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Swahili.

Dataset irds.miracl.sw.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Swahili.

Dataset irds.miracl.sw.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Swahili.

Dataset irds.miracl.sw.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Swahili.

Dataset irds.miracl.sw.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Swahili.

miracl/te

The Telugu corpus.

Dataset irds.miracl.te.documents

datamaestro_text.datasets.irds.data.Documents

The Telugu corpus.

Dataset irds.miracl.te.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Telugu.

Dataset irds.miracl.te.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Telugu.

Dataset irds.miracl.te.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Telugu.

Dataset irds.miracl.te.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Telugu.

Dataset irds.miracl.te.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Telugu.

Dataset irds.miracl.te.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Telugu.

Dataset irds.miracl.te.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Telugu.

Dataset irds.miracl.te.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Telugu.

miracl/th

The Thai corpus.

Dataset irds.miracl.th.documents

datamaestro_text.datasets.irds.data.Documents

The Thai corpus.

Dataset irds.miracl.th.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Thai.

Dataset irds.miracl.th.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Thai.

Dataset irds.miracl.th.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Thai.

Dataset irds.miracl.th.test-a.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version a) for Thai.

Dataset irds.miracl.th.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Thai.

Dataset irds.miracl.th.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Thai.

Dataset irds.miracl.th.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Thai.

Dataset irds.miracl.th.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Thai.

miracl/yo

The Yoruba corpus.

Dataset irds.miracl.yo.documents

datamaestro_text.datasets.irds.data.Documents

The Yoruba corpus.

Dataset irds.miracl.yo.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Yoruba.

Dataset irds.miracl.yo.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Yoruba.

Dataset irds.miracl.yo.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Yoruba.

Dataset irds.miracl.yo.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Yoruba.

miracl/zh

The Chinese corpus.

Dataset irds.miracl.zh.documents

datamaestro_text.datasets.irds.data.Documents

The Chinese corpus.

Dataset irds.miracl.zh.dev.queries

datamaestro_text.datasets.irds.data.Topics

The dev set for Chinese.

Dataset irds.miracl.zh.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The dev set for Chinese.

Dataset irds.miracl.zh.dev

datamaestro_text.datasets.irds.data.Adhoc

The dev set for Chinese.

Dataset irds.miracl.zh.test-b.queries

datamaestro_text.datasets.irds.data.Topics

The held-out test set (version b) for Chinese.

Dataset irds.miracl.zh.train.queries

datamaestro_text.datasets.irds.data.Topics

The train set for Chinese.

Dataset irds.miracl.zh.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The train set for Chinese.

Dataset irds.miracl.zh.train

datamaestro_text.datasets.irds.data.Adhoc

The train set for Chinese.

MSMARCO (passage)

A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.
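As an illustration of the metric mentioned above, the following is a minimal sketch of MRR@10 on toy data. The function name `mrr_at_k` and the dictionary-based input layout are assumptions made for this example; they are not the datamaestro or ir-datasets API.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for query_id, ranked_doc_ids in rankings.items():
        positives = relevant.get(query_id, set())
        for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
            if doc_id in positives:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0


# Toy data: q1 has its relevant passage at rank 2, q2 has none in the top 10
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d9"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}}
print(mrr_at_k(rankings, relevant))  # 0.25
```

Queries without a relevant passage in the top 10 contribute zero, which is why shallow judgments (1-2 per query) still yield a meaningful score.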

Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting the previous examples to "公" and "🇺🇸").
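A double-encoding error of this kind typically arises when UTF-8 bytes are decoded with a single-byte codec. The sketch below reproduces and reverses the problem under the assumption that the wrong codec was Latin-1; the actual correction applied by ir-datasets may differ, and `fix_double_encoding` is a hypothetical name.

```python
def fix_double_encoding(text: str) -> str:
    """Undo a UTF-8 -> Latin-1 mis-decoding; return the input unchanged otherwise."""
    try:
        # Recover the original bytes, then decode them with the right codec
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text was not corrupted in this way (assumption: leave it alone)
        return text


# Reproduce the corruption, then repair it
broken = "公".encode("utf-8").decode("latin-1")
print(fix_double_encoding(broken))
```

Text that cannot be round-tripped through Latin-1 and UTF-8 is returned unchanged, so correctly encoded passages are left as they are.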

Dataset irds.msmarco-passage.documents

datamaestro_text.datasets.irds.data.Documents

A passage ranking benchmark with a collection of 8.8 million passages and question queries. Most relevance judgments are shallow (typically at most 1-2 per query), but the TREC Deep Learning track adds deep judgments. Evaluation is typically conducted using MRR@10.

Note that the original document source files for this collection contain a double-encoding error that causes strange sequences like "å¬" and "ðºð". These are automatically corrected (properly converting the previous examples to "公" and "🇺🇸").

Dataset irds.msmarco-passage.dev.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).
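In the re-ranking setting, a model only rescores the candidates listed in scoreddocs. A minimal sketch on toy data follows; the tuple layout `(query_id, doc_id, score)`, the `rerank` helper, and the term-overlap scorer are illustrative assumptions, not the library's API.

```python
from collections import defaultdict


def rerank(scoreddocs, score_fn):
    """Group candidates by query and sort them by a new score, best first."""
    by_query = defaultdict(list)
    for query_id, doc_id, _old_score in scoreddocs:  # the stored score is always 0
        by_query[query_id].append((doc_id, score_fn(query_id, doc_id)))
    return {
        qid: sorted(cands, key=lambda pair: pair[1], reverse=True)
        for qid, cands in by_query.items()
    }


# Toy collection standing in for the real queries and passages
queries = {"q1": "capital of france"}
passages = {"d1": "paris is the capital of france", "d2": "berlin is in germany"}


def overlap(query_id, doc_id):
    """Hypothetical scorer: number of terms shared by query and passage."""
    return len(set(queries[query_id].split()) & set(passages[doc_id].split()))


candidates = [("q1", "d2", 0.0), ("q1", "d1", 0.0)]
print(rerank(candidates, overlap)["q1"][0][0])  # d1
```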

Dataset irds.msmarco-passage.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

Dataset irds.msmarco-passage.dev

datamaestro_text.datasets.irds.data.Adhoc

Official dev set.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available dev queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

Dataset irds.msmarco-passage.dev.2.queries

datamaestro_text.datasets.irds.data.Topics

"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.

Dataset irds.msmarco-passage.dev.2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.

Dataset irds.msmarco-passage.dev.2

datamaestro_text.datasets.irds.data.Adhoc

"Dev2" split of the msmarco-passage/dev set. Originally released as part of the v2 corpus.

Dataset irds.msmarco-passage.dev.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/dev that only includes queries that have at least one qrel.
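The filtering that defines a "judged" subset can be sketched as follows. The data below is a toy stand-in, and `judged_queries` is a hypothetical helper, not a function provided by datamaestro or ir-datasets.

```python
def judged_queries(queries, qrels):
    """Keep only the queries that appear in at least one relevance judgment."""
    judged_ids = {query_id for query_id, _doc_id, _relevance in qrels}
    return [(qid, text) for qid, text in queries if qid in judged_ids]


# Toy data: q2 has no qrel and is therefore dropped
queries = [("q1", "what is bm25"), ("q2", "unjudged query"), ("q3", "what is mrr")]
qrels = [("q1", "d10", 1), ("q3", "d42", 1), ("q3", "d43", 1)]
print(judged_queries(queries, qrels))
```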

Dataset irds.msmarco-passage.dev.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/dev that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.dev.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/dev that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).

Dataset irds.msmarco-passage.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).

Dataset irds.msmarco-passage.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).

Dataset irds.msmarco-passage.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Official "small" version of the dev set, consisting of 6,980 queries (6.9% of the full dev set).

Dataset irds.msmarco-passage.eval.queries

datamaestro_text.datasets.irds.data.Topics

Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available eval queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

Dataset irds.msmarco-passage.eval.small.queries

datamaestro_text.datasets.irds.data.Topics

Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).

Dataset irds.msmarco-passage.eval.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official "small" version of the eval set, consisting of 6,837 queries (6.8% of the full eval set).

Dataset irds.msmarco-passage.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Dataset irds.msmarco-passage.train.docpairs

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Dataset irds.msmarco-passage.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Dataset irds.msmarco-passage.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Dataset irds.msmarco-passage.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set.

Not all queries have relevance judgments. Use msmarco-passage/train/judged for a filtered list that only includes queries that have at least one qrel.

scoreddocs are the top 1000 results from BM25. These are used for the "re-ranking" setting. Note that these are sub-sampled to about 1/8 of the total available train queries by the MSMARCO authors for faster evaluation. The BM25 scores from scoreddocs are not available (all have a score of 0).

docpairs provides access to the "official" sequence for pairwise training.

Dataset irds.msmarco-passage.train.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.train.judged.docpairs

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.train.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.train.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.train.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/train that only includes queries that have at least one qrel.

Dataset irds.msmarco-passage.train.medical.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Dataset irds.msmarco-passage.train.medical.docpairs

Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Dataset irds.msmarco-passage.train.medical.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Dataset irds.msmarco-passage.train.medical.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Dataset irds.msmarco-passage.train.medical

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/train that only includes queries that have a layman or expert medical term. Note that this includes about 20% false matches due to terms with multiple senses.

Dataset irds.msmarco-passage.train.split200-train.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/train without 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-train.docpairs

Subset of msmarco-passage/train without 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/train without 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/train without 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-train

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/train without 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-valid.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/train with only 200 queries that are meant to be used as a small validation set. From various works.
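The relationship between the split200-train and split200-valid subsets can be sketched as a partition by query ID. The 200 held-out IDs themselves come from prior work and are not reproduced here; the IDs and the `split_by_query_ids` helper below are purely illustrative.

```python
def split_by_query_ids(records, valid_ids):
    """Partition records (query_id first) into train and validation lists."""
    train, valid = [], []
    for record in records:
        # A record goes to validation if its query ID is held out
        (valid if record[0] in valid_ids else train).append(record)
    return train, valid


# Toy data: q2 plays the role of a held-out validation query
records = [("q1", "d1", 1), ("q2", "d5", 1), ("q3", "d9", 1)]
train, valid = split_by_query_ids(records, valid_ids={"q2"})
print(len(train), len(valid))  # 2 1
```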

Dataset irds.msmarco-passage.train.split200-valid.docpairs

Subset of msmarco-passage/train with only 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-valid.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/train with only 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-valid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/train with only 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.split200-valid

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/train with only 200 queries that are meant to be used as a small validation set. From various works.

Dataset irds.msmarco-passage.train.triples-small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).

Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.
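The idea of replacing triple text with IDs can be sketched as below. This assumes exact text matches and unique texts, which may not hold for how ir-datasets actually performs the mapping; `map_triples_to_ids` is a hypothetical helper.

```python
def map_triples_to_ids(text_triples, queries, passages):
    """Yield (query_id, positive_id, negative_id) for triples given as text."""
    # Reverse lookups from text to ID (assumption: each text is unique)
    query_ids = {text: qid for qid, text in queries.items()}
    passage_ids = {text: pid for pid, text in passages.items()}
    for query, positive, negative in text_triples:
        yield query_ids[query], passage_ids[positive], passage_ids[negative]


# Toy data standing in for the triples file and the collection
queries = {"q1": "what is bm25"}
passages = {"d1": "bm25 is a ranking function", "d2": "paris is in france"}
triples = [("what is bm25", "bm25 is a ranking function", "paris is in france")]
print(list(map_triples_to_ids(triples, queries, passages)))
```

Storing three short IDs per row instead of three full texts is what keeps the mapped file small.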

Dataset irds.msmarco-passage.train.triples-small.docpairs

Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).

Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.

Dataset irds.msmarco-passage.train.triples-small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).

Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.

Dataset irds.msmarco-passage.train.triples-small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).

Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.

Dataset irds.msmarco-passage.train.triples-small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, but with the "small" triples file (a 10% sample of the full file).

Note that to save on storage space (27GB), the contents of the file are mapped to their corresponding query and document IDs. This process takes a few minutes to run the first time the triples are requested.

Dataset irds.msmarco-passage.train.triples-v2.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, but with version 2 of the triples file.

This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).

Note that this is sorted by the IDs in the file, so you will probably want to shuffle it before use. We opened an issue suggesting that a shuffled third version of the file be provided so that the order is consistent across groups using the data, but at this time, no such file exists in an official capacity.
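One way to obtain a shuffle that is reproducible across runs is to fix the random seed. The sketch below is an illustration only: the seed value and the helper name `shuffled_triples` are arbitrary choices, not an official ordering, and it loads all triples into memory, which may be impractical for the full file.

```python
import random


def shuffled_triples(triples, seed=0):
    """Return the triples in a pseudo-random order determined by the seed."""
    items = list(triples)
    # A dedicated Random instance avoids touching the global random state
    random.Random(seed).shuffle(items)
    return items


# Toy triples of (query_id, positive_doc_id, negative_doc_id), sorted by ID
triples = [("q1", "d1", "d2"), ("q1", "d1", "d3"), ("q2", "d4", "d5"), ("q3", "d6", "d7")]
print(shuffled_triples(triples, seed=42))
```

Groups that agree on the seed and on the same Python `random` implementation should see the same order.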

Dataset irds.msmarco-passage.train.triples-v2.docpairs

Version of msmarco-passage/train, but with version 2 of the triples file.

This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).

Note that this is sorted by the IDs in the file, so you will probably want to shuffle it before use. We opened an issue suggesting that a shuffled third version of the file be provided so that the order is consistent across groups using the data, but at this time, no such file exists in an official capacity.

Dataset irds.msmarco-passage.train.triples-v2.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/train, but with version 2 of the triples file.

This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).

Note that this is sorted by the IDs in the file, so you will probably want to shuffle it before use. We opened an issue suggesting that a shuffled third version of the file be provided so that the order is consistent across groups using the data, but at this time, no such file exists in an official capacity.

Dataset irds.msmarco-passage.train.triples-v2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, but with version 2 of the triples file.

This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).

Note that this is sorted by the IDs in the file, so you will probably want to shuffle it before use. We opened an issue suggesting that a shuffled third version of the file be provided so that the order is consistent across groups using the data, but at this time, no such file exists in an official capacity.

Dataset irds.msmarco-passage.train.triples-v2

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, but with version 2 of the triples file.

This version of the triples file includes rows that were accidentally missing from version 1 of the file (see discussion here).

Note that this is sorted by the IDs in the file, so you will probably want to shuffle it before use. We opened an issue suggesting that a shuffled third version of the file be provided so that the order is consistent across groups using the data, but at this time, no such file exists in an official capacity.

Dataset irds.msmarco-passage.trec-dl-2019.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).

Dataset irds.msmarco-passage.trec-dl-2019.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).

Dataset irds.msmarco-passage.trec-dl-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).

Dataset irds.msmarco-passage.trec-dl-2019

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2019/judged).

Dataset irds.msmarco-passage.trec-dl-2019.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2019.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2019.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2019.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/trec-dl-2019, only including queries with qrels.
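The relationship between a collection and its judged subset can be illustrated in plain Python: keep only the queries that have at least one relevance judgment. This is a sketch with made-up data. The `judged_queries` helper and the tuple layout are assumptions for illustration and do not reflect how ir-datasets builds the subset.

```python
def judged_queries(queries, qrels):
    """Keep only the queries that appear in at least one qrel."""
    judged_ids = {query_id for query_id, _doc_id, _relevance in qrels}
    return {qid: text for qid, text in queries.items() if qid in judged_ids}


# Hypothetical queries (id -> text) and qrels (query_id, doc_id, relevance)
queries = {"q1": "what is bm25", "q2": "define recall", "q3": "trec history"}
qrels = [("q1", "d10", 2), ("q1", "d11", 0), ("q3", "d12", 1)]

judged = judged_queries(queries, qrels)
```

Here `q2` is dropped because no qrel refers to it, while `q1` is kept even though one of its judgments has relevance 0.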

Dataset irds.msmarco-passage.trec-dl-2020.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).

Dataset irds.msmarco-passage.trec-dl-2020.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).

Dataset irds.msmarco-passage.trec-dl-2020.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).

Dataset irds.msmarco-passage.trec-dl-2020

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-passage/eval. A subset of these queries was judged by NIST assessors (filtered list available in msmarco-passage/trec-dl-2020/judged).

Dataset irds.msmarco-passage.trec-dl-2020.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2020.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2020.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-2020.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-passage/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-passage.trec-dl-hard.queries

datamaestro_text.datasets.irds.data.Topics

A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.

Dataset irds.msmarco-passage.trec-dl-hard.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.

Dataset irds.msmarco-passage.trec-dl-hard

datamaestro_text.datasets.irds.data.Adhoc

A more challenging subset of msmarco-passage/trec-dl-2019 and msmarco-passage/trec-dl-2020.

Dataset irds.msmarco-passage.trec-dl-hard.fold1.queries

datamaestro_text.datasets.irds.data.Topics

Fold 1 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 1 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold1

datamaestro_text.datasets.irds.data.Adhoc

Fold 1 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold2.queries

datamaestro_text.datasets.irds.data.Topics

Fold 2 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 2 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold2

datamaestro_text.datasets.irds.data.Adhoc

Fold 2 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold3.queries

datamaestro_text.datasets.irds.data.Topics

Fold 3 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 3 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold3

datamaestro_text.datasets.irds.data.Adhoc

Fold 3 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold4.queries

datamaestro_text.datasets.irds.data.Topics

Fold 4 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 4 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold4

datamaestro_text.datasets.irds.data.Adhoc

Fold 4 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold5.queries

datamaestro_text.datasets.irds.data.Topics

Fold 5 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 5 of msmarco-passage/trec-dl-hard

Dataset irds.msmarco-passage.trec-dl-hard.fold5

datamaestro_text.datasets.irds.data.Adhoc

Fold 5 of msmarco-passage/trec-dl-hard
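The five folds listed above can be combined into train/test splits. The sketch below assumes the folds are meant for cross-validation, which this listing does not state. `cross_validation_splits` is a hypothetical helper that only manipulates the dataset IDs and does not load any data.

```python
# Dataset IDs of the five folds, as they appear in this listing
FOLD_IDS = [f"irds.msmarco-passage.trec-dl-hard.fold{i}" for i in range(1, 6)]


def cross_validation_splits(fold_ids):
    """Yield (training_folds, held_out_fold), holding out each fold once."""
    for held_out in fold_ids:
        training = [fold for fold in fold_ids if fold != held_out]
        yield training, held_out


splits = list(cross_validation_splits(FOLD_IDS))
```

Each ID could then be passed to `prepare_dataset`, as in the usage example at the top of this page.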

mmarco/de

Version of msmarco-passage, with documents translated into German.

Dataset irds.mmarco.de.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into German.

Dataset irds.mmarco.de.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.de.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.de.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.de.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.de.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.de.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.de.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.de.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.de.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.de.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.de.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into German.
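The mmarco entries in this listing follow a regular naming pattern: language code, optional split, then optional part. The helper below is a hypothetical convenience for composing such IDs and is not part of datamaestro-text. Not every combination exists for every language, so a composed ID should be checked against this list before use.

```python
def mmarco_dataset_id(language, split=None, part=None, v2=False):
    """Compose an ID such as 'irds.mmarco.de.dev.small.queries'."""
    pieces = ["irds", "mmarco"]
    if v2:
        pieces.append("v2")  # the mmarco/v2 collections sit under an extra level
    pieces.append(language)
    if split:
        pieces.append(split)  # e.g. "dev", "dev.small" or "train"
    if part:
        pieces.append(part)  # e.g. "queries", "qrels" or "documents"
    return ".".join(pieces)


german_dev_small_queries = mmarco_dataset_id("de", "dev.small", "queries")
french_v2_train = mmarco_dataset_id("fr", "train", v2=True)
spanish_documents = mmarco_dataset_id("es", part="documents")
```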

mmarco/es

Version of msmarco-passage, with documents translated into Spanish.

Dataset irds.mmarco.es.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Spanish.

Dataset irds.mmarco.es.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.es.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Spanish.

mmarco/fr

Version of msmarco-passage, with documents translated into French.

Dataset irds.mmarco.fr.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into French.

Dataset irds.mmarco.fr.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.fr.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.fr.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.fr.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.fr.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.fr.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into French.

mmarco/id

Version of msmarco-passage, with documents translated into Indonesian.

Dataset irds.mmarco.id.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Indonesian.

Dataset irds.mmarco.id.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.id.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

mmarco/it

Version of msmarco-passage, with documents translated into Italian.

Dataset irds.mmarco.it.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Italian.

Dataset irds.mmarco.it.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.it.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.it.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.it.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.it.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.it.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Italian.

mmarco/pt

Version of msmarco-passage, with documents translated into Portuguese.

Dataset irds.mmarco.pt.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.dev.small.v1.1.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.small.v1.1.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.small.v1.1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.small.v1.1

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.v1.1.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.v1.1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.dev.v1.1

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.pt.train.v1.1.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.train.v1.1.docpairs

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.train.v1.1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

Dataset irds.mmarco.pt.train.v1.1

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here. It also removes some duplicated query IDs.

mmarco/ru

Version of msmarco-passage, with documents translated into Russian.

Dataset irds.mmarco.ru.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Russian.

Dataset irds.mmarco.ru.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.ru.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Russian.

mmarco/v2/ar

Version of msmarco-passage, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Arabic.

Dataset irds.mmarco.v2.ar.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Arabic.

mmarco/v2/de

Version of msmarco-passage, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into German.

Dataset irds.mmarco.v2.de.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into German.

mmarco/v2/dt

Version of msmarco-passage, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Dutch.

Dataset irds.mmarco.v2.dt.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Dutch.

mmarco/v2/es

Version of msmarco-passage, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Spanish.

Dataset irds.mmarco.v2.es.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Spanish.
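
The dataset IDs in this list appear to mirror the ir-datasets identifiers: each `/` becomes a `.`, and the result is prefixed with `irds.`. Below is a minimal sketch of that mapping. The convention is inferred from the entries listed here, not taken from the datamaestro-text source, and the helper name `to_datamaestro_id` is hypothetical.

```python
def to_datamaestro_id(irds_id: str) -> str:
    """Map an ir-datasets ID such as "mmarco/v2/es/dev/small" to the
    dataset ID format used in this list.

    Hypothetical helper: the "/" -> "." convention and the "irds." prefix
    are inferred from the listed IDs, not from the library itself.
    """
    return "irds." + irds_id.strip("/").replace("/", ".")


# Whole (sub-)dataset, as listed under "Dataset irds.mmarco.v2.es.dev.small"
print(to_datamaestro_id("mmarco/v2/es/dev/small"))

# Individual components are addressed by appending a suffix
print(to_datamaestro_id("mmarco/v2/es") + ".documents")
```

The resulting string can be passed to `prepare_dataset`, as in the usage example at the top of this page. Whether a given ID resolves depends on the installed ir-datasets version.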

mmarco/v2/fr

Version of msmarco-passage, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into French.

Dataset irds.mmarco.v2.fr.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into French.

mmarco/v2/hi

Version of msmarco-passage, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Hindi.

Dataset irds.mmarco.v2.hi.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Hindi.

mmarco/v2/id

Version of msmarco-passage, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

Dataset irds.mmarco.v2.id.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Indonesian.

mmarco/v2/it

Version of msmarco-passage, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Italian.

Dataset irds.mmarco.v2.it.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Italian.

mmarco/v2/ja

Version of msmarco-passage, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Japanese.

Dataset irds.mmarco.v2.ja.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Japanese.

mmarco/v2/pt

Version of msmarco-passage, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

Dataset irds.mmarco.v2.pt.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Portuguese.

mmarco/v2/ru

Version of msmarco-passage, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Russian.

Dataset irds.mmarco.v2.ru.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Russian.

mmarco/v2/vi

Version of msmarco-passage, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Vietnamese.

Dataset irds.mmarco.v2.vi.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Vietnamese.

mmarco/v2/zh

Version of msmarco-passage, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.small.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.v2.zh.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Chinese.

mmarco/zh

Version of msmarco-passage, with documents translated into Chinese.

Dataset irds.mmarco.zh.documents

datamaestro_text.datasets.irds.data.Documents

Version of msmarco-passage, with documents translated into Chinese.

Dataset irds.mmarco.zh.dev.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev.small

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev/small, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.dev.small.v1.1.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.small.v1.1.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.small.v1.1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.small.v1.1

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.v1.1.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.v1.1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.dev.v1.1

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/dev, with queries and documents translated into Chinese.

Version 1.1 of this file includes manual corrections from the authors of the translated files. See discussion here.

Dataset irds.mmarco.zh.train.queries

datamaestro_text.datasets.irds.data.Topics

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.train.docpairs

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Version of msmarco-passage/train, with queries and documents translated into Chinese.

Dataset irds.mmarco.zh.train

datamaestro_text.datasets.irds.data.Adhoc

Version of msmarco-passage/train, with queries and documents translated into Chinese.

mr-tydi/ar

Complete Arabic dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ar.documents

datamaestro_text.datasets.irds.data.Documents

Complete Arabic dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ar.queries

datamaestro_text.datasets.irds.data.Topics

Complete Arabic dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ar.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Arabic dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ar

datamaestro_text.datasets.irds.data.Adhoc

Complete Arabic dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ar.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Arabic

Dataset irds.mr-tydi.ar.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Arabic

Dataset irds.mr-tydi.ar.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Arabic

Dataset irds.mr-tydi.ar.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Arabic

Dataset irds.mr-tydi.ar.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Arabic

Dataset irds.mr-tydi.ar.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Arabic

Dataset irds.mr-tydi.ar.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Arabic

Dataset irds.mr-tydi.ar.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Arabic

Dataset irds.mr-tydi.ar.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Arabic
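
The Arabic entries above follow a regular layout: one shared `documents` collection, then `queries`, `qrels`, and a combined entry for the complete set and for each of the `dev`, `test`, and `train` splits. The sketch below enumerates those IDs for a given language code. The function name `mr_tydi_ids` is hypothetical, and the layout is read off this list rather than queried from ir-datasets, so the actual entries may differ with the installed version.

```python
from typing import List


def mr_tydi_ids(lang: str) -> List[str]:
    """List the dataset IDs shown in this section for one mr-tydi language.

    Hypothetical helper: the layout is assumed from the entries listed here.
    """
    base = f"irds.mr-tydi.{lang}"
    ids = [f"{base}.documents"]
    # The complete set first, then each split, in the order used by this list
    for prefix in (base, f"{base}.dev", f"{base}.test", f"{base}.train"):
        ids += [f"{prefix}.queries", f"{prefix}.qrels", prefix]
    return ids


for dataset_id in mr_tydi_ids("ar"):
    print(dataset_id)
```

Generating the IDs this way can help when running the same experiment across several languages, for example by looping over language codes and passing each ID to `prepare_dataset`.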

mr-tydi/bn

Complete Bengali dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.bn.documents

datamaestro_text.datasets.irds.data.Documents

Complete Bengali dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.bn.queries

datamaestro_text.datasets.irds.data.Topics

Complete Bengali dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.bn.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Bengali dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.bn

datamaestro_text.datasets.irds.data.Adhoc

Complete Bengali dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.bn.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Bengali

Dataset irds.mr-tydi.bn.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Bengali

Dataset irds.mr-tydi.bn.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Bengali

Dataset irds.mr-tydi.bn.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Bengali

Dataset irds.mr-tydi.bn.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Bengali

Dataset irds.mr-tydi.bn.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Bengali

Dataset irds.mr-tydi.bn.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Bengali

Dataset irds.mr-tydi.bn.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Bengali

Dataset irds.mr-tydi.bn.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Bengali

mr-tydi/en

Complete English dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.en.documents

datamaestro_text.datasets.irds.data.Documents

Complete English dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.en.queries

datamaestro_text.datasets.irds.data.Topics

Complete English dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.en.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete English dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.en

datamaestro_text.datasets.irds.data.Adhoc

Complete English dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.en.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for English

Dataset irds.mr-tydi.en.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for English

Dataset irds.mr-tydi.en.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for English

Dataset irds.mr-tydi.en.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for English

Dataset irds.mr-tydi.en.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for English

Dataset irds.mr-tydi.en.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for English

Dataset irds.mr-tydi.en.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for English

Dataset irds.mr-tydi.en.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for English

Dataset irds.mr-tydi.en.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for English

mr-tydi/fi

Complete Finnish dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.fi.documents

datamaestro_text.datasets.irds.data.Documents

Complete Finnish dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.fi.queries

datamaestro_text.datasets.irds.data.Topics

Complete Finnish dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.fi.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Finnish dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.fi

datamaestro_text.datasets.irds.data.Adhoc

Complete Finnish dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.fi.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Finnish

Dataset irds.mr-tydi.fi.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Finnish

Dataset irds.mr-tydi.fi.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Finnish

Dataset irds.mr-tydi.fi.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Finnish

Dataset irds.mr-tydi.fi.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Finnish

Dataset irds.mr-tydi.fi.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Finnish

Dataset irds.mr-tydi.fi.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Finnish

Dataset irds.mr-tydi.fi.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Finnish

Dataset irds.mr-tydi.fi.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Finnish

mr-tydi/id

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.id.documents

datamaestro_text.datasets.irds.data.Documents

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.id.queries

datamaestro_text.datasets.irds.data.Topics

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.id.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.id

datamaestro_text.datasets.irds.data.Adhoc

Complete Indonesian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.id.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Indonesian

Dataset irds.mr-tydi.id.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Indonesian

Dataset irds.mr-tydi.id.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Indonesian

Dataset irds.mr-tydi.id.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Indonesian

Dataset irds.mr-tydi.id.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Indonesian

Dataset irds.mr-tydi.id.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Indonesian

Dataset irds.mr-tydi.id.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Indonesian

Dataset irds.mr-tydi.id.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Indonesian

Dataset irds.mr-tydi.id.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Indonesian

mr-tydi/ja

Complete Japanese dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ja.documents

datamaestro_text.datasets.irds.data.Documents

Complete Japanese dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ja.queries

datamaestro_text.datasets.irds.data.Topics

Complete Japanese dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ja.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Japanese dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ja

datamaestro_text.datasets.irds.data.Adhoc

Complete Japanese dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ja.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Japanese

Dataset irds.mr-tydi.ja.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Japanese

Dataset irds.mr-tydi.ja.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Japanese

Dataset irds.mr-tydi.ja.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Japanese

Dataset irds.mr-tydi.ja.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Japanese

Dataset irds.mr-tydi.ja.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Japanese

Dataset irds.mr-tydi.ja.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Japanese

Dataset irds.mr-tydi.ja.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Japanese

Dataset irds.mr-tydi.ja.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Japanese

mr-tydi/ko

Complete Korean dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ko.documents

datamaestro_text.datasets.irds.data.Documents

Complete Korean dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ko.queries

datamaestro_text.datasets.irds.data.Topics

Complete Korean dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ko.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Korean dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ko

datamaestro_text.datasets.irds.data.Adhoc

Complete Korean dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ko.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Korean

Dataset irds.mr-tydi.ko.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Korean

Dataset irds.mr-tydi.ko.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Korean

Dataset irds.mr-tydi.ko.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Korean

Dataset irds.mr-tydi.ko.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Korean

Dataset irds.mr-tydi.ko.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Korean

Dataset irds.mr-tydi.ko.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Korean

Dataset irds.mr-tydi.ko.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Korean

Dataset irds.mr-tydi.ko.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Korean

mr-tydi/ru

Complete Russian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ru.documents

datamaestro_text.datasets.irds.data.Documents

Complete Russian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ru.queries

datamaestro_text.datasets.irds.data.Topics

Complete Russian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ru.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Russian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ru

datamaestro_text.datasets.irds.data.Adhoc

Complete Russian dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.ru.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Russian

Dataset irds.mr-tydi.ru.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Russian

Dataset irds.mr-tydi.ru.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Russian

Dataset irds.mr-tydi.ru.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Russian

Dataset irds.mr-tydi.ru.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Russian

Dataset irds.mr-tydi.ru.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Russian

Dataset irds.mr-tydi.ru.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Russian

Dataset irds.mr-tydi.ru.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Russian

Dataset irds.mr-tydi.ru.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Russian

mr-tydi/sw

Complete Swahili dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.sw.documents

datamaestro_text.datasets.irds.data.Documents

Complete Swahili dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.sw.queries

datamaestro_text.datasets.irds.data.Topics

Complete Swahili dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.sw.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Swahili dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.sw

datamaestro_text.datasets.irds.data.Adhoc

Complete Swahili dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.sw.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Swahili

Dataset irds.mr-tydi.sw.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Swahili

Dataset irds.mr-tydi.sw.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Swahili

Dataset irds.mr-tydi.sw.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Swahili

Dataset irds.mr-tydi.sw.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Swahili

Dataset irds.mr-tydi.sw.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Swahili

Dataset irds.mr-tydi.sw.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Swahili

Dataset irds.mr-tydi.sw.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Swahili

Dataset irds.mr-tydi.sw.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Swahili

mr-tydi/te

Complete Telugu dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.te.documents

datamaestro_text.datasets.irds.data.Documents

Complete Telugu dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.te.queries

datamaestro_text.datasets.irds.data.Topics

Complete Telugu dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.te.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Telugu dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.te

datamaestro_text.datasets.irds.data.Adhoc

Complete Telugu dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.te.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Telugu

Dataset irds.mr-tydi.te.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Telugu

Dataset irds.mr-tydi.te.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Telugu

Dataset irds.mr-tydi.te.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Telugu

Dataset irds.mr-tydi.te.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Telugu

Dataset irds.mr-tydi.te.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Telugu

Dataset irds.mr-tydi.te.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Telugu

Dataset irds.mr-tydi.te.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Telugu

Dataset irds.mr-tydi.te.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Telugu

mr-tydi/th

Complete Thai dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.th.documents

datamaestro_text.datasets.irds.data.Documents

Complete Thai dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.th.queries

datamaestro_text.datasets.irds.data.Topics

Complete Thai dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.th.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Complete Thai dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.th

datamaestro_text.datasets.irds.data.Adhoc

Complete Thai dataset, including all train, dev, and test queries and qrels.

Dataset irds.mr-tydi.th.dev.queries

datamaestro_text.datasets.irds.data.Topics

Development set for Thai

Dataset irds.mr-tydi.th.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Development set for Thai

Dataset irds.mr-tydi.th.dev

datamaestro_text.datasets.irds.data.Adhoc

Development set for Thai

Dataset irds.mr-tydi.th.test.queries

datamaestro_text.datasets.irds.data.Topics

Test set for Thai

Dataset irds.mr-tydi.th.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Test set for Thai

Dataset irds.mr-tydi.th.test

datamaestro_text.datasets.irds.data.Adhoc

Test set for Thai

Dataset irds.mr-tydi.th.train.queries

datamaestro_text.datasets.irds.data.Topics

Train set for Thai

Dataset irds.mr-tydi.th.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train set for Thai

Dataset irds.mr-tydi.th.train

datamaestro_text.datasets.irds.data.Adhoc

Train set for Thai
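The Mr. TyDi entries above follow a regular naming pattern: each language has language-level `documents`, `queries`, `qrels`, and combined entries, then the same three kinds of entry for each of the `dev`, `test`, and `train` splits. A minimal sketch that rebuilds those IDs is shown below. `mr_tydi_ids` is a hypothetical helper for illustration, not part of datamaestro-text, and the pattern is inferred from this listing only.

```python
# Language codes and splits that appear in this part of the listing.
LANGUAGES = ["fi", "id", "ja", "ko", "ru", "sw", "te", "th"]
SPLITS = ["dev", "test", "train"]
PARTS = ["queries", "qrels"]


def mr_tydi_ids(language: str) -> list:
    """Return the dataset IDs listed for one Mr. TyDi language (hypothetical helper)."""
    base = f"irds.mr-tydi.{language}"
    # Language-level entries: documents, queries, qrels, and the combined dataset.
    ids = [f"{base}.documents", f"{base}.queries", f"{base}.qrels", base]
    for split in SPLITS:
        # Per-split entries: queries, qrels, then the combined split dataset.
        ids.extend(f"{base}.{split}.{part}" for part in PARTS)
        ids.append(f"{base}.{split}")
    return ids


thai_ids = mr_tydi_ids("th")
print(len(thai_ids))
print(thai_ids[-1])
```

Any of the resulting strings should be usable with `prepare_dataset`, as in the Usage section, provided the installed ir-datasets version exposes that dataset.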

MSMARCO (document)

"Based the questions in the [MS-MARCO] Question Answering Dataset and the documents which answered the questions a document ranking task was formulated. There are 3.2 million documents and the goal is to rank based on their relevance. Relevance labels are derived from what passages was marked as having the answer in the QnA dataset."

Dataset irds.msmarco-document.documents

datamaestro_text.datasets.irds.data.Documents

"Based the questions in the [MS-MARCO] Question Answering Dataset and the documents which answered the questions a document ranking task was formulated. There are 3.2 million documents and the goal is to rank based on their relevance. Relevance labels are derived from what passages was marked as having the answer in the QnA dataset."

Dataset irds.msmarco-document.dev.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.dev.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.dev

datamaestro_text.datasets.irds.data.Adhoc

Official dev set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
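In the "re-ranking" setting mentioned above, a model re-scores only the candidates supplied by the first-stage run instead of searching the whole collection. The sketch below illustrates this with plain Python data. The triples, the `rerank` helper, and the toy scorer are all assumptions made for illustration; they do not reflect how datamaestro-text represents an `AdhocRun`.

```python
from collections import defaultdict

# Toy first-stage run: (query_id, doc_id, score) triples. Values are made up.
scoreddocs = [
    ("q1", "D1", 12.0),
    ("q1", "D2", 11.5),
    ("q1", "D3", 9.0),
    ("q2", "D4", 7.2),
    ("q2", "D1", 6.8),
]


def rerank(run, scorer):
    """Re-score only the candidates already present in the first-stage run."""
    candidates = defaultdict(list)
    for query_id, doc_id, _first_stage_score in run:
        candidates[query_id].append(doc_id)
    return {
        query_id: sorted(doc_ids, key=lambda d: scorer(query_id, d), reverse=True)
        for query_id, doc_ids in candidates.items()
    }


# Hypothetical scores standing in for a learned re-ranker.
toy_scores = {
    ("q1", "D3"): 0.9, ("q1", "D1"): 0.4, ("q1", "D2"): 0.1,
    ("q2", "D1"): 0.8, ("q2", "D4"): 0.3,
}
reranked = rerank(scoreddocs, lambda q, d: toy_scores[(q, d)])
print(reranked["q1"])  # ['D3', 'D1', 'D2']
```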

Dataset irds.msmarco-document.eval.queries

datamaestro_text.datasets.irds.data.Topics

Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.eval.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official eval set for submission to MS MARCO leaderboard. Relevance judgments are hidden.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.orcas.queries

datamaestro_text.datasets.irds.data.Topics

"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."

  • Queries: From query log
  • Relevance Data: User clicks
  • Scored docs: Indri Query Likelihood model
  • Dataset Paper
Dataset irds.msmarco-document.orcas.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."

  • Queries: From query log
  • Relevance Data: User clicks
  • Scored docs: Indri Query Likelihood model
  • Dataset Paper
Dataset irds.msmarco-document.orcas.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."

  • Queries: From query log
  • Relevance Data: User clicks
  • Scored docs: Indri Query Likelihood model
  • Dataset Paper
Dataset irds.msmarco-document.orcas

datamaestro_text.datasets.irds.data.Adhoc

"ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries."

  • Queries: From query log
  • Relevance Data: User clicks
  • Scored docs: Indri Query Likelihood model
  • Dataset Paper
Dataset irds.msmarco-document.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.

Dataset irds.msmarco-document.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set. All queries have exactly 1 (positive) relevance judgment.

scoreddocs are the top 100 results from Indri QL. These are used for the "re-ranking" setting.
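Because every train and dev query has exactly one positive judgment, the qrels reduce to a query-to-document mapping, and a rank-based score only has to locate that one document. The sketch below checks the property on toy data and computes a reciprocal rank. The triples and the `reciprocal_rank` helper are illustrative assumptions; the choice of metric is not stated in this listing.

```python
from collections import Counter

# Toy qrels as (query_id, doc_id, relevance) triples; values are made up.
qrels = [("q1", "D7", 1), ("q2", "D3", 1), ("q3", "D9", 1)]

# Check the "exactly 1 judgment per query" property.
judgments_per_query = Counter(query_id for query_id, _doc_id, _rel in qrels)
assert all(count == 1 for count in judgments_per_query.values())

# With one positive per query, the qrels reduce to a query -> document mapping.
relevant_doc = {query_id: doc_id for query_id, doc_id, _rel in qrels}


def reciprocal_rank(ranking, target):
    """Return 1/rank of the target document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == target:
            return 1.0 / rank
    return 0.0


rr = reciprocal_rank(["D1", "D7", "D4"], relevant_doc["q1"])
print(rr)  # 0.5
```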

Dataset irds.msmarco-document.trec-dl-2019.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).

Dataset irds.msmarco-document.trec-dl-2019.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).

Dataset irds.msmarco-document.trec-dl-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).

Dataset irds.msmarco-document.trec-dl-2019

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2019/judged).

Dataset irds.msmarco-document.trec-dl-2019.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-document/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2019.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-document/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2019.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-document/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2019.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-document/trec-dl-2019, only including queries with qrels.
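The "judged" variants keep only the queries that have at least one qrel. The same filter can be sketched on plain Python data, as below. The query texts and qrels are invented for illustration, and the code does not show how ir-datasets builds the subset internally.

```python
# Toy topics and qrels; the contents are made up.
queries = {
    "q1": "what is a lobster roll",
    "q2": "how long to hold a bow in yoga",
    "q3": "definition of a sigmet",
}
qrels = [("q1", "D10", 2), ("q1", "D11", 0), ("q3", "D42", 3)]

# A query counts as judged when it has at least one relevance assessment.
judged_ids = {query_id for query_id, _doc_id, _rel in qrels}

# Keep the original query order while dropping unjudged queries.
judged_queries = {
    query_id: text for query_id, text in queries.items() if query_id in judged_ids
}
print(list(judged_queries))  # ['q1', 'q3']
```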

Dataset irds.msmarco-document.trec-dl-2020.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).

Dataset irds.msmarco-document.trec-dl-2020.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).

Dataset irds.msmarco-document.trec-dl-2020.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).

Dataset irds.msmarco-document.trec-dl-2020

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document/trec-dl-2020/judged).

Dataset irds.msmarco-document.trec-dl-2020.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-document/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2020.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Subset of msmarco-document/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2020.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-document/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-2020.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-document/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document.trec-dl-hard.queries

datamaestro_text.datasets.irds.data.Topics

A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.

Dataset irds.msmarco-document.trec-dl-hard.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.

Dataset irds.msmarco-document.trec-dl-hard

datamaestro_text.datasets.irds.data.Adhoc

A more challenging subset of msmarco-document/trec-dl-2019 and msmarco-document/trec-dl-2020.

Dataset irds.msmarco-document.trec-dl-hard.fold1.queries

datamaestro_text.datasets.irds.data.Topics

Fold 1 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 1 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold1

datamaestro_text.datasets.irds.data.Adhoc

Fold 1 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold2.queries

datamaestro_text.datasets.irds.data.Topics

Fold 2 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 2 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold2

datamaestro_text.datasets.irds.data.Adhoc

Fold 2 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold3.queries

datamaestro_text.datasets.irds.data.Topics

Fold 3 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 3 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold3

datamaestro_text.datasets.irds.data.Adhoc

Fold 3 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold4.queries

datamaestro_text.datasets.irds.data.Topics

Fold 4 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 4 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold4

datamaestro_text.datasets.irds.data.Adhoc

Fold 4 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold5.queries

datamaestro_text.datasets.irds.data.Topics

Fold 5 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Fold 5 of msmarco-document/trec-dl-hard

Dataset irds.msmarco-document.trec-dl-hard.fold5

datamaestro_text.datasets.irds.data.Adhoc

Fold 5 of msmarco-document/trec-dl-hard
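The five folds above lend themselves to a leave-one-fold-out protocol: hold out one fold for evaluation and use the other four for tuning. The listing does not say how the folds are meant to be used, so the sketch below is an assumption, and `fold_id` and `cross_validation_splits` are hypothetical helpers.

```python
FOLDS = [1, 2, 3, 4, 5]


def fold_id(fold: int) -> str:
    """Build the dataset ID of one trec-dl-hard fold, as named in the listing."""
    return f"irds.msmarco-document.trec-dl-hard.fold{fold}"


def cross_validation_splits():
    """Yield (tuning_ids, held_out_id) pairs, holding out each fold once."""
    for held_out in FOLDS:
        tuning_ids = [fold_id(f) for f in FOLDS if f != held_out]
        yield tuning_ids, fold_id(held_out)


splits = list(cross_validation_splits())
for tuning_ids, held_out_id in splits:
    print(held_out_id, "<-", len(tuning_ids), "tuning folds")
```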

Anchor Text for Version 1 of MS MARCO

For version 1 of MS MARCO, the anchor text collection enriches 1,703,834 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (with this sampling, all anchor text is retained for 94% of the documents). The text field contains the concatenated anchor texts, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.

Dataset irds.msmarco-document.anchor-text.documents

datamaestro_text.datasets.irds.data.Documents

For version 1 of MS MARCO, the anchor text collection enriches 1,703,834 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for documents with more than 1,000 anchor texts (with this sampling, all anchor text is retained for 94% of the documents). The text field contains the concatenated anchor texts, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.
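The capping rule and the two fields described above can be sketched as follows. This is an illustration, not the code used to build the collection: `build_anchor_fields` is a hypothetical helper, uniform sampling is assumed, and the single-space separator used for concatenation is also an assumption.

```python
import random

MAX_ANCHORS = 1000  # cap stated in the description above


def build_anchor_fields(anchor_texts, rng):
    """Cap the anchors at MAX_ANCHORS and derive the `text` and `anchors` fields."""
    if len(anchor_texts) > MAX_ANCHORS:
        # Documents above the cap keep a random sample of their anchor texts.
        anchors = rng.sample(anchor_texts, MAX_ANCHORS)
    else:
        # Documents at or below the cap keep every anchor text.
        anchors = list(anchor_texts)
    # Separator is an assumption; the source only says "concatenated".
    return {"text": " ".join(anchors), "anchors": anchors}


rng = random.Random(0)
popular = build_anchor_fields([f"anchor {i}" for i in range(1500)], rng)
small = build_anchor_fields(["home", "about us"], rng)
print(len(popular["anchors"]), small["text"])
```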

MSMARCO (document, version 2)

Version 2 of the MS MARCO document ranking dataset. The corpus contains 12M documents (roughly 3x as many as version 1).

  • Version 1 of dataset: msmarco-document
  • Documents: Text extracted from web pages
  • Queries: Natural language questions (from query log)
  • Dataset Paper
Dataset irds.msmarco-document-v2.documents

datamaestro_text.datasets.irds.data.Documents

Version 2 of the MS MARCO document ranking dataset. The corpus contains 12M documents (roughly 3x as many as version 1).

  • Version 1 of dataset: msmarco-document
  • Documents: Text extracted from web pages
  • Queries: Natural language questions (from query log)
  • Dataset Paper
Dataset irds.msmarco-document-v2.dev1.queries

datamaestro_text.datasets.irds.data.Topics

Official dev1 set with 4,552 queries.

Dataset irds.msmarco-document-v2.dev1.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev1 set with 4,552 queries.

Dataset irds.msmarco-document-v2.dev1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev1 set with 4,552 queries.

Dataset irds.msmarco-document-v2.dev1

datamaestro_text.datasets.irds.data.Adhoc

Official dev1 set with 4,552 queries.

Dataset irds.msmarco-document-v2.dev2.queries

datamaestro_text.datasets.irds.data.Topics

Official dev2 set with 5,000 queries.

Dataset irds.msmarco-document-v2.dev2.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev2 set with 5,000 queries.

Dataset irds.msmarco-document-v2.dev2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev2 set with 5,000 queries.

Dataset irds.msmarco-document-v2.dev2

datamaestro_text.datasets.irds.data.Adhoc

Official dev2 set with 5,000 queries.

Dataset irds.msmarco-document-v2.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set with 322,196 queries.

Dataset irds.msmarco-document-v2.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set with 322,196 queries.

Dataset irds.msmarco-document-v2.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set with 322,196 queries.

Dataset irds.msmarco-document-v2.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set with 322,196 queries.

Dataset irds.msmarco-document-v2.trec-dl-2019.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).

Dataset irds.msmarco-document-v2.trec-dl-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).

Dataset irds.msmarco-document-v2.trec-dl-2019

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2019 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2019/judged).

Dataset irds.msmarco-document-v2.trec-dl-2019.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2019.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2019.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-document-v2/trec-dl-2019, only including queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2020.queries

datamaestro_text.datasets.irds.data.Topics

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2020/judged).

Dataset irds.msmarco-document-v2.trec-dl-2020.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2020/judged).

Dataset irds.msmarco-document-v2.trec-dl-2020

datamaestro_text.datasets.irds.data.Adhoc

Queries from the TREC Deep Learning (DL) 2020 shared task, which were sampled from msmarco-document/eval. A subset of these queries were judged by NIST assessors (filtered list available in msmarco-document-v2/trec-dl-2020/judged).

Dataset irds.msmarco-document-v2.trec-dl-2020.judged.queries

datamaestro_text.datasets.irds.data.Topics

Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2020.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2020.judged

datamaestro_text.datasets.irds.data.Adhoc

Subset of msmarco-document-v2/trec-dl-2020, only including queries with qrels.
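The descriptions refer to datasets by their ir-datasets names, which use slashes (for example msmarco-document-v2/trec-dl-2020), while the entries themselves use dotted IDs under the `irds` namespace. For the entries in this listing, the correspondence appears to be a simple substitution, sketched below. `to_datamaestro_id` is a hypothetical helper inferred from the listing, not a datamaestro-text function, and the rule may not hold for every dataset.

```python
def to_datamaestro_id(irds_id: str) -> str:
    """Map an ir-datasets ID (slash-separated) to the dotted ID used in this listing."""
    return "irds." + irds_id.replace("/", ".")


converted = to_datamaestro_id("msmarco-document-v2/trec-dl-2020/judged")
print(converted)  # irds.msmarco-document-v2.trec-dl-2020.judged
```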

Dataset irds.msmarco-document-v2.trec-dl-2021.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Note that at this time, qrels are only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Note that at this time, qrels are only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Note that at this time, qrels are only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021

datamaestro_text.datasets.irds.data.Adhoc

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Note that at this time, qrels are only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.judged.queries

datamaestro_text.datasets.irds.data.Topics

msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.

Note that at this time, this is only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.

Note that at this time, this is only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.

Note that at this time, this is only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2021.judged

datamaestro_text.datasets.irds.data.Adhoc

msmarco-document-v2/trec-dl-2021, but filtered down to the 57 queries with qrels.

Note that at this time, this is only available to those with TREC active participant login credentials.

Dataset irds.msmarco-document-v2.trec-dl-2022.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.

Dataset irds.msmarco-document-v2.trec-dl-2022.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.

Dataset irds.msmarco-document-v2.trec-dl-2022.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.

Dataset irds.msmarco-document-v2.trec-dl-2022

datamaestro_text.datasets.irds.data.Adhoc

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that these qrels are inferred from the passage ranking task; a document's relevance label is the maximum of the labels of its passages.
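The inference rule in the note above (a document's label is the maximum over its passages' labels) can be sketched as follows. This is only an illustration on toy data: the function name, the dictionary layout, and the passage-to-document mapping are assumptions, not the procedure used by the task organizers.

```python
from typing import Dict, Tuple


def infer_document_qrels(
    passage_qrels: Dict[Tuple[str, str], int],
    passage_to_doc: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Aggregate (query_id, passage_id) labels into (query_id, doc_id) labels.

    Each document receives the maximum label among its judged passages.
    """
    doc_qrels: Dict[Tuple[str, str], int] = {}
    for (query_id, passage_id), label in passage_qrels.items():
        key = (query_id, passage_to_doc[passage_id])
        # Keep the highest label seen so far for this (query, document) pair.
        doc_qrels[key] = max(label, doc_qrels.get(key, label))
    return doc_qrels


if __name__ == "__main__":
    toy_passage_qrels = {("q1", "p1"): 1, ("q1", "p2"): 3, ("q1", "p3"): 0}
    toy_mapping = {"p1": "d1", "p2": "d1", "p3": "d2"}
    print(infer_document_qrels(toy_passage_qrels, toy_mapping))
```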

Dataset irds.msmarco-document-v2.trec-dl-2022.judged.queries

datamaestro_text.datasets.irds.data.Topics

msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2022.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2022.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-document-v2.trec-dl-2022.judged

datamaestro_text.datasets.irds.data.Adhoc

msmarco-document-v2/trec-dl-2022, but filtered down to only the queries with qrels.
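A judged subset keeps only the queries that have at least one relevance judgment. The sketch below shows that filtering on plain tuples; the tuple layouts and the name `filter_judged` are assumptions made for illustration and do not reflect how ir-datasets implements it.

```python
from typing import Iterable, List, Tuple

Query = Tuple[str, str]        # (query_id, text)
Qrel = Tuple[str, str, int]    # (query_id, doc_id, relevance)


def filter_judged(queries: Iterable[Query], qrels: Iterable[Qrel]) -> List[Query]:
    """Return only the queries that appear in at least one qrel."""
    judged_ids = {query_id for query_id, _doc_id, _relevance in qrels}
    return [(query_id, text) for query_id, text in queries if query_id in judged_ids]


if __name__ == "__main__":
    toy_queries = [("q1", "first query"), ("q2", "second query")]
    toy_qrels = [("q1", "d7", 2)]
    print(filter_judged(toy_queries, toy_qrels))
```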

Dataset irds.msmarco-document-v2.trec-dl-2023.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2023 shared task.

Dataset irds.msmarco-document-v2.trec-dl-2023.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2023 shared task.

Anchor Text for version 2 of MS MARCO

For version 2 of MS MARCO, the anchor text collection enriches 4,821,244 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for each document with more than 1,000 anchor texts (with this sampling, all anchor text is retained for 97% of the documents). The text field contains the anchor texts concatenated, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.

Dataset irds.msmarco-document-v2.anchor-text.documents

datamaestro_text.datasets.irds.data.Documents

For version 2 of MS MARCO, the anchor text collection enriches 4,821,244 documents with anchor text extracted from six Common Crawl snapshots. To keep the collection size reasonable, we sampled 1,000 anchor texts for each document with more than 1,000 anchor texts (with this sampling, all anchor text is retained for 97% of the documents). The text field contains the anchor texts concatenated, and the anchors field contains the anchor texts as a list. The raw dataset with additional information (roughly 100GB) is available online.
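The relationship between the text and anchors fields, together with the 1,000-item cap, can be illustrated with a small sketch. The function name, the space separator used for concatenation, and the uniform random sampling are assumptions; the description above does not specify how the sampling or concatenation was actually done.

```python
import random
from typing import Dict, List, Union

MAX_ANCHORS = 1000  # cap stated in the dataset description


def build_anchor_record(
    doc_id: str,
    anchors: List[str],
    max_anchors: int = MAX_ANCHORS,
    seed: int = 0,
) -> Dict[str, Union[str, List[str]]]:
    """Build a record with a concatenated text field and a list-valued anchors field."""
    if len(anchors) > max_anchors:
        # Assumed: uniform sampling without replacement down to the cap.
        anchors = random.Random(seed).sample(anchors, max_anchors)
    return {
        "doc_id": doc_id,
        "text": " ".join(anchors),  # assumed separator
        "anchors": list(anchors),
    }


if __name__ == "__main__":
    record = build_anchor_record("toy-doc", ["home page", "click here"])
    print(record["text"])
```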

MSMARCO (passage, version 2)

Version 2 of the MS MARCO passage ranking dataset. The corpus contains 138M passages, which can be linked up with documents in msmarco-document-v2.

  • Version 1 of dataset: msmarco-passage
  • Documents: Text extracted from web pages
  • Queries: Natural language questions (from query log)
  • Dataset Paper

Change Log

  • On July 21, 2021, the task organizers updated the train, dev1, and dev2 qrels to remove duplicate entries from the files. This should not have changed results from evaluation tools, but may result in non-repeatable results if these files were used in another process (e.g., model training). The original qrels file for msmarco-passage-v2/train can be found here to aid in result repeatability.

Dataset irds.msmarco-passage-v2.documents

datamaestro_text.datasets.irds.data.Documents

Version 2 of the MS MARCO passage ranking dataset. The corpus contains 138M passages, which can be linked up with documents in msmarco-document-v2.

  • Version 1 of dataset: msmarco-passage
  • Documents: Text extracted from web pages
  • Queries: Natural language questions (from query log)
  • Dataset Paper

Change Log

  • On July 21, 2021, the task organizers updated the train, dev1, and dev2 qrels to remove duplicate entries from the files. This should not have changed results from evaluation tools, but may result in non-repeatable results if these files were used in another process (e.g., model training). The original qrels file for msmarco-passage-v2/train can be found here to aid in result repeatability.

Dataset irds.msmarco-passage-v2.dev1.queries

datamaestro_text.datasets.irds.data.Topics

Official dev1 set with 3,903 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev1.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev1 set with 3,903 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev1.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev1 set with 3,903 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev1

datamaestro_text.datasets.irds.data.Adhoc

Official dev1 set with 3,903 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev2.queries

datamaestro_text.datasets.irds.data.Topics

Official dev2 set with 4,281 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev2.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev2 set with 4,281 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev2 set with 4,281 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.dev2

datamaestro_text.datasets.irds.data.Adhoc

Official dev2 set with 4,281 queries.

Note that the qrels in this dataset are not directly human-assessed; labels from msmarco-passage are mapped to documents via URL, these documents are re-passaged, and then the best approximate match is identified.

Dataset irds.msmarco-passage-v2.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set with 277,144 queries.

Dataset irds.msmarco-passage-v2.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set with 277,144 queries.

Dataset irds.msmarco-passage-v2.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set with 277,144 queries.

Dataset irds.msmarco-passage-v2.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set with 277,144 queries.

Dataset irds.msmarco-passage-v2.trec-dl-2021.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Dataset irds.msmarco-passage-v2.trec-dl-2021.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Dataset irds.msmarco-passage-v2.trec-dl-2021.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Dataset irds.msmarco-passage-v2.trec-dl-2021

datamaestro_text.datasets.irds.data.Adhoc

Official topics for the TREC Deep Learning (DL) 2021 shared task.

Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.queries

datamaestro_text.datasets.irds.data.Topics

msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2021.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2021.judged

datamaestro_text.datasets.irds.data.Adhoc

msmarco-passage-v2/trec-dl-2021, but filtered down to the 53 queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2022.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.

Dataset irds.msmarco-passage-v2.trec-dl-2022.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.

Dataset irds.msmarco-passage-v2.trec-dl-2022.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.

Dataset irds.msmarco-passage-v2.trec-dl-2022

datamaestro_text.datasets.irds.data.Adhoc

Official topics for the TREC Deep Learning (DL) 2022 shared task.

Note that the officially-released qrels include relevance labels propagated to duplicate passages, while results presented in the notebook papers remove duplicate documents. This means that the results are not directly comparable, and extra care should be taken when making comparisons among systems to ensure that they were evaluated in the same settings.
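One way to evaluate under the "duplicates removed" setting is to collapse each ranking so that only the first passage of every duplicate group is kept. The sketch below assumes a mapping from passage ID to a canonical representative is available; that mapping, the toy IDs, and the function name are hypothetical and are not provided by this listing.

```python
from typing import Dict, List


def remove_duplicate_passages(ranking: List[str], canonical: Dict[str, str]) -> List[str]:
    """Keep only the highest-ranked passage from each duplicate group.

    `ranking` is ordered best-first; `canonical` maps a passage ID to the
    representative of its duplicate group (IDs missing from the mapping are
    treated as their own representative).
    """
    seen = set()
    deduplicated = []
    for passage_id in ranking:
        representative = canonical.get(passage_id, passage_id)
        if representative in seen:
            continue  # a duplicate of this passage was already ranked higher
        seen.add(representative)
        deduplicated.append(passage_id)
    return deduplicated


if __name__ == "__main__":
    toy_ranking = ["p3", "p1", "p2", "p4"]
    toy_canonical = {"p2": "p1"}  # p2 duplicates p1
    print(remove_duplicate_passages(toy_ranking, toy_canonical))
```

Whichever setting is chosen, applying the same treatment to every system being compared keeps the numbers comparable.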

Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.queries

datamaestro_text.datasets.irds.data.Topics

msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2022.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2022.judged

datamaestro_text.datasets.irds.data.Adhoc

msmarco-passage-v2/trec-dl-2022, but filtered down to only the queries with qrels.

Dataset irds.msmarco-passage-v2.trec-dl-2023.queries

datamaestro_text.datasets.irds.data.Topics

Official topics for the TREC Deep Learning (DL) 2023 shared task.

Dataset irds.msmarco-passage-v2.trec-dl-2023.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official topics for the TREC Deep Learning (DL) 2023 shared task.

msmarco-passage-v2/dedup

Dataset irds.msmarco-passage-v2.dedup.documents

datamaestro_text.datasets.irds.data.Documents

MSMARCO (QnA)

The MS MARCO Question Answering dataset. This is the source collection of msmarco-passage and msmarco-document.

It is prohibited to use information from this dataset for submissions to the MS MARCO passage and document leaderboards or the TREC DL shared task.

Query IDs in this collection align with those found in msmarco-passage and msmarco-document. The collection does not provide doc_ids, so these are assigned in the following format: [msmarco_passage_id]-[url_seq], where [msmarco_passage_id] is the document from msmarco-passage that has matching contents and [url_seq] is assigned sequentially for each URL encountered. In other words, all documents with the same prefix have the same text; they only differ in the originating document.

Doc msmarco_passage_id fields are assigned by matching passage contents in msmarco-passage, and this field is provided for every document. Doc msmarco_document_id fields are assigned by matching the URL to the one found in msmarco-document. Due to how msmarco-document was constructed, there is not necessarily a match (value will be None if no match).
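The doc_id format described above, [msmarco_passage_id]-[url_seq], can be split back into its two parts. The following sketch assumes that url_seq is a non-negative integer and that the last hyphen separates the two parts; `QnaDocId` and `parse_qna_doc_id` are hypothetical helper names, not part of ir-datasets or datamaestro-text.

```python
from typing import NamedTuple


class QnaDocId(NamedTuple):
    msmarco_passage_id: str
    url_seq: int


def parse_qna_doc_id(doc_id: str) -> QnaDocId:
    """Split a doc_id of the form [msmarco_passage_id]-[url_seq]."""
    passage_id, separator, url_seq = doc_id.rpartition("-")
    if not separator or not passage_id or not url_seq.isdigit():
        raise ValueError(f"unexpected doc_id format: {doc_id!r}")
    return QnaDocId(passage_id, int(url_seq))


def same_text(doc_id_a: str, doc_id_b: str) -> bool:
    """Documents sharing the passage-ID prefix have the same text."""
    return (
        parse_qna_doc_id(doc_id_a).msmarco_passage_id
        == parse_qna_doc_id(doc_id_b).msmarco_passage_id
    )


if __name__ == "__main__":
    print(parse_qna_doc_id("123-4"))
```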

Dataset irds.msmarco-qna.documents

datamaestro_text.datasets.irds.data.Documents

The MS MARCO Question Answering dataset. This is the source collection of msmarco-passage and msmarco-document.

It is prohibited to use information from this dataset for submissions to the MS MARCO passage and document leaderboards or the TREC DL shared task.

Query IDs in this collection align with those found in msmarco-passage and msmarco-document. The collection does not provide doc_ids, so these are assigned in the following format: [msmarco_passage_id]-[url_seq], where [msmarco_passage_id] is the document from msmarco-passage that has matching contents and [url_seq] is assigned sequentially for each URL encountered. In other words, all documents with the same prefix have the same text; they only differ in the originating document.

Doc msmarco_passage_id fields are assigned by matching passage contents in msmarco-passage, and this field is provided for every document. Doc msmarco_document_id fields are assigned by matching the URL to the one found in msmarco-document. Due to how msmarco-document was constructed, there is not necessarily a match (value will be None if no match).

Dataset irds.msmarco-qna.dev.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.dev.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.dev

datamaestro_text.datasets.irds.data.Adhoc

Official dev set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.eval.queries

datamaestro_text.datasets.irds.data.Topics

Official eval set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.eval.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official eval set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

Dataset irds.msmarco-qna.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set.

The scoreddocs provide the roughly 10 passages presented to the user for annotation, where the score indicates the order presented.

nano-beir/arguana

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.nano-beir.arguana.documents

datamaestro_text.datasets.irds.data.Documents

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.nano-beir.arguana.queries

datamaestro_text.datasets.irds.data.Topics

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.nano-beir.arguana.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the ArguAna Counterargs dataset, for argument retrieval.

Dataset irds.nano-beir.arguana

datamaestro_text.datasets.irds.data.Adhoc

A version of the ArguAna Counterargs dataset, for argument retrieval.

nano-beir/climate-fever

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.nano-beir.climate-fever.documents

datamaestro_text.datasets.irds.data.Documents

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.nano-beir.climate-fever.queries

datamaestro_text.datasets.irds.data.Topics

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.nano-beir.climate-fever.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

Dataset irds.nano-beir.climate-fever

datamaestro_text.datasets.irds.data.Adhoc

A version of the CLIMATE-FEVER dataset, for fact verification on claims about climate.

nano-beir/dbpedia-entity

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.nano-beir.dbpedia-entity.documents

datamaestro_text.datasets.irds.data.Documents

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.nano-beir.dbpedia-entity.queries

datamaestro_text.datasets.irds.data.Topics

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.nano-beir.dbpedia-entity.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

Dataset irds.nano-beir.dbpedia-entity

datamaestro_text.datasets.irds.data.Adhoc

A version of the DBPedia-Entity-v2 dataset for entity retrieval.

nano-beir/fever

A version of the FEVER dataset for fact verification.

Dataset irds.nano-beir.fever.documents

datamaestro_text.datasets.irds.data.Documents

A version of the FEVER dataset for fact verification.

Dataset irds.nano-beir.fever.queries

datamaestro_text.datasets.irds.data.Topics

A version of the FEVER dataset for fact verification.

Dataset irds.nano-beir.fever.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the FEVER dataset for fact verification.

Dataset irds.nano-beir.fever

datamaestro_text.datasets.irds.data.Adhoc

A version of the FEVER dataset for fact verification.

nano-beir/fiqa

A version of the FIQA-2018 dataset (financial opinion question answering).

Dataset irds.nano-beir.fiqa.documents

datamaestro_text.datasets.irds.data.Documents

A version of the FIQA-2018 dataset (financial opinion question answering).

Dataset irds.nano-beir.fiqa.queries

datamaestro_text.datasets.irds.data.Topics

A version of the FIQA-2018 dataset (financial opinion question answering).

Dataset irds.nano-beir.fiqa.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the FIQA-2018 dataset (financial opinion question answering).

Dataset irds.nano-beir.fiqa

datamaestro_text.datasets.irds.data.Adhoc

A version of the FIQA-2018 dataset (financial opinion question answering).

nano-beir/hotpotqa

A version of the Hotpot QA dataset for multi-hop question answering.

Dataset irds.nano-beir.hotpotqa.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Hotpot QA dataset for multi-hop question answering.

Dataset irds.nano-beir.hotpotqa.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Hotpot QA dataset for multi-hop question answering.

Dataset irds.nano-beir.hotpotqa.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the Hotpot QA dataset for multi-hop question answering.

Dataset irds.nano-beir.hotpotqa

datamaestro_text.datasets.irds.data.Adhoc

A version of the Hotpot QA dataset for multi-hop question answering.

nano-beir/msmarco

A version of the MS MARCO passage ranking dataset.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.nano-beir.msmarco.documents

datamaestro_text.datasets.irds.data.Documents

A version of the MS MARCO passage ranking dataset.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.nano-beir.msmarco.queries

datamaestro_text.datasets.irds.data.Topics

A version of the MS MARCO passage ranking dataset.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.nano-beir.msmarco.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the MS MARCO passage ranking dataset.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

Dataset irds.nano-beir.msmarco

datamaestro_text.datasets.irds.data.Adhoc

A version of the MS MARCO passage ranking dataset.

Note that this version differs from msmarco-passage, in that it does not correct the encoding problems in the source documents.

nano-beir/nfcorpus

A version of the NF Corpus (Nutrition Facts).

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.nano-beir.nfcorpus.documents

datamaestro_text.datasets.irds.data.Documents

A version of the NF Corpus (Nutrition Facts).

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.nano-beir.nfcorpus.queries

datamaestro_text.datasets.irds.data.Topics

A version of the NF Corpus (Nutrition Facts).

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.nano-beir.nfcorpus.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the NF Corpus (Nutrition Facts).

Data pre-processing may be different than what is done in nfcorpus.

Dataset irds.nano-beir.nfcorpus

datamaestro_text.datasets.irds.data.Adhoc

A version of the NF Corpus (Nutrition Facts).

Data pre-processing may be different than what is done in nfcorpus.

nano-beir/nq

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.nano-beir.nq.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.nano-beir.nq.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.nano-beir.nq.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

Dataset irds.nano-beir.nq

datamaestro_text.datasets.irds.data.Adhoc

A version of the Natural Questions dev dataset.

Data pre-processing differs both from what is done in natural-questions and dpr-w100/natural-questions, especially with respect to the document collection and filtering conducted on the queries. See the Beir paper for details.

nano-beir/quora

A version of the Quora duplicate question detection dataset (QQP).

Dataset irds.nano-beir.quora.documents

datamaestro_text.datasets.irds.data.Documents

A version of the Quora duplicate question detection dataset (QQP).

Dataset irds.nano-beir.quora.queries

datamaestro_text.datasets.irds.data.Topics

A version of the Quora duplicate question detection dataset (QQP).

Dataset irds.nano-beir.quora.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the Quora duplicate question detection dataset (QQP).

Dataset irds.nano-beir.quora

datamaestro_text.datasets.irds.data.Adhoc

A version of the Quora duplicate question detection dataset (QQP).

nano-beir/scidocs

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.nano-beir.scidocs.documents

datamaestro_text.datasets.irds.data.Documents

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.nano-beir.scidocs.queries

datamaestro_text.datasets.irds.data.Topics

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.nano-beir.scidocs.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the SciDocs dataset, used for citation retrieval.

Dataset irds.nano-beir.scidocs

datamaestro_text.datasets.irds.data.Adhoc

A version of the SciDocs dataset, used for citation retrieval.

nano-beir/scifact

A version of the SciFact dataset, for fact verification.

Dataset irds.nano-beir.scifact.documents

datamaestro_text.datasets.irds.data.Documents

A version of the SciFact dataset, for fact verification.

Dataset irds.nano-beir.scifact.queries

datamaestro_text.datasets.irds.data.Topics

A version of the SciFact dataset, for fact verification.

Dataset irds.nano-beir.scifact.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of the SciFact dataset, for fact verification.

Dataset irds.nano-beir.scifact

datamaestro_text.datasets.irds.data.Adhoc

A version of the SciFact dataset, for fact verification.

nano-beir/webis-touche2020

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.nano-beir.webis-touche2020.documents

datamaestro_text.datasets.irds.data.Documents

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.nano-beir.webis-touche2020.queries

datamaestro_text.datasets.irds.data.Topics

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.nano-beir.webis-touche2020.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

Dataset irds.nano-beir.webis-touche2020

datamaestro_text.datasets.irds.data.Adhoc

Original version of the Touché-2020 dataset, for argument retrieval.

Consider using beir/webis-touche2020/v2 instead; it uses an updated, more complete version of the qrels.

neumarco/fa

The msmarco-passage corpus, translated to Persian (Farsi).

Dataset irds.neumarco.fa.documents

datamaestro_text.datasets.irds.data.Documents

The msmarco-passage corpus, translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.dev.small

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/small, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.docpairs

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.judged.docpairs

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

Dataset irds.neumarco.fa.train.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train/judged, with the corpus translated to Persian (Farsi).

neumarco/ru

The msmarco-passage corpus, translated to Russian.

Dataset irds.neumarco.ru.documents

datamaestro_text.datasets.irds.data.Documents

The msmarco-passage corpus, translated to Russian.

Dataset irds.neumarco.ru.dev.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/small, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/small, with the corpus translated to Russian.

Dataset irds.neumarco.ru.dev.small

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/small, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.docpairs

A version of msmarco-passage/train, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.judged.docpairs

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

Dataset irds.neumarco.ru.train.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train/judged, with the corpus translated to Russian.

neumarco/zh

The msmarco-passage corpus, translated to Chinese.

Dataset irds.neumarco.zh.documents

datamaestro_text.datasets.irds.data.Documents

The msmarco-passage corpus, translated to Chinese.

Dataset irds.neumarco.zh.dev.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.small.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/dev/small, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.small.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/dev/small, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.dev.small

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/dev/small, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.docpairs

A version of msmarco-passage/train, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.judged.queries

datamaestro_text.datasets.irds.data.Topics

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.judged.docpairs

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.judged.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.

Dataset irds.neumarco.zh.train.judged

datamaestro_text.datasets.irds.data.Adhoc

A version of msmarco-passage/train/judged, with the corpus translated to Chinese.
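
The neumarco entries above pair slash-separated upstream IDs (such as msmarco-passage/train/judged) with dotted IDs under the irds namespace. The following is a minimal sketch of that apparent naming pattern, inferred from this listing only. The helper name `to_datamaestro_id` is hypothetical and is not part of datamaestro-text.

```python
from typing import Optional


def to_datamaestro_id(irds_id: str, component: Optional[str] = None) -> str:
    """Build a dotted ID in the style of this listing.

    Assumption (inferred from the listing, not from the library source):
    slashes become dots, the result is prefixed with "irds", and an
    optional component such as "queries" or "qrels" is appended.
    """
    parts = ["irds", *irds_id.strip("/").split("/")]
    if component is not None:
        parts.append(component)
    return ".".join(parts)


# Enumerate the dev/small qrels IDs for the three translations listed above.
for lang in ("fa", "ru", "zh"):
    print(to_datamaestro_id(f"neumarco/{lang}/dev/small", "qrels"))
```

A string built this way could then be passed to prepare_dataset as in the Usage section, provided the installed ir-datasets version actually ships that dataset.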

NFCorpus (NutritionFacts)

"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."

Dataset irds.nfcorpus.documents

datamaestro_text.datasets.irds.data.Documents

"NFCorpus is a full-text English retrieval data set for Medical Information Retrieval. It contains a total of 3,244 natural language queries (written in non-technical English, harvested from the NutritionFacts.org site) with 169,756 automatically extracted relevance judgments for 9,964 medical documents (written in a complex terminology-heavy language), mostly from PubMed."

Dataset irds.nfcorpus.dev.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.dev

datamaestro_text.datasets.irds.data.Adhoc

Official dev set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.dev.nontopic.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.dev.nontopic.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.dev.nontopic

datamaestro_text.datasets.irds.data.Adhoc

Official dev set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.dev.video.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.dev.video.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.dev.video

datamaestro_text.datasets.irds.data.Adhoc

Official dev set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.test.queries

datamaestro_text.datasets.irds.data.Topics

Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.test.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.test

datamaestro_text.datasets.irds.data.Adhoc

Official test set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.test.nontopic.queries

datamaestro_text.datasets.irds.data.Topics

Official test set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.test.nontopic.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.test.nontopic

datamaestro_text.datasets.irds.data.Adhoc

Official test set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.test.video.queries

datamaestro_text.datasets.irds.data.Topics

Official test set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.test.video.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official test set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.test.video

datamaestro_text.datasets.irds.data.Adhoc

Official test set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set. Queries include both a title field and a combined "all" text field (titles, descriptions, topics, transcripts, and comments).

Dataset irds.nfcorpus.train.nontopic.queries

datamaestro_text.datasets.irds.data.Topics

Official train set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.train.nontopic.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.train.nontopic

datamaestro_text.datasets.irds.data.Adhoc

Official train set, filtered to exclude queries from topic pages.

Dataset irds.nfcorpus.train.video.queries

datamaestro_text.datasets.irds.data.Topics

Official train set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.train.video.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set, filtered to only include queries from video pages.

Dataset irds.nfcorpus.train.video

datamaestro_text.datasets.irds.data.Adhoc

Official train set, filtered to only include queries from video pages.

Natural Questions

Google Natural Questions is a Q&A dataset containing long, short, and Yes/No answers from Wikipedia. ir_datasets frames this around an ad-hoc ranking setting by building a collection of all long answer candidate passages. However, short and Yes/No annotations are also available in the qrels, as are the passages presented to the annotators (via scoreddocs).

Importantly, the document collection does not consist of all Wikipedia passages, but instead a union of the candidate passages presented to the annotators (akin to MS MARCO). dph-w100/natural-questions/train and dph-w100/natural-questions/dev contain a filtered set of the questions in this dataset and a full Wikipedia dump (which is a more realistic retrieval setting).

Dataset irds.natural-questions.documents

datamaestro_text.datasets.irds.data.Documents

Google Natural Questions is a Q&A dataset containing long, short, and Yes/No answers from Wikipedia. ir_datasets frames this around an ad-hoc ranking setting by building a collection of all long answer candidate passages. However, short and Yes/No annotations are also available in the qrels, as are the passages presented to the annotators (via scoreddocs).

Importantly, the document collection does not consist of all Wikipedia passages, but instead a union of the candidate passages presented to the annotators (akin to MS MARCO). dph-w100/natural-questions/train and dph-w100/natural-questions/dev contain a filtered set of the questions in this dataset and a full Wikipedia dump (which is a more realistic retrieval setting).

Dataset irds.natural-questions.dev.queries

datamaestro_text.datasets.irds.data.Topics

Official dev set.

Dataset irds.natural-questions.dev.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official dev set.

Dataset irds.natural-questions.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official dev set.

Dataset irds.natural-questions.dev

datamaestro_text.datasets.irds.data.Adhoc

Official dev set.

Dataset irds.natural-questions.train.queries

datamaestro_text.datasets.irds.data.Topics

Official train set.

Dataset irds.natural-questions.train.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Official train set.

Dataset irds.natural-questions.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Official train set.

Dataset irds.natural-questions.train

datamaestro_text.datasets.irds.data.Adhoc

Official train set.

NYT

The New York Times Annotated Corpus. Consists of articles published between 1987 and 2007. It is used in TREC Core 2017 and it is also useful for transferring relevance signals in cases where training data is in short supply.

Uses data from LDC2008T19. The source collection can be downloaded from the LDC.

Dataset irds.nyt.documents

datamaestro_text.datasets.irds.data.Documents

The New York Times Annotated Corpus. Consists of articles published between 1987 and 2007. It is used in TREC Core 2017 and it is also useful for transferring relevance signals in cases where training data is in short supply.

Uses data from LDC2008T19. The source collection can be downloaded from the LDC.

Dataset irds.nyt.trec-core-2017.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Common Core 2017 benchmark.

Note that this dataset only contains the 50 queries assessed by NIST.

Dataset irds.nyt.trec-core-2017.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Common Core 2017 benchmark.

Note that this dataset only contains the 50 queries assessed by NIST.

Dataset irds.nyt.trec-core-2017

datamaestro_text.datasets.irds.data.Adhoc

The TREC Common Core 2017 benchmark.

Note that this dataset only contains the 50 queries assessed by NIST.

Dataset irds.nyt.wksup.queries

datamaestro_text.datasets.irds.data.Topics

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup

datamaestro_text.datasets.irds.data.Adhoc

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup.train.queries

datamaestro_text.datasets.irds.data.Topics

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup.train

datamaestro_text.datasets.irds.data.Adhoc

Training set (without held-out nyt/wksup/valid) for transferring relevance signals from NYT corpus.

Dataset irds.nyt.wksup.valid.queries

datamaestro_text.datasets.irds.data.Topics

Held-out validation set for transferring relevance signals from NYT corpus (see nyt/wksup/train).

Dataset irds.nyt.wksup.valid.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Held-out validation set for transferring relevance signals from NYT corpus (see nyt/wksup/train).

Dataset irds.nyt.wksup.valid

datamaestro_text.datasets.irds.data.Adhoc

Held-out validation set for transferring relevance signals from NYT corpus (see nyt/wksup/train).

pmc/v1

Subset of PMC articles used for the TREC 2014 and 2015 tasks (v1). Includes titles, abstracts, and full text. Collected from the open access segment on January 21, 2014.

pmc/v2

Subset of PMC articles used for the TREC 2016 task (v2). Includes titles, abstracts, and full text. Collected from the open access segment on March 28, 2016.

Touché Image Search

Corpus version 2022-06-13 with 23 841 images. It was released on June 13, 2022 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.touche-image.2022-06-13.documents

datamaestro_text.datasets.irds.data.Documents

Corpus version 2022-06-13 with 23 841 images. It was released on June 13, 2022 on Zenodo.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.

Dataset irds.touche-image.2022-06-13.touche-2022-task-3.queries

datamaestro_text.datasets.irds.data.Topics

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.

Systems are evaluated on Touché topics 1-50 by the ratio of images among the 20 retrieved images for each topic (10 images for each stance) that are all three: relevant to the topic, argumentative, and have the associated stance.

Dataset irds.touche-image.2022-06-13.touche-2022-task-3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.

Systems are evaluated on Touché topics 1-50 by the ratio of images among the 20 retrieved images for each topic (10 images for each stance) that are all three: relevant to the topic, argumentative, and have the associated stance.

Dataset irds.touche-image.2022-06-13.touche-2022-task-3

datamaestro_text.datasets.irds.data.Adhoc

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a controversial topic, the task is to retrieve images (from touche-image/2022-06-13) for each stance (pro/con) that show support for that stance.

Systems are evaluated on Touché topics 1-50 by the ratio of images among the 20 retrieved images for each topic (10 images for each stance) that are all three: relevant to the topic, argumentative, and have the associated stance.
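
The evaluation measure described above (the share of the 20 retrieved images per topic that are on-topic, argumentative, and carry the associated stance) can be sketched as follows. This is only an illustration under stated assumptions: the `Judgment` type and `touche_image_score` function are hypothetical names, and the official Touché evaluation may differ in details such as how the two stances are aggregated.

```python
from typing import Iterable, NamedTuple


class Judgment(NamedTuple):
    """Hypothetical per-image judgment for one topic and stance."""

    on_topic: bool
    argumentative: bool
    stance_matches: bool


def touche_image_score(judgments: Iterable[Judgment], retrieved: int = 20) -> float:
    """Fraction of the retrieved images that satisfy all three criteria.

    Assumption: `retrieved` is the fixed result depth (20 images per topic,
    10 per stance); judgments beyond that depth are ignored.
    """
    considered = list(judgments)[:retrieved]
    hits = sum(
        1 for j in considered if j.on_topic and j.argumentative and j.stance_matches
    )
    return hits / retrieved


# Example: 5 of 20 retrieved images meet all three criteria.
example = [Judgment(True, True, True)] * 5 + [Judgment(True, False, True)] * 15
print(touche_image_score(example))
```

Dividing by the fixed depth rather than by the number of judged images means that missing results count against the system, which matches the "ratio among the 20 retrieved images" wording above.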

Touché 2022 Task 2: Argument Retrieval for Comparative Questions

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both and to detect their respective stances with respect to the object they talk about.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Additionally, classify the stance of the retrieved text passages towards the compared objects in questions. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the "cat" object, while "Cats are less faithful than dogs" as supporting the "dog" object.

Dataset irds.clueweb12.touche-2022-task-2.documents

datamaestro_text.datasets.irds.data.Documents

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both and to detect their respective stances with respect to the object they talk about.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Additionally, classify the stance of the retrieved text passages towards the compared objects in questions. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the "cat" object, while "Cats are less faithful than dogs" as supporting the "dog" object.

Dataset irds.clueweb12.touche-2022-task-2.queries

datamaestro_text.datasets.irds.data.Topics

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both and to detect their respective stances with respect to the object they talk about.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Additionally, classify the stance of the retrieved text passages towards the compared objects in questions. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the "cat" object, while "Cats are less faithful than dogs" as supporting the "dog" object.

Dataset irds.clueweb12.touche-2022-task-2.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both and to detect their respective stances with respect to the object they talk about.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Additionally, classify the stance of the retrieved text passages towards the compared objects in questions. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the "cat" object, while "Cats are less faithful than dogs" as supporting the "dog" object.

Dataset irds.clueweb12.touche-2022-task-2

datamaestro_text.datasets.irds.data.Adhoc

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad-hoc argument retrieval is also becoming a feasible task. Touché 2022 is the third lab on argument retrieval at CLEF 2022 featuring three tasks.

Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either compared object or for both and to detect their respective stances with respect to the object they talk about.

Documents are judged based on their general topical relevance and for rhetorical quality, i.e., "well-writtenness" of the document: (1) whether the text has a good style of speech (formal language is preferred over informal), (2) whether the text has a proper sentence structure and is easy to read, (3) whether it includes profanity, has typos, and makes use of other detrimental style choices.

Additionally, classify the stance of the retrieved text passages towards the compared objects in questions. For instance, in the question "Who is a better friend, a cat or a dog?" the terms "cat" and "dog" are the comparison objects. An answer candidate like "Cats can be quite affectionate and attentive, and thus are good friends" should be classified as pro the "cat" object, while "Cats are less faithful than dogs" as supporting the "dog" object.

Touché 2022 Task 2: Argument Retrieval for Comparative Questions (Expanded)

Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.

Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.documents

datamaestro_text.datasets.irds.data.Documents

Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.

Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.queries

datamaestro_text.datasets.irds.data.Topics

Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.

Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.

Dataset irds.clueweb12.touche-2022-task-2.expanded-doc-t5-query

datamaestro_text.datasets.irds.data.Adhoc

Pre-processed version of clueweb12/touche-2022-task-2 where each passage has been expanded with queries generated using DocT5Query.

TREC Arabic

A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.

Document collection from LDC2001T55.

Dataset irds.trec-arabic.documents

datamaestro_text.datasets.irds.data.Documents

A collection of news articles in Arabic, used for multi-lingual evaluation in TREC 2001 and TREC 2002.

Document collection from LDC2001T55.

Dataset irds.trec-arabic.ar2001.queries

datamaestro_text.datasets.irds.data.Topics

Arabic benchmark from TREC 2001.

Dataset irds.trec-arabic.ar2001.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Arabic benchmark from TREC 2001.

Dataset irds.trec-arabic.ar2001

datamaestro_text.datasets.irds.data.Adhoc

Arabic benchmark from TREC 2001.

Dataset irds.trec-arabic.ar2002.queries

datamaestro_text.datasets.irds.data.Topics

Arabic benchmark from TREC 2002.

Dataset irds.trec-arabic.ar2002.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Arabic benchmark from TREC 2002.

Dataset irds.trec-arabic.ar2002

datamaestro_text.datasets.irds.data.Adhoc

Arabic benchmark from TREC 2002.

TREC Mandarin

A collection of news articles in Mandarin in Simplified Chinese, used for multi-lingual evaluation in TREC 5 and TREC 6.

Document collection from LDC2000T52.

Dataset irds.trec-mandarin.documents

datamaestro_text.datasets.irds.data.Documents

A collection of news articles in Mandarin in Simplified Chinese, used for multi-lingual evaluation in TREC 5 and TREC 6.

Document collection from LDC2000T52.

Dataset irds.trec-mandarin.trec5.queries

datamaestro_text.datasets.irds.data.Topics

Mandarin Chinese benchmark from TREC 5.

Dataset irds.trec-mandarin.trec5.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Mandarin Chinese benchmark from TREC 5.

Dataset irds.trec-mandarin.trec5

datamaestro_text.datasets.irds.data.Adhoc

Mandarin Chinese benchmark from TREC 5.

Dataset irds.trec-mandarin.trec6.queries

datamaestro_text.datasets.irds.data.Topics

Mandarin Chinese benchmark from TREC 6.

Dataset irds.trec-mandarin.trec6.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Mandarin Chinese benchmark from TREC 6.

Dataset irds.trec-mandarin.trec6

datamaestro_text.datasets.irds.data.Adhoc

Mandarin Chinese benchmark from TREC 6.

TREC Spanish

A collection of news articles in Spanish, used for multi-lingual evaluation in TREC 3 and TREC 4.

Document collection from LDC2000T51.

Dataset irds.trec-spanish.documents

datamaestro_text.datasets.irds.data.Documents

A collection of news articles in Spanish, used for multi-lingual evaluation in TREC 3 and TREC 4.

Document collection from LDC2000T51.

Dataset irds.trec-spanish.trec3.queries

datamaestro_text.datasets.irds.data.Topics

Spanish benchmark from TREC 3.

Dataset irds.trec-spanish.trec3.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Spanish benchmark from TREC 3.

Dataset irds.trec-spanish.trec3

datamaestro_text.datasets.irds.data.Adhoc

Spanish benchmark from TREC 3.

Dataset irds.trec-spanish.trec4.queries

datamaestro_text.datasets.irds.data.Topics

Spanish benchmark from TREC 4.

Dataset irds.trec-spanish.trec4.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Spanish benchmark from TREC 4.

Dataset irds.trec-spanish.trec4

datamaestro_text.datasets.irds.data.Adhoc

Spanish benchmark from TREC 4.

trec-tot/2023

Corpus for the TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.documents

datamaestro_text.datasets.irds.data.Documents

Corpus for the TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.train.queries

datamaestro_text.datasets.irds.data.Topics

Train query set for TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Train query set for TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.train

datamaestro_text.datasets.irds.data.Adhoc

Train query set for TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.dev.queries

datamaestro_text.datasets.irds.data.Topics

Dev query set for TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.dev.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Dev query set for TREC 2023 tip-of-the-tongue search track.

Dataset irds.trec-tot.2023.dev

datamaestro_text.datasets.irds.data.Adhoc

Dev query set for TREC 2023 tip-of-the-tongue search track.

trec-tot/2024

Corpus for the TREC 2024 tip-of-the-tongue search track.

Dataset irds.trec-tot.2024.documents

datamaestro_text.datasets.irds.data.Documents

Corpus for the TREC 2024 tip-of-the-tongue search track.

Dataset irds.trec-tot.2024.test.queries

datamaestro_text.datasets.irds.data.Topics

Test query set for TREC 2024 tip-of-the-tongue search track.

TripClick

TripClick is a large collection from the Trip Database. Relevance is inferred from click signals.

A copy of this dataset can be obtained from the Trip Database through the process described here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".

Dataset irds.tripclick.documents

datamaestro_text.datasets.irds.data.Documents

TripClick is a large collection from the Trip Database. Relevance is inferred from click signals.

A copy of this dataset can be obtained from the Trip Database through the process described here. Documents, queries, and qrels require the "TripClick IR Benchmark"; for scoreddocs and docpairs, you will also need to request the "TripClick Training Package for Deep Learning Models".

Dataset irds.tripclick.test.queries

datamaestro_text.datasets.irds.data.Topics

Test subset of tripclick, including all queries from tripclick/test/head, tripclick/test/torso, and tripclick/test/tail.

The scoreddocs are the official BM25 results from Anserini.
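
The descriptions in this list use ir_datasets-style IDs (for example tripclick/test/head), while the entries themselves use dotted IDs under the irds namespace. The following is a minimal sketch of that correspondence, assuming the naming pattern seen in this listing holds generally. The helper name to_datamaestro_id is hypothetical and not part of datamaestro or ir_datasets.

```python
from typing import Optional


def to_datamaestro_id(irds_id: str, component: Optional[str] = None) -> str:
    """Map an ir_datasets-style ID to the dotted form used in this listing.

    Assumption: the rule (slashes become dots, "irds." prefix, optional
    component suffix such as "queries" or "qrels") is inferred from the
    entries in this list; it is not an official API.
    """
    parts = ["irds", *irds_id.strip("/").split("/")]
    if component is not None:
        parts.append(component)
    return ".".join(parts)


print(to_datamaestro_id("tripclick/test/head", "queries"))
# irds.tripclick.test.head.queries
```

The resulting string is the kind of ID passed to prepare_dataset in the Usage section.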

Dataset irds.tripclick.test.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Test subset of tripclick, including all queries from tripclick/test/head, tripclick/test/torso, and tripclick/test/tail.

The scoreddocs are the official BM25 results from Anserini.

Dataset irds.tripclick.test.head.queries

datamaestro_text.datasets.irds.data.Topics

The most frequent queries in the test set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.test.head.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The most frequent queries in the test set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.test.tail.queries

datamaestro_text.datasets.irds.data.Topics

The least frequent queries in the test set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.test.tail.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The least frequent queries in the test set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.test.torso.queries

datamaestro_text.datasets.irds.data.Topics

The moderately frequent queries in the test set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.test.torso.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The moderately frequent queries in the test set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.train.queries

datamaestro_text.datasets.irds.data.Topics

Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.

The dataset provides docpairs in a full text format; we map this text back to the query and doc IDs. A small number of docpairs could not be mapped back, so they are skipped.

Dataset irds.tripclick.train.docpairs

Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.

The dataset provides docpairs in a full text format; we map this text back to the query and doc IDs. A small number of docpairs could not be mapped back, so they are skipped.

Dataset irds.tripclick.train.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.

The dataset provides docpairs in a full text format; we map this text back to the query and doc IDs. A small number of docpairs could not be mapped back, so they are skipped.

Dataset irds.tripclick.train

datamaestro_text.datasets.irds.data.Adhoc

Training subset of tripclick, including all queries from tripclick/train/head, tripclick/train/torso, and tripclick/train/tail.

The dataset provides docpairs in a full text format; we map this text back to the query and doc IDs. A small number of docpairs could not be mapped back, so they are skipped.

Dataset irds.tripclick.train.head.queries

datamaestro_text.datasets.irds.data.Topics

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.head.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.head

datamaestro_text.datasets.irds.data.Adhoc

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.head.dctr.queries

datamaestro_text.datasets.irds.data.Topics

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.head.dctr.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.head.dctr

datamaestro_text.datasets.irds.data.Adhoc

The most frequent queries in the train set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.train.hofstaetter-triples.queries

datamaestro_text.datasets.irds.data.Topics

A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).

Dataset irds.tripclick.train.hofstaetter-triples.docpairs

A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).

Dataset irds.tripclick.train.hofstaetter-triples.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).

Dataset irds.tripclick.train.hofstaetter-triples

datamaestro_text.datasets.irds.data.Adhoc

A version of tripclick/train that replaces the original (noisy) training triples (docpairs) with triples sampled from BM25, as suggested by Hofstätter et al. (2022).

Dataset irds.tripclick.train.tail.queries

datamaestro_text.datasets.irds.data.Topics

The least frequent queries in the train set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.train.tail.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The least frequent queries in the train set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.train.tail

datamaestro_text.datasets.irds.data.Adhoc

The least frequent queries in the train set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.train.torso.queries

datamaestro_text.datasets.irds.data.Topics

The moderately frequent queries in the train set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.train.torso.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The moderately frequent queries in the train set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.train.torso

datamaestro_text.datasets.irds.data.Adhoc

The moderately frequent queries in the train set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.val.queries

datamaestro_text.datasets.irds.data.Topics

Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.

The scoreddocs are the official BM25 results from Anserini.

Dataset irds.tripclick.val.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.

The scoreddocs are the official BM25 results from Anserini.

Dataset irds.tripclick.val.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.

The scoreddocs are the official BM25 results from Anserini.

Dataset irds.tripclick.val

datamaestro_text.datasets.irds.data.Adhoc

Validation subset of tripclick, including all queries from tripclick/val/head, tripclick/val/torso, and tripclick/val/tail.

The scoreddocs are the official BM25 results from Anserini.

Dataset irds.tripclick.val.head.queries

datamaestro_text.datasets.irds.data.Topics

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head

datamaestro_text.datasets.irds.data.Adhoc

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.dctr.queries

datamaestro_text.datasets.irds.data.Topics

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.dctr.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.dctr.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.head.dctr

datamaestro_text.datasets.irds.data.Adhoc

The most frequent queries in the validation set. This represents 20% of the search engine traffic.

Dataset irds.tripclick.val.tail.queries

datamaestro_text.datasets.irds.data.Topics

The least frequent queries in the validation set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.val.tail.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The least frequent queries in the validation set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.val.tail.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The least frequent queries in the validation set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.val.tail

datamaestro_text.datasets.irds.data.Adhoc

The least frequent queries in the validation set. This represents 50% of the search engine traffic.

Dataset irds.tripclick.val.torso.queries

datamaestro_text.datasets.irds.data.Topics

The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.val.torso.scoreddocs

datamaestro_text.datasets.irds.data.AdhocRun

The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.val.torso.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.

Dataset irds.tripclick.val.torso

datamaestro_text.datasets.irds.data.Adhoc

The moderately frequent queries in the validation set. This represents 30% of the search engine traffic.

tripclick/logs

Raw query logs from TripClick.

Note that this subset includes a broader set of documents than the main collection, but they only provide the title and URL.

Dataset irds.tripclick.logs.documents

datamaestro_text.datasets.irds.data.Documents

Raw query logs from TripClick.

Note that this subset includes a broader set of documents than the main collection, but they only provide the title and URL.

Tweets 2013 (Internet Archive)

A collection of tweets from a 2-month window archived by the Internet Archive. This collection can be a stand-in document collection for the TREC Microblog 2013-14 tasks. (Even though it is not exactly the same collection, Sequiera and Lin show that it is close enough.)

This collection is automatically downloaded from the Internet Archive, though download speeds are often slow so it takes some time. ir_datasets constructs a new directory hierarchy during the download process to facilitate fast lookups and slices.

Dataset irds.tweets2013-ia.documents

datamaestro_text.datasets.irds.data.Documents

A collection of tweets from a 2-month window archived by the Internet Archive. This collection can be a stand-in document collection for the TREC Microblog 2013-14 tasks. (Even though it is not exactly the same collection, Sequiera and Lin show that it is close enough.)

This collection is automatically downloaded from the Internet Archive, though download speeds are often slow so it takes some time. ir_datasets constructs a new directory hierarchy during the download process to facilitate fast lookups and slices.
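
As an illustration of the slices mentioned above, here is a minimal sketch that reads the first few documents through ir_datasets directly, using slice notation on docs_iter(). The function name first_doc_ids and its load parameter are hypothetical and exist only so the sketch can be exercised without triggering the download. Calling it with the default loader assumes ir_datasets is installed and will start the (slow) download on first use.

```python
from typing import Any, Callable, List, Optional


def first_doc_ids(n: int = 5,
                  load: Optional[Callable[[str], Any]] = None) -> List[str]:
    """Return the IDs of the first n documents of tweets2013-ia."""
    if load is None:
        # Imported lazily so the sketch can be read without ir_datasets installed.
        import ir_datasets
        load = ir_datasets.load
    dataset = load("tweets2013-ia")
    # docs_iter() accepts slice notation, so a prefix can be requested directly.
    return [doc.doc_id for doc in dataset.docs_iter()[:n]]
```

For lookups by ID, ir_datasets also exposes dataset.docs_store().get(doc_id).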

Dataset irds.tweets2013-ia.trec-mb-2013.queries

datamaestro_text.datasets.irds.data.Topics

TREC Microblog 2013 test collection.

Dataset irds.tweets2013-ia.trec-mb-2013.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

TREC Microblog 2013 test collection.

Dataset irds.tweets2013-ia.trec-mb-2013

datamaestro_text.datasets.irds.data.Adhoc

TREC Microblog 2013 test collection.

Dataset irds.tweets2013-ia.trec-mb-2014.queries

datamaestro_text.datasets.irds.data.Topics

TREC Microblog 2014 test collection.

Dataset irds.tweets2013-ia.trec-mb-2014.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

TREC Microblog 2014 test collection.

Dataset irds.tweets2013-ia.trec-mb-2014

datamaestro_text.datasets.irds.data.Adhoc

TREC Microblog 2014 test collection.

Vaswani

A small corpus of roughly 11,000 scientific abstracts.

Dataset irds.vaswani.documents

datamaestro_text.datasets.irds.data.Documents

A small corpus of roughly 11,000 scientific abstracts.

Dataset irds.vaswani.queries

datamaestro_text.datasets.irds.data.Topics

A small corpus of roughly 11,000 scientific abstracts.

Dataset irds.vaswani.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

A small corpus of roughly 11,000 scientific abstracts.

Dataset irds.vaswani

datamaestro_text.datasets.irds.data.Adhoc

A small corpus of roughly 11,000 scientific abstracts.
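
Because this corpus is small, it is a convenient candidate for trying out the API. Below is a minimal sketch that follows the pattern from the Usage section (prepare_dataset, then documents.iter_documents()). The function name preview_documents and its prepare parameter are hypothetical additions for illustration, and the sketch assumes that irds.vaswani exposes a documents attribute in the same way as the Usage example.

```python
from itertools import islice
from typing import Any, Callable, List, Optional


def preview_documents(dataset_id: str = "irds.vaswani", n: int = 3,
                      prepare: Optional[Callable[[str], Any]] = None) -> List[Any]:
    """Return the first n documents of a dataset loaded through datamaestro."""
    if prepare is None:
        # Imported lazily; requires datamaestro-text and ir-datasets.
        from datamaestro import prepare_dataset
        prepare = prepare_dataset
    dataset = prepare(dataset_id)
    # islice stops after n items instead of iterating the whole collection.
    return list(islice(dataset.documents.iter_documents(), n))
```

Calling preview_documents() with no arguments downloads and prepares the collection on first use.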

wapo/v2

Version 2 of the Washington Post collection, consisting of articles published between 2012 and 2017.

The collection can be obtained by requesting it from NIST.

body contains all body text in plain text format, including paragraphs and multi-media captions. body_paras_html contains only source paragraphs and contains HTML markup. body_media contains images, videos, tweets, and galleries, along with a link to the content and a textual caption.

Dataset irds.wapo.v2.documents

datamaestro_text.datasets.irds.data.Documents

Version 2 of the Washington Post collection, consisting of articles published between 2012 and 2017.

The collection can be obtained by requesting it from NIST.

body contains all body text in plain text format, including paragraphs and multi-media captions. body_paras_html contains only source paragraphs and contains HTML markup. body_media contains images, videos, tweets, and galleries, along with a link to the content and a textual caption.

Dataset irds.wapo.v2.trec-core-2018.queries

datamaestro_text.datasets.irds.data.Topics

The TREC Common Core 2018 benchmark.

  • Queries: TREC-style (keyword, description, narrative)
  • Relevance: Deeply-annotated
  • Shared Task Website

Dataset irds.wapo.v2.trec-core-2018.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC Common Core 2018 benchmark.

  • Queries: TREC-style (keyword, description, narrative)
  • Relevance: Deeply-annotated
  • Shared Task Website

Dataset irds.wapo.v2.trec-core-2018

datamaestro_text.datasets.irds.data.Adhoc

The TREC Common Core 2018 benchmark.

  • Queries: TREC-style (keyword, description, narrative)
  • Relevance: Deeply-annotated
  • Shared Task Website

Dataset irds.wapo.v2.trec-news-2018.queries

datamaestro_text.datasets.irds.data.Topics

The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.

Dataset irds.wapo.v2.trec-news-2018.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.

Dataset irds.wapo.v2.trec-news-2018

datamaestro_text.datasets.irds.data.Adhoc

The TREC News 2018 Background Linking task. The task is to find relevant background information for the provided articles.

Dataset irds.wapo.v2.trec-news-2019.queries

datamaestro_text.datasets.irds.data.Topics

The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.

Dataset irds.wapo.v2.trec-news-2019.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.

Dataset irds.wapo.v2.trec-news-2019

datamaestro_text.datasets.irds.data.Adhoc

The TREC News 2019 Background Linking task. The task is to find relevant background information for the provided articles.

wapo/v4

Dataset irds.wapo.v4.documents

datamaestro_text.datasets.irds.data.Documents

wikiclir/ar

WikiCLIR with Arabic documents.

Dataset irds.wikiclir.ar.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Arabic documents.

Dataset irds.wikiclir.ar.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Arabic documents.

Dataset irds.wikiclir.ar.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Arabic documents.

Dataset irds.wikiclir.ar

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Arabic documents.

wikiclir/ca

WikiCLIR with Catalan documents.

Dataset irds.wikiclir.ca.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Catalan documents.

Dataset irds.wikiclir.ca.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Catalan documents.

Dataset irds.wikiclir.ca.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Catalan documents.

Dataset irds.wikiclir.ca

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Catalan documents.

wikiclir/cs

WikiCLIR with Czech documents.

Dataset irds.wikiclir.cs.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Czech documents.

Dataset irds.wikiclir.cs.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Czech documents.

Dataset irds.wikiclir.cs.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Czech documents.

Dataset irds.wikiclir.cs

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Czech documents.

wikiclir/de

WikiCLIR with German documents.

Dataset irds.wikiclir.de.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with German documents.

Dataset irds.wikiclir.de.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with German documents.

Dataset irds.wikiclir.de.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with German documents.

Dataset irds.wikiclir.de

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with German documents.

wikiclir/en-simple

WikiCLIR with Simple English documents.

Dataset irds.wikiclir.en-simple.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Simple English documents.

Dataset irds.wikiclir.en-simple.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Simple English documents.

Dataset irds.wikiclir.en-simple.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Simple English documents.

Dataset irds.wikiclir.en-simple

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Simple English documents.

wikiclir/es

WikiCLIR with Spanish documents.

Dataset irds.wikiclir.es.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Spanish documents.

Dataset irds.wikiclir.es.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Spanish documents.

Dataset irds.wikiclir.es.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Spanish documents.

Dataset irds.wikiclir.es

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Spanish documents.

wikiclir/fi

WikiCLIR with Finnish documents.

Dataset irds.wikiclir.fi.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Finnish documents.

Dataset irds.wikiclir.fi.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Finnish documents.

Dataset irds.wikiclir.fi.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Finnish documents.

Dataset irds.wikiclir.fi

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Finnish documents.

wikiclir/fr

WikiCLIR with French documents.

Dataset irds.wikiclir.fr.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with French documents.

Dataset irds.wikiclir.fr.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with French documents.

Dataset irds.wikiclir.fr.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with French documents.

Dataset irds.wikiclir.fr

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with French documents.

wikiclir/it

WikiCLIR with Italian documents.

Dataset irds.wikiclir.it.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Italian documents.

Dataset irds.wikiclir.it.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Italian documents.

Dataset irds.wikiclir.it.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Italian documents.

Dataset irds.wikiclir.it

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Italian documents.

wikiclir/ja

WikiCLIR with Japanese documents.

Dataset irds.wikiclir.ja.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Japanese documents.

Dataset irds.wikiclir.ja.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Japanese documents.

Dataset irds.wikiclir.ja.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Japanese documents.

Dataset irds.wikiclir.ja

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Japanese documents.

wikiclir/ko

WikiCLIR with Korean documents.

Dataset irds.wikiclir.ko.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Korean documents.

Dataset irds.wikiclir.ko.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Korean documents.

Dataset irds.wikiclir.ko.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Korean documents.

Dataset irds.wikiclir.ko

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Korean documents.

wikiclir/nl

WikiCLIR with Dutch documents.

Dataset irds.wikiclir.nl.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Dutch documents.

Dataset irds.wikiclir.nl.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Dutch documents.

Dataset irds.wikiclir.nl.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Dutch documents.

Dataset irds.wikiclir.nl

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Dutch documents.

wikiclir/nn

WikiCLIR with Norwegian (Nynorsk) documents.

Dataset irds.wikiclir.nn.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Norwegian (Nynorsk) documents.

Dataset irds.wikiclir.nn.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Norwegian (Nynorsk) documents.

Dataset irds.wikiclir.nn.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Norwegian (Nynorsk) documents.

Dataset irds.wikiclir.nn

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Norwegian (Nynorsk) documents.

wikiclir/no

WikiCLIR with Norwegian (Bokmål) documents.

Dataset irds.wikiclir.no.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Norwegian (Bokmål) documents.

Dataset irds.wikiclir.no.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Norwegian (Bokmål) documents.

Dataset irds.wikiclir.no.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Norwegian (Bokmål) documents.

Dataset irds.wikiclir.no

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Norwegian (Bokmål) documents.

wikiclir/pl

WikiCLIR with Polish documents.

Dataset irds.wikiclir.pl.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Polish documents.

Dataset irds.wikiclir.pl.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Polish documents.

Dataset irds.wikiclir.pl.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Polish documents.

Dataset irds.wikiclir.pl

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Polish documents.

wikiclir/pt

WikiCLIR with Portuguese documents.

Dataset irds.wikiclir.pt.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Portuguese documents.

Dataset irds.wikiclir.pt.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Portuguese documents.

Dataset irds.wikiclir.pt.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Portuguese documents.

Dataset irds.wikiclir.pt

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Portuguese documents.

wikiclir/ro

WikiCLIR with Romanian documents.

Dataset irds.wikiclir.ro.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Romanian documents.

Dataset irds.wikiclir.ro.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Romanian documents.

Dataset irds.wikiclir.ro.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Romanian documents.

Dataset irds.wikiclir.ro

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Romanian documents.

wikiclir/ru

WikiCLIR with Russian documents.

Dataset irds.wikiclir.ru.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Russian documents.

Dataset irds.wikiclir.ru.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Russian documents.

Dataset irds.wikiclir.ru.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Russian documents.

Dataset irds.wikiclir.ru

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Russian documents.

wikiclir/sv

WikiCLIR with Swedish documents.

Dataset irds.wikiclir.sv.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Swedish documents.

Dataset irds.wikiclir.sv.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Swedish documents.

Dataset irds.wikiclir.sv.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Swedish documents.

Dataset irds.wikiclir.sv

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Swedish documents.

wikiclir/sw

WikiCLIR with Swahili documents.

Dataset irds.wikiclir.sw.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Swahili documents.

Dataset irds.wikiclir.sw.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Swahili documents.

Dataset irds.wikiclir.sw.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Swahili documents.

Dataset irds.wikiclir.sw

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Swahili documents.

wikiclir/tl

WikiCLIR with Tagalog documents.

Dataset irds.wikiclir.tl.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Tagalog documents.

Dataset irds.wikiclir.tl.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Tagalog documents.

Dataset irds.wikiclir.tl.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Tagalog documents.

Dataset irds.wikiclir.tl

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Tagalog documents.

wikiclir/tr

WikiCLIR with Turkish documents.

Dataset irds.wikiclir.tr.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Turkish documents.

Dataset irds.wikiclir.tr.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Turkish documents.

Dataset irds.wikiclir.tr.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Turkish documents.

Dataset irds.wikiclir.tr

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Turkish documents.

wikiclir/uk

WikiCLIR with Ukrainian documents.

Dataset irds.wikiclir.uk.documents

datamaestro_text.datasets.irds.data.Documents

WikiCLIR with Ukrainian documents.

Dataset irds.wikiclir.uk.queries

datamaestro_text.datasets.irds.data.Topics

WikiCLIR with Ukrainian documents.

Dataset irds.wikiclir.uk.qrels

datamaestro_text.datasets.irds.data.AdhocAssessments

WikiCLIR with Ukrainian documents.

Dataset irds.wikiclir.uk

datamaestro_text.datasets.irds.data.Adhoc

WikiCLIR with Ukrainian documents.

wikiclir/vi

WikiCLIR with Vietnamese documents.

        Dataset irds.wikiclir.vi.documents

        datamaestro_text.datasets.irds.data.Documents

        WikiCLIR with Vietnamese documents.

        Dataset irds.wikiclir.vi.queries

        datamaestro_text.datasets.irds.data.Topics

        WikiCLIR with Vietnamese documents.

        Dataset irds.wikiclir.vi.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        WikiCLIR with Vietnamese documents.

        Dataset irds.wikiclir.vi

        datamaestro_text.datasets.irds.data.Adhoc

        WikiCLIR with Vietnamese documents.

        wikiclir/zh

        WikiCLIR with Chinese documents.

        Dataset irds.wikiclir.zh.documents

        datamaestro_text.datasets.irds.data.Documents

        WikiCLIR with Chinese documents.

        Dataset irds.wikiclir.zh.queries

        datamaestro_text.datasets.irds.data.Topics

        WikiCLIR with Chinese documents.

        Dataset irds.wikiclir.zh.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        WikiCLIR with Chinese documents.

        Dataset irds.wikiclir.zh

        datamaestro_text.datasets.irds.data.Adhoc

        WikiCLIR with Chinese documents.

        wikir/en1k

        A small version of WikIR for English.

        Dataset irds.wikir.en1k.documents

        datamaestro_text.datasets.irds.data.Documents

        A small version of WikIR for English.

        Dataset irds.wikir.en1k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en1k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/en1k. Scoreddocs are the provided BM25 run.
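The dataset names in this list appear to follow one pattern: the ir-datasets ID with `/` replaced by `.`, prefixed with `irds.`, and optionally followed by a component name (`documents`, `queries`, `qrels`, `scoreddocs`). The sketch below only illustrates that naming pattern as inferred from the entries in this list; `to_datamaestro_id` is a hypothetical helper, not part of datamaestro-text or ir-datasets.

```python
from typing import Optional


def to_datamaestro_id(irds_id: str, component: Optional[str] = None) -> str:
    """Build an `irds.*` name from an ir-datasets ID (hypothetical helper).

    Assumes the naming pattern observed in this list: slashes become dots,
    the `irds` prefix is added, and an optional component is appended.
    """
    parts = ["irds"] + irds_id.strip("/").split("/")
    if component:
        parts.append(component)
    return ".".join(parts)


# Names built this way match entries listed on this page
test_qrels_id = to_datamaestro_id("wikir/en1k/test", "qrels")
collection_id = to_datamaestro_id("wikir/en1k", "documents")
```

A name built this way could then be passed to `prepare_dataset`, provided the installed ir-datasets version actually ships that dataset.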

        wikir/en59k

        WikIR for English.

        Dataset irds.wikir.en59k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for English.

        Dataset irds.wikir.en59k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en59k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/en59k. Scoreddocs are the provided BM25 run.

        wikir/en78k

        WikIR for English. This is one of the two versions used in Frej2020Wikir.

        Dataset irds.wikir.en78k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for English. This is one of the two versions used in Frej2020Wikir.

        Dataset irds.wikir.en78k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.en78k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/en78k. Scoreddocs are the provided BM25 run.

        wikir/ens78k

        WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.

        Dataset irds.wikir.ens78k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for English, using the first sentences of articles as queries. This is one of the two versions used in Frej2020Wikir.

        Dataset irds.wikir.ens78k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.ens78k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/ens78k. Scoreddocs are the provided BM25 run.

        wikir/es13k

        WikIR for Spanish.

        Dataset irds.wikir.es13k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for Spanish.

        Dataset irds.wikir.es13k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.es13k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/es13k. Scoreddocs are the provided BM25 run.

        wikir/fr14k

        WikIR for French.

        Dataset irds.wikir.fr14k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for French.

        Dataset irds.wikir.fr14k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.fr14k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/fr14k. Scoreddocs are the provided BM25 run.

        wikir/it16k

        WikIR for Italian.

        Dataset irds.wikir.it16k.documents

        datamaestro_text.datasets.irds.data.Documents

        WikIR for Italian.

        Dataset irds.wikir.it16k.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.test.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.training.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.training.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.training.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.training

        datamaestro_text.datasets.irds.data.Adhoc

        Training set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.validation.queries

        datamaestro_text.datasets.irds.data.Topics

        Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.validation.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.validation.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

        Dataset irds.wikir.it16k.validation

        datamaestro_text.datasets.irds.data.Adhoc

        Validation set of wikir/it16k. Scoreddocs are the provided BM25 run.

        TREC Fair Ranking

        The TREC Fair Ranking track evaluates systems according to how well they fairly rank documents.

        Dataset irds.trec-fair.2021.documents

        datamaestro_text.datasets.irds.data.Documents

        The TREC Fair Ranking track evaluates systems according to how well they fairly rank documents.

        Dataset irds.trec-fair.2021.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Official TREC Fair Ranking 2021 train set.

        Dataset irds.trec-fair.2021.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Official TREC Fair Ranking 2021 train set.

        Dataset irds.trec-fair.2021.train

        datamaestro_text.datasets.irds.data.Adhoc

        Official TREC Fair Ranking 2021 train set.

        Dataset irds.trec-fair.2021.eval.queries

        datamaestro_text.datasets.irds.data.Topics

        Official TREC Fair Ranking 2021 evaluation set.

        Dataset irds.trec-fair.2021.eval.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Official TREC Fair Ranking 2021 evaluation set.

        Dataset irds.trec-fair.2021.eval

        datamaestro_text.datasets.irds.data.Adhoc

        Official TREC Fair Ranking 2021 evaluation set.

        trec-fair/2022

        The TREC Fair Ranking 2022 track focuses on fairly prioritising Wikimedia articles for editing to provide a fair exposure to articles from different groups.

        Dataset irds.trec-fair.2022.documents

        datamaestro_text.datasets.irds.data.Documents

        The TREC Fair Ranking 2022 track focuses on fairly prioritising Wikimedia articles for editing to provide a fair exposure to articles from different groups.

        Dataset irds.trec-fair.2022.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Official TREC Fair Ranking 2022 train set.

        Dataset irds.trec-fair.2022.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Official TREC Fair Ranking 2022 train set.

        Dataset irds.trec-fair.2022.train

        datamaestro_text.datasets.irds.data.Adhoc

        Official TREC Fair Ranking 2022 train set.

        trec-cast/v0

        Version 0 of the TREC CAsT corpus. This version uses documents from the Washington Post (version 2), TREC CAR (version 2), and MS MARCO passage (version 1).

        This corpus was originally meant to be used for evaluation of the 2019 task, but the Washington Post corpus was not included for scoring in the final version because "an error in the process led to ambiguous document ids," and Washington Post documents were removed from participating systems. As such, trec-cast/v1 (which doesn't include the Washington Post) should be used for the 2019 version of the task. However, this version can still be used for the training set (trec-cast/v0/train) or for replicating the original submissions to the track (prior to the removal of Washington Post documents).

        Dataset irds.trec-cast.v0.documents

        datamaestro_text.datasets.irds.data.Documents

        Version 0 of the TREC CAsT corpus. This version uses documents from the Washington Post (version 2), TREC CAR (version 2), and MS MARCO passage (version 1).

        This corpus was originally meant to be used for evaluation of the 2019 task, but the Washington Post corpus was not included for scoring in the final version because "an error in the process led to ambiguous document ids," and Washington Post documents were removed from participating systems. As such, trec-cast/v1 (which doesn't include the Washington Post) should be used for the 2019 version of the task. However, this version can still be used for the training set (trec-cast/v0/train) or for replicating the original submissions to the track (prior to the removal of Washington Post documents).

        Dataset irds.trec-cast.v0.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Training set provided by TREC CAsT 2019.

        Dataset irds.trec-cast.v0.train.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Training set provided by TREC CAsT 2019.

        Dataset irds.trec-cast.v0.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Training set provided by TREC CAsT 2019.

        Dataset irds.trec-cast.v0.train

        datamaestro_text.datasets.irds.data.Adhoc

        Training set provided by TREC CAsT 2019.

        Dataset irds.trec-cast.v0.train.judged.queries

        datamaestro_text.datasets.irds.data.Topics

        trec-cast/v0/train, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v0.train.judged.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        trec-cast/v0/train, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v0.train.judged.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        trec-cast/v0/train, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v0.train.judged

        datamaestro_text.datasets.irds.data.Adhoc

        trec-cast/v0/train, but with queries that do not appear in the qrels removed.

        trec-cast/v1

        Version 1 of the TREC CAsT corpus. This version uses documents from the TREC CAR (version 2) and MS MARCO passage (version 1). This version of the corpus was used for TREC CAsT 2019 and 2020.

        Dataset irds.trec-cast.v1.documents

        datamaestro_text.datasets.irds.data.Documents

        Version 1 of the TREC CAsT corpus. This version uses documents from the TREC CAR (version 2) and MS MARCO passage (version 1). This version of the corpus was used for TREC CAsT 2019 and 2020.

        Dataset irds.trec-cast.v1.2019.queries

        datamaestro_text.datasets.irds.data.Topics

        Official evaluation set for TREC CAsT 2019.

        Dataset irds.trec-cast.v1.2019.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        Official evaluation set for TREC CAsT 2019.

        Dataset irds.trec-cast.v1.2019.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Official evaluation set for TREC CAsT 2019.

        Dataset irds.trec-cast.v1.2019

        datamaestro_text.datasets.irds.data.Adhoc

        Official evaluation set for TREC CAsT 2019.

        Dataset irds.trec-cast.v1.2019.judged.queries

        datamaestro_text.datasets.irds.data.Topics

        trec-cast/v1/2019, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2019.judged.scoreddocs

        datamaestro_text.datasets.irds.data.AdhocRun

        trec-cast/v1/2019, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2019.judged.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        trec-cast/v1/2019, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2019.judged

        datamaestro_text.datasets.irds.data.Adhoc

        trec-cast/v1/2019, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2020.queries

        datamaestro_text.datasets.irds.data.Topics

        Official evaluation set for TREC CAsT 2020.

        Dataset irds.trec-cast.v1.2020.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Official evaluation set for TREC CAsT 2020.

        Dataset irds.trec-cast.v1.2020

        datamaestro_text.datasets.irds.data.Adhoc

        Official evaluation set for TREC CAsT 2020.

        Dataset irds.trec-cast.v1.2020.judged.queries

        datamaestro_text.datasets.irds.data.Topics

        trec-cast/v1/2020, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2020.judged.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        trec-cast/v1/2020, but with queries that do not appear in the qrels removed.

        Dataset irds.trec-cast.v1.2020.judged

        datamaestro_text.datasets.irds.data.Adhoc

        trec-cast/v1/2020, but with queries that do not appear in the qrels removed.
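The `judged` variants listed above are described as the parent dataset "with queries that do not appear in the qrels removed". A minimal sketch of that filtering step follows, using plain Python data instead of the library's objects; the function name, the tuple layout of the qrels, and the sample IDs are assumptions made for illustration, not the ir-datasets implementation.

```python
from typing import Dict, Iterable, Tuple


def keep_judged_queries(
    queries: Dict[str, str],
    qrels: Iterable[Tuple[str, str, int]],
) -> Dict[str, str]:
    """Keep only the queries that have at least one relevance judgment.

    `queries` maps query ID to query text; `qrels` yields
    (query_id, doc_id, relevance) tuples. Both layouts are assumed here.
    """
    judged_ids = {query_id for query_id, _doc_id, _relevance in qrels}
    return {qid: text for qid, text in queries.items() if qid in judged_ids}


# Made-up sample data: query "q2" has no judgment and is dropped
sample_queries = {"q1": "first question", "q2": "second question", "q3": "third question"}
sample_qrels = [("q1", "d10", 2), ("q3", "d11", 0), ("q3", "d12", 1)]
judged_queries = keep_judged_queries(sample_queries, sample_qrels)
```

Note that a query judged only with relevance 0 (like the first `q3` entry) still counts as judged in this sketch, since it appears in the qrels.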

        hc4/fa

        The Persian collection contains English queries and Persian documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine query translation into Persian is available.

        Dataset irds.hc4.fa.documents

        datamaestro_text.datasets.irds.data.Documents

        The Persian collection contains English queries and Persian documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine query translation into Persian is available.

        Dataset irds.hc4.fa.dev.queries

        datamaestro_text.datasets.irds.data.Topics

        Development split of hc4/fa.

        Dataset irds.hc4.fa.dev.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Development split of hc4/fa.

        Dataset irds.hc4.fa.dev

        datamaestro_text.datasets.irds.data.Adhoc

        Development split of hc4/fa.

        Dataset irds.hc4.fa.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test split of hc4/fa.

        Dataset irds.hc4.fa.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test split of hc4/fa.

        Dataset irds.hc4.fa.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test split of hc4/fa.

        Dataset irds.hc4.fa.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Train split of hc4/fa.

        Dataset irds.hc4.fa.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Train split of hc4/fa.

        Dataset irds.hc4.fa.train

        datamaestro_text.datasets.irds.data.Adhoc

        Train split of hc4/fa.

        hc4/ru

        The Russian collection contains English queries and Russian documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine query translation into Russian is available.

        Dataset irds.hc4.ru.documents

        datamaestro_text.datasets.irds.data.Documents

        The Russian collection contains English queries and Russian documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine query translation into Russian is available.

        Dataset irds.hc4.ru.dev.queries

        datamaestro_text.datasets.irds.data.Topics

        Development split of hc4/ru.

        Dataset irds.hc4.ru.dev.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Development split of hc4/ru.

        Dataset irds.hc4.ru.dev

        datamaestro_text.datasets.irds.data.Adhoc

        Development split of hc4/ru.

        Dataset irds.hc4.ru.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test split of hc4/ru.

        Dataset irds.hc4.ru.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test split of hc4/ru.

        Dataset irds.hc4.ru.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test split of hc4/ru.

        Dataset irds.hc4.ru.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Train split of hc4/ru.

        Dataset irds.hc4.ru.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Train split of hc4/ru.

        Dataset irds.hc4.ru.train

        datamaestro_text.datasets.irds.data.Adhoc

        Train split of hc4/ru.

        hc4/zh

        The Chinese collection contains English queries and Chinese documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval, assuming the machine query translation into Chinese is available.

        Dataset irds.hc4.zh.documents

        datamaestro_text.datasets.irds.data.Documents

        The Chinese collection contains English queries and Chinese documents for retrieval. Human and machine translated queries are provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.

        Dataset irds.hc4.zh.dev.queries

        datamaestro_text.datasets.irds.data.Topics

        Development split of hc4/zh.

        Dataset irds.hc4.zh.dev.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Development split of hc4/zh.

        Dataset irds.hc4.zh.dev

        datamaestro_text.datasets.irds.data.Adhoc

        Development split of hc4/zh.

        Dataset irds.hc4.zh.test.queries

        datamaestro_text.datasets.irds.data.Topics

        Test split of hc4/zh.

        Dataset irds.hc4.zh.test.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Test split of hc4/zh.

        Dataset irds.hc4.zh.test

        datamaestro_text.datasets.irds.data.Adhoc

        Test split of hc4/zh.

        Dataset irds.hc4.zh.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Train split of hc4/zh.

        Dataset irds.hc4.zh.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Train split of hc4/zh.

        Dataset irds.hc4.zh.train

        datamaestro_text.datasets.irds.data.Adhoc

        Train split of hc4/zh.

        neuclir/1/fa

        The Persian collection contains English queries (to be released) and Persian documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Persian is available.

        Dataset irds.neuclir.1.fa.documents

        datamaestro_text.datasets.irds.data.Documents

        The Persian collection contains English queries (to be released) and Persian documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Persian is available.

        Dataset irds.neuclir.1.fa.trec-2022.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).

        Dataset irds.neuclir.1.fa.trec-2022.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).

        Dataset irds.neuclir.1.fa.trec-2022

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2022 (Persian language CLIR).

        Dataset irds.neuclir.1.fa.trec-2023.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).

        Dataset irds.neuclir.1.fa.trec-2023.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).

        Dataset irds.neuclir.1.fa.trec-2023

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2023 (Persian language CLIR).

        neuclir/1/fa/hc4-filtered

        Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.

        Dataset irds.neuclir.1.fa.hc4-filtered.documents

        datamaestro_text.datasets.irds.data.Documents

        Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.

        Dataset irds.neuclir.1.fa.hc4-filtered.queries

        datamaestro_text.datasets.irds.data.Topics

        Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.

        Dataset irds.neuclir.1.fa.hc4-filtered.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.

        Dataset irds.neuclir.1.fa.hc4-filtered

        datamaestro_text.datasets.irds.data.Adhoc

        Subset of the Persian collection that intersects with HC4. The 60 queries are the hc4/fa/dev and hc4/fa/test sets combined.
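The "dev and test sets combined" relationship can be pictured as a merge of two query collections keyed by query ID. The sketch below runs on placeholder data only: `combine_queries` is a hypothetical helper, the query IDs and texts are invented for illustration, and the real queries would come from the corresponding Topics datasets.

```python
from typing import Dict


def combine_queries(*splits: Dict[str, str]) -> Dict[str, str]:
    """Merge query splits keyed by query ID, rejecting conflicting duplicates."""
    combined: Dict[str, str] = {}
    for split in splits:
        for query_id, text in split.items():
            # The same ID with a different text would indicate inconsistent splits
            if query_id in combined and combined[query_id] != text:
                raise ValueError(f"conflicting text for query {query_id!r}")
            combined[query_id] = text
    return combined


# Placeholder data standing in for the dev and test query sets (not real queries)
dev = {"q1": "placeholder dev query"}
test = {"q2": "placeholder test query", "q3": "another placeholder"}

merged = combine_queries(dev, test)
print(len(merged))
```

Keying by query ID makes the merged size easy to compare against the query count stated in the description.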

        neuclir/1/multi

        A combined corpus of NeuCLIR v1 including all Persian, Russian, and Chinese documents.

        Dataset irds.neuclir.1.multi.documents

        datamaestro_text.datasets.irds.data.Documents

        A combined corpus of NeuCLIR v1 including all Persian, Russian, and Chinese documents.

        Dataset irds.neuclir.1.multi.trec-2023.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.

        Dataset irds.neuclir.1.multi.trec-2023.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.

        Dataset irds.neuclir.1.multi.trec-2023

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2023 multi-language retrieval task.

        neuclir/1/ru

        The Russian collection contains English queries (to be released) and Russian documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Russian is available.

        Dataset irds.neuclir.1.ru.documents

        datamaestro_text.datasets.irds.data.Documents

        The Russian collection contains English queries (to be released) and Russian documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Russian is available.

        Dataset irds.neuclir.1.ru.trec-2022.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).

        Dataset irds.neuclir.1.ru.trec-2022.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).

        Dataset irds.neuclir.1.ru.trec-2022

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2022 (Russian language CLIR).

        Dataset irds.neuclir.1.ru.trec-2023.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).

        Dataset irds.neuclir.1.ru.trec-2023.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).

        Dataset irds.neuclir.1.ru.trec-2023

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2023 (Russian language CLIR).

        neuclir/1/ru/hc4-filtered

        Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.

        Dataset irds.neuclir.1.ru.hc4-filtered.documents

        datamaestro_text.datasets.irds.data.Documents

        Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.

        Dataset irds.neuclir.1.ru.hc4-filtered.queries

        datamaestro_text.datasets.irds.data.Topics

        Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.

        Dataset irds.neuclir.1.ru.hc4-filtered.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.

        Dataset irds.neuclir.1.ru.hc4-filtered

        datamaestro_text.datasets.irds.data.Adhoc

        Subset of the Russian collection that intersects with HC4. The 54 queries are the hc4/ru/dev and hc4/ru/test sets combined.

        neuclir/1/zh

        The Chinese collection contains English queries (to be released) and Chinese documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.

        Dataset irds.neuclir.1.zh.documents

        datamaestro_text.datasets.irds.data.Documents

        The Chinese collection contains English queries (to be released) and Chinese documents for retrieval. Human and machine translated queries will be provided in the query object for running monolingual retrieval or cross-language retrieval assuming the machine query translation into Chinese is available.

        Dataset irds.neuclir.1.zh.trec-2022.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).

        Dataset irds.neuclir.1.zh.trec-2022.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).

        Dataset irds.neuclir.1.zh.trec-2022

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2022 (Chinese language CLIR).

        Dataset irds.neuclir.1.zh.trec-2023.queries

        datamaestro_text.datasets.irds.data.Topics

        Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).

        Dataset irds.neuclir.1.zh.trec-2023.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).

        Dataset irds.neuclir.1.zh.trec-2023

        datamaestro_text.datasets.irds.data.Adhoc

        Topics and assessments for the TREC NeuCLIR 2023 (Chinese language CLIR).

        neuclir/1/zh/hc4-filtered

        Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.

        Dataset irds.neuclir.1.zh.hc4-filtered.documents

        datamaestro_text.datasets.irds.data.Documents

        Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.

        Dataset irds.neuclir.1.zh.hc4-filtered.queries

        datamaestro_text.datasets.irds.data.Topics

        Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.

        Dataset irds.neuclir.1.zh.hc4-filtered.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.

        Dataset irds.neuclir.1.zh.hc4-filtered

        datamaestro_text.datasets.irds.data.Adhoc

        Subset of the Chinese collection that intersects with HC4. The 60 queries are the hc4/zh/dev and hc4/zh/test sets combined.

        SARA

        A set of sensitivity-aware relevance assessments. More information is available here:

        Dataset irds.sara.documents

        datamaestro_text.datasets.irds.data.Documents

        A set of sensitivity-aware relevance assessments. More information is available here:

        Dataset irds.sara.queries

        datamaestro_text.datasets.irds.data.Topics

        A set of sensitivity-aware relevance assessments. More information is available here:

        Dataset irds.sara.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        A set of sensitivity-aware relevance assessments. More information is available here:

        Dataset irds.sara

        datamaestro_text.datasets.irds.data.Adhoc

        A set of sensitivity-aware relevance assessments. More information is available here:

        trec-tot/2025

        Dataset irds.trec-tot.2025.documents

        datamaestro_text.datasets.irds.data.Documents

        trec-tot/2025/train

        Dataset irds.trec-tot.2025.train.documents

        datamaestro_text.datasets.irds.data.Documents

        Dataset irds.trec-tot.2025.train.queries

        datamaestro_text.datasets.irds.data.Topics

        Dataset irds.trec-tot.2025.train.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Dataset irds.trec-tot.2025.train

        datamaestro_text.datasets.irds.data.Adhoc

        trec-tot/2025/dev1

        Dataset irds.trec-tot.2025.dev1.documents

        datamaestro_text.datasets.irds.data.Documents

        Dataset irds.trec-tot.2025.dev1.queries

        datamaestro_text.datasets.irds.data.Topics

        Dataset irds.trec-tot.2025.dev1.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Dataset irds.trec-tot.2025.dev1

        datamaestro_text.datasets.irds.data.Adhoc

        trec-tot/2025/dev2

        Dataset irds.trec-tot.2025.dev2.documents

        datamaestro_text.datasets.irds.data.Documents

        Dataset irds.trec-tot.2025.dev2.queries

        datamaestro_text.datasets.irds.data.Topics

        Dataset irds.trec-tot.2025.dev2.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Dataset irds.trec-tot.2025.dev2

        datamaestro_text.datasets.irds.data.Adhoc

        trec-tot/2025/dev3

        Dataset irds.trec-tot.2025.dev3.documents

        datamaestro_text.datasets.irds.data.Documents

        Dataset irds.trec-tot.2025.dev3.queries

        datamaestro_text.datasets.irds.data.Topics

        Dataset irds.trec-tot.2025.dev3.qrels

        datamaestro_text.datasets.irds.data.AdhocAssessments

        Dataset irds.trec-tot.2025.dev3

        datamaestro_text.datasets.irds.data.Adhoc