Information Retrieval API

Collection

XPM Config datamaestro_text.data.ir.Adhoc(*, id, documents, topics, assessments)

An ad-hoc IR collection, bundling documents, topics, and relevance assessments

id: str

The unique dataset ID

documents: datamaestro_text.data.ir.AdhocDocuments

The set of documents

topics: datamaestro_text.data.ir.AdhocTopics

The set of topics

assessments: datamaestro_text.data.ir.AdhocAssessments

The set of assessments (for each topic)

Topics

XPM Config datamaestro_text.data.ir.AdhocTopics(*, id)

id: str

The unique dataset ID

iter() -> Iterator[datamaestro_text.data.ir.AdhocTopic]

Returns an iterator over topics

XPM Config datamaestro_text.data.ir.csv.AdhocTopics(*, id, separator, path)

Pairs of query ID and query text, with the two fields delimited by separator

id: str

The unique dataset ID

separator: str

path: Path

class datamaestro_text.data.ir.AdhocTopic(qid: str, text: str, metadata: Dict[str, str])

The most generic topic: an ID with some text
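The separator-delimited topic format described above can be sketched without the library. This is a minimal illustration, not the datamaestro_text implementation: the Topic dataclass and iter_topics function are hypothetical stand-ins that only mirror the documented AdhocTopic fields (qid, text, metadata), and the tab separator is an assumed value.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, Iterator, TextIO


@dataclass
class Topic:
    """Hypothetical stand-in mirroring AdhocTopic(qid, text, metadata)."""

    qid: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def iter_topics(stream: TextIO, separator: str = "\t") -> Iterator[Topic]:
    """Yield one Topic per non-empty line of the form qid<separator>query."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first separator only, so the query text may contain it.
        qid, text = line.split(separator, 1)
        yield Topic(qid, text)


# In-memory sample standing in for the file referenced by `path`.
sample = io.StringIO("q1\twhat is information retrieval\nq2\tneural ranking models\n")
topics = list(iter_topics(sample))
```

Splitting on the first separator only is a design choice of this sketch; whether the library does the same is not stated in this reference.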

Documents

XPM Config datamaestro_text.data.ir.AdhocDocuments(*, id, count)

A set of documents with identifiers

id: str

The unique dataset ID

count: int

Number of documents

XPM Config datamaestro_text.data.ir.cord19.Documents(*, id, path, delimiter, ignore, names_row, count)

id: str

The unique dataset ID

path: Path

The path of the file

delimiter: str = ","

ignore: int = 0

names_row: int = -1

count: int

Number of documents

XPM Config datamaestro_text.data.ir.csv.AdhocDocuments(*, id, count, path, separator)

One line per document, format pid<SEP>text

id: str

The unique dataset ID

count: int

Number of documents

path: Path

separator: str
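As an illustration of the pid<SEP>text layout, the following sketch reads such lines into a dictionary. It is a standalone example under stated assumptions: iter_documents is a hypothetical helper, not a datamaestro_text function, and the tab separator is an assumed value for the separator field.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_documents(stream: TextIO, separator: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (pid, text) pairs from lines of the form pid<separator>text."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        # Only the first separator delimits the ID; the rest belongs to the text.
        pid, text = line.split(separator, 1)
        yield pid, text


# In-memory sample standing in for the file referenced by `path`.
sample = io.StringIO("d1\tfirst passage\nd2\tsecond passage, with\ta tab\n")
documents = dict(iter_documents(sample))

# Plays the role of the `count` field: the number of documents.
count = len(documents)
```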

Assessments

XPM Config datamaestro_text.data.ir.AdhocAssessments(*, id)

Ad-hoc assessments (qrels)

id: str

The unique dataset ID

iter() -> Iterator[datamaestro_text.data.ir.AdhocAssessedTopic]

Returns an iterator over assessments

XPM Config datamaestro_text.data.ir.trec.TrecAdhocAssessments(*, id, path)

id: str

The unique dataset ID

path: Path

class datamaestro_text.data.ir.AdhocAssessedTopic(qid: str, assessments: List[datamaestro_text.data.ir.AdhocAssessment])

class datamaestro_text.data.ir.AdhocAssessment(docno: str, rel: float)

An ad-hoc assessment associates a document ID with a relevance value
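To show how per-topic assessments relate to a qrels file, here is a sketch that groups lines in the standard TREC qrels layout (topic ID, iteration, document number, relevance, separated by whitespace) by topic. The Assessment and AssessedTopic dataclasses and the read_qrels function are hypothetical stand-ins mirroring the documented fields; they are not the library's own classes or parser.

```python
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO


@dataclass
class Assessment:
    """Hypothetical stand-in mirroring AdhocAssessment(docno, rel)."""

    docno: str
    rel: float


@dataclass
class AssessedTopic:
    """Hypothetical stand-in mirroring AdhocAssessedTopic(qid, assessments)."""

    qid: str
    assessments: List[Assessment]


def read_qrels(stream: TextIO) -> List[AssessedTopic]:
    """Group TREC qrels lines (qid iteration docno rel) by topic ID."""
    by_topic: Dict[str, AssessedTopic] = {}
    for line in stream:
        fields = line.split()
        if not fields:
            continue
        # The iteration column is conventionally ignored.
        qid, _iteration, docno, rel = fields
        if qid not in by_topic:
            by_topic[qid] = AssessedTopic(qid, [])
        by_topic[qid].assessments.append(Assessment(docno, float(rel)))
    # Dictionaries preserve insertion order, so topics keep file order.
    return list(by_topic.values())


sample = io.StringIO("q1 0 d1 1\nq1 0 d7 0\nq2 0 d3 2\n")
assessed = read_qrels(sample)
```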

Runs

XPM Config datamaestro_text.data.ir.AdhocRun(*, id)

IR adhoc run

id: str

The unique dataset ID

XPM Config datamaestro_text.data.ir.csv.AdhocRunWithText(*, id, separator, path)

(qid, doc.id, query, passage)

id: str

The unique dataset ID

separator: str

path: Path
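The (qid, doc.id, query, passage) layout can be illustrated with a small reader that groups entries by query ID. This is a sketch under assumptions: one record per line and a tab separator are assumed, and RunEntry and read_run_with_text are hypothetical names that do not come from datamaestro_text.

```python
import io
from collections import defaultdict
from typing import Dict, List, NamedTuple, TextIO


class RunEntry(NamedTuple):
    """One retrieved passage for a query (hypothetical helper type)."""

    doc_id: str
    query: str
    passage: str


def read_run_with_text(stream: TextIO, separator: str = "\t") -> Dict[str, List[RunEntry]]:
    """Group qid<sep>doc.id<sep>query<sep>passage lines by query ID."""
    run: Dict[str, List[RunEntry]] = defaultdict(list)
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        # At most three splits, so the passage may contain the separator.
        qid, doc_id, query, passage = line.split(separator, 3)
        run[qid].append(RunEntry(doc_id, query, passage))
    return dict(run)


sample = io.StringIO(
    "q1\td2\tneural ranking\tneural ranking with transformers\n"
    "q1\td1\tneural ranking\tclassic boolean retrieval\n"
    "q2\td3\tqrels format\tthe trec qrels format\n"
)
run = read_run_with_text(sample)
```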

Reranking

XPM Config datamaestro_text.data.ir.RerankAdhoc(*, id, documents, topics, assessments, run)

Re-ranking ad-hoc task based on an existing run

id: str

The unique dataset ID

documents: datamaestro_text.data.ir.AdhocDocuments

The set of documents

topics: datamaestro_text.data.ir.AdhocTopics

The set of topics

assessments: datamaestro_text.data.ir.AdhocAssessments

The set of assessments (for each topic)

run: datamaestro_text.data.ir.AdhocRun

The run to re-rank
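What "re-ranking based on an existing run" means can be sketched with plain dictionaries: the run supplies the candidate documents for each topic, and a scoring function reorders them. Everything here is illustrative: rerank and overlap are hypothetical helpers, the term-overlap scorer is a toy placeholder for a real ranking model, and none of it is part of datamaestro_text.

```python
from typing import Callable, Dict, List


def rerank(
    run: Dict[str, List[str]],
    queries: Dict[str, str],
    documents: Dict[str, str],
    score: Callable[[str, str], float],
) -> Dict[str, List[str]]:
    """Reorder each topic's candidate document IDs by decreasing score."""
    return {
        qid: sorted(
            doc_ids,
            key=lambda doc_id: score(queries[qid], documents[doc_id]),
            reverse=True,
        )
        for qid, doc_ids in run.items()
    }


def overlap(query: str, text: str) -> float:
    """Toy scorer: number of distinct lowercase terms shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


queries = {"q1": "neural ranking"}
documents = {
    "d1": "classic boolean retrieval",
    "d2": "neural ranking with transformers",
    "d3": "ranking functions",
}
run = {"q1": ["d1", "d3", "d2"]}  # initial order from a first-stage run

reranked = rerank(run, queries, documents, overlap)
```

Only documents already present in the run are considered, which is what distinguishes re-ranking from retrieval over the full collection.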

Training triplets

XPM Config datamaestro_text.data.ir.TrainingTriplets(*, id, ids)

Triplets for training IR systems: (query or query ID, positive document, negative document)

id: str

The unique dataset ID

ids: bool

XPM Config datamaestro_text.data.ir.TrainingTripletsLines(*, id, ids, sep, path)

Training triplets with one line per triplet (text only)

id: str

The unique dataset ID

ids: bool

sep: str

path: Path

XPM Config datamaestro_text.data.ir.csv.TrainingTriplets(*, id, path, separator)

Training triplets (full text)

id: str

The unique dataset ID

ids: bool = True (constant)

path: Path

separator: str

XPM Config datamaestro_text.data.ir.csv.TrainingTripletsID(*, id, sep, path, separator, documents, topics)

Training triplets (query/document IDs only)

id: str

The unique dataset ID

ids: bool = True (constant)

Whether documents are IDs or full text

sep: str

path: Path

separator: str

Field separator

documents: datamaestro_text.data.ir.AdhocDocuments

The documents

topics: datamaestro_text.data.ir.AdhocTopics

The topics
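The role of the documents and topics fields for ID-only triplets can be sketched as a lookup step: each line carries a query ID and two document IDs, which are then resolved to text. This is an assumption-laden illustration: the one-triplet-per-line layout, the field order (query, positive, negative), and the tab separator are assumed, and iter_triplets and resolve are hypothetical helpers rather than datamaestro_text functions.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO, Tuple

Triplet = Tuple[str, str, str]


def iter_triplets(stream: TextIO, separator: str = "\t") -> Iterator[Triplet]:
    """Yield (query, positive, negative) fields from separator-delimited lines."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        query, positive, negative = line.split(separator)
        yield query, positive, negative


def resolve(
    triplets: Iterable[Triplet],
    topics: Dict[str, str],
    documents: Dict[str, str],
) -> Iterator[Triplet]:
    """Replace query and document IDs by their text."""
    for qid, pos_id, neg_id in triplets:
        yield topics[qid], documents[pos_id], documents[neg_id]


# In-memory stand-ins for the topics, documents, and triplet file.
topics = {"q1": "neural ranking"}
documents = {"d1": "classic boolean retrieval", "d2": "neural ranking with transformers"}
sample = io.StringIO("q1\td2\td1\n")

id_triplets = list(iter_triplets(sample))
text_triplets = list(resolve(id_triplets, topics, documents))
```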