Information Retrieval API
Collection
- XPM Config datamaestro_text.data.ir.Adhoc(*, id, documents, topics, assessments)
An Adhoc IR collection
- id: str
The unique dataset ID
- documents: datamaestro_text.data.ir.AdhocDocuments
The set of documents
- topics: datamaestro_text.data.ir.AdhocTopics
The set of topics
- assessments: datamaestro_text.data.ir.AdhocAssessments
The set of assessments (for each topic)
Topics
- XPM Config datamaestro_text.data.ir.AdhocTopics(*, id)
- id: str
The unique dataset ID
- iter() → Iterator[datamaestro_text.data.ir.AdhocTopic]
Returns an iterator over topics
- XPM Config datamaestro_text.data.ir.csv.AdhocTopics(*, id, separator, path)
Pairs of (query ID, query text), with the two fields split by a separator
- id: str
The unique dataset ID
- separator: str
- path: Path
- class datamaestro_text.data.ir.AdhocTopic(qid: str, text: str, metadata: Dict[str, str])
The most generic topic: an ID with some text
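As an illustration of the "query ID, separator, query text" layout described above, the sketch below parses such lines into topic records. It does not use the library: `Topic` is a hypothetical stand-in that only mirrors the documented `AdhocTopic` fields, and the tab default for `separator` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator


@dataclass
class Topic:
    """Hypothetical stand-in mirroring the documented AdhocTopic fields."""
    qid: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def iter_topics(lines: Iterable[str], separator: str = "\t") -> Iterator[Topic]:
    """Yield one Topic per non-empty line of `<qid><separator><text>`."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split only once so the query text may itself contain the separator
        qid, text = line.split(separator, 1)
        yield Topic(qid=qid, text=text)


sample_lines = [
    "q1\twhat is information retrieval\n",
    "q2\tneural ranking models\n",
]
topics = list(iter_topics(sample_lines))
```

Splitting only on the first separator is a design choice of this sketch; the library's own handling of separators inside the query text is not described in this page.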
Documents
- XPM Config datamaestro_text.data.ir.AdhocDocuments(*, id, count)
A set of documents with identifiers
- id: str
The unique dataset ID
- count: int
Number of documents
Assessments
- XPM Config datamaestro_text.data.ir.AdhocAssessments(*, id)
Ad-hoc assessments (qrels)
- id: str
The unique dataset ID
- iter() → Iterator[datamaestro_text.data.ir.AdhocAssessedTopic]
Returns an iterator over assessments
- XPM Config datamaestro_text.data.ir.trec.TrecAdhocAssessments(*, id, path)
- id: str
The unique dataset ID
- path: Path
- class datamaestro_text.data.ir.AdhocAssessedTopic(qid: str, assessments: List[datamaestro_text.data.ir.AdhocAssessment])
- class datamaestro_text.data.ir.AdhocAssessment(docno: str, rel: float)
Adhoc assessments associate a document ID with a relevance
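To make the shape of assessed topics concrete, here is a minimal sketch that groups TREC-style qrels lines (`qid iteration docno relevance`, whitespace-separated) by topic. `Assessment` and `AssessedTopic` are hypothetical stand-ins that mirror the documented fields; whether `TrecAdhocAssessments` parses files exactly this way is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Assessment:
    """Hypothetical stand-in for AdhocAssessment: a document ID and a relevance."""
    docno: str
    rel: float


@dataclass
class AssessedTopic:
    """Hypothetical stand-in for AdhocAssessedTopic: all assessments of one topic."""
    qid: str
    assessments: List[Assessment]


def iter_qrels(lines: Iterable[str]) -> Iterator[AssessedTopic]:
    """Group qrels lines by topic ID, keeping the order of first appearance."""
    by_topic: Dict[str, List[Assessment]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        qid, _iteration, docno, rel = parts  # the iteration column is ignored
        by_topic[qid].append(Assessment(docno=docno, rel=float(rel)))
    for qid, assessments in by_topic.items():
        yield AssessedTopic(qid=qid, assessments=assessments)


qrels_lines = ["q1 0 d1 1", "q1 0 d7 0", "q2 0 d3 2"]
assessed = list(iter_qrels(qrels_lines))
```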
Runs
Reranking
- XPM Config datamaestro_text.data.ir.RerankAdhoc(*, id, documents, topics, assessments, run)
Re-ranking ad-hoc task based on an existing run
- id: str
The unique dataset ID
- documents: datamaestro_text.data.ir.AdhocDocuments
The set of documents
- topics: datamaestro_text.data.ir.AdhocTopics
The set of topics
- assessments: datamaestro_text.data.ir.AdhocAssessments
The set of assessments (for each topic)
- run: datamaestro_text.data.ir.AdhocRun
The run to re-rank
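The idea of re-ranking an existing run can be sketched as follows: for each topic, keep the candidate documents of the first-stage run and reorder them with a new scoring function. The run representation used here (a dictionary from topic ID to `(docno, score)` pairs) and the `rerank` helper are assumptions made for illustration; the structure of `AdhocRun` is not described in this page.

```python
from typing import Callable, Dict, List, Tuple

Run = Dict[str, List[Tuple[str, float]]]  # hypothetical: qid -> [(docno, score), ...]


def rerank(run: Run, score: Callable[[str, str], float]) -> Run:
    """Re-score every (qid, docno) candidate and sort by decreasing new score."""
    reranked: Run = {}
    for qid, candidates in run.items():
        # The first-stage scores are discarded; only the candidate set is kept
        rescored = [(docno, score(qid, docno)) for docno, _old_score in candidates]
        rescored.sort(key=lambda pair: pair[1], reverse=True)
        reranked[qid] = rescored
    return reranked


first_stage: Run = {"q1": [("d1", 12.0), ("d2", 11.5), ("d3", 9.0)]}
new_scores = {("q1", "d1"): 0.2, ("q1", "d2"): 0.9, ("q1", "d3"): 0.5}
result = rerank(first_stage, lambda qid, docno: new_scores[(qid, docno)])
```

In this toy example the second-stage scores come from a lookup table; in practice they would come from the model being evaluated.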
Training triplets
- XPM Config datamaestro_text.data.ir.TrainingTriplets(*, id, ids)
Triplets for training IR systems: query (or query ID), positive document, negative document
- id: str
The unique dataset ID
- ids: bool
- XPM Config datamaestro_text.data.ir.TrainingTripletsLines(*, id, ids, sep, path)
Training triplets with one line per triple (text only)
- id: str
The unique dataset ID
- ids: bool
- sep: str
- path: Path
- XPM Config datamaestro_text.data.ir.csv.TrainingTriplets(*, id, path, separator)
Training triplets (full text)
- id: str
The unique dataset ID
- ids: bool = True (constant)
- path: Path
- separator: str
- XPM Config datamaestro_text.data.ir.csv.TrainingTripletsID(*, id, sep, path, separator, documents, topics)
Training triplets (query/document IDs only)
- id: str
The unique dataset ID
- ids: bool = True (constant)
Whether documents are IDs or full text
- sep: str
- path: Path
- separator: str
Field separator
- documents: datamaestro_text.data.ir.AdhocDocuments
The documents
- topics: datamaestro_text.data.ir.AdhocTopics
The topics
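The difference between ID-only and full-text triplets can be sketched as follows: each line holds a query, a positive document and a negative document, and ID triplets are resolved to text through lookup tables. The helpers `iter_triplets` and `resolve`, the tab separator, and the plain dictionaries standing in for the `topics` and `documents` parameters are all assumptions for illustration, not the library's API.

```python
from typing import Dict, Iterable, Iterator, Tuple

Triplet = Tuple[str, str, str]  # (query, positive document, negative document)


def iter_triplets(lines: Iterable[str], separator: str = "\t") -> Iterator[Triplet]:
    """Yield one (query, positive, negative) triplet per non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query, positive, negative = line.split(separator)
        yield query, positive, negative


def resolve(
    triplets: Iterable[Triplet],
    topics: Dict[str, str],
    documents: Dict[str, str],
) -> Iterator[Triplet]:
    """Replace IDs by text using hypothetical in-memory lookup tables."""
    for qid, pos_id, neg_id in triplets:
        yield topics[qid], documents[pos_id], documents[neg_id]


id_lines = ["q1\td2\td5\n", "q2\td3\td2\n"]
topic_text = {"q1": "neural ranking", "q2": "query expansion"}
doc_text = {"d2": "A survey of neural rankers", "d3": "Pseudo-relevance feedback", "d5": "Cooking pasta"}

id_triplets = list(iter_triplets(id_lines))
text_triplets = list(resolve(id_triplets, topic_text, doc_text))
```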