Conversation API

This module provides data types for conversational information retrieval and query understanding tasks.

Core Data Classes

Entry types for conversation turns:

class datamaestro_text.data.conversation.base.AnswerEntry(answer: str)

Bases: Item

A system answer

answer: str

The system answer

class datamaestro_text.data.conversation.base.RetrievedEntry(documents: List[str], relevant_documents: Dict[int, Tuple[int | None, int | None]] | None = None)

Bases: Item

List of system-retrieved documents and their relevance

documents: List[str]

List of retrieved documents

relevant_documents: Dict[int, Tuple[int | None, int | None]] | None

Optional mapping of relevance status, keyed by integer, where each value is a (start, stop) position pair; either position may be None
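
As an illustration of how the two fields fit together, here is a minimal sketch using a stand-in dataclass rather than the library class. `RetrievedEntrySketch` and `relevant_spans` are hypothetical names. The sketch assumes that the keys of `relevant_documents` index into `documents` and that the positions are character offsets; the documentation above does not confirm either point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


# Stand-in mirroring the documented fields; this is NOT the library class.
@dataclass
class RetrievedEntrySketch:
    documents: List[str]
    relevant_documents: Optional[Dict[int, Tuple[Optional[int], Optional[int]]]] = None


def relevant_spans(entry: RetrievedEntrySketch) -> Dict[int, str]:
    """Return the relevant text for each annotated document.

    Assumption: keys index into `documents`, and (start, stop) are
    character offsets, with None meaning an open-ended bound.
    """
    if entry.relevant_documents is None:
        return {}
    return {
        ix: entry.documents[ix][start:stop]
        for ix, (start, stop) in entry.relevant_documents.items()
    }


entry = RetrievedEntrySketch(
    documents=["Paris is the capital of France.", "Bananas are yellow."],
    relevant_documents={0: (0, 5)},  # only the first document is annotated
)
spans = relevant_spans(entry)
```

Documents without an entry in the mapping are simply absent from the result, which keeps "not annotated" distinct from "annotated with an open-ended span".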

class datamaestro_text.data.conversation.base.ClarifyingQuestionEntry

Bases: Item

A system-generated clarifying question

class datamaestro_text.data.conversation.base.DecontextualizedItem

Bases: Item

A topic record with decontextualized versions of the topic

abstract get_decontextualized_query(mode=None) → str

Returns the decontextualized query
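
Since the method is abstract, concrete items have to supply it. The following sketch shows one possible shape for such a subclass, built on a stand-in base class so that it runs without the library. `DecontextualizedItemSketch`, `DictDecontextualized`, and the interpretation of `mode` as a key selecting among several rewrites are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


# Stand-in for the documented abstract interface; NOT the library class.
class DecontextualizedItemSketch(ABC):
    @abstractmethod
    def get_decontextualized_query(self, mode: Optional[str] = None) -> str:
        """Returns the decontextualized query"""


# Hypothetical concrete item holding several rewrites of the same topic.
class DictDecontextualized(DecontextualizedItemSketch):
    def __init__(self, default: str, rewrites: Dict[str, str]):
        self.default = default
        self.rewrites = rewrites

    def get_decontextualized_query(self, mode: Optional[str] = None) -> str:
        # Assumption: `mode` names a rewrite; None falls back to the default.
        key = self.default if mode is None else mode
        return self.rewrites[key]


item = DictDecontextualized(
    default="human",
    rewrites={
        "human": "Who founded the Red Cross?",
        "model": "Red Cross founder",
    },
)
default_query = item.get_decontextualized_query()
model_query = item.get_decontextualized_query("model")
```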

Conversation structures:

datamaestro_text.data.conversation.base.ConversationHistory

The conversation

class datamaestro_text.data.conversation.base.ConversationHistoryItem(history: Sequence[Record])

Bases: Item

A user interaction contextualized within a conversation

history: Sequence[Record]

The records that make up the conversation history
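
To make the idea of a history of records concrete, here is a minimal sketch that renders a history as a transcript. It uses stand-in dataclasses instead of the library's `Record` type: `UserTurn`, `AnswerTurn`, `HistoryItemSketch`, and `render` are hypothetical names, and the sketch iterates in stored order without assuming whether the library stores the oldest or the most recent turn first.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class UserTurn:  # hypothetical stand-in for a user record
    query: str


@dataclass
class AnswerTurn:  # mirrors the single `answer` field of AnswerEntry
    answer: str


Turn = Union[UserTurn, AnswerTurn]


# Stand-in mirroring the documented `history` field; NOT the library class.
@dataclass
class HistoryItemSketch:
    history: Sequence[Turn]


def render(item: HistoryItemSketch) -> str:
    """Format the history as one line per turn, in stored order."""
    lines = []
    for turn in item.history:
        if isinstance(turn, UserTurn):
            lines.append(f"user: {turn.query}")
        else:
            lines.append(f"system: {turn.answer}")
    return "\n".join(lines)


context = HistoryItemSketch(
    history=[
        UserTurn("Who wrote Dune?"),
        AnswerTurn("Frank Herbert."),
    ]
)
transcript = render(context)
```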

Conversational IR

XPM Config datamaestro_text.data.conversation.base.ConversationUserTopics(*, id, conversations)

Bases: Topics

Extract user topics from conversations

id: str

The unique (sub-)dataset ID

conversations: datamaestro_text.data.conversation.base.ConversationDataset

Contextual Query Reformulation

Base class for conversation datasets:

XPM Config datamaestro_text.data.conversation.base.ConversationDataset(*, id)

Bases: Base, ABC

A dataset made of conversations

id: str

The unique (sub-)dataset ID

CANARD Dataset

XPM Config datamaestro_text.data.conversation.canard.CanardDataset(*, id, path)

Bases: ConversationDataset, File

A dataset in the CANARD JSON format

The CANARD dataset is composed of

id: str

The unique (sub-)dataset ID

path: path

The path of the file

OrConvQA Dataset

XPM Config datamaestro_text.data.conversation.orconvqa.OrConvQADataset(*, id, path)

Bases: ConversationDataset, File

id: str

The unique (sub-)dataset ID

path: path

The path of the file

QReCC Dataset

XPM Config datamaestro_text.data.conversation.qrecc.QReCCDataset(*, id, path)

Bases: ConversationDataset, File

id: str

The unique (sub-)dataset ID

path: path

The path of the file

iKAT Dataset

XPM Config datamaestro_text.data.conversation.ikat.IkatConversations(*, id, path)

Bases: ConversationDataset, File

A dataset containing conversations from the IKAT project

id: str

The unique (sub-)dataset ID

path: path

The path of the file