Datamaestro Text Datasets ========================= This section lists the datasets available through the datamaestro-text plugin. Datasets are organized by domain: - :doc:`ir` - Information retrieval benchmark collections (MS MARCO, TREC, etc.) - :doc:`irds` - Integration with the `ir-datasets `_ library - :doc:`conversation` - Conversational search and query reformulation - :doc:`text` - Raw text corpora - :doc:`embeddings` - Pre-trained word embeddings - :doc:`recommendation` - Rating and recommendation datasets To load a dataset: .. code-block:: python from datamaestro import prepare_dataset # Load by dataset ID dataset = prepare_dataset("com.microsoft.msmarco.passage") To discover available datasets: .. code-block:: bash # List all datasets datamaestro search text # Search by keyword datamaestro search "trec" .. toctree:: :maxdepth: 1 :caption: Contents: ir irds conversation text embeddings recommendation