Word embeddings

GloVe

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
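A minimal sketch of the two ingredients this description names. It follows the formulation in the original GloVe paper, not any code shipped with these datasets. Co-occurrence counts are accumulated over a symmetric window, and a pair of tokens d positions apart contributes 1/d. Each pair then yields one term of a weighted least-squares objective, f(X_ij) (w_i · w̃_j + b_i + b̃_j - log X_ij)^2, where f(x) = (x/x_max)^α for x < x_max and 1 otherwise (x_max = 100, α = 3/4). The function names and the default window size are illustrative assumptions, and no training loop is included.

```python
import math
from collections import defaultdict


def cooccurrence_counts(tokens, window=10):
    """Symmetric co-occurrence counts; a pair d tokens apart adds 1/d.

    `window` is an assumed default, not a value taken from these datasets.
    """
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        # Scan only to the right and record both orderings,
        # so every pair inside the window is visited once.
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            context = tokens[j]
            counts[(center, context)] += 1.0 / d
            counts[(context, center)] += 1.0 / d
    return dict(counts)


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij):
    """One term of the weighted least-squares objective for a pair with x_ij > 0."""
    dot = sum(a * b for a, b in zip(w_i, w_tilde_j))
    return weight(x_ij) * (dot + b_i + b_tilde_j - math.log(x_ij)) ** 2
```

Only pairs with a non-zero count enter the objective, which is why `pair_loss` can take `log(x_ij)` directly.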

Dataset edu.stanford.glove.6b

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe embeddings trained on 6B tokens, available in several dimensions (50, 100, 200 and 300)
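Assuming the files behind `WordEmbeddingsText` use the usual GloVe text layout (one token per line, followed by its space-separated float components, with no header line), a loader can be sketched as follows. `load_glove_text` is a hypothetical helper, not part of datamaestro. Splitting from the right with the known dimension keeps a token intact even if it contains spaces, which is reportedly the case for a few entries in the larger Common Crawl files.

```python
import io


def load_glove_text(lines, dim):
    """Parse 'token v1 ... v_dim' lines into {token: [float, ...]}.

    Hypothetical helper. rsplit with maxsplit=dim takes the last `dim`
    fields as the vector, so a token containing spaces stays whole.
    """
    vectors = {}
    for line in lines:
        line = line.rstrip()  # drop the newline and any trailing whitespace
        if not line:
            continue
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        token, values = parts[0], parts[1:]
        vectors[token] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample standing in for a real embeddings file.
sample = io.StringIO("the 0.1 0.2 0.3\n. . . 0.5 -0.5 1.0\n")
vectors = load_glove_text(sample, dim=3)
```

For a real file, open it with `encoding="utf-8"` and pass the dimension of the chosen variant (for example 50 for `edu.stanford.glove.6b.50`). A dictionary of Python lists keeps the sketch dependency-free but is memory-hungry for vocabularies in the millions; a NumPy matrix plus a token index would be the usual choice there.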

Dataset edu.stanford.glove.6b.50

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe 6B - dimension 50

Dataset edu.stanford.glove.6b.100

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe 6B - dimension 100

Dataset edu.stanford.glove.6b.200

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe 6B - dimension 200

Dataset edu.stanford.glove.6b.300

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe 6B - dimension 300

Dataset edu.stanford.glove.42b

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe embeddings trained on Common Crawl with 42B tokens

Dataset edu.stanford.glove.840b

datamaestro_text.data.embeddings.WordEmbeddingsText

GloVe embeddings trained on Common Crawl with 840B tokens
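Whichever variant is loaded, the "linear substructures" mentioned in the GloVe overview can be probed with vector arithmetic. The token whose vector is closest, by cosine similarity, to b - a + c often completes the analogy a : b :: c : ?. The sketch below applies this vector-offset method (often called 3CosAdd) to a hand-made 2-dimensional toy vocabulary. The vectors are invented for illustration and are not taken from any GloVe file.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c, topn=1):
    """Tokens maximising cos(w, b - a + c), excluding a, b and c themselves."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    scored = [
        (cosine(vec, target), token)
        for token, vec in vectors.items()
        if token not in (a, b, c)
    ]
    scored.sort(reverse=True)
    return [token for _, token in scored[:topn]]


# Hypothetical toy vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, -1.0],
}
result = analogy(toy, "man", "woman", "king")
```

Excluding the three query tokens matters in practice: with real embeddings the nearest vector to b - a + c is frequently one of the inputs.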