Word embeddings
GloVe
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
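The "linear substructures" mentioned above are usually illustrated with word analogies solved by vector arithmetic. The following is a minimal sketch of that idea, not part of datamaestro: the 3-dimensional vectors are made up for illustration (they are not real GloVe values), and the helper names `cosine` and `analogy` are hypothetical.

```python
import numpy as np

# Toy vectors invented for illustration only; real GloVe vectors
# have 50 to 300 dimensions and different values.
toy = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# "man" is to "king" as "woman" is to ...
print(analogy("man", "king", "woman", toy))
```

With real embeddings the same search would run over the full vocabulary, so the nearest-neighbour step is normally vectorised as a single matrix product rather than a Python loop.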
-
Dataset edu.stanford.glove.6b
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe embeddings trained on 6B tokens, available in dimensions 50, 100, 200 and 300
-
Dataset edu.stanford.glove.6b.50
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe 6B - dimension 50
-
Dataset edu.stanford.glove.6b.100
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe 6B - dimension 100
-
Dataset edu.stanford.glove.6b.200
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe 6B - dimension 200
-
Dataset edu.stanford.glove.6b.300
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe 6B - dimension 300
-
Dataset edu.stanford.glove.42b
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe embeddings trained on Common Crawl with 42B tokens
-
Dataset edu.stanford.glove.840b
datamaestro_text.data.embeddings.WordEmbeddingsText
GloVe embeddings trained on Common Crawl with 840B tokens
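
The GloVe releases are distributed as plain text, one word per line followed by its space-separated vector components. The sketch below shows one way such a file could be parsed by hand. It is independent of how `WordEmbeddingsText` loads the data, which this page does not describe; the function name `read_glove_text` and the in-memory sample are assumptions made for illustration.

```python
import io

import numpy as np


def read_glove_text(lines, dim):
    """Parse GloVe-style text lines into (words, matrix).

    Hypothetical helper. Splitting from the right keeps a token intact
    even if it contains spaces itself.
    """
    words, rows = [], []
    for line in lines:
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip lines that do not carry exactly `dim` components
        words.append(parts[0])
        rows.append([float(x) for x in parts[1:]])
    matrix = np.array(rows, dtype=np.float32).reshape(len(rows), dim)
    return words, matrix


# Made-up sample in the same layout as a GloVe text file (dimension 3 here).
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
words, matrix = read_glove_text(sample, dim=3)
print(words, matrix.shape)
```

For a downloaded file, the same function can be given an open file handle, for example `open(path, encoding="utf-8")` with `dim` set to the dimension of the chosen dataset; `path` here stands for wherever the file was stored locally.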