Discussion:
[gensim:11771] Is there a way within Gensim's word2vec that I can access both the context word vector representation and the center word vector representation?
a***@thisismetis.com
2018-11-12 23:28:37 UTC
Permalink
When training a word2vec model, each word has two vector representations
depending on whether it is acting as a center word or a context word.
According to resources that I've seen, these two representations are then
averaged together, and this averaged word vector is used as the 'final'
embedding produced by word2vec.

I am interested in experimenting with different weighting schemes for
center/context word vectors, and to that end, I'd need access to both the
center and the context word representations. Is there a way to get these
representations from Gensim's word2vec implementation? If not, is anyone
aware of a word2vec implementation where I can access both
context/center word vector representations after training?

Thank you!
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Gordon Mohr
2018-11-13 03:02:21 UTC
Permalink
In fact, what are usually considered the "word vectors" from a Word2Vec
model are certain "projection vectors" – one per known-vocabulary word –
that are used to compose the inputs to the internal neural network: either
directly, as a single skip-gram context word, or averaged together with the
other surrounding context words in CBOW mode.

In the original and typical use of Word2Vec, no alternate vectors (like the
2nd "center word" representation you mention) are read out of the model.
The input-vectors described above are the final word-vectors.

When specifically using the negative-sampling training mode, the neural
network's output layer has one node per predictable word, and the weights
leading into each per-word output node can be seen as a sort of 2nd vector
for that word. (So, in a way, like the "center word" representation you
mention, though I've not seen it described that way before.)

Some researchers, notably Bhaskar Mitra and colleagues at Microsoft in
their work on the "Dual Embedding Space Model" (DESM), note that these
"output" representations can also be useful, either alone or concatenated
with the traditional input representations. But they're not usually
included in the word-vectors you get from a Word2Vec model.
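Once you have the two vector tables in hand, the weighting experiments you describe reduce to simple array arithmetic. A minimal sketch with NumPy, using random arrays as hypothetical stand-ins for the model's input and output tables (in gensim these would come from the model, not be generated like this):

```python
import numpy as np

# Hypothetical stand-ins for a trained model's input ("IN") and output
# ("OUT") vector tables, one row per vocabulary word, same row order.
rng = np.random.default_rng(0)
vocab_size, dim = 1000, 100
in_vecs = rng.standard_normal((vocab_size, dim))
out_vecs = rng.standard_normal((vocab_size, dim))

# Plain average – the "combined" scheme the question describes:
avg = 0.5 * in_vecs + 0.5 * out_vecs

# An arbitrary weighting, e.g. 70% input / 30% output:
w = 0.7
weighted = w * in_vecs + (1.0 - w) * out_vecs

# DESM-style concatenation keeps both representations side by side,
# doubling the dimensionality instead of blending:
concat = np.concatenate([in_vecs, out_vecs], axis=1)
```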

You can get the array of these output vectors from a negative-sampling
Word2Vec model via the property `model.trainables.syn1neg` – its rows are
in the same order as `model.wv.vectors`, which holds the input-vectors.

- Gordon