Jay Qadan
2018-12-01 02:53:34 UTC
I am trying to use the Doc2vec-wikipedia example
<https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb>, but
to find documents similar to an arbitrary document, such as the news article in the attached
sample. Due to computational constraints, I used 'text8' instead of the full
Wikipedia dump, loaded with gensim's api.load("text8"):
- Is doc2vec the best approach for finding similar documents in a
large corpus? Is there a better method that gives similarities
based on topic rather than on shared words?
- As suggested, I used this code to look up similar documents, although
I pass a longer list of words than just 'machine', 'learning':
- print(model.docvecs.most_similar(positive=[model.infer_vector(['machine','learning'])], topn=20))
- However, the result I get is in this format: (502,
0.5730128288269043), (94, 0.5560649633407593), (187, 0.5478538870811462),
not article titles as in the original example. Any suggestions on how to get
the article titles in the 'text8' corpus?
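
To make the output format above concrete, here is a minimal pure-Python sketch (not gensim code) of a lookup that returns (tag, score) pairs. It assumes the integers in my result are the tags the documents were given at training time, so any titles would have to come from a separate table. The names cosine, most_similar, doc_vectors and labels are hypothetical, and the vectors are made-up toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(doc_vectors, query, topn=2):
    """Return (tag, similarity) pairs, highest similarity first."""
    scored = [(tag, cosine(vec, query)) for tag, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy document vectors keyed by integer tag (made-up numbers).
doc_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

# Hypothetical lookup table from integer tag to a readable label.
# This table has to be built separately; the lookup itself only knows tags.
labels = {0: "chunk 0", 1: "chunk 1", 2: "chunk 2"}

# The raw result contains tags and scores only, like the output quoted above.
results = most_similar(doc_vectors, [1.0, 0.1], topn=2)

# Translating tags into labels is an extra step on top of the lookup.
labelled = [(labels[tag], score) for tag, score in results]
print(labelled)
```

If the assumption holds, the missing piece in my setup would be the equivalent of the labels table for 'text8'.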
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.