Frastxxx
2018-11-01 17:12:27 UTC
Hello,
I prepared some documents for LDA training. Before I build the LDA model I
filter the documents by removing common words and rare words. One of my
documents contains only rare words, so after filtering it ends up as an
empty vector: none of that document's words are present in the dictionary.
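
To make the situation concrete, here is a minimal sketch of the filtering step in plain Python. It is not gensim code: `build_vocabulary` and `doc_to_bow` are hypothetical helpers, and the `no_below` / `no_above` names are only borrowed from gensim's `Dictionary.filter_extremes` as an analogy. The toy documents are made up.

```python
from collections import Counter

def build_vocabulary(tokenized_docs, no_below=2, no_above=0.5):
    """Keep tokens found in at least `no_below` documents and in at most
    a `no_above` fraction of all documents (hypothetical helper)."""
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each token once per document
    max_docs = no_above * len(tokenized_docs)
    kept = sorted(t for t, df in doc_freq.items() if no_below <= df <= max_docs)
    return {token: idx for idx, token in enumerate(kept)}

def doc_to_bow(doc, vocabulary):
    """Convert a token list to sorted (token_id, count) pairs,
    silently dropping tokens that are not in the vocabulary."""
    counts = Counter(t for t in doc if t in vocabulary)
    return sorted((vocabulary[t], n) for t, n in counts.items())

# Made-up toy corpus: "the" is too common, "zyzzyva" and "quokka" are too rare.
docs = [
    ["the", "topic", "model"],
    ["the", "topic", "data"],
    ["the", "model", "data"],
    ["the", "zyzzyva", "quokka"],
]

vocabulary = build_vocabulary(docs)
corpus = [doc_to_bow(doc, vocabulary) for doc in docs]

# Indices of documents that lost every word during filtering.
empty_ids = [i for i, bow in enumerate(corpus) if not bow]
```

Collecting the indices of empty documents like this makes it possible to exclude them before training or before running similarity queries, which is one way to avoid comparing documents that carry no information.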
I am surprised because when I iterate over the LDA model (built with the
previously mentioned dictionary), I see that this document is assigned some
percentage to some topics. That is weird, because the topics were generated
from a dictionary that does not contain even a single word from this
document. Because of that I get misleading results when I look up
similarities for a completely different document.
Is that correct behavior, or am I missing something here (which is
possible, I just started using gensim)?
The other thing is that when I query the previously generated LDA model
with a document that also does not contain a single word from the
dictionary, I get the highest similarity to the previously mentioned document.
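
One possible explanation can be sketched with a simplified, textbook-style variational update for LDA. This is not gensim's actual implementation: `digamma`, `infer_topics`, the prior `alpha`, and the toy `topic_word` matrix are all made up for illustration. In this kind of update, each topic weight starts from the prior and only the words in the document add to it, so an empty document simply gets the normalised prior back.

```python
import math

def digamma(x):
    """Approximate digamma via the recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def infer_topics(bow, alpha, topic_word, iterations=50):
    """Simplified variational inference for one document (illustrative only).

    bow        : list of (word_id, count) pairs
    alpha      : Dirichlet prior over topics
    topic_word : topic_word[t][w] = probability of word w under topic t
    """
    num_topics = len(alpha)
    gamma = [a + 1.0 for a in alpha]  # arbitrary starting point
    for _ in range(iterations):
        # exp(E[log theta]) up to a constant factor that cancels below
        exp_elog_theta = [math.exp(digamma(g)) for g in gamma]
        new_gamma = list(alpha)  # every update starts from the prior
        for word_id, count in bow:
            weights = [exp_elog_theta[t] * topic_word[t][word_id]
                       for t in range(num_topics)]
            norm = sum(weights)
            for t in range(num_topics):
                new_gamma[t] += count * weights[t] / norm
        gamma = new_gamma
    total = sum(gamma)
    return [g / total for g in gamma]

alpha = [0.5, 0.5]                      # assumed symmetric prior
topic_word = [[0.7, 0.2, 0.1],          # made-up topic 0
              [0.1, 0.2, 0.7]]          # made-up topic 1

empty_a = infer_topics([], alpha, topic_word)        # document with no known words
empty_b = infer_topics([], alpha, topic_word)        # another such document
with_words = infer_topics([(0, 3)], alpha, topic_word)
```

In this sketch both empty documents receive exactly the same distribution, so any similarity measure would rank them as maximally similar to each other, while the document with real words leans toward the topic that favours them. If gensim's inference behaves like this sketch (an assumption worth checking against the installed version), the topic percentages reported for an empty document reflect only the prior and carry no information about the document itself.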
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.