hans mohan
2018-11-01 08:53:33 UTC
Hi,
I have created a similarity index using an LSI model for a corpus of 73,000 documents (each document is a single English sentence, stored in an Excel file).
We have also created a validation set of 500 queries, written by manually reading through the indexed documents and rewording/rephrasing them, so that we can check the similarity scores.
*Question: When we run this exercise and get the similarity scores for the 500-query set, which hyperparameters can we consider tuning to improve the scores (or the model accuracy)?*
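One way to make the 500-query exercise comparable across settings is to score each configuration by whether the source document of each rephrased query comes back at the top, rather than by averaging raw cosine scores (absolute cosine values tend to shift when num_topics or the vocabulary changes). The sketch below assumes each validation query is stored together with the id of the document it was rephrased from, and that `rank_fn` (a hypothetical name) returns `(doc_id, score)` pairs sorted best first. It reports top-1 accuracy, top-k accuracy and mean reciprocal rank (MRR) using only the standard library.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Ranking = List[Tuple[int, float]]  # (doc_id, score) pairs, best first


def evaluate(rank_fn: Callable[[str], Ranking],
             validation: Sequence[Tuple[str, int]],
             k: int = 10) -> Dict[str, float]:
    """Rank-based metrics for (query, expected_doc_id) pairs.

    Assumption: every validation query is paired with the id of the
    indexed document it was rephrased from.
    """
    if not validation:
        raise ValueError("validation set is empty")
    top1 = topk = 0
    rr_sum = 0.0
    for query_text, target_id in validation:
        ranked_ids = [doc_id for doc_id, _ in rank_fn(query_text)]
        if ranked_ids[:1] == [target_id]:
            top1 += 1  # source document ranked first
        if target_id in ranked_ids[:k]:
            topk += 1  # source document within the first k results
        if target_id in ranked_ids:
            # reciprocal of the 1-based rank; contributes 0 if not found
            rr_sum += 1.0 / (ranked_ids.index(target_id) + 1)
    n = len(validation)
    return {"top1": top1 / n, f"top{k}": topk / n, "mrr": rr_sum / n}
```

A single number such as top-1 accuracy then makes it straightforward to compare one hyperparameter setting against another on the same 500 queries.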
*Code Pipeline Details:*
1. Construct a dictionary object and prune it with the *filter_extremes* method, using no_below=2 and no_above=1.0 (so only tokens that occur in fewer than 2 documents are removed). For the 73K documents, the resulting dictionary has 36,000 unique tokens.
2. Construct the feature vectors with a stacked transformation, BoW -> TF-IDF -> LSI, with num_topics=500.
3. Construct a similarity index with num_features=500.
Reading through the gensim documentation, we have considered the following hyperparameters:
a) no_below
b) no_above
c) num_topics in the LSI model
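A plain grid search over the three hyperparameters listed above is one straightforward way to compare settings. In the sketch below, `score_fn` is a hypothetical callable that would rebuild the dictionary, TF-IDF and LSI models for one combination and return a validation metric such as top-1 accuracy on the 500 queries. The values in `EXAMPLE_GRID` are illustrative only, not recommendations.

```python
import itertools
from typing import Any, Callable, Dict, List, Sequence, Tuple


def grid_search(score_fn: Callable[..., float],
                grid: Dict[str, Sequence[Any]]) -> List[Tuple[float, Dict[str, Any]]]:
    """Call score_fn on every combination in `grid`; return results best first."""
    names = list(grid)
    results = []
    # itertools.product enumerates the full Cartesian product of the value lists
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Illustrative grid only (hypothetical values, not recommendations).
EXAMPLE_GRID = {
    "no_below": [1, 2, 5],
    "no_above": [0.5, 0.9, 1.0],
    "num_topics": [100, 200, 300, 500],
}
```

`grid_search(score_fn, EXAMPLE_GRID)[0]` would give the best-scoring combination. Since no_below and no_above only affect the dictionary, the BoW and TF-IDF stages could be cached per dictionary setting so that only the LSI model and index are rebuilt for each num_topics value.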
Many Thanks and Regards,
Hans Mohan
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.