[gensim:11740] Hyperparameter tuning for Improving the similarity Score using LSI model.

hans mohan

2018-11-01 08:53:33 UTC

Hi,

I have created a similarity index using an LSI model for a corpus of
73,000 documents (each document is an English sentence stored in an Excel file).

We have also created a validation set of 500 queries. The queries were
reworded/rephrased manually while reading through the indexed docs, so that
we can check the similarity scores.

*Question: As we run this exercise and collect the similarity scores for the
500-query set, which hyperparameters can we tune to improve the scores
or the model accuracy?*

*Code Pipeline Details :*

*1.* Construct a dictionary object and prune it with the *filter_extremes* method.
*no_below = 2, no_above = 1.0*

*For 73K docs, the resulting dictionary has 36,000 unique tokens.*

*2.* We then apply a stacked transformation to construct the feature
vector:
*BoW -> TF-IDF -> LSI*
*num_topics = 500*

*3.* Construct a similarity index with *num_features = 500*.

*Reading through the gensim documentation, we have considered the following
hyperparameters:*
*a) no_below b) no_above c) num_topics in the LSI model.*
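
To compare settings of these hyperparameters, one option is to turn the validation set into a single number per setting. The sketch below assumes that each validation query is labeled with the id of the indexed document it was rephrased from, and that for each setting we have the ranked document ids returned by the index. The function names, the ids, and the two example settings are all hypothetical.

```python
def top_k_accuracy(ranked_ids, expected_ids, k=5):
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(
        1
        for ranked, expected in zip(ranked_ids, expected_ids)
        if expected in ranked[:k]
    )
    return hits / len(expected_ids)


def mean_reciprocal_rank(ranked_ids, expected_ids):
    """Average of 1/rank of the expected document (0 if it was not returned)."""
    total = 0.0
    for ranked, expected in zip(ranked_ids, expected_ids):
        if expected in ranked:
            total += 1.0 / (ranked.index(expected) + 1)
    return total / len(expected_ids)


# Hypothetical labels: the indexed document each of three queries was derived from.
expected = [10, 42, 7]

# Hypothetical ranked results from the index under two candidate settings.
results_by_setting = {
    "num_topics=200": [[10, 3, 5], [8, 42, 1], [2, 9, 4]],
    "num_topics=500": [[10, 3, 5], [42, 8, 1], [2, 7, 4]],
}

for name, ranked in results_by_setting.items():
    acc = top_k_accuracy(ranked, expected, k=1)
    mrr = mean_reciprocal_rank(ranked, expected)
    print(name, "top-1 accuracy:", round(acc, 3), "MRR:", round(mrr, 3))
```

The same loop could be run over a grid of no_below, no_above, and num_topics values, rebuilding the dictionary, models, and index for each combination.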

Many Thanks and Regards,
Hans Mohan
