Pedro Rivero
2018-11-29 12:26:24 UTC
Hello!
I'm working on a university project that uses doc2vec to get vectors
representing users' comments on restaurants, and then uses those vectors to
predict the comment a user would write about a restaurant they haven't
commented on yet. It's pretty much done at this point, and I'm going to test
it with k-fold cross-validation: divide the documents into k parts and, for
each part, build a doc2vec model without it, use my program to predict the
held-out part, and measure the difference between the predicted comments and
the real ones. I'm not sure what parameters to try.
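A minimal sketch of that k-fold split, in plain Python and only as an
illustration: the helper name k_fold_indices, the seed and the toy sizes are
assumptions, and the doc2vec training and prediction steps are left as
comments.

```python
import random


def k_fold_indices(n_docs, k, seed=0):
    """Yield (train_idx, test_idx) index lists, one pair per fold."""
    indices = list(range(n_docs))
    # Shuffle once so each fold is a random sample of the documents.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Toy run with 20 documents and k=5 (the real corpus is far larger).
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(n_docs=20, k=5)):
    # Here one would build a doc2vec model from the train_idx documents only,
    # predict the test_idx comments, and score them against the real ones.
    print(fold_no, len(train_idx), len(test_idx))
```

Each document ends up in the held-out part exactly once, so the k scores can
be averaged into a single figure per parameter combination.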
To give you some info: there are 870,982 distinct documents, about 99 words
long each after preprocessing (none of them below 50 words).
From what I understand, it's better not to tinker with alpha and min_alpha;
the number of iterations should be between 10 and 20, size about 100(?), and
min_count and window values like 2, 5 and 10 or so. Negative sampling would
be between 5 and 20 words, and I'm not sure about the exponent.
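As a rough sketch, those ranges can be expanded into an explicit list of
combinations to run through the cross-validation. The specific values picked
inside each range are assumptions, and the keys follow the newer gensim
parameter names (vector_size and epochs, which older releases call size and
iter). The exponent is left out because no candidate values are given for it.

```python
from itertools import product

# Candidate values taken from the ranges above; the exact picks are assumptions.
grid = {
    "vector_size": [100],     # "size" in older gensim releases
    "epochs": [10, 20],       # "iter" in older gensim releases
    "min_count": [2, 5, 10],
    "window": [2, 5, 10],
    "negative": [5, 10, 20],
}


def param_combinations(grid):
    """Yield one dict per combination of the values in the grid."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


combos = list(param_combinations(grid))
print(len(combos))  # 1 * 2 * 3 * 3 * 3 = 54 combinations
```

Each dict is meant to be passed as keyword arguments when building the model
for a fold. With 54 combinations and k folds each, a full grid over this many
documents is expensive, so trying a subset first may be more practical.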
Could you help me out and give some hyperparameter combinations to try?
Thanks in advance!
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.