Post by Gordon Mohr
print(sum(1 for _ in sentences)) # total count of training examples
1565475
first = iter(sentences).next() # get 1st item
print(len(first)) # 1st item's length in words 91
And what about the output of the 3rd print statement, "print(first[0:3]) #
1st item's 1st 3 words"?
Also, it would be better to simply run all four suggested lines as given,
after `sentences` was created, then copy & paste the exact 3 lines of
output, rather than pasting results at the end of each line. Now, I'm less
sure that all lines were run together, in order. (Doing that would have
also checked for another common error in people's corpus-iterable-object.
If you've collected the results for different lines in different runs, the
output isn't as useful. If you got any errors trying to run the 4 suggested
lines, that'd be useful info.)
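For reference, here's how I'd run all four lines together in one fresh session, right after `sentences` is created (a minimal sketch; the 2nd line assumes Python 3, where `.next()` becomes `next(...)`):

print(sum(1 for _ in sentences))  # total count of training examples
first = next(iter(sentences))  # get 1st item (on Python 2: iter(sentences).next())
print(len(first))  # 1st item's length in words
print(first[0:3])  # 1st item's 1st 3 words

If the 2nd line raises StopIteration even though the count succeeded, that's a sign `sentences` is a single-pass generator rather than a restartable iterable - one of the common corpus-object errors I mentioned.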
I have also attached the new training loss after I ran it again.
Those are very odd results, in that the difference-in-loss becomes 0 after
10 iterations.
I suspect some or all of:
(1) An error in your difference calculation/display (see the sketch after this list);
(2) A problem with your training corpus; running all 4 requested lines
together would help identify or rule out some of these potential problems.
(3) You've been changing other things about your parameters/code at the
same time as you're following my suggestions, introducing new problems. For
example, your previous strange output was for 20 iterations, and showed
essentially no decrease-in-epoch-loss over 20 passes. This new output shows
25 iterations, and a decrease-in-epoch-loss for the 1st 10 passes, then the
odd stabilization at per-epoch loss of 0. So it looks like you're trying
several things at the same time, without sharing all the details of what
you've changed, making it very hard to guess what could be causing that
output.
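On (1): one frequent pitfall is treating `get_latest_training_loss()` as if it were a per-epoch number, when it's a running tally over the current `train()` call, so the per-epoch loss has to be computed as the difference between successive readings. A rough sketch of one way to do that (assuming `compute_loss=True` and gensim's callback mechanism; your actual code may differ):

from gensim.models.callbacks import CallbackAny2Vec

class EpochLossLogger(CallbackAny2Vec):
    # prints the change in cumulative training loss after each epoch
    def __init__(self):
        self.epoch = 0
        self.last_cumulative = 0.0

    def on_epoch_end(self, model):
        cumulative = model.get_latest_training_loss()  # running total for this train() call
        print('epoch %d: loss delta %.2f' % (self.epoch, cumulative - self.last_cumulative))
        self.last_cumulative = cumulative
        self.epoch += 1

# hypothetical usage: Word2Vec(sentences, compute_loss=True, callbacks=[EpochLossLogger()])

If your difference code instead re-reads the same stale value each epoch, you'd see exactly the kind of flat-zero deltas your output shows.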
Post by Gordon Mohr
If I cannot compare the training loss of 2 different models, then how can I
know which parameters are better suited for my data?
As mentioned in my 1st response on this thread:
"And while loss is definitionally the thing that the single Word2Vec model
is locally optimizing, it's not the thing to optimize in the whole system
of model-plus-downstream-uses. That should be some quantitative measurement
of model quality specific to your downstream tasks, and the smallest-loss
Word2Vec model is unlikely to be the best-general-performance model for
downstream tasks."
That means: you have to test the resulting model/word-vectors on some
version of the real task(s) where you want to use word-vectors. That's the
only real measure of whether you've chosen good parameters.
If you don't have a way to run such a test, you could look at other more
generic measures - there's a method `evaluate_word_analogies()` on the
word-vectors object (`model.wv`) that can be fed a series of word-analogy
problems from the original Google word2vec.c release, and return a score on
that task. But of course that may not test your corpus's most important
words, and further, word-vectors that do best on analogies may not do best
for classification problems, or info-retrieval, or other tasks. To know
which parameters are best for your project, you need to check them against
some version of that task.
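For example, a minimal sketch of that analogy check (assuming a trained model in `model`, and using the copy of `questions-words.txt` bundled with gensim's test data):

from gensim.test.utils import datapath

# returns an overall accuracy plus per-category section details
score, sections = model.wv.evaluate_word_analogies(datapath('questions-words.txt'))
print('analogy accuracy: %.4f' % score)

Higher is better, but again: treat it as a rough sanity check, not as the measure of what's best for your downstream task.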
- Gordon
Thanks
Post by Gordon Mohr
Post by Gordon Mohr
Post by Heta Saraiya
Okay, thank you so much for the help. I only have one more question. If I
change the parameters and train again, then can I compare the loss values to
the current values to see which model performs better?
No, as mentioned previously, the loss is not a reliable indicator of
overall model quality. The model with the lowest loss could perform worse
on real tasks, as in the given example of an overfit model. It's just an
indicator of training progress, and when loss stops improving it's a hint
that further training can't help.
Further, many of the parameters change the type/amount of training that
happens. For example, a different `negative` value means more
negative-examples are trained. A different `window` means more
(context->target) examples are constructed. A different `sample` value
drops a different proportion of words. A different `min_count` drops
different low-frequency words. The loss values are at best just comparable
within a single model, over the course of its training.
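As a rough sketch of why that matters (hypothetical settings; parameter names as in gensim 4.x - older releases use `size`/`iter` instead of `vector_size`/`epochs`):

from gensim.models import Word2Vec

# model_b constructs more (context -> target) examples per sentence (wider window)
# and trains more negative-examples per target, so its accumulated loss total is
# not on the same scale as model_a's
model_a = Word2Vec(sentences, vector_size=100, window=5, negative=5,
                   compute_loss=True, epochs=5)
model_b = Word2Vec(sentences, vector_size=100, window=10, negative=15,
                   compute_loss=True, epochs=5)
print(model_a.get_latest_training_loss(), model_b.get_latest_training_loss())
# comparing these two totals says little about which model is better; compare each
# model only against its own earlier readings, or evaluate both on a downstream test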
Is there a reason you can't share the `sentences` output I suggested to
debug your problem? Did you try that at all, and did it lead you to
discover an error you were making that explained the prior atypical loss
behavior?
- Gordon
Post by Heta Saraiya
Thanks
Post by Heta Saraiya
Hi,
I am training a dataset using Word2Vec and saving the training loss
after each epoch. But after some epochs the training loss stops decreasing
and instead increases. Can you give me any idea of why this happens?
Thanks