Discussion:
[gensim:11854] question
rmi mez
2018-12-02 22:19:30 UTC
Hi all,

Can you tell me the minimum data size and number of iterations needed to
generate a word2vec model?


hachemi mohamed ramzi.
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Gordon Mohr
2018-12-03 02:07:48 UTC
There's no absolute minimum, but more data is always better. Typical
minimal training sets will include tens of thousands of unique words, with
dozens or more examples of each word for which you want a good vector.
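
As a rough way to check a corpus against those numbers, here is a minimal sketch using only the Python standard library. The function name `vocabulary_report` and the `min_examples=24` default are assumptions made for illustration ("dozens" is not a precise threshold), and the corpus is assumed to be already tokenized into lists of words.

```python
from collections import Counter


def vocabulary_report(sentences, min_examples=24):
    """Summarize a tokenized corpus.

    `sentences` is an iterable of lists of word strings.
    `min_examples` is an illustrative cutoff (an assumption, not a rule)
    for how many occurrences a word needs to count as well covered.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    well_covered = sum(1 for c in counts.values() if c >= min_examples)
    return {
        "total_tokens": sum(counts.values()),
        "unique_words": len(counts),
        "well_covered_words": well_covered,
    }


# Toy corpus, far too small for real training; it only shows the counting.
toy = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
report = vocabulary_report(toy, min_examples=2)
print(report)
```

If `unique_words` is far below the tens of thousands, or most of the words you care about are not in the well-covered group, the resulting vectors are likely to be weak.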

To some extent, you can squeeze demo-able results from smaller datasets by
shrinking the vector sizes, and possibly by increasing the training
repetitions. But you won't really be getting the benefit of Word2Vec and
related algorithms unless you're using lots of varied text data. (If you
have a tiny project-specific corpus, you might even want to find some
public dataset with similar patterns of language use, and mix it with your
project-specific data.)
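
The mixing idea could be sketched roughly as below. The class name `MixedCorpus` and the sample sentences are hypothetical, and simply concatenating the two sources is an assumption; interleaving or shuffling them may work better in practice. The one point that matters is that the object can be iterated more than once, since training makes several passes over the data.

```python
from itertools import chain


class MixedCorpus:
    """Restartable iterable over a public corpus followed by a project corpus.

    Both arguments are assumed to be re-iterable collections of tokenized
    sentences (lists of word strings), for example lists or objects that
    re-read a file each time they are iterated.
    """

    def __init__(self, public_sentences, project_sentences):
        self.public_sentences = public_sentences
        self.project_sentences = project_sentences

    def __iter__(self):
        # A fresh chain is built on every call, so each training pass
        # sees the full combined corpus again.
        return chain(self.public_sentences, self.project_sentences)


# Hypothetical stand-ins for a large public dataset and a small project one.
public = [["general", "language", "text"], ["more", "general", "text"]]
project = [["project", "specific", "jargon"]]

mixed = MixedCorpus(public, project)
first_pass = list(mixed)
second_pass = list(mixed)
print(len(first_pass), first_pass == second_pass)
```

An object like `mixed` can be passed as the `sentences` argument of gensim's `Word2Vec`. The vector size and number of training passes mentioned above are the `vector_size` and `epochs` arguments in gensim 4.x (`size` and `iter` in 3.x); the right values depend on your data, so treat any particular numbers as something to experiment with.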

- Gordon
rmi mez
2018-12-03 07:44:27 UTC
Thanks, sir.