[gensim:11854] question

Discussion:

rmi mez

2018-12-02 22:19:30 UTC

Hi all,

Can you tell me the minimum data size and number of iterations needed to
generate a word2vec model?

hachemi mohamed ramzi.

--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gordon Mohr

2018-12-03 02:07:48 UTC

There's no absolute minimum, but more data is always better. Typical
minimal training sets will include tens-of-thousands of unique words, with
dozens-or-more examples of each word for which you want a good vector.
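
One way to check a corpus against that rule of thumb is to count unique words and how many of them appear often enough. The sketch below is only an illustration: the helper name `vocabulary_report` and the `min_examples` threshold are hypothetical, and it assumes the corpus is already tokenized into lists of words.

```python
from collections import Counter


def vocabulary_report(sentences, min_examples=24):
    """Summarize how much of a tokenized corpus meets a per-word frequency floor.

    `sentences` is an iterable of token lists. `min_examples` is a
    hypothetical stand-in for "dozens or more" examples per word.
    """
    # Count every token occurrence across the whole corpus.
    counts = Counter(token for sentence in sentences for token in sentence)
    # Words seen at least `min_examples` times.
    well_covered = [word for word, n in counts.items() if n >= min_examples]
    return {
        "total_tokens": sum(counts.values()),
        "unique_words": len(counts),
        "well_covered_words": len(well_covered),
    }


# Toy corpus with a low threshold, just to show the shape of the output.
toy = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
report = vocabulary_report(toy, min_examples=2)
print(report)
```

On a real corpus, a `unique_words` figure far below the tens of thousands, or a small `well_covered_words` count, suggests the dataset is on the thin side for good vectors.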

To some extent, you can squeeze demo-able results from smaller datasets by
shrinking the vector sizes, and possibly by increasing the training
repetitions. But you won't really get the benefit of Word2Vec and
related algorithms unless you're using lots of varied text data. (If you
have a tiny project-specific corpus, you might even want to find some
public dataset with similar patterns of language use, and mix it with your
project-specific data.)
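
As a rough sketch of what "shrinking the vector sizes" and "increasing the training repetitions" could look like in code: the values below are illustrative assumptions, not recommendations, and `SMALL_CORPUS_PARAMS` and `train_small_model` are hypothetical names. The parameter names `vector_size` and `epochs` are those of gensim 4.x; in the gensim 3.x releases the same settings were called `size` and `iter`.

```python
# Illustrative settings for a small corpus (assumed values, tune for your data).
SMALL_CORPUS_PARAMS = {
    "vector_size": 50,  # smaller than the gensim default of 100
    "epochs": 20,       # more training passes than the gensim default of 5
}


def train_small_model(sentences, params=SMALL_CORPUS_PARAMS):
    """Train Word2Vec on a list of token lists (requires gensim >= 4.0)."""
    # Imported lazily so the settings above can be inspected without gensim.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **params)
```

To mix in a public dataset, one simple option is to concatenate the two tokenized corpora before training, for example `train_small_model(project_sentences + public_sentences)`, where both names are placeholders for your own lists of token lists.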

- Gordon

rmi mez

2018-12-03 07:44:27 UTC

Thanks sir
