Jeremy Gollehon
2018-10-27 22:57:16 UTC
Hi. I'm hoping to get pointed in the right direction.
I'm using Python 3.6 to load a 1.5 GB text file and compute unigram counts
after preprocessing.
With a plain whitespace split, the following code takes 5 minutes to run on my computer:
from collections import Counter
from pathlib import Path
text = (Path() / "output" / "longabstract_corpus.txt").read_text(encoding="utf-8")
word_list = text.split()
unigram_count = Counter(word_list)
With gensim's preprocess_string in place of split(), I stopped the process after 60 minutes:
from collections import Counter
from pathlib import Path
import gensim.parsing.preprocessing
text = (Path() / "output" / "longabstract_corpus.txt").read_text(encoding="utf-8")
word_list = gensim.parsing.preprocessing.preprocess_string(text)
unigram_count = Counter(word_list)
Any ideas on how to speed up preprocessing?
Thanks!
Jeremy
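
For comparison, here is a minimal sketch of a line-by-line variant of the same count. It avoids passing the whole 1.5 GB string to the preprocessor in one call. The helper names count_unigrams and count_unigrams_in_file are hypothetical, and whether this is actually faster has not been measured here. It also assumes that preprocessing each line separately yields the same tokens as preprocessing the full text, which holds for whitespace splitting but should be checked for other filters.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable, List


def count_unigrams(
    lines: Iterable[str],
    preprocess: Callable[[str], List[str]] = str.split,
) -> Counter:
    """Accumulate token counts one line at a time.

    `preprocess` is any function mapping a string to a list of tokens.
    The default is a plain whitespace split; a function such as
    gensim.parsing.preprocessing.preprocess_string could be passed instead.
    """
    counts = Counter()
    for line in lines:
        # Only one line's tokens are held in memory at a time.
        counts.update(preprocess(line))
    return counts


def count_unigrams_in_file(
    path: Path,
    preprocess: Callable[[str], List[str]] = str.split,
) -> Counter:
    """Stream a UTF-8 text file and count its tokens (hypothetical helper)."""
    with path.open(encoding="utf-8") as handle:
        return count_unigrams(handle, preprocess)
```

With the file from the post, the call would be count_unigrams_in_file(Path() / "output" / "longabstract_corpus.txt"), optionally with a second argument naming the preprocessing function.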
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.