Hi,
Yes, this is a very unfortunate problem that I'll be happy to fix.
Ok, so I double-checked that running in the virtual environment isn't
causing any problems. When I run outside the virtual environment I also get
26 processes, and nearly all of them are allocated to the same processor
(see the PSR column below):
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
[*snip*]
odemasi 61669 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61670 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61671 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61672 59981 0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61673 59981 0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61674 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61675 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61676 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61681 59981 0 2738764 9821680 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61682 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61683 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61684 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61685 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61686 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61687 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61688 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61689 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61694 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61698 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61699 59981 0 2738764 9821704 23 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61700 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61701 59981 0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61702 59981 0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi 61703 59981 0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
[*snip*]
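In case it's useful, this is roughly how I've been double-checking which cores the workers are allowed on. This is only a sketch: it assumes psutil is installed, and 59981 is just the parent PID from the listing above.

import psutil

# Walk the children of the parent RunLDA.py process (PID taken from the
# ps listing above) and print the cores each one is allowed to run on.
parent = psutil.Process(59981)
for child in parent.children(recursive=True):
    # cpu_affinity() with no argument returns the allowed-core list (Linux).
    print("%d -> %s" % (child.pid, child.cpu_affinity()))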
The standard output I'm getting is:
/home/odemasi/Packages/venv/lib/python2.6/site-packages/numpy/lib/utils.py:95: DeprecationWarning: `scipy.sparse.sparsetools` is deprecated!
scipy.sparse.sparsetools is a private module for scipy.sparse, and should not be used.
warnings.warn(depdoc, DeprecationWarning)
/home/odemasi/Packages/venv/lib/python2.6/site-packages/scipy/lib/_util.py:67: DeprecationWarning: Module scipy.linalg.blas.fblas is deprecated, use scipy.linalg.blas instead
DeprecationWarning)
2015-06-25 03:36:38,835 : INFO : adding document #0 to Dictionary(0 unique
tokens: [])
2015-06-25 03:39:34,893 : INFO : built Dictionary(5060602 unique tokens:
[u'loyalsubscribers', u'iftheyclosedchipotleiddie',
u'\u666e\u6bb5\u306e\u53e3\u8abf\u3067\u4f55\u6ce3\u3044\u3066\u308b\u3093\u3067\u3059\u304b\u79c1\u306f\u3069\u3053\u306b\u3082\u884c\u304d\u307e\u305b\u3093\u304b\u3089\u5927\u4e08\u592b\u3067\u3059\u3092\u8a00\u3046',
u'deargodmakeatrade', u'billycorgan']...) from 1 documents (total 5060602
corpus positions)
2015-06-25 03:39:36,283 : INFO : using symmetric alpha at 0.01
2015-06-25 03:39:36,283 : INFO : using serial LDA version on this node
2015-06-25 03:42:20,479 : WARNING : input corpus stream has no len();
counting documents
2015-06-25 03:42:25,018 : INFO : running online LDA training, 100 topics, 1
passes over the supplied corpus of 100000 documents, updating every 48000
documents, evaluating every ~100000 documents, iterating 50x with a
convergence threshold of 0.001000
2015-06-25 03:42:25,018 : WARNING : too few updates, training might not
converge; consider increasing the number of passes or iterations to improve
accuracy
2015-06-25 03:42:25,023 : INFO : training LDA model using 24 processes
2015-06-25 03:42:27,407 : INFO : PROGRESS: pass 0, dispatched chunk #0 =
documents up to #2000/100000, outstanding queue size 1
Traceback (most recent call last):
File "/usr/lib64/python2.6/multiprocessing/queues.py", line 242, in _feed
send(obj)
SystemError: NULL result without error in PyObject_Call
2015-06-25 03:42:30,449 : INFO : PROGRESS: pass 0, dispatched chunk #1 =
documents up to #4000/100000, outstanding queue size 2
2015-06-25 03:42:30,612 : INFO : PROGRESS: pass 0, dispatched chunk #2 =
documents up to #6000/100000, outstanding queue size 3
2015-06-25 03:42:30,793 : INFO : PROGRESS: pass 0, dispatched chunk #3 =
documents up to #8000/100000, outstanding queue size 4
A little more about my application: each document is very small, and right
now I'm constraining training to 100,000 documents. It takes < 1 min to
load and stream through the data. I know that running with this little data
won't give me much of a performance gain, but until I can get it distributing
the work across cores I can't run with more data. The process has already
been running for 17 hours, which seems like a ridiculously long time for a
corpus that is only a few MB (the full 9 million documents is ~1.5 GB).
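For reference, the training call is essentially the stock LdaMulticore invocation. The following is only an illustrative sketch, not the exact RunLDA.py: the docs.txt file, doc_stream() helper and BowCorpus class are made up here, and the parameters simply mirror the log above.

import logging
from gensim import corpora, models

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

def doc_stream():
    # Stand-in for the real document stream: one whitespace-tokenized
    # document per line of a text file (hypothetical file name).
    with open('docs.txt') as f:
        for line in f:
            yield line.lower().split()

dictionary = corpora.Dictionary(doc_stream())

class BowCorpus(object):
    # Streams bag-of-words vectors; it has no __len__, which is why gensim
    # logs "input corpus stream has no len(); counting documents".
    def __iter__(self):
        for doc in doc_stream():
            yield dictionary.doc2bow(doc)

lda = models.LdaMulticore(BowCorpus(), id2word=dictionary,
                          num_topics=100, passes=1,
                          chunksize=2000, workers=24)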
Any suggestions of what to check next?
Thanks!
Orianna
Post by Stephen Wu
Interesting, Orianna. My problem does reappear as well -- shutting down
processes and restarting them doesn't always work. Also, I suspect that
some of the methods may end up jumping on the same core later on in
processing? Could be totally wrong about that. Radim, is there
gensim-specific logging that you're looking for?
stephen
Post by o***@berkeley.edu
Hello,
I'm having the same problem and would also really appreciate some help.
Checking "ps -F -A | grep NameOfMyProgram" shows that Gensim is spawning
the correct number of processes by default, but that they are all on the
same processor (I'm on a 24 core Red Hat machine). I'm running inside a
virtual environment, but it looks like that shouldn't effect things and
when I launched from outside the virtual environment processes ran on 4
cores, which was better, but still not good. Note, I think I'm calling
Gensim correctly as it does distribute to the two cores on my laptop when I
run the same code there.
Any help or suggestions are really appreciated, as I'm not really sure
where to go from here.
Thanks.
Orianna
Post by Stephen Wu
Thanks for following up. I haven't actually gotten the training to work
in the end, so I'd welcome you looking at the issue!
I didn't see anything notable in INFO but unfortunately I don't have the
logs for LdaMulticore. I was running make_wiki simultaneously, though, and
it was trying to do everything on the same core that LdaMulticore was -- so
maybe there's something in that. The make_wiki process would have
completed but was just going really slow. Below is the fairly normal INFO
output of make_wiki, and where I cut it off.
stephen
2015-06-18 10:17:54,373 : INFO : adding document #2990000 to
Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250',
u'soestdijk', u'phintella']...)
2015-06-18 10:20:31,873 : INFO : discarding 37835 tokens: [(u'giravee',
1), (u'actuariesindia', 1), (u'wonho', 1), (u'nerdocrumbesia', 1),
(u'jidova', 1), (u'alfredomacias', 1), (u'ysa\u04f1e', 1), (u'saraldi', 1),
(u'belvilacqua', 1), (u'cargharay', 1)]...
2015-06-18 10:20:31,879 : INFO : keeping 2000000 tokens which were in no
less than 0 and no more than 3000000 (=100.0%) documents
Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250',
u'soestdijk', u'phintella']...)
2015-06-18 10:20:43,940 : INFO : adding document #3000000 to
Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250',
u'soestdijk', u'phintella']...)^C
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/swu/trapit/research/.virt/lib/python2.7/site-packages/gensim/scripts/make_wiki.py", line 83, in <module>
wiki = WikiCorpus(inp, lemmatize=lemmatize) # takes about 9h on a macbook pro, for 3.5m articles (june 2011)
File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/wikicorpus.py", line 270, in __init__
self.dictionary = Dictionary(self.get_texts())
File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/dictionary.py", line 58, in __init__
self.add_documents(documents, prune_at=prune_at)
File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/dictionary.py", line 124, in add_documents
logger.info("adding document #%i to %s", docno, self)
File "/usr/lib/python2.7/logging/__init__.py", line 1140, in info
self._log(INFO, msg, args, **kwargs)
File "/usr/lib/python2.7/logging/__init__.py", line 1258, in _log
self.handle(record)
File "/usr/lib/python2.7/logging/__init__.py", line 1268, in handle
self.callHandlers(record)
File "/usr/lib/python2.7/logging/__init__.py", line 1308, in callHandlers
hdlr.handle(record)
File "/usr/lib/python2.7/logging/__init__.py", line 748, in handle
self.emit(record)
File "/usr/lib/python2.7/logging/__init__.py", line 867, in emit
stream.write(fs % msg)
KeyboardInterrupt
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in
_bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 85, in worker
task = get()
File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
racquire()
Post by Radim Řehůřek
Hello Stephen,
do you happen to have a log from when things didn't work (INFO level,
or preferably DEBUG)?
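The usual way to capture that is the standard Python logging setup; a small sketch below, where the log file name is arbitrary and DEBUG can be dropped back to INFO if it's too chatty.

import logging

# Route gensim's log output to a file at DEBUG level.
logging.basicConfig(filename='lda_debug.log',
                    format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.DEBUG)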
I'm thinking maybe one of the processes failed / died for some reason,
and the multiprocessing didn't recover. If that's the case, there should be
a stack trace in the log.
Just a wild hypothesis :)
Radim
Post by Stephen Wu
I killed the processes and reran them with no/minimal changes and
parallelization is working just fine. Unclear why, which is a bit
unsatisfying after several hours of digging.
Leading hypothesis: this was probably some OS-level thing, e.g.,
processes might have wanted to stay on the same processor to make use of
caches efficiently.
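If that's what's happening, one crude way to test it (a generic Linux workaround sketch, nothing gensim-specific; the mask is simply "all cores") is to clear the parent's affinity mask right after the heavy imports, before any workers are forked:

import multiprocessing
import os

# Reset this process's CPU affinity to all cores; workers forked later
# inherit the mask. Relies on the Linux `taskset` utility being on PATH.
mask = (1 << multiprocessing.cpu_count()) - 1
os.system("taskset -p 0x%x %d" % (mask, os.getpid()))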
stephen
Post by Stephen Wu
I'm running on a machine with 16 cores. LdaMulticore seems to
recognize that I have 16 cores and by default starts 16 workers. However,
all the workers are divvying up work on the same processor. So on my
900k-document corpus, this is taking a while.
I had a few hypotheses about why this was the case and talked to
others about some of these. So far, I don't think the culprit is any of these:
- I wrapped LdaMulticore in a custom scikit-learn estimator, and
this estimator does give real results after being trained.
- I am running on a 900k-doc corpus that sits in memory at about 10+GB
- I'm kicking it off within iPython within a screen session
- I've tested running a few other Python processes, and they all
use the same CPU (a minimal check like the sketch below). E.g., I'm trying to parse wikipedia using gensim, and
its worker(s) also use the same CPU.
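Something like the following sketch reproduces that check (illustrative only); run it while watching `ps -o pid,psr,cmd` or top's P column from another shell:

import multiprocessing
import os
import time

def burn(seconds):
    # Busy-loop so the scheduler has a reason to spread the workers out.
    deadline = time.time() + seconds
    n = 0
    while time.time() < deadline:
        n += 1
    return os.getpid()

if __name__ == '__main__':
    # Four CPU-bound workers; if affinity is broken they all show the
    # same PSR value in ps.
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(burn, [30] * 4))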
Any help appreciated.