[gensim:11830] trying to reproduce distributed lda tutorial, LdaModel hangs.

John-Paul Robinson

2018-11-28 18:54:03 UTC

Hi,

I'm trying to use gensim's LDA topic modeling in a project using the
Wikipedia data set as described in the tutorials. I have successfully done
the dictionary conversion and the matrix build.

I'm now attempting the distributed LDA training. I'm able to get a Pyro
network running, and LDA sees it and appears to use it. However, my workers
don't appear to be doing any work and LDA doesn't appear to make any
forward progress.

import logging, gensim
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)

# load id->word mapping (the dictionary), one of the results of step 2 above
id2word = gensim.corpora.Dictionary.load_from_text('wiki_wordids.txt.bz2')

# load corpus iterator
mm = gensim.corpora.MmCorpus('wiki_tfidf.mm')
# mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm.bz2') # use this if you compressed the TFIDF output

print(mm)

2018-11-28 12:33:19,828 : DEBUG : {'kw': {}, 'mode': 'rb', 'uri': 'wiki_wordids.txt.bz2'}
2018-11-28 12:33:20,390 : DEBUG : {'kw': {}, 'mode': 'rb', 'uri': 'wiki_tfidf.mm.index'}
2018-11-28 12:33:20,984 : INFO : loaded corpus index from wiki_tfidf.mm.index
2018-11-28 12:33:20,986 : INFO : initializing cython corpus reader from wiki_tfidf.mm
2018-11-28 12:33:20,986 : DEBUG : {'kw': {}, 'mode': 'rb', 'uri': 'wiki_tfidf.mm'}
2018-11-28 12:33:21,031 : INFO : accepted corpus with 4562950 documents, 100000 features, 720997289 non-zero entries

MmCorpus(4562950 documents, 100000 features, 720997289 non-zero entries)

Here's the next block of code, the call to LdaModel and its output.

# extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents)
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100,
                                      update_every=1, chunksize=10000, passes=1,
                                      distributed=True)

2018-11-28 12:33:25,295 : INFO : using symmetric alpha at 0.01
2018-11-28 12:33:25,296 : INFO : using symmetric eta at 0.01
2018-11-28 12:33:25,435 : DEBUG : looking for dispatcher at PYRO:***@127.0.0.1:34370
2018-11-28 12:33:53,932 : INFO : using distributed version with 20 workers
2018-11-28 12:33:55,211 : INFO : running online (single-pass) LDA training, 100 topics, 1 passes over the supplied corpus of 4562950 documents, updating model once every 200000 documents, evaluating perplexity every 2000000 documents, iterating 50x with a convergence threshold of 0.001000
2018-11-28 12:33:55,213 : INFO : initializing 20 workers
2018-11-28 12:34:09,246 : DEBUG : {'kw': {}, 'mode': 'rb', 'uri': 'wiki_tfidf.mm'}
2018-11-28 12:34:16,371 : INFO : PROGRESS: pass 0, dispatching documents up to #10000/4562950
2018-11-28 12:34:22,982 : INFO : PROGRESS: pass 0, dispatching documents up to #20000/4562950
2018-11-28 12:34:28,247 : INFO : PROGRESS: pass 0, dispatching documents up to #30000/4562950
2018-11-28 12:34:32,974 : INFO : PROGRESS: pass 0, dispatching documents up to #40000/4562950
2018-11-28 12:34:36,557 : INFO : PROGRESS: pass 0, dispatching documents up to #50000/4562950
2018-11-28 12:34:38,797 : INFO : PROGRESS: pass 0, dispatching documents up to #60000/4562950
2018-11-28 12:34:40,663 : INFO : PROGRESS: pass 0, dispatching documents up to #70000/4562950
2018-11-28 12:34:42,512 : INFO : PROGRESS: pass 0, dispatching documents up to #80000/4562950
2018-11-28 12:34:46,500 : INFO : PROGRESS: pass 0, dispatching documents up to #90000/4562950
2018-11-28 12:34:51,097 : INFO : PROGRESS: pass 0, dispatching documents up to #100000/4562950
2018-11-28 12:34:55,239 : INFO : PROGRESS: pass 0, dispatching documents up to #110000/4562950

The code stalls after this last PROGRESS update. I also don't see any CPU load on my LDA workers, so I suspect they aren't actually doing any work.
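
As a sanity check on where the stall falls, here is a small sketch of the arithmetic using only numbers copied from the call and the log above. The variable names are just for illustration. It suggests the run stops after 11 chunks have been dispatched, which is before the first model update that the log announces at every 200000 documents.

```python
# Numbers copied from the LdaModel call and the log output above.
chunksize = 10_000          # chunksize=10000 in the LdaModel call
num_workers = 20            # "using distributed version with 20 workers"
last_dispatched = 110_000   # last PROGRESS line before the stall

# The log reports "updating model once every 200000 documents",
# which matches chunksize * number of workers (with update_every=1).
docs_per_update = chunksize * num_workers
chunks_dispatched = last_dispatched // chunksize

print(docs_per_update)                    # 200000
print(chunks_dispatched)                  # 11
print(last_dispatched < docs_per_update)  # True: stalled before the first update
```

So the hang happens during dispatching of the very first batch, not during a later merge step.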

I'm running this in a Jupyter notebook. Here's the map of my Pyro network:

[***@c0101 nb]$ python -m Pyro4.nsc list
--------START LIST
Pyro.NameServer --> PYRO:***@0.0.0.0:9090
metadata: ['class:Pyro4.naming.NameServer']
gensim.lda_dispatcher --> PYRO:***@127.0.0.1:34370
gensim.lda_worker.149ae3 --> PYRO:***@172.20.201.102:38727
gensim.lda_worker.16ed2e --> PYRO:***@172.20.201.103:45434
gensim.lda_worker.1c77f3 --> PYRO:***@172.20.201.102:38357
gensim.lda_worker.2029a --> PYRO:***@172.20.201.103:40133
gensim.lda_worker.3da37b --> PYRO:***@172.20.201.102:33853
gensim.lda_worker.54c23b --> PYRO:***@172.20.201.102:44257
gensim.lda_worker.5966ea --> PYRO:***@172.20.201.102:44724
gensim.lda_worker.7263f7 --> PYRO:***@172.20.201.103:33762
gensim.lda_worker.74787b --> PYRO:***@172.20.201.102:46607
gensim.lda_worker.796f77 --> PYRO:***@172.20.201.102:40852
gensim.lda_worker.7d1c63 --> PYRO:***@172.20.201.103:33614
gensim.lda_worker.7e7056 --> PYRO:***@172.20.201.103:43702
gensim.lda_worker.9338a8 --> PYRO:***@172.20.201.102:32828
gensim.lda_worker.9e4860 --> PYRO:***@172.20.201.103:34055
gensim.lda_worker.a649cd --> PYRO:***@172.20.201.103:46621
gensim.lda_worker.e6b4f1 --> PYRO:***@172.20.201.103:42194
gensim.lda_worker.ecd5dc --> PYRO:***@172.20.201.102:38420
gensim.lda_worker.fb2844 --> PYRO:***@172.20.201.103:33647
gensim.lda_worker.fb4a65 --> PYRO:***@172.20.201.102:35575
gensim.lda_worker.fe046f --> PYRO:***@172.20.201.103:43371
--------END LIST
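
One thing that stands out in this listing is that the dispatcher is registered on 127.0.0.1 while the workers are on 172.20.201.102 and 172.20.201.103. The sketch below only parses an excerpt of the listing and flags entries bound to a loopback address. Whether that is what causes the hang is an assumption that has not been confirmed here: it would matter only if the workers on the other hosts need to connect back to the dispatcher at its registered address.

```python
import ipaddress
import re

# Excerpt of the `python -m Pyro4.nsc list` output above
# (the part before '@' is masked in the original post).
listing = """\
Pyro.NameServer --> PYRO:***@0.0.0.0:9090
gensim.lda_dispatcher --> PYRO:***@127.0.0.1:34370
gensim.lda_worker.149ae3 --> PYRO:***@172.20.201.102:38727
gensim.lda_worker.16ed2e --> PYRO:***@172.20.201.103:45434
"""

# name --> PYRO:<masked>@<ipv4 host>:<port>
entry = re.compile(r"^(?P<name>\S+) --> PYRO:.*@(?P<host>[\d.]+):(?P<port>\d+)$")

loopback_only = []
for line in listing.splitlines():
    m = entry.match(line)
    if m is None:
        continue  # skip separator and metadata lines
    if ipaddress.ip_address(m["host"]).is_loopback:
        loopback_only.append(m["name"])

print(loopback_only)  # ['gensim.lda_dispatcher']
```

Only the dispatcher is flagged. The name server's 0.0.0.0 is the unspecified address rather than loopback, so it is not reported.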

Not sure how to move past this point. Any pointers appreciated.
