[gensim:11718] Gensim LDA model topic diff resulting in nan

Discussion:

Virashree Patel

2018-10-26 19:59:14 UTC

Hi,
I am fairly new to topic modeling and Gensim, so I am still trying to
understand many of the concepts. I am trying to run Gensim's LDA model on a
corpus of about 25,446,114 tweets. I created a streaming corpus and an
id2word dictionary using Gensim. I am using num_topics = 100 and
chunksize = 85000 (loading 85,000 tweets at a time).

I am using:
Gensim: 3.5.0
NumPy: 1.15.3

Here is the link to the corpus and the id2word dictionary:
https://drive.google.com/drive/folders/1FrJ8gJbiDqp3VC5syOjRVcQPcESdYOYa?usp=sharing

I don't know what I am doing wrong or how to solve this. Please help!
Here are the errors I get:

/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py:1023: RuntimeWarning: divide by zero encountered in log
  diff = np.log(self.expElogbeta)
/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py:690: RuntimeWarning: overflow encountered in add
  sstats[:, ids] += np.outer(expElogthetad.T, cts / phinorm)
/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py:700: RuntimeWarning: invalid value encountered in multiply
  sstats *= self.expElogbeta
/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py:690: RuntimeWarning: overflow encountered in add
  sstats[:, ids] += np.outer(expElogthetad.T, cts / phinorm)
/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py:700: RuntimeWarning: invalid value encountered in multiply
  sstats *= self.expElogbeta

Process ForkPoolWorker-30:
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.0/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.0/lib/python3.7/multiprocessing/pool.py", line 105, in worker
    initializer(*initargs)
  File "/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamulticore.py", line 333, in worker_e_step
    worker_lda.do_estep(chunk) # TODO: auto-tune alpha?
  File "/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py", line 725, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "/home/ec2-user/env/lib/python3.7/site-packages/gensim/models/ldamodel.py", line 662, in inference
    expElogbetad = self.expElogbeta[:, ids]
IndexError: index 287500 is out of bounds for axis 1 with size 287500

Here is the code I am running:

import pprint
import logging
import gensim

logging.basicConfig(filename='gensim.log',
                    format="%(asctime)s:%(levelname)s:%(message)s",
                    level=logging.INFO)
corpus = gensim.corpora.MmCorpus('disasterTweets.mm')
id2word = gensim.corpora.Dictionary.load('disasterTweets.dict')
id2word.filter_tokens(bad_ids=[id2word.token2id['eofeofeof']])
print('eofeofeof' in id2word.token2id)

lda_model = gensim.models.LdaMulticore(corpus=corpus,
                                       id2word=id2word,
                                       chunksize=85000,
                                       num_topics=100)
pprint.pprint(lda_model.print_topics())
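
A note on the IndexError: it says that some document contains token id 287500 while the model's topic-word matrix has only 287500 columns (valid ids 0 to 287499). One possible cause, which is an assumption and not something confirmed in this thread, is that the dictionary was filtered after the .mm corpus had been serialized: filter_tokens() shrinks the dictionary (and, as far as I know, reassigns ids to close the gaps), while the corpus on disk keeps the old ids. A minimal sketch of a sanity check follows. The helper names and the toy data are hypothetical; with the real objects one would pass the loaded corpus and len(id2word).

```python
def max_token_id(corpus):
    """Return the largest token id in a bag-of-words corpus, or -1 if it is empty."""
    largest = -1
    for doc in corpus:
        for token_id, _count in doc:
            if token_id > largest:
                largest = token_id
    return largest


def ids_fit_vocabulary(corpus, vocab_size):
    """True when every token id is a valid column index for a vocab_size-wide matrix."""
    return max_token_id(corpus) < vocab_size


# Toy stand-ins (hypothetical data): ids 0..3 were assigned before any filtering.
toy_corpus = [[(0, 1), (2, 3)], [(1, 2), (3, 1)]]

print(ids_fit_vocabulary(toy_corpus, 4))  # dictionary still has 4 tokens
print(ids_fit_vocabulary(toy_corpus, 3))  # one token removed, corpus not rebuilt
```

If the check fails on the real data, rebuilding the corpus from the filtered dictionary (or filtering before serializing) would keep the two in sync.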


Virashree Patel

2018-10-29 19:05:08 UTC

Since no one has replied so far, I wanted to post an update. I stopped
receiving the error above after running the model with the following
parameters:

lda_model = gensim.models.LdaMulticore(corpus=corpus,
                                       id2word=id2word,
                                       chunksize=100000,
                                       num_topics=80,
                                       passes=20,
                                       workers=1,
                                       iterations=1000)

However, my topic coherence score is still "nan". Here are the topics I
got:
[(32,
'0.099*"remind" + 0.082*"deep" + 0.076*"elderly" + 0.061*"health" + '
'0.058*"hundred" + 0.043*"clear" + 0.038*"sign" + 0.037*"contact" + '
'0.033*"pregnant" + 0.033*"debris"'),
(34,
'0.462*"irma" + 0.298*"hurricane" + 0.052*"bad" + 0.032*"hour" + '
'0.018*"public" + 0.016*"taxpayer" + 0.015*"top" + 0.013*"reporting" + '
'0.013*"administration" + 0.012*"case"'),
(74,
'0.739*"mexico" + 0.058*"part" + 0.046*"money" + 0.023*"climate" + '
'0.020*"change" + 0.013*"catastrophe" + 0.013*"amazing" + 0.011*"risk" + '
'0.010*"healthcare" + 0.010*"view"'),
(21,
'0.202*"national" + 0.113*"center" + 0.097*"shipping" + 0.078*"washington" + '
'0.064*"thread" + 0.061*"jr" + 0.045*"hurricane" + 0.037*"wife" + '
'0.022*"destructive" + 0.018*"general"'),
(20,
'0.482*"relief" + 0.087*"great" + 0.063*"red" + 0.061*"hurricane" + '
'0.058*"donation" + 0.058*"fund" + 0.045*"cross" + 0.031*"volunteer" + '
'0.027*"worker" + 0.020*"urgent"'),
(57,
'0.253*"due" + 0.093*"street" + 0.085*"child" + 0.077*"hospital" + '
'0.065*"house" + 0.065*"care" + 0.058*"quake" + 0.047*"jose" + 0.043*"flag" '
'+ 0.041*"white"'),
(54,
'0.304*"dog" + 0.197*"death" + 0.081*"aftermath" + 0.069*"real" + '
'0.033*"refuse" + 0.031*"owner" + 0.025*"clinton" + 0.020*"duty" + '
'0.020*"profit" + 0.019*"politic"'),
(73,
'0.375*"hurricaneharvey" + 0.134*"mexicocity" + 0.072*"heartbreaking" + '
'0.061*"mexicoquake" + 0.046*"newsdesk" + 0.040*"unsurprising" + '
'0.018*"arrest" + 0.014*"beloved" + 0.014*"hurricanejose" + 0.011*"freedom"'),
(50,
'0.743*"water" + 0.068*"resident" + 0.016*"port" + 0.015*"drive" + '
'0.013*"someone" + 0.010*"park" + 0.009*"approach" + 0.009*"pleading" + '
'0.008*"neglect" + 0.008*"ten"'),
(29,
'0.181*"heart" + 0.123*"donate" + 0.073*"picture" + 0.062*"bill" + '
'0.032*"gas" + 0.026*"drinkable" + 0.026*"energy" + 0.025*"survive" + '
'0.019*"army" + 0.019*"industry"'),
(6,
'0.351*"storm" + 0.210*"building" + 0.089*"tropical" + 0.052*"ship" + '
'0.049*"atlantic" + 0.044*"hurricane" + 0.029*"season" + 0.027*"system" + '
'0.025*"depression" + 0.014*"nhc"'),
(77,
'0.933*"people" + 0.019*"effect" + 0.017*"unprecedented" + '
'0.003*"deployment" + 0.003*"broward" + 0.003*"liter" + 0.002*"los_angele" + '
'0.002*"author" + 0.001*"commission" + 0.001*"unable"'),
(12,
'0.131*"caribbean" + 0.131*"government" + 0.107*"cost" + 0.096*"deadly" + '
'0.057*"devastating" + 0.043*"county" + 0.027*"cnn" + 0.021*"nuclear" + '
'0.019*"control" + 0.018*"senate"'),
(35,
'0.374*"florida" + 0.124*"congress" + 0.044*"footage" + 0.043*"trouble" + '
'0.038*"show" + 0.034*"gop" + 0.034*"member" + 0.029*"respect" + '
'0.023*"senator" + 0.021*"kid"'),
(56,
'0.219*"peril" + 0.143*"coast" + 0.133*"month" + 0.097*"insult" + '
'0.048*"thousand" + 0.040*"guard" + 0.038*"gulf" + 0.036*"bankruptcy" + '
'0.026*"surge" + 0.021*"humanity"'),
(44,
'0.224*"right" + 0.156*"military" + 0.096*"high" + 0.067*"ground" + '
'0.060*"condition" + 0.043*"road" + 0.042*"anthem" + 0.029*"order" + '
'0.026*"obama" + 0.017*"comms"'),
(49,
'0.169*"force" + 0.124*"serious" + 0.088*"mile" + 0.060*"orlando" + '
'0.050*"click" + 0.038*"barbuda" + 0.031*"task" + 0.030*"shirt" + '
'0.026*"win" + 0.026*"appropriation"'),
(55,
'0.376*"puertorico" + 0.107*"dominica" + 0.097*"miami" + 0.055*"war" + '
'0.045*"important" + 0.034*"information" + 0.033*"sanjuan" + 0.032*"plan" + '
'0.029*"safety" + 0.020*"evacuation"'),
(65,
'0.433*"damage" + 0.096*"much" + 0.080*"plane" + 0.078*"hope" + '
'0.036*"hurricane" + 0.033*"sale" + 0.027*"foundation" + 0.018*"drop" + '
'0.014*"difference" + 0.011*"muslim"'),
(69,
'0.201*"massive" + 0.120*"powerful" + 0.100*"destruction" + 0.054*"piece" + '
'0.051*"hurricane" + 0.050*"xico" + 0.036*"circle" + 0.036*"nighttime" + '
'0.029*"strike" + 0.025*"form"')]

Only 20 topics are shown. I have read the previous posts about similar
errors and tried all of the suggested solutions, but this is the best I
have got so far. Any suggestions?
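
A note on the "nan" values: the three RuntimeWarnings in the first post form a chain that can produce nan on their own. log(0) gives -inf, an overflowing addition gives inf, and inf multiplied by 0 gives nan. Whether this is what makes the coherence score nan here is an assumption, not something established in the thread. The sketch below only reproduces the arithmetic with NumPy on made-up float32 values; it does not touch Gensim.

```python
import numpy as np

# Silence the RuntimeWarnings so the resulting values are easy to inspect.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
    log_of_zero = np.log(np.float32(0.0))      # "divide by zero encountered in log"
    big = np.array([3e38], dtype=np.float32)   # close to the float32 maximum
    overflowed = big + big                     # "overflow encountered in add"
    poisoned = overflowed * np.float32(0.0)    # "invalid value encountered in multiply"

# Any mean or sum that includes a nan is itself nan.
scores = np.array([0.4, 0.5, poisoned[0]], dtype=np.float32)
print(log_of_zero, overflowed[0], poisoned[0], scores.mean())
```

Once a single nan enters the sufficient statistics, every quantity derived from them can become nan as well. Separately, a list that stops at 20 entries is consistent with print_topics() showing 20 topics by default; passing num_topics=-1 should list all of them.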

Alistair Windsor

2018-10-31 17:09:40 UTC

If you set lda_model.minimum_probability = 0.00, does this change your output?
