Discussion:
[gensim:11668] LSI Model :: Path Hard coded
hans mohan
2018-10-09 17:27:03 UTC
Hi,

I am working on finding the similarity of a document against a set of
documents (21K in my example, though my full dataset is bigger).

I create an LSI model and then build a similarity index using
'similarities.Similarity'. When I save the index using the save method,
the following index-related files are saved:

1. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index.0.index.npy"
*2. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index"*
3. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index.0"

File 2 stores a hard-coded path, which is a problem if I change the folder
structure or directory path. For example, I tried copying the saved index
files directly to the deployment machine, which has a different directory
structure. Loading the index back into the program failed with an error
because the path is different, so I re-created the entire index on the
deployment machine. Is there a specific reason for keeping the hard-coded
path? How can I handle this?

Below is how the index file (file 2, in bold above) looks when I open it
in Notepad++ (see the hard-coded path, which I have made bold):
€ cgensim.similarities.docsim
Similarity
q ) q }q (X
output_prefixq X?
*C:\Users\hans.mohan\gensim_model_sharded\client\_lsiindex*.indexq X
num_featuresq MÃŽ X num_bestq NX normq X l2q X chunksizeq M X
shardsizeq
M €X shardsq ]q (cgensim.similarities.docsim
Shard
q
) q }q (X dirnameq X/
C:\Users\hans.mohan\gensim_model_sharded\clientq X fnameq X
_lsiindex.index.0q X lengthq M €X clsq cgensim.similarities.docsim
MatrixSimilarity
q h NX num_nnzq J0Þù ubh
) q }q (h X/ C:\Users\hans.mohan\gensim_model_sharded\clientq h X
_lsiindex.index.1q h M €h h h Nh J`ðù ubh
) q }q (h X/ C:\Users\hans.mohan\gensim_model_sharded\clientq h X
_lsiindex.index.2q h Mf h h h Nh JPg9 ubeX
fresh_docsq ]q!X fresh_nnzq"K X __numpysq#]q$X __scipysq%]q&X
__ignoredsq']q(X __recursive_saveloadsq)]q*ub.

Thanks and Regards,

Hans Mohan
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Radim Řehůřek
2018-10-10 19:11:42 UTC
Hi Mohan,

How are you loading the index?

Try calling check_moved() after loading the Similarity object with .load():
https://radimrehurek.com/gensim/similarities/docsim.html#gensim.similarities.docsim.Similarity.check_moved

HTH,
Radim
hans mohan
2018-11-27 08:02:52 UTC
Hi Radim,

Sorry for returning to this thread after such a long gap.

I followed the instructions as suggested, but the error persists.
I load the similarity index through .load() and then call
object.check_moved().

Situation:
I create the index and model files and save them to a directory. Then I
move the index and model files to another directory and try to load them
from the new directory. But I get an error saying the index file cannot be
found, because it is still being looked up in the original directory.

Can you guide me once again on how to overcome this problem?
Radim Řehůřek
2018-11-29 08:19:31 UTC
Hi Hans,

that doesn't sound right. The index is really just a pickle, so loading it
cannot really look into another directory. It loads the pickle from the
location you provide.

Maybe a concrete example of your code would make this clearer.

HTH,
Radim
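
To illustrate how both observations in this thread can hold at once, here is a small standard-library sketch. It is not gensim code: the dictionary with "dirname" and "fname" keys is a hypothetical stand-in for the shard entries visible in the pickle dump earlier in the thread. It shows that a pickle loads fine from wherever you point at it, while any path stored inside it still names the directory where it was written.

```python
import os
import pickle
import shutil
import tempfile

root = tempfile.mkdtemp()
old_dir = os.path.join(root, "build")
new_dir = os.path.join(root, "deploy")
os.makedirs(old_dir)

# Write a dummy shard file plus a pickle that records where the shard lives.
with open(os.path.join(old_dir, "toy.index.0"), "wb") as f:
    f.write(b"shard data")
state = {"dirname": old_dir, "fname": "toy.index.0"}  # hypothetical layout
with open(os.path.join(old_dir, "toy.index"), "wb") as f:
    pickle.dump(state, f)

# Move both files, as when copying an index to a deployment machine.
shutil.move(old_dir, new_dir)

# Loading the pickle from the new location succeeds...
with open(os.path.join(new_dir, "toy.index"), "rb") as f:
    loaded = pickle.load(f)

# ...but the path stored inside it still points at the old directory.
stale = os.path.join(loaded["dirname"], loaded["fname"])
print(stale, os.path.exists(stale))

# Repointing the stored directory makes the shard file reachable again.
loaded["dirname"] = new_dir
fixed = os.path.join(loaded["dirname"], loaded["fname"])
print(fixed, os.path.exists(fixed))
```

In this toy setup the failure happens only when the stored path is used, not when the pickle itself is loaded, which is why the exact line that raises the error is worth including in a concrete example.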