hans mohan
2018-10-09 17:27:03 UTC
Hi,
I am trying to find the similarity of a document against a set of
documents (21K in my case as an example, though I have a bigger dataset).
I create an LSI model and then create a similarity index using
'similarities.Similarity'. When I save the index using the save method,
the following index-related files are saved:
1. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index.0.index.npy"
*2. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index"*
3. "C:\tfs\AI\POC\APIProject\APIProject\APIProject\uploads\gensim_model_sharded\client\_lsiindex.index.0"
File 2 stores a hard-coded path, which is a problem if I change the folder
structure or directory path. For example, I tried copying the saved index
files directly to the deployment machine, which has a different directory
structure. Loading the index back into the program failed with an error
because the path is different, so I had to re-create the entire index on
the deployment machine. Is there a specific reason for keeping the
hard-coded path? How can I handle this?
Below is how the index file (file 2, in bold above) looks when I open it
in Notepad++ (see the hard-coded path, which I have made bold):
⬠cgensim.similarities.docsim
Similarity
q ) q }q (X
output_prefixq X?
*C:\Users\hans.mohan\gensim_model_sharded\client\_lsiindex*.indexq X
num_featuresq MÃŽ X num_bestq NX normq X l2q X chunksizeq M X
shardsizeq
M â¬X shardsq ]q (cgensim.similarities.docsim
Shard
q
) q }q (X dirnameq X/
C:\Users\hans.mohan\gensim_model_sharded\clientq X fnameq X
_lsiindex.index.0q X lengthq M â¬X clsq cgensim.similarities.docsim
MatrixSimilarity
q h NX num_nnzq J0Þù ubh
) q }q (h X/ C:\Users\hans.mohan\gensim_model_sharded\clientq h X
_lsiindex.index.1q h M â¬h h h Nh J`ðù ubh
) q }q (h X/ C:\Users\hans.mohan\gensim_model_sharded\clientq h X
_lsiindex.index.2q h Mf h h h Nh JPg9 ubeX
fresh_docsq ]q!X fresh_nnzq"K X __numpysq#]q$X __scipysq%]q&X
__ignoredsq']q(X __recursive_saveloadsq)]q*ub.
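One possible workaround, sketched under assumptions: the dump above shows that the pickled object carries an `output_prefix` attribute and that each entry in `shards` carries `dirname` and `fname`. If those attributes are rewritten after loading, the index should in principle look for its shard files in the new location. The helper name `relocate_index` and the directory used in the usage comment are hypothetical, and this has not been verified against any particular gensim version.

```python
import ntpath
import os


def relocate_index(index, new_dir):
    """Point a loaded index and its shards at new_dir (hypothetical helper).

    Assumes `index` has an `output_prefix` string and a `shards` list whose
    items each have a `dirname` attribute, as seen in the pickle dump.
    """
    # ntpath splits on both "\" and "/", so a prefix that was saved on
    # Windows is still parsed correctly when this runs on Linux or macOS.
    prefix_name = ntpath.basename(index.output_prefix)
    index.output_prefix = os.path.join(new_dir, prefix_name)

    # Each shard remembers the directory it was written to; update it so
    # the shard files are looked up next to the relocated prefix.
    for shard in index.shards:
        shard.dirname = new_dir
    return index


# Assumed usage with gensim (not run here; new_dir is a placeholder):
# from gensim import similarities
# new_dir = "/path/on/deployment/machine"
# index = similarities.Similarity.load(os.path.join(new_dir, "_lsiindex.index"))
# relocate_index(index, new_dir)
```

As far as I can tell, gensim's `Similarity` class also has a `check_moved()` method that resets each shard's `dirname` from `output_prefix`, so setting `index.output_prefix` to the new location and then calling `index.check_moved()` may achieve the same thing; please check this against the documentation of the installed version.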
Thanks and Regards,
Hans Mohan
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.