Nicolas Tastevin
2018-10-29 15:31:26 UTC
The summarizer.py script implements the TextRank algorithm to extract the
most important sentences from a document.
The algorithm is built on PageRank. The summarizer.py script builds a graph
of sentences that serves as input to PageRank, and uses BM25 scores as the
weights on the edges of the graph. This score is not symmetric [ BM25(a,b)
can be different from BM25(b,a) ], and the code is written with the intent
of preserving this asymmetry.
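To make the asymmetry concrete, here is a minimal sketch of a BM25-style score. It is not the code from gensim's BM25 module: the function name bm25_score, the constant IDF of 1.0, and the parameter values k1=1.5 and b=0.75 are assumptions chosen only for illustration.

```python
from collections import Counter

K1 = 1.5   # assumed term-frequency saturation parameter
B = 0.75   # assumed length-normalisation parameter


def bm25_score(query, document, avg_len, idf=1.0):
    """Simplified BM25 of `query` against `document` (constant IDF assumed)."""
    freqs = Counter(document)
    # Length normalisation depends only on the document side of the pair.
    norm = K1 * (1.0 - B + B * len(document) / avg_len)
    score = 0.0
    for term in query:
        tf = freqs[term]
        if tf == 0:
            continue
        score += idf * tf * (K1 + 1.0) / (tf + norm)
    return score


sent_a = ["graph", "rank"]
sent_b = ["graph", "rank", "graph", "score", "weight"]
avg_len = (len(sent_a) + len(sent_b)) / 2.0

score_ab = bm25_score(sent_a, sent_b, avg_len)  # a as query, b as document
score_ba = bm25_score(sent_b, sent_a, avg_len)  # b as query, a as document
print(score_ab, score_ba)
```

The two values differ because the term frequencies and the length normalisation are taken from whichever sentence plays the role of the document.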
However, the non-symmetric matrix of BM25 scores yields a symmetric
adjacency matrix once the _set_graph_edge_weights function has been applied,
so we lose the asymmetry of BM25. This is caused by the use of the
gensim.summarization.graph module. Indeed, as documented in the module: *this
module contains abstract class IGraph represents graphs interface and class
Graph (based on IGraph) which implements **undirected** graph*
As a consequence, in the function _set_graph_edge_weights(graph) (from
summarizer.py), the following lines of code do not generate 2 edges but
only one, because the graph module is undirected:
    edge_1 = (sentence_1, sentence_2)
    edge_2 = (sentence_2, sentence_1)
    if not graph.has_edge(edge_1):
        graph.add_edge(edge_1, weights[i][j])
    if not graph.has_edge(edge_2):
        graph.add_edge(edge_2, weights[j][i])
So the code does not do what it is intended to do.
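The effect can be reproduced without gensim. The sketch below uses a hypothetical stand-in class, TinyUndirectedGraph, which is not gensim's Graph; it only assumes, as described above, that an undirected graph treats (a, b) and (b, a) as the same edge. The weight values are made up for the demonstration.

```python
class TinyUndirectedGraph:
    """Hypothetical stand-in for an undirected weighted graph."""

    def __init__(self):
        # One weight per unordered pair of nodes.
        self._weights = {}

    def has_edge(self, edge):
        return frozenset(edge) in self._weights

    def add_edge(self, edge, wt=1.0):
        key = frozenset(edge)
        if key in self._weights:
            raise ValueError("edge already in graph")
        self._weights[key] = wt

    def edge_weight(self, edge):
        return self._weights[frozenset(edge)]


# Made-up asymmetric score matrix: weights[0][1] != weights[1][0].
weights = [[0.0, 2.0],
           [5.0, 0.0]]
sentences = ["s0", "s1"]

graph = TinyUndirectedGraph()
for i in range(len(sentences)):
    for j in range(len(sentences)):
        if i == j:
            continue
        edge_1 = (sentences[i], sentences[j])
        edge_2 = (sentences[j], sentences[i])
        if not graph.has_edge(edge_1):
            graph.add_edge(edge_1, weights[i][j])
        # Always skipped here: edge_2 is the same undirected edge as edge_1.
        if not graph.has_edge(edge_2):
            graph.add_edge(edge_2, weights[j][i])

forward = graph.edge_weight(("s0", "s1"))
backward = graph.edge_weight(("s1", "s0"))
print(forward, backward)
```

Under these assumptions both directions report the weight that was inserted first, and the other score in the matrix is never stored.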
PS: I have already sent an e-mail reporting another mistake in the
summarization module, concerning the implementation of BM25.py.
You can find it at this link:
https://groups.google.com/forum/#!topic/gensim/dBEX182kXbg
--
You received this message because you are subscribed to the Google Groups "Gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.