G-Flow is an automatic extractive summarization system that jointly models sentence selection and ordering in order to balance coherence and salience. Its core representation is a graph that approximates the discourse relations across sentences, based on indicators such as discourse cues, deverbal nouns, and co-reference. This graph enables G-Flow to estimate the coherence of a candidate summary.
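To make the idea of scoring a candidate summary against a sentence graph concrete, here is a minimal sketch. It assumes a hypothetical graph stored as a dict of weighted directed edges between sentence IDs, and it scores an ordering by summing the weights of edges between adjacent sentences. This is a simplified illustration of graph-based coherence, not G-Flow's actual objective; `coherence` and `best_ordering` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: weight of the edge from one sentence to another.
# A positive weight means the first sentence can plausibly precede the second.
Graph = Dict[Tuple[str, str], float]


def coherence(graph: Graph, ordering: List[str]) -> float:
    """Sum the edge weights between each pair of adjacent sentences.

    Adjacent pairs with no edge in the graph contribute 0.
    """
    return sum(
        graph.get((first, second), 0.0)
        for first, second in zip(ordering, ordering[1:])
    )


def best_ordering(graph: Graph, orderings: List[List[str]]) -> List[str]:
    """Return the candidate ordering with the highest coherence score."""
    return max(orderings, key=lambda order: coherence(graph, order))


if __name__ == "__main__":
    g: Graph = {("s1", "s2"): 1.0, ("s2", "s3"): 0.5, ("s3", "s1"): 0.25}
    print(coherence(g, ["s1", "s2", "s3"]))  # 1.5
    print(coherence(g, ["s3", "s1", "s2"]))  # 1.25
```

A real system would also weigh salience and search over which sentences to include, rather than only comparing fixed orderings.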
G-Flow takes as input sentences that have been processed by Stanford CoreNLP and Ollie, and outputs summaries.
For more information, see the G-Flow homepage.
WordNet:
http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz
WordNet accessor:
From http://lyle.smu.edu/~tspell/jaws/:
jaws-bin.jar
Stanford CoreNLP:
From http://nlp.stanford.edu/software/corenlp.shtml#Download:
Stanford CoreNLP version 3.4.1
Ollie:
From http://knowitall.github.io/ollie/:
ollie-app-latest.jar
Weka:
From http://www.cs.waikato.ac.nz/ml/weka/downloading.html:
weka.jar
Original Files:
Start with an empty directory representing your document cluster, which we'll refer to as <DATA_CLUSTER>. Add a subdirectory named 'original' to <DATA_CLUSTER>, and put your input news files in 'original'. Each news file should have the same format as the files in gflow-2014-9-17/data/gflowExamples/ukrainePlaneCrash/original/ and gflow-2014-9-17/data/gflowExamples/malaysiaPlane/original/.
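The directory setup above can be scripted. The following is a minimal sketch; `prepare_cluster` and the example paths are hypothetical. It creates <DATA_CLUSTER>/original and copies the given files into it, but it does not check that the files match the format of the bundled examples.

```python
import shutil
from pathlib import Path
from typing import Iterable


def prepare_cluster(cluster_dir: str, news_files: Iterable[str]) -> Path:
    """Create <DATA_CLUSTER>/original and copy the given news files into it.

    Returns the path of the 'original' subdirectory.
    """
    root = Path(cluster_dir)
    # The cluster directory is expected to start out empty.
    if root.exists() and any(root.iterdir()):
        raise ValueError(f"{root} is not empty")
    original = root / "original"
    # parents=True also creates the cluster directory itself if needed.
    original.mkdir(parents=True, exist_ok=True)
    for news_file in news_files:
        # Keep each file's base name; the file format itself is not checked.
        shutil.copy(news_file, original / Path(news_file).name)
    return original
```

For example, `prepare_cluster("data/myCluster", ["downloads/story1.txt", "downloads/story2.txt"])` would produce data/myCluster/original/ containing both files.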
Stanford CoreNLP Files and Ollie Files:
You can create the Stanford CoreNLP files and the Ollie files by running the included preprocessing script from the command line. Make sure you run it from the same folder that contains engmalt.linear-1.7.mco (Ollie requires this file):
python preprocessing.py <STANFORD_CORENLP_DIR> <OLLIE_DIR> <DATA_CLUSTER>
e.g.:
python preprocessing.py ~/Research/lib/stanford-corenlp-full-2014-08-27/ ~/Research/lib/ollie/ data/gflowExamples/malaysiaPlane/
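If you drive preprocessing from another Python script, a small wrapper can check for the model file before starting. This is a minimal sketch under two assumptions: preprocessing.py and engmalt.linear-1.7.mco sit in the same `workdir`, and the script is launched with a `python` command as in the example above. `preprocessing_command` and `run_preprocessing` are hypothetical helpers, not part of G-Flow.

```python
import os
import subprocess
from pathlib import Path
from typing import List

# Ollie needs this MaltParser model in the directory preprocessing.py runs from.
MALT_MODEL = "engmalt.linear-1.7.mco"


def preprocessing_command(corenlp_dir: str, ollie_dir: str, cluster_dir: str,
                          python: str = "python") -> List[str]:
    """Build the same command line as the example above."""
    # Without a shell, '~' is not expanded automatically, so expand it here.
    dirs = [os.path.expanduser(d) for d in (corenlp_dir, ollie_dir, cluster_dir)]
    return [python, "preprocessing.py", *dirs]


def run_preprocessing(corenlp_dir: str, ollie_dir: str, cluster_dir: str,
                      workdir: str = ".", python: str = "python") -> None:
    """Check for the Malt model, then run preprocessing.py from workdir."""
    if not (Path(workdir) / MALT_MODEL).is_file():
        raise FileNotFoundError(f"{MALT_MODEL} not found in {workdir}; Ollie requires it")
    cmd = preprocessing_command(corenlp_dir, ollie_dir, cluster_dir, python)
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(cmd, cwd=workdir, check=True)
```

Because the command runs with `cwd=workdir`, relative paths such as data/gflowExamples/malaysiaPlane/ are resolved against `workdir`, not against the caller's directory.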
If you're using a version of Stanford CoreNLP other than 3.4.1, you need to update the version numbers in preprocessing.py (search for stanford-corenlp-3.4.1.jar and stanford-corenlp-3.4.1-models.jar).
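The edit described above is a plain text substitution, so it can be automated. This is a minimal sketch; `update_corenlp_version` and `patch_preprocessing` are hypothetical names, and "3.5.0" is used only as an example version. It rewrites just the two jar names listed above and leaves any other version strings in preprocessing.py untouched.

```python
from pathlib import Path

# The two jar name patterns to search for in preprocessing.py.
JAR_SUFFIXES = (".jar", "-models.jar")


def update_corenlp_version(script_text: str, new: str, old: str = "3.4.1") -> str:
    """Replace stanford-corenlp-<old>.jar and stanford-corenlp-<old>-models.jar."""
    for suffix in JAR_SUFFIXES:
        script_text = script_text.replace(
            f"stanford-corenlp-{old}{suffix}", f"stanford-corenlp-{new}{suffix}"
        )
    return script_text


def patch_preprocessing(new: str, path: str = "preprocessing.py",
                        old: str = "3.4.1") -> None:
    """Rewrite preprocessing.py in place with the new jar names."""
    script = Path(path)
    script.write_text(update_corenlp_version(script.read_text(), new, old))
```

For example, `patch_preprocessing("3.5.0")` would switch the script to stanford-corenlp-3.5.0.jar and stanford-corenlp-3.5.0-models.jar. Keep a backup copy first, since the file is overwritten in place.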
Compile G-Flow and then run it on <DATA_CLUSTER>:
javac src/edu/washington/cs/knowitall/*/*.java -d bin
java -Xmx2G -classpath bin/ edu.washington.cs.knowitall.main.GFlow <DATA_CLUSTER>
e.g.:
java -Xmx2G -classpath bin/ edu.washington.cs.knowitall.main.GFlow data/gflowExamples/ukrainePlaneCrash
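The compile and run commands above can also be issued from Python. This is a minimal sketch assuming it is run from the G-Flow root directory (where src/ lives); `build_commands` and `compile_and_run` are hypothetical helpers. Since no shell is involved, the `*/*.java` glob is expanded with `glob.glob`, and bin/ is created up front because some javac releases refuse a missing `-d` directory.

```python
import glob
import os
import subprocess
from pathlib import Path
from typing import List, Tuple

MAIN_CLASS = "edu.washington.cs.knowitall.main.GFlow"


def build_commands(cluster_dir: str, heap: str = "2G", src_root: str = "src",
                   out_dir: str = "bin") -> Tuple[List[str], List[str]]:
    """Return the javac and java command lines shown above."""
    # A shell would expand this glob; without one, glob.glob does it.
    pattern = os.path.join(src_root, "edu", "washington", "cs", "knowitall", "*", "*.java")
    sources = sorted(glob.glob(pattern))
    if not sources:
        raise FileNotFoundError(f"no .java files match {pattern}")
    javac = ["javac", *sources, "-d", out_dir]
    java = ["java", f"-Xmx{heap}", "-classpath", out_dir, MAIN_CLASS, cluster_dir]
    return javac, java


def compile_and_run(cluster_dir: str, heap: str = "2G") -> None:
    """Compile the sources into bin/ and run G-Flow on cluster_dir."""
    javac, java = build_commands(cluster_dir, heap)
    # Create bin/ first in case this javac does not create the -d directory.
    Path("bin").mkdir(exist_ok=True)
    subprocess.run(javac, check=True)
    subprocess.run(java, check=True)
```

For example, `compile_and_run("data/gflowExamples/ukrainePlaneCrash")` mirrors the example command. Raise `heap` (e.g. "4G") if a larger cluster runs out of memory.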
For more information, please visit the G-Flow homepage at the University of Washington: http://knowitall.cs.washington.edu/gflow/index.html.
If you use G-Flow in your academic work, please cite it as follows:
@inproceedings{christensen_naacl13,
author = {Christensen, Janara and Mausam and Soderland, Stephen and Etzioni, Oren},
title = {Towards Coherent Multi-Document Summarization},
booktitle = {Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2013)},
year = {2013},
address = {Atlanta, Georgia},
}