G-Flow

G-Flow is an automatic extractive summarization system built around a joint model for sentence selection and ordering that balances coherence and salience. Its core representation is a graph that approximates the discourse relations across sentences, based on indicators including discourse cues, deverbal nouns, co-reference, and more. This graph enables G-Flow to estimate the coherence of a candidate summary.

G-Flow takes sentences processed by Stanford CoreNLP and Ollie as input and outputs summaries.

For more information, see the G-Flow homepage.

Dependencies and Setup

  1. Download G-Flow here.

  2. Download the following dependencies:

    WordNet:
    http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz

    WordNet accessor:
    From http://lyle.smu.edu/~tspell/jaws/:
    jaws-bin.jar

    Stanford CoreNLP:
    From http://nlp.stanford.edu/software/corenlp.shtml#Download:
    Stanford CoreNLP version 3.4.1

    Ollie:
    From http://knowitall.github.io/ollie/:
    ollie-app-latest.jar

    Weka:
    From http://www.cs.waikato.ac.nz/ml/weka/downloading.html:
    weka.jar

  3. Set the following environment variable:

    WORDNET_DICT points to /PATH/Word-Net-3.0/dict

  4. Include the WordNet accessor jar and the Weka jar in the build path. The other downloads are not needed in the build path; they are used only for preprocessing.

  5. Because parsing takes so long, G-Flow is written to take the output of Stanford CoreNLP and Ollie as its input. Set up your input files to look like the sample files included with the source code in gflow-2014-9-17/data/gflowExamples/

  6. To compile G-Flow, run the following from the root directory of GFlowRelease:

    javac src/edu/washington/cs/knowitall/*/*.java -d bin

  7. To run G-Flow, run the following from the root directory of GFlowRelease:

    java -Xmx2G -classpath bin/ edu.washington.cs.knowitall.main.GFlow <DATA_CLUSTER>

    For example:

    java -Xmx2G -classpath bin/ edu.washington.cs.knowitall.main.GFlow data/gflowExamples/ukrainePlaneCrash
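
The setup steps above can be gathered into one small script. The following is a minimal sketch rather than part of the G-Flow release. It assumes that jaws-bin.jar and weka.jar were copied into a hypothetical lib/ directory under the GFlowRelease root, and it reuses the placeholder WordNet path from the setup steps. The names LIB_DIR, CP, DRY_RUN and run are introduced here for illustration only. Passing the jars with -cp is one possible way to put them in the build path when compiling from the command line. Whether they are also needed on the run-time classpath depends on your setup.

```shell
#!/bin/sh
# Hypothetical helper script; not part of the G-Flow release.
# Assumptions: the two jars live in ./lib, and WordNet is unpacked at the
# placeholder path used in the setup steps. Adjust both for your machine.

# Keep an existing WORDNET_DICT, otherwise fall back to the placeholder path.
WORDNET_DICT="${WORDNET_DICT:-/PATH/Word-Net-3.0/dict}"
export WORDNET_DICT

# Classpath containing the WordNet accessor (JAWS) jar and the Weka jar.
LIB_DIR="${LIB_DIR:-lib}"
CP="$LIB_DIR/jaws-bin.jar:$LIB_DIR/weka.jar"

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Warn early if the WordNet dictionary directory is missing.
if [ ! -d "$WORDNET_DICT" ]; then
  echo "warning: WORDNET_DICT is not a directory: $WORDNET_DICT" >&2
fi

# Compile with both jars on the classpath, then run on the sample cluster.
run javac -cp "$CP" src/edu/washington/cs/knowitall/*/*.java -d bin
run java -Xmx2G -classpath "bin/:$CP" edu.washington.cs.knowitall.main.GFlow data/gflowExamples/ukrainePlaneCrash
```

By default the script only prints the two commands so that the paths can be checked first. Running it with DRY_RUN=0 from the root directory of GFlowRelease executes them.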

Help and Contact

For more information, please visit the G-Flow homepage at the University of Washington: http://knowitall.cs.washington.edu/gflow/index.html.

Contributors

Citing G-Flow

If you use G-Flow in your academic work, please cite it as follows:


@inproceedings{christensen_naacl13,
 author = {Christensen, Janara and Mausam and Soderland, Stephen and Etzioni, Oren},
 title = {Towards Coherent Multi-Document Summarization},
 booktitle = {Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2013)},
 year = {2013},
 location = {Atlanta, Georgia},
}