Sentence (nlp 1.0.3-SNAPSHOT API)


edu.washington.cs.knowitall
Class Sentence

java.lang.Object
  edu.washington.cs.knowitall.sequence.SimpleLayeredSequence
      edu.washington.cs.knowitall.sequence.BIOLayeredSequence
          edu.washington.cs.knowitall.nlp.ChunkedSentence
              edu.washington.cs.knowitall.Sentence

All Implemented Interfaces:: edu.washington.cs.knowitall.sequence.LayeredSequence, TokenSequence, XmlSerializable, Serializable

public class Sentence
extends edu.washington.cs.knowitall.nlp.ChunkedSentence
implements TokenSequence, Serializable, XmlSerializable

A representation of a sentence. This class extends ChunkedSentence to support types, lemmas, and various serialization methods.

Author:: schmmd
See Also:: Serialized Form

Field Summary
`Long`	`id`
`String`	`originalText`
`protected List<com.google.common.collect.TreeMultimap<String,Type>>`	`typeLookup`

Fields inherited from class edu.washington.cs.knowitall.nlp.ChunkedSentence
`NP_LAYER, POS_LAYER, TOKEN_LAYER`

Constructor Summary
`Sentence(edu.washington.cs.knowitall.nlp.ChunkedSentence chunked, String originalText)`
`Sentence(edu.washington.cs.knowitall.nlp.ChunkedSentence chunked, String originalText, Iterable<String> norms)`
`Sentence(Long id, String originalText, List<String> tokens, Iterable<String> norms, List<String> posTags, List<String> npChunkTags)`
`Sentence(Long id, String originalText, String[] tokens, String[] norms, String[] posTags, String[] npChunkTags)`
`Sentence(String originalText, List<String> tokens, Iterable<String> norms, List<String> posTags, List<String> chunkTags)`
`Sentence(String originalText, String[] tokens, String[] norms, String[] posTags, String[] chunkTags)`

Method Summary
`void`	`addExtraction(Iterable<edu.washington.cs.knowitall.nlp.extraction.ChunkedBinaryExtraction> extractions)` Add multiple extractions to this sentence.
`void`	`addExtraction(RelationExtraction extraction)` Add an extraction to this sentence.
`void`	`addExtractions(Iterable<RelationExtraction> extractions)` Add multiple extractions to this sentence.
`static String`	`convertGroup(edu.washington.cs.knowitall.commonlib.regex.Match.Group<Token> group)`
`boolean`	`equals(Object that)`
`List<RelationExtraction>`	`extractions()` The extractions in this sentence.
`static Iterable<RelationExtraction>`	`extractions(Iterable<Sentence> sentences)`
`static List<Sentence>`	`fromDocument(org.jdom.Document document)` Deserialize sentences from an XML document.
`static Sentence`	`fromXmlElement(org.jdom.Element e)`
`Long`	`getId()`
`List<String>`	`getLemmas()` The lemmas of this sentence.
`List<String>`	`getLemmas(edu.washington.cs.knowitall.commonlib.Range range)` The lemmas of this sentence, constrained to the specified range.
`edu.washington.cs.knowitall.commonlib.Range`	`getRange()`
`edu.washington.cs.knowitall.commonlib.Range`	`getRange(String string)`
`List<Type>`	`getTypes()`
`int`	`hashCode()`
`static edu.washington.cs.knowitall.commonlib.regex.RegularExpression<Token>`	`makeRegex(String regex)` This method compiles regular expressions over the tokens in a sentence into an NFA.
`void`	`tag(Iterable<Type> types)` Add a collection of types to this sentence.
`void`	`tag(Type type)` Add a type to this sentence.
`String`	`toString()`
`org.jdom.Element`	`toXmlElement()`
`List<Type>`	`types()` The types associated with this sentence.
`List<Token>`	`zip()` Represent this sentence as a list of tokens (instead of an object that contains a separate array for each field).
`List<Token>`	`zip(edu.washington.cs.knowitall.commonlib.Range range)` Represent a range in this sentence as a list of tokens.

Methods inherited from class edu.washington.cs.knowitall.nlp.ChunkedSentence
`clone, getChunkTag, getChunkTags, getChunkTags, getChunkTags, getChunkTagsAsString, getNpChunkRanges, getPosTag, getPosTags, getPosTags, getPosTags, getPosTagsAsString, getPosTagsAsString, getPosTagsAsString, getSubSequence, getSubSequence, getToken, getTokenRange, getTokens, getTokens, getTokens, getTokensAsString, getTokensAsString, getTokensAsString, toOpenNlpFormat`

Methods inherited from class edu.washington.cs.knowitall.sequence.BIOLayeredSequence
`addSpanLayer, addSpanLayerRanges, getSpans, getSpans, getSubSequence, getSubSequence, isSpanLayer`

Methods inherited from class edu.washington.cs.knowitall.sequence.SimpleLayeredSequence
`addLayer, addLayer, addLayer, get, getLayer, getLayerAsString, getLayerAsString, getLayerAsString, getLayerNames, getLength, getNumLayers, hasLayer`

Methods inherited from class java.lang.Object
`finalize, getClass, notify, notifyAll, wait, wait, wait`

Methods inherited from interface edu.washington.cs.knowitall.TokenSequence
`getChunkTags, getPosTags, getTokens, getTokensAsString`

Field Detail

id

public final Long id

originalText

public final String originalText

typeLookup

protected final List<com.google.common.collect.TreeMultimap<String,Type>> typeLookup

Constructor Detail

Sentence

public Sentence(edu.washington.cs.knowitall.nlp.ChunkedSentence chunked,
                String originalText,
                Iterable<String> norms)
         throws edu.washington.cs.knowitall.sequence.SequenceException

Throws:: edu.washington.cs.knowitall.sequence.SequenceException

Sentence

public Sentence(edu.washington.cs.knowitall.nlp.ChunkedSentence chunked,
                String originalText)

Sentence

public Sentence(String originalText,
                String[] tokens,
                String[] norms,
                String[] posTags,
                String[] chunkTags)
         throws edu.washington.cs.knowitall.sequence.SequenceException

Throws:: edu.washington.cs.knowitall.sequence.SequenceException

Sentence

public Sentence(String originalText,
                List<String> tokens,
                Iterable<String> norms,
                List<String> posTags,
                List<String> chunkTags)
         throws edu.washington.cs.knowitall.sequence.SequenceException

Throws:: edu.washington.cs.knowitall.sequence.SequenceException

Sentence

public Sentence(Long id,
                String originalText,
                String[] tokens,
                String[] norms,
                String[] posTags,
                String[] npChunkTags)
         throws edu.washington.cs.knowitall.sequence.SequenceException

Throws:: edu.washington.cs.knowitall.sequence.SequenceException

Sentence

public Sentence(Long id,
                String originalText,
                List<String> tokens,
                Iterable<String> norms,
                List<String> posTags,
                List<String> npChunkTags)

Method Detail

fromDocument

public static List<Sentence> fromDocument(org.jdom.Document document)

Deserialize sentences from an XML document.

Parameters:: document - the document to deserialize
Returns:: the resulting list of sentence objects

toString

public String toString()

Overrides:: toString in class edu.washington.cs.knowitall.nlp.ChunkedSentence

equals

public boolean equals(Object that)

Overrides:: equals in class edu.washington.cs.knowitall.sequence.SimpleLayeredSequence

hashCode

public int hashCode()

Overrides:: hashCode in class edu.washington.cs.knowitall.sequence.SimpleLayeredSequence

zip

public List<Token> zip()

Represent this sentence as a list of tokens (instead of an object that contains a separate array for each field). This is used by the regular expression library.

The list is cached for speed.

Specified by:: zip in interface TokenSequence
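A minimal sketch of the idea behind zip(), not the library's code. It assumes a hypothetical `Token` record with `string`, `lemma`, `pos`, and `chunk` fields, since the real Token class's fields are not documented on this page. The parallel per-field lists are combined into one list of per-token objects, computed once on first use and then cached (records require Java 16+).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ZipSketch {
    // Hypothetical per-token record; the real Token class may differ.
    record Token(String string, String lemma, String pos, String chunk) {}

    private final List<String> tokens, lemmas, posTags, chunkTags;
    private List<Token> cached; // computed lazily on the first zip() call

    ZipSketch(List<String> tokens, List<String> lemmas,
              List<String> posTags, List<String> chunkTags) {
        this.tokens = tokens;
        this.lemmas = lemmas;
        this.posTags = posTags;
        this.chunkTags = chunkTags;
    }

    // Combine the parallel layers into one Token per position and cache the result.
    List<Token> zip() {
        if (cached == null) {
            List<Token> result = new ArrayList<>(tokens.size());
            for (int i = 0; i < tokens.size(); i++) {
                result.add(new Token(tokens.get(i), lemmas.get(i),
                                     posTags.get(i), chunkTags.get(i)));
            }
            cached = Collections.unmodifiableList(result);
        }
        return cached;
    }

    public static void main(String[] args) {
        ZipSketch s = new ZipSketch(
            List.of("Dogs", "bark"), List.of("dog", "bark"),
            List.of("NNS", "VBP"), List.of("B-NP", "O"));
        System.out.println(s.zip().get(0).lemma()); // dog
        System.out.println(s.zip() == s.zip());    // true: the same cached list is returned
    }
}
```

Caching pays off when the same sentence is matched against many patterns, because each match reuses the zipped list instead of rebuilding it.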

zip

public List<Token> zip(edu.washington.cs.knowitall.commonlib.Range range)

Represent a range in this sentence as a list of tokens. The returned object is a view into the cached list of the entire sentence.

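Parameters:: range - the range of tokens to return
Returns:: a list of the tokens in range, backed by the cached list for the entire sentence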
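To illustrate what "a view into the cached list" means, the sketch below uses `List.subList`, which returns a view that shares storage with its backing list rather than copying elements. The `Range` record (half-open, `start` plus `length`) is hypothetical; the real commonlib `Range` API is not documented on this page, and the library may build its view differently.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeViewSketch {
    // Hypothetical half-open range [start, start + length).
    record Range(int start, int length) {
        int end() { return start + length; }
    }

    // Return a view of the cached list restricted to the range; no tokens are copied.
    static <T> List<T> zip(List<T> cachedTokens, Range range) {
        return cachedTokens.subList(range.start(), range.end());
    }

    public static void main(String[] args) {
        List<String> cached = new ArrayList<>(List.of("The", "big", "dog", "barked"));
        List<String> view = zip(cached, new Range(1, 2));
        System.out.println(view); // [big, dog]
        // The view shares storage with the backing list, so the change shows through.
        cached.set(2, "cat");
        System.out.println(view); // [big, cat]
    }
}
```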

getRange

public edu.washington.cs.knowitall.commonlib.Range getRange(String string)

types

public List<Type> types()

The types associated with this sentence.

Returns:: the types associated with this sentence

tag

public void tag(Iterable<Type> types)

Add a collection of types to this sentence.

Parameters:: types - the types to add

tag

public void tag(Type type)

Add a type to this sentence.

Parameters:: type - the type to add

getRange

public edu.washington.cs.knowitall.commonlib.Range getRange()

Returns:: the range of this sentence

getLemmas

public List<String> getLemmas()

The lemmas of this sentence. Lemmas are normalizations of the token strings.

Specified by:: getLemmas in interface TokenSequence

getLemmas

public List<String> getLemmas(edu.washington.cs.knowitall.commonlib.Range range)

The lemmas of this sentence, constrained to the specified range.

Parameters:: range - the range of tokens whose lemmas are returned
Returns:: the lemmas of the tokens in range

addExtraction

public void addExtraction(RelationExtraction extraction)

Add an extraction to this sentence.

Parameters:: extraction - the extraction to add

addExtractions

public void addExtractions(Iterable<RelationExtraction> extractions)

Add multiple extractions to this sentence.

Parameters:: extractions - the extractions to add

addExtraction

public void addExtraction(Iterable<edu.washington.cs.knowitall.nlp.extraction.ChunkedBinaryExtraction> extractions)

Add multiple extractions to this sentence. The ReVerb-style extractions are converted into instances of RelationExtraction.

Parameters:: extractions - the ReVerb-style extractions to add

extractions

public List<RelationExtraction> extractions()

The extractions in this sentence.

Returns:: the extractions in this sentence

fromXmlElement

public static Sentence fromXmlElement(org.jdom.Element e)

toXmlElement

public org.jdom.Element toXmlElement()

Specified by:: toXmlElement in interface XmlSerializable

extractions

public static Iterable<RelationExtraction> extractions(Iterable<Sentence> sentences)

makeRegex

public static edu.washington.cs.knowitall.commonlib.regex.RegularExpression<Token> makeRegex(String regex)

This method compiles regular expressions over the tokens in a sentence into an NFA. The syntax is highly redundant in its expressiveness, largely because it supports pattern matching on the field values. This is not strictly necessary, but it serves as both an optimization and a shorthand: for example, <pos="NNPS?"> is equivalent to <pos="NNP" | pos="NNPS"> and to (?:<pos="NNP"> | <pos="NNPS">).
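The field-level shorthand can be checked in isolation with `java.util.regex`. This sketch does not call makeRegex; it only shows that the value pattern `NNPS?` accepts exactly the tags accepted by the disjunction of `NNP` and `NNPS`. It assumes the value pattern must match the entire tag (as with `Matcher.matches()`); if the library used substring matching instead, the equivalence would not hold for tags such as `NNPX`.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PosShorthandSketch {
    // Field-value pattern taken from <pos="NNPS?">.
    static final Pattern SHORTHAND = Pattern.compile("NNPS?");

    // Assumption: the value pattern must match the whole tag (matches(), not find()).
    static boolean matchesShorthand(String tag) {
        return SHORTHAND.matcher(tag).matches();
    }

    // Explicit disjunction corresponding to <pos="NNP" | pos="NNPS">.
    static boolean matchesDisjunction(String tag) {
        return tag.equals("NNP") || tag.equals("NNPS");
    }

    public static void main(String[] args) {
        for (String tag : List.of("NNP", "NNPS", "NN", "NNS")) {
            // The two columns agree for every tag.
            System.out.println(tag + " " + matchesShorthand(tag) + " " + matchesDisjunction(tag));
        }
    }
}
```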

Here are some equivalent examples:

<pos="JJ">* <pos="NNP.">+
<pos="JJ">* <pos="NNPS?">+
<pos="JJ">* <pos="NNP" | pos="NNPS">+
<pos="JJ">* (?:<pos="NNP"> | <pos="NNPS">)+

Note that the third and fourth forms are not preferred, for efficiency reasons. A regex OR (as in the fourth form) should be used only on multi-token sequences.

The regular expressions support named groups (: ... ), non-capturing groups (?: ... ), and capturing groups ( ... ). The allowed operators are +, ?, *, and |. The logic expressions that describe each token allow grouping "( ... )", negation '!', disjunction '|', and conjunction '&'.
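To show how the sequence operators combine over tokens, the sketch below encodes each POS tag as `<TAG>` and hands the result to `java.util.regex`. Under that encoding, the second and fourth example patterns translate to `(?:<JJ>)*(?:<NNPS?>)+` and `(?:<JJ>)*(?:<NNP>|<NNPS>)+`. This is only an illustration of the matching semantics: the encoding, helper names, and restriction to the `pos` field are hypothetical, and makeRegex itself compiles to an NFA over Token objects rather than matching strings.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TokenRegexSketch {
    // Encode a POS tag sequence as "<JJ><NNP>..." so each token is delimited.
    static String encode(List<String> posTags) {
        StringBuilder sb = new StringBuilder();
        for (String tag : posTags) {
            sb.append('<').append(tag).append('>');
        }
        return sb.toString();
    }

    // Translation of <pos="JJ">* <pos="NNPS?">+ (per-token shorthand).
    static final Pattern SHORTHAND = Pattern.compile("(?:<JJ>)*(?:<NNPS?>)+");
    // Translation of <pos="JJ">* (?:<pos="NNP"> | <pos="NNPS">)+ (regex OR).
    static final Pattern ALTERNATION = Pattern.compile("(?:<JJ>)*(?:<NNP>|<NNPS>)+");

    // True if the pattern matches the whole token sequence.
    static boolean matchesAll(Pattern p, List<String> posTags) {
        return p.matcher(encode(posTags)).matches();
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(
            List.of("JJ", "NNP", "NNPS"),
            List.of("NNP"),
            List.of("JJ", "JJ"),
            List.of("JJ", "NN"));
        for (List<String> tags : inputs) {
            // Both forms accept and reject the same sequences.
            System.out.println(tags + " " + matchesAll(SHORTHAND, tags)
                + " " + matchesAll(ALTERNATION, tags));
        }
    }
}
```

Both forms give identical results; the per-token shorthand simply tests one field value per token instead of trying each alternative as a separate path.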

Parameters:: regex - the pattern to compile
Returns:: the compiled regular expression

convertGroup

public static String convertGroup(edu.washington.cs.knowitall.commonlib.regex.Match.Group<Token> group)

getId

public Long getId()

getTypes

public List<Type> getTypes()

Specified by:: getTypes in interface TokenSequence
