TextRunner, Turing Center, KnowItAll Project, University of Washington
   Thomas Lin, Mausam and Oren Etzioni, Identifying Functional Relations in Web Text

A relation is functional if each element of its domain maps to at most one element of its range. For example, "(arg1) is the birthstone for (month)" is a functional relation, so no arg1 can be the birthstone for more than one month. If we know that "Emerald is the birthstone for May", then Emerald cannot also be the birthstone for April, June, or any other month. Functionality data is potentially applicable to NLP tasks such as contradiction detection, quantifier scope disambiguation, synonym resolution, ontology generation, question answering, and information extraction.
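As a minimal sketch of the definition above (not the method from the paper), the following Python checks whether a set of extracted (arg1, arg2) pairs is consistent with a functional relation. The helper names `violations` and `is_functional` and the sample tuples are illustrative assumptions. Real Web extractions are noisy, so an exact check like this would be too strict in practice.

```python
from collections import defaultdict

def violations(pairs):
    """Return {arg1: set of arg2 values} for every arg1 seen with more than one arg2."""
    seen = defaultdict(set)
    for arg1, arg2 in pairs:
        seen[arg1].add(arg2)
    return {a1: a2s for a1, a2s in seen.items() if len(a2s) > 1}

def is_functional(pairs):
    """True if every arg1 maps to at most one distinct arg2."""
    return not violations(pairs)

# Hypothetical extractions, for illustration only.
birthstone = [("Emerald", "May"), ("Ruby", "July"), ("Emerald", "May")]
visited = [("Alice", "France"), ("Alice", "Japan")]

print(is_functional(birthstone))  # repeated identical pairs are not violations
print(is_functional(visited))     # Alice maps to two countries
```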

More examples of functional relations: wasBornIn, isTheTallestMountainIn(country)
Examples of non-functional relations: ate, visited(country)

The EMNLP 2010 paper focuses on the following task: given a popular untyped relation and many instances of its use on the Web, return whether it is a function. The methods described in the paper perform that task at very high precision, partly because (1) these were the most popular relations, with ample evidence for judging functionality, and (2) the final determination is untyped, so we can merge functionality determinations from different typed senses of the relation.

For the task of generating a corpus of functions, however, it is more interesting to also include the sparser, less popular relations (we want a larger corpus, not just determinations on the most frequent relations) and to output typed rather than untyped functionality determinations (which can be used as is or merged if needed). This page contains a list of several thousand typed functional relations that were automatically identified by taking the OCCAM corpus (Fader, 2010) and then (1) automatically assigning probable argument 2 types to those relations (based on the intersection between type lists and observed arguments), and (2) using the CleanLists technique to judge the functionality of those typed relations. We are only generating typed output here, so DistrDiff is not used.
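Step (1), assigning probable argument 2 types by intersecting type lists with observed arguments, might look roughly like the sketch below. This is an assumption about the general idea, not the actual implementation: the function name `probable_arg2_types`, the `min_overlap` threshold, the lowercase matching, and the tiny type lists are all hypothetical.

```python
def probable_arg2_types(observed_args, type_lists, min_overlap=0.5):
    """Rank candidate types by the fraction of observed arg2 values found in each type list.

    min_overlap is a hypothetical cutoff; the page does not state how overlap is scored.
    """
    observed = {a.lower() for a in observed_args}
    if not observed:
        return []
    scored = []
    for type_name, members in type_lists.items():
        members_lc = {m.lower() for m in members}
        overlap = len(observed & members_lc) / len(observed)
        if overlap >= min_overlap:
            scored.append((type_name, overlap))
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy type lists and observed arguments, for illustration only.
type_lists = {
    "month": {"April", "May", "June", "July"},
    "country": {"France", "Japan"},
}
observed = ["May", "April", "June", "soon"]

print(probable_arg2_types(observed, type_lists))
```

Here three of the four observed arguments appear in the month list, so "month" would be proposed as the argument 2 type and "country" would not.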

Generated list of typed relations, sorted by predicted functionality: html format, csv format

Note that the relations are sorted by a column labeled "Max G-test." A score above 6.63 (the chi-squared critical value at p = 0.01 with one degree of freedom) indicates that the relation is likely to be functional. To calculate the max G-test, we compute the G-test value for each typed relation's evidence both with and without our argument 1 filters and return the higher score. We do this because we are now dealing with potentially sparser data. While our argument 1 filtering rules (no proper nouns, no ambiguous names) are useful, for some sparser relations they might filter out the majority of the Web evidence. Argument 1 filtering is there to make the relation data look more functional (for functional relations), but some relations look functional even without it, and in those cases the filtering only makes the data sparser. Taking the maximum of the two scores keeps whichever version of the evidence supports functionality more strongly.
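The max G-test idea can be sketched as follows, using the standard goodness-of-fit form G = 2 Σ O ln(O / E). The page does not say how the observed and expected counts are constructed from the Web evidence, so the count vectors below are purely hypothetical placeholders, as are the names `g_test` and `max_g_test`.

```python
import math

FUNCTIONAL_THRESHOLD = 6.63  # threshold quoted on this page

def g_test(observed, expected):
    """Standard G statistic: 2 * sum(O * ln(O / E)); cells with O == 0 contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def max_g_test(unfiltered, filtered):
    """Score the evidence with and without argument 1 filtering and keep the higher value.

    Each argument is an (observed_counts, expected_counts) pair.
    """
    return max(g_test(*unfiltered), g_test(*filtered))

# Hypothetical counts: the filtered evidence is much sparser than the unfiltered evidence.
unfiltered = ([18, 2], [10, 10])
filtered = ([3, 1], [2, 2])

score = max_g_test(unfiltered, filtered)
print(score > FUNCTIONAL_THRESHOLD)
```

In this toy case the filtered evidence is too sparse to clear the threshold on its own, while the unfiltered evidence does, which is the situation the max is meant to handle.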

We are continuing to research techniques for improving functionality determinations in the context of generating a corpus. Some error comes from generating incorrect selectional preferences (e.g., if many books are named after people, then you might see wasTheAuthorOf(person)), and this can be addressed by better accounting for prominence of potential types for terms. There is also increased error from sparsity, which can partly be addressed with more advanced matching techniques from Web text arguments into our type lists. Lastly, we are examining how lexico-syntactic patterns could be incorporated to reduce the effect of textual functionality.

In addition to the overall list, we have also divided the list by argument 2 type. For example, for the country type, the most functional relations are is the capital of(country) and is the second largest city in(country), while the least functional relations are is the official religion of(country) and is the official language in(country):

Arg2 Type | Top Functional Relation | More Relations
activity | (arg1) is a form of (activity) | 73 total (html csv)
artwork | (arg1) is a native of (artwork) | 65 total (html csv)
brand | (arg1) is the best (brand) | 20 total (html csv)
city | (arg1) is the capital of (city) | 172 total (html csv)
color | (arg1) is a mixture of (color) | 396 total (html csv)
company | (arg1) is a trademark of (company) | 102 total (html csv)
continents | (arg1) was the first country in (continents) | 502 total (html csv)
country | (arg1) is the capital of (country) | 1,687 total (html csv)
deity | (arg1) is the son of (deity) | 48 total (html csv)
distance | (arg1) is a first class hotel located (distance) | 68 total (html csv)
drug | (arg1) is the brand name for (drug) | 12 total (html csv)
ethnicity | (arg1) was the first (ethnicity) | 12 total (html csv)
filmdistributor | (arg1) is the new (filmdistributor) | 12 total (html csv)
film | (arg1) is the author of (film) | 286 total (html csv)
gender | (arg1) is the hottest (gender) | 3,597 total (html csv)
govtitle | (arg1) is the new (govtitle) | 20 total (html csv)
language | (arg1) is the national language and (language) | 216 total (html csv)
month | (arg1) is the birthstone for (month) | 212 total (html csv)
monthyear | (arg1) has a maturity date of (monthyear) | 4 total (html csv)
percent | (arg1) has a duty cycle of (percent) | 193 total (html csv)
person | (arg1) is the author of (person) | 80 total (html csv)
planet | (arg1) is the largest moon of (planet) | 57 total (html csv)
programminglang | (arg1) is the author of (programminglang) | 18 total (html csv)
religion | (arg1) is the best (religion) | 112 total (html csv)
sports | (arg1) is the governing body of (sports) | 194 total (html csv)
sportsteam | (arg1) is the venue for (sportsteam) | 3 total (html csv)
states | (arg1) is the capital of (states) | 1,904 total (html csv)
superhero | (arg1) is the new (superhero) | 35 total (html csv)
temperature | (arg1) have a temperature of (temperature) | 41 total (html csv)
videogame | (arg1) is the champion of (videogame) | 37 total (html csv)
weight | (arg1) has a capacity of (weight) | 5 total (html csv)
years | (arg1) was a finalist in (years) | 191 total (html csv)
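
A per-type split like the one above could be reproduced from the overall list along these lines. This is only a sketch: the row layout (relation, argument 2 type, max G-test score) is an assumption about the csv contents, the function name `split_by_arg2_type` is hypothetical, and the scores are invented for illustration.

```python
from collections import defaultdict

def split_by_arg2_type(rows):
    """Group (relation, arg2_type, score) rows by type, most functional first within each type."""
    by_type = defaultdict(list)
    for relation, arg2_type, score in rows:
        by_type[arg2_type].append((relation, score))
    for relations in by_type.values():
        relations.sort(key=lambda r: r[1], reverse=True)
    return dict(by_type)

# Relation names come from this page; the scores are made up.
rows = [
    ("is the official language in", "country", 1.2),
    ("is the capital of", "country", 250.0),
    ("is the birthstone for", "month", 40.0),
    ("is the official religion of", "country", 0.4),
]

by_type = split_by_arg2_type(rows)
top_relation, _ = by_type["country"][0]
print(top_relation, len(by_type["country"]))
```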