Conceptual Dependency Scripts for Business Intelligence

Project Acronym UNDERSTANDER

Document-Id D.2

File name

Version Final document

Date start: 01 March 2014 end: 30 April 2014

Author(s) Violeta Damjanovic (SRFG)

QA Process Distribution: Review by: Approval by:


Table of Contents

1. Introduction
   1.1. Motivation
   1.2. Scope
   1.3. Structure of the Document
2. Related Work in Knowledge Representation and Understanding
   2.1. Statistical Versus Knowledge-based Approaches
   2.2 Knowledge Taxonomies and Ontologies on the Web
       2.2.1 Wikipedia
       2.2.2 WordNet
       2.2.3 YAGO2
       2.2.4 More taxonomies and ontologies
       2.2.5 Related Projects
   2.3 Ontology-based Question Answering Systems (QA)
3. Business Intelligence Scripts and Conceptual Dependency Theory
   3.1 A Brief Overview of Conceptual Dependency
   3.2 User Agents Support for Conceptual Dependency Scripts
   3.3 Conceptual Dependency Scripts in UNDERSTANDER
4. Conclusions
Appendix 1 - A collection of Winograd Schema questions
References


1. Introduction

The table below summarizes the main project goals, description of the content, methods, and milestones related to Work Package 2 (WP2).

Goals The main goal of this report is to develop Conceptual Dependency (CD) scripts for Business Intelligence (BI), on top of the domain vocabulary developed in WP1. These BI scripts act as classifiers, according to which the agent has to recognise relevant web resources and rewrite them according to the identified script.

Description of the Content

Creating 10-12 CD scripts that cover important elements of BI in UNDERSTANDER.

Method Semi-formal specification and subsequent programming in SWI-Prolog and/or JADE.

Milestones
● Formalized generic BI scripts, plus concepts for customization to specific domains;
● Formal specification;
● Submitted paper (as a relevant conference/workshop paper).

1.1. Motivation

The author in (Allen, 1987) discusses two underlying motivations for building a computational theory: “The technological goal is simply to build better computers, and any solution that works would be acceptable. The cognitive goal is to build a computational analog of the human-language-processing mechanisms; such a theory would be acceptable only after it had been verified by experiment. Thus, the technological goal cannot be realized without using sophisticated underlying theories that are on the level being developed by theoretical linguistics. On the other hand, the present state of knowledge about natural language processing is so preliminary that attempting to build a cognitively correct model is not feasible.” Our motivation in UNDERSTANDER is to consider several existing underlying linguistic theories and put them into the experimental context of novel approaches driven by Semantic Web and Linked Open Data technologies. In this report, we firstly consider several state-of-the-art technologies in knowledge representation and natural language understanding, such as taxonomies and ontologies on the Web (e.g. Wikipedia, WordNet, Freebase, Cyc, YAGO and its extended version YAGO2, etc.) and knowledge representation-related projects (e.g. DBpedia, IBM Watson, Siri, Apache Stanbol, etc.). Secondly, we are interested in applying some of the
above-mentioned technologies to design CD- and BI-related statements (e.g. to ground these statements in a cognitively realistic conceptual space, to determine their truth, to translate them accurately into another language, to semantically enhance and expand the knowledge dimensions around BI concepts, and so on). Knowledge representation and understanding is a prerequisite for any kind of intelligent behaviour, which is expected to be provided by BI applications when stimulating overall process knowledge, e.g. while performing a manufacturing process. In other words, we see BI as a knowledge phenomenon that is focused on what needs to be known to (a) stimulate new knowledge about BI and (b) perform it in a way that brings more understanding of its context. H.J. Levesque (Levesque, 2013) suggested that “in the context of question-answering what matters when it comes to the science of AI is not a good semblance of intelligent behaviour at all, but the behaviour itself, what it depends on, and how it can be achieved.” Similarly, our motivation in UNDERSTANDER is to develop a knowledge-based model of BI that can be further parameterised to answer end users' business-related questions. For example, Levesque examines answering certain ad-hoc questions (posed in English) as one of the basic forms of intelligent behaviour. For that purpose, Levesque discusses several behavioural tests, such as the Turing Test, Winograd Schema questions, and John McCarthy's approach to answering questions (Levesque, 2013), and additionally gives general suggestions on Knowledge Representation. The authors in (Horkoff et al., 2012) stated that although BI systems are already widely used in many companies, data still remains the main trap when it comes to its analysis and interpretation:

● “Systems are still very technical and data-oriented,
● hard to understand what the data means,
● hard to design queries or make new reports without technical knowledge or a knowledge of the underlying data structure,
● gap between business and IT-supplied data.”

The gap between the two worlds of business and data remains the greatest barrier to the adoption of BI technology, and the greatest factor in the cost of applying BI technology (Barone et al., 2010). The recent move of data technologies towards Linked Data opens another set of problems. For example, the authors in (Janowicz & Hitzler, 2013) remain sceptical about the existing Linked Data: “By reviewing existing Linked Data, it becomes clear that most of them use ontologies/vocabularies that do not go beyond surface semantics, i.e., the ontologies merely consist of explicit subsumption relations together with some other relations without providing a detailed axiomatization… The lack of a deeper axiomatization prevents any interesting reasoning… Such surface ontologies make the meaningful interpretation and querying of Linked Data a difficult and manually intensive task and reduce ontology alignment techniques to educated guessing.” Inspired by both Levesque and (Horkoff et al., 2012), we take into consideration (a) Levesque’s suggestions and existing approaches to answering questions in Knowledge Representation, and (b) the Business Intelligence Model (BIM) approach (see UNDERSTANDER D.1 for more details) (c.f. http://bin.cs.toronto.edu/home/index.php), to jointly enable CD theory to enter the BI scene in UNDERSTANDER.

1.2. Scope

The scope of our work in UNDERSTANDER WP2 covers (a) question answering methods, such as Winograd Schemas, (b) the BIM approach, (c) the CD concepts, and (d) knowledge taxonomies and ontologies on the Web. We extend our user agents (see UNDERSTANDER D.3 for more details), developed in JADE, to cover the basic concepts of CD theory. Furthermore, we create several CD scripts that describe UNDERSTANDER BI needs, presented through our example of home heating technologies and manufacturers.

1.3. Structure of the Document

Section 2 discusses the state-of-the-art in Knowledge Representation and understanding, considering both statistical and knowledge-based approaches. For example, prominent statistical approaches are based on finding statistical significance (e.g. error rates, probabilities (p-values)), while knowledge-based approaches bring knowledge components and semantics to KRs. Here, we discuss the Turing Test, multiple-choice questions, Winograd Schema questions, the Closed World Assumption (CWA), Automatic Knowledge Acquisition (AKA), and community-based ontology building as existing knowledge acquisition and testing mechanisms. Section 2 continues with a discussion of existing knowledge bases, taxonomies, and ontologies, covering Wikipedia, WordNet, YAGO2, and related projects for constructing large knowledge collections. It also explores several existing ontology-based Question Answering (QA) systems. Section 3 keeps its focus on BI scripts and CD theory in UNDERSTANDER. It gives a quick view of the CD core concepts and several business scripts for the home heating scenario. Section 4 concludes the document.


2. Related Work in Knowledge Representation and Understanding

This section presents related work in Knowledge Representation (KR) technologies, considering both statistical and knowledge-based approaches, and at the same time discussing the most prominent knowledge bases, taxonomies, and ontologies with the potential to contribute to Business Intelligence (BI).

2.1. Statistical Versus Knowledge-based Approaches

So far, AI scientists have been investigating different forms of intelligent behaviour, i.e. behaviour that includes abilities such as learning, emotions, perception, etc. Since the 1990s, there has been an obvious tendency in the AI field to consider intelligent behaviour only in terms of statistical significance (i.e. error rates, probabilities (p-values)) (Manning & Schütze, 1999). For example, the successes of the last several decades in Natural Language Processing (NLP) tasks such as text summarization and question-answering have been based on statistical NLP (Levesque et al., 2012). One serious hurdle is that statistical approaches provide less knowledge-intensive behaviour, which “might produce prodigies at chess, face-recognition, Jeopardy, and so on, that are completely hopeless outside their area of expertise.” (Levesque, 2013) Recent NLP programs still have statistical components at their core, but many of them emphasize the importance of Knowledge Representation (KR) and reasoning; for example, the DARPA Machine Reading Program (Strassel et al., 2010) (Etzioni et al., 2006), Recognizing Textual Entailment (RTE) (Dagan et al., 2006) (Bobrow et al., 2007) (Rus et al., 2007), and the Text Retrieval Conference (TREC, c.f. http://trec.nist.gov/overview.html). We call such an approach, which brings KR components to statistics, knowledge-based. The history of developing the first KR systems starts in the 1970s (Rumelhart & Ortony, 1977): (Bobrow & Norman, 1975) and (Norman, 1975) used the term “schema” for representing knowledge; (Norman et al., 1975) used the term “definition” for the same purpose; (Minsky, 1975) proposed a theory based on “frames”, while (Winograd, 1975) and (Charniak, 1975) largely followed Minsky’s theory. (Schank & Abelson, 1975) (Schank et al., 1975) use the term “script” to refer to one class of schemata and the term “plan” to refer to a class of more abstract schemata. (Rumelhart, 1975) also uses the term “schema” to refer to a set of abstract schemata similar to Schank & Abelson’s plans. Overall, KR has provided numerous knowledge models to date, from frames and KL-ONE (Brachman & Schmolze, 1985) to recent variants of Description Logics (DLs) and ontology languages based on DLs. KR systems and methods are massively used to achieve some sort of (computationally) intelligent behaviour. One of the first computational behaviour tests performed in AI is the
Turing Test (Turing, 1950), named after Alan Turing, who tried to answer whether machines are capable of producing observable intelligent behaviour. Turing suggested the following: “a computational account is adequate if it is able to generate behavior that cannot be distinguished over the long haul from the behaviour produced by people” (Levesque, 2013). The sort of behaviour he had in mind was participating in a natural conversation in English over a teletype, in what he called the Imitation Game. In other words, the computer is said to pass the Turing Test “if no matter how long the conversation, the interrogator cannot tell which of the two participants is the person.” Levesque emphasized that the Turing Test “relies too much on deception”, which is its main deficiency. To pass the Turing Test, “a programme will either have to be evasive (and duck the question) or manufacture some sort of false identity (and be prepared to lie convincingly).” One possible solution to prevent deception and trickery is the captcha (von Ahn et al., 2003), which is “a distorted image of a multidigit number presented to a subject who is then required to identify the number.” (Levesque et al., 2012) Current computers still cannot recognize captchas as humans do. There are several alternatives to the Turing Test, which mainly argue against viewing the Turing Test as the ultimate test of AI, e.g. (Cohen, 2004) (Dennett, 1998) (Ford & Hayes, 1995) (Whitby, 1996). Cohen has suggested a system capable of producing a report on any arbitrary topic, and systems capable of learning world knowledge through reading text (Levesque et al., 2012). A different approach to testing intelligent systems, which uses principles of minimum length learning to develop a test applicable to any intelligence, is presented in (Hernandez-Orallo & Dowe, 2010). Another alternative to the Turing Test is known as multiple-choice questions. Asking multiple-choice questions instead of establishing a real conversation between a person and a computer raises the chances of passing the test.

The author in (Levesque, 2013) gives guidelines on how to carefully create a suite of multiple-choice questions to be resolved by computer programs:

● Make the questions Google-proof. Access to a large corpus of English text data should not by itself be sufficient.

● Avoid questions with a common pattern.
● Watch for unintended bias. The word order, vocabulary, grammar, and so on, all need to be selected very carefully so as not to betray the desired answer.

The Closed World Assumption (CWA) is another way to answer questions that does not require a high level of understanding (Levesque, 2013). The CWA says the following: “If you can find no evidence for the existence of something, assume that it does not exist.” Another alternative to the Turing Test is a Winograd Schema question, named after Terry Winograd (Winograd, 1972). It is “a pair of sentences that differ only in one or two words and contain a referential ambiguity that is resolved in opposite directions in the two sentences” (Levesque et al., 2012). In other words, it is a binary choice question with the following properties (Levesque, 2013):

● Two parties are mentioned in the question (both are males, females, objects, or groups).


● A pronoun is used to refer to one of them (“he,” “she,” “it,” or “they,” according to the parties).

● The question is always the same: what is the referent of the pronoun?
● Behind the scenes, there are two special words for the schema. There is a slot in the schema that can be filled by either word. The correct answer depends on which special word is chosen.
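To make these properties concrete, the following is a minimal, hedged sketch of our own (not part of the deliverable code) that encodes a Winograd Schema as a simple Java data structure and instantiates it with the well-known trophy/suitcase example from (Levesque et al., 2012); the class and field names are purely illustrative.

public class WinogradSchema {

    private final String sentenceTemplate; // contains a [SPECIAL] slot for the special word
    private final String question;         // always: what is the referent of the pronoun?
    private final String[] specialWords;   // the two special words that can fill the slot
    private final String[] answers;        // the referent implied by each special word

    public WinogradSchema(String sentenceTemplate, String question,
                          String[] specialWords, String[] answers) {
        this.sentenceTemplate = sentenceTemplate;
        this.question = question;
        this.specialWords = specialWords;
        this.answers = answers;
    }

    /** The concrete question obtained by filling the slot with special word i. */
    public String instantiate(int i) {
        return sentenceTemplate.replace("[SPECIAL]", specialWords[i]) + " "
                + question.replace("[SPECIAL]", specialWords[i])
                + " (answer: " + answers[i] + ")";
    }

    public static void main(String[] args) {
        WinogradSchema trophy = new WinogradSchema(
                "The trophy doesn't fit in the brown suitcase because it is too [SPECIAL].",
                "What is too [SPECIAL]?",
                new String[] { "big", "small" },
                new String[] { "the trophy", "the suitcase" });
        System.out.println(trophy.instantiate(0)); // ... too big   -> the trophy
        System.out.println(trophy.instantiate(1)); // ... too small -> the suitcase
    }
}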

Sentences similar to Winograd Schemas have been discussed by (Hobbs, 1979), (Caramazza et al., 1977), (Goikoetxea et al., 2008), and (Rohde, 2008). A library of Winograd Schema questions extracted from (Levesque et al., 2012) (Levesque, 2013) is given in Appendix 1. To test system comprehension and intelligence (the ability to reason), many datasets have been created (Cooper et al., 2006). Here are only a few of them:

● the RTE datasets (c.f. the PASCAL Recognizing Textual Entailment Challenge (RTE-7) at TAC 2011: www.nist.gov/tac/2011/RTE/index.html);
● the Choice of Plausible Alternatives (COPA) (Roemmele et al., 2011);
● the FRACAS dataset (a set of instances, each of which is a set of premises and possible conclusions; some sets of premises license the conclusions, others don't; c.f. http://cogcomp.cs.illinois.edu/page/resource_view/10).

Numerous approaches have been proposed to create general-purpose ontologies on top of different KRs (Suchanek et al., 2007). For example, Automatic Knowledge Acquisition (AKA) is a class of approaches focused on extracting knowledge structures automatically from text corpora, e.g. pattern matching, natural language parsing, statistical learning (c.f. (Suchanek et al., 2006) (Etzioni et al., 2004) (Cafarella et al., 2005) (Agichtein & Gravano, 2000) (Snow et al., 2006) (Pantel & Pennacchiotti, 2006) (Cunningham et al., 2002)). Although automatic knowledge acquisition often provides results of remarkable accuracy, the quality is still significantly below that of hand-crafted knowledge bases. Thus, the most successful and widely employed ontologies are still created by humans; for example, WordNet, Cyc, SUMO (c.f. http://suo.ieee.org/SUO/SUMO/), and domain-specific ontologies and taxonomies such as UMLS (c.f. http://www.nlm.nih.gov/research/umls/), the GeneOntology (c.f. http://www.geneontology.org/), etc. Early resources such as the original Cyc and WordNet lack extensional knowledge about individual entities of this world and their relationships (for example, they contain the logical statement that musicians are humans, but they do not know that Leonard Cohen won a Grammy Award). Lately, community-based ontology building approaches have entered the scene, driven by the great success of Wikipedia and novel information extraction algorithms that have revived interest in large-scale knowledge bases and enabled new approaches that could overcome the prior limitations (Hoffart et al., 2012). Notable endeavors of this kind include academic projects such as the Semantic Wikipedia project (Völkel et al., 2006), the DBpedia project (a conversion of Wikipedia into RDF combined with other Linked Data sites to provide extra information, c.f. http://dbpedia.org/About), KnowItAll, Omega, WikiTaxonomy, and YAGO. DBpedia harvests facts from Wikipedia infoboxes at large scale, and also interlinks its entities to other
sources in the Linked-Data cloud. YAGO pays attention to inferring class memberships from Wikipedia category names, and integrates this information with the taxonomic backbone of WordNet. Most of these knowledge bases represent facts in the form of subject-property-object (SPO) triples according to the RDF data model, and some provide convenient query interfaces based on languages like SPARQL (Hoffart et al., 2012). Meanwhile there are also commercial services such as the Freebase project (a community-curated database of well-known people, places and things, c.f. http://www.freebase.com/), Trueknowledge (c.f. trueknowledge.com), Wolfram Alpha (c.f. wolframalpha.com), etc.
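As an illustration of such SPO-style access, here is a minimal sketch of our own (not part of the deliverable code) that sends a SPARQL query to the public DBpedia endpoint using Apache Jena; the endpoint URL, the dbo: property names, and the Jena 3.x API usage are assumptions, and the query mirrors the “people born in Berlin before 1900” example mentioned in Section 2.2.5.

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class DBpediaSpoQuery {

    public static void main(String[] args) {
        // SPO-style query over DBpedia: people born in Berlin before 1900.
        String sparql =
                "PREFIX dbo: <http://dbpedia.org/ontology/> "
              + "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> "
              + "SELECT ?person ?birthDate WHERE { "
              + "  ?person dbo:birthPlace <http://dbpedia.org/resource/Berlin> . "
              + "  ?person dbo:birthDate ?birthDate . "
              + "  FILTER (?birthDate < \"1900-01-01\"^^xsd:date) "
              + "} LIMIT 10";

        Query query = QueryFactory.create(sparql);
        // Assumed public SPARQL endpoint; availability and exact schema may vary.
        try (QueryExecution qe =
                     QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", query)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("person") + " born " + row.get("birthDate"));
            }
        }
    }
}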

2.2 Knowledge Taxonomies and Ontologies on the Web

Web ontologies have become the most widely accepted standard artifacts for representing and reasoning upon knowledge, which is formally rendered as a set of concepts defined either within a given domain or across multiple domains, along with the logical relationships between, and constraints upon, these concepts (Behrendt & Damjanovic, 2013). A widely accepted definition of ontology from the literature is the one given by Tom Gruber (Gruber, 1993): “An ontology is an explicit specification of a conceptualization.” More recently, Tom Gruber has elaborated on that definition and in (Gruber, 2009) writes: “In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members).” The core feature of ontologies is the possibility to define formal models of knowledge domains by combining any type and amount of semantic structures (such as taxonomies, meronomies, and non-hierarchical relations such as equals and opposites) on the basis of relationships between “things”. All such “things” are then referenced by identifiers complying with a uniform mechanism, i.e., Uniform Resource Identifiers (URIs). Each ontology provides the vocabulary (or labels) for referring to the terms in a subject area, as well as the logical statements that describe what the terms are, how they are related to each other, etc. One of the central roles of ontologies is to establish further levels of interoperability, i.e. semantic interoperability, between agents and applications on the emerging Semantic Web (Lee, 2001), as well as to add further representation and inference layers on top of the Web’s current layers (Decker, 2000), (Hendler, 2001). (Staab & Studer, 2009) gives a comprehensive overview of ontologies. Overall, ontologies can be hand-crafted or constructed in a (semi-)automated manner. Prominent examples of hand-crafted knowledge resources are Cyc (Lenat, 1995), WordNet (Fellbaum, 1998), and SUMO (Niles & Pease, 2001), as well as more recent ontologies such as GeoWordNet (Giunchiglia et al., 2010). While existing hand-crafted approaches have near-perfect precision, they cannot achieve the large-scale coverage of automatically constructed ontologies. In contrast to them, most automated approaches have drawn from semi-structured elements in Wikipedia and other Web
sources: infoboxes, category names, tables, lists, etc. The literature, including (AKBC, 2010) (Doan et al., 2008) (Weikum & Theobald, 2010), gives an overview of recent work on ontologies on the Web. The rest of this section overviews existing online ontologies/taxonomies, such as Wikipedia, WordNet, YAGO, and YAGO2, and related projects (e.g. DBpedia).

2.2.1 Wikipedia

Wikipedia is a free multilingual, Web-based encyclopedia (Suchanek et al., 2007), written collaboratively by volunteers. Each Wikipedia article is a single Web page and describes a single topic or entity. The majority of Wikipedia pages have been manually assigned to one or multiple categories. A Wikipedia page can also have an infobox, which is a standardized table with information about the entity described in the article. For example, there is a standardized infobox for people that contains the birth date, the profession, and the nationality. Other widely used infoboxes exist for cities, music bands, companies, etc. Several projects have constructed a taxonomy from the Wikipedia category system alone (Ponzetto & Navigli, 2009a) (Ponzetto & Strube, 2011) (Ponzetto & Strube, 2007) (Zirn et al., 2008) (Nastase et al., 2010). For example, WikiTaxonomy (Ponzetto & Strube, 2007) (with improvements in (Ponzetto & Strube, 2011)) introduced the idea of arranging the categories of Wikipedia into a hypernymy (type-of) hierarchy. The approach restricts itself to the Wikipedia categories only. The noisy and inconsistent nature of Wikipedia's non-leaf categories leads WikiTaxonomy to construct a many-rooted taxonomy (with several thousand unrelated roots). A number of works have followed up on the WikiTaxonomy project. Zirn et al. (Zirn et al., 2008) take the WikiTaxonomy as input to decide whether a leaf node is an instance or a class. Similarly to WikiTaxonomy, the WikiNet (Nastase et al., 2010) project extracts a concept tree from Wikipedia categories. WikiNet does not distinguish between classes and instances. WikiNet also extracts a rich set of relationships between entities. In contrast, the YAGO knowledge base (Hoffart et al., 2012) selects only those Wikipedia categories that are classes and takes the instances from the Wikipedia articles instead. (Hoffmann et al., 2009) developed a human-machine collaborative system to complete infobox information by taking sentences from the Wikipedia page, using the KYLIN system (Wu & Weld, 2007). Kylin is an Information Extraction (IE) system that extracts training data by analysing Wikipedia infoboxes and collects the best sentences in the article body that represent a field in the infobox. In addition, Kylin makes suggestions to the user for adding missing fields in the infobox. Madwiki (DeRose et al., 2008) stores information found on the Web in structured databases, using attribute/value slot pairs (Suchanek et al., 2007). Views of these databases are generated as Wikipages, so that users can add and manage information. When changes
are made, Madwiki takes control and synchronizes the users' information with the structured databases, fixing inconsistencies. In addition, Madwiki uses the idea of a path view language to navigate from one node to another in the database representation.

2.2.2 WordNet

WordNet is a semantic lexicon for the English language developed at the Cognitive Science Laboratory of Princeton University (Fellbaum, 1998). WordNet distinguishes between words as literally appearing in texts and the actual senses of the words. A set of words that share one sense is called a synset. There exist several projects that deal with mapping Wikipedia categories to WordNet senses, as presented in (Ponzetto & Navigli, 2009a) (Ponzetto & Navigli, 2010) (Toral et al., 2009):

● the authors in (Ponzetto & Navigli, 2010) map the Wikipedia categories of WikiTaxonomy to WordNet concepts. They found that the most frequent names heuristic has a precision of 75%, while their techniques improve precision;

● In a similar spirit, the authors in (Toral et al., 2009) map Wikipedia categories to WordNet nouns. They reported a precision of 77%.

Furthermore, WordNet++ (c.f. http://lcl.uniroma1.it/wordnetplusplus/) binds Wikipedia pages about common nouns (such as “soda drink”) to the corresponding WordNet concepts. BabelNet (a large multilingual encyclopedic dictionary and ontology, c.f. http://babelnet.org/) maps Wikipedia articles to WordNet, enhancing them with multilingual concepts. BabelNet does not contain facts about entities (other than lexical, taxonomic, and unspecified unlabeled relations). UWN and MENTA (Melo & Weikum, 2009) (Melo & Weikum, 2010) have added a multilingual dimension to entity and concept names, and also to the class system.

2.2.3 YAGO2

YAGO2 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames (c.f. http://www.mpi-inf.mpg.de/yago-naga/yago/). Currently, it contains “more than 10 million entities (such as persons, organizations, cities, etc.) and more than 120 million facts about these entities.” Linking is first done by connecting Wikipedia entities via a type (instanceOf) relationship to suitable Wikipedia leaf category classes (Hoffart et al., 2012). Secondly, YAGO2 links its Wikipedia leaf category classes by a subclassOf relationship to suitable WordNet classes (Fellbaum, 1998). For that purpose, it uses the following algorithm, as presented by (Suchanek et al., 2007) (a hedged Java sketch of this heuristic is given after the list below):

Function wiki2wordnet(c)
Input: Wikipedia category name c
Output: WordNet synset
1  head = headCompound(c)
2  pre  = preModifier(c)
3  post = postModifier(c)
4  head = stem(head)
5  If there is a WordNet synset s for pre + head
6      return s
7  If there are WordNet synsets s1, ..., sn for head
8      (ordered by their frequency for head)
9      return s1
10 fail

Lines 1-3 determine the head compound, the pre-modifier and the post-modifier of the category name. For example, for the Wikipedia category “American people in Japan”, these are ”American”, ”people” and ”in Japan”, respectively. Lines 4-6 check whether there is a WordNet synset for the concatenation of pre-modifier and head compound (i.e. “American person”). In line 7, the head compound (person) has to be mapped to a corresponding WordNet synset (s1, ..., sn). The authors in (Suchanek et al., 2007) found that mapping the head compound simply to the most frequent synset (s1) yields the correct synset in the overwhelming majority of cases. This way, the Wikipedia class “American people in Japan” becomes a subclass of the WordNet class person/human. In that way, YAGO reuses WordNet and enriches it with the leaf categories from Wikipedia (Hoffart et al., 2012). Its compatibility with WordNet allows easy linkage and integration with other resources; for example, Universal WordNet (Melo & Weikum, 2009). YAGO presents facts as triples of subject-predicate-object (SPO in short), in a way compatible with the RDF data model. YAGO makes extensive use of reification: every fact (SPO triple) is given an identifier, and this identifier becomes either the subject or the object of other facts. The new YAGO2 knowledge model extends the SPO triple model to time and space (the authors call these SPOTL(X) tuples) and provides the following (Hoffart et al., 2012):

● an extensible framework for fact extraction, which can tap on infoboxes, lists, tables, categories, and regular patterns in free text, and allows fast and easy specification of new extraction rules;

● an extension of the knowledge representation model tailored to capture time and space, as well as rules for propagating time and location information to all relevant facts;

● methods for gathering temporal facts from Wikipedia and for seamlessly integrating spatial types and facts from GeoNames (c.f. http://geonames.org ), in an ontologically clean and highly accurate manner;

● a new SPOTL(X) representation of spatio-temporally enhanced facts, with expressive and easy-to-use querying;

● exemplary demonstrations of the added value obtained by the spatio-temporal knowledge in YAGO2, by showing how this aids in extrinsic tasks, like question answering and named entity disambiguation.
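Returning to the wiki2wordnet heuristic shown in the listing above, the following is a hedged Java sketch of the same mapping idea. The WordNetLookup interface and the naive noun-group helpers are hypothetical placeholders of our own; a real implementation would rely on a proper WordNet API and a noun-group parser, as described in (Suchanek et al., 2007).

import java.util.List;

public class Wiki2WordNetSketch {

    /** Hypothetical WordNet access: synset ids for a noun, ordered by sense frequency. */
    public interface WordNetLookup {
        List<String> synsetsByFrequency(String noun);
    }

    private final WordNetLookup wordNet;

    public Wiki2WordNetSketch(WordNetLookup wordNet) {
        this.wordNet = wordNet;
    }

    /** Maps a Wikipedia category name (e.g. "American people in Japan") to a synset id. */
    public String map(String category) {
        String[] parts = splitNounGroup(category);       // lines 1-3 of the pseudocode
        String pre = parts[0];                           // e.g. "American"
        String head = stem(parts[1]);                    // line 4: "people" -> "person"

        List<String> exact = wordNet.synsetsByFrequency((pre + " " + head).trim());
        if (!exact.isEmpty()) {                          // lines 5-6: e.g. "American person"
            return exact.get(0);
        }
        List<String> byHead = wordNet.synsetsByFrequency(head);
        return byHead.isEmpty() ? null : byHead.get(0);  // lines 7-9: most frequent sense
    }

    // Naive placeholder parsing: everything after " in "/" of "/" from "/" by " is treated
    // as the post-modifier, the last remaining token is the head compound, and the rest
    // is the pre-modifier.
    private static String[] splitNounGroup(String category) {
        String core = category.split(" in | of | from | by ")[0].trim();
        int cut = core.lastIndexOf(' ');
        return cut < 0 ? new String[] { "", core }
                       : new String[] { core.substring(0, cut), core.substring(cut + 1) };
    }

    // Extremely naive stemming placeholder; "people" -> "person" is special-cased.
    private static String stem(String noun) {
        if (noun.equalsIgnoreCase("people")) {
            return "person";
        }
        return noun.endsWith("s") ? noun.substring(0, noun.length() - 1) : noun;
    }
}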

YAGO2 excludes attributes that contain natural language text, and contains mostly entities which are not present in Wikipedia. Hence, we additionally discuss two projects: (i) IBM Watson, which is capable of answering questions posed in natural language (developed in the IBM DeepQA project), and (ii) Apache Stanbol, which semantically enhances natural language text (developed in the FP7 IKS project; c.f. http://stanbol.apache.org/).

2.2.4 More taxonomies and ontologies

Cyc (Lenat, 1995) has attempted to populate its semantic classes with instances gathered from the Web (Shah et al., 2006) (Hoffart et al., 2012). However, that work reported only very small coverage; the commercial products of Cycorp Inc. may have higher coverage, but no details have been published. Freebase (c.f. freebase.com) and Trueknowledge (c.f. trueknowledge.com) are more recent endeavors to build large-scale knowledge bases, tapping into Wikipedia as well as other sources. Both of them are also of a commercial nature. Newer work (e.g. Navigli et al., 2011) has addressed the issue of taxonomy generation from the Web on a larger scale.

2.2.5 Related Projects

There is a variety of academic projects for constructing large knowledge collections using information extraction techniques on Web sources; for example: KnowItAll and its successor TextRunner (Etzioni et al., 2005) (Banko et al., 2007), the DBpedia project (Auer et al., 2007), the Omnivore system (Cafarella, 2009), work on distilling Web tables and lists into facts (Cafarella et al., 2008) (Limaye et al., 2010) (Venetis et al., 2011), the ReadTheWeb project (Carlson et al., 2010), the StatSnowball methods used for building Entity-Cube (Zhu et al., 2011) and its follow-up project Probase (Wu et al., 2011), WikiNet (Nastase et al., 2010), SOFIE (Suchanek et al., 2009), Prospera (Nakashole et al., 2011), and others. Most of these approaches produce outputs in non-canonical form, with surface names and textual patterns, rather than canonicalized entities and typed relations. DBpedia is a knowledge base extracted from Wikipedia infoboxes and wiki markup. It allows for performing complex semantic queries and inferring new relations that are missing in Wikipedia. For example, DBpedia supports complex semantic queries such as “people born in Berlin before 1900.” The authors in (Torres et al., 2012) propose an approach to inject DBpedia information into Wikipedia, known as the Path Indexing Algorithm (PIA), which takes the result set of a DBpedia query and discovers the best representative navigational path in Wikipedia. Paths are generalized by replacing properties of source and range articles with wildcards, forming path queries. In addition, DBpedia has placed emphasis on high recall from the infoboxes, and makes use of the YAGO taxonomy. Recently, UWN and MENTA (Melo & Weikum, 2009) (Melo & Weikum, 2010) have added a multilingual dimension to entity and concept names, and also to the class system. The Kylin/KOG project (Wu & Weld, 2008) has developed learning-based methods for automatically typing Wikipedia entities and generating infobox-style facts. However, this project has not led to a publicly available knowledge base. Omega (Philpot et al., 2008) integrated WordNet with separate upper-level ontologies and populated various classes with instance collections, including locations from geo gazetteers. SWETO (Aleman-Meza et al., 2004) is a tool suite for
building knowledge bases in a semi-automatic manner. The YAGO2 project constructs a taxonomy automatically from Wikipedia and WordNet. WikiTaxonomy (see Section 2.2.1) and YAGO (see Section 2.2.3) have emphasized high precision from Wikipedia categories, aligning this with WordNet into a much richer class system. Apache Stanbol (c.f. https://stanbol.apache.org/) extends existing Content Management Systems (CMSs) with semantic services (Behrendt & Damjanovic, 2013). It can also be used for tag extraction/suggestion, text completion in search fields, smart content workflows, email routing based on extracted entities and topics, etc. It is built in a modular fashion, as Figure 1 shows. Each component is accessible via its own RESTful web interface.

Figure 1. Apache Stanbol Components

The main components of Apache Stanbol are:

● Content Enhancement: It includes services which add semantic information to “non-semantic” pieces of content. Apache Stanbol Enhancer provides both a RESTful API and a Java API which allow a caller to extract features from passed content. In the case of the RESTful API, Apache Stanbol takes the content and delivers it to a configurable chain of enhancement engines (a minimal example call is sketched after this list). There exist several preprocessing engines, e.g. engines for converting the content into the correct format, extracting semantic metadata about the content, language detection, sentence detection, extracting entities such as persons and places directly from the text, etc.

● Reasoning: Apache Stanbol Reasoner provides a set of services that take advantage of automatic inference engines. It implements a common API for reasoning services, providing the possibility to run different reasoners in parallel. It includes OWL API- and Jena-based abstract services, which provide implementations for the Jena RDFS, OWL, OWLMini, and HermiT reasoning services. Apache Stanbol Reasoner can be used to automatically infer additional knowledge and obtain new facts in the knowledge base. Apart from reasoning, Apache Stanbol Rules supports the construction and execution of inference rules. An inference rule, or transformation rule, is a syntactic rule or function which takes premises and returns a conclusion. It adds a layer for expressing business logic by means of axioms encoded as inference rules. Axioms can be organized into a container called a recipe, which identifies a set of rules that share the same business logic and are interpreted as a whole. Rules can be expressed and processed in three different formats: SWRL, Jena rules, and SPARQL. Apache Stanbol Rules allows for integrity checks on heterogeneous data coming from external sources, to prevent unwanted formats and inconsistent data.

● Knowledge Models: It includes services used to define and manipulate the data models (e.g. ontologies). Apache Stanbol Ontology Manager provides a suite of functionalities such as retrieval, aggregation, loading, and concurrent management of knowledge bases. It provides a controlled environment for managing ontologies, ontology networks, and user sessions for semantic data. It provides full access to ontologies, which are stored in the Apache Stanbol persistence layer. Managing an ontology network means activating or deactivating parts of a complex model, so that data can be viewed and classified under different "logical lenses".

● Persistence: It includes services that store semantic information; for example, enhanced content, entities, facts.
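As announced in the Content Enhancement item above, here is a minimal sketch of our own (not part of the deliverable code) that posts plain text to the Stanbol Enhancer's RESTful API; the endpoint URL assumes a locally running Stanbol instance with the default enhancer chain, and the requested RDF serialization is illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StanbolEnhancerClient {

    public static void main(String[] args) throws Exception {
        // Assumed default endpoint of a locally running Stanbol enhancer chain.
        URL endpoint = new URL("http://localhost:8080/enhancer");
        String content = "Bosch manufactures geothermal heat pumps in Germany.";

        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        // Ask for the enhancement results as RDF (Turtle); other serializations are possible.
        conn.setRequestProperty("Accept", "text/turtle");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }

        // The response contains the RDF enhancement structure (e.g. fise:TextAnnotation
        // and fise:EntityAnnotation resources) describing detected entities and topics.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}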

2.3 Ontology-based Question Answering Systems (QA)

The authors in (Lopez et al., 2011) survey the QA research field, “from influential works from the AI and database communities developed in the 70s and later decades, through open domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic [ontology-based] QA solutions.” They also give an outlook for the QA research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large scale, heterogeneous, and continuously evolving sources on the Semantic Web (SW), as well as Linked Open Data (LOD). In addition, they agreed that ontology-based QA is a promising direction towards the realization of user-friendly interfaces. Traditionally, QA approaches have largely been focused on retrieving answers from raw text, with the emphasis on using ontologies to mark-up Web resources and improve retrieval by using query expansion (McGuinness, 2004). The novelty of the ontology-based QA trend is to exploit the SW information for making sense of, and answering, user queries. The authors in (Lopez et al., 2011) classify a QA system according to four dimensions:

● the type of questions (input): e.g. systems capable of processing factual questions (factoids), systems enabling reasoning mechanisms, systems that fuse answers from different sources, interactive (dialog) systems, analogical reasoning systems;
● the sources (unstructured data such as documents, or structured data in a semantic or non-semantic space);
● the scope (domain-specific, open-domain); and
● the traditional intrinsic problems derived from the search environment.

Furthermore, the authors in (Lopez et al., 2011) note that ontology-based QA systems vary on two main aspects: (1) the degree of domain customization they require, which correlates with their retrieval performance, and (2) the subset of NL they are able to understand in order to reduce both complexity and the habitability problem, as discussed in (Kaufmann & Bernstein, 2007). Some of the most prominent semantic QA systems are as follows:

● AquaLog (Lopez et al., 2007): ontology independent; it uses the GATE infrastructure and resources (Cunningham et al., 2002) to obtain a set of linguistic annotations which are further extended by JAPE grammars to identify terms, relations, question indicators, and features, and to classify the query into a category. Knowing the category and the GATE annotations for the query, the Linguistic Component creates the linguistic triples, known as Query-Triples. Query-Triples are further processed and interpreted by the Relation Similarity Service, which maps the Query-Triples to ontology-compliant Onto-Triples, from which an answer is derived.

● QACID (Fernandez et al., 2009) relies on the ontology, a collection of user queries, and an entailment engine that associates new queries to a cluster of existing queries.

● ORAKEL (Cimiano et al., 2007) is a NL interface that translates factual wh-queries into F-logic or SPARQL and evaluates them with respect to a given KB. Its main feature is that it makes use of a compositional semantic construction approach, thus being able to handle questions involving quantification, conjunction, and negation. WordNet is used to suggest synonyms (in the most frequent sense of the word) for the verb or noun currently edited.

● e-Librarian (Linckels, 2005): the system relies on simple, string-based comparison methods (e.g., edit distance metrics) and a domain dictionary to look up lexically related words (synonyms), because general-purpose dictionaries like WordNet are often not appropriate for specific domains. It does not return the answer to the user’s questions, but it retrieves the most pertinent document(s) in which the user finds the answer to her question.

● NLP-Reduce (Kaufmann et al., 2007) allows almost any NL input (from ungrammatical inputs, like keywords and sentence fragments, to full English sentences). It processes NL queries as bags of words, employing only two basic NLP techniques: stemming and synonym expansion.

● Querix (Kaufmann et al., 2006): it uses the Stanford parser to analyze the input query, then, from the parser’s syntax tree, extended with WordNet synonyms, it identifies triple patterns for the query. These triple patterns are matched in the synonym-enhanced KB by applying pattern matching algorithms. When a KB is chosen, the RDF triples are loaded into a Jena model, using the Pellet reasoner to infer all implicitly defined triples and WordNet to produce synonym-enhanced triples.

● GINSENG (Bernstein et al., 2006) controls a user’s input via a fixed vocabulary and predefined sentence structures through menu-based options. GINSENG uses a small static grammar that is dynamically extended with elements from the loaded ontologies and allows an easy adaptation to new ontologies.

● PANTO (Wang et al., 2007) takes a NL question as input and executes a corresponding SPARQL query on a given ontology model. It relies on the statistical Stanford parser to create a parse tree of the query, from which triples are generated. These are mapped to the triples in the lexicon. The lexicon is created when a KB is loaded into the system, by extracting all entities enhanced with WordNet synonyms.

● QuestIO (Tablan et al., 2008) translates NL queries into formal queries, but the system is reliant on the use of gazetteers initialized for the domain ontology.

● FREyA (Damljanovic et al., 2010) is the successor to QuestIO, providing improvements with respect to a deeper understanding of a question's semantic meaning and better handling of ambiguities when ontologies span diverse domains. FREyA allows users to enter queries in any form; it uses the Stanford parser. Triples are generated from the ontological mappings, taking into account the domain and range of the properties. In addition, FREyA is able to query large, heterogeneous and noisy single sources (or ontological graphs) covering a variety of domains, such as DBpedia (Bizer et al., 2009).

● ONLI (Mithun et al., 2006) is a QA system used as a front-end to the RACER reasoner. ONLI transforms the user's NL queries into the nRQL query format, which supports the <argument, predicate, argument> triple format. It accepts queries with quantifiers and number restrictions.

● SemanticQA (Tartir et al., 2010) makes it possible to complete partial answers from a given ontology with Web documents. SemanticQA assists the users in constructing an input question as they type, by presenting valid suggestions in the universe of discourse of the selected ontology, whose content has been previously indexed with Lucene. The matching of the question to the ontology is performed by exhaustively matching all word combinations in the question to ontology entities. All generated ontological triples are further combined into a single SPARQL query.

PowerAqua (Lopez, Sabou et al., 2009) evolved from the AquaLog system to the case of multiple heterogeneous ontologies. It performs QA over structured data in an open domain scenario, allowing the system, on the one hand, to benefit from the combined knowledge of the wide range of ontologies autonomously created on the SW, reducing the knowledge acquisition bottleneck problem typical of KB systems, and, on the other hand, to answer queries that can only be solved by composing information from multiple sources.


3. Business Intelligence Scripts and Conceptual Dependency Theory

The authors in (Lopez et al., 2011) emphasized that “the techniques used to solve the lexical gap between the users and the structured knowledge are largely comparable across all systems: off-the-shelf parsers and shallow parsing are used to create a triple-based representation of the user query, while string distance metrics, WordNet, and heuristics rules are used to match and rank the possible ontological representations.” They summarized the effects of various QA systems on scalability as follows (Lopez et al., 2011):

● traditional NLIDB (Natural Language Interfaces to Databases) systems are not easily adaptable or portable to new domains;

● open QA systems over free text require complicated design and extensive implementation efforts “due to the high linguistic variability and ambiguity they have to deal with to extract answers from very large open-ended collections of unstructured text.”;

● ontology-based QA systems “are limited by the single ontology assumption and they have not been evaluated with large-scale datasets.”;

● proprietary QA systems use their own encoding of the sources, so they cannot be considered interfaces to the SW;

● open semantic QA approaches “can in principle scale to the Web and to multiple repositories in the SW in a potentially wide number of domains. However, semantic indexes need to be created offline for both ontologies and documents.”

In UNDERSTANDER, we follow the open semantic QA approach, while exploring how Conceptual Dependency (CD) theory and its verb-oriented organization of knowledge can improve the searching capabilities of user agents. Our motivation is also to be able to recognize real-world situations from web content. Hence, we aim at web understanding in some rudimentary form, and we aim at both understanding content and generating semantic annotations for it. We call our approach WebCDR (Web Conceptual Dependency Reasoner); it categorizes web content according to specific world knowledge. In order to connect WebCDR with agent models, WebCDR goals should be usable as agent goals and the various CD primitives should be usable as percepts of the agents. Likewise, we should be able to constrain the environment by describing it in terms of scripts that can be triggered. As discussed in D.1 “Domain Vocabulary for Business Intelligence”, CD theory is one of three UNDERSTANDER technology pillars. D.1 presents the basis of CD and its integration with the UNDERSTANDER domain vocabulary for BI (a knowledge base implemented in Prolog). In this section, we only briefly look at the basics of CD theory, and continue with the extension of our knowledge base (previously defined in D.4 “Business Intelligence Knowledge Base”) to support user agents using CD primitives and scripts.

3.1 A Brief Overview of Conceptual Dependency

Conceptual Dependency (CD) is a theory of verb-oriented organisation of knowledge proposed by (Schank & Abelson, 1975). The authors in (Lytinen, 1992) have summarised the work of Schank & Abelson and their successors. Hence, we only briefly summarize the main aspects of CD, while referring the readers to the survey by (Lytinen, 1992). Lytinen points out that CD assumes a richly structured knowledge base and relatively simple processing rules: “The result (of using CD theory) is simpler processing theories which are highly dependent on a very rich, and highly organized, knowledge base.” In CD, there exist 11 primitives representing the semantics of possible actions:

Physical Actions:

● INGEST: to take something inside an animate object;
● EXPEL: to take something from inside an animate object, or force it out;
● GRASP: the grasping of an object by an actor so that it may be manipulated;
● MOVE: the movement of a body part of an agent;
● PROPEL: the application of physical force to an object;

State Changes:
● PTRANS: to change the location of a physical object;
● ATRANS: to change an abstract relationship of a physical object;

Mental Actions:
● MTRANS: to transfer information between agents mentally;
● MBUILD: to create or combine thoughts or new information by an agent;

Instruments for other Actions:
● ATTEND: the act of focussing attention of a sense organ toward an object;
● SPEAK: the act of producing sound, including non-communicative sounds.

In addition, CD distinguishes between the following conceptual categories:

● PP - picture producer; physical object;
● ACT - one of eleven primitive actions;
● LOC - location;
● T - time;
● AA - action aider; modifications of features of an ACT;
● PA - attributes of an object of the form STATE(VALUE).
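As a compact machine-readable reference, the following is a hedged sketch of our own that enumerates the eleven CD primitives and the conceptual categories exactly as listed above; such an enumeration could later back the JADE ontology extension described in Section 3.2, but it is not part of the deliverable code.

public final class ConceptualDependency {

    /** The eleven CD primitives, grouped by the kind of action they denote. */
    public enum Primitive {
        // Physical actions
        INGEST, EXPEL, GRASP, MOVE, PROPEL,
        // State changes
        PTRANS, ATRANS,
        // Mental actions
        MTRANS, MBUILD,
        // Instruments for other actions
        ATTEND, SPEAK
    }

    /** CD conceptual categories. */
    public enum Category {
        PP,   // picture producer; physical object
        ACT,  // one of the eleven primitive actions
        LOC,  // location
        T,    // time
        AA,   // action aider; modification of features of an ACT
        PA    // attributes of an object of the form STATE(VALUE)
    }

    private ConceptualDependency() { }
}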

Prof. Jan Wiebe (University of Pittsburgh) listed the following weaknesses and strengths of CD (Wiebe, 2013):

● CD weaknesses:
  ● Incompleteness: no ontology; missing entire conceptual areas; no method for handling quantification;
  ● Primitives: are the primitives really atomic? Primitives can be composed of other primitives; they may be the wrong set of primitives;
  ● No higher-level concepts;
  ● Many inferences are not organized around primitives.

● CD strengths:
  ● Words are decomposed into primitives, so that language processing focuses on general concepts instead of individual words;
  ● The canonical representation captures commonality across different words and structures;
  ● The interlingual representation facilitates machine translation;
  ● Words trigger CD frames that provide predictions about what will come next. This identifies conceptual roles and helps disambiguation; it also suggests how we can finish other people’s sentences;
  ● Facilitates inference: many inferences are already contained in the representation itself, and we can infer properties of unknown words.

As one of the first perceived weaknesses of CD is its incompleteness, in the sense of a missing ontology and missing conceptual areas, our idea in UNDERSTANDER is to support CD theory ontologically and to implement its primitives through the agents' ontologies in JADE.

3.2 User Agents Support for Conceptual Dependency Scripts

Our motivation for introducing CD theory to user agents in UNDERSTANDER is to explore the following question: does CD help in adding relatively sophisticated semantic descriptions to web content? The implementation of CD primitives in JADE refers us back to the Home Heating ontologies, as described in D.4 “Business Intelligence Knowledge Base”. Hence, this report of the UNDERSTANDER project discusses the extensions of the previously designed Home Heating knowledge base to additionally support CD scripts and the required CD primitives. First, let us consider the following simple example of CD primitives in use in UNDERSTANDER: “Andrew searches for new home heating technologies which run on geothermal power.” Andrew builds the mental state (by himself) that is a CD plan for the searching action. It can be implemented using the MBUILD mental action, which here is about “focusing attention towards an object”. The CD script can be written in Prolog-like notation as follows:

MBUILD(present, Andrew, HeatingTechnologies(GeothermalPower)).

MBUILD is one of the 11 CD primitives for representing the semantics of possible actions (in our case, the action is “paying attention towards...”). The CD conceptual role is an actor (for example, Andrew), and the CD conceptual category is an object (for example, the set of heating technologies that satisfies the search criterion). In addition, the purpose of the UNDERSTANDER knowledge base is to support the searching functionality of user agents, which in CD theory refers to the “mental actions” primitive MBUILD (creating or combining thoughts or new information by an agent). In other words, UNDERSTANDER user agents are not supposed to produce sound (SPEAK), take something inside an animate object (INGEST), or change the location of a physical object (PTRANS). Therefore, our focus when supporting CD scripts for BI in UNDERSTANDER is on the mental action primitive MBUILD. The CD primitive MBUILD requires a subject (Actor), an object (MentalObject) and a secondary object (From(Actor)). These correspond to the concept Searching (for MBUILD in CD) and the concepts HomeHeatingManufacturers and HomeHeatingTechnologies (for MentalObject in CD). Note that we are missing the CD conceptual role Actor in our current implementation of the Home Heating ontologies in JADE. Hence, the extension of the Home Heating ontologies in JADE includes the addition of a new concept implementing the CD conceptual role Actor in UNDERSTANDER. For that purpose, we firstly extend the ontology vocabulary, which specifies the terminology of the Concept and AgentAction interfaces. In HomeHeatingVocabulary, we're adding the following lines of code describing the CD conceptual role Actor.

public interface HomeHeatingVocabulary {
    …
    //-------> Ontology vocabulary
    public static final String CD_ACTOR = "CD Actor";
    public static final String CD_ACTOR_ID = "cdActorId";
    public static final String CD_ACTOR_NAME = "cdActorName";
    …
}

Secondly, in HomeHeatingOntology, we add the following code:

public class HomeHeatingOntology extends Ontology implements HomeHeatingVocabulary {
    …
    // CD Conceptual Role Actor: register the CDActor concept schema with two
    // mandatory string slots (identifier and name).
    add(cs = new ConceptSchema(CD_ACTOR), CDActor.class);
    cs.add(CD_ACTOR_ID, (PrimitiveSchema) getSchema(BasicOntology.STRING), ObjectSchema.MANDATORY);
    cs.add(CD_ACTOR_NAME, (PrimitiveSchema) getSchema(BasicOntology.STRING), ObjectSchema.MANDATORY);
    …
}


Finally, we define the class CDActor, which implements the CD conceptual role Actor.

public class CDActor implements HomeHeatingVocabulary, Concept {

    // CD conceptual role Actor, identified by an id and a human-readable name.
    private String cdActorId;
    private String cdActorName;

    public String getActorsName() { return cdActorName; }
    public String getActorsId() { return cdActorId; }

    public void setActorsName(String cdActorName) { this.cdActorName = cdActorName; }
    public void setActorsId(String cdActorId) { this.cdActorId = cdActorId; }

    // Two actors are considered equal if they share the same id.
    public boolean equalsCDActor(CDActor cdrole) {
        return cdrole.getActorsId().equals(this.cdActorId);
    }

    public String toString() {
        return cdActorName + " # " + cdActorId;
    }
}
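As a minimal usage sketch (not part of the current knowledge base code), the following shows how a user agent might register the extended ontology and instantiate the new CDActor concept in JADE. The agent class SearchAgent and the singleton accessor HomeHeatingOntology.getInstance() are assumptions introduced here for illustration; only CDActor and its accessors come from the code above.

import jade.content.lang.sl.SLCodec;
import jade.core.Agent;

// Illustrative sketch only: SearchAgent and HomeHeatingOntology.getInstance()
// are assumed names, not part of the existing Home Heating ontologies.
public class SearchAgent extends Agent {

    protected void setup() {
        // Register a content language and the extended Home Heating ontology
        // (assuming the usual JADE singleton pattern for the ontology class).
        getContentManager().registerLanguage(new SLCodec());
        getContentManager().registerOntology(HomeHeatingOntology.getInstance());

        // Create the CD conceptual role Actor for the example scenario.
        CDActor andrew = new CDActor();
        andrew.setActorsId("actor-001");
        andrew.setActorsName("Andrew");

        System.out.println("Registered CD actor: " + andrew); // prints "Andrew # actor-001"
    }
}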

3.3 Conceptual Dependency Scripts in UNDERSTANDER

Here follows a collection of CD scripts that relate to the home heating scenario in UNDERSTANDER, as described in Section 3 of D.4 "Business Intelligence Knowledge Base".

"Andrew searches for new home heating technologies which run on geothermal power." The CD script can be written in Prolog-like notation as follows:

MBUILD(present, Andrew, HeatingTechnologies(GeothermalPower)).


MBUILD represents the semantics of possible actions, e.g. searching for something specific. Andrew is a CD actor, and HomeHeatingTechnologies is a CD object.

"Andrew retrieves all home heating manufacturers from Germany." The CD script can be written in Prolog-like notation as follows:

MBUILD(present, Andrew, HeatingManufacturers(Germany)).

"Andrew retrieves all home heating manufacturers from Germany that deal with the home heating technologies which run on geothermal power." The CD script can be written in Prolog-like notation as follows:

MBUILD(present, Andrew, HeatingManufacturers(Germany), HeatingTechnologies(GeothermalPower)).
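To make the shape of these scripts concrete on the Java side, the sketch below shows one possible plain-object representation of the three MBUILD scripts above, before they are mapped onto the JADE ontology. The class names CDScript and MentalObject, and the whole CDScriptExamples class, are hypothetical illustrations and are not part of the current Home Heating ontologies.

import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal Java representation of the MBUILD-based CD scripts above.
// CDScript and MentalObject are illustrative names only.
public class CDScriptExamples {

    // A mental object such as HeatingTechnologies(GeothermalPower).
    static class MentalObject {
        final String concept;
        final String restriction;
        MentalObject(String concept, String restriction) {
            this.concept = concept;
            this.restriction = restriction;
        }
        public String toString() { return concept + "(" + restriction + ")"; }
    }

    // An MBUILD script: CD primitive, tense, actor and one or more mental objects.
    static class CDScript {
        final String primitive;
        final String tense;
        final String actor;
        final List<MentalObject> objects;
        CDScript(String primitive, String tense, String actor, MentalObject... objects) {
            this.primitive = primitive;
            this.tense = tense;
            this.actor = actor;
            this.objects = Arrays.asList(objects);
        }
        // Printed form is close to, but not identical with, the Prolog-like notation.
        public String toString() {
            return primitive + "(" + tense + ", " + actor + ", " + objects + ")";
        }
    }

    public static void main(String[] args) {
        CDScript[] scripts = {
            new CDScript("MBUILD", "present", "Andrew",
                    new MentalObject("HeatingTechnologies", "GeothermalPower")),
            new CDScript("MBUILD", "present", "Andrew",
                    new MentalObject("HeatingManufacturers", "Germany")),
            new CDScript("MBUILD", "present", "Andrew",
                    new MentalObject("HeatingManufacturers", "Germany"),
                    new MentalObject("HeatingTechnologies", "GeothermalPower"))
        };
        for (CDScript s : scripts) {
            System.out.println(s);
        }
    }
}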


4. Conclusions

We adopt this line of research that employs CD theory for BI, and develop it further, because it looks promising from a pragmatic, software engineering point of view. We do not make any claims about cognitive science or computational linguistics, but we do claim that these models may well constitute a methodologically adequate approach to the problem of gathering BI from web resources. In addition, ontology-based systems make use of semantic information to further interpret and provide precise answers to questions posed in Natural Language (NL).


Appendix 1 - A collection of Winograd Schema questions

A selected collection of Winograd Schema questions (sources: (Levesque, 2013), (Levesque et al., 2012)) is shown below. We also refer readers to the following online collection of pre-tested Winograd schemas constructed by Hector Levesque, Ernest Davis, Ray Jackendoff, David Bender, and others: http://www.cs.nyu.edu/faculty/davise/papers/WS.html

Joan made sure to thank Susan for all the help she had given. Who had given the help?

● Joan ● Susan

Joan made sure to thank Susan for all the help she had received. Who had received the help?

● Joan ● Susan

The trophy would not fit in the brown suitcase because it was so small. What was so small?

● the trophy ● the brown suitcase

The town councillors refused to give the angry demonstrators a permit because they feared violence. Who feared violence?

● the town councillors ● the angry demonstrators

The large ball crashed right through the table because it was made of styrofoam. What was made of styrofoam?

● the large ball ● the table

The sack of potatoes had been placed below the bag of flour, so it had to be moved first. What had to be moved first?

● the sack of potatoes ● the bag of flour

Sam tried to paint a picture of shepherds with sheep, but they ended up looking more like golfers. What looked like golfers?


● the shepherds ● the sheep

The racecar easily passed the school bus because it was going so fast. What was going so fast?

● the racecar ● the school bus (Special=fast; other=slow)

Frank was jealous when Bill said that he was the winner of the competition. Who was the winner?

● Frank ● Bill (Special=jealous; other=happy)

Tom threw his schoolbag down to Ray after he reached the [top/bottom] of the stairs. Who reached the [top/bottom] of the stairs?

● Tom ● Ray

Although they ran at about the same speed, Sue beat Sally because she had such a [good/bad] start. Who had a [good/bad] start?

● Sue ● Sally

The sculpture rolled off the shelf because it wasn’t [anchored/ level]. What wasn’t [anchored/level]?

● The sculpture ● The shelf

Sam’s drawing was hung just above Tina’s and it did look much better with another one [below/above] it. Which looked better?

● Sam’s drawing ● Tina’s drawing

Anna did a lot [better/worse] than her good friend Lucy on the test because she had studied so hard. Who studied hard?

● Anna ● Lucy

The firemen arrived [after/before] the police because they were coming from so far away. Who came from far away?

● The firemen ● The police

Frank was upset with Tom because the toaster he had [bought from/sold to] him didn’t work. Who had [bought/sold] the toaster?

● Frank ● Tom

Jim [yelled at/comforted] Kevin because he was so upset. Who was upset?

● Jim ● Kevin

The sack of potatoes had been placed [above/below] the bag of flour, so it had to be moved first. What had to be moved first?

● The sack of potatoes ● The bag of flour

Pete envies Martin [because/although] he is very successful. Who is very successful?

● Martin ● Pete

I spread the cloth on the table in order to [protect/display] it. To [protect/display] what?

● The table ● The cloth

Sid explained his theory to Mark but he couldn’t [convince/understand] him. Who did not [convince/ understand] whom?

● Sid did not convince Mark ● Mark did not understand Sid

Susan knew that Ann’s son had been in a car accident, [so/because] she told her about it. Who told the other about the accident?

● Susan ● Ann


The drain is clogged with hair. It has to be [cleaned/removed]. What has to be [cleaned/removed]?

● The drain ● The hair

My meeting started at 4:00 and I needed to catch the train at 4:30, so there wasn’t much time. Luckily, it was [short/delayed], so it worked out. What was [short/delayed]?

● The meeting ● The train

There is a pillar between me and the stage, and I can’t [see/see around] it. What can’t I [see/see around]?

● The stage ● The pillar

Ann asked Mary what time the library closes, [but/because] she had forgotten. Who had forgotten?

● Mary ● Ann

Bob paid for Charlie’s college education, but now Charlie acts as though it never happened. He is very [hurt/ungrateful]. Who is [hurt/ungrateful]?

● Bob ● Charlie

At the party, Amy and her friends were [chatting/barking]; her mother was frantically trying to make them stop. It was very strange behavior. Who was behaving strangely?

● Amy’s mother ● Amy and her friends

The dog chased the cat, which ran up a tree. It waited at the [top/bottom]. Which waited at the [top/bottom]?

● The cat ● The dog

Sam and Amy are passionately in love, but Amy’s parents are unhappy about it, because they are [snobs/fifteen]. Who are [snobs/fifteen]?


● Amy’s parents ● Sam and Amy

Mark told Pete many lies about himself, which Pete included in his book. He should have been more [truthful/skeptical]. Who should have been more [truthful/skeptical]?

● Mark ● Pete

Since it was raining, I carried the newspaper [over/in] my backpack to keep it dry. What was I trying to keep dry?

● The backpack ● The newspaper

Jane knocked on Susan’s door, but she didn’t [answer/get an answer]. Who didn’t [answer/get an answer]?

● Jane ● Susan

Sam tried to paint a picture of shepherds with sheep, but they ended up looking more like [dogs/golfers]. What looked like [dogs/golfers]?

● The sheep ● The shepherds

Thomson visited Cooper’s grave in 1765. At that date he had been [dead/travelling] for five years. Who had been [dead/travelling] for five years?

● Thomson ● Cooper

Tom’s daughter Eva is engaged to Dr. Stewart, who is his partner. The two [doctors/lovers] have known one another for ten years. What two people have known one another for ten years?

● Tom and Dr. Stewart ● Eva and Dr. Stewart

The actress used to be named Terpsichore, but she changed it to Tina a few years ago, because she figured it was [easier/too hard] to pronounce. Which name was [easier/too hard] to pronounce?


● Tina ● Terpsichore

Sara borrowed the book from the library because she needs it for an article she is working on. She [reads/writes] when she gets home from work. What does Sara [read/write] when she gets home from work?

● The book ● The article

Fred is the only man still alive who remembers my great-grandfather. He [is/was] a remarkable man. Who [is/was] a remarkable man?

● Fred ● My great-grandfather

Fred is the only man alive who still remembers my father as an infant. When Fred first saw my father, he was twelve [years/months] old. Who was twelve [years/months] old?

● Fred ● My father

There are too many deer in the park, so the park service brought in a small pack of wolves. The population should [increase/decrease] over the next few years. Which population will [increase/decrease]?

● The wolves ● The deer

Archaeologists have concluded that humans lived in Laputa 20,000 years ago. They hunted for [deer/evidence] on the river banks. Who hunted for [deer/evidence]?

● The prehistoric humans ● The archaeologists

The scientists are studying three species of fish that have recently been found living in the Indian Ocean. They [appeared/began] two years ago. Who or what [appeared/began] two years ago?

● The fish ● The scientists

The journalists interviewed the stars of the new movie. They were very [cooperative/persistent], so the interview lasted for a long time. Who was [cooperative/persistent]?

● The stars ● The journalists

I couldn’t find a spoon, so I tried using a pen to stir my coffee. But that turned out to be a bad idea, because it got full of [ink/coffee]. What got full of [ink/coffee]?

● The coffee ● The pen


References

(Agichtein & Gravano, 2000) E. Agichtein and L. Gravano. Snowball: extracting relations from large plain-text collections. In ICDL, 2000.

(AKBC, 2010) First workshop on automated knowledge base construction, 2010.

(Aleman-Meza et al., 2004) B. Aleman-Meza, C. Halaschek, A. Sheth, I. B. Arpinar, G. Sannapareddy, SWETO: Large-Scale Semantic Web Test-bed, in: 16th International Conference on Software Engineering and Knowledge Engineering, SEKE 2004: Workshop on Ontology in Action, Banff, Canada, pp. 21-24.

(Allen, 1987) Allen, J.F., 1987. Natural Language Understanding. Menlo Park, CA: The Benjamin/Cummings Publishing Company, Inc.

(Auer et al., 2007) S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. G. Ives, DBpedia: A Nucleus for a Web of Open Data, in: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, pp. 722-735.

(Banko et al., 2007) M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni, Open Information Extraction from the Web, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI 2007, Hyderabad, India, pp. 2670-2676.

(Behrendt and Damjanovic, 2013) W. Behrendt and V. Damjanovic. "Developing Semantic CMS Applications: The IKS Handbook", Salzburg Research Publisher, ISBN 978-3-902448-35-4 (1st Edition. 124 pp.). March, 2013. Online available: http://www.iks-project.eu/resources/developing-semantic-cms-applications-iks-handbook-2013

(Bernstein et al., 2006) Bernstein, A., Kauffmann, E., Kaiser, C., and Kiefer, C. (2006). Ginseng: A Guided Input Natural Language Search Engine. In Proc. of the 15th workshop on Information Technologies and Systems (WITS 2005), p.45-50. MV-Wissenschaft, Münster.

(Bizer et al., 2009) Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellman, S. (2009): DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 7(3): 154-165. Elsevier.

(Bobrow & Norman, 1975) Bobrow, D.G. & Norman, D.A. Some Principles of Memory Schemata. In D.G. Bobrow & A.M. Collins (Eds.) Representation and Understanding: Studies in Cognitive Science. New York: Academic Press, 1975.

(Bobrow et al., 2007) Bobrow, D., Condoravdi, C., Crouch, R., de Paiva, V., Kartunen, L., King, T., Mairn, B., Price, L., Zaenen, A. Precision-focused Textual Inference. In Proceedings Workshop on Textual Entailment and Paraphrasing, 2007.


(Brachman & Schmolze, 1985) Brachman, R.J and Schmolze, J. G. (1985). "An Overview of the KL-ONE Knowledge Representation System". Cognitive Science 9 (2): 171.

(Cafarella et al., 2005) M. J. Cafarella, D. Downey, S. Soderland, and O. Etzioni. KnowItNow: Fast, scalable information extraction from the web. In EMNLP, 2005.

(Cafarella et al., 2008) M. J. Cafarella, A. Halevy, D. Z. Wang, E. Wu, Y. Zhang, WebTables: Exploring the Power of Tables on the Web, Proc. VLDB Endow. 1 (2008) 538-549.

(Cafarella, 2009) M. J. Cafarella, Extracting and Querying a Comprehensive Web Database, in: 4th Biennial Conference on Innovative Data Systems Research, CIDR 2009, Asilomar, USA.

(Caramazza et al., 1977) Caramazza, A., Grober, E., & Garvey, C. (1977). Comprehension of anaphoric pronouns. Journal of Verbal Learning & Verbal Behavior, 16, 601-609.

(Carlson et al., 2010) A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. H. Jr., T. M. Mitchell, Toward an Architecture for Never-Ending Language Learning, in: Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, pp. 1306-1313.

(Charniak, 1975) Charniak, E. Organization and Inference in a Frame-like Systems of Common Sense Knowledge. In Proceedings of Theoretical Issues in Natural language Processing: An Interdisciplinary Workshop. Cambridge, Mass.: Bolt, Berenek & Newman, Inc. 1975. pp. 42--51.

(Cimiano et al., 2007) Cimiano, P., Haase, P., Heizmann, J. (2007): Porting Natural Language Interfaces between Domains: An Experimental User Study with the ORAKEL System. In Chin, D. N., Zhou, M.X., Lau, T. S. and Puerta A. R., editors. In Proc. of the International Conf. on Intelligent User Interfaces, p. 180-189, Gran Canaria, Spain. ACM.

(Cohen, 2004) Cohen, P. 2004. If not the Turing Test, Then What? In Proceedings of the 19th National Conference on AI. Menlo Park, California: AAAI Press.

(Cooper et al., 2006) Cooper, R., Crouch, D., Eijckl, J.V., Fox, C., Genabith, J.V., Japars, J., Kamp, H., Milward, D., Pinkal, M., Poesio, M., Pulman, S., 1996. A Framework for Computational Semantics (FraCaS). Technical Report. The FraCaS Consortium.

(Cunningham et al., 2002) Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V. (2002): GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 54: 168-175. Association for Computational Linguistics.


(Dagan et. al, 2006) Dagan, I., Glicksman, O., Magnini, B. The PASCAL Recognizing Textual Entailment Challenge. In Machine Learning Challenge: LNAI 3944. Springer Verlag. 2006

(Damljanovic et al., 2010) Damljanovic, D., Agatonovic, M., Cunningham, H. (2010): Natural Language interface to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction. In Aroyo, L., Antoniou, G., Hyvönen, E., ten Teije, A., Stuckenschmidt, H., Cabral, L. and Tudorache, T., editors. In Proc. of the European Semantic Web Conference, Heraklion, Greece. Springer Verlag.

(Decker, 2000) S. Decker, 2000. The Semantic Web: The Roles of XML and RDF. In IEEE Internet Computing. Vol. 4, No. 5, pp. 63-74.

(Dennett, 1998) Dennett, D., 1998. Can Machines Think? In Mather, G., Verstraten, F., and Anstis, S. (Eds.) The Motion Aftereffect. MIT Press.

(DeRose et al., 2008) De Rose, P., Chai, X., Gao, B.J., Shen, W., Doan, A., Bohannon, P., Zhu, X. Building community wikipedias: A machine-human partnership approach. In G. Alonso, J.A. Blakeley, A.L.P. Chen, Eds. ICDE, pp. 646-655. IEEE, 2008.

(Doan et al., 2008) A. Doan, L. Gravano, R. Ramakrishnan, S. Vaithyanathan (Eds.), SIGMOD Rec. Special Section on Managing Information Extraction, Vol.37(4), 2008.

(Etzioni et al., 2004) Etzioni, O., Cafarella, M.J., Downey, D., Kok, S., Popescu, A.M., Shaked, T., Soderland, S., Weld, D.S., Yates, A., Web scale information extraction in KnowItAll. In WWW 2004.

(Etzioni et al., 2005) O. Etzioni, M. J. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S.Weld, A. Yates, Unsupervised Named-Entity Extraction from the Web: An Experimental Study, Artif. Intell. 165 (2005) 91-134.

(Etzioni et al., 2006) Etzioni, O., Banko, M., Cafarella, M., Machine Reading. In Proceedings of the 21st National Conference on AI. Menlo Park, California. AAAI Press.

(Fazzinga et al., 2011) B. Fazzinga, G. Gianforme, G. Gottlob, and T. Lukasiewicz. Semantic Web Search Based on Ontological Conjunctive Queries, Journal of Web Semantics, 9, 453-473, December 2011.

(Fellbaum, 1998) C. Fellbaum (ed.). WordNet: An Electronic Lexical Database. MIT Press, 1998.



(Fernandez et al., 2009) Fernandez O, Izquierdo R, Ferrandez S, Vicedo J. L (2009): Addressing Ontology-based question answering with collections of user queries. Information Processing and Management, 45 (2): 175-188. Elsevier.

(Ford & Hayes, 1995) Ford, K. & Hayes, P., 1995. Turing Test Considered Harmful. In Proceedings of the 14th International Joint Conference on AI, 972-977. San Mateo, California. Morgan Kaufmann.

(Giunchiglia et al., 2010) F. Giunchiglia, V. Maltese, F. Farazi, B. Dutta, GeoWordNet: A Resource for Geospatial Applications, in: The Semantic Web: Research and Applications, 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, pp. 121-136.

(Goikoetxea et al., 2008) Goikoetxea, E., Pascual, G., & Acha, J. (2008). Normative study of implicit causality in 100 interpersonal verbs in Spanish. Behavior Research Methods, Instruments, & Computers, 40, 760–772.

(Gruber, 1993) T.R. Gruber. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In N. Guarino and R. Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge Representation, Deventer, The Netherlands, 1993. Kluwer Academic Publishers.

(Gruber, 2009) Thomas R. Gruber. Ontology. In Encyclopedia of Database Systems, pages 1963–1965. Springer-Verlag, 2009.

(Hendler, 2001) J. Hendler, 2001. Agents and the Semantic Web. In IEEE Intelligent Systems. Vol. 16, No. 2, pp. 30-37

(Hernandez-Orallo & Dowe, 2010) Hernandez-Orallo, J. and Dowe, D.L., Measuring Universal Intelligence: Toward an Anytime Intelligence Test. Artificial Intelligence 174(18): 1508-1539, 2010.

(Hobbs, 1979) Hobbs, J.R. 1979. Coherence and coreference. Cognitive Science 3(1):67–90.

(Hoffmann et al., 2009) R. Hoffmann, S. Amershi, K. Patel, F. Wu, J. Fogarty, and D. S. Weld. Amplifying community content creation with mixed initiative information extraction. In Proceedings of the 27th international conference on Human factors in computing systems, CHI '09, pp. 1849-1858, USA, 2009.

(Horkoff et al., 2012) J. Horkoff, D. Barone, L. Jiang, E. Yu, D. Amyot, A. Borgida, J. Mylopoulos, 2012. NSERC Report. Online available: http://www.cs.utoronto.ca/~jenhork/Presentations/BIM_TorontoSELab_July2012.pdf

(Janowicz & Hitzler, 2013). K. Janowicz, and P. Hitzler. Thoughts on the Complex Relation Between Linked Data, Semantic Annotations, and Ontologies. In Proceedings of the 6th International workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR 2013), pp. 41-44, ISBN: 978-1-4503-2413-7.

(Kaufmann & Bernstein, 2007) Kaufmann, E., Bernstein, A. (2007): How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users? In Aberer, K., Choi, K. S., Noy, N., Allemang, D., Lee, K. I., Nixon, L., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., and Cudré-Mauroux, P., editors. In Proc. of the 6th International Semantic Web Conference, pp. 281-294. Busan, Korea, Springer LNCS Vol. 4825.

(Kaufmann et al., 2006) Kauffmann, E., Bernstein, A., and Zumstein, R. (2006): Querix: A Natural Language Interface to Query Ontologies Based on Clarification Dialogs. In Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M. and Aroyo, L., editors. In Proc. of the 5th International Semantic Web Conference, p.980-981, Athens, USA. Springer LNCS, Vol. 4273.

(Kaufmann et al., 2007) Kauffmann, E., Bernstein, A., and Fischer, L. (2007): NLP-Reduce: A “naïve” but Domain-independent Natural Language Interface for Querying Ontologies. In Franconi, E., Kifer, M. and May, W., editors. In Proc. of the 4th European Semantic Web Conference, p.1-2, Innsbruck. Springer Verlag.

(Lee, 2001) T.B. Lee, 2001. The Semantic Web. Scientific American, Vol. 284, No. 5, pp. 34–43.

(Lenat, 1995) D. B. Lenat, CYC: A Large-Scale Investment in Knowledge Infrastructure, Commun. ACM 38 (1995) 33-38.

(Levesque et al., 2012) H.J. Levesque, E. Davis, L. Morgenstern, The Winograd Schema Challenge, Proceedings of KR-2012, Rome, 2012.

(Levesque, 2013) H.J. Levesque. On Our Best Behaviour. In Proceedings of the IJCAI-13 Conference. Beijing, 2013. http://www.cs.toronto.edu/~hector/Papers/ijcai-13-paper.pdf

(Limaye et al., 2010) G. Limaye, S. Sarawagi, S. Chakrabarti, Annotating and searching web tables using entities, types and relationships, Proc. VLDB Endow. 3 (2010), pp. 1338-1347.

(Linckels, 2005) Linckels S., M.C. (2005): A Simple Solution for an Intelligent Librarian System. In Proc. of the IADIS International Conference of Applied Computing, p.495-503.

(Lopez et al., 2007) Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007): AquaLog: An ontology-driven question answering system for organizational semantic intranets. Journal of Web Semantics: Science Service and Agents on the World Wide Web, 5(2): 72-105.

(Lopez et al., 2011) Lopez, V., Uren V., Sabou, M., Motta, E., 2011. Is Question Answering fit for the Semantic Web? A Survey. In Semantic Web 2(2). 125--155. 2011

(Lytinen, 1992) S.L. Lytinen, "Conceptual Dependency and its Descendants". Computers, Mathematics, and Applications 23(2-5):51-73. Online available from: http://deepblue.lib.umich.edu/bitstream/2027.42/30278/1/0000679.pdf

(Manning & Schütze, 1999) Manning, C. and Schütze, H., Foundations of Statistical Natural Language Processing. Cambridge, Mass., MIT Press, 1999.

(Mc Guinness, 2004) McGuinness, D. (2004): Question Answering on the Semantic Web. IEEE Intelligent Systems, 19(1): 82-85.

(Melo & Weikum, 2009) G. de Melo, G. Weikum, Towards a Universal Wordnet by Learning from Combined Evidence, in: Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, pp. 513-522.

(Melo & Weikum, 2010) G. de Melo, G. Weikum, MENTA: Inducing Multilingual Taxonomies from Wikipedia, in: Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, Canada, pp. 1099-1108.

(Minsky, 1975) Minsky, M.A. A Framework for Representing Knowledge. In P.H. Winston (Ed.), The Psychology of Computer Vision. New York: McGraw-Hill, 1975.

(Mithun et al., 2006) Mithun, S., Kosseim, L., Haarslev, V. (2007): Resolving quantifier and number restriction to question OWL ontologies. In Proc. of The First International Workshop on Question Answering at the International Conference on Semantics, Knowledge and Grid, Xi’an, China.

(Nakashole et al., 2011) N. Nakashole, M. Theobald, G. Weikum, Scalable knowledge harvesting with high precision and high recall, in: Proceedings of the fourth ACM international conference on Web search and data mining, WSDM 2011, Hong Kong, China, pp. 227-236.

(Nastase et al., 2010) V. Nastase, M. Strube, B. Boerschinger, C. Zirn, A. Elghafari, WikiNet: A Very Large Scale Multi-Lingual Concept Network, in: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, La Valetta, Malta.


(Navigli et al., 2011) R. Navigli, P. Velardi, S. Faralli, A graph-based algorithm for inducing lexical taxonomies from scratch, in: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, Barcelona, Spain, pp. 1872-1877.

(Niles & Pease, 2001) I. Niles, A. Pease, Towards a standard upper ontology, in: Proceedings of the 2nd International Conference on Formal Ontology in Information Systems, FOIS 2001, Ogunquit, Maine, pp. 2-9.

(Norman, 1975) Norman, D.A. Resources and Schemas Replace Stages of Processing. Presented at the 16th Annual Meeting. The Psychonomic Society, Denver, Colorado, 1975.


(Pantel & Pennacchiotti, 2006) P. Pantel and M. Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In ACL, 2006.

(Philpot et al., 2008) A. Philpot, E. H. Hovy, P. Pantel, Ontology and the Lexicon, chapter: The Omega Ontology, Cambridge University Press, 2008.

(Ponzetto & Navigli, 2009a) S. P. Ponzetto, R. Navigli, Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia, in: Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI 2009, Pasadena, California, USA, pp. 2083-2088.

(Ponzetto & Navigli, 2010) S. P. Ponzetto, R. Navigli, Knowledge-rich Word Sense Disambiguation rivaling supervised system, in: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, Uppsala, Sweden, pp. 1522-1531.

(Ponzetto & Strube, 2007) S. P. Ponzetto, M. Strube, Deriving a Large-Scale Taxonomy from Wikipedia, in: Proceedings of the 22nd AAAI Conference on Artificial Intelligence, AAAI 2007, Vancouver, British Columbia, Canada, 2007, pp. 1440-1445.

(Ponzetto & Strube, 2011) S. P. Ponzetto, M. Strube, Taxonomy induction based on a collaboratively built knowledge repository, Artificial Intelligence 175 (2011) 1737-1756.

(Roemmele et al., 2011) Roemmele, M., Bejan, C., Gordon, A., 2011. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. In Proceedings of the International Symposium on Logical Formalizations of Commonsense Reasoning.

(Rumelhart & Ortony, 1977) Rumelhart, D. E., & Ortony, A. The representation of knowledge in memory. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, N.J.: Erlbaum, 1977.

(Rumelhart, 1975) Rumelhart, D.E. Notes on a Schema for Stories. In D.G. Bobrow & A.M. Collins (Eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press, 1975.

(Rus et al., 2007) Rus, V., McCarthy, P., McNamara, D., Graesser, A. A Study of Textual Entailment. International Journal of AI Tools, 17, 2007.

(Schank & Abelson, 1975) Schank, R.C. & Abelson, R.P. “Scripts, Plans and Knowledge”. Thinking: Readings in Cognitive Science, Proceedings of the Fourth International Joint Conference on Artificial Intelligence, page 151-157. Tbilisi, USSR, (1975). Online available: http://oak.conncoll.edu/parker/com316/progassign/scripts.pdf

(Schank et al., 1975) Schank, R.C. & The Yale, A.I. Project. SAM - A Story Understander. New Haven, Connecticut: Yale University, dept. of CS, 1975.

(Shah et al., 2006) P. Shah, D. Schneider, C. Matuszek, R. C. Kahlert, B. Aldag, D. Baxter, J. Cabral, M. J. Witbrock, J. Curtis, Automated Population of Cyc: Extracting Information about Named-entities from the Web, in: Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2006, Melbourne Beach, Florida, USA, pp. 153-158.

(Shoham & Leyton-Brown, 2009) Y. Shoham and K. Leyton-Brown, "Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations", Cambridge University Press, 2009. Online available: http://www.masfoundations.org/

(Snow et al., 2006) R. Snow, D. Jurafsky, and A. Y. Ng. Semantic taxonomy induction from heterogenous evidence. In ACL, 2006.

(Sridevi & Umarani, 2013) K. Sridevi & R. Umarani. Ontology Ranking Algorithms on Semantic Web: A Review. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 9, September 2013.

(Staab & Studer, 2009) S. Staab, R. Studer, Handbook on Ontologies, Springer, 2. edition, 2009.

(Strassel et al., 2010) Strassel, S., Adams, D. Goldberg, H., Herr, J., Keesinger, R., Oblinger, D., Simpson, H., Schrag, R., Wright, J. The DARPA Machine Reading Program - Encouraging Linguistic and Reasoning Research with a Series of Reading Tasks. In International Conference on Language Resources and Evaluation (LREC), 2010.

(Suchanek et al., 2006) F. M. Suchanek, G. Ifrim, and G. Weikum. Combining linguistic and statistical analysis to extract relations from web documents. In KDD, 2006.

(Suchanek et al., 2007) F. M. Suchanek, G. Kasneci, G. Weikum, YAGO: A Core of Semantic Knowledge, in: Proceedings of the 16th international conference on World Wide Web, WWW 2007, Banff, Canada, pp. 697-706.

(Suchanek et al., 2007) Suchanek, F.M., Kasneci, G., Weikum, G., 2007. Yago: A Large Ontology from Wikipedia and WordNet. MPI-I-2007-5-003. Online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.8206&rep=rep1&type=pdf

(Suchanek et al., 2009) F. M. Suchanek, M. Sozio, G. Weikum, SOFIE: A Self-Organizing Framework for Information Extraction, in: Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, pp. 631-640.

(Tablan et al., 2008) Tablan, V., Damljanovic, D., and Bontcheva, K. (2008): A Natural Language Query Interface to Structured Information. In Bechhofer, S., Hauswirth, M., Hoffmann, J. and Koubarakis, M., editors. In Proc. of the 5th European Semantic Web Conference, p.1-15, Tenerife, Spain. Springer Verlag.

(Tartir et al., 2010) Tartir, S., and Arpinar, I. B. (2010): Question Answering in Linked Data for Scientific Exploration. In the 2nd Annual Web Science Conference. ACM.

(Toral et al., 2009) A. Toral, O. Ferrandez, E. Agirre, R. Muñoz, A study on linking wikipedia categories to wordnet synsets using text similarity, in: Recent Advances in Natural Language Processing, RANLP 2009, Borovets, Bulgaria, Association for Computational Linguistics, 2009, pp. 449-454.

(Torres et. al., 2012) Torres, D., Molli, P., Skaf-Molli, H., Diaz., A. Improving Wikipedia with DBpedia. In WWW 2012, SWCS’12 Workshop. Online: http://www2012.wwwconference.org/proceedings/companion/p1107.pdf

(Venetis et al., 2011) P. Venetis, A. Halevy, J. Madhavan, M. Pasca, W. Shen, F. Wu, G. Miao, C. Wu, Recovering Semantics of Tables on the Web, in: Proc. VLDB Endow. 4 (2011), pp. 528-538.

(Völkel et al., 2006) Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R., 2006. Semantic wikipedia. In WWW, 2006.

(von Ahn et al., 2003) von Ahn, L., Blum, M., Hopper, N., Langford, J. CAPTCHA: Using Hard AI Problems for Security. In Eurocrypt-2003, 294-311. 2003.

(Wang et al., 2007) Wang, C, Xiong, M., Zhou, Q., Yu, Y. (2007): PANTO: A portable Natural Language Interface to Ontologies. In Franconi, E., Kifer, M., May, W., editors. In Proc. of the 4th European Semantic Web Conference, p.473-487, Innsbruck, Austria. Springer Verlag.

(Weikum & Theobald, 2010) G. Weikum, M. Theobald, From Information to Knowledge: Harvesting Entities and Relationships from Web Sources, in: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, Indianapolis, Indiana, USA, pp. 65-76.

(Whitby, 1996) Whitby, B., 1996. Why the Turing Test is AI’s Biggest Blind Alley. In Millican, P., and Clark, A. (Eds.) Machines and Thought. Oxford University Press.

(Wiebe, 2013) Wiebe, J., 2013. Course CS3730: Conceptual Dependency Theory (CD). Online available: http://people.cs.pitt.edu/~wiebe/courses/CS3730/Spring2013/cd.pdf

(Winograd, 1972) Winograd, T. Understanding Natural Language. New York: Academic Press, 1972.

(Winograd, 1975) Winograd, T. Frame Representation and the Declarative-Procedural Controversy. In D.G. Bobrow & A.M. Collins (Eds.), Representation and Understanding: Studies in Cognitive Science, New York: Academic Press, 1975.

(Wu & Weld, 2007) F. Wu and D. S. Weld. Autonomously semantifying Wikipedia. In M. J. Silva, A. H. F. Laender, R. A. Baeza-Yates, D. L. McGuinness, B. Olstad, Ø. H. Olsen, and A. O. Falcão, editors, CIKM, pp. 41-50. ACM, 2007.

(Wu & Weld, 2008) F. Wu, D. S. Weld, Automatically Refining the Wikipedia Infobox Ontology, in: Proceeding of the 17th international conference on World Wide Web, WWW 2008, Beijing, China, ACM, 2008, pp. 635-644.

(Wu et al., 2011) W. Wu, H. Li, H. Wang, K. Zhu, Towards a Probabilistic Taxonomy of Many Concepts, Technical Report MSR-TR-2011-25, Microsoft Research Beijing, 2011.


(Zhu et al., 2011) J. Zhu, Z. Nie, X. Liu, B. Zhang, J.-R. Wen, StatSnowball: A Statistical Approach to Extracting Entity Relationships, in: Proceedings of the 18th international conference on World Wide Web, WWW 2011, Madrid, Spain, pp. 101-110.

(Zirn et al., 2008) C. Zirn, V. Nastase, M. Strube, Distinguishing between instances and classes in the wikipedia taxonomy, in: The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC 2008, Tenerife, Canary Islands, Spain, pp. 376-387.