Tutorial: “Acquisition of Complex Logical Knowledge: Authoring, Assimilation, and Assembly”

At K-CAP 2015 (8th International Conference on Knowledge Capture), in Palisades, NY, USA

 

Half-day on October 7, 2015 (either morning or afternoon; TBD)

 

Presenters: Benjamin Grosof, Michael Kifer, and Paul Fodor

 

Abstract:

Machine learning (e.g., text extraction) and graph databases (e.g., SPARQL/RDF) are currently hot topics in industry. However, to deliver business/social value more effectively – e.g., for policy/legal compliance in finance, marketing, and health; personalized tutors in science; and human-computer interaction – these technologies need to be combined with more complex human-authored knowledge and deeper reasoning. Recently, new expressively powerful yet computationally affordable techniques have emerged for such knowledge representation and reasoning, particularly for Rulelog, a rich form of semantic rules and ontologies that can integrate closely with natural language (NL) as well as machine learning. Rulelog overcomes many of the limitations of classical logic, and a large subset of it is in draft as an industry standard.

In this tutorial, we cover the fundamentals and latest progress in acquiring such complex logical knowledge. Recent techniques for human authoring starting from source text have significantly reduced the effort and skill required per sentence; these techniques rely on explanations, NL interpretation, logic-based NL generation, and collaborative curation. Recent techniques for assimilating and automatically importing knowledge from structured sources have significantly reduced the cost and increased the scale of knowledge base assembly, using standards-centric semantic interoperability plus ontological mapping knowledge.

 

Overview of content, aims, presentation style:

The overall presentation style will be typical for tutorials: based on a slide deck, with interleaved Q&A and discussion, plus live demos of some tools. Examples of knowledge, reasoning, KA, and tools will be sprinkled throughout, drawing on several domains including financial/legal and science/education. We will illustrate many of them using Coherent Knowledge Systems’ state-of-the-art Ergo suite for Rulelog semantic rules and ontologies. The tutorial will touch upon machine learning, but mainly focus on two other kinds of knowledge capture ‒ (1) human-authored knowledge and (2) knowledge assimilated via semantic interoperability from structured sources ‒ with hooks for machine learning to be combined more tightly with these two kinds in the future.

Below is a preliminary outline, annotated with rough estimates of time allocations.

  • Introduction (15 min.)
    • Get acquainted: participants give their names and motivations
    • What is complex logical knowledge
      • Beyond databases; chained reasoning; beyond OWL; applications at high-level
    • The kinds of knowledge capture / acquisition (KA) of complex knowledge
      • Authoring from text, i.e., from natural language (NL); assimilation via semantic interoperability from structured sources; machine learning
  • Representation and reasoning for complex knowledge (35 min.)
    • Classical logic; ISO Common Logic standard
    • Rulelog, declarative logic programs, meta expressiveness; RuleML; W3C RIF standard
      • Defeasible higher-order logic formulas; restraint bounded rationality
    • Probabilistic extensions of logic; integrating with machine learning (ML)
  • Applications of complex knowledge (15 min.)
    • Policies and regulations; science; analysis; HCI; search
    • Financial/insurance, e-commerce/marketing, education, health, security, mobile
  • KA from structured sources via automated import/assimilation/integration (15 min.)
    • Semantic interoperability, incl. with OWL/RDF and SPARQL
    • Ontology mappings
  • KA from text (NL) overall (15 min.)
    • Textual Logic – conceptually: logic-based mappings between logic and text
    • Authoring and curation: overall techniques and methodology
  • Ontological mapping between logical terms and natural language terms (10 min.)
  • Explanation as a critical tool (15 min.)
  • Editor tools: syntax checking; dependency analysis; term search (15 min.)
  • Templated NL: for interpreting and generating text (15 min.)
  • Controlled NL: non-templated, using an NL parser but significantly restricted (15 min.)
    • Attempto; SBVR
  • General NL interpretation for authoring, semi-automatically (20 min.)
    • Ambiguity reduction; quantifier typing and scoping; coreference
  • Conclusions; Future Directions
    • Tightly combining authoring with ML (10 min.)
  • Wind-up discussion (15 min.)

 

Motivation on why the topic is of particular interest at this time:

There have been recent fundamental and practical advances in complex knowledge in the form of semantic rules and ontologies, especially in Rulelog, several of them within the last 3 years. Rulelog’s representational expressiveness is now a much better match to natural language understanding and to machine learning, due to progress in semantics, theory, and interoperability. Rulelog’s reasoning technology is now much more computationally scalable, due to better algorithmic implementations and cheaper large RAM, and it has capable, efficient implementations. Both (1) machine learning, including knowledge extraction from natural language, and (2) graph databases/knowledge, including RDF and SPARQL, have recently become hot in industry as well as research. But both need to be combined with more complex human-authored knowledge and deeper reasoning in order to deliver business/social value more effectively. Techniques for developing such human-authored complex knowledge, including starting from text, have progressed substantially. A large subset of Rulelog is in draft as an industry standard to be submitted to RuleML and W3C as a dialect of Rule Interchange Format (RIF).

 

Relationship to Conference Topics:

This tutorial touches on the majority of the conference topics, including:

  • knowledge acquisition, capture, authoring, and extraction (KA);
  • KA from structured sources, and KA from text;
  • knowledge engineering, modelling methodologies, ontologies, collaborative curation, and human computation;
  • decision support, provenance and trust, knowledge management, semantic web / linked data, markup, and knowledge graphs.

 

Prerequisite Knowledge:

This tutorial will cater to those first learning about complex logical knowledge in declarative logic programs and Rulelog, as well as those who already have some background in them. It will assume only background knowledge of the basics of logical knowledge representation: first-order logic and relational DBMS. The basics of RDF and OWL are also helpful, but not required.

 

Presenters’ Bios:


Benjamin Grosof (lead presenter) is CTO, CEO, and co-founder (2013-present) of Coherent Knowledge Systems, a semantic technology software-centric startup that is commercializing a major research breakthrough in logic-based artificial intelligence combined with natural language processing. He is also president of Benjamin Grosof & Associates, an expert consulting business on software technology and related strategy. Previously he was a senior research program manager at Vulcan Inc. (2007-2013), the asset management company of Paul G. Allen (co-founder of Microsoft). There he conceived and led a large research program in the area of rule-based semantic technologies and artificial intelligence (AI). Before Vulcan, he was an IT professor at MIT Sloan (2000-2007) and a senior software scientist at IBM Research (1988-2000). He has pioneered semantic technology and industry standards for: rules; the combination of rules with ontologies; the application of rules in e-commerce, e-contracts, and policies; and the acquisition of rules and ontologies from natural language (NL). He co-founded the influential RuleML industry standards design effort and prototyped it in SweetRules, the main bases for the W3C Rule Interchange Format (RIF) standard. He co-founded the International Conference on Rules and Rule Markup Languages for the Semantic Web (which has since become the RuleML and RR conferences). He led the invention of several fundamental technical advances in knowledge representation, including courteous defeasibility, restraint bounded rationality, and the rule-based technique which rapidly became the currently dominant approach to commercial implementation of W3C OWL (Web Ontology Language) and the main basis of its RL (Rules Profile) standard. He also has extensive experience in machine learning, probabilistic reasoning, and user interaction design.
His background includes four major industry software releases, two years in software startups, a Stanford PhD, a Harvard BA, and over 60 refereed publications.

Grosof has given numerous invited talks about semantic rules, and developed several MIT courses with substantial focus on them. He presented ‒ with coauthors, usually including Michael Kifer since 2009 ‒ related tutorials on reasoning with complex knowledge at the AAAI Conference on Artificial Intelligence (2013), International Joint Conference on Artificial Intelligence (2001), ACM Conference on E-Commerce (2004), International Semantic Web Conference (2004, 2005, 2006, 2009, 2010, 2012), the WWW conference (2006, 2009), and the 9th International Web Rule Symposium (upcoming in Aug. 2015; a.k.a. RuleML conference).

 


Michael Kifer is a Professor with the Department of Computer Science, Stony Brook University, USA. He received his Ph.D. in Computer Science in 1984 from the Hebrew University of Jerusalem, Israel, and the M.S. degree in Mathematics in 1976 from Moscow State University, Russia.

Kifer is a co-founder of Coherent Knowledge Systems, a new startup on semantic technology, and since 2012 he has been serving as the President of the Rules and Reasoning Association (RRA). His interests include Web information systems, knowledge representation, and database systems. He has published four textbooks and numerous articles in these areas. In particular, he co-invented F-logic, HiLog, and Transaction Logic, which are among the most widely cited works in Computer Science and, especially, in Semantic Web research. Kifer serves on the editorial boards of several computer science journals and has chaired a number of conferences. Twice, in 1999 and 2002, he was a recipient of the prestigious ACM-SIGMOD “Test of Time” award for his works on F-logic and object-oriented database languages. In 2013, Kifer’s paper on Transaction Logic Programming received the Association for Logic Programming “Test of Time” award as the most influential paper of 20 years ago. In 2006, Kifer was Plumer Fellow at Oxford University’s St. Anne’s College and, in 2008, he received the SUNY Chancellor’s Award for Excellence in Scholarship.


Paul Fodor is a Research Assistant Professor with the Department of Computer Science, Stony Brook University, USA. He received his Ph.D. in Computer Science in 2011 from Stony Brook University, New York, preceded by his M.S. degree in 2006 from Stony Brook University, and B.Sc. in Computer Science in 2002 from the Technical University of Cluj-Napoca, Romania.

Dr. Fodor is a co-founder of Coherent Knowledge Systems with over 10 years’ experience in database research, natural language processing, artificial intelligence, and stream processing systems. His work on declarative rule languages and logic used as a specification language and implementation framework for knowledge bases has been applied in areas ranging from natural language processing to complex event processing and Semantic Web technologies. Through his research, Dr. Fodor has contributed to several large software projects: the IBM Watson natural language processing system for the Jeopardy! Challenge against human champions, the OpenRuleBench suite of benchmarks for analyzing the performance and scalability of rule systems for the Semantic Web, the ETALIS declarative complex event processing and stream reasoning system, and the SILK (Semantic Inferencing on Large Knowledge) system. Dr. Fodor has been Principal Investigator (PI), Co-PI, and contractor for projects funded by both public governmental sources and private companies, including: PI for the SILK project funded by Vulcan Inc. to develop intelligent textbooks, contractor for the IBM Watson project, contractor for XSB Inc. for the DARPA Component, Context, and Manufacturing Model Library (C2M2L-1) using XSB Prolog, and PI for the Stony Brook University Hospital’s Lung Cancer Evaluation Center management program.