# KEHOME/doc/iMKE.html # Dec/2/2005 Implementation of MKE

Implementation of MKE

by Dr. Richard H. McCullough

0. introduction

The first version of McCullough Knowledge Explorer (MKE) and the MKR language was born in 1996. MKR/MKE slowly evolved as I discovered what was needed for real world applications. By the time I introduced the current form of action/method characterization in 1999, MKE had grown to about 25,000 lines of Icon programs. After modifications in 2003 to interface with different languages and systems of the "Semantic Web", MKE was about 50,000 lines of Unicon programs, plus 1,000 lines of Java programs, plus 4,000 lines of UNIX shell programs.

I have thought about creating a smaller base version of MKE, with dynamically loaded add-ons as needed. But the steady advances in computer technology have relieved me of the necessity of doing so.

During all this development, I have maintained the uniform structure of the MKR language.

1. records

This section covers the record structures used by MKE. Access to these structures is via a set of procedures with a "Unicon-class-like" spirit. However, they are not classes because the original MKE implementation was done in Icon, and making the change to Unicon classes never got to the top of my priority list.

1.1. parsing

The most important syntactic structures have a record to preserve the results of the parse.
token.icn:	record WORD(wtype,wvalue)
token.icn:	record TOKEN(ttype,tvalue)
symbol.icn:	record SYMBOL(stype,svalue)
phrase.icn:	record PHRASE(pvalue)
nvlist.icn:	record NVPHRASE(novtype,novlist)
nvstack.icn:    (name-space stack)
dollar.icn:     (name-value lookup)
array.icn:	record AAPHRASE(aname,aindex)
bselist.icn:	record BSE(bse_separator,bse_begin,bse_list,bse_end)
pplist.icn:	record PPOBJECT(ppat,ppout,ppof,ppwith,ppod,ppfrom,ppto)

WORD, TOKEN, SYMBOL denote the results of three phases of parsing (see section 2.1).
NVPHRASE denotes a [name,op,value] list.
AAPHRASE denotes an associative array reference, name[char], which is very useful for specifying GDBM database operations.
BSE denotes lists enclosed in [] or {}.
PPOBJECT denotes the (optional) prepositional phrases used with actions and methods.

As an example of the "Unicon-class-like" structure, consider the procedures used to access NVPHRASE. novlist is [name,op,value]. novtype is "nv" or "nvnull" (no value).

# new_nv(novlist)
# nv_novlist(x)
# nv_name(x)
# nv_op(x)
# nv_value(x)
# nv_badtype(t,x,ierror)  # error message
# nv_unparse(x)
# nv_writes(fd,x)
# nv_tsize(tsym)
# nv_map_symbol(x,tokenlist))
# nv2nov(nvphrase)
# nov2nv(novlist)
# symbol2nv(symbol)
The purpose of most of the procedures is quite obvious, so I will only comment on three of them. Every "class" has an unparse(), which returns the original input string; this is a little "tricky" because blanks and list separators are discarded early in MKE processing. Every "class" has a tsize(), which returns the number of tokens in a symbol; this is used for error checking during the parsing phase, to insure that no tokens are added or deleted. Every "class" has a map_symbol(), which is used to map token type back to token value after symbol parsing is complete. For a complex example requiring the reinsertion of blanks between words, see htxt_map_symbol() in KEHOME/src/html.icn.

1.2. concepts

concept.icn:	record CONCEPT(iconcept,...)
role.icn:	record ARGDEF(argname,argtype,...)
event.icn:	record EVENT(action,subject,ppobject,space,time,view)
context.icn:	record CONTEXT(atspace,attime,atview)
knit.icn:	record CONTEXT_TABLE(ct_ALIAS,ct_NICK,ct_K,ct_C,ct_A,ct_views)
knit.icn:	record CONTEXT_DATA(cx_view,cx_krlanguage,...)

CONCEPT is the structure used for all concepts. It includes two double-linked lists for hierarchy structure, two double-linked lists for group membership, and a table of values for each type of characteristic: part, attribute, action, relation, binary relation, differentia.
ARGDEF is the definition of the argument structure (the "roles") of methods and relations.
EVENT is the description of an individual instance/occurrence of an action or method.
CONTEXT is the space,time,view of an action or method.
CONTEXT_TABLE is all the tables (and a set) that define a knit (not used).
CONTEXT_DATA is all the attributes of a knit, including tables (not used).

1.3. groups

group.icn:	record ABSTRACT_GROUP(iname,...)
begin.icn:	record GROUP(gtype,gvalue)
hwalk.icn:	record HOUNIT(holist,hoend)
relation.icn:	record RELUNIT(reltuple,relend)
xml.icn:	record TRIPLE(ntsubject,ntpredicate,ntobject,ntend)
xml.icn:	record MCF(mcfname,mcfvalue)
xml.icn:	record RDF(rdfsymbol)

ABSTRACT_GROUP is the definition of a group, e.g., set or list.
GROUP denotes begin-statement or end-statement or group-statement between begin and end.
HOUNIT denotes a hierarchy group-statement.
RELUNIT denotes a relation group-statement (an "infon").
TRIPLE denotes the "NTriple" characterization of an RDF statement.
MCF denotes an MCF group-statement (standard format sometimes used for Stanford TAP knowledge base).
RDF denotes an RDF group-statement.

1.4. genealogy

These structures are used to record the information needed to create a standard GEDCOM file used by genealogy programs.
ged.icn:	record PERSON(gedname,uniquename,...)
ged.icn:	record FAMILY(gedname,uniquename,...)
ged.icn:	record NOTE(gedname,uniquename,...)

2. parsing the MKR language

I built my own MKR parser, using the matching function technique described in Griswold's Icon book.

2.1. words, tokens, symbols

There are four phases to the parser:
prompt() gets the next line from the input file;
get_word() gets the next word from that line;
get_token() maps the word to a single-byte token type;
get_symbol() parses the tokens into symbols.
Since every proposition ends in a semicolon, get_symbol() is activated when a semicolon is found. With nested propositions (e.g. if-then-else-fi) processing is delayed until the NEWcomplete() procedure determines a complete proposition has been read.

2.2. phrases and lists

The phrase is the basic linguistic unit in MKR. There are many kinds of phrases, as outlined in section 1.1. Anywhere a phrase is expected, a list of phrases can be used. The values of characteristics can be comma-separated lists enclosed in [], or proposition lists enclosed in {}.

2.3. inserting propositions into the input stream

The top-level parsing procedure is parse_file(fd,ps,option). The file descriptor is passed down to the lower level procedures
The interpret_line(line,dollar) procedure inserts lines into the input steam by calling parse_file(line). When prompt(line) finds that its argument is a string, it returns that string instead of returning the next line from the input file.

2.4. inserting context markers into the input stream

To simplify the parsing of begin-end groups which are used to define hierarchies and n-ary relations, a special token is inserted at the beginning of each group-statement by insert_context(). The special token is removed by delete_context() before the symbols are passed on for interpretation.

2.5. interpreting symbols

The interpret_symbol(symbol) procedure is the top level. It walks down the parse tree, and calls lower level procedures to interpret statements, questions, commands, assignments. Dollar variable substitutions must be delayed until the "last minute" in order to get the up-to-date value of the variable.

The depth of the parse trees makes it rather tedious to find the right data for interpretation. To alleviate this problem I use find_stype(stype,symbol) to search down the parse tree and find the right data. For example, if I am processing a proposition, I can use

    subject := list_unparse(find_stype("subject",symbol).svalue)
    verb    := list_unparse(find_stype("verb",symbol).svalue)
    object  := list_unparse(find_stype("object",symbol).svalue)

2.6. user-defined verbs

MKR has a large set of predefined verbs. The user-defined equivalent of a verb is a binary relation. For example:
	John brel father = Sam;
where "rel" is a predefined MKR verb, and "father" denotes a user-defined binary relation. Alternatively, you can define a new verb to express the binary relation. Here's an equivalent statement of the above example.
	relFather isu relation verb with arity=2;
	John relFather Sam;
User-defined verbs have a special token type "B". New verbs are simply added to the mkr_word table which is used by get_token().

This capability can even be used to make MKR look more like RDF. (I call this the RDF/OWL Mtriples language.)

	subClassOf isu verb with ctype=relation;
	type       isu verb with ctype=relation;
	Chevy subClassOf car;
	Bessie type Chevy;
The MKR equivalent of these statements is
	Chevy iss car;
	Bessie isu Chevy;

2.7. Merr error message system

I adapted the Merr error message system, which was designed for the Unicon compiler, to work with my custom MKR parser. Only minor changes were required to use names instead of numbers for parser states and input tokens. I added one new feature: the ability to associate a "high level" message with a parser state.

The "err.meta" file contains the token error patterns and error messages. Merr executes "ksc" to determine the parser state associated with the token error pattern, and creates the yyerror.icn file. I configured merr to use "ksc" instead of "ke" because "ksc" is about 100 times faster than "ke".

3. methods and relations

One of the significant innovations in the MKR language is the use of prepositional phrases to characterize the changes caused by actions. In the simplest case, actions have only direct objects, e.g.,
	John do hit od the ball done;
The most general form of action in MKR is
    at space=s,time=t,view=v {
	subject do action = event
		out  action products
		of   action domains
		with action characteristics
		od   action direct objects
		from action initial characteristics
		to   action final   characteristics
Actions can also be written as productions
	product := subject do ... done;
Methods are just a special case of actions, and are described in the same way. All n-ary relations have an implicit method defined by a format and meaning, but relations have only "direct objects".

3.1. dollar variables

For the simplest case of direct objects only, I use the UNIX shell convention of denoting the arguments by $1,$2,... Note that 1,2,... are variable names, and the user may chose to use other names such as x,y,... For methods and relations which are implemented as Unicon procedures, the values of $0 (this event), $1,$2,.. are passed in a Unicon table. If the user defines local variables, they are also passed in the table.

For the more complicated case of any prepositional phrases in any order, the PPOBJECT record is used to pass the arguments to the method (see section 1.1).

3.2. format

For the simplest case, the format is expressed as a list
	format = [class:1, ...]
For the more complicated case, the format is expressed as a proposition
	format = { MKR proposition }

3.3. meaning

For methods implemented as an MKR script, the meaning is expressed as a list of propositions
	meaning = { proposition list }
For methods implemented as a Unicon procedure, the meaning is expressed as the name of that procedure
	meaning = procedure_name

4. interfacing with other systems

Since MKR includes the ability to execute local operating system commands (syntax: replace "do" by "!"; execution: use Unicon system()), MKE can easily interface with many other systems. On the "Semantic Web", MKE has interfaced with the Stanford TAP knowledge base, the OpenCyc knowledge base, Google and Amazon.

4.1. language translation

It is sometimes necessary to translate between MKR and other languages. In the case of languages like CycL, which has the same "power" as MKR, the translation is straightforward. With languages like RDF/OWL, which are much "weaker" than MKR, it's like a one-way street. RDF/OWL has no n-ary relations, no methods, no questions, no logical inferences.

The map_word() procedure uses four tables to translate common words in DC, RDF, OWL, MCF to the corresponding words in MKR.

MKE includes the following commands:

	do read tap from file done;
	do read rdf from file done;
	do read mcf from file done;

4.2. kbmode

The use of the "kbmode" parameter provides an easy way to control which knowledge base MKE uses. For example, "let kbmode=tap;" tells MKE to use the Stanford TAP knowledge base; "let kbmode=cyc;" tells MKE to use the OpenCyc knowledge base. Implementing the mode switch will usually require some UNIX shell and/or Java API interfaces to the other knowledge base.

4.3. UNIX shell

Amazon is an example of a knowledge base that requires only a shell script to access it.

4.4. Java APIs

Google, TAP, and OpenCyc all require the use of a Java API to interface with their knowledge bases. For Google and OpenCyc, I've built interfaces to the Java API using only shell scripts. For TAP, I used shell scripts plus some small Java programs which provide a more user-friendly interface.