Volume 5 Number 1 - Software Agents Part 2
Although there are many definitions of "agent," one of the most interesting is "self-describing program." If a web-based program provides a detailed description of how to interact with it, then it is an agent, and other programs can use the information it provides to decide at run time how to interact with it. Extensions of existing planning algorithms can be used to construct sequences of actions, or more elaborate action structures, for accomplishing goals by sending requests to an agent. One obstacle to this happy scenario is that not all agents will describe themselves, or their ideal partners, in the same notation, or, to use the fashionable word, the same ontology. In that case, the agents can't be coupled unless an ontology translation can be found. Taking a logical point of view, the problem can be thought of as managing a merged ontology containing bridging axioms that relate the terms of one ontology to the terms of the other, not necessarily one-to-one.
Although many people are excited about "agent technology," few people agree about what agent technology actually might be, or even what the word "agent" might mean. In this article, I mean something very specific: an agent is a computer program accompanied by a description of what it does and how to use it. The description is detailed and perspicuous enough that another program can interact with the agent without having to be explicitly programmed to do so. In other words, an agent is a self-describing program. For example, a Wabash.com (a little-known competitor to a well-known on-line bookseller) of the future might provide enough information that an automated shopping program could conduct a transaction with it by deducing from scratch how one asks Wabash.com for information, how one tells it what one wants, how one tells it the appropriate billing information, and so forth. The shopping program might itself be an agent, of course, and to avoid seeming to make irrelevant distinctions I will use the word "agent" as though all programs were self-describing unless proven otherwise.
The need for this kind of self-description is evident to many people. For example, companies active on the Internet are interested in supporting automated business-to-business transactions by making their websites self-describing. This interest has led to the creation of the UDDI (Universal Description, Discovery, and Integration) consortium, which is developing notations for website self-description (see www.uddi.org). UDDI and related notations such as WSDL and SOAP [2, 1] use XML [4] as their basic syntactic vehicle. They tend to focus on representing attributes of web agents such as where they are located, what protocols they use to communicate, and the like.
In this article I will argue that one can aim higher. Rather than represent service descriptions with simple attribute-value tables, we should be able to say things like: "To put an item in the shopping cart, send a message of type put-in-shopping-cart with parts Product-id and Quantity." One way to formalize this statement is with a notation like this:
(:action (put-in-shopping-cart id - Product-id quantity - Integer)
 :effect (quant-in-cart id quantity))
which is derived from PDDL (Planning Domain Definition Language) [6], a notation for defining all the legal actions in a domain and what their effects are.
The use of such notations does not mean we are giving up on XML, which has earned wide popularity as a message-exchange medium on the Internet. But XML is defined to be "machine-readable and human-tolerable." That is, it isn't completely opaque to people, but it isn't supposed to be easy for them to produce. One way to encode the definition above in XML is this:
<Action>
  <name resource="ecom:put-in-shopping-cart"/>
  <params>
    <rdf:Seq>
      <rdf:li>
        <Var ID="v1" name="id">
          <type resource="ecom:Product-id"/>
        </Var>
      </rdf:li>
      <rdf:li>
        <Var ID="v2" name="quant">
          <type resource="ecom:Integer"/>
        </Var>
      </rdf:li>
    </rdf:Seq>
  </params>
  <effect>
    <Predication>
      <Subj resource="#v1"/>
      <Pred resource="ecom:quant-in-cart"/>
      <Obj resource="#v2"/>
    </Predication>
  </effect>
</Action>
This is actually in RDF [5], a set of conventions for using XML syntax to describe arbitrary objects. Although it is possible to understand it, and even type it, there is no real reason for people ever to touch RDF or XML. Computers like these notations because they unambiguously specify the syntactic hierarchy of an expression, even for programs that know little about the content. People tend to prefer conciseness and layout in order to grasp the structure and meaning of an expression. For the rest of this paper, I'll use logic-based notations with Lisp syntax, but rest assured that when it comes time for agents to exchange information, they will probably do it with notations containing a lot more angle brackets.
There are several different phases involved in having two agents hook up. The first is the advertising/search phase. It is based on the assumption that agents with something to sell will post descriptions of their abilities in central registries, where agents trying to solve a problem can find them. This is also sometimes called the brokering phase, based on the plausible assumption that the registry plays an active role in coupling the two agents together.
Our focus is on what happens after the brokering phase. One can generally assume that the agent descriptions used for advertising and search are fairly "shallow." The descriptions of what agent 1 wants and what agent 2 has are in some broad vocabulary that enables one to distinguish book sellers from bookies. Once the decision has been made to couple two agents together, a more detailed description comes into play. It's at this point that action definitions like that for put-in-shopping-cart are made available. If a merchant has provided definitions for all the possible actions a customer can take, then the customer is faced with a problem of the following form:
Given the current situation, and definitions of the possible actions in every situation, find a sequence of actions that will achieve a given goal.
This is in essence what is called in AI a planning problem. I will use the phrase planning phase for the process of solving this problem, that is, finding an action sequence. After the planning phase comes the execution phase, when the sequence of actions is actually carried out. It is reasonable (we hope) to assume that the planning agent will succeed if it executes the plan; but there may well be situations where the plan exits prematurely with some sort of failure indication. In that case the agent may give up, or replan, starting from the situation it finds itself in halfway through the original plan.
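To make the problem statement concrete, here is one way the goal in the example below might be written. This is a hypothetical sketch in the Lisp-based notation adopted earlier; the predicates own, title, author, and price are invented for illustration and are not part of any published standard:

(:goal (exists (b - Book)
         (and (own b)
              (= (title b) "Ubik")
              (= (author b) "Philip K. Dick")
              (< (price b) (* 12 dollar)))))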
As an example of a planning problem, suppose the buying agent wants to own a copy of Ubik by Philip K. Dick, but spend less than twelve dollars. Through a broker, it finds a merchant, Wabash.com, that seems to be in the right business. It asks Wabash.com for its description, computes for a while, and then generates the following plan:
(series
  (tag s1 (send Wabash.com
            (query-in-stock ((author "Philip K. Dick") ...))))
  ...)
This plan is much simplified, but it exhibits certain key features: it is a sequence of steps; each step is a message sent to the merchant agent; and steps are tagged (s1) so that later steps can refer to their results.
A more realistic planner could produce a branching plan, with alternative continuations after a run-time test. For instance, the plan might say to try the usual credit card and, if it is refused, to try sending a backup credit-card number.
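To make this concrete, here is a hypothetical sketch of such a branching plan, in the same notation as the plan above. The branch construct and the message and outcome names (charge-card, accepted, refused) are invented for illustration:

(series
  (tag s2 (send Wabash.com (charge-card Primary-card-number)))
  (branch (outcome s2)
    (accepted (succeed))
    (refused (send Wabash.com (charge-card Backup-card-number)))))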
This is a rich area for research, but I want to focus on a different set of issues, concerned with how agents cope with differences in vocabulary and the terms constructed from it. There is no guarantee that when two agents encounter each other they will talk about the same things using exactly the same terms. It's not easy to create notations for agent descriptions, and the more diverse the population of agents to be described gets, the harder it gets to find a notation that everyone involved can agree on. If self-describing agents become a reality at all, it is likely that notations will be centered around particular industries or other types of institutions (military, educational, and such). Within a community that shares a notation, communication will be fairly straightforward. Between communities, it will be considerably more difficult.
The word "ontology" is in fashion for talking about vocabularies and notations. Philosophers use the word, as a singular noun only, to mean the philosophy of being; in the representation business, we often contemplate multiple competing "ontologies." That's because the word has come to mean "How objects in a domain are named, classified, and dissected." The link to the philosophy of being is the idea that "to be is to have a name," so that what you don't give a name to might as well not exist. For example, in the bookselling business you distinguish between the paperback and hardcover versions of a book, but you don't distinguish between the 7th and 8th printing of the hardcover edition. If someone wants to program their agent to buy a book from the 7th printing, he is out of luck. Wabash.com's agent will never know what he is talking about. Printings may as well not exist as far as it is concerned.
There are other cases where agents from overlapping domains do talk about the same things, but in different ways. Here they do have a chance to communicate, provided a way can be found to translate between ontologies. This is a very difficult problem, much harder, for instance, than the planning problem we sketched above, which is hard enough. The reason it is so difficult is that it often requires subtle judgments about the relationships between the meanings of formulas in one notation and the meanings of formulas in another. Furthermore, there is no obvious "oracle" that will make these judgments. We cannot assume that there is an overarching (possibly "global") ontology that serves as a court of appeals for semantic judgments. There are times when such a strategy will work, but only after someone has provided a translation from each of the disparate ontologies to the overarching framework, and there is no reason to expect either of these translation tasks to be any easier than the one we started with. Indeed, the more the overarching framework encompasses, the harder it will be to relate local ontologies to it. Hence the work of ontology reconciliation inevitably involves a human being to do the heavy lifting. The most we can hope for is to provide a formal definition of the problem, and software tools (such as those described by [9]) to aid in solving it.
The problem of ontology translation is complicated by the fact that different ontologies are expressed in radically different notations, from relational databases to semantic networks. This diversity makes it seem as if a key part of the problem is expressing mappings between arbitrary data structures. Our research group is going in a different direction. We don't assume that "mapping" something to something else is the crux of the matter, but instead that the problem is to infer content expressible in one ontology from content expressed in the other.
We start with the postulate that the different syntactic forms used by different ontologies can be factored out, allowing the problem to be phrased at the content level only. What this assumption comes down to is the idea that everything expressed in an ontology can be expressed in a neutral logical syntax, so that the only way it can differ from other ontologies is in its vocabulary. For the rest of this paper, I will assume that all facts are expressed in terms of formal theories, each of which contains the following elements (a minimal example follows the list):
- A set of types.
- A set of symbols, each with a type.
- A set of axioms involving the symbols.
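As a minimal illustration, here is the mainstream bookseller's theory from the example below packaged in this form. The :types, :constants, :predicates, and :axioms keywords are one plausible concrete syntax, not a fixed standard:

(:types Book Duration)
(:constants Ubik - Book)
(:predicates (in-stock x - Book t - Duration))
(:axioms
  (in-stock Ubik (* 4 day)))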
Once we have cleared away the syntactic underbrush, the ontology-transformation problem becomes much clearer. Suppose one bookseller has a theory O1 with a predicate (in-stock x - Book t - Duration), meaning that x is in stock and may be shipped in time t. Another bookseller expresses the same information in its theory O2, with two predicates, (in-stock y - Book) and (deliverable d - Duration y - Book). We are presented with a dataset D1 that is in terms of O1, and contains fragments such as
(:constants Ubik Ulysses - Book)
(:axioms
  (in-stock Ubik (* 4 day))
  (in-stock Ulysses (* 24 hour))
  ...)
To translate this into an equivalent dataset that uses O2, we must at least find a translation for the axioms. The types and constants need to be handled as well, but we'll ignore that here.
With this narrow focus, it becomes almost obvious how to proceed: Treat the problem as a deduction from the terms of one theory to the terms of the other. That is, combine the two theories by "brute force," tagging every symbol with a subscript indicating which theory it comes from. Then all we need to do is supply a "bridging axiom" such as
(forall (b t)
  (iff (in-stock1 b t)
       (and (in-stock2 b) (deliverable2 t b))))
which we can use to translate every axiom in D1, or any other dataset. More precisely, we can use it to augment the contents of D2. Any time we need an instance of (in-stock2 x) or (deliverable2 y x), the bridging axiom will tell us that (in-stock2 Ubik) and (deliverable2 (* 4 day) Ubik) are true (and maybe other propositions as well). (Compare the lifting axioms of [3].)
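To spell out one such deduction for Ubik, with each symbol tagged by the theory it comes from:

;; From D1, after tagging:
(in-stock1 Ubik (* 4 day))
;; Instantiating the bridging axiom with b = Ubik, t = (* 4 day):
(iff (in-stock1 Ubik (* 4 day))
     (and (in-stock2 Ubik) (deliverable2 (* 4 day) Ubik)))
;; Hence, by modus ponens and conjunction elimination:
(in-stock2 Ubik)
(deliverable2 (* 4 day) Ubik)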
It seems as if we have lost sight of our original goal. We were looking for an approach to transforming ontologies, and we seem to have found a way of merging ontologies. Actually, that is not such a bad place to be. It suggests that the case of two ontologies is not special; we might well want to merge three or more. There are efficiency issues, but they are not that different from those that arise in importing modules from programming-language libraries.
Still, one is likely to feel that problems like the one just given are much too easy. In realistic cases, two ontologies will "carve the world up differently." They may have different "granularity," meaning that one makes finer distinctions than the other; of course, O1 might make finer distinctions than O2 in one respect, and coarser distinctions in another. Here's an example: suppose O1 is the ontology we have been drawing examples from, a standard for the mainstream book industry. Now suppose O2 is an ontology used by the rare-book industry. The main difference is that the rare-book people deal in individual books, each with its own provenance and special features (e.g., an autograph by the author). Hence the word "book" means different things to these two groups. For the mainstream group, a book is an abstract object, of which there are assumed to be many copies. If a customer buys a book, it is assumed that he or she doesn't care which copy is sent, provided it's in good condition. For the rare-book industry, a book is a particular object. It may be an "instance" of an abstract book, but this is not a defining fact about it.
For example, if you buy Walt Whitman's Leaves of Grass from Wabash.com, you can probably choose from different publishers, different durabilities (hardcover vs. paperback, page weight), different prices, and various other features (scholarly annotations, large print, spiral binding, etc.). However, you certainly can't choose exactly which copy you will receive of the book you ordered; and you probably can't choose which poems are included, even though Whitman revised the book throughout his life. The versions in print today include the last version of each poem included in any edition.
If you buy the book from RareBooks.com, then there is no such thing as an abstract book of which you wish to purchase a copy. Instead, every concrete instance of Leaves of Grass must be judged on its own merits. Indeed, making this purchase is hardly a job for an automated agent, although it could be useful to set up an agent to tell you when a possibly interesting copy comes into the shop.
Let's look at all this more formally. Suppose that the planning agent uses the industry-standard ontology (O1), and the broker puts it in touch with RareBooks.com, with a note that although it bills itself as selling books, its service description uses a different ontology (O2). If the planning agent's goal can't be achieved after trying more accessible sources, then the broker may search for an existing ontology transformation, or merge, that can be used to translate RareBooks.com's service description from O2 to O1. (If it can't find one, all it can do is notify the maintainers of the ontologies of the problem; there is no way for the broker, the planning agent, or the end user to find a transformation on the fly.)
Let us sketch what some of the bridging axioms between O1 and O2 might look like. In particular, we need to infer instances of (is Book1 x) given various objects of type Book2 with various properties. Objects of type Book1 we will call commodity books; an example is the Pocket Books edition of Mein Kampf. Objects of type Book2 we will call collectable books; an example is a copy of Mein Kampf once owned by Josef Stalin. It is roughly true that many, but not all, rare books can be thought of as instances of particular commodity books. Two rare books are instances of the same commodity book if they have the same publisher, the same title, the "same" contents, and the same physical characteristics (e.g., hardcover, large print, and such). (Comparing the ISBNs of the two books would go a long way toward deciding whether they are the same, but the ISBN system has been in effect for only thirty years, so it won't apply to many rare books.) We can produce the following bridging axioms:
(:functions ((book-type b - Book2) - Book1))
(:axioms
  (forall (b1 b2 - Book2)
    (iff (and (= (publisher2 b1) (publisher2 b2))
              (= (title2 b1) (title2 b2))
              (= (phys-charac2 b1) (phys-charac2 b2))
              (< (revision-dif2 b1 b2) 5))
         (= (book-type b1) (book-type b2))))
  (forall (b - Book2)
    (= (buy2 b) (buy1 (book-type b)))))
This should all be self-explanatory, except for the function revision-dif2, which we suppose is in use in the rare-book business to express how many revisions separate an earlier and a later copy of an author's work. We have also introduced a new function book-type, which maps individual collectable books to their types, which are commodity books.
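To see the axioms at work, suppose RareBooks.com's dataset describes two copies of the same Pocket Books edition; the constants Copy-A and Copy-B are invented for illustration:

(:constants Copy-A Copy-B - Book2)
(:axioms
  (= (publisher2 Copy-A) (publisher2 Copy-B))
  (= (title2 Copy-A) (title2 Copy-B))
  (= (phys-charac2 Copy-A) (phys-charac2 Copy-B))
  (< (revision-dif2 Copy-A Copy-B) 5))

The first bridging axiom then licenses (= (book-type Copy-A) (book-type Copy-B)), and the second reduces buying either copy to buying the one commodity book, (buy1 (book-type Copy-A)).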
For axioms such as these to do the planning agent any good, it must be possible for the planning agent to use them to translate a rare-book dealer's service description. Suppose the agent is trying to buy a copy of Lady Chatterley's Other Lover, a little-known sequel to D. H. Lawrence's famous work (in fact, fictitious). Having exhausted the usual sources, it attempts to deal with RareBooks.com. It must find a plan in the merged ontology, using the bridging axioms to connect its goal, stated in O1 terms, with the actions described in RareBooks.com's O2 service description.
Here are the main points I have tried to make:
- An agent, in the sense used here, is a self-describing program: one whose description is detailed enough for other programs to work out at run time how to interact with it.
- Given such descriptions, figuring out how to interact with an agent is essentially a planning problem.
- Agents from different communities will inevitably describe themselves using different ontologies, and coupling them then requires ontology translation.
- Ontology translation is best treated not as a mapping between data structures, but as deduction in a merged ontology containing bridging axioms.
- Constructing those bridging axioms requires human judgment; formal definitions and software tools can assist the human, not replace him.
One might make the objection that real life cannot be so tidily reduced to predicate calculus. Real ontologies contain many hidden presuppositions, which are lost when they are boiled down to dry axioms. This is a serious objection, but we hope it is less likely to hold true in a domain like ours, where everything has got to be pretty formal for agent communication to be possible at all.
This is obviously work in progress. We are in the process of adapting our Unpop planner [7] to handle hierarchical and contingency planning. We are beginning work on an implementation of the ontology merger.
Dr. Drew McDermott is a Professor of Computer Science at Yale University. He received a Ph.D. in 1976 from the Massachusetts Institute of Technology. He is the coauthor of two textbooks in artificial intelligence, and has a new book coming out from MIT Press on machine consciousness. He is on the editorial board of Artificial Intelligence, and is a Fellow of the American Association for Artificial Intelligence. His research is in robot navigation, planning, and inter-agent communication.
Dr. Drew McDermott
Yale University
Dept. of Computer Science
51 Prospect Street
New Haven, CT 06511
203-432-1284
[email protected]
http://www.cs.yale.edu/homes/dvm/