Volume 4 Number 4 - Software Agents Part 1
The web is moving our decision-making processes from an information-sparse setting into an information-rich setting. A major problem facing individual users is the ubiquity and diversity of information. The World-Wide Web contains more alternatives than can be investigated in depth. The value system itself is changing: whereas traditionally information had value, now it is the attention of the purchaser that has value. Evidence for this shift is the focus on free availability of web resources and the difficulties in maintaining copyrights.
Tools are needed to search through the mass of potential information. Traditional information retrieval tools have focused on returning as much possibly relevant information as possible, in the process lowering precision, since much irrelevant material is returned as well. However, for E-commerce to be effective, we cannot afford to wade through the excess data produced by these errors. In statistics such irrelevant retrievals are characterized as false positives, or Type I errors. In most business situations, a modest fraction of missed opportunities (Type II errors) is acceptable. We will discuss the trade-offs and present current and future tools to enhance precision in electronic information gathering.
While much progress in Information Science is triggered by progress in technology, when assessing the future we must focus on the consumers. In this article we consider the needs of decision-makers; consumer, business, and professional needs were also addressed in an earlier source report [29]. Support of decision-making must be provided expeditiously, with a very low rate of error and modest human supervision. When excessive information must be processed, efficiency drops and, worse, confusion ensues, making decision-making error-prone.
The major problem facing individual decision-makers is the ubiquity and diversity of information. Even more than the advertising section of a daily newspaper, the World Wide Web contains more alternatives than can be investigated in depth. When leafing through advertisements, the selection is based on the prominence of the advertisement, the convenience of getting to the advertised merchandise in one's neighborhood, the reputation for quality of the vendor (whether earned personally or created by marketing), the features - suitability for a specific need - and the price. The dominating factor differs based on the merchandise. Similar factors apply to on-line purchasing of merchandise and services. Lacking the convenience of leafing through the newspaper, selection depends to a greater extent on selection tools.
Getting complete information is a question of breadth. In traditional measures, completeness of coverage is termed recall. To achieve high recall rapidly, all possibly relevant sources have to be accessed. Since complete access for every information request is not feasible, information systems depend on having indexes. Having an index means that an actual information request can start from a manageable list, with pointers to the locations and pages containing the actual information.
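To make the role of such an index concrete, here is a minimal sketch in Python (the pages, identifiers, and function name are illustrative, not drawn from any actual search engine): an inverted index maps each term to the pages that contain it, so a request can start from a manageable list of pointers rather than a scan of every page.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each term to the set of page identifiers that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

# Illustrative pages; a real crawler would supply these.
pages = {
    "p1": "trucks for sale, new and used trucks",
    "p2": "toy trucks and other toys for children",
}
index = build_inverted_index(pages)
print(sorted(index["trucks"]))   # ['p1', 'p2'] - the manageable starting list
```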
The effort to index all publicly available information is immense. Comprehensive indexing is limited by the size of the web itself and by the rate at which the information on the web is updated. Some of these problems can be, and are being, addressed by brute force, using heavyweight and smart indexing engines. For instance, sites that have been determined to change frequently will be visited more often by the worms that collect data from the sources, so that the indexed information is, on average, as little out-of-date as feasible [17]. Of course, sites that change very frequently, for example, more than once a day, cannot be effectively indexed by a broad-based search engine. We have summarized the approaches currently being used in [30].
Getting complete information typically reduces the fraction of actually relevant material in the retrieved collection. It is here that improvements are crucial, since we expect that the volume of possibly relevant retrieved information will grow as the web and retrieval capabilities grow. Selecting a workable quantity that is of greatest benefit to a customer requires additional work. This work can be aided by the sources, through better descriptive information, or by intermediate services that provide filtering. If it is not performed, the customer bears a heavy burden in processing the overload, and is likely to give up.
High-quality indexes can help immensely. Input for indexes can be produced by the information suppliers, but those inputs are likely to be limited. Schemes requiring cooperation of the sources have been proposed [8]. Since producing an index is a value-added service, it is best handled by independent companies, who can distinguish themselves by comprehensiveness versus specialization, currency, convenience of use, and cost. Those companies can also use tools that break through access barriers in order to better serve their population.
Our information environment has changed in recent years. In the past, approximately ten years ago, most decision-makers operated in settings where information was scarce, and there was an inducement to obtain more information. Having more information was seen as enabling better decisions, reducing risks, saving resources, and lowering losses.
Today we have access to an excess of information. Search engines will typically retrieve more than a requestor can afford to read. The traditional metrics for information systems have been recall and precision. Recall is defined as the ratio of relevant records retrieved to all relevant records in the database. Its complement, the set of relevant records not retrieved, corresponds to Type II errors (false negatives) in statistics. Precision is defined as the ratio of relevant records retrieved to all records retrieved. The irrelevant records retrieved correspond to Type I errors (false positives). In practical systems these are related, as shown in Figure 2. While recall can be improved by retrieving more records, precision becomes disproportionately worse.
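As a minimal sketch of these definitions (the record identifiers are illustrative), recall and precision can be computed directly from the set of retrieved records and the set of relevant records:

```python
def recall_precision(retrieved, relevant):
    """Compute recall and precision from sets of record identifiers."""
    hits = retrieved & relevant                 # relevant records actually retrieved
    recall = len(hits) / len(relevant)          # misses are false negatives (Type II)
    precision = len(hits) / len(retrieved)      # extras are false positives (Type I)
    return recall, precision

# Illustrative example: 1 relevant record is missed, 2 irrelevant ones retrieved.
retrieved = {"r1", "r2", "x1", "x2"}
relevant  = {"r1", "r2", "r3"}
print(recall_precision(retrieved, relevant))    # recall 2/3, precision 1/2
```

The misses and the extras in this computation are exactly the two error types discussed above.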
There are a number of problems with these metrics: measuring relevance and precision, the relative cost of the associated errors, and the scale effect of very large collections.
Relevance - It is well recognized that the decision on the relevance of documents is fluid. When the resources, as on the web, are immense, the designation of relevance itself can become irrelevant. Some documents add so little information that an actual decision-making process will not be materially affected. A duplicate document might be rated relevant, although it provides no new information. Most experiments are evaluated by using expert panels to rate the relevance of modest document collections, since assessing all documents in a large collection is a tedious task.
Precision - The measurement of precision suffers from the same problem, although it does not require that all documents in the collection be assessed, only those that have actually been retrieved. Search engines, in order to assist the user, typically try to rank retrieved items in order of relevance. Most users will only look at the 10 top-ranked items. The ranking computation differs by search engine and accounts for much of the differences among them. Two common techniques are aggregation of relative word frequencies in documents for the search terms, and popularity of webpages, as indicated by access counts or references from peer pages. For e-commerce, where catalog entries are short and references are harder to collect, these rankings do not apply directly. Other services, such as MySimon and Epinion [6], try to fill that void by letting users vote.
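The first technique can be illustrated with a minimal sketch (the scoring formula, documents, and query are our own simplification, not the algorithm of any particular engine): a document is scored by the aggregated relative frequencies of the search terms, and retrieved items are ranked by that score.

```python
def score(document, query_terms):
    """Rank a document by aggregated relative frequencies of the query terms."""
    words = document.lower().split()
    if not words:
        return 0.0
    return sum(words.count(t) / len(words) for t in query_terms)

docs = {
    "p1": "used trucks and new trucks for commercial hauling",
    "p2": "toy trucks for toddlers, toys and games",
}
query = ["trucks", "hauling"]
ranking = sorted(docs, key=lambda d: score(docs[d], query), reverse=True)
print(ranking)   # ['p1', 'p2'] - p1 mentions both terms more prominently
```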
Cost - Not considered in most assessments of retrieval performance are the relative costs to an actual user of the types of errors encountered. For instance, in a purchasing situation, not retrieving all the possible suppliers of an item may result in paying more than necessary. However, once the number of suppliers is such that a reasonable choice exists, the chance that further suppliers would offer significantly lower prices is small. The cost of Type II errors (missed suppliers) is then low, as shown in Figure 3.
The cost of an individual Type I error is borne by the decision-maker, who has to recognize that an erroneous, irrelevant supplier was retrieved, perhaps a maker of toy trucks when real trucks were needed. The cost of an individual rejection may be small, but when we deal with large collections, the costs can become substantial. We will argue that more automation is needed here, since manual rejection inhibits automation.
Scale - Perfection in retrieval is hard to achieve. In selected areas we now find precision ratios of 94% [19]. While we do not want to belittle such achievements, having 6% Type I errors (false positives) can still lead to very many irrelevant instances when such techniques are applied to large collections; for instance, a 6% error rate on a million potential items will generate 60,000 errors, far too many to check manually. It is also hard to be sure that no useful items have been missed if one restricts oneself to the 10 top-ranked items.
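A short worked computation of that scale effect, purely illustrative, shows how a fixed error rate translates into unmanageable absolute numbers as collections grow:

```python
# Illustrative only: expected false positives at a fixed 6% error rate
# for growing collection sizes, showing why scale defeats manual checking.
error_rate = 0.06
for collection_size in (1_000, 100_000, 1_000_000):
    print(collection_size, int(collection_size * error_rate))
# 1000 -> 60, 100000 -> 6000, 1000000 -> 60000
```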
The reasons for having errors are manifold. There are misspellings, there is intentional manipulation of webpages to make them rank high, there is useful information that has not been accessed recently by search engines, and there are suppliers that intentionally do not display their wares on the web. These suppliers do this because they want to be judged by other metrics, such as quality, rather than the dominant metric when purchasing, namely price. All these sources of errors warrant investigation, but we will focus here on a specific problem, namely semantic inconsistency.
The importance of errors is also domain-dependent. A database that is perfectly adequate for one application may have an excessive error rate when used for another purpose. For instance, a payroll database might have too many errors in its employee address field to be useful for mailings. Its primary purpose is not affected by such errors, since most deposits are transferred directly to banks, and the address is mainly used to determine tax-deduction requirements for local and state governments. To assure adequate precision of results when using data collected for another objective, some content-quality analysis is needed prior to making commitments.
The semantic problem faced by systems using broad-based collections of information is the impossibility of reaching wide agreement on the meaning of terms among organizations that are independent of each other. We denote the set of terms and their relationships, following current usage in Artificial Intelligence, as an ontology. In our work we define ontologies in a grounded fashion, namely:
Ontology: a set of terms and their relationships
Term: a reference to real-world and abstract objects
Relationship: a named and typed set of links between objects
Reference: a label that names objects
Abstract Object: a concept which refers to other objects
Real-world Object: an entity instance with a physical manifestation
Grounding the definitions so that they can refer to actual collections, as represented in databases, allows validation of the research we are undertaking [27]. Many precursors of ontologies have existed for a long time. Schemas, as used in databases, are simple, consistent, intermediate-level ontologies. Foreign keys relating table headings in database schemas imply structural relationships. More comprehensive ontologies also include the values that variables can assume; of particular significance are codes for enumerated values used in data processing. Names of states, counties, etc. are routinely encoded. When such terms are used in a database, the values in a schema column are constrained, providing another example of a structural relationship. There are thousands of such lists, often maintained by domain specialists. Other ontologies are now being created within Document Type Definitions (DTDs) for the eXtensible Markup Language (XML) [5].
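As a minimal sketch of the grounded definitions above (the class and field names are our own illustration, not a standard API), an ontology can be represented as a set of terms plus named relationships, so that its contents can be checked against actual database instances:

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str                      # reference: a label that names objects
    abstract: bool = False         # abstract objects refer to other objects

@dataclass
class Relationship:
    name: str                      # e.g. "is-a", "part-of", "foreign-key"
    source: Term
    target: Term

@dataclass
class Ontology:
    terms: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_term(self, name, abstract=False):
        self.terms[name] = Term(name, abstract)
        return self.terms[name]

    def relate(self, rel_name, src, dst):
        self.relationships.append(Relationship(rel_name, self.terms[src], self.terms[dst]))

# Grounding: the terms constrain the values a schema column may hold.
vehicles = Ontology()
vehicles.add_term("vehicle", abstract=True)
vehicles.add_term("truck")
vehicles.relate("is-a", "truck", "vehicle")
```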
Although the term ontology is just now getting widespread acceptance, all of us have encountered ontologies in various forms. Often terms used in paper systems have been reused in computer-based systems:
Lexicon: a collection of terms used in information systems
Taxonomy: a categorization or classification of terms
Database Schemas: attributes, ranges, constraints
Data Dictionaries: a guide to systems with multiple files and owners
Object Libraries: grouped attributes, inheritance, methods
Symbol Tables: terms bound to implemented programs
Domain Models: interchange terms in XML DTDs and schemas
The ordering in this list implies an ongoing formalization of knowledge about the data being referenced.
Database schemas are the primary means used in automation to formalize ontological information, but they rarely record relationship information or define the permissible ranges for data attributes. Such information is often obtained during design, but rarely kept and even less frequently maintained. Discovering the knowledge that is implicit in the web itself is a challenging task [9].
Of concern is the breadth of ontologies. While having a consistent, world-wide ontology over all the terms we use would cause the problem of semantic inconsistency to go away, we will argue that such a goal is not achievable, and, in fact, not even desirable.
We have seen successes with small, focused ontologies. Here we consider groups of individuals that cooperate, on a regular basis, towards some shared objective. Databases within companies or interest groups have been effective means of sharing information. Since they are finite, it is also possible for participants to inspect their contents and validate that individual expectations and the information resources match. Once this semantic match is achieved, effective automatic processing of the information can take place. Many of the ongoing developments in defining XML DTDs and schemas follow the same paradigm, while interchanging information among widely distributed participants. Examples are found in applications as diverse as petroleum trading and the analysis of Shakespeare's plays. The participants in those enterprises have shared knowledge for a long time, and a formal and processable encoding is of great benefit.
There is still a need in many of these domains to maintain the ontologies. In healthcare, for instance, the terms needed for reporting patients' diseases in order to receive financial reimbursement change periodically, as therapies evolve and are split to cover alternate manifestations. At a finer granularity, disease descriptors used in research areas evolve even faster, as we learn about distinctions in genotypes that affect susceptibility to diseases.
The maintenance of these domain ontologies often devolves upon professional associations. Such associations have a membership that has an interest in sharing and cooperating. Ontology creation and maintenance is a natural outgrowth of their function in disseminating information, and merges well with the role they have in publication and in organizing meetings. An example of such a close relationship in Computer Science is the classification of computer literature [1], published by the ACM and revised approximately every five years. This document provides an effective high-level view of the literature in the scientific aspects of the domain, although it does not provide a granularity suitable for, say, trading and purchasing of software.
A major effort, sponsored by the National Library of Medicine (NLM), has integrated diverse ontologies used in healthcare into the Unified Medical Language System (UMLS) [11]. In large ontologies, collected from diverse sources or constructed by multiple individuals over a long time, some inconsistencies are bound to remain. Maintenance of such ontologies is required when sources change [21]. It took several years for UMLS to adapt to an update in one of its sources, the disease registry mentioned earlier. UMLS does fulfill its mission, however, in broadening searches and increasing recall, which is the main objective of bibliographic systems.
Large ontologies have also been collected with the objective of assisting common-sense reasoning (CyC) [16]. CyC provides the concept of microtheories to circumscribe contexts within its ontology. CyC has been used to articulate relevant information from distinct sources without the constraints imposed by microtheories [4]. That approach provides valuable matches and improves recall, but does not improve precision.
The inconsistency of semantics among sources is due to their autonomy. Each source develops its terminology in its own context, and uses terms and classifications that are natural to its creators and owners. The problem with articulation by matching terms from diverse sources is not just that of synonyms - two words for the same object - or homonyms - one word for completely different objects, as miter in carpentry and in religion. The inconsistencies are much more complex, and include overlapping classes, subsets, partial supersets, and the like. Examples of problems abound. The term vehicle is used differently in the transportation code, in police agencies, and in the building code, although over 90% of the instances are the same.
The problem of maintaining consistency in large ontologies is recursive. Terms do not refer only to real-world objects, but also to abstract groupings. The term `vehicle' is different for architects, when designing garage space, versus its use in traffic regulation, dealing with right-of-way rules at intersections. At the next higher level, talking about transportation will have very different coverage for the relevant government department versus a global company shipping its goods.
There are also differences in granularity within domains. A vendor site oriented towards carpenters will use very specific terms, say sinkers and brads, to denote certain types of nails, terms that will not be familiar to the general population. A site oriented to homeowners will just use the general categorical term nails, and may then describe the diameter, length, type of head, and material. For the homeowner to share the ontologies of all the professions involved in construction would be impossible. For the carpenter to give up specialized terms and abbreviations, such as 3d for a three-penny sized nail, would be inefficient; language in any domain is enhanced to provide effective communication within that domain. The homeowner cannot afford to learn the thousands of specialized terms needed to maintain one's house, and the carpenter cannot afford to waste time circumscribing each nail, screw, and tool with precise attributes.
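A minimal sketch of such a granularity difference (the terms, lengths, and head descriptions are illustrative examples only): specialist terms map to the homeowner's general category plus descriptive attributes.

```python
# Illustrative mapping from carpenters' terms to the homeowner's vocabulary:
# a general category plus descriptive attributes. Values are examples only.
specialist_to_general = {
    "sinker":  {"category": "nail", "length_in": 2.375, "head": "flat, waffled"},
    "brad":    {"category": "nail", "length_in": 1.0,   "head": "small, barely visible"},
    "3d nail": {"category": "nail", "length_in": 1.25,  "head": "flat"},
}

def describe_for_homeowner(term):
    entry = specialist_to_general.get(term)
    if entry is None:
        return f"unknown term: {term}"
    return f"a {entry['length_in']}-inch {entry['category']} with a {entry['head']} head"

print(describe_for_homeowner("brad"))
```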
The net effect of these problems, when extended over all the topics we wish to communicate about, is that it is impossible to achieve a globally consistent ontology. Even if such a goal could be achieved, it could not be maintained, since definitions within the subdomains will, and must, continue to evolve. It would also be inefficient, since the subdomains would be restricted in their use of terms. The benefit to the common good of being able to communicate consistently would be outweighed by the costs incurred locally and the cost of requiring that we all acquire consistent global knowledge.
If we have argued here, albeit informally, that large global ontologies cannot be achieved, even though they are desirable for solving broader problems than can be solved with small ontologies, we are faced with one conclusion: it will be necessary to address larger problems by making small ontologies interoperate. Since a simple integration of small ontologies would lead us directly into the problems faced by large ontologies, we must learn to combine the small ontologies as needed, specifically as needed by the applications that require the combined knowledge.
However, inconsistent use of terms makes sharing information from multiple sources incomplete and imprecise. As shown above, forcing every category of customers to use matching terms is inefficient. Mismatches are rife when dealing with geographic information, even though localities are a prime criterion for articulation [18].
Most ontologies have associated textual definitions, but those are rarely precise enough to allow a formal understanding without human interpretation. Although these definitions help readers knowledgeable about the domain, they cannot guarantee precise automatic matching in a broader context, because the terms used in the definitions also come from their own source domains. The result is that inconsistencies will occur when terms from independent, but relatable, domains are matched.
These inconsistencies are a major source of errors and imprecision. We have all experienced web searches that retrieved entries containing identically spelled keywords that were not all related to the domain we are addressing - Type I errors. When we augment the queries with possible synonyms, because we sense a high rate of missing information (Type II errors), the fraction of junk (Type I errors) typically increases disproportionately. The problems due to inconsistency are more of a hindrance to automation than to browsing, where one deals with one instance at a time.
Since we cannot hope to achieve global consistency, but still must serve applications that span multiple domains, we must settle for composition. The theme that only focused, application-oriented approaches will be maintainable directs us to limit ourselves to the concepts needed for interoperation, for which we will reuse the term articulation.
Once we have clear domain ontologies that are to be related within an application, we must recognize their intersections, where concepts belong to multiple domains. For clarity, we restrict ourselves here to intersections of two domains. More complex cases are certainly feasible, but we will address them using the algebraic capabilities presented in Section 5. In this section we deal with the binary case.
An application requiring information from two domains must be able to join them semantically, so that there will be a semantic intersection between them. Such a match may not be found by lexical word matching.
For instance, checking for a relationship between automobile purchases and accidents requires matching the car owners against the buyers listed in dealer records.
We then define the articulation to be the semantically meaningful intersection of concepts that relates the domains with respect to an application. The instances should match according to our definition of an ontology, given in the introduction to this section.
An articulation point hence defines a relevant semantic match, even if the actual terms and their representations do not match. For instance, for vacation travel planning, a trip segment matches the term flight from the airline domain and the term journey from the railroad domain. Terms at a lower level of abstraction, defining instances, also have to be made to match. For instance, to take a train to San Francisco Airport one must get off at the San Bruno Caltrain station. Here the terms are at the same granularity, and once matched, the articulation is easy. Understanding such articulation points is a service implicitly provided now by experts, here travel agents. In any application where subtasks cross the boundaries of domains, some experts exist who help bridge the semantic gaps.
Often the matching rules become complex. The listings of the California Department of Motor Vehicles (DMV) include houseboats. To match vehicles correctly for, say, an analysis of fuel consumption, the articulation rule has to exclude those houseboats. The attributes that define the classes now become part of the input needed for the execution of the articulation. Such differences in scope are common, and yet often surprising, because the application designer has no reason to suspect that such differences exist. A good way to check the correctness of matches is to process the underlying databases.
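A minimal sketch of such an articulation rule (the term mappings, attribute names, and DMV record layout are assumptions made for illustration): terms from each source are mapped to a shared concept, and an attribute condition excludes instances, such as houseboats, that fall outside the application's scope.

```python
# Illustrative articulation for a fuel-consumption analysis across domains.
# Term mapping: each source's term is related to the shared concept "vehicle".
term_articulation = {
    ("dmv", "registered vehicle"): "vehicle",
    ("transportation_code", "motor vehicle"): "vehicle",
}

def articulate_dmv_record(record):
    """Admit a DMV record under the shared concept, excluding houseboats."""
    if record.get("body_type") == "houseboat":       # attribute-based exclusion rule
        return None
    shared_concept = term_articulation[("dmv", "registered vehicle")]
    return {"concept": shared_concept, "id": record["id"], "fuel": record.get("fuel")}

dmv_records = [
    {"id": "CA123", "body_type": "sedan", "fuel": "gasoline"},
    {"id": "CA456", "body_type": "houseboat", "fuel": "diesel"},
]
matched = [r for r in map(articulate_dmv_record, dmv_records) if r is not None]
print(matched)   # only the sedan is admitted as a 'vehicle'
```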
The concept is not to force alignment of entire base ontologies, but to present to the application consistent terms only in the limited overlapping area. Typical applications that rely on intersections involve purchasing goods and services from another domain; the example above cited journeys and flights. Terms used in only one domain, such as sleeping compartment and in-flight movie, need not be aligned.
There are already people in all kinds of business settings who perform such work. Any travel agent has to be able to deal with the diversity of resources. However, when interacting on the phone or directly with diverse webpages on the Internet, the problems are not widely recognized. For automation they will need to be solved formally.
Keeping the rules that define an articulation specific to narrow application contexts simplifies their creation and maintenance. Even within an application area multiple rule sets can exist; for instance, one might be specific to logistics in drug distribution. The logical organization to be responsible for the rules defining such a specific articulation ontology, for example for pharmaceutical drugs, would be the National Drug Distributors Association (NDDA) in the USA. There will be a need for tools to manage those rules, and these tools can serve diverse applications, both in creation and in maintenance [13].
When two sources come from the same organization, we would expect an easy match, i.e., a consistent ontology. However, we found that even within one company the payroll department defined the term employee differently from the definition used in personnel, so that the intersection of their two databases is smaller than either source. Such aberrations can easily be demonstrated by computing the differences in membership between the respective databases, following an ontological grounding as we use here. In large multi-national corporations and in enterprises that have grown through mergers, differences are bound to exist. These can be dealt with if the problems are formally recognized and articulated, but often they are handled in an isolated fashion, and solved over and over in an ad-hoc manner.
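A minimal sketch of such a grounded check (the employee identifiers are illustrative): comparing the memberships of the two databases exposes where the definitions of employee diverge.

```python
# Illustrative membership check between two grounded sources.
payroll_employees   = {"e01", "e02", "e03", "e07"}   # paid this period
personnel_employees = {"e01", "e02", "e03", "e04"}   # active personnel records

shared         = payroll_employees & personnel_employees
only_payroll   = payroll_employees - personnel_employees    # e.g. contractors on payroll
only_personnel = personnel_employees - payroll_employees    # e.g. staff on unpaid leave

print(len(shared), sorted(only_payroll), sorted(only_personnel))
# 3 ['e07'] ['e04'] - the intersection is smaller than either source
```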
Such analyses are not feasible when the information sources are worldwide and contexts become unclear. Here no comprehensive matching can be expected, so that certain operations cannot be executed reliably on-line, although many tasks can still be carried out. These difficulties are related to the applicability of the closed-world assumption (CWA) [22].
It requires effort to define articulations precisely. The investment pays off as it reduces the effort wasted in coping with the effects of errors that are now avoided. The initial effort becomes essential to support repetitive transactions, where one cannot afford to spend human effort correcting semantic mismatches every time.
To summarize, articulations that are needed among domains are made implicitly by smart people. Converting human expertise in dealing with domain intersections to permit automation will require a formalization of the domain ontologies and their semantic intersections. Such research will be an important component of moving to the semantic web [2].
There will be many applications that require more than a pair of ontologies. For example, logistics, which must deal with shipping merchandise via a variety of carriers (truck, rail, ship, and air), requires interoperation among many diverse domains, as well as among multiple companies located in different countries. To resolve these issues we are developing an ontology algebra, which further exploits the capabilities of rule-based articulation [20].
Once we define an intersection of ontologies through articulation, we should also define union and difference operations over ontologies [24]. We apply the same semantic matching rules used for articulation to transform the traditional set operations into operations that are cognizant of inter-domain semantics. Assuring soundness and consistency, mirroring what we expect from traditional set operations, is a challenge.
Having an algebra not only achieves disciplined scalability to an unlimited set of sources, but it also provides a means to enumerate alternate composition strategies, assess their performance, and, if warranted, perform optimizations [13]. We expect that the semantic union operation will mainly be employed to combine the results of prior intersections, in order to increase the breadth of ontological coverage of an application.
The semantic difference operation will allow the owners of a domain ontology to distinguish the terms that they are free to change as their needs change. The excluded terms, by definition, participate in some articulation, and changes made to them will affect interoperation with related domains, and hence make some application less precise, or even disable it. Informally, the difference allows ontology owners to assess the scope of their local autonomy.
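A minimal sketch of such an algebra (the rule representation and function names are our own illustration, not the formal algebra of [20]): the set operations are parameterized by articulation rules that decide when terms from two ontologies match semantically.

```python
# Illustrative ontology algebra over sets of terms, parameterized by
# articulation rules mapping source-specific terms to shared concepts.
def concept(term, rules):
    return rules.get(term, term)            # unmatched terms stand for themselves

def intersection(o1, o2, rules):
    """Terms of o1 whose shared concept also appears among o2's concepts."""
    c2 = {concept(t, rules) for t in o2}
    return {t for t in o1 if concept(t, rules) in c2}

def union(o1, o2, rules):
    return o1 | o2                          # breadth: keep all terms of both

def difference(o1, o2, rules):
    """Terms of o1 free of articulation with o2: the owner's local autonomy."""
    return o1 - intersection(o1, o2, rules)

airline  = {"flight", "in-flight movie"}
railroad = {"journey", "sleeping compartment"}
rules    = {"flight": "trip segment", "journey": "trip segment"}

print(intersection(airline, railroad, rules))  # {'flight'}
print(difference(railroad, airline, rules))    # {'sleeping compartment'}
```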
We use the term architecture to refer to the composition of modules in information systems. Traditional information systems have depended on human experts. Their elimination, through the capability of providing direct linkages on the web, has led to disintermediation [23].
We see these services being replaced by automated engines, positioned between the information clients and the information resources. Within the mediators are the intelligent functions that encode the required expertise for semantic matching and filtering [15]. The composition of synergistic functions creates a mediator performing a substantial service. Such a service is best envisaged as a module within the networks that link customers and resources, as sketched in Figure 5. Many customers can share mediator services through their web portals [14]. Multiple mediators will often be needed when the measures used for selection and valuation are not commensurate. For instance, tradeoffs involving cost versus quality, or risk versus having up-to-date information, must be relegated to the decision-maker, and not automated at lower levels in a system hierarchy.
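A minimal sketch of such a mediator (the source interfaces, shared terms, and filtering criterion are assumptions made for illustration): the mediator queries multiple resources, maps their results into shared terms, and returns a filtered, integrated answer to the client.

```python
# Illustrative mediator positioned between a client and two resources.
def query_airline(destination):        # stand-ins for legacy resources
    return [{"kind": "flight", "to": destination, "price": 320}]

def query_railroad(destination):
    return [{"kind": "journey", "to": destination, "price": 95},
            {"kind": "journey", "to": "Elsewhere", "price": 40}]

def mediate(destination, max_price):
    """Integrate both sources into shared 'trip segment' terms and filter."""
    results = query_airline(destination) + query_railroad(destination)
    segments = [{"concept": "trip segment", **r} for r in results]
    # filtering encodes part of the expertise a travel agent would apply
    return [s for s in segments if s["to"] == destination and s["price"] <= max_price]

print(mediate("San Francisco", max_price=200))
# [{'concept': 'trip segment', 'kind': 'journey', 'to': 'San Francisco', 'price': 95}]
```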
Domain-type mediators can integrate domains such as financial information, personnel management, travel, logistics, technology, etc. [24]. Within these domains there will be further specialization, as in finance, to provide information about investing in precious metals, bonds, blue-chip stocks, utilities, and high tech.
There will be meta-services as well, helping to locate those services and reporting on their quality. Mediators encompass both experts and software to perform these functions, and sustain the services as functional requirements and underlying ontologies evolve.
The need for middleware to connect clients to servers is well established, although it has attracted only a modest amount of academic interest [7]. However, middleware products only enable communication, dealing with issues such as establishing connectivity, reliability, transmission security, and the resolution of differences in representation and timing. A mediator can exploit those technologies and avoid dealing with the problems that arise from having an excess of standards. However, middleware never deals with true semantic differences and only rarely with the integration of information, leaving these tasks to a superior layer of software.
There is today a small number of companies building mediators with transformation and integration capabilities [28]. However, the available technology is not yet suitable for shrink-wrapping and requires substantial adaptation to individual settings. Many products focus on specific increments. When the added value is modest, the benefit gained is likely to be outweighed by the cost in performance incurred when adding a layer into an information system architecture. In those cases, making incremental improvements to the sources, such as providing object transforms, or to the applications, such as providing multiple interfaces, seems preferable.
However, if the services placed into an intermediate layer are comprehensive, sufficient added value can be produced for the applications that access the mediators and the cost of transit through the additional layer will be offset.
To deliver valuable services, mediators will have to be updated as well. Some changes are bound to affect the customers, such as new interfaces or changes in the underlying ontologies. Unwanted updates, scheduled by a service, often hurt a customer, even though in the long run the improvement is desired. To allow customers to schedule their adaptation to new capabilities when it suits them, mediator owners can keep prior versions available. Since mediators are of modest size and do not hold voluminous data internally, keeping an earlier copy has a modest cost.
The benefits of not forcing all customers to change interfaces at the same time are significant. First of all, customers can update at a time when they can do it best. A second benefit is that at first only a few customers, namely those that need the new capabilities, will be served by the new version. Any errors or problems in the new version can then be repaired in cooperation with those customers, and broader and more serious problems will be avoided [25].
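A minimal sketch of such version keeping (the version numbers, customers, and interfaces are hypothetical): each customer remains pinned to a prior mediator version until it chooses to migrate.

```python
# Illustrative version registry for a mediator service: customers pin a
# version and migrate on their own schedule. Names are assumptions.
mediator_versions = {
    "1.0": lambda query: f"v1.0 result for {query}",
    "1.1": lambda query: f"v1.1 result for {query} (new ontology terms)",
}
customer_pins = {"acme": "1.0", "earlyadopter": "1.1"}   # per-customer choice

def serve(customer, query):
    version = customer_pins.get(customer, "1.1")         # default to the latest
    return mediator_versions[version](query)

print(serve("acme", "trip segment"))          # still served by the prior version
print(serve("earlyadopter", "trip segment"))  # exercises the new capabilities
```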
Since maintenance of long-lived artifacts, including software, constitutes such a large fraction of their lifetime cost, it is crucial to plan for maintenance, so that it can be carried out expeditiously and economically. Being responsive to maintenance needs increases consumer value and reduces both consumer and provider costs. Where maintenance today often amounts to 80% of lifetime cost, a 25% reduction in those costs can double the funds available for systems improvements, while a 25% increase can inhibit all development and lead to stasis.
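The arithmetic behind that claim can be made explicit with a short worked example under the stated 80/20 split (the budget figure is arbitrary):

```python
# Worked example for the 80/20 maintenance claim; the budget is arbitrary.
budget = 100.0
maintenance = 0.80 * budget          # 80.0 spent on maintenance
improvement = budget - maintenance   # 20.0 left for improvements

saved = 0.25 * maintenance           # a 25% reduction frees 20.0
print(improvement + saved)           # 40.0 - double the original 20.0

overrun = 0.25 * maintenance         # a 25% increase consumes 20.0
print(improvement - overrun)         # 0.0 - no funds left for development
```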
Information presented to customers or applications must have a value greater than the cost of obtaining and managing it. A large fraction of that cost lies in dealing with erroneous and irrelevant data, since such processing requires human insight and knowledge. More information is hence not better, and less may well be, if the relevance per unit of information produced is increased.
The need for assistance in obtaining relevant information from the World Wide Web was recognized early in the web's existence [3]. This field has seen rapid advances, and yet users remain dissatisfied with the results. Complaints about "information overload" abound. Web searches retrieve an excess of references, and getting an actually relevant result requires much subsequent effort.
In this article we focused on one aspect, namely precision: the elimination of excess information. The main method we presented is constrained and precise articulation among domains, to avoid the errors that occur when searches and the integration of retrieved data are based on simple lexical matches.
We refer to the services that replace traditional human functions in information generation as mediators, and place them architecturally between the end-users (the human professionals and their local client software) and the resources (often legacy databases and inconsistently structured web sources). Such novel software will require more powerful hardware, but we are confident that hardware-oriented research and development is progressing and will be able to supply the needed infrastructure. The major reason for the slow acceptance of innovations is not the technology itself, but the massiveness of the organizational and human infrastructure.
Dr. Wiederhold is currently a Professor Emeritus at Stanford University in the Departments of Computer Science and of Medicine, and in Electrical Engineering (by courtesy), where his research includes the design and operation of large-scale software systems, database design, knowledge bases, distributed systems, applications in medicine, planning, and business, and privacy/security.
Dr. Wiederhold is also the CEO of Symmetric Security Technologies, a software development and consulting firm specializing in Internet security. He is greatly concerned about secure information management. He has been the principal investigator for NSF- and DARPA-funded research projects to develop security mediation in the healthcare domain (Trusted Interoperation of Healthcare Information, TIHI), for medical images (Trusted Image Dissemination, TID), and for CAD (Secure Access Wrapper, SAW).
Dr. Wiederhold has authored nine books and over 300 technical reports and papers. He received his Ph.D. in Medical Information Science from the University of California in 1976.
Author Contact Information
Dr. Gio Wiederhold
Computer Science Department
Gates Computer Science Bldg. 4A, Room 433
Stanford University
Stanford, CA 94305
Phone: 650-725-8363
E-mail: [email protected]