"From Search Engines to Question-Answering Systems - The Need for New Tools"
Existing search engines, with Google at the top, have many remarkable capabilities. Furthermore, constant progress is being made in improving their performance. But what should be realized is that existing search engines do not have an important capability: deduction capability - the capability to synthesize an answer to a query by drawing on bodies of information which reside in various parts of the knowledge base. By definition, a question-answering system is a system which has deduction capability. Can a search engine be upgraded to a question-answering system through the use of existing tools - tools which are based on bivalent logic and bivalent-logic-based probability theory? A view which is articulated in the following is that the answer is: No.
The first obstacle to upgrading is the concept of relevance. There is an extensive literature on relevance, and every search engine deals with relevance in its own way, some at a high level of sophistication. But what is quite obvious is that the problem of assessing relevance is quite complex and far from being solved.
There are two kinds of relevance: (a) query relevance and (b) topic relevance. Both are matters of degree. For example, on a very basic level, if the query is q: "How old is Vera?" and the available information is p: "Vera has a daughter who is in mid-thirties," then what is the degree of relevance of p to q? Another example: To what degree is a book entitled "Knowledge Representation" relevant to the topic of summarization?
The second obstacle is world knowledge - the knowledge which we acquire through experience, communication and education. Simple examples are: "Icy roads are slippery" and "Princeton usually means Princeton University." World knowledge plays a central role in assessment of relevance and deduction. The problem with world knowledge is that it is, for the most part, perception-based. Perceptions - and especially perceptions of probabilities - are intrinsically imprecise, reflecting the fact that human sensory organs, and ultimately the brain, have a bounded ability to resolve detail and store information. Imprecision of perceptions stands in the way of using conventional techniques - techniques which are based on bivalent logic and bivalent-logic-based probability theory - to deal with perception-based information.
To deal effectively with relevance, world knowledge and deduction, new tools are needed. The principal new tools which are outlined in my lecture are: Precisiated Natural Language (PNL); Protoform Theory (PFT); and the Generalized Theory of Uncertainty (GTU). These tools are drawn from fuzzy logic - a logic in which everything is, or is allowed to be, a matter of degree.
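As a rough illustration of what "a matter of degree" can mean in practice, the phrase "mid-thirties" from the Vera example can be modeled as a fuzzy set over ages. The sketch below is not part of PNL, PFT or GTU as presented in the lecture; the trapezoidal shape, the function names and the breakpoints 32, 34, 36 and 38 are all assumptions chosen only to make the idea concrete.

```python
def trapezoid(x, a, b, c, d):
    """Degree in [0, 1] to which x belongs to a trapezoidal fuzzy set.

    Membership is 0 outside (a, d), rises linearly on [a, b],
    equals 1 on [b, c], and falls linearly on [c, d].
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def mid_thirties(age):
    """Degree to which an age counts as "mid-thirties".

    The breakpoints are hypothetical; a different speaker or context
    would call for different values.
    """
    return trapezoid(age, 32.0, 34.0, 36.0, 38.0)


if __name__ == "__main__":
    for age in (30, 33, 35, 37, 40):
        print(age, mid_thirties(age))
```

Under these assumed breakpoints an age of 35 belongs to "mid-thirties" fully, 33 and 37 belong partially, and 30 and 40 do not belong at all. A bivalent treatment would have to place a sharp cutoff somewhere along that range.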
The centerpiece of the new tools is the concept of a generalized constraint. The importance of the concept of a generalized constraint derives from the fact that in PNL and GTU it serves as a basis for generalizing the universally accepted view that information is statistical in nature. More specifically, the point of departure in PNL and GTU is the fundamental premise that, in general, information is granular, with statistical information constituting a special case. This much more general view of information is needed to deal effectively with relevance, world knowledge, deduction and related problems.