Communication and Understanding
If understanding things is hard, communicating that understanding can be even harder. What makes semantic technologies truly challenging is that, on top of the difficulties people already have in understanding and communicating with each other, we are now adding computers into the mix.
To get started, I want to quote from a paper by William Thurston from the April 1994 issue of the Bulletin of the AMS (American Mathematical Society). Thurston is a highly respected mathematician – and a Fields medalist to boot. The paper – On Proof and Progress in Mathematics – was written in response to a paper by Arthur Jaffe and Frank Quinn called “Theoretical Mathematics”: Toward a cultural synthesis of mathematics and theoretical physics from the July 1993 issue of the Bulletin. The April 1994 issue contains a number of responses to the Jaffe-Quinn article in addition to Thurston’s, along with a response by the authors. Jaffe and Quinn certainly sparked a lively debate!
Thurston’s paper covers many topics of interest, and I expect to return to it in future articles. For now, I’m just going to focus on two specific quotes. The first is from page 15 of the article:
We held an AMS summer workshop at Bowdoin in 1980, where many mathematicians in the subfields of low-dimensional topology, dynamical systems and Kleinian groups came.
It was an interesting experience exchanging cultures. It became dramatically clear how much proofs depend on the audience. We prove things in a social context and address them to a certain audience. Parts of this proof I could communicate in two minutes to the topologists, but the analysts would need an hour lecture before they would begin to understand it. Similarly, there were some things that could be said in two minutes to the analysts that would take an hour before the topologists would begin to get it. And there were many other parts of the proof which should take two minutes in the abstract, but that none of the audience at the time had the mental infrastructure to get in less than an hour [my emphases].
At that time, there was practically no infrastructure and practically no context for this theorem, so the expansion from how an idea was keyed in my head to what I had to say to get it across, not to mention how much energy the audience had to devote to understand it, was very dramatic.
What this illustrates for me is that we cannot simply expect to translate human-readable texts into machine-processable formalisms and back without (at a minimum) developing models of cognition that represent someone’s understanding at a fairly detailed level. To produce anything useful, the translation processes need to tailor their results to and from these cognitive profiles.
Human conversations include many mechanisms for negotiating about areas of shared comprehension. When people are aware that they share a common understanding of concepts and language in a particular area, they can communicate dramatically faster (using both linguistic and conceptual shorthands) than when it is necessary to digress into background explanations and develop appropriate terminology.
Earlier in the paper (page 6) Thurston addresses the ways in which different media affect the communication process.
Much of the difficulty has to do with the language and culture of mathematics, which is divided into subfields. Basic concepts used every day within one subfield are often foreign to another subfield. Mathematicians give up on trying to understand the basic concepts even from neighboring subfields, unless they were clued in as graduate students.
In contrast, communication works very well within the subfields of mathematics. Within a subfield, people develop a body of common knowledge and known techniques. By informal contact, people learn to understand and copy each other’s ways of thinking, so that ideas can be explained clearly and easily [my emphases].
Mathematical knowledge can be transmitted amazingly fast within a subfield. When a significant theorem is proved, it often (but not always) happens that the solution can be communicated in a matter of minutes from one person to another within the subfield. The same proof would be communicated and generally understood in an hour talk to members of the subfield. It would be the subject of a 15- or 20-page paper, which could be read and understood in a few hours or perhaps days by members of the subfield [my emphases].
Why is there such a big expansion from the informal discussion to the talk to the paper? One-on-one, people use wide channels of communication that go far beyond formal mathematical language [my emphasis]. They use gestures, they draw pictures and diagrams, they make sound effects and use body language. Communication is more likely to be two-way, so that people can concentrate on what needs the most attention. With these channels of communication, they are in a much better position to convey what’s going on, not just in their logical and linguistic facilities, but in their other mental facilities as well.
In talks, people are more inhibited and more formal. Mathematical audiences are often not very good at asking the questions that are on most people’s minds, and speakers often have an unrealistic preset outline that inhibits them from addressing questions even when they are asked.
In papers, people are still more formal. Writers translate their ideas into symbols and logic, and readers try to translate back [my emphasis].
Note that the degree (actual or expected) of interactivity plays a significant role here. In direct communications interactivity is high, so that someone can (generally) go ahead and presume that the other person will interrupt if they have questions or disagreements. In a talk the bandwidth is still (relatively) high, even though it will be much less interactive and has to deal with a broader range of potential (mis)understandings. In a paper, almost all interactivity is lost – and the assumptions about shared understandings have to be even broader.
One of the potential benefits of the Semantic Web (or at least some future version of it!) is that semantic infrastructures should be able to support a high degree of interactivity. Instead of just having to read papers, people could be offered personalized interactive tutorials, generated by the semantic infrastructure and tailored to bridging the gaps between what they understand and what an author is talking about.
Tie this in with a combination of social networking technologies and (semantic-based) knowledge infrastructure technologies (such as digital libraries with underlying ontology layers) and you start to get an idea of what a significant portion of my personal vision in these areas is all about.
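As a rough illustration of what "bridging the gaps" might mean computationally, here is a minimal sketch in Python. It assumes that concepts can be arranged in a prerequisite graph and that a reader's cognitive profile can be reduced to a set of concepts they already know. Both are drastic simplifications. The concept names, the PREREQUISITES map and the tutorial_plan function are all hypothetical, and the prerequisite relations are a toy, not a careful account of the mathematics.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: each concept lists the concepts it builds on.
# The entries are illustrative only.
PREREQUISITES = {
    "hyperbolic 3-manifold": {"manifold", "hyperbolic geometry"},
    "Kleinian group": {"hyperbolic geometry", "group"},
    "manifold": {"topological space"},
    "hyperbolic geometry": set(),
    "group": set(),
    "topological space": set(),
}


def tutorial_plan(target, known, prerequisites=PREREQUISITES):
    """Return the concepts a reader still needs for `target`, prerequisites first."""
    needed = {}
    stack = [target]
    while stack:
        concept = stack.pop()
        if concept in known or concept in needed:
            continue
        deps = prerequisites.get(concept, set())
        # Only the prerequisites the reader lacks constrain the ordering.
        needed[concept] = {d for d in deps if d not in known}
        stack.extend(needed[concept])
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(needed).static_order())


# A reader who already knows some topology and group theory:
print(tutorial_plan("Kleinian group", {"topological space", "manifold", "group"}))
# ['hyperbolic geometry', 'Kleinian group']

# A reader who already knows the hyperbolic geometry and the group theory:
print(tutorial_plan("Kleinian group", {"hyperbolic geometry", "group"}))
# ['Kleinian group']
```

The same target produces a different plan for each reader, which is a very crude version of Thurston's two minutes for one audience and an hour for another. A real cognitive profile would obviously need far more than a set of concept names.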
On being multi-disciplinary
Thurston’s paper illustrates some of the difficulties experienced even by mathematicians operating in different subfields. Things get much worse when operating across disciplines.
One of the challenges in working with ontology is that it is a core meta-discipline that ultimately should pervade every other discipline. In addition, in order to really understand ontology in its own right it is necessary to know concepts from many other disciplines. Philosophy, sociology, and cognitive science are all directly relevant in various ways – and when we attempt to translate ontology into computer terms, we inevitably bring in all of the subfields of computer science and practical programming (which generally have very different perspectives).
All of this before even getting to any application domains!
In my own approach, I use a lot of mathematics. This creates additional difficulties, for an obvious reason: if even professional mathematicians operating in different subfields have difficulty understanding each other, what chance does anyone else have?
A much more challenging issue is raised by the old saying “Jack of all trades, Master of none”. In general, the broader one’s interests are, the more difficult it is to be at the forefront of any of them. Certainly, keeping up with the literature in any academic discipline, or with specific technologies from a practical perspective, becomes increasingly difficult the broader you try to go.
Nevertheless, I can’t resist quoting the story of Lugh – an ancient Irish legend. This version is from Wikipedia.
As a young man Lugh travelled to Tara to join the court of king Nuada of the Tuatha Dé Danann. The doorkeeper would not let him in unless he had a skill with which to serve the king. He offered his services as a wright, a smith, a champion, a swordsman, a harpist, a hero, a poet and historian, a sorcerer, and a craftsman, but each time was rejected as the Tuatha Dé Danann already had someone with that skill. But when Lugh asked if they had anyone with all those skills simultaneously, the doorkeeper had to admit defeat, and Lugh joined the court.
A happy ending ;-)
These days, inter-disciplinary collaboration is increasingly accepted as a source of innovation. People from different fields bring different perspectives to each other’s problems. It is still necessary, however, to undergo an initial cross-learning phase so that members of a team gain enough familiarity with each other’s disciplines to make useful discussions possible.
I have also found from personal experience that disciplinary / technological breadth can provide a very useful foundation for facilitating these kinds of discussions – especially when supported by at least a practical grasp of a variety of consulting skills. Facilitation itself is more art than science, and (at least based on my own experience) is better picked up by absorption from skilled practitioners than by focusing on theoretical models.
Often, however, it is necessary for one person to try to find ways to pull together all of the different strands in a multi-disciplinary collaboration. I’ll close this section with just one more quote, from the Preface of Conceptual Spaces: The Geometry of Thought by Peter Gärdenfors.
While writing the text, I felt like a centaur, standing on four legs and waving two hands. The four legs are supported by four disciplines: philosophy, computer science, psychology, and linguistics (and there is a tail of neuroscience). Since these disciplines pull in different directions – in particular when it comes to methodological questions – there is a considerable risk that my centaur has ended up in a four-legged split.
A consequence of this split is that I will satisfy no one. Philosophers will complain that my arguments are weak; psychologists will point to a wealth of evidence on concept formation that I have not accounted for; linguistics [sic] will indict me for glossing over the intricacies of language in my analysis of semantics; and computer scientists will ridicule me for not developing algorithms for the various processes that I describe.
I fully expect this blog to suffer from these same kinds of challenges – but with even more disciplines to upset ;-) In a subsequent section, I’ll add something about what happens when we consider adding computers into the mix.
Before doing that, however, I want to take a brief digression in order to explore a few aspects of what it means to condense complex conceptual structures into highly compact (symbolic) representations.
Conceptual Condensation
In a very interesting paper (A Study in the Foundations of Programming Methodology: Specifications, Institutions, Charters and Parchments [1986] – CSLI-86-54), Joseph A. Goguen and R. M. Burstall formulate their notion of a generalized institution in just 11 symbols (although exactly how many symbols there are here depends on what you choose to count as a distinct symbol!):
D(| | op / V -)
I won’t attempt an explanation of this definition here, since to do so would require considerable background in category theory, as well as the concepts developed in the paper itself. The point is that for someone with the appropriate background – and an explanation of what the symbols mean in that context – this simple formula is able to condense most of the content of the preceding 20 highly technical pages.
Although this is closely related to the points made in Thurston’s paper, it is technically somewhat different. Thurston talks about the ability to rapidly convey the gist of a new proof to members of a subfield, and the advantages of interactivity in that communication.
By contrast, what we have here is the ability to represent a very complex set of conceptual interrelationship patterns by a very concise symbolic formula. I regard this as involving a significant difference in perspective, although the two perspectives are highly complementary.
Although this example is highly obscure and technical, conceptual condensation also occurs in broadly understood phrases such as ‘the x, y and z axes’, used to denote Cartesian coordinates placed in a specific orientation relative to a page.
In this case, we are not dealing with a formula in a specific technical language, but instead with a convention-based association between x, y and z and all of the many concepts and connotations associated with Cartesian coordinates – not to mention the associational connotations of 3-d Cartesian coordinates themselves with (non-relativistic) models of space.
Finally, I can’t resist the temptation to include a quote from G. Spencer Brown’s Laws of Form (page 117 in the original 1969 edition).
… For example, everything in pp 98-126 of Principia Mathematica can be rewritten without formal loss in the one symbol ┐
provided, at this stage, the formalities of calculation and interpretation are implicitly understood, as indeed they are in the Principia. Allowing some 1500 symbols to the page, this represents a reduction in the mathematical noise-level by a factor of more than 40000.
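The arithmetic behind that factor is easy to check. The sketch below uses only the figures given in the quotation and assumes the page range is meant inclusively. The variable names are mine.

```python
# Figures taken from the quotation: pp 98-126, about 1500 symbols per page.
FIRST_PAGE, LAST_PAGE = 98, 126
SYMBOLS_PER_PAGE = 1500

pages = LAST_PAGE - FIRST_PAGE + 1           # inclusive page count
symbols_replaced = pages * SYMBOLS_PER_PAGE  # all condensed into a single symbol

print(pages, symbols_replaced)  # 29 43500
```

So 29 pages at roughly 1500 symbols each gives about 43500 symbols collapsed into one, which is consistent with "a factor of more than 40000".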
Although Spencer Brown, along with the value of the ‘calculus of indications’ developed in Laws of Form, is somewhat controversial, I find the book well worth an occasional re-read – especially for its Notes chapter. It contains many more illuminating comments about the ways in which highly condensed symbols can be interpreted in many different (but related) ways when working at levels of abstraction in which distinct concepts start to degenerate (Spencer Brown’s term) or fuse (my term) into very powerful proto-concepts.
I hope that these three examples give at least some indication of the ways that symbols can be used to provide very concise representations of complex conceptual patterns. This is very different from the use of symbols in formal calculational systems, in which their meanings are not ‘grounded’ in some way (a useful reference for these issues is Symbol Grounding for the Semantic Web by Anne Cregan).
So at this point, I think we finally have enough background to make a start on explaining the idea of ‘artificial persons’.
Artificial Persons
So just what does happen when we start to think about inserting computer systems into our discussion of communication and understanding?
To start with, computers are unbelievably dumb. Basically, they can’t be said to ‘understand’ anything at all. However, they do have a fantastic ability to store and manipulate symbolic representations.
The idea of ‘artificial persons’ is basically a framework for an elaborate series of thought experiments about just what it might take to at least give computers the ability to simulate understanding in such a way that we can leverage their other abilities in much more powerful ways than is currently possible.
Traditionally, computer software has been conceived of as ‘programs’ that run on some combination of hardware, software, networking, and storage infrastructures. These infrastructures include components such as operating systems, ‘middleware’, databases, and so on.
The most widely available networking infrastructure is The Internet – and we will be specifically interested in that portion of The Internet known as the World Wide Web (or simply ‘the web’).
Today, most of the web is focused on text that is created by and for people. Although components of the web, such as search engines, do process this text in order to make it easier for people to find things, it would be hard to make a claim that there is any computer-level understanding of what any of this text is all about.
The Semantic Web project is an initial attempt to bridge this gap. It does so by introducing formal ontologies that can be used to represent (at least some aspects of) the meanings associated with text in ways that can be processed by machines. It should then be possible to program automated agents to help people find things in much more specific ways than is currently possible.
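To make "meanings that can be processed by machines" slightly more concrete, here is a minimal sketch of the underlying idea. It is plain Python, not RDF or OWL and not any real Semantic Web library. It assumes statements are stored as (subject, predicate, object) triples and that the only inference rule is a transitive class hierarchy. All identifiers and class names are hypothetical.

```python
# Toy ontology as (subject, predicate, object) triples. All names are hypothetical.
TRIPLES = {
    ("Thurston1994", "type", "JournalArticle"),
    ("Thurston1994", "about", "MathematicalCommunication"),
    ("LawsOfForm", "type", "Book"),
    ("JournalArticle", "subClassOf", "Publication"),
    ("Book", "subClassOf", "Publication"),
    ("Publication", "subClassOf", "Document"),
}


def superclasses(cls, triples=TRIPLES):
    """All classes reachable from cls via subClassOf, including cls itself."""
    seen, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def instances_of(cls, triples=TRIPLES):
    """Subjects whose declared type is cls or any subclass of cls."""
    return sorted(
        s for s, p, o in triples
        if p == "type" and cls in superclasses(o, triples)
    )


print(instances_of("Publication"))  # ['LawsOfForm', 'Thurston1994']
print(instances_of("Book"))         # ['LawsOfForm']
```

Nothing in the data says directly that Thurston1994 is a Publication. The query finds it because the class hierarchy is itself data that a program can follow, which a keyword search over the text could not do.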
My personal belief is that current Semantic Web technologies have the potential to be extremely useful in specific application domains, but that they are still a long way from the maturity needed in order to exploit their full potential. Given that these are very new technologies, this shouldn’t be any surprise.
One area that I believe is crucial for expanding the range of application of the Semantic Web is the ability to create and define knowledge infrastructures that are comprehensible to both people and machines. That means that a digital library of textual information would have an underlying semantic infrastructure that is able to represent that textual information in ways that are machine-processable (at a semantic level).
I’ll start to elaborate on many aspects of this in future posts, but for now I want to skip over all the details, and consider certain aspects of this that I find particularly interesting in the (very) long term. I’ve alluded to one of these early on in this article, namely the ability for such systems to take cognitive profiles of people into account in their interactions with them – and to use these in order to offer personalized tutorials on concepts that someone might not understand.
Although this is clearly well beyond the reach of current technologies, I actually want to go even further than this in order to introduce a key principle that I think will start to be extremely important as these kinds of technologies develop. That is something I call finite resource logic (FRL).
FRL comes into play when we start considering the cost structures associated with highly sophisticated agents. For example, suppose that I have an agent that ‘understands’ my preferences, and knows pretty much what I know – because I end up using it as a communications intermediary all the time. Think of this as ‘the future of email’ – the agent ends up being a sophisticated secretary (or executive assistant) that is able to organize things for me, take care of routine communications, planning and scheduling, and only pass on to me things of likely interest.
Such an agent will not only build up an extensive knowledge base about me (my personal and business affairs and interests) but will also need to be able to interact with a broad range of web-based resources that are focused on shared – rather than personal – knowledge.
The agent will need to be able to ‘understand’ at least the basic concepts on which the symbolic representations of these knowledge domains are grounded, in order to be able to act as an effective intermediary for me. Each of these domains will necessarily increase the size (and hence the associated processing power required) of the agent’s own ‘personal’ knowledge base – and hence the costs of maintaining it.
At some point, I believe that these kinds of costs will start to significantly constrain the amount of information that an agent can ‘afford’ to keep track of. Although this will probably seem like a huge leap, ultimately I see this creating a whole economic ecosystem associated with agents essentially trading knowledge and information modules among themselves.
One of the key points about all this is that instead of thinking about knowledge infrastructures as simply being massive – and passive – impersonal resources, agents such as those I’ve described will become highly specialized and personalized to their ‘owners’ – actively seeking to maintain an ‘optimal’ mix of knowledge at a cost appropriate to their owner’s resources, goals, and the apparent ‘value’ they provide.
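The trade-off described above can be sketched as a small budgeting problem. This is only an illustration of the cost pressure, not a definition of finite resource logic. It assumes each knowledge module has a single positive upkeep cost and a single estimated value, and it uses a simple greedy heuristic (best value per unit cost first) that is not guaranteed to be optimal. The class, function and module names and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeModule:
    name: str
    cost: float   # hypothetical upkeep cost per period, must be positive
    value: float  # hypothetical estimate of usefulness to the owner


def choose_modules(modules, budget):
    """Greedy heuristic: keep the modules with the best value per unit cost that fit."""
    kept, spent = [], 0.0
    for m in sorted(modules, key=lambda m: m.value / m.cost, reverse=True):
        if spent + m.cost <= budget:
            kept.append(m)
            spent += m.cost
    return kept, spent


candidates = [
    KnowledgeModule("scheduling", cost=2, value=8),
    KnowledgeModule("topology", cost=5, value=10),
    KnowledgeModule("tax_law", cost=4, value=6),
    KnowledgeModule("cooking", cost=3, value=1),
]

kept, spent = choose_modules(candidates, budget=8)
print([m.name for m in kept], spent)  # ['scheduling', 'topology'] 7.0
```

With a budget of 8 the agent keeps two modules and drops the rest. The dropped modules are the kind of thing it might later trade for, or rent from another agent, in the ecosystem described above.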
At some point, I see this starting to ‘tip over’ to the point where such agents can genuinely be regarded as a kind of ‘artificial person’ in their own right – even interacting in their own societies in addition to interacting with (real) people. I know – another huge leap!
So what happens when an agent’s owner dies or can no longer afford to employ it? If it’s built up a sufficiently powerful set of knowledge bases and capabilities, perhaps it can go out and look for another job?
At this point, I think this article is already more than long enough, and is starting to venture into territory that deserves much deeper explorations than are practical here. What I hope I’ve managed to convey is that by skipping into these very long term scenarios, we start to find interesting questions that go far beyond things we might think about when we only consider near-term scenarios.
I’ll have a lot more to say about all of these topics in future posts. For now, I’ll simply make a final comment that the idea of finite resource logic is something that I introduce into the core of my approach to mathematical ontology. Once you get the hang of thinking in these terms, many aspects of the way in which conventional mathematics is developed and presented start to seem curiously distorted. Stay tuned …
Ian
November 24, 2007
