Paradigmatic Considerations for an AI Interface for a Wearable Computer
By ai-depot | March 15, 2003
An AI Interface
Designing an interface for the LC should be relatively simple because the kind of data which will be available is highly predictable. The distraction quotient for the interface is controlled by the user, whether notification of results takes the form of on-demand access or access scheduled by the user. For example, the user knows in advance that a weather forecast is available. Interface options should minimally allow the user to choose among having this data presented on demand, at an arranged time (such as simultaneously with a wake-up alarm), or when triggered by another event (e.g., when planning an itinerary that includes travel, shopping for clothes, or doing laundry or picking up dry-cleaning). Similarly, a user may ask what movies are playing at a local theatre, or ask for the closest theatre showing a particular film.
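As a rough illustration of those options, here is a minimal sketch in Python; names such as DeliveryMode and DeliveryPreference are hypothetical, not part of any existing system. It simply records whether a predictable item like the forecast should arrive on demand, on a schedule, or only when a related activity makes it relevant.

from enum import Enum, auto
from dataclasses import dataclass
from typing import Optional, Tuple

class DeliveryMode(Enum):
    ON_DEMAND = auto()        # presented only when the user asks
    SCHEDULED = auto()        # presented at a user-arranged time
    EVENT_TRIGGERED = auto()  # presented when a related activity makes it relevant

@dataclass
class DeliveryPreference:
    content_type: str                     # e.g. "weather_forecast"
    mode: DeliveryMode
    schedule: Optional[str] = None        # e.g. "07:00", used with SCHEDULED
    trigger_events: Tuple[str, ...] = ()  # used with EVENT_TRIGGERED

# The user, not the designer, decides how and when this content arrives.
weather = DeliveryPreference(
    content_type="weather_forecast",
    mode=DeliveryMode.EVENT_TRIGGERED,
    trigger_events=("plan_itinerary", "shop_for_clothes", "schedule_laundry"),
)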
User-driven demand for data or information (which is immediately thereafter presented) is, obviously, not a user interruption. (It may be a lower-level stack operation, which is not a negligible consideration, but for now I will ignore this. To the extent we must pay conscious attention to creating the request, it is a distraction, but we have bigger problems than this to deal with, and creating better "English-language" inquiry engines is already a much-considered problem.) Notification of specific content after user demand for status (e.g., headers for new e-mail) is also not itself an interruption, although we can choose to be interrupted in order to process new information.
Unfortunately, these two interface modes — response to user demand, and updated status report (list) on user demand — represent most of what humans have even imagined for effective AI agents. Science fiction, which is where humans explore many new concepts (usually, we only build what we first imagine), has remarkably little to offer us with regard to an AI interface. If we disregard the cyberpunk genre (because that is an immersive paradigm), we are pretty much left with Asimov’s text processing based on the user’s intentional thought (or, more properly, subliminal speech), the constructed reality of Terminator (featuring an augmented-reality visual display in POV shots), or the carefully constructed explicit queries (and very occasional vocal announcements of environmental dangers or their continuing status) of the fictional Star Trek and Alien universes. Even if we stretch a point and regard either an autonomous robot (HAL, Data) or an independent AI (Colossus, etc.) as a means to an interface, we are not enlightened.
Then there are the Borg… The interface for these fictional, centrally-controlled creatures is not problematic because the computer is the means of control. A Borg cannot be distracted since it has no individuality. Their interface is immersive, reflecting only sufficient environmental cues to keep them from falling over objects (including humans) in their vicinity. In fact, their interface does not even have the intelligence to become aware of humans and issue a warning. I suggest that if designers build an immersive interface, they at least build in an automatic notification strategy for the most critical environmental cues. An example: if continuous input monitoring by the PC’s AI agent indicates a local tornado warning which the AI knows the user is not monitoring, it should be empowered to automatically issue a notification of results in modalities with a very high distraction quotient. (Note the implication that input to AI agents should include the user’s current interface modality and, probably, content.)
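A minimal sketch of that escalation rule follows; the Alert and AgentState structures are hypothetical, and issue() merely stands in for whatever notification hardware the wearable actually has.

from dataclasses import dataclass, field

CRITICAL = 3  # hypothetical severity threshold

@dataclass
class Alert:
    topic: str
    severity: int
    text: str

@dataclass
class AgentState:
    current_modality: str                       # what the user's interface is doing now
    monitored_topics: set = field(default_factory=set)

def issue(alert: Alert, modality: str, intensity: str) -> None:
    # Stand-in for the real notification channel.
    print(f"[{modality}/{intensity}] {alert.text}")

def notify_if_critical(state: AgentState, alert: Alert) -> None:
    # Interrupt only for critical cues the user is not already watching,
    # and do so in a modality the user is not currently saturating.
    if alert.severity < CRITICAL:
        return                                  # wait for demand or a scheduled report
    if alert.topic in state.monitored_topics:
        return                                  # the user will see it anyway
    modality = "audio" if state.current_modality == "visual" else "visual"
    issue(alert, modality=modality, intensity="high")

notify_if_critical(
    AgentState(current_modality="visual", monitored_topics={"email"}),
    Alert(topic="weather", severity=3, text="Tornado warning for your area"),
)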
Grouping or Filtering Results
It may be instructive to consider one response to the story about PDA-dependency previously cited (found lower on the same Web page):
Genius (7:25pm EST Tue Feb 06 2001)
The smartest man of the last century once said [sic] Why should I remember a phone number when I have a phone book?
That was Albert Einstien [sic, repeated] folks.
I switched from a paper based [sic] appt/address book to a PDA becuase [sic] a PDA can beep to remind me of the meeting I am missing.
Setting aside obvious concerns about a "genius" who can’t spell or use punctuation properly, this poster does make significant points: We do not want to have to "remember the phone book," nor is this even possible. Encyclopedic knowledge was last possible about 300 years ago; some time in the past century we certainly entered the age of information overload. Is it possible, then, to distinguish between details we do not need to remember, and larger-scale concepts we need to be able to recall or retain in memory? This is equivalent to the question: Could an AI interface automatically filter its discoveries, if it performs data mining for us?
Whereas information has a (usually hierarchical) structure, data does not. This is not actually obvious: because we impose organization on data, we "see" it as organized, but stimuli, the raw constituents out of which we construct data, do not come with organization. As infants we learn to see, for example. (Kittens raised in an environment with only vertical bars will literally not see, and run into, horizontal bars!) We learn to associate sounds and sights based on experience, forming stable concepts ("Mother," "cat," "bottle," etc.) from wildly varying sensory impressions. It follows that, in a general sense, an AI probably cannot filter the results of autonomous data mining, or even order them.
Different persons think about (work with) different data, at different levels of organization. For example, most of those who have my level of verbal expertise enjoy crossword puzzles. Manipulating letters within a word-based space is an enjoyable activity for them. Others at my intelligence level apparently enjoy trivia contests. For a long time I had no idea why I did not share their enthusiasm for either pursuit, until quite recently (maybe 5-10 years ago; note that I am over 50!), when I suddenly realized that I have a strongly marked preference for playing with much larger-scale concepts. Some enjoy synthesizing letters or words; I apparently prefer to synthesize large-scale general theories of human behavior.
A very large-scale information orientation means that theories about learning, models of memory, and political or economic theories (although themselves information) are, for me, data. That is, I need to retain such models and theories in my memory in order to manipulate them, to develop information which has more significant meaning for me. For most persons who bother to learn such things, these concepts or theories are not data; if they learn them at all (perhaps for academic reasons), they probably forget them later.
What I shall call "information orientation" is probably critical to our individual memory recall abilities. This is a long-winded way of suggesting we remember data at the scale with which we like to work (because that is what is most salient to us). Further, we remember downwards (since details will fit into the higher-level structures, and such hierarchical organizations are known to enhance memory), but not upwards. Unfortunately, as already mentioned, identifying a level of abstraction (which is related to my definition of human intelligence) is not something an AI can generally do.
Incidentally, note that highly significant human mental activity may even be neither conscious nor intentional. Human "obsession" is continuous, sometimes even without conscious thought. In fact, several thinkers reported solving significant problems while asleep, in dreams. One very prominent example from the past century is one of the researchers who first announced the structure of DNA. His discovery could not have occurred if his memory did not easily provide many specific details about chemicals and molecular structure, in order for this data to be mentally manipulated.
In writing the above, my memory supplies Cambridge, and Watson & Crick, but not which researcher reported dreaming the solution. This detail missing from my memory is a perfect example of the sort of thing I personally rely on pervasive computing to supply. My ideal PC AI agent would note my reference without citation, and automatically issue a search request to the LC AI so it could supply this reference either when I asked for it, or when I initiated a proofing operation on my document-in-progress. Note that it would not automatically jump in and put in the reference while I was writing, because that could distract me! If supplying the reference during an automated (unobserved) proofing operation, it would subtly highlight this change, so I could optionally attend to or ignore (or even cancel) the addition as I review the changed document. It would also list all such changes in a separate document or list.
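One way to sketch that deferred, proofing-time behaviour is shown below; the CitationAssistant class and its search_lc() method are hypothetical stand-ins for a request queued with the LC AI, not an existing API.

from dataclasses import dataclass, field

@dataclass
class CitationAssistant:
    pending: dict = field(default_factory=dict)   # claim -> reference, once found
    change_log: list = field(default_factory=list)

    def note_uncited_claim(self, claim: str) -> None:
        # While the user is writing: record the claim and quietly queue a search
        # with the LC AI. Do NOT touch the document yet.
        self.pending[claim] = self.search_lc(claim)

    def search_lc(self, claim: str) -> str:
        # Stand-in for an asynchronous query to the larger (LC) AI.
        return f"[reference found for: {claim}]"

    def proof(self, document: str) -> str:
        # During an unobserved proofing pass, append each found reference,
        # visibly marked so the user can attend to, ignore, or cancel it,
        # and keep a separate list of all such changes.
        for claim, reference in self.pending.items():
            if claim in document:
                document = document.replace(claim, f"{claim} <<{reference}>>")
                self.change_log.append(f"Added: {reference}")
        return document

assistant = CitationAssistant()
assistant.note_uncited_claim("the researcher who reported dreaming the solution")
revised = assistant.proof("...the researcher who reported dreaming the solution...")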
Existing Interfaces
The AI interface of a wearable device coupling personal and ubiquitous computing will, literally and very significantly, change how we perceive our world. Indeed, it will change our actual world. From an early age, we will expect our world to "beep at us," just as we now commonly expect to coordinate closely with persons who may be physically at quite a distance from us, whom we may not even have met.
The second point made by "Genius" is that a PDA entry beeps, whereas an entry made with a paper-based organizer cannot. This leads us to some important observations: An interface should be able to employ a range of notification strategies, with varying distraction quotients. These include variations in sensory modality (most software designers currently only consider these when working on software for disabled persons), as well as modifiers (for sound: loudness, pitch, and varying tone or frequency of stimulus; for vision: text size, color, and font, as well as non-text images, etc.).
Again, individual user preference should always be indulged by designers of an AI’s notification stimuli. For example, my own PIM is set up not to beep, but to pop up a visual cue over whatever other application I’m currently working with, while my (equally obsolete) word processor beeps quietly when it corrects a word, since I might not notice the movement (a visual cue). On a larger scale, a chess program that suggests a line using algebraic notation is worthless to me, because I never learned to read that format fluently; I would insist on an interface that delivered different lines of play in the form of animated moves on a board. There may be similar considerations for any AI helping a musician; output choices should at least include visual (notes on a staff), sound, or a MIDI file. Generally, an AI agent must adopt our preferred vocabulary for specific fields; it will never be appropriate for it to require us to learn its (limited) symbol set. (The only exception to this rule would be AIs used for teaching or training.)
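To make that preference idea concrete, here is a small sketch assuming a hypothetical mapping from event types to a chosen modality and its modifiers; every name here is illustrative only, not part of any real product.

from dataclasses import dataclass

@dataclass
class NotificationStyle:
    modality: str     # "sound", "visual", "haptic", ...
    modifiers: dict   # loudness/pitch for sound; size, color, font for visuals

# Per-user preferences: the user, not the designer, assigns a distraction
# level and an output format to each kind of event.
my_preferences = {
    "calendar_reminder": NotificationStyle("visual", {"style": "popup_over_current_app"}),
    "autocorrect":       NotificationStyle("sound",  {"loudness": "quiet"}),
    "chess_suggestion":  NotificationStyle("visual", {"format": "animated_board"}),
    "music_output":      NotificationStyle("sound",  {"format": "midi_file"}),
}

def notify(event_type: str, message: str) -> None:
    style = my_preferences.get(event_type, NotificationStyle("visual", {}))
    # Stand-in for dispatching to the chosen output channel.
    print(f"{style.modality} {style.modifiers}: {message}")

notify("calendar_reminder", "Meeting with the editor in 15 minutes")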
Software as well as hardware capabilities increasingly shape our human, sense-based perception of our world. The existing dominant interface for most applications is the text menu with drop-down options. This provides no intuitive cues, and is not heuristic. It does not even encourage exploration. Its only advantage to human memory is that the top-level menu items tend to be standard, beginning with “File” and ending with “Help”; however, as anyone who has watched the arbitrary relocation of preference options within browser software knows, these titles are apparently used as often to hide as to transparently organize features users might want to employ. A browser which did not allow sophisticated users to set up their own home page and similar options could not survive; at the same time, however, the dominant browser makers (in their other capacity as portal owners) apparently don’t want to make it easy for naive users to discover this capacity.
The few remaining complex user interfaces (mostly for games) are not much more promising. As an older woman looking at a new video game, I am always initially amazed at the amount of data presented which the user is expected to comprehend. Having mastered one such game, however, I realize these interfaces are designed to facilitate physical reaction rather than thought. They are also immersive interfaces, and relatively simple ones at that. (Indeed, one presumably intended consequence of an immersive interface is that it makes almost all of the "real world" irrelevant, leaving the user free to concentrate on the specific stimuli or factors chosen by the designer of the interface.)
A simulation can be considered a special class of video game which (usually) facilitates thinking rather than simple physical reaction, but the way a simulation models (or is relevant to) the world has always been very narrowly filtered by its designer. Simulations are very useful applications, but they yield no obvious clues about designing an AR interface for a general purpose AI agent. They are just another example of applications which respond to user demand, like a statistics program. Yet intuition tells me an AI agent ought to be able to run useful "what-if" scenarios on my behalf, whether it stores the results, or notifies me of results it judges are either critical, or contextually (based on its monitoring of my activities) relevant.