Digital immersion is moving into public space. Interactive screens and public displays are deployed in urban environments, malls, and shop windows. Inner city areas, airports, train stations and stadiums are experiencing a transformation from traditional to digital displays, enabling new forms of multimedia presentation and new user experiences. Imagine a walkway with digital displays that allows a user to immerse herself in her favorite content while moving through public space. In this paper we discuss the fundamentals for creating exciting public displays and multimedia experiences that enable new forms of engagement with digital content. Interaction in public space and with public displays can be categorized in phases, each having specific requirements. Attracting, engaging and motivating the user are central design issues that are addressed in this paper. We provide a comprehensive analysis of the design space, explaining mental models and interaction modalities, and we derive a taxonomy for interactive public displays from this analysis. Our analysis and the taxonomy are grounded in a large number of research projects, art installations and experience. With our contribution we aim to provide a comprehensive guide for designers and developers of interactive multimedia on public displays.
Displays can be found in many public spaces in the form of advertising and information displays or as artistic installations. Through novel display technologies (e.g., eInk, OLED), we envision that in the future, literally every surface could be transformed into a display, ranging from floors and walls to ceilings and windows. Such displays can reach passersby in different situations and places. Yet, they are usually installed in a fixed place and rely upon location, orientation, and viewer distance to be optimally perceived. In contrast to such static displays, we see large potential in autonomous, free-floating displays that can change their position to appear at any given point in space and approach the user – we refer to such displays as midair displays. In an emergency situation such as a fire or earthquake, statically deployed displays may become unusable due to power outage or may even be destroyed. In such cases, midair displays could be used to show emergency instructions and guidance to people. Further scenarios include navigation as well as (personalized) group displays delivering information to people doing outdoor sports or tourists exploring a city. With our work we aim to lay the foundation for future research on midair displays, particularly interaction with the display and means to adapt the display based on the user context. The contribution of this paper, in which we explore the concept of free-floating midair displays, is threefold. We first describe scenarios that outline the potential of the approach. Second, we present a functional prototype consisting of a copter drone to which we mounted a remotely controllable iPad. Third, we quantitatively and qualitatively explore the concept by demonstrating our prototype, running a brief reading test, and conducting a series of interviews.
Traditionally, most multimedia applications can be found on personal devices, such as PCs or mobile phones. However, electronic displays are also rapidly permeating public spaces, increasingly augmenting and replacing traditional, static signs. This broadens the domain of multimedia beyond the personal space to also include the public, urban space. Although the vast majority of these displays are still not interactive, there seems to be a clear trend towards networked and interactive displays. While interactive networked displays are promising for deploying multimedia applications and content, many deployments seem to be plagued by much lower usage than their designers expected. It seems that although designers apply existing knowledge from HCI, such as usability and affordances, there are additional issues unique to public displays that hamper their acceptance. The vast majority of interactive public displays present a ‘poster’ mental model to their audience and allow for interaction via touch and/or keys only, even though several other mental models and interaction modalities have been proposed. In addition, many displays seem to fail to attract enough attention from passers-by, simply vanishing in the clutter of things in public space that compete for attention. If they do capture attention, many displays seem to fail to motivate passers-by, who have other goals in mind, to interact. If, finally, the audience has noticed the display and is motivated to interact, interactive displays seem to fail to deal appropriately with the public nature of interaction, where people may avoid interaction in order to maintain their social role and, e.g., not look silly. These issues can be addressed by displays utilizing broader metaphors than just that of a poster, for example windows, mirrors, or overlays over the physical world.
While many findings from HCI also apply to public displays, simply guaranteeing utility, usability, and likability may not be enough to design public displays. In particular, public displays need to grab the attention of passers-by, motivate passers-by to interact with them, and deal with the issues of interaction in public. Since most multimedia systems have been designed as personal devices or for use in home environments, these issues have not yet received sufficient attention. For public multimedia systems, however, how the audience approaches them is crucial.
In contrast to many other computing technologies, interaction with public displays does not start with the interaction itself. Instead, the audience is initially simply passing by, without any intention for interaction. A model of the different phases of interaction has been presented in  (Figure 1). This model builds on the model presented in , but instead focuses on audience behavior that is readily observable by an outside observer. People pass through different phases, where a threshold must be overcome for people to pass from one phase to the next. For each pair of phases, a conversion rate can be calculated of how many people are observed to pass from one phase to the next, and different displays can be compared by these rates. In the first phase, people are merely passing by. In the second phase, they are looking at the display, or reacting to it, e.g. by smiling or turning their head. Subtle interaction is only available when users can interact with the display through gestures or movement, and occurs, e.g., when they wave a hand to see what effect this causes on the display. Direct interaction occurs when users engage with a display in more depth, often positioning themselves in the center in front of it. People may engage with a display multiple times, either when multiple displays are available or if they walk away and come back after a break. Finally, people can take follow-up actions, like taking a photo of themselves or others in front of the display.
Thresholds exist between the phases, such that, for example, not all passers-by will look at a display, and not all who look at it will engage in subtle or direct interaction. We propose that the major lever to overcome the first threshold is to attract the attention of passers-by. In order to overcome the second threshold, the curiosity of onlookers should be raised, and in order to overcome the other thresholds, people must be motivated. All of these thresholds may be raised by various consequences of the fact that the interaction happens in public. Thus, adequate measures must be taken in order to mitigate these issues and lower the thresholds.
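The conversion rates between phases described above can be sketched in a few lines of Python. This is only an illustrative sketch: the phase labels are shorthand for the phases named in the text, the function name `conversion_rates` is hypothetical, only consecutive pairs of phases are compared, and the observation counts are made-up numbers, not data from any study.

```python
from typing import Dict, List, Tuple

# Shorthand labels for the phases described in the text (labels are an assumption).
PHASES: List[str] = [
    "passing by",
    "viewing and reacting",
    "subtle interaction",
    "direct interaction",
    "multiple interactions",
    "follow-up actions",
]


def conversion_rates(counts: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """Return (from_phase, to_phase, rate) for each consecutive pair of phases.

    The rate is the share of people observed in one phase who were also
    observed in the next one. A phase with zero observations yields 0.0.
    """
    rates = []
    for src, dst in zip(PHASES, PHASES[1:]):
        n_src = counts.get(src, 0)
        n_dst = counts.get(dst, 0)
        rate = n_dst / n_src if n_src > 0 else 0.0
        rates.append((src, dst, rate))
    return rates


# Made-up observation counts for one hypothetical display.
observed = {
    "passing by": 1000,
    "viewing and reacting": 300,
    "subtle interaction": 60,
    "direct interaction": 30,
    "multiple interactions": 6,
    "follow-up actions": 3,
}

for src, dst, rate in conversion_rates(observed):
    print(f"{src} -> {dst}: {rate:.0%}")
```

Computing the same rates for a second display would allow the two to be compared threshold by threshold, for instance to see whether one display loses most of its audience at the attention stage and the other at the motivation stage.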
Human-computer interaction often assumes that the user is aware of the computer in the first place. This is not necessarily the case for public displays. In contrast to other computing technologies, public displays are not owned by their primary users (the audience). They are installed in public contexts, where they compete for audience attention with various other stimuli (like other signs, traffic, or people). There has been a discussion on how much attention ubiquitous computers should attract. On the one hand, it has been argued that if the environment is filled with ubiquitous computers, they should remain calm and slide effortlessly between the center and periphery of attention. On the other hand, it has been argued that they should engage people more actively in what they do. If public displays fail to attract enough audience attention, however, they may not be used at all.
Models of Attention
Generally, the information processing power of the human brain is limited, and at any point in time, more sensory input arrives at the brain than can be processed in detail. Attention denotes the process in which the human brain decides which of the numerous sensory inputs to apply the most computational power to. Visual attention is often modeled with a ‘Spotlight’ metaphor, in which a certain region of the visual field is selected for more detailed processing. This spotlight often coincides with the fovea, but can change in location and diameter. In general, attention is influenced both by bottom-up processes (external stimuli like a suddenly appearing error message) and top-down processes (like the goal of the user looking for a letter in a certain color).
A computational model for bottom-up attention is presented by Itti et al. The sensory input image is split into representations for colors, intensity, and orientations (in the human brain, specialized neurons exist for these representations). From the representations, various feature maps are computed, which are then normalized and combined into conspicuity maps. These conspicuity maps are combined into a single saliency map. In a winner-take-all process, the most salient region is selected to be attended and is then inhibited, so that the next attended region will be a different one (inhibition of return). This bottom-up model takes into consideration only the sensory input to the brain. Yet this process is complemented by top-down processes, in which the focus of attention is influenced by the current task, previous knowledge, and cues. An extended model of visual attention combining bottom-up and top-down processes is presented by Hamker. In particular, internal goals are modeled to influence the attention process.
In addition to these neuro-computational models, applied models have been proposed, in particular to inform human-computer interaction design. Weiser and Brown proposed a model of center and periphery of attention, in which users can centrally attend to only one thing at a time but can monitor multiple things simultaneously in the periphery of their attention. In their proposal for Calm Computing, Weiser and Brown suggested that devices should be designed so that they slide effortlessly back and forth between the center and periphery of attention. Users could thereby attend to more things simultaneously in the periphery of their attention, and take control of them by moving them back into the center of their attention.
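The last stages of the bottom-up pipeline described above (combining conspicuity maps into a saliency map, then winner-take-all selection with inhibition of return) can be illustrated with a toy sketch. It makes strong simplifying assumptions: maps are tiny grids of numbers, normalization is a plain min-max rescaling rather than the normalization operator of the original model, maps are combined by averaging, and inhibition simply blanks a square neighborhood. The function names and the example maps are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1] (simple min-max; an assumption, not the original operator)."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine(maps: List[Map]) -> Map:
    """Average several same-sized maps after normalizing each one."""
    normed = [normalize(m) for m in maps]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(n[r][c] for n in normed) / len(normed) for c in range(cols)]
        for r in range(rows)
    ]


def scan_path(saliency: Map, steps: int, radius: int = 1) -> List[Tuple[int, int]]:
    """Pick the most salient cell (winner-take-all), suppress its neighborhood
    (inhibition of return), and repeat for the requested number of steps."""
    s = [row[:] for row in saliency]  # work on a copy
    cells = [(r, c) for r in range(len(s)) for c in range(len(s[0]))]
    path: List[Tuple[int, int]] = []
    for _ in range(steps):
        r, c = max(cells, key=lambda rc: s[rc[0]][rc[1]])
        if s[r][c] == float("-inf"):
            break  # every cell has already been inhibited
        path.append((r, c))
        for rr in range(max(0, r - radius), min(len(s), r + radius + 1)):
            for cc in range(max(0, c - radius), min(len(s[0]), c + radius + 1)):
                s[rr][cc] = float("-inf")
    return path


# Hypothetical 3x4 conspicuity maps for color, intensity, and orientation.
color = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
intensity = [[0, 0, 0, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
orientation = [[0, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 1]]

saliency_map = combine([color, intensity, orientation])
print(scan_path(saliency_map, steps=3))
```

In this example the cell that stands out in two maps is attended first; once it is inhibited, attention moves on to the next most salient location instead of staying put.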
(Not) Attracting Attention
Among the general models that have been proposed of what attracts (visual) attention are behavioral urgency and (Bayesian) surprise. Change blindness can be used in order not to attract attention, and specifically for public displays, the Honeypot effect has been shown to strongly attract attention.
Franconeri and Simons hypothesize that stimuli that indicate the potential need for immediate action capture attention. It has been found that the abrupt appearance of new objects and certain types of luminance contrast changes capture attention. In addition, moving (towards the observer) and looming stimuli have been found to capture attention. Since all of these stimulus properties may hint at a potential need for immediate action (e.g., an animal approaching), behavioral urgency may be a useful model to predict how much attention a stimulus will attract.
Itti et al. propose a model of Bayesian surprise for bottom-up visual attention. Surprise measures the difference between posterior and prior beliefs about the world. This is different from Shannon’s concept of information, as instead of relying on objective probabilities it considers only subjective beliefs. They implemented a model of low-level visual attention based on Bayesian surprise to predict eye movement traces of subjects watching videos. The model performs better than other models that predict attention based on high entropy, contrast, novelty, or motion.
Change Blindness is an effect that shows how the attention-attracting effect of changes can be avoided. In certain circumstances, people have surprising difficulty noticing apparently obvious major changes in their visual field, e.g., road lines changing from solid to dashed, or a big wall slowly changing color. Techniques that cause change blindness include blanking an image, changing perspective, displaying “mud splashes” while changing the image, changing information slowly, changing information during eye blinks or saccades, or changing information while occluded (e.g., by another person). Intille proposes to use change blindness to minimize the attention a display attracts while updating content.
The Honeypot effect has been described by Brignull et al. in the context of the Opinionizer public display that was shown during a party. Whenever a crowd of people had already gathered around the display, this crowd seemed to attract a lot of attention, and other people were much more likely to also attend to the display. Similar effects have also been observed with the Citywall display as well as with the Magical Mirrors installation.
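The notion of surprise as a difference between posterior and prior beliefs can be made concrete with a small numeric sketch. It assumes a discrete set of hypotheses and measures the difference as the Kullback-Leibler divergence from prior to posterior, expressed in bits; the hypothesis names, probabilities, and function names are invented for illustration and do not reproduce the video model mentioned above.

```python
import math
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


def surprise_bits(prior: Dict[str, float], post: Dict[str, float]) -> float:
    """KL divergence KL(posterior || prior) in bits: how far beliefs moved."""
    return sum(q * math.log2(q / prior[h]) for h, q in post.items() if q > 0)


# An observer who mostly expects a static scene (made-up numbers).
prior_beliefs = {"static scene": 0.9, "something moving": 0.1}

# Likelihood of two hypothetical observations under each hypothesis.
expected_frame = {"static scene": 0.8, "something moving": 0.2}
unexpected_frame = {"static scene": 0.05, "something moving": 0.9}

low = surprise_bits(prior_beliefs, posterior(prior_beliefs, expected_frame))
high = surprise_bits(prior_beliefs, posterior(prior_beliefs, unexpected_frame))
print(f"expected frame:   {low:.3f} bits")
print(f"unexpected frame: {high:.3f} bits")
```

An observation that leaves beliefs unchanged yields zero surprise regardless of how improbable it was in objective terms, which is the distinction from Shannon information drawn in the text.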
Although attention plays a role for any multimedia system, it plays a crucial role for multimedia on public displays, because of the strong competition for audience attention in public spaces.
Traditional paper-based public displays have served as read-only media (e.g., posters, billboards). Once displays become interactive, users need to be motivated to make use of these systems and need to find an incentive for using them. Typically, people do not go out in order to look for a public display to use. Rather, they come across a public display (e.g., while waiting) and become motivated by external factors to use it. The entry of interactive displays into public space is part of a greater tendency: computer usage is spreading into public life and is no longer restricted to mere task fulfillment at the workplace. Since task-oriented theories consider only the “how” of an activity but not the “why”, they leave questions concerning underlying motivations unanswered.
Malone presents a distinction between tools and toys to differentiate systems that have an external goal from those that are used for their own sake. Tools are task-oriented. They are designed to achieve goals “that are already present in the external task.” Toys either need to provide a specific goal or enable the user to create their own, emergent goal. A tool should be easy to use; a toy needs to provide a challenge to be motivating to the user.
In spite of its increasing significance in human-computer interaction, motivation has so far been investigated only in isolation. There is still a significant need to advance the understanding of the motivation behind the user’s activity. In particular, little is known about how the design of public displays can invite interaction.
In his Magical Mirrors prototype study, Michelis identified the following building blocks for motivating interaction in public space. His list of motivating factors is based on the work of Thomas Malone, who investigated motivating principles for designing traditional human-computer interaction.
Challenge and Control
The first motivating factors, challenge and control, are based on the notion that the ability to master an interaction, while still being challenged, will increase motivation to carry out this interaction. Flow has been presented as a state of mind where the user is fully immersed in an activity while feeling energized and focused. Simply put, flow can be achieved in a channel between too little challenge, leading to boredom, and too much challenge, leading to anxiety. In human-computer interaction, people strive for an optimal level of competency that allows them to master the challenges presented by the application. The Magical Mirrors study revealed that viewing the consequence of one’s own interactive behavior was the most important element for challenge as a motivating factor. Here, the users were motivated to explore and master the interactive functions of the displays. In addition to this visibility, the presence of a goal for the interaction also played an important role, where a distinction can be made between set and emerging goals. Emerging goals arose from the interaction of the individual with the Magical Mirrors displays. Since emerging goals have a strong motivating effect, interactive environments should not only provide predefined goals but also allow users to develop their own emerging goals. Moreover, the intrinsically motivating challenge of an activity appears to increase if, in interacting with the environment, clear and direct feedback follows from one’s own behavior and from attaining the goal. The results of the Magical Mirrors study support the importance of emergent goals for motivation. In order to turn an interaction into a challenge, however, the outcome of the behavior should be somewhat uncertain, and the end result should remain unknown before the interaction is carried out.
The motivating effect of control is based primarily on recognizing a cause and effect relationship, as well as on powerful effects and the freedom of choice in performing the interaction. For motivation, the perception of control is more important than actual control. The subjective sense of control can even have a motivating effect if the person does not possess any actual control.
Curiosity and Exploration
As one of the most important foundations for intrinsically motivated behavior, curiosity is evoked through novel stimuli that present something unclear, incomplete or uncertain. The individual searches for possible explanations within their environment, and their behavior is motivated by a desire to avoid potential uncertainty. Curiosity is described as a precursor to explorative behavior, through which people make previously unavailable information about their environment accessible and thereby reduce uncertainty. Specific explorations are attempts to reduce the degree of incongruity and therefore the level of stimulation. However, if the stimulation falls below an optimal level, the individual is motivated to make further explorations in order to re-establish the optimum. Curiosity appears to be among the most important characteristics of intrinsically motivating environments. In order to stimulate curiosity and to influence motivation, the interaction should be designed to be neither too complex nor too trivial. Interactive elements should be novel and surprising, but not incomprehensible. On the basis of his or her prior experiences, the user should have initial expectations of how the interaction proceeds, but these should be only partially met. In reactive environments, a motivating optimum of complexity is hence also fostered through the interplay of surprising and constructive interaction. The desired behavior for the interaction can be initially activated by surprising elements and maintained through constructive elements. In contrast to perceptible changes that appeal to people’s sensory curiosity, cognitive curiosity relates to anticipated changes. People are motivated in this way to optimize their cognitive structures.
To increase motivation through curiosity, it appears at first sufficient to convey to the individual a sense of incompleteness, discrepancy or dissipation and to present, through the interaction, the chance to abate these sensations. However, during the interaction it should be made especially clear how to attain completeness.
Choice
Choice as a motivating factor is based on the observation that the motivation for a behavior appears to increase if people can select between alternative behaviors in the process. The choice between alternatives enables them to control their behavior and to make active decisions regarding their behavior in the individual situation. People prefer those alternatives that best correspond to their own preferences and through which not only the behavior itself but also its effects can be controlled. With an increase in the number of possible choices, the likelihood increases that an alternative suited to the individual can be found. A motivating effect was clearly demonstrated even for very trivial choices, or for choices that exist only in the imagination of the individual. Given that the mere presence of choice appears to promote intrinsic motivation, it can be concluded, among other things, that the sensation of autonomy and control increases as a result. The greater the number of choices perceived, the stronger one’s own autonomy and control appear to be. On the other hand, it was demonstrated that a number of alternatives that exceeds an optimal level, as well as the absence of choice and opportunities for control, lead in various ways to a reduction in intrinsic motivation. In sum, the offer or presence of interaction alternatives can be a strong motivating factor within human-computer interaction and encourages the performing and maintaining of specific interactions. This could also be shown in the Magical Mirrors study.
Fantasy and Metaphor
In general, imaginary settings also appear to have a motivating effect on behavior. In these fantasy settings the constraints of reality are switched off, so that one imagines possessing new abilities. In interacting with computers, one of the initial user reactions is often the inspiration of fantasy; the extent to which interactive environments inspire fantasy determines their attractiveness and generates interest in taking up the interaction. The use of metaphors allows for operationalizing fantasy concepts. By employing metaphors, fantasy elements can be directly integrated into human-computer interaction. Since they refer to physical or other systems, metaphors can help the user to comprehend the interaction prior to actual use, motivating him or her to take up the interaction. In the Magical Mirrors study, the metaphor used was the distorting mirror known from annual fairs and amusement parks. Since the interaction bears resemblance to already known situations, it can be grasped more easily and utilized more efficiently. Metaphors do not need to reproduce the world realistically for this purpose, since an abstract, conceptual, or symbolic representation can prove as effective as lifelike images. The significance of metaphors in human-computer interaction is supported by a series of research projects. If new forms of interaction are linked to familiar traditions, it appears easier for users to carry over already established behaviors.
Collaboration
In contrast to the preceding motivating factors, collaboration is based on the interaction with other human beings. A condition for its motivating effect is the opportunity for the individual to influence the interaction of other people. This also appears to apply when multiple individuals engage in communal activities via the use of computers. With the linking of computers via the Internet, human-computer interaction was also expanded by a social component. In addition to social interaction over the Internet, the use of interactive public displays increasingly plays an important role in collective interaction located in one place. The motivation to collaborate is increased, for example, through functionalities that make the effects of one’s own behavior visible. With a view toward cooperation and competition, individualistic, cooperative and competitive orientations can be distinguished. While people with a cooperative orientation also consider the preferences of others important, people with a competitive orientation seek to maximize their own preferences relative to the preferences of others. In this case, collaboration is especially motivating if individual behavior is recognized by others. If the efforts and effectiveness of one’s own behavior are recognized and valued, people are motivated to repeat this behavior. If the collaboration is continued, the probability of sustained recognition is even greater. The visibility of one’s own behavior is also one of the most important foundations for recognition. The degree to which collaboration has a motivating effect is influenced by the personal experience of the individual and can vary strongly according to each particular situation. Alongside individual orientation, cultural differences also play a role.
Interaction in Public
The third major issue that may hamper interaction of the audience with public displays is that this interaction happens in public. People may want to give a certain impression to others, avoid being annoyed by displays or other people, avoid giving out private information, and simply be polite towards others.
The Presentation of Self
In his book “The presentation of self in everyday life”, Goffman reframes the social behavior of people as a stage play. Everybody plays a certain role, and a major goal of people is to maintain the coherence of their role. The public space is divided into front stage and back stage. For example, a salesperson may behave very differently in the sales room and in the storage area. A policeman may avoid interaction with a playful public display in order to maintain his role. Similarly, people may avoid gestures that they believe would contradict their role, like bowing or kneeling.
A public display may well be perceived as a stage, and how people interact with it may depend on their personality traits. While an introverted or shy person may avoid interacting with a public display so as not to attract attention, an extroverted person may take advantage of the opportunity and use the situation to present a show to the audience.
The Selective Control of Access to the Self
Privacy may be divided into the selective control of access to the self, and control over one’s personal data. Privacy as the selective control of access to the self has been defined by Altman as a dialectic and dynamic boundary regulation process. For example, people may not want to be approached offensively by a public display (which may be perceived as spam). Similarly, they may be afraid of becoming the center of public attention and possibly being approached by others when they interact with a public display.
The Control over one’s Personal Data
The law in many countries guarantees privacy as the control over one’s personal data. Langheinrich explains the guiding principles of anonymity, access, locality, adequate security, notice and disclosure, and choice. For example, anonymity as ‘the state of being not identifiable within a set of subjects’ should be guaranteed wherever possible. Consider a display system that allows users to carry a personal RFID chip that stores a profile of their interests. When a single person passes by a display and some personalized information pops up, this information may be associated with the person by any bystander. If, however, multiple persons are in the vicinity of the display at the same time, it may not be obvious to a bystander which person to associate this information with.
Finally, people may simply want to be polite to other people in the environment. For example, a certain public display may require them to stand in a thoroughfare while they are interacting. At the latest after a couple of people have bumped into them, they will probably cancel their interaction in order not to stand in the way of others.
The Public Nature of the Space
Public space is characterized by not being controlled by individuals or small groups. It serves to connect private spaces and accommodates a multitude of overlapping functional and symbolic uses. This means in particular that the operators of a public display cannot control the space around it. For example, if a group of people lingers in front of the display and prevents its use, there is usually nothing a display operator can do about it.
Furthermore, the multitude of uses of public space means that most passers-by have something else to do when they pass by. They may be on their way to or back from work, go shopping, or visit someone. If their goal is leisure related, e.g., just strolling around, the probability of interaction may be much higher.
Outdoor deployment of public displays also means that there may be physical constraints that are impossible to control. The sun reflecting on the display may make the display unreadable, and cold temperatures may make it impossible for passers-by to stay around the display for longer or to take their gloves off to touch it.