Abstract

Users tend to position themselves in front of interactive public displays in such a way as to best perceive their content. Currently, this sweet spot is implicitly defined by display properties, content, the input modality, as well as space constraints in front of the display. We present GravitySpot – an approach that makes sweet spots flexible by actively guiding users to arbitrary target positions in front of displays using visual cues. Such guidance is beneficial, for example, if a particular input technology only works at a specific distance or if users should be guided towards a non-crowded area of a large display. In two controlled lab studies (n=29) we evaluate different visual cues based on color, shape, and motion, as well as position-to-cue mapping functions. We show that both the visual cues and mapping functions allow for fine-grained control over positioning speed and accuracy. Findings are complemented by observations from a 3-month real-world deployment.

F. Alt, A. Bulling, G. Gravanis, and D. Buschek, “GravitySpot: Guiding Users in Front of Public Displays Using On-Screen Visual Cues,” in Proceedings of the 28th ACM Symposium on User Interface Software and Technology, New York, NY, USA, 2015.

Introduction

Displays have become ubiquitous in public spaces, such as shopping malls or transit areas in airports and train stations. At the same time, researchers and practitioners aim to increase user uptake by providing interactive and engaging experiences. This trend is further supported by sensing technologies (cameras, depth sensors, etc.) becoming available for easy and low-cost integration with such displays. Sensing technology, however, has specific requirements regarding the optimal operating distance, thereby constraining the possible interaction space. For example, while touch sensors require the user to come into close proximity to the display, gesture-based interaction using Kinect allows users to position themselves freely between 0.5 m and 4.0 m in front of the display. Stationary eye trackers require the user’s head to be inside the tracking box – about 30 cm × 30 cm – at a distance of 70 cm in front of the screen. Hence, interactive displays face the challenge of how to encourage users to position themselves in a target location within the interaction space.

Similar challenges arise in situations where public displays are deployed opportunistically. Such deployments are often constrained by the size and layout of the physical space surrounding the intended target location. This results in displays being positioned in non-optimal spots where, for example, users cannot easily stop without blocking the way of other passers-by. This phenomenon has been termed the butt-brush effect. In such cases, it would often be desirable to guide users towards less crowded areas, particularly in front of large displays.

As a solution to these challenges, deployments aim to either anticipate the default sweet spot, i.e. the area where users are most likely to stop as they approach the display, or they try to actively promote the optimal interaction area by means of explicit hints on the floor (footprints), next to the display (text), or on the display itself (text or silhouette).

We present GravitySpot, a novel approach that modifies the visual appearance of the display content based on user position. We leverage findings from human vision research showing that humans can very quickly process certain visual cues, such as color, motion, and shape. By showing the unmodified content only from a specific location in front of the display, users are made to anticipate this so-called sweet spot. GravitySpot advances the state of the art in several ways.

    1. It allows for changing the sweet spot in an adaptive and dynamic manner, for example based on the current number and position of people in front of the display.
    2. It does not require attention switches, as cues are not decoupled from the actual screen content, in contrast to, for example, hints displayed on the floor or next to the screen.
    3. It is more robust against occlusions: since the cue is shown on the screen, users can simply re-position themselves to perceive it, compared to cases where other users are standing on a cue shown statically on the floor.
    4. It neither requires space nor time-multiplexing between cue and content nor any overlays (e.g., silhouette) since it integrates smoothly with the actual content.
    5. It requires minimal hardware. Any sensor that allows the user position to be determined can be used (e.g., Kinect).

We compare different visual cues with regard to positioning accuracy and speed and show how to improve them by adapting the mapping between user position and visual cue. We conduct two controlled lab studies (n=29). Results suggest a trade-off between accuracy and speed depending on the cue. In a second study we demonstrate that by altering the mapping between user position and cue intensity, this trade-off can be overcome and both accuracy (up to +51%) and speed (up to +57%) can be enhanced. This is valuable for designers, since it allows cues to be chosen based on the content shown on the display (for example, readability of text can be preserved by choosing appropriate cues). The studies are complemented with a real-world deployment. We show that users can quickly and accurately position themselves even in a real-world situation where they are unaware of how the cues work.

Our contribution is threefold. First, we introduce the idea of flexible sweet spots and propose a set of visual cues to guide users to arbitrary sweet spots in front of a display. Second, we present two controlled lab experiments to study the efficiency of the proposed cues and the impact of different mapping functions. Third, we present an in-the-wild deployment, demonstrating how to integrate the approach with an interactive application. We found that the approach is easily understandable to users, with efficiency similar to that observed in the lab.

Guiding Users Using Visual Cues

Findings in cognitive psychology suggest that the human visual system can rapidly process a visual scene to extract low-level features, such as color, shape, orientation, or movement, without the need to cognitively focus on them [27]. We aim to leverage this ability by mapping a user’s current position to visual cues shown on the display.

Psychological Foundations

Our work exploits effects of attentive and pre-attentive visual perception, as introduced by Neisser [20] and confirmed by Treisman [27]. Neisser describes visual perception as a two-step process. First, simple features of a scene are perceived, such as separating textures or the distinction between an object and its background (figure-ground perception). This stage is pre-attentive and characterized by parallel processing. It results in a set of features not yet associated with specific objects [17]. Second, users associate features to scenes, directing attention serially towards the different scene objects. There is no consensus in the research literature as to which features are perceived pre-attentively [11]. There is strong evidence that the list of tasks Neisser presented as working pre-attentively is not exhaustive. Hence, the distinction between pre-attentive and non-pre-attentive features also remains rather blurry. Research that aims to make this distinction includes the work of Wolfe [34]. He presents a list of 28 features, separated into likely, possible, and unlikely candidates for pre-attentive perception. As Wolfe himself noted, for many cases there is only limited evidence, since results stem from single publications – so the list may have to be extended in the future. We base our research on the work of Nothdurft on the role of visual features during pre-attentive visual perception [21]. Nothdurft classified pre-attentive features into three categories: color, shape, and motion.

Selection of Visual Cues

We selected five visual cues according to Nothdurft’s categories (see Figure 2). According to Wolfe, all of these cues are likely to be perceived pre-attentively.

Color

Public displays often contain monochrome content, such as text. Hence, we opted for brightness and contrast as color cues, since these have a smaller impact on readability. To also consider features that affect the color information of multicolor content, we included saturation.
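
As an illustration of how such a color cue could be driven by the user’s position, the sketch below (our own C# example, not the authors’ implementation) desaturates and darkens a pixel color proportionally to a cue intensity in [0,1], leaving the content unmodified at the sweet spot. The blending weights are assumptions.

    using System;
    using System.Drawing;

    static class ColorCues
    {
        // intensity: 0 = user at the sweet spot (unmodified content),
        //            1 = user at the largest sensed distance (fully degraded).
        public static Color Apply(Color c, double intensity)
        {
            intensity = Math.Max(0.0, Math.Min(1.0, intensity));

            // Desaturate: blend each channel towards its luminance value.
            double grey = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;
            double r = c.R + (grey - c.R) * intensity;
            double g = c.G + (grey - c.G) * intensity;
            double b = c.B + (grey - c.B) * intensity;

            // Darken: reduce brightness with increasing intensity (keep 20% at worst).
            double dim = 1.0 - 0.8 * intensity;
            return Color.FromArgb(c.A, (int)(r * dim), (int)(g * dim), (int)(b * dim));
        }
    }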

Shape

We selected shape features that alter the form of content and can be applied to content post-hoc. In particular we chose pixelation and distortion. While pixelation simply decreases the resolution of the content, distortion applies a non-affine mapping function. Both cues have a strong impact on readability. Depending on the font size, content becomes readable only near the sweet spot (10–20 cm).
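
A minimal sketch of the pixelation cue (again our own example, not the paper’s renderer): the block size grows with the cue intensity, so content is rendered at full resolution only at the sweet spot. The maximum block size and the naive per-pixel loop are assumptions made for clarity rather than performance.

    using System;
    using System.Drawing;

    static class ShapeCues
    {
        public static Bitmap Pixelate(Bitmap src, double intensity, int maxBlock = 32)
        {
            int block = Math.Max(1, (int)(intensity * maxBlock)); // 1 px = unmodified
            var dst = new Bitmap(src.Width, src.Height);
            for (int y = 0; y < src.Height; y += block)
                for (int x = 0; x < src.Width; x += block)
                {
                    Color sample = src.GetPixel(x, y); // one sample per block
                    for (int dy = 0; dy < block && y + dy < src.Height; dy++)
                        for (int dx = 0; dx < block && x + dx < src.Width; dx++)
                            dst.SetPixel(x + dx, y + dy, sample);
                }
            return dst;
        }
    }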

Motion

Finally, as a motion cue, we opted for jitter, which moves content along the screen axes at a frequency of 5 Hz. The effect intensity increases with the user’s distance from the sweet spot by adapting the motion amplitude.
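
The jitter cue can be expressed as a time-dependent offset applied to the content: it oscillates at 5 Hz, with an amplitude that grows with the cue intensity. The sketch below shows one possible realization; the maximum amplitude and the sinusoidal motion pattern are our assumptions.

    using System;

    static class MotionCues
    {
        const double FrequencyHz = 5.0;

        // Offset (in pixels) to apply to the content at time t (in seconds).
        public static (double X, double Y) JitterOffset(double intensity, double t,
                                                        double maxAmplitudePx = 40.0)
        {
            double amplitude = intensity * maxAmplitudePx; // 0 at the sweet spot
            double phase = 2.0 * Math.PI * FrequencyHz * t;
            return (amplitude * Math.Sin(phase), amplitude * Math.Cos(phase));
        }
    }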

Baseline

We compare these cues with two baselines from prior work. We opted for on-screen cues, since they were shown to work best in public settings [35]. The first cue is a compass-like arrow on the display that points to the direction in which users should move to reach the sweet spot. The arrow is slightly tilted in z-direction, indicating that “up” means moving forward. The second cue is a simple text telling users whether they should move ‘forward’, ‘backward’, ‘left’, or ‘right’.
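
For the arrow baseline, the pointing direction follows directly from the vector between the user’s current floor position and the sweet spot, for example as in the following sketch (the axis conventions are an assumption, not taken from the paper).

    using System;

    static class Baselines
    {
        // Angle (radians) of the on-screen arrow: 0 = move "forward" (increasing z),
        // positive values rotate the arrow clockwise. Axis conventions are assumed.
        public static double ArrowAngle(double userX, double userZ,
                                        double targetX, double targetZ)
        {
            double dx = targetX - userX; // lateral offset to the sweet spot
            double dz = targetZ - userZ; // depth offset to the sweet spot
            return Math.Atan2(dx, dz);
        }
    }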

Apparatus

To evaluate how well users could be guided using visual cues we implemented the GravitySpot prototype. The C# prototype consists of (1) a tracking module that measures users’ 2D position in real time using Kinect and (2) a rendering module that allows any of the aforementioned visual cues to be applied to the display content. The intensity of the cue depends on the current distance of the user to the target position. We implemented different mappings (Figure 3), where the minimum is defined by the target spot and the maximum by the largest distance at which the user can still be sensed.
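
The rendering module thus needs a single normalized intensity per frame. One plausible formulation (variable names and normalization are ours) maps the Euclidean distance between the tracked position and the sweet spot to [0,1], where the maximum corresponds to the largest distance at which the user can still be sensed:

    using System;

    static class CueIntensity
    {
        // Returns 0 at the sweet spot and 1 at (or beyond) the maximum sensed distance.
        public static double FromPosition(double userX, double userZ,
                                          double sweetX, double sweetZ,
                                          double maxSensedDistance)
        {
            double dx = userX - sweetX;
            double dz = userZ - sweetZ;
            double d = Math.Sqrt(dx * dx + dz * dz);
            return Math.Min(1.0, d / maxSensedDistance);
        }
    }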

Sensor Calibration

We use the Kinect skeleton data to calculate the user position (x- and z-coordinate) in the 2D space in front of the display. To cover as much space as possible, we support the use of multiple Kinects – for example, with two Kinects a visual angle of up to 90° can be covered. We implemented a calibration tool that allows position information obtained from multiple Kinect sensors to be transformed into an x/z user position. For calibration we use triangulation based on two reference points. To be able to change the sweet spot at runtime, our prototype allows a rectangular area to be defined within the field of view of the Kinect sensors. Arbitrary locations within this area can then be selected as sweet spots.
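
The paper describes the calibration as triangulation based on two reference points. One way such a two-point mapping could be computed is to treat 2D floor positions as complex numbers, so that the two correspondences fully determine a similarity transform (rotation, uniform scale, and translation) from sensor to room coordinates. The sketch below is our interpretation, not the authors’ code.

    using System;
    using System.Numerics;

    static class Calibration
    {
        // sensor1/sensor2: the reference points as seen by one Kinect (x + i*z)
        // room1/room2:     the same points in the shared floor coordinate system
        public static Func<Complex, Complex> FromTwoPoints(Complex sensor1, Complex sensor2,
                                                           Complex room1, Complex room2)
        {
            Complex a = (room2 - room1) / (sensor2 - sensor1); // rotation + scale
            Complex b = room1 - a * sensor1;                   // translation
            return sensorPoint => a * sensorPoint + b;         // maps any tracked point
        }
    }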

Mapping Between Position and Cue

During first tests, we noticed that the visual cues were subject to a trade-off between the speed and accuracy of guiding a user to the sweet spot. To investigate this phenomenon in more detail, we implemented different position-to-cue mapping functions. The functions were designed to improve the visual cues so that users find the sweet spot faster and/or more precisely. We chose four mapping functions: Linear, SlowStart, QuickStart, and SCurve.

Linear mapping function. The linear mapping function was chosen as a baseline. The Euclidean distance x of the user is linearly mapped to the intensity of the visual cue.

Slow start mapping function. We use a root function for the slow start mapping. At larger distances to the sweet spot, changes in user position cause only subtle visual changes. Changes become more obvious closer to the sweet spot. Thus, we expect that far away from the sweet spot, users need more time to figure out the direction in which to move, but then hit the sweet spot more precisely due to the increased change.

Quick start mapping function. For the quick start mapping function, the most prominent changes to the visual cue happen at great distance to the sweet spot. We expect this function to guide the user early into the direction of the sweet spot and to improve task completion time. Since changes in position at smaller distances to the sweet spot only cause minor changes in the visual cue, we expect the accuracy to be low.

S-shaped mapping function. The s-shaped mapping function is a combination of the quick start and slow start mapping functions. We expect it to provide clearly visible changes at great distances and accurate feedback when the user draws near to the sweet spot. In the center span this function keeps a steady increase and does not fall flat. As a result, we avoid areas where the user receives no feedback on position changes. We expect this function to provide a good combination of speed and accuracy, while outperforming the linear function.
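
The four mappings can be summarized as functions from the normalized distance x in [0,1] (0 at the sweet spot, 1 at the maximum sensed distance) to the cue intensity in [0,1]. Their exact shapes are given in Figure 3; the expressions below are illustrative stand-ins that reproduce the qualitative behavior described above.

    using System;

    static class Mappings
    {
        // Baseline: intensity grows proportionally with distance.
        public static double Linear(double x) => x;

        // Slow start: subtle changes far away, pronounced changes near the sweet spot.
        public static double SlowStart(double x) => Math.Sqrt(x);

        // Quick start: pronounced changes far away, subtle changes near the sweet spot.
        public static double QuickStart(double x) => x * x;

        // S-shaped: steep near both ends, a steady (never flat) increase in the center.
        public static double SCurve(double x) =>
            x <= 0.5 ? 0.5 * Math.Sqrt(2.0 * x)
                     : 1.0 - 0.5 * Math.Sqrt(2.0 * (1.0 - x));
    }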

Implications for Design

Results from our studies show that designers can guide users with different cues, and that they should consider mapping functions to tune these cues with respect to speed and accuracy. Apart from accuracy and speed, cues should be considered regarding readability. For textual content, color cues seem more appropriate than shape-changing ones. In contrast, the latter seem to not only attract more attention (usually desirable for any public display application) but also to be more entertaining and engaging for users, making these cues particularly suitable for playful applications.