Making internet graphics accessible through rich audio and touch

Welcome! This project is carried out by McGill University's Shared Reality Lab (SRL), in strategic partnership with Gateway Navigation CCC Ltd and the Canadian Council of the Blind. The project is funded by Innovation, Science and Economic Development Canada through its Accessible Technology Program. The motivation for this project is to improve access to internet graphics for people who are blind or visually impaired.

Audio Demo

You can listen to a demo of the audio spatialization here.

User Survey

Our survey for potential users is available in English and in French.

Overview

On the internet, graphical material such as maps, photographs, and charts representing numerical information is clear and straightforward for those who can see it. For people with visual impairments, this is not the case. Rendering of graphical information is often limited to manually generated alt-text HTML labels, which are frequently abridged and lacking in richness. This is a better-than-nothing solution, but it remains woefully inadequate. Artificial intelligence (AI) technology can improve the situation, but existing solutions are non-interactive and provide a minimal summary at best, without offering a cognitive understanding of the content, such as points of interest within a map or the relationships between elements of a schematic diagram. As a result, the essential information conveyed by the graphic frequently remains inaccessible.

Our approach is to use rich audio (sonification) together with the sense of touch (haptics) to provide a faster and more nuanced experience of graphics on the web. For example, by using spatial audio, where the user experiences sound moving around them through their headphones, the spatial relationships between objects in a scene can be conveyed quickly, without long descriptions being read aloud. In addition, rather than offering only a passive listening experience, we let the user actively explore a photograph, either by pointing to different portions and hearing about their content or nuance, or by using a custom haptic device to literally feel aspects such as texture or regions. This permits interpretation of maps, drawings, diagrams, and photographs in which the visual experience is replaced with multimodal sensory feedback, rendered in a manner that helps overcome access barriers for users who are blind, deaf-blind, or visually impaired.
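To make the spatial-audio idea concrete, here is a minimal sketch using the browser's Web Audio API. The detected objects and their positions are invented for illustration, and short tones stand in for spoken labels; this is not our actual rendering pipeline.

```typescript
// Minimal spatialization sketch with the Web Audio API.
// The object list below is invented for illustration; in practice, positions
// would come from analysis of the selected graphic.
const objects = [
  { label: 'bench', x: -2, z: -1 },   // to the listener's left, slightly ahead
  { label: 'fountain', x: 0, z: -3 }, // straight ahead
  { label: 'kiosk', x: 2, z: -1 },    // to the listener's right
];

const ctx = new AudioContext();

objects.forEach((obj, i) => {
  // HRTF panning places each tone at the object's position around the listener.
  const panner = new PannerNode(ctx, {
    panningModel: 'HRTF',
    positionX: obj.x,
    positionY: 0,
    positionZ: obj.z,
  });
  const tone = new OscillatorNode(ctx, { frequency: 440 + 110 * i });
  tone.connect(panner).connect(ctx.destination);

  const start = ctx.currentTime + i * 0.6; // play the cues one after another
  tone.start(start);
  tone.stop(start + 0.4);
});
```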

Our technology is designed to be as freely available as possible, as well as extensible so that artists, technologists, or even companies can produce new experiences for specific graphical content that they know how to render. If someone has a special way of rendering cat photos, they do not have to reinvent the wheel, but can create a module that focuses on their specific audio and haptic rendering, and plug it into our overall system.
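As a rough sketch of what such a plug-in point could look like, consider the interface below. It is purely illustrative; the type and field names are assumptions for this example, not the project's actual API.

```typescript
// Illustrative only: a hypothetical shape for a third-party handler module.
// None of these names are the project's real API.
interface GraphicInfo {
  sourceUrl: string;     // where the graphic came from
  mimeType: string;      // e.g. "image/png"
  data: ArrayBuffer;     // the raw graphic itself
}

interface Rendering {
  title: string;         // shown to the user when choosing a rendering
  confidence: number;    // 0..1, how confident the handler is in this rendering
  audio?: ArrayBuffer;   // encoded audio, if the handler produced any
  haptics?: unknown;     // device-specific haptic data, if any
}

interface Handler {
  id: string;                                          // e.g. "cat-photo-renderer"
  canHandle(graphic: GraphicInfo): Promise<boolean>;   // "is this something I render well?"
  render(graphic: GraphicInfo): Promise<Rendering[]>;  // produce one or more renderings
}
```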

[Image: A woman sitting at a computer that is displaying a web page with six images. She has a cup of coffee to her left and a phone to her right.]
[Image: A visually impaired man wearing a sweater and headphones, sitting in front of a computer in a library, reading a braille book.]

User Experience

After installing the extension in their web browser, the user gets a new menu item that lets them choose a specific graphic for which they would like a richer experience. Unlike extensions meant simply to put a band-aid over poor accessibility design, the interactive, multimodal tools in the extension let even a well-designed website provide an experience that was not possible before. Multiple renderings, along with confidence scores, are automatically downloaded so the user can choose the one they prefer to interact with, or even explore several options. For example, a chart graphic may first let the user sit back and experience a 3D-sound representation of the overall trend, and then offer a second, interactive version that they can explore with a specialized tactile device, feeling the chart itself.
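As a hypothetical illustration of how the extension could order the downloaded renderings by their confidence scores while keeping every option available to the user, consider the sketch below; the field names are invented for the example, not the project's actual data format.

```typescript
// Hypothetical shape of one downloaded rendering option; field names are
// illustrative only.
interface RenderingOption {
  title: string;                                  // e.g. "Overall trend (3D audio)"
  modality: 'audio' | 'haptic' | 'audio+haptic';
  confidence: number;                             // 0..1 score reported by the handler
}

// Offer the most confident rendering first, but keep every option available
// so the user can explore the alternatives as well.
function orderByConfidence(options: RenderingOption[]): RenderingOption[] {
  return [...options].sort((a, b) => b.confidence - a.confidence);
}
```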


Research and Implementation Areas

We have four main project axes:

  1. Machine Learning: Machine learning models extract useful information from the graphic, such as texture, colors, objects, people, or chart values.
  2. Audio Rendering: The information is mapped to rich audio and haptic renderings. We leverage text-to-speech (TTS) technologies, as well as audio spatialization techniques and audio effect generation, to produce a soundscape (a minimal TTS sketch follows this list).
  3. Haptic and Multimodal Rendering: When vibration motors or other haptic hardware is available, tactile information is also provided to reinforce the audio cues, so that information is conveyed simultaneously through hearing and touch.
  4. Extensible architecture: Within the one year of this project, we know that we will not be able to do justice to all possible graphical content types. A key aspect of the project is to make sure that our designs and code are as freely accessible as possible, and extensible by others so new approaches and renderings can be easily incorporated without having to reinvent the wheel.
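As a minimal illustration of the TTS portion of the audio rendering axis, the snippet below uses the browser's built-in Web Speech API as a stand-in for whichever TTS engine is ultimately used; the spoken description is invented for the example.

```typescript
// Minimal TTS sketch using the Web Speech API as a stand-in; the description
// text is invented and would normally come from the machine learning stage.
const utterance = new SpeechSynthesisUtterance(
  'Bar chart. Monthly sales rise steadily from January to June.'
);
utterance.rate = 1.1;                 // slightly faster than the default voice
window.speechSynthesis.speak(utterance);
```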


Deliverables and Timeline

The project runs for one year, from April 2021 through March 2022. The two major technical deliverables are a web browser extension and a server running handlers that take the user's chosen web graphic and create appropriate audio and haptic renderings, which are then returned to the browser for the user to experience with whatever hardware they have available (a rough sketch of this request flow follows the milestone table). Major milestones:

Milestone End | Major Deliverables
2021 Jun. | Internal demonstration of Chrome browser extension sending a request to the server and receiving an audio rendering that can be selected and played by the user. Testing with the NVDA screen reader. Very limited rendering capabilities, mostly to demonstrate the technical architecture. Handlers focused on photographic images.
2021 Sep. | Public alpha release: extension made publicly available on a limited basis, connecting to the McGill server to render a limited selection of web graphics. Source code and release binaries for the browser extension and server components public on GitHub. Additional handlers focused on map graphics.
2021 Dec. | Public beta: external users encouraged to use the extension with the McGill server on a wide variety of sites and report issues. Testing with additional screen readers. Early extension support for additional Chromium-based browsers. Move to more reliable and secure server infrastructure. Additional handlers for chart/graph graphics.
2022 Mar. | V1.0 release: stable release of the Chromium-based browser extension (with advanced haptic capability), plus Firefox (without advanced haptic capability). Server maintained for reliable use.
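The request/response flow between the extension and the server, as described above, could look roughly like the following sketch; the endpoint URL and payload fields are invented for illustration and do not describe the actual server interface.

```typescript
// Illustrative only: how the extension might ask the server for renderings
// of the graphic the user selected. Endpoint and fields are invented.
async function requestRenderings(graphicUrl: string): Promise<unknown> {
  const response = await fetch('https://image-server.example/render', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ graphicUrl }),
  });
  if (!response.ok) {
    throw new Error(`Rendering server returned ${response.status}`);
  }
  return response.json(); // candidate audio/haptic renderings for the user to choose from
}
```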


Contact Us

For any information related to the project, you can contact atp@cim.mcgill.ca.

Internal / Alpha

For testing and pre-release content, you will need a password.