2D refreshable tactile displays for automatic audio-tactile graphics

ESR13

Objectives

Translating graphical information to a tactile display is a difficult process requiring vast expert knowledge, due to the differences between visual and tactile perception and the limitations of the devices. This project will first focus on developing guidelines and rules for mapping different types of graphical content to the tactile domain, taking insights into tactile processing into account. State-of-the-art image processing and machine learning techniques will then be used to analyze and extract the most relevant information from graphics, for example through the automatic classification of selected graph types.
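As a minimal illustration of the classification step, the sketch below routes a graphic to a tactile-rendering rule based on its predicted chart type. It is not the project's actual pipeline: the nearest-centroid classifier and the hand-crafted feature vectors (line-pixel ratio, bar-region ratio, axis count) are hypothetical stand-ins for the deep models the project employs.

```python
import math

# Hypothetical centroids of simple image features per chart type:
# [line_pixel_ratio, bar_region_ratio, axis_count]. In the real system,
# a deep classifier would replace this nearest-centroid stand-in.
CENTROIDS = {
    "line_chart": [0.7, 0.1, 2.0],
    "bar_chart":  [0.1, 0.8, 2.0],
    "pie_chart":  [0.2, 0.2, 0.0],
}

# Each chart type maps to a (hypothetical) tactile-rendering rule,
# mirroring the guideline-based mapping described in the objectives.
TACTILE_RULES = {
    "line_chart": "raise data lines, emboss axes, add audio point labels",
    "bar_chart":  "fill bars with texture, emboss baseline and axis ticks",
    "pie_chart":  "outline sectors, distinguish fills, speak percentages",
}

def classify(features):
    """Return the chart type whose centroid is nearest (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(CENTROIDS, key=lambda label: dist(features, CENTROIDS[label]))

def tactile_rule(features):
    """Classify the graphic, then look up the matching rendering rule."""
    return TACTILE_RULES[classify(features)]

print(classify([0.65, 0.15, 2.0]))  # features close to the line-chart centroid
```

The point of the sketch is the two-stage structure (classify, then apply a content-type-specific mapping rule), which is what the guidelines developed in this project would parameterize.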

Expected Results

Automatic generation of audio-tactile graphics with state-of-the-art machine learning techniques.

Placement

Host institution: Karlsruhe Institute of Technology

Enrolments (in Doctoral degree): Karlsruhe Institute of Technology

Supervisors

Rainer Stiefelhagen, Klaus-Peter Hars

Presentation of ESR13

PhD defense: To be announced

My name is Omar Moured. I received my high-honor B.Eng. and M.Sc. degrees in Electrical and Electronics Engineering from Middle East Technical University. During my M.Sc., I specialized in deep learning, more specifically in multiple-object tracking, while also working as an AI researcher for three years. Within the INTUITIVE project, I am currently working on document analysis (natural language processing) as well as segmentation and detection (computer vision).

Abstract of PhD goals

A vast array of document visualizations, such as charts, whether digital or printed, enrich our ability to digest, comprehend, and interact with data. Despite their usefulness, however, a significant disparity exists in their accessibility for the blind and visually impaired (BVI) community, highlighting a clear gap in information equity. The majority of these visuals are shared as raster images, which carry no underlying metadata. Moreover, the challenge extends from production to interpretation: while capable vision-language (VL) models that could potentially bridge this gap exist, they have not been sufficiently investigated with respect to accessibility. Hence, in this work, we started by analyzing document layouts and extracting metadata from visualizations, enabling BVI individuals to interact tactilely with our solutions. We then explored the generation of high-quality alternative text by establishing new benchmarks and introducing novel pretraining and UI methodologies tailored to this specific challenge. These methods allow VL models to accurately recognize complex chart categories, such as panel charts, and to produce long-context summaries. Our study then addresses the diverse nature of visualizations in the wild, such as photographed charts, through robustness benchmarks. For each of our contributions, we conducted extensive user studies involving both sighted and BVI participants, paving the way for the development of accessibility-enabled AI models.

Results

Deliverable 5.1 Taxonomy of image processing algorithms in the context of categorizing tactile graphics
Systematic analysis of translating and simplifying visual into tactile graphics through a literature review

Deliverable 5.2 Design guidelines for representing audio-tactile graphics on a two-dimensional display
Employ deep learning to categorise certain types of graphics. Develop a prototype to translate visual into tactile information. Evaluate the system with blind users on a two-dimensional tactile display. Investigate the combination of audio output with tactile output.

Conference Article
Moured, O.; Baumgarten-Egemole, M.; Roitberg, A.; Müller, K.; Schwarz, T.; Stiefelhagen, R.
Chart4Blind: An Intelligent Interface for Chart Accessibility Conversion
International Conference on Intelligent User Interfaces, IUI, 2024
DOI: https://doi.org/10.1145/3640543.3645175

Conference Article
Moured, O.; Zhang, J.; Roitberg, A.; Schwarz, T.; Stiefelhagen, R.
Line Graphics Digitization: A Step Towards Full Automation
International Conference on Document Analysis and Recognition, ICDAR, 2023
DOI: https://doi.org/10.48550/arXiv.2307.02065

Conference Article
Moured, O.; Alzalabny, S.; Schwarz, T.; Rapp, B.; Stiefelhagen, R.
Accessible Document Layout: An Interface for 2D Tactile Displays
International Conference on PErvasive Technologies Related to Assistive Environments, PETRA, 2023
DOI: https://doi.org/10.1145/3594806.3594811

Conference Article
Ramôa, G.; Moured, O.; Schwarz, T.; Müller, K.; Stiefelhagen, R.
Enabling People with Blindness to Distinguish Lines of Mathematical Charts with Audio-Tactile Graphic Readers
International Conference on PErvasive Technologies Related to Assistive Environments, PETRA, 2023
DOI: https://doi.org/10.1145/3594806.3594818

Conference Article
Ramôa, G.; Moured, O.; Schwarz, T.; Müller, K.; Stiefelhagen, R.
Display and Use of Station Floor Plans on 2D Pin Matrix Displays for Blind and Visually Impaired People
International Conference on PErvasive Technologies Related to Assistive Environments, PETRA, 2023
DOI: https://doi.org/10.1145/3594806