[Submitted on 22 Feb 2022]
Hidden Bawls, Whispers, and Yelps: Can Text Be Made to Sound More than Just Its Words?
Authors: Caluã de Lacerda Pataca, Paula Dornhofer Paro Costa
Abstract: Whether a word is shouted, whispered, or screamed, captions will generally present it the same way. If captions are your only way of accessing what is being said, the subjective nuances expressed in the voice are lost. Since so much of communication is carried by these nuances, we believe that if captions are to serve as accurate representations of speech, embedding visual representations of paralinguistic properties into them can help readers understand speech beyond its textual content. This paper presents a model for processing vocal prosody (its intensity, pitch, and duration) and mapping it to visual dimensions of typography (respectively, font-weight, baseline shift, and letter-spacing), creating a visual representation of these lost vocal subtleties that can be embedded directly into the typographic form of text. In an evaluation, participants were shown this speech-modulated typography and asked to match it to its original audio, presented among similar alternatives. Participants (n=117) correctly identified the original audio with an average accuracy of 65%, with no significant difference between modulations shown as animated text and as static text. Additionally, participants' comments revealed that their mental models of speech-modulated typography varied widely.
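To make the prosody-to-typography mapping in the abstract concrete, here is a minimal Python sketch. It is not the authors' model: the `Prosody` class, the `lerp` and `to_css` helpers, the assumption that each feature is already normalized to [0, 1], the linear interpolation, and every output range are hypothetical choices made for illustration. Only the pairing of intensity with font-weight, pitch with baseline shift, and duration with letter-spacing comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    """Prosody of one text unit; values assumed pre-normalized to [0, 1]."""
    intensity: float  # relative loudness
    pitch: float      # relative pitch height
    duration: float   # relative length


def lerp(lo: float, hi: float, t: float) -> float:
    """Linearly interpolate between lo and hi, clamping t to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return lo + (hi - lo) * t


def to_css(p: Prosody) -> str:
    """Map prosody to a CSS declaration string.

    The output ranges below are illustrative assumptions, not values
    taken from the paper.
    """
    weight = round(lerp(200, 900, p.intensity))  # intensity -> font-weight
    shift = lerp(-0.25, 0.25, p.pitch)           # pitch -> baseline shift (em)
    spacing = lerp(0.0, 0.3, p.duration)         # duration -> letter-spacing (em)
    return (
        f"font-weight: {weight}; "
        f"vertical-align: {shift:.2f}em; "
        f"letter-spacing: {spacing:.2f}em"
    )


if __name__ == "__main__":
    # A loud, mid-pitched, short unit.
    print(to_css(Prosody(intensity=1.0, pitch=0.5, duration=0.0)))
```

The resulting string could be placed in the `style` attribute of a `<span>` wrapping each word or syllable. Intermediate font weights only render as distinct when the chosen typeface is a variable font or ships enough static weights.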
Submission History
From: Caluã de Lacerda Pataca
[v1] Tuesday, 22 February 2022 02:35:25 UTC (1,948 KB)