Chapter 7 | Respeaking: Subtitling through Speech Recognition






Publication date: 13 September 2018
Copyright date: 2019
Hardback ISBN: 9781138859524
E-book ISBN: 9781315717166
You can order this volume on the Routledge website






Respeaking may be defined as the production of subtitles by means of speech recognition. Respeakers listen to the original sound of a live programme or event and respeak it, including punctuation marks and some specific features for the deaf and hard-of-hearing audience, to speech recognition software, which turns the recognized utterances into subtitles displayed on the screen with the shortest possible delay. Although respeakers are usually encouraged to repeat the original soundtrack in order to produce verbatim subtitles, the high speech rates of some speakers and the need to dictate punctuation marks and abide by standard viewers’ reading rates mean that respeakers often end up paraphrasing rather than repeating or shadowing the original soundtrack.

Having originated in the US as a way to improve the efficiency of court reporting, respeaking was later introduced in Europe as a means of providing live subtitles on TV, where it has become established over alternative methods such as stenography. Lately, the use of respeaking has expanded to other contexts such as pre-recorded subtitling for TV, live public events (conferences, talks, religious ceremonies, university lectures, school classes, etc.), business meetings and telephone conversations.

Although until recently training in respeaking was provided only by subtitling companies, some universities have now developed respeaking courses, which normally combine elements of interpreting and SDH with aspects that are specific to respeaking, mostly related to the use of speech recognition software. Research on respeaking remains relatively scarce, especially compared with research in related fields such as AVT and accessibility. Academic work has so far focused on the process of respeaking, the training of respeakers (comparing, for example, the performance of interpreters and subtitlers) and the analysis and reception of respoken subtitles by viewers. One of the most debated research topics is quality assessment. This has prompted the introduction of models such as the NER model, which is currently used by universities, regulators and subtitling companies and provides a bridge between academia and the industry.

The future of respeaking is closely linked to the development of speech recognition technology, whether for the use of respeaking in interlingual contexts or for other uses, such as transcription in the film and medical industries. Likewise, the rapid development of speaker-independent speech recognition technology (which, unlike speaker-dependent speech recognition, turns the original audio of a programme into subtitles without the need for a respeaker in between) is bringing about new approaches to live subtitling. In this context, live subtitlers may become editors of automatically recognized subtitles, which they correct and cue live, or they may disappear altogether if broadcasters decide to show live subtitles produced by automatic speech recognition without any editing or human intervention. Research on quality will thus be essential to ensure that these automatic subtitles meet the standards required by viewers.



Pablo Romero-Fresco is a Ramón y Cajal grant holder at Universidade de Vigo (Spain) and Honorary Professor of Translation and Filmmaking at the University of Roehampton (UK). He is the author of the books Subtitling through Speech Recognition: Respeaking (Routledge) and Accessible Filmmaking (Routledge) and leader of the research centre GALMA (Galician Observatory for Media Accessibility), for which he is coordinating the EU-funded projects Media Accessibility Platform and ILSA (Interlingual Live Subtitling for Access).



The Routledge Handbook of Audiovisual Translation provides an authoritative and straightforward overview of the field through thirty-two specially commissioned chapters written by leading scholars.

This state-of-the-art reference work is divided into four sections. The first part focuses on established and emerging audiovisual translation modalities, explores the changing contexts in which they have been and continue to be used, and examines how cultural and technological changes are directing their future trajectories. The second part explores the interface between audiovisual translation and a range of theoretical models that have proved particularly productive in steering research in audiovisual translation studies. Some of these models are associated with disciplines that have long intersected with audiovisual translation, while others are drawn from areas of knowledge that are only now beginning to make their presence felt in the audiovisual translation literature. The third part surveys a range of methodological approaches supporting traditional and innovative ways of interrogating audiovisual translation data. The final part addresses a range of themes pertaining to the place of audiovisual translation in society: these include the institutionalization, academization and technologization of audiovisual translation, as well as its role as a force for social change, both within and outside the industry. This Handbook gives audiovisual translation studies the voice it needs to make its presence felt within the Humanities research landscape.

