Christian Benoît graduated from the National Polytechnic Institute in Grenoble as a qualified electronics engineer. He was a warm-hearted person with a strong character, a hard worker with a fascination for knowledge, a love of lively discussion and a profound concern for social issues. In speech communication, he found a research field that allowed him to use his scientific abilities to explore the areas that aroused his keenest interest.
After completing his postgraduate studies and thesis, he consistently sought to combine fundamental, multidisciplinary research, aimed at broadening our knowledge of the processes involved in the production and perception of speech, with applied research for speech technology. The thesis he prepared at the ICP and defended in 1985 at the National Polytechnic Institute in Grenoble involved multidisciplinary research on the definition and characterisation of events which, through the acoustic speech signal, provide indications of the temporal coordination of the speech organs: the mandible, the tongue, the velum and the voice source.
With Christian Abry and Louis-Jean Boë, he was involved early on in national and international debates on how to label the databases of sounds that are needed for the massive training requirements of automatic speech recognition systems. He consistently argued the point, now generally accepted, that a temporal speech signal cannot be “sliced up” into a string of elementary units of sound. At the time, he was developing a signal editor, EDISIG, which was used for many years by researchers in our community.
In 1985, he left Grenoble to work on speech synthesis at the CNET in Lannion. With Michel Cartier and Françoise Émerard, he specialised in the evaluation of speech systems and, encouraged by Christel Sorin, soon became heavily involved in collaborative research at EU level, particularly within the ESPRIT SAM project. He worked closely with researchers in the UK, Germany and the Netherlands, laying the basis for intensive international collaboration that would forge his reputation across five continents as one of the top French researchers in the field of audio-visual speech. He contributed to the development of evaluation protocols, proposing “10 phonetically balanced and semantically unpredictable phrases” to serve as a standard corpus for evaluating French-language synthesis. He kept delving into theoretical questions and constantly sought to link linguistic and phonetic issues with current technological concerns, questioning the influence that certain socio-linguistic parameters might have on the intelligibility of synthesised signals. Comparing experimental results from intelligibility tests of French-language synthesis conducted with French and Ivorian subjects in Grenoble and Abidjan, he suggested the existence of a quantifiable index of “perceived linguistic complexity”, which would need to be taken into account when designing and interpreting such tests.
In June 1988, Christian Benoît joined the CNRS as a research fellow, taking up a permanent post at the ICP. Pursuing the numerous lines of enquiry he had developed at the CNET in Lannion, he continued to work for nearly two years on synthesis evaluation, continually widening the scope of his collaborative work at EU level. In 1990 he started what was to be the major work of his career as a researcher: the study of the role of the visual component in speech communication and its use in speech synthesis and recognition systems. His goals were ambitious: (1) to quantify the contribution of the visual component to the perception of speech in a noisy environment; (2) to measure and quantify the information it carries; (3) to develop speech synthesis and recognition systems that generate or incorporate that information. With his students, Tayeb Mohamadi, Oscar Angola, Thierry Guiard-Marigny, Ali Adjoudani, Bertrand Legoff and Lionel Réveret, he built up a research team that gained international recognition in the field. With Tahar Lallouache and Christian Abry, he contributed to the development of a system for measuring labial geometry. The subject wore blue lipstick, so that the system inevitably came to be known as the “French blue-lips system”! Using a systematic corpus-building methodology, he identified some twenty labial forms characteristic of French, the French visemes, a visual counterpart of phonemes, which he used as the basis for the first system capable of synthesising talking faces from text, built on a display of key images synchronised with the acoustic signal. Coming up against the limitations of this approach, he redirected his research towards parametric modelling of labial geometry, producing a 3D model of the lips in 1993.
This was incorporated into a full model of the face, in close collaboration with Dominic Massaro and Michael Cohen of the University of California, Santa Cruz, where he started work in 1993 on secondment from the CNRS. There, he developed a real-time audio-visual text-to-speech synthesis system in which lip movements are controlled by coarticulation rules and reflect the reciprocal influence of sounds within the same speech sequence. The development of this synthesiser was a turning point in Christian Benoît’s research. Besides its technological applications, the system brought him contracts of considerable interest, especially with the National Cinema Centre, the ACCT, the AMIBE project (GDR-PRC Man-Machine Communication) and the EU MIAMI project. The synthesiser became the central tool in his work on quantitatively evaluating the contribution of the visual component of speech in a noisy environment, since it allowed him to control the amount and type of visual information transmitted to the listener. Thanks to an audio-visual word recognition system developed in parallel, he was able to test different architectures for integrating acoustic and visual information and compare them with the performance observed in human subjects. During the project, in order to collect lip information from a speaker in a realistic communication situation, Christian Benoît’s team, working with Eric Vatikiotis-Bateson of the ATR laboratory in Kyoto, developed a headset fitted with a micro-camera placed on the speaker’s head to capture the image of the lips and measure the parameters characterising lip geometry in real time. At the same time, another project was launched, again with the ATR, to develop a labiometric system that did not require the speaker to wear make-up.
This early research was taken up at the ICP by Gérard Bailly, opening up perspectives of considerable interest in video-telephony, in which the telephone network could be used to transmit not only acoustic speech signals but also, at a reasonable increase in bit-rate, lip and facial parameters that would animate a speaking clone at the receiving end. Another promising avenue lies in telephone communication for people with impaired hearing. This was the direction in which Christian Benoît was working just before he died, attempting to develop a prototype speech synthesis system based on Cued Speech, a method of communication increasingly used by deaf people that supplements standard lip-reading with specific hand and finger positions coding the different sounds. At the ICP, Denis Beautemps is now taking this, Christian’s last project, further forward.
This research has been disseminated in over 75 publications, as well as in numerous video documents, including a film entitled Innovating Tomorrow, produced by the EU in 1997. As a researcher of international stature, Christian Benoît was invited to conferences all over the world, from Monte-Carlo to Seoul and from London to Philadelphia or Kyoto. He was also the instigator and organiser of international workshops that have become some of the most outstanding events in the field of speech synthesis and audio-visual speech processing. He was a member of the editorial board of the journal Speech Communication, and of the scientific committees of numerous international symposia. Christian Benoît was among the most ardent supporters of the French-speaking community. From 1986 to 1989, he was an active member of the Francophone Group on Speech Communication (Groupe Francophone de la Communication Parlée) at the SFA, of which he was the Secretary and for which he created a liaison bulletin (humorously entitled Tchatch'Comm). He acted as representative for the French-speaking community within the European Speech Communication Association, of which he was also the secretary and treasurer. Christian Benoît, a brilliant scientist and a witty, generous man with a wonderful and often provocative sense of humour, projected a very positive image of French culture. Many of our foreign colleagues still smile on remembering Christian Benoît’s reply when asked, at the 1992 Banff conference in Canada, why he was working on the sounds [i], [a] and [y]. He hesitated a moment, then said "Because I’m French".
To us, the "Christian Benoît" award is a way of perpetuating the memory of a scientist whose work, throughout an outstanding career, will continue to open up technological and theoretical perspectives for years to come, and of a generous, likeable man who was keen to support and promote the activities of young researchers. It is also a way for us to carry on some of the prolific activity of Christian Benoît, who died a cruel and untimely death on 26 April 1998, at the age of 41.