The Typophone : a talking typewriter : opportunities and limitations of speech technology for the handicapped

The Typophone : a talking typewriter : opportunities and

limitations of speech technology for the handicapped

Citation for published version (APA):

Kroon, J. N. (1986). The Typophone : a talking typewriter : opportunities and limitations of speech technology for

the handicapped. Technische Universiteit Eindhoven. https://doi.org/10.6100/IR251691

DOI:

10.6100/IR251691

Document status and date:

Published: 01/01/1986

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)



A TALKING TYPEWRITER

Opportunities and limitations of speech technology for the handicapped

[Cover figure: graphical display of the encoded speech data of the spelled letters, plotted against time, t = 0.0 to 4.0 s]

J.N. KROON


The word TYPOPHONE spelled out by the Typophone: the figure shows a graphical presentation of the speech data of the letters T-Y-P-O-P-H-O-N-E, economically encoded at 1 kbit/s, thus a total of about 3.6 kbit, from which synthetic speech may be generated by a MEA8000.
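The caption's figures can be checked with a quick calculation (a sketch; the roughly 3.6 s total duration is inferred from the figure's 0.0 to 4.0 s time axis):

```python
# Back-of-the-envelope check of the cover figure's numbers.
bit_rate = 1_000        # encoded speech data rate in bit/s (1 kbit/s)
duration_s = 3.6        # assumed total duration of the nine spelled letters
total_bits = bit_rate * duration_s
print(f"{total_bits / 1000:.1f} kbit")  # 3.6 kbit, matching the caption
```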

(5)

Opportunities and limitations of

speech technology for the handicapped

DE TYPOFOON: EEN SPREKENDE SCHRIJFMACHINE

De mogelijkheden en beperkingen van spraaktechnologie voor gehandikapten

(The Typophone: a talking typewriter. The opportunities and limitations of speech technology for the handicapped.)

DISSERTATION

to obtain the degree of doctor at the Eindhoven University of Technology, on the authority of the rector magnificus, prof. dr. F.N. Hooge, to be defended in public before a committee appointed by the board of deans on Tuesday 21 October 1986 at 16.00

by

JOHANNES NORBERTUS KROON

born in The Hague


1. PROLOGUE 1

1.1. Introduction 1

1.2. Scope of the study 3

1.3. Terminology 6

2. SPEECH TECHNOLOGY: THE RESOURCES 9

2.1. Introduction 9

2.2. Speech generation 10

2.2.1. Speech reproduction 10

2.2.2. Model of human speech production 11

2.2.3. Reduction of speech data 15

2.2.4. Speech resynthesis at IPO 17

2.2.5. Synthesis of running speech 23

2.3. Conclusion 24

3. NEEDS FOR AND AVAILABILITY OF SPEECH AIDS 27

3.1. Introduction 27

3.2. Methods and general observations 30

3.3. Talking speech prosthesis for the speechless 31

3.4. Speech generation for the blind 33

3.4.1. Aids for external visual information 35

3.4.2. Aids with internal visual information 39

3.5. Conclusions 45

4. TYPING BY THE BLIND 49

4.1. Introduction 49

4.2. The introduction phase 52

4.3. The learning phase 54

4.4. The practice phase 56

4.5. Discussion 58

5. FEEDBACK IN TYPING 63

5.1. Introduction 63

5.2. Non-speech modes of feedback 64

5.3. Speech feedback 67

5.4. Conclusion 70


6. EXPERIMENTAL SET-UP FOR A TALKING TYPEWRITER 71

6.1. Introduction 71

6.2. Hardware description 72

6.2.1. The voice response unit 75

6.2.2. The clock circuit 78

6.3. Functional description 78

6.3.1. Speech feedback of the keying 78

6.3.2. Text memory 79

6.3.3. Recall functions 80

6.3.4. Correction possibilities 82

6.3.5. Additional features 82

6.3.6. Outside functions 83

6.4. Conclusion 84

7. EVALUATION OF THE EXPERIMENTAL TALKING TYPEWRITER 85

7.1. Introduction 85

7.2. Results 88

7.2.1. General experience with the speech output 88

7.2.2. Experience with typing using speech feedback 89

7.2.3. Experience with the intermediate text memory 92

7.2.4. Use of the function keys in general 92

7.2.5. Experience with recall and editing functions 95

7.2.6. Experience with additional functions 97

7.2.7. Handling of the set-up in general 98

7.3. Discussion 100

7.3.1. Similar studies, developments and products 103

8. THE TYPOPHONE 113

8.1. Introduction 113

8.2. Specification of requirements 114

8.3. Technical design of the Typophone 115

8.3.1. The typewriter 116

8.3.2. Hardware description 118

8.3.3. Software description 125

8.4. Functional description 131

8.4.1. Power-up procedure 131

8.4.2. Typing with speech response 131

8.4.3. Recall and editing with typewriter keys 133

8.4.4. Recall and editing with Typophone keys 134


9. EPILOGUE 139

SUMMARY 143

SAMENVATTING 145

REFERENCES 147

APPENDICES 153

A. Institutes and persons consulted in the survey of the needs of the handicapped 153

B. Vocabulary of the experimental talking typewriter set-up 155

C. Wiring diagram of the Typophone talking typewriter 158

D. Organization of speech data storage 160

E. Vocabulary of the Typophone talking typewriter 162

F. Flowchart of the Typophone's software 164

G. The spelled speech alphabet 179


FOREWORD

This thesis describes the results of an investigation that could not have taken place without the willing cooperation of:

The institutes and persons, listed in Appendix A, who gave us information about the actual needs of handicapped persons.

The three institutes that were prepared to cooperate in the evaluation of the experimental talking typewriter, namely "Het Loo Erf", Apeldoorn, Centrum Bartimeus, Zeist, and the Koninklijk Instituut tot Onderwijs van Blinden, Huizen (N.H.). In particular I wish to mention the local typing instructors, respectively K. Fredriks, A.A. de Langen and L.I. Eichenberger-Boot.

Prof.Dr. H. Bouma, who with unbelievable patience was able and willing to wait for the many drafts of this thesis, who continually managed to motivate me to work them into a final version, and who himself made a large contribution to it.

H.E.M. Mélotte, who supervised me with great enthusiasm throughout the investigation and who, not least, took on the aftercare of the Typofoon.

Drs. L. Bosma-Zabel, who guided my first faltering steps in the world of the handicapped.

R.M. Smith, who sought and found many publications for me and who, in addition, gave a great impulse to one of my hobbies.

Dr.Ir. L.L.M. Vogten and Ir. L.F. Willems, who introduced me to speech technology and frequently gave assistance in analysing the Typophone speech.

J. 't Hart, who lent his voice to be stored in an economical way.

Ing. E. de Braal, for whom the construction of the experimental set-up must have been almost a life's work.

Ing. J. Polstra, who designed the first version of the industrial prototype. I shall not soon forget the night of IPO's silver jubilee, at which the presence of C. Fellinger may also be mentioned.

Ing. J. Tiesinga, who gave support in the construction of the second version of the Typofoon.

The various members of the IPO workshop who, under the direction of J.H. Bolkestein, carried out the many mechanical/electrical jobs.

G.E. Luton, who corrected the English text very meticulously.


1. PROLOGUE

1.1. INTRODUCTION

Recent advances in speech technology have opened up numerous possibilities for the improvement of aids for the handicapped or for the development of new aids. Speech technology is used here as a general term for electronic speech processing, including automatic speech recognition, speech synthesis and other kinds of speech manipulation by electrical or mechanical means. Of these aspects, speech generation in particular has made convincing progress, as a result of which various types of speech generators are now commercially available. This has inspired people to design and build a variety of talking aids for the handicapped.

What benefits might speech technology bring to handicapped people? Human speech is an adequate means of communication which enables people to transfer thoughts, meanings, intentions, feelings, etc. to one another (Nooteboom and Cohen, 1984). If this communication by human speech is disturbed in one way or another, speech technology might provide help in the form of a speech prosthesis, giving speech-impaired people a new voice, or a 'listening' communication aid for the deaf, which might recognize human speech and give a visual or tactile representation. Speech may also play a significant role when a communication channel other than speech is malfunctioning. For example, a reading machine for the blind might convert printed language into synthetic speech, and a 'listening' typewriter might enable a paralytic or amputee to write letters.

Speech technology has already yielded a wide variety of benefits for the handicapped, as reflected by the following commercially available aids, implemented with speech technology:

- Two Handivoice speech prostheses for speech-impaired individuals (Votrax). The devices are provided with a numeric and a word keypad, respectively, for entering the message to be reproduced in synthetic speech. Available since 1978 at a price of about US$ 2,000.


- Vocaid is a similar aid, but its output is based on the resynthesis of preselected phrases (Texas Instruments). It has been selling for about US$ 150 since 1983.

- The Kurzweil Reading Machine is a desktop apparatus which converts typewritten and printed text into fluent speech for the blind 'reader'. It appeared on the market in 1979 at a price of about US$ 20,000.

- Talking calculators which answer back the touched keys and give the results of calculations (Canon, Panasonic and Sharp). Talking calculators have been on sale since 1975 at prices ranging from US$ 40 to 450.

- A talking thermometer designed to measure body, environmental and bathwater temperatures (AEG-Telefunken). It has been available since 1980 for about Dfl 1,600.

The above speech aids for the handicapped reflect the state of the art and indicate the current market situation. In addition, many publications report promising results with prototypes and many others announce the forthcoming production of speech aids. Unfortunately, it seems that the majority of these efforts find no outlet in the end. Why?

In attempting to answer this question, we may wonder whether enough attention has been paid to the actual benefit for the handicapped. In this respect, we make a distinction between the intrinsic benefit and the human factors aspects. The intrinsic benefit relates to the functionality of speech technology for the application in question. The human factors relate to the ease with which the aid can be used and controlled, which has to do with the complexity of operation, the meaning and use of control buttons, possible time delays, adaptation to the specific handicap, and, in general, the way in which it falls in with the wishes and expectations of the handicapped individual.

In the present study we make it our object to investigate the potential use, in aids for the handicapped, of speech techniques which are available at present or will be in the near future. We started in 1979 with an initial investigation into the state of the art of the various speech techniques and into the actual needs of the various groups of handicapped people. We arrived at the conclusion that there is a particular need for a talking typewriter for the blind and poor-sighted, which seemed to be realizable in the near future.

A detailed description will be given of the design, construction and evaluation of an experimental typewriter with speech output, aimed at the development of an affordable commercial machine. By the end of the study in 1983 it proved possible to demonstrate an industrial prototype of a talking typewriter, based on the evaluation results gathered with the experimental set-up.

1.2. SCOPE OF THE STUDY

Within the objective to investigate the potential use of present speech technology in aids for the handicapped, the study focusses on a talking typewriter for the blind (Fig.1.1). As speech technology is the resource which gives the opportunity to design an aid with speech output, the study starts with a survey of the literature on this topic, supplemented with information on speech research, leading to the (re)synthesis of speech, at the Institute for Perception Research, IPO, where this study has been carried out. The survey is presented on the basis of a model of human speech production (chapter 2).

During this part of the study, it appeared that only the resynthesis of a limited number of predetermined utterances was available on an industrial scale. The synthesis of speech without any vocabulary restriction will probably become available in the near future. This indicates that we will have to forgo speech recognition and certain types of speech manipulation for the time being, as these techniques will not become available within the limited time span of this study.


Fig.1.1. Outline of the study: Speech technology on behalf of the handicapped. A survey of the state of the art of speech technology and a survey of needs (by field study and literature review) lead to the project definition, a talking typewriter for the blind, followed by the problems of blind typists, the role of feedback in typing, the design of an experimental set-up, a field evaluation, the design of an industrial prototype, and the epilogue.


With this in mind we have to define a speech aid equipped with speech resynthesis that fulfils an actual need of handicapped individuals. For this purpose we have conducted a series of interviews among handicapped people, and undertaken a literature study to determine the needs and solutions as revealed in reported research, development and evaluations. The development of a talking typewriter for the blind turned out to be the most useful and realizable aid at the moment. A talking typewriter spells out a text during typing and/or afterwards and as such it may compensate the effects of a visual impairment (chapter 3).

The speech technology required for the talking typewriter for the blind typist was available: the resynthesis of speech from a restricted vocabulary. The question arises how to make sure that the talking typewriter fulfils the needs of the blind typist. In the first place we have to know the problems of the blind in using a typewriter. We gathered this information by conducting a second series of interviews with instructors of typing courses for the blind (chapter 4).

Vision plays an important role in the feedback of typing by sighted typists. Obviously, blind and poor-sighted people have to do without this feedback channel and we wondered whether other channels could take over the role of feedback. Therefore we studied the importance of the various forms of feedback in typing (chapter 5). Speech output seems adequate for immediate response confirmation of keying and for the recall of produced text.

We considered that on the basis of these results we ought to be able to design a useful talking typewriter. However, the actual benefit under different practical circumstances would have to be proved. We therefore decided to design and build an experimental talking typewriter on the basis of the gathered information (chapter 6).

Blind typists with different levels of skill evaluated this experimental typewriter under various conditions. The results of the evaluation showed that in the feedback of typing actions, listening to speech output may indeed take over a part of the role of vision. The results also comprised the human factors and provided recommendations with respect to the control of a talking typewriter (chapter 7).

The question then arose whether it was possible to manufacture a typewriter with speech response that would fulfil the needs of the blind typist and meet the evaluation results of the experimental set-up. At the same time we had to take careful account of the target group. As the need for a talking typewriter is felt by blind people with a large variety of financial capacities, ranging from fully supported professional use to private communication, we took the view that the development had to be aimed at affordability. This imposed certain constraints on the design. We succeeded in building a prototype talking typewriter which seems to meet the most prominent needs at a reasonable estimated price level: the Typophone (chapter 8). The presentation of the Typophone prototype virtually closed the study.

At this stage, a null-series of the Typophone should be produced, so as to evaluate its performance before starting series production. The epilogue (chapter 9) describes how our attempts to involve a manufacturer in this process have failed up to now, which led an inter-departmental group between IPO and the Division of Medical and Electrical Engineering of the Eindhoven University of Technology to manufacture a null-series under their own control.

1.3. TERMINOLOGY

Throughout the study some general terms will be used in the domain of speech technology and aids for the handicapped. We define these terms in the following:

Speech technology is the general term for the various ways in which human speech signals are processed by digital electronic equipment. Speech technology concentrates on the design of computer software and hardware, usually based on results of speech research. We distinguish the following aspects:


Speech analysis breaks down human speech into features that are relevant to human speech production and perception. Speech analysis provides the foundation of most speech techniques. It opens opportunities for an economical storage of speech in encoded form and facilitates speech recognition and manipulation.

Speech generation is the general name for the electronic output of spoken utterances. There are three main types: reproduction, resynthesis and synthesis. Speech reproduction means the direct output of unprocessed digitally stored speech utterances.

Speech resynthesis is the technique which regenerates speech from encoded speech data. In this case the speech utterances have been analysed and usually economically stored in an encoded form, so that a resynthesis is necessary.

Speech synthesis is the technique which generates any arbitrary utterance by concatenation of speech building-blocks according to a set of rules. Depending on the input message the rules determine the sequence of the building-blocks and create timing and intonation patterns.
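As a toy illustration of this definition, concatenation might be sketched as follows (the building-block names and the placeholder frame data are entirely hypothetical; real rules would also create timing and intonation patterns):

```python
# Hypothetical inventory: phoneme -> placeholder encoded speech frames.
building_blocks = {
    "t": ["t-frames"],
    "ai": ["ai-frames"],
    "p": ["p-frames"],
}

def synthesize(phonemes):
    """A trivial 'rule' that only determines the sequence of the
    building-blocks; timing and intonation rules are omitted."""
    frames = []
    for ph in phonemes:
        frames.extend(building_blocks[ph])
    return frames

# Frames for a word like 'type', spelled as a phoneme sequence:
print(synthesize(["t", "ai", "p"]))
```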

Speech recognition analyses human speech and classifies it into a predetermined set of words or other speech units. The technique discriminates between spoken utterances with different meanings from a certain speaker or from many speakers and responds accordingly.

Speech manipulation is the technique which introduces a well-defined change in the characteristics of human speech. Usually it starts with the output of speech analysis and manipulates certain speech features in one way or another. Thereafter the manipulated speech features may be resynthesized into speech or converted into non-speech output modes such as visual or vibratory displays.

Aids for the handicapped are technical provisions which are used to cancel or at least diminish the effects of an impairment. They may have been designed for both handicapped and non-handicapped people.


Special aids are provisions developed to meet a specific need of handicapped people. A special aid may also prove useful for the non-impaired population. This study is concerned mainly with special aids, although the term 'special' will often be omitted. Speech aids are aids in which one of the speech techniques is implemented.

Disability is a shortcoming of the bodily condition and in this context it is considered purely physical. Impairment and handicap describe the consequences of a disability with regard to functions which otherwise could have been fulfilled. An impairment concerns the consequences with regard to bodily functions, whereas the term handicap has a more general meaning for all personal, social and occupational consequences. Although a strict distinction between impairment and handicap is possible, in practice these terms are used interchangeably.

Low vision is a visual impairment with a vision of less than 0.25 but better than 0.03 and a range of vision of at least the central 10 degrees.

Blindness is a visual impairment with a vision of less than 0.03 or less than a 10 degrees range of vision.
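The two definitions above amount to a small decision rule, sketched here (the function name and the 'unimpaired' label for the remaining cases are ours):

```python
def classify_vision(acuity, field_deg):
    """Classify a visual impairment by the thresholds in the text:
    blindness: acuity below 0.03 or a visual field below 10 degrees;
    low vision: acuity below 0.25 (but at least 0.03) with a central
    field of at least 10 degrees."""
    if acuity < 0.03 or field_deg < 10:
        return "blind"
    if acuity < 0.25:
        return "low vision"
    return "unimpaired"

print(classify_vision(0.02, 90))   # blind
print(classify_vision(0.10, 90))   # low vision
print(classify_vision(1.00, 90))   # unimpaired
```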

Speech impairment means that a subject, who can formulate normally, is either not able to speak at all or he 1) speaks with so much difficulty that his speech is understood by a few acquaintances only.

1) Throughout this study 'he' and 'his' should be read as 'he or she' and 'his or her' respectively.


2. SPEECH TECHNOLOGY: THE RESOURCES

2.1. INTRODUCTION

In the prologue we proposed to investigate the potential use in aids for the handicapped of those speech technologies which are available and applicable at present or will be in the near future. The technologies to be considered are speech recognition, speech reproduction, speech synthesis and resynthesis, and speech manipulation. Speech analysis and manipulation are not yet available as real-time processes, which will usually be an essential requirement. In addition, the analysis of speech is only available as computational algorithms on powerful computers and thus hardly applicable on-line for the handicapped. Therefore, these aspects are excluded from the study. Since no speech recognizers were available at the time of the initial investigation, their application is not considered either.

Having speech generation techniques available, a question arises concerning applicability: the potential use of a speech technique will depend on whether it matches certain well-defined needs of the handicapped. That is to say, a particular aid will make a number of demands on the technique. Although some of these demands will be very specific to the application, the following general demands may be identified.

A speech prosthesis, for instance, may make a portability demand. Portability finds expression in maximum allowed physical dimensions and weight, and a power consumption for which (rechargeable) batteries will suffice. It will be a less stringent requirement for aids that find a more or less fixed place at home, school or workshop. In that case we can better speak of movability, and the power can then probably be drawn from the mains.

Regarding the vocabulary, the aids on offer will differentiate between more or less extensive ranges of vocabulary. While one application may ask for the generation of separately spoken utterances out of a restricted vocabulary, another will need fluent speech without any vocabulary restriction. In addition, the application may impose requirements with respect to the random access performance, governing the time between a command and the actual speech being generated. Furthermore, we have to be aware that the perception of speech will vary between individuals, and may depend on the amount of information and the redundancy in the generated utterances. A means of controlling the speech rate may also prove useful.

What should be the quality of the generated speech in relation to a specific application? If the speech response is only meant for the user himself, the user can familiarize himself with the electronic voice. Even when a message is not perfectly pronounced, the user may learn to interpret it correctly. On the other hand, if the apparatus is a communication aid meant to be understood by any listener, both intelligibility and naturalness should score high. Intelligibility is important as the listener will not have the chance to get used to it, and naturalness may determine the acceptance of the aid. Thus the quality of generated speech will be an important application-dependent requirement.

As human speech research provides the framework, and since many features of speech generation techniques refer to human speech, we start with a description of human speech production. Thereafter electronic speech generation will be treated.

2.2. SPEECH GENERATION

2.2.1. Speech reproduction

Speech reproduction may be done by straight record and play-back of human speech. The methods of speech storage and generation that are widely used are the record-player (spatial storage), the tape recorder (magnetic storage) and the celluloid film of sound motion-pictures (optical storage). The speech signal is recorded sequentially by analogue means. This is where the main disadvantage arises, namely that the information is not randomly accessible, which means that it takes a substantial time to locate a wanted utterance. Most applications, on the contrary, will require almost immediate speech output following a command.

Faster access has been achieved by the development of magnetic disc memories (Winchester disc and floppy disc drive) and laser-based optical systems (Compact Disc), on which the speech signal is digitally stored. At the time of this study these devices could not yet be used in applications requiring portability and/or affordability.

Is it possible to use solid-state memories (ROM = read-only memory) for storing speech data? The use of ROMs allows random access and makes it possible to meet the specifications of minimum weight, space and power consumption. The speech signal first has to be digitized. As to storage, a sampling rate of 10 kHz at 12 bit resolution results in a memory requirement of 120 kbit for the straight storage of 1 sec of speech. One 16 kbyte ROM, for instance, will store only 1 second of speech data. Although the capacity of memory chips is rapidly increasing, it is not yet generally attractive to store, let us say, one up to several minutes of speech in this manner. On the other hand, the use of digital unprocessed recordings ensures that the reproduced speech will be of high-fidelity quality.
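The storage arithmetic above can be verified directly (a sketch; 16 kbyte is taken as 16 x 1024 bytes of 8 bits each):

```python
# Straight storage of uncompressed speech at 10 kHz, 12 bit resolution.
sample_rate = 10_000            # samples per second
bits_per_sample = 12
bits_per_second = sample_rate * bits_per_sample
print(bits_per_second)          # 120000 bits = 120 kbit per second of speech

rom_bits = 16 * 1024 * 8        # capacity of one 16 kbyte ROM
print(rom_bits / bits_per_second)  # about 1.09: roughly one second per ROM
```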

For more economical storage it is possible to encode the electrical speech signal with a significant data reduction. Although universal methods exist to reduce data rates, an encoding method will probably provide the highest reduction if it is based on the specific features of the signal. As we are interested in speech signals, we will take a look at the model of human speech production, which is presented in the next section.

2.2.2. Model of human speech production

Human speech is produced by a sequence of speech organ movements, commanded by the brain, by which a series of varying but specific acoustic waves are generated, which can be recognized as human speech and understood as such. A description of the physics of human speech production is given by the source-filter model of Fant (1970). For the purposes of the present study we restrict ourselves to a simplified model given by 't Hart, Nooteboom, Vogten and Willems (1982) and Vogten (1983), on which speech research at IPO is based. Such a model has proved adequate to open the door to artificial speech generation. The human speech organs are located in the respiratory organs and a part of the digestive tract (see Fig.2.1). Lungs, trachea, vocal cords and narrowings in the air tract act jointly as the sound source. Movements of the thorax decrease the volume of the lungs, causing air pressure to build up and an air current to flow. In this way the energy needed to produce sound is generated.

Fig.2.1. Human speech organs: 1. Trachea, 2. Vocal cords, 3. Nasal cavity, 4. Soft palate, 5. Tongue, 6. Oral cavity, 7. Pharynx. (from 't Hart et al., 1982)


The real sound sources are the locations within the speech organs where the flowing air is set in vibration. In principle we distinguish two sound sources, the vibrating vocal cords for voiced sounds and narrowings in the air tract for unvoiced sounds, which may act both alternately and in combination with each other.

In the model these two sound sources are represented by an impulse generator and a noise source respectively (Fig.2.2). The voiced sounds originate from a periodic impulse generator with a periodicity of 1/F0, F0 being the fundamental frequency corresponding to the vibration frequency of the vocal cords. In the frequency domain this signal gives rise to a series of equally strong components with frequencies F0, 2F0, 3F0, etc. The decrease in power of the harmonics by -12 dB/octave, as seen in the spectrum of the actual human source because of its approximately sawtooth-shaped signal, will be included in the filter. Unvoiced sounds are generated from noise with a flat spectrum (white noise). A variable amplifier completes the source.

In certain phoneme sounds, like v and z, voiced and unvoiced sound sources act simultaneously. In distinction from actual human speech, the model switches between the sources, which could introduce problems in the generation of such phonemes. Intonation occurs primarily by varying the fundamental frequency of the impulse source. The variation of the source signal with time is now controlled by only three parameters: the choice between voiced and unvoiced, VUV; the fundamental frequency, F0, in the case of voiced sounds; and an amplification factor, G (Fig.2.2). The combined human impulse/noise source produces sounds with a broad spectrum ranging from about 80 to 8000 Hz. As components beyond 5 kHz hardly contribute to the intelligibility and quality of male speech in particular (Vogten, 1983), the range is usually restricted to 5 kHz for practical reasons. The spectrum of the noisy sounds is continuous. Voiced sounds, on the other hand, show a discontinuous spectrum with peaks at the fundamental frequency, F0 (the vibratory frequency of the vocal cords), and the corresponding higher harmonics, nF0 (n = 1, 2, 3, ...).
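A minimal sketch of the model's two sources described above (the function name, sampling rate and sample counts are our assumptions, not the thesis implementation):

```python
import random

def source_signal(vuv, f0, gain, fs=10_000, n_samples=200):
    """Voiced sounds: a periodic impulse train with period 1/F0;
    unvoiced sounds: white noise; both scaled by the factor G (gain)."""
    if vuv == "voiced":
        period = round(fs / f0)  # samples per pitch period 1/F0
        return [gain if n % period == 0 else 0.0 for n in range(n_samples)]
    return [gain * random.uniform(-1.0, 1.0) for _ in range(n_samples)]

# At F0 = 100 Hz and fs = 10 kHz, impulses fall every 100 samples:
voiced = source_signal("voiced", f0=100, gain=1.0)
print(sum(1 for s in voiced if s > 0))  # 2 impulses in 200 samples
```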


Fig.2.2. Source-filter model of human speech production. The source signal originates from an impulse generator with frequency F0 or a white noise source selected by the voiced-unvoiced parameter VUV, and is amplified by a factor G. The filter is a series connection of 2nd-order formant filters, characterized by central frequencies Fi and bandwidths Bi, 1<=i<=5 (from Vogten, 1983).

The extended width of the frequency spectrum of the human sound source creates the possibility to change the relatively small number of basic source sounds (e.g. voiced, unvoiced, voiced-fricative, plosive, etc.) into a wide diversity of actual speech segments. The air flows through a resonator formed by the pharynx and the oral and nasal cavities, which acts as an acoustical filter (Fig.2.1). Changing the dimensions of this resonator by a movement of tongue, lower jaw and lips makes the filter action variable. The mouth opening itself has a high-pass filter characteristic (+6 dB/octave) and contributes as such to the acoustical filter.

Due to the filter action of the oral cavities the source signal is amplified at some frequencies and weakened at others. The resonance frequencies of the vocal tract give rise to a number of more or less pronounced peaks in the spectrum: the formants.


In analysing human speech, Vogten (1983) indicates that the frequency region up to 5000 Hz shows a maximum of four formants (in the case of women's voices) or five (men's voices). According to Nooteboom and Cohen (1984) the three lower-frequency formants (F1, F2 and F3) are mainly responsible for the identity of the voiced sounds. The higher formants contribute in particular to the naturalness of speech.

In the model the filter can be seen as a series connection of simple (second-order) filters, each corresponding to one formant (Fig.2.2). Since each formant is characterized by values of the central frequency and bandwidth, the total filter is determined in the case of five formants by ten time-varying parameters (F1..F5, B1..B5). The model filter also takes care of the specific properties of the human sources of voiced and unvoiced sounds, as far as these are not covered by the model sources themselves, and of the acoustic transfer function of the mouth opening. This implies an overall spectral slope of about -6 dB/octave for voiced sounds and +6 dB/octave for unvoiced sounds.
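A single formant of this cascade can be illustrated as a digital second-order resonator whose pole radius is set by the bandwidth. This is a sketch with an assumed sample rate and DC-gain normalization; the actual IPO and chip filters differ in detail:

```python
import math

def formant_filter(x, f, b, fs=10000):
    """Second-order resonator: one formant with centre frequency f (Hz)
    and bandwidth b (Hz) at sample rate fs."""
    r = math.exp(-math.pi * b / fs)                # pole radius from bandwidth
    c = 2.0 * r * math.cos(2.0 * math.pi * f / fs)
    g = 1.0 - c + r * r                            # normalize gain at DC
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = g * s + c * y1 - r * r * y2            # difference equation
        out.append(y)
        y2, y1 = y1, y
    return out

# A five-formant filter is then just a cascade (illustrative values):
# for fi, bi in [(500, 80), (1500, 100), (2500, 120), (3500, 150), (4500, 200)]:
#     signal = formant_filter(signal, fi, bi)
```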

So far, the description of human speech production has been rather static. The model may be said to give an instantaneous analogy of the position and action of the human speech organs. Speech, however, is a very dynamic sound process and there are a large number of linguistic rules stored in our brains to dictate the movements of the vocal system. Speech analysis provides a tool for studying these time variations of the model parameters.

2.2.3. Reduction of speech data

During speech, the speech organs show a series of continuous movements at a relatively slow rate in comparison with the resulting speech waveform, which has a bandwidth of about 8000 Hz. This dynamic behaviour implies that the model parameters vary relatively slowly in time. Since these parameters describe the speech signal, the storage of the parameters instead of the speech signal requires less memory space, leading to the wanted data reduction. Speech analysis sifts the characteristics of source and filter out of the speech signal in time. The speech analysis at our disposal at IPO and the computer implementations have been extensively described by Vogten (1983). Fig.2.3 gives the result of an analysis run as an example.

Fig.2.3. Result of an analysis process on the utterance "After I got here I went straight to bed". From top to bottom: amplitude amplification factor G, voiced-unvoiced indicator VUV, fundamental frequency F0 and the five formants indicated by their central frequencies Fi and their quality factors Qi (vertical dashes, Qi = Fi/Bi) (from 't Hart et al., 1982).

The analysis of speech fragments with a duration of 25 ms (filter) or 40 ms (fundamental frequency) is done every 10 ms. Thus the resulting parameter values are obtained at a sample frequency of 100 Hz. The analysis of each sample produces thirteen parameter values: amplitude, voiced-or-unvoiced, fundamental frequency, and the frequencies and bandwidths of five formants. With the exception of the VUV parameter (1 bit) each of the parameter values is represented by 12 bits. This leads to a memory requirement of 100x(1+12x12) = 14.5 kbit/s for storage of the speech feature values. With respect to the digitized original speech signal (120 kbit/s) this means an eight-fold reduction of memory space.
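These figures follow directly from the frame rate and bit widths quoted above:

```python
bits_per_frame = 1 + 12 * 12      # 1 VUV bit + 12 parameters of 12 bits each
rate = 100 * bits_per_frame       # 100 analysis frames per second
assert rate == 14500              # i.e. 14.5 kbit/s
print(round(120_000 / rate, 1))   # reduction factor vs 120 kbit/s original
```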

As the speech organs move rather slowly, an additional reduction in memory requirements seems possible. Looking at the results of the analysis we see that the feature values show for the most part smooth variations over a rather small range of values. To reduce the data rate, the inter-sample time may be increased and the number of values for each of the features lowered, so that these values can be encoded by less than 12 bits per parameter.
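Both reduction steps, a longer inter-sample time and coarser quantization, can be sketched for a single parameter track. The skip factor, level count and coding range below are illustrative assumptions, not Vogten's actual choices:

```python
def reduce_track(values, skip=2, levels=16, lo=0.0, hi=1.0):
    """Keep every `skip`-th sample and quantize the kept samples to
    `levels` integer codes (log2(levels) bits instead of 12)."""
    step = (hi - lo) / (levels - 1)
    codes = []
    for v in values[::skip]:
        v = min(max(v, lo), hi)            # clip to the coding range
        codes.append(round((v - lo) / step))
    return codes
```

Halving the sample rate and coding in 4 bits instead of 12 already shrinks such a track by a factor of six.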

Resynthesizing speech from stored speech parameters makes it possible to evaluate its perception and to study the influence on speech quality of cuts in speech data rates. Vogten (1983) has shown that a careful use of skipping and quantization of parameter values, with an adapted resynthesis process, results in a minimum memory requirement of about 1 kbit/s. With respect to unprocessed storage of the original signal this gives a memory saving by a factor of 120. One 16-kbyte memory chip can then store the parameters for 128 seconds of speech. This creates the opportunity to generate speech from data in memory ICs, and thus in real time and with random access.

2.2.4. Speech resynthesis at IPO

Resynthesis of speech is the process which rebuilds speech from the speech parameters stored in a memory as a result of the analysis of original spoken utterances. It is implemented with a speech production model. At IPO the speech analysis and resynthesis system runs on a VAX 11/780 computer ('t Hart et al., 1982). Based on research executed with this system, solitary resynthesis systems have been developed. The first solitary IPO Voice Response Unit (VRU) was described by Willems, Moonen, Lammers, Dobek, Nes and Jimenez Nichols (1977). In conformity with the speech model described, this VRU is based on a series connection of four formant filters. The coefficients of these filters are derived from the formant-encoded speech data, as are the choice of source (voiced/unvoiced), source frequency (if voiced) and source amplitude. The average bit rate is only 1 kbit/s. The first VRU was built with the aid of MSI circuits (TMC 539), each carrying two digital filters. In a second version (1979), five of which were built, the speech synthesizer was implemented with a 2901 bit-slice microprocessor, controlled by an additional microprocessor system. This VRU fills a total of about 465 cm2 of circuit board (see Fig.2.4), requires a power of about 50 watt and weighs about 9 kg (including housing and power supply), so its use will not quite meet the portability requirement. We used such a VRU in our experimental talking typewriter (chapter 6).

Miniaturization and integration in electronics led from here to the development of speech synthesis chips: integrated circuits with such a high density that only one chip performs the complete resynthesis. Research at IPO initiated the development of the Philips MEA8000 speech synthesis chip (Willems and Bierlaagh, 1983), which became available in 1982. It will be succeeded by the PCF8200 at the end of 1986. Fig.2.5 and Table 2.1 give further details. In the following we restrict ourselves to the then available MEA8000.

Fig.2.5 shows that the MEA8000 speech chip is based on formant-encoded speech data, in conformity with the speech model. This opens up many possibilities for manipulating speech properties, and thus resynthesis may yield speech which is deliberately different from the original speech.

An example is the speech rate, which we want to control. Since the data of each speech sample contain information on its duration (Table 2.1), the bit rate can be modified. With that option, the bit rate of the output speech is adjustable between 500 and 4000 bit/s. This property allows us to alter the speech rate without changing other parameters such as frequency. Thus the speech chip meets the requirement of changing the speed of speech, with the advantage over e.g. speech reproducers that it can be done without the character of the speech itself changing.
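With a per-frame duration field, such a speed change reduces to rescaling one value per frame while the pitch and formant data stay untouched. A sketch over a hypothetical frame encoding (the `dur_ms` and `f0` keys are illustrative, not the chip's actual bit fields):

```python
def change_rate(frames, factor):
    """Slow resynthesis down (factor > 1) or speed it up (factor < 1)
    by scaling each frame's duration; pitch and formants are unchanged."""
    return [{**f, "dur_ms": f["dur_ms"] * factor} for f in frames]
```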

Fig.2.4. The interior of the IPO VRU showing three circuit boards raised with respect to their normal position.


Fig.2.5. Electrical model of the Philips MEA8000 and PCF8200 speech synthesis chips. The MEA8000 lacks the fifth formant filter. Parameter codes: see Table 2.1.

TABLE 2.1.

Parameter                    Code    Bits      Bits
                                     MEA8000   PCF8200
Start frequency              F0        8         8
Delta frequency/
  noise selection            dF0       5         5
Amplitude voiced/noise       AV/AN     4         4
Frame duration               D         2         2
Frequency 1st formant        F1        5         5
Bandwidth 1st formant        B1        2         3
Frequency 2nd formant        F2        5         5
Bandwidth 2nd formant        B2        2         3
Frequency 3rd formant        F3        3         3
Bandwidth 3rd formant        B3        2         2
Frequency 4th formant        F4        0         3
Bandwidth 4th formant        B4        2         2
Frequency 5th formant        F5        -         1
Bandwidth 5th formant        B5        -         2

Total bits per speech frame           32        40

Table 2.1. Distribution of frame-data bits over the parameters for the MEA8000 and PCF8200 speech synthesis chips.
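The frame totals in Table 2.1 can be verified by summing the per-parameter widths. The listed totals of 32 and 40 bits work out if the 8-bit start frequency is read as sent once per utterance rather than per frame (an interpretation of the table, consistent with its totals):

```python
# Widths in table order: dF0/noise, AV/AN, D, F1, B1, F2, B2, ..., F5, B5
mea8000 = [5, 4, 2, 5, 2, 5, 2, 3, 2, 0, 2, 0, 0]
pcf8200 = [5, 4, 2, 5, 3, 5, 3, 3, 2, 3, 2, 1, 2]
print(sum(mea8000), sum(pcf8200))   # 32 40
```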


A general block diagram of a speech resynthesis system with the MEA8000 speech chip is shown in Fig.2.6. The MEA8000 requires some additional circuitry, in the form of memory chips for storing speech data and a microprocessor system. The I/O circuit takes care of the communication of the system with the outside world, e.g. an apparatus or keyboard. The microprocessor handles these I/O signals, reads the required speech data from the speech ROM and sends these data, whether or not processed, to the speech chip. The speech chip resynthesizes the speech signal, which in turn is amplified and fed to a loudspeaker or earphones. In addition to the speech data, the ROM stores the microprocessor's program codes. The hardware will be described in more detail in chapter 8, which concerns our actual application of the MEA8000 speech chip.


Fig.2.6. Block diagram of a universal large-vocabulary speech resynthesis system, built around the MEA8000 speech chip.

The extent of the speech vocabulary is determined by the memory capacity available and the addressing ability of the microprocessor. The memory capacity gives an indication of the total duration of producible speech. As to the addressability, the usual 16-bit address bus gives direct access to 64 kbyte of memory. At an average bit rate of 1 kbit/s, for instance, 64 kbyte or 512 kbit of ROM gives about 8.5 minutes of randomly accessible resynthesized speech. Since the speech chip consumes only 30 mA from a 5 V supply, it fulfils the portability requirement. From the above we can conclude that the MEA8000 allows the random-access generation of a total of several minutes of speech from a predetermined vocabulary by resynthesis.
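The 8.5-minute figure is simple arithmetic on the address space and the average bit rate:

```python
rom_kbit = 64 * 8               # full 16-bit address space: 64 kbyte = 512 kbit
seconds = rom_kbit / 1.0        # at an average rate of 1 kbit/s
print(round(seconds / 60, 1))   # about 8.5 minutes of speech
```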

Up to now we have not dwelt on the quality of the speech resynthesized with the aid of the MEA8000. In judging speech quality there are two aspects to be considered: intelligibility and naturalness of the produced speech. Vogten (1983) quotes some methods for measuring speech intelligibility. It is usually measured with recognition tests by subjects under specific conditions, resulting in a percentage of correct responses. In the first intelligibility test of the IPO analysis-resynthesis system, Vogten (1980) applied a speech interference threshold (SIT) method which measures the influence of interfering speech (masker), expressed by a quality (Q) factor. He found that doubling the bit rate of resynthesized speech leads to a 3 dB higher Q value. Another speech quality measurement at IPO by Nooteboom and Doodeman (1982) makes use of the redundancy of polysyllabic meaningful words. Both methods indicate that the speech quality decreases with increasing reduction of speech data. 't Hart et al. (1982) indicate that intelligibility strongly decreases in the case of a bit rate under 1 kbit/s.

As regards the naturalness of the speech, it has appeared that the voice of some speakers is better suited to the analysis-resynthesis process than the voice of others. Moreover, the MEA8000 is not able to produce with high quality the speech of females and children. Its successor, the PCF8200, is expected to do better in this respect. However, with a careful analysis of the voice of a suitable speaker, the MEA8000 generates speech of sufficient quality for many applications in which a clear distinction is required between the various utterances. These applications are to be found particularly in personal appliances in which the speech response is only meant for the user himself. In such applications the degree of naturalness is supposed to be sufficient for acceptance of the speech resynthesized by the MEA8000.


2.2.5. Synthesis of running speech

Hitherto we have reviewed some speech reproduction techniques based on development work accomplished at IPO. We have seen that speech can be resynthesized from economically encoded speech data. The data of shorter or longer stretches of speech can be stored for later retrieval, after which the speech can be resynthesized with the aid of a speech synthesis chip. Suppose that we have a large number of words in our vocabulary, ready to be resynthesized in any order. Would it be possible to construct messages of longer duration by concatenation of the available segments? Let us take a look at human speech.

By comparing the spectrograms of complete sentences and separate words, both spoken by a human speaker (see Nooteboom and Cohen, 1984), it is found that the duration of the separately spoken words is longer than that of the same words in the spoken sentence. This time compression is not uniform. Some words are more compressed than others and we can distinguish intra-word differences in compression. A second difference between the separate words and the spoken sentence is the absence of pauses between the words in the latter case. It is noted that the formant frequencies of the succeeding words blend smoothly into each other, so that the word boundaries have disappeared. Added to this is the fact that words in a sentence are not always pronounced in the same way as the words spoken separately. Some phonemes, present in the separate words, are even absent in the running-speech version. Thirdly, the intonation of a sentence is connected with a specific pitch contour in dependence on the accentuation. Just concatenating speech units will not lead to the correct contour. The analysis of human speech thus shows that it is not possible to construct messages just by concatenation of speech segments. The same will hold for a speech synthesis system starting from word units. The synthesis of arbitrary running speech uses much shorter basic speech elements such as phonemes, from which speech may be synthesized by applying a number of rules (synthesis-by-rules). These rules supply a conversion of the usual orthographic input into a phonetic transcription, and provide information on the duration of the phonemes, on the way they blend into each other, on the pitch and amplitude contour, etc. (Fig.2.7). These rules should solve the kind of problems indicated above.

The use of diphones instead of phonemes has proved to be advantageous (Elsendoorn and 't Hart, 1982). Diphones contain the transition part from one phoneme to the next, so their use intrinsically results in smooth transitions between phonemes. Although diphones have a relatively short duration, economical storage does make demands on memory capacity, as some thousands of them are needed. Since, however, speech synthesis-by-rules may also make use of the MEA8000 speech chip, the diphones can in fact be stored economically. The promising use of diphones was not yet available at the time of this study.

2.3. CONCLUSION

The state of the art at the time of our initial investigation (1978-1979) indicates that speech resynthesis of a fixed number of speech sounds, words and/or utterances is the only speech technique that lends itself to application in aids for the handicapped. This applies not only to such technological aspects as encoding economics, random accessibility and speech quality, but also to the state of miniaturization, price, power requirements, availability and reliability. On the other hand, speech technology is continuously advancing. For instance, in the period since the completion of this study, research into speech synthesis with diphones has resulted in a prototype keyboard-to-speech system (Menting, 1984). Therefore we should not lose sight of other aspects during our investigation into the needs of handicapped people, which will be described in the following chapter.


[Flowchart blocks: text input; vocabulary look-up; grapheme-phoneme conversion; look-up of duration rules; look-up of intonation rules; duration corrections; pitch/amplitude correction; speech synthesis]

Fig.2.7. Basic flowchart of speech synthesis-by-rules. Rules are applied for grapheme-phoneme translation, and for duration, pitch and amplitude adaptations.
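The flow of Fig.2.7 can be summarized as a pipeline skeleton. All rule functions here are hypothetical stand-ins; real synthesis-by-rules involves much richer linguistic processing:

```python
def synthesize(text, vocabulary, g2p, duration_rules, intonation_rules):
    """Skeleton of Fig.2.7: vocabulary look-up with grapheme-phoneme
    conversion as fallback, then duration and pitch/amplitude rules."""
    if text in vocabulary:              # word found in the vocabulary?
        phonemes = vocabulary[text]     # yes: use stored transcription
    else:
        phonemes = g2p(text)            # no: grapheme-phoneme conversion
    phonemes = duration_rules(phonemes)     # duration corrections
    phonemes = intonation_rules(phonemes)   # pitch/amplitude corrections
    return phonemes                     # passed on to the speech synthesizer
```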


3. NEEDS FOR AND AVAILABILITY OF SPEECH AIDS

3.1. INTRODUCTION

The objective of the present chapter is to justify that a talking typewriter for the blind is the obvious project of this study. We first want to know the needs of the handicapped who are potential users of speech aids. The present chapter describes an initial investigation.

As to speech-impaired people, they need an alternative output channel for communication with other people. The opportunities of present speech technology lie in speech generation. Will it be sufficient to equip one of the existing communication aids with a speech generator? If it is, what requirements should the speech generator meet with respect to vocabulary, speech quality, control, etc.? Since resynthesis of speech is available, and speech synthesis is expected to be available soon, we are able to check if the requirements of a talking speech prosthesis can be met.

The deaf and hard-of-hearing have impairments in speech perception. What benefit might speech technology bring to these people? In principle the deaf need an ideal speech recognition system, that is to say a system capable of recognizing running speech from any speaker, in real time and even in noisy surroundings, in a portable, handy, low-powered appliance. As mentioned in chapter 2, even far less powerful speech recognition systems are not yet available. Alternatively, they might benefit from a speech modifier that would manipulate speech and adapt it to the deficient hearing capacity. Speech technology offers many opportunities such as sharpening, enhancement, spreading out and transposition of the formants in the frequency domain, and additional tactile or visual presentation of certain speech features. Since speech manipulation was not yet available as a real-time processor, however, it is not dealt with in this study either.

Within the category of non-speech impairments we think in the first place of a visual impairment, giving diminished access to visual information. The question arises in how far the use of other senses, like touch and hearing, can compensate for this handicap. With respect to touch, the use of braille, for instance, has earned a good reputation. Reading braille, however, also has its restrictions. The material has to be prepared in advance, specially and exclusively for the blind. The amount of available braille texts is consequently rather restricted and specific texts, e.g. for study purposes, may involve long waiting times. The reading speed of braille (about 100 wpm) is low in comparison with visual reading (200-300 wpm). Reading braille is hard to learn, in particular for the large group of late-blind, because of the callousness of the fingertips and a decline in the power to learn to use such a new sense. Only about 10% of the blind population, among whom very few are over 60, are able to read braille effectively. A spoken output seems preferable to a tactile mode, as it does not have the above-mentioned disadvantages. However, appliances with speech output are rare and there is little experience of their acceptance. We may ask, for instance, what kind of visual information is suited for conversion into speech. And what are the needs that can be met by the available resynthesis of speech from a predetermined vocabulary?

A second type of non-speech impairment to be considered is a motor impairment, as a result of which communicative means and environmental events may be difficult or even impossible to control. In this respect we may think of the telephone, typewriter, lighting, TV and radio, the turning over of book pages, the opening and closing of doors and curtains, wheelchair control, etc. For this purpose special control devices have been designed, such as the mono-selector (Radio Bulletin, 1979), which are operated by only one switch. If the handicapped person is able to speak consistently, speech recognition might offer a faster solution. For subjects with somewhat disturbed, but still consistent speech, the aid can be adapted to the user's voice. Public provisions, on the other hand, will require speaker-independence. Since speech recognition as such is only rarely available, this application is not worked out in this study.


Table 3.1 summarizes the speech techniques that are relevant to the various impairments. Applications of speech technology in the near future are to be expected in communication aids for the speech-impaired, and in various kinds of aids with speech output for the visually-impaired.

TABLE 3.1.

IMPAIRMENT          SPEECH TECHNOLOGY           AVAILABILITY

Speech              Speech resynthesis from     yes
impairment          a limited vocabulary
                    Speech synthesis            coming soon

Deafness            Speaker-independent         not yet
                    recognition of
                    running speech

Hard-of-hearing     Manipulation of speech      not yet
                    in real time

Motor               Speaker-trainable           restricted
impairment          small-vocabulary            availability
                    speech recognition

Visual              Speech resynthesis from     yes
impairment          a limited vocabulary
                    Synthesis of speech         coming soon
                    from text

Table 3.1. Review of impairments and relevant speech techniques with an indication of their availability at present or in the future.


3.2. METHODS AND GENERAL OBSERVATIONS

The needs of the handicapped were initially gauged by looking at the aids currently available. These aids probably represent accepted solutions to existing problems. To obtain information on experience gained with these aids and on current developments, we consulted publications in the fields of health care, rehabilitation, impairments, perception, speech technology, etc., as well as more popular periodicals. This provided us with insight into the opinions of the handicapped on the usefulness of available aids, into topical ideas and into wanted aids that may possibly be developed in the future. Furthermore, we went right to the source, to the actual world of the handicapped.

We contacted nineteen addresses, comprising institutes, associations, foundations and prominent individual workers within the world of the handicapped (see appendix A). Sixteen of these contacts led to visits during which we discussed with attendants, physicians, advisors, home teachers and instructors (in the following called field workers) the problems of handicapped people in their daily lives, the need for aids and the requirements to be met by such aids. We tried to touch all important aspects of daily life by considering the circumstances of e.g. employment, study, private bookkeeping, communication, mobility, sports and entertainment.

Instead of presenting scrupulous descriptions of the interviews, we have chosen a more general presentation of the topics discussed, including a review of available aids and relevant literature. As we have set out to consider only the needs that might be met by speech synthesis and resynthesis, the results of our survey are confined to communication problems experienced by persons with speech and sight impairments.

The interviews generally left the impression that the field workers knew hardly anything about the development of speech technology and its opportunities for handicapped persons. Because of this they had difficulty taking an active part in the conversation on the application of speech technology. This also


aids. Nevertheless, the discussions did give us useful information on the need for the development of aids and on some general aspects to keep in mind. The main need seems to be to achieve a higher degree of independence and self-sufficiency. This applies in particular to money matters and private correspondence, which the handicapped urgently want to handle themselves without outside interference. Gainful employment is another important aspect of the handicapped person's independence. The impression exists that many occupations could come within reach of the handicapped if only adequate aids were available. These aids should preferably be introduced during the learning phase, so that professional training could make the most of the opportunities which the aid can offer.

With respect to the development of aids, the field workers ask for standardization and compatibility, so that difficulties are not encountered when aids are exchanged between users or another aid is purchased. If an aid is not to disappear, hardly used, into the cupboard, good instructions and training are essential. With respect to the development process itself, the handicapped user should be involved right from the beginning. However, care should be taken that the handicapped are not led to entertain false hopes that an aid under test will quickly become available. They have already had to accept too many disappointments. On the positive side, the field workers state that when a usable aid comes onto the market, more kinds of users show interest than previously expected.

3.3. TALKING SPEECH PROSTHESIS FOR THE SPEECHLESS

Speech technology might provide speech-impaired people with a new voice. According to the interviews, the main goal should be to achieve communication without the use of technical provisions. Persons whose larynx has been removed, for instance, will be trained in oesophageal speech. Usually, a speech prosthesis will only be considered in cases where the patient is unable to master oesophageal speech. For all that, a speech prosthesis could be useful shortly after surgery, during training in oesophageal speech, and in cases where the laryngectomee is not understood by a hard-of-hearing partner.

What devices are available to enable the speechless to convey their thoughts, to express their wants and to answer questions? The most simple aid, which is always and anywhere available, is paper-and-pencil, also cheap and portable. A comparable technical solution is found in miniature typewriters (Canon Communicator) or aids that output the text on a luminous line display instead of paper (Elkomi, Lightwriter). Also, the PTT 'Teksttelefoon' provides distance communication.

The speed of such communication is far below the normal speech rate. Also, the listener always has to cooperate by keeping his eyes on the output medium instead of on the face of the "speaker", and the communication power of facial expression thus goes unused. The "speaker", on the other hand, has to resort to an additional means of drawing attention.

It may be asked in how far speech generation meets these shortcomings of aids with text output. As speech is the fastest and most natural medium for communication, a talking speech prosthesis is expected to be preferred if the control is sufficiently fast as well. A spoken message can be heard by a number of persons at the same time, even by those who are not in the immediate vicinity, or by persons who have turned their attention to someone or something else. In particular, speech is a very effective means of attracting attention, and communication aids with spoken output restore the transmission of messages by telephone without any further need for technical adaptations or special apparatus.

What other requirements have to be met for an acceptable talking speech prosthesis? The messages should be as unrestricted as possible; thus the application of speech synthesis-by-rules is indicated. Since the produced speech will be directed to any casual listener, the quality of the speech has to be good in terms of intelligibility as well as naturalness. However, although speech in itself provides a fast means of communication, the implementation of speech output does not guarantee a speech-like communication speed. The speed of communication is in fact dictated
