Prof. Dr.-Ing. habil. Hansjörg Mixdorff
Beuth-Hochschule für Technik Berlin
(University of Applied Sciences)
Department of Computer Science and Media (FB Informatik und Medien)
Multimedia Technology (Audio/Video) / Speech Processing Technology
Email: mixdorff@bht-berlin.de


D.Eng. Thesis download page - Prof. Dr. Mixdorff

Welcome! From here you can download a PostScript version of my D.Eng. thesis entitled "Intonation Patterns of German - Quantitative Analysis and Synthesis of F0 Contours".

Summary: In the present study a model of German intonation is presented which elaborates on the early tone switch approach by Isacenko to form a quantitative description of intonational events. Basic elements, "intonemes" and "boundary tones", are defined which characterize an arbitrary F0 contour and whose properties can be described in terms of the Fujisaki model, a physiologically motivated, mathematically formulated model of the generation process of F0. Natural speech data is analyzed to yield typical parameter values for intonemes in a given linguistic context.

Theoretical Part

In Chapter 1 the motivation and aims of the present study are briefly discussed. Chapter 2 explains the term "intonation" as employed in this work and gives an overview of intonation as an interdisciplinary research topic. The production process and some important findings about the role of intonation in speech perception are briefly discussed. The functions of intonation are then introduced. In this context, emphasis is put on the linguistic functions and the terminology commonly adopted. In the last part of this chapter, conventional models of intonation are discussed which aim to establish the relationship between the linguistic contents of an utterance and the F0 contour.

Chapter 3 introduces the Fujisaki model, its mathematical formulation, the model components and the physiological interpretation the model provides of the generation process of F0 contours. Special emphasis is laid on earlier studies applying the model to languages other than German. The last part of this chapter is dedicated to the work of Möbius, who modified the model for his studies on German intonation, and discusses the drawbacks to the descriptive power of the model incurred by Möbius's modifications.
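
As a rough illustration of the mathematical formulation mentioned above, the following minimal Python sketch evaluates the Fujisaki model in its commonly cited form: the logarithm of F0 is the sum of a constant base value ln Fb, phrase components (impulse commands of magnitude Ap at onset T0) and accent components (step commands of amplitude Aa between onset T1 and offset T2). The function names, the default constants alpha, beta and gamma, and all command values in the usage lines are illustrative assumptions; they are not taken from the thesis.

```python
import math

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase control mechanism (0 for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds,
                alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    phrase_cmds: list of (T0, Ap) tuples.
    accent_cmds: list of (T1, T2, Aa) tuples.
    alpha, beta, gamma are assumed illustrative constants.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

# Hypothetical utterance: one phrase command and two accent commands.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.6, 0.4), (1.2, 1.5, 0.3)]
contour = [fujisaki_f0(i * 0.01, 80.0, phrase_cmds, accent_cmds)
           for i in range(250)]  # 2.5 s sampled every 10 ms
```

Working in the log domain means that phrase and accent components simply add, which is why each command can be interpreted separately when a contour is analyzed.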

Chapter 4 discusses the approach for analyzing F0 contours adopted in this work. It is explained how the contours were extracted from the speech data and corrected, and under which constraints word boundaries were marked. Then the modeling procedure for determining the Fujisaki model parameters from the corrected contour is explained, together with the linguistic constraints by which it is guided.

Experimental Part

Chapter 5 discusses a production experiment examining utterances of intonational contrasts. It was designed to investigate and quantify the influence of the linguistic features sentence mode, focal position and focal projection (narrow versus broad focus) on the F0 contour and on the Fujisaki model parameters yielded by the analysis of the contour. The main observations are that sentence mode is mostly marked globally, influencing the F0 contour from the location of the last accent in the utterance until its end. Whereas broad focus is characterized by smaller accent command amplitudes and accent merging, narrow focus implies a reduction of accents other than the focal accent and a boosting of the accent command amplitude.

Chapter 6 discusses a perception experiment based on the results of the preceding chapter. It investigates averaging Fujisaki model parameters over a larger number of utterances of the same sentence to produce F0 contours that can be regarded as representing `prototypal' patterns. In order to examine the perception of sentence mode, center stimuli for the categories `statement', `question' and `non-terminal utterance' were created, as well as intermediate stimuli between these center stimuli. The experiment shows that the center stimuli are unanimously identified as representing the respective categories. Whereas statements and non-terminal utterances are mainly distinguished by the offset time T2 of the accent command assigned to the last accent in the utterance, non-terminal utterances and questions mainly differ in the accent command amplitude Aa assigned to the last accent and in the occurrence of the `question terminal rise'. In the latter part of this chapter basic intonational elements are defined and the way in which their domains in the F0 contour can be delimited is discussed.
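
The parameter averaging and the construction of intermediate stimuli described above can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how the intermediate stimuli were derived, so the linear blending used here is a hypothetical choice, and the helper names and all parameter values are invented for the example.

```python
from statistics import mean

def average_params(param_sets):
    """Average each named parameter over several utterances of one sentence."""
    keys = param_sets[0].keys()
    return {k: mean(p[k] for p in param_sets) for k in keys}

def interpolate_params(a, b, w):
    """Linear blend of two parameter sets (assumed scheme, not from the thesis).

    w = 0.0 returns a, w = 1.0 returns b.
    """
    return {k: (1.0 - w) * a[k] + w * b[k] for k in a}

# Hypothetical final-accent parameters (Aa, T2 in seconds) from two readings.
statement = average_params([{"Aa": 0.20, "T2": 1.00},
                            {"Aa": 0.30, "T2": 1.10}])
non_terminal = average_params([{"Aa": 0.25, "T2": 1.30},
                               {"Aa": 0.25, "T2": 1.40}])

# Intermediate stimulus halfway between the two center stimuli.
halfway = interpolate_params(statement, non_terminal, 0.5)
```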

Chapter 7 deals with the production of complex utterances by a larger number of subjects. The utterances were produced by reading a short, coherent text in the form of a short story. In addition to the Fujisaki model parameters, a number of secondary parameters such as phrase durations, speech rate and pause duration were determined. On the phrase level, for instance, the strength of the left phrase boundary was found to have the most influence on the phrase command magnitude Ap. On the accent level, the position of the accent syllable in the utterance, the word type and the focal condition were the most important factors influencing the accent command amplitude Aa. A higher speech rate reduces the number of phrases as well as Ap and Aa.

In Chapter 8 the experimental results from Chapters 5 through 7 are applied to the development of a synthesis scheme for generating F0 contours for text-to-speech synthesis (TTS). In the introduction to the chapter conventional systems for prosodic control in speech synthesis are reviewed. In the latter part of the chapter a scheme based on the Fujisaki model employing qualitative linguistic rules and quantitative phonetic rules (decision trees) is discussed.

In Chapter 9 the experimental results from Chapter 7, where a group of native speakers of German produced readings of a given text, are compared with the results produced by Japanese subjects reading both the German version and a Japanese translation of the same text. In the first part of the chapter an introduction is given to the possible deviations exhibited by learners of a foreign language. Hypotheses about these deviations in the case of Japanese learners of German are set up based on the contrasts between the two languages on the syntactic, phonetic and intonational levels. Results include, for instance, the observation that the slope of the phrase component in the Japanese speakers is generally steeper and their phrases shorter. Erroneous accentuation of German compound words can be explained by the fact that Japanese compounds are often formed by unaccented words preceding an accentable word which hence bears the compound accent, a situation inverse to German, where the modifier at the beginning of the compound generally bears the compound accent.

Chapter 10 summarizes the results of this work, discusses problems connected with the approach developed, and compares it with other models of intonation. Short- and long-term objectives are formulated, such as the problem of semi-automatic Fujisaki model parameter extraction for speeding up the analysis of larger corpora. Furthermore, examples of possible future applications of the approach to discourse or dialectal studies are given. The appendices contain details on the speech corpora used and the synthesis rules developed.

Last modified: 24-July-2006
URL of this page: http://public.beuth-hochschule.de/~mixdorff/thesis/index.html

