Essay Topics © 2020

The Analysis of Speech Perception

Speech perception is the ability to comprehend speech through listening. Listeners are constantly bombarded by acoustic energy, and the challenge is to translate this energy into meaningful information. Speech perception does not depend on extracting simple, invariant acoustic patterns from the speech waveform. The acoustic pattern of a speech sound is complex and varies greatly, because it depends on the sounds that precede and follow it (Moore, 1997). According to Fant (1973), speech perception is a process of both successive and concurrent identification at a series of progressively more abstract levels of linguistic structure.
             
              Nature of Speech Sounds
             
Phonemes are the smallest units of speech sound that can distinguish one word from another. In any language, words are formed by combining phonemes. English has approximately 40 phonemes, which are defined in terms of what is perceived rather than in terms of acoustic patterns. Phonemes are abstract, subjective entities that are often specified in terms of how they are produced. Alone they have no meaning, but in combination they form words (Moore, 1997).

Speech sounds are divided into vowels and consonants. Consonants are produced by constricting the vocal tract at some point along its length, and they are classified according to the degree and nature of the constriction into stops, affricates, fricatives, nasals, and approximants. Vowels are usually voiced and are relatively stable over time (Moore, 1997).
             
              Categorical Perception
             
Categorical perception means that stimuli are identified definitely as members of one category or another. The central finding is that listeners can discriminate speech sounds only to the extent that they identify them as different phonemes. A small change in the acoustic signal may make little difference to how a sound is perceived, yet another change of equal size may produce a distinct change that alters the phoneme's identity. Listeners do not hear changes within a single phoneme category; they detect only changes from one phoneme to another (Lobacz, 1984).
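
The claim that listeners discriminate speech sounds only to the extent that they label them differently can be made concrete with a small simulation. The sketch below is illustrative only: the logistic identification curve, the 7-step continuum, and the boundary and slope values are hypothetical. The discrimination prediction assumes an ABX task in which listeners compare only their covert phoneme labels and guess whenever A and B receive the same label; under that assumption the predicted proportion correct is 0.5 * (1 + (pA - pB)^2), where pA and pB are the probabilities of giving stimulus A and stimulus B the second label.

```python
import math


def identify(step, boundary=4.5, slope=3.0):
    """Hypothetical identification function: probability that the stimulus at
    `step` on a continuum is labeled as the second phoneme category."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))


def predicted_abx(p_a, p_b):
    """Predicted ABX proportion correct if listeners compare only covert phoneme
    labels and guess (50%) whenever A and B receive the same label."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def discrimination_curve(steps, boundary=4.5, slope=3.0):
    """Predicted discrimination for each pair of adjacent continuum steps."""
    probs = [identify(s, boundary, slope) for s in steps]
    return [
        (steps[i], steps[i + 1], predicted_abx(probs[i], probs[i + 1]))
        for i in range(len(steps) - 1)
    ]


if __name__ == "__main__":
    steps = list(range(1, 8))  # a hypothetical 7-step continuum
    for a, b, pc in discrimination_curve(steps):
        print(f"steps {a}-{b}: predicted ABX correct = {pc:.2f}")
```

Running it gives predicted discrimination close to chance (0.5) for pairs that fall inside one category and a clear peak for the pair that straddles the category boundary, which is the pattern described above.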
             
Although categorical perception is generally considered to reflect the operation of a special speech decoder, there is strong evidence that it can also occur for non-speech signals. Musicians provide an example: their discrimination was better for frequency changes that altered the identity of a chord than for changes that did not (Moore, 1997). Categorical perception is therefore not unique to speech, although it is found more often with speech than with non-speech signals.
             
There are three possible explanations for categorical perception. The first is that differences between consonants and vowels in the degree of categorical perception reflect differences in how well their acoustic patterns can be retained in auditory memory. Consonants are lower in intensity than vowels, fluctuate more rapidly, and last for a shorter time, so their acoustic patterns decay rapidly from memory and listeners must rely mainly on the phoneme labels they have assigned. The second explanation is that the boundaries separating one speech sound from another tend to lie at points where discrimination is optimal. The third explanation is that categorical perception comes from experience with one's own language: a listener learns to attend to acoustic differences that affect the meaning of a word and to ignore differences that do not. Categorical perception is the natural consequence (Moore, 1997).
             
              Brain Specialization
             
Language functions are unilaterally represented in one of the two cerebral hemispheres, most commonly the left. Because the main neural pathways from each ear cross to the opposite hemisphere, the right ear, which projects chiefly to the left hemisphere, identifies speech stimuli better than the left ear (Studdert-Kennedy and Shankweiler, 1970). Interestingly, the left ear detects melodies better than the right ear. Evidence from people with brain lesions also shows that speech is decoded more readily in the left hemisphere than in the right, and that the left hemisphere plays a primary role in speech perception (Moore, 1997).
             
              Speech Mode
             
Speech mode refers to the special way in which listeners perceive phonemes that have been restructured, or encoded, in the acoustic signal. If phonemes are encoded syllabically, they must be recovered in perception by a suitable decoder. Liberman (1996) stated that the perception of encoded phonemes may be expected to differ both from the perception of phonemes that have not been encoded and from the perception of non-speech. For example, the transition cues for /d/ in /di/ and /du/ sound like whistles when taken out of speech context; they sound neither like speech nor like each other. The same is true of the transition cues for many other phonemes. With simplified speech of this kind, the listener's perception depends greatly on whether the listener is in speech mode. It has been found that stimuli with spectral and temporal properties similar to those of speech are learned more readily than simplified stimuli, provided that the listener identifies the speech-like stimuli as speech. Speech mode is characterized by processes different from those underlying the perception of other sounds. This view is strengthened by findings that speech and non-speech sounds are processed primarily in different cerebral hemispheres (Liberman, 1996). According to Moore (1997), speech mode is unusual in that it operates for an entire class of highly complex and varied acoustic signals whose common feature is that they are produced by a human vocal tract.
             
              Cue Trading
             
Because several cues may signal a single phonetic contrast, it can be shown that when the perceptual usefulness of one cue is reduced, another cue may become the principal signal of the contrast under scrutiny, since the cues are perceptually equivalent. This is called a phonetic trading relation (Luce and Pisoni, 1986). In natural speech, almost every phonetic contrast is cued by numerous distinct acoustic properties of the speech signal. According to Moore (1997), a change in the value of one cue that would alter the phonetic percept can be offset by an opposing change in another cue, so that the original percept is maintained. This is referred to as cue trading or phonetic trading. Cue trading generally occurs with speech stimuli, but one should not assume that trading relations never occur for non-speech stimuli: evidence shows that they can be found for stimuli that have some speech-like properties but are not actually perceived as speech. The fact that trading relations differ depending on whether stimuli are perceived as speech or as non-speech provides strong support for the concept of a speech mode of perception (Moore, 1997).
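
A trading relation can be illustrated with a deliberately simple decision rule in which two cues add up to a single amount of evidence for one category. The rule, the cue values, the weights, the criterion, and the "voiced"/"voiceless" labels below are all hypothetical, chosen only for illustration; real listeners need not combine cues linearly. The point is only that, when two cues feed the same decision, a change in one cue that would flip the percept can be cancelled by an opposing change in the other.

```python
def perceive(cue_a, cue_b, w_a=1.0, w_b=2.0, criterion=50.0):
    """Hypothetical linear rule: the two cues are summed (with weights) into one
    amount of evidence, which is compared with a fixed criterion."""
    evidence = w_a * cue_a + w_b * cue_b
    return "voiceless" if evidence > criterion else "voiced"


def trading_offset(delta_a, w_a=1.0, w_b=2.0):
    """Change in cue B that exactly cancels a change of `delta_a` in cue A,
    leaving the weighted evidence (and hence the percept) unchanged."""
    return -delta_a * w_a / w_b


if __name__ == "__main__":
    baseline = perceive(20.0, 10.0)  # evidence 20 + 2*10 = 40: below criterion
    shifted = perceive(40.0, 10.0)   # cue A raised by 20: evidence 60
    traded = perceive(40.0, 10.0 + trading_offset(20.0))  # cue B lowered to compensate
    print(baseline, shifted, traded)
```

Raising cue A alone changes the label, while raising cue A and lowering cue B by the computed offset restores the original label, which is the offsetting change described above.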
             
              Audiovisual Integration
             
Speech perception does not depend solely on what we hear; other senses, particularly sight, play a major role. For example, when observers are presented acoustically with /ba/ but see a face saying /de/, they often perceive /da/, a syllable that combines the consonant they saw with the vowel they heard. This percept is typically experienced as slightly imperfect compared with the normal case, in which the acoustic and optical stimuli agree, but observers cannot tell what the imperfection is. They are not able to say that it arises because they heard one thing and saw something else being said. This phenomenon is known as the McGurk effect. It provides strong evidence that two different kinds of physical information are equivalent in phonetic perception. Since the acoustic and optical stimuli both provide information about the same phonetic gesture, and it is the gesture that is perceived, the McGurk effect is exactly what one would expect (Liberman, 1996).
             
The movements of a speaker's face and lips can therefore strongly influence the perception of speech. Audiovisual integration also occurs for non-speech sounds; for example, sound localization is often influenced by vision (Moore, 1997).
             
              Models of Speech Perception
             
There are many models of speech perception, and no single model is generally accepted. Three influential models are discussed here: the motor theory, the cue-based approach, and the TRACE model.
             
              Motor Theory
             
In the motor theory, the objects of speech perception are the intended phonetic gestures of the speaker. According to Liberman (1996), "they are represented in the brain as motor commands that call for movements of the articulators through certain linguistically significant configurations." The listener perceives the articulatory gesture that the speaker intends to make when producing a word or utterance. In the motor theory, speech perception and speech production are closely linked and innately specified. The model accounts for many characteristics of speech perception. However, it does not specify how the signal is translated into the perceived gesture, which leaves it incomplete (Liberman, 1996). The motor theory is "motor" in two ways. First, it takes the proper object of phonetic perception to be a motor event. Second, it assumes that adaptations of the motor system for controlling the organs of the vocal tract took precedence in the evolution of speech (Liberman and Mattingly, 1985).
             
              Cue-Based Approach
             
In the cue-based approach, processing takes place in a sequence of stages. First, the speech signal is analyzed in the peripheral auditory system. Next, acoustic property detectors, including onset detectors, spectral change detectors, formant frequency detectors, and periodicity detectors, compute relational attributes of the signal. Then an array of phonetic feature detectors examines the set of auditory property values over a chunk of time and decides whether a particular phonetic feature, such as nasality, is present. All of these decisions are language specific. On this view, it should be possible to find a relatively uniform mapping between acoustic patterns and perceived speech, as long as the acoustic patterns are analyzed in appropriate ways (Stevens, 1986).
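
The sequence of stages described above can be sketched as a small pipeline. Everything concrete in it is an assumption made for illustration: the `Frame` fields, the two-band spectral summary, the 15 dB threshold, and the rule that a nasal murmur is periodic and dominated by low-frequency energy. It is not Stevens's (1986) model, only a sketch of the general idea that property detectors compute relational values frame by frame and a feature detector then makes a decision over a chunk of time. Because the decisions are language specific, the threshold is left as a parameter rather than fixed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    """One short-time frame as it might leave a (hypothetical) peripheral stage."""
    energy_db: float      # overall level of the frame
    low_band_db: float    # level in an assumed low-frequency band
    high_band_db: float   # level in an assumed higher-frequency band
    periodic: bool        # output of an assumed periodicity detector


def property_detectors(frames: List[Frame]) -> List[Dict[str, float]]:
    """Acoustic property detectors: compute relational attributes per frame."""
    props = []
    previous = None
    for frame in frames:
        # Onset detector: change in level relative to the previous frame.
        onset = 0.0 if previous is None else frame.energy_db - previous.energy_db
        props.append({
            "onset_db": onset,
            # Spectral balance: low-band level relative to high-band level.
            "low_minus_high_db": frame.low_band_db - frame.high_band_db,
            "periodic": 1.0 if frame.periodic else 0.0,
        })
        previous = frame
    return props


def nasality_detector(props, start, length, balance_threshold_db=15.0):
    """Hypothetical phonetic feature detector: examine a chunk of frames and
    decide 'nasal' only if every frame is periodic and strongly low-weighted."""
    chunk = props[start:start + length]
    return bool(chunk) and all(
        p["periodic"] == 1.0 and p["low_minus_high_db"] >= balance_threshold_db
        for p in chunk
    )


if __name__ == "__main__":
    vowel = [Frame(70.0, 65.0, 60.0, True)] * 3   # invented vowel-like frames
    murmur = [Frame(60.0, 58.0, 35.0, True)] * 3  # invented murmur-like frames
    props = property_detectors(vowel + murmur)
    print("vowel chunk nasal?", nasality_detector(props, 0, 3))
    print("murmur chunk nasal?", nasality_detector(props, 3, 3))
```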
             
              TRACE Model
             
The TRACE model consists of a large number of units organized into three levels: the feature, phoneme, and word levels. Each level contains highly interconnected processing units called nodes. TRACE accounts for several aspects of human speech perception. Like humans, it uses information from overlapping portions of the speech wave to identify successive phonemes, and its tendency toward categorical perception is affected by many of the same parameters that affect the degree of categorical perception shown by humans (Elman and McClelland, 1986). TRACE is a connectionist model, based on neural networks. At the lowest level the nodes represent phonetic features, at the second level they represent phonetic segments, and at the highest level they represent words. When a node's activation reaches a particular level, the node fires, indicating that a feature, phoneme, or word is present (Moore, 1997).
             
At the feature level, there are banks of detectors for each dimension of speech sounds, and each bank is replicated for several successive moments in time. At the word level there is a detector for every word, and these detectors are also replicated across time slices, with units whose centers are adjacent spanning overlapping ranges of slices (Elman and McClelland, 1986).
             
When a node fires, activation is passed to the nodes connected to it. Excitatory links between nodes at different levels can cause a node at the next level to fire. There are also inhibitory links between nodes within the same level, which allow highly activated nodes to inhibit competing nodes with less activity; as a result, one node takes all the activity. Activation does not flow only from the feature detectors up to the word level: excitatory activation flows in both directions, which allows information gathered at the word level to influence phonetic identification (Moore, 1997).
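
The interaction of between-level excitation, within-level inhibition, and top-down feedback can be illustrated with a very small interactive-activation network. This is a toy, not the TRACE model itself: it has only phoneme and word levels, no feature level and no replication across time slices, and its two-word lexicon, input values, and parameters are invented for illustration (letters stand in for phonemes). The input is ambiguous between /g/ and /k/ in the first position of "?ift"; because only "gift" matches the rest of the input, feedback from the word level should push the phoneme decision toward /g/.

```python
from typing import Dict, List, Tuple

Phoneme = Tuple[int, str]  # (position in the word, phoneme symbol)

# Hypothetical two-word lexicon; letters stand in for phonemes.
LEXICON = ["gift", "kiss"]


def simulate(bottom_up: Dict[Phoneme, float], lexicon: List[str] = LEXICON,
             steps: int = 40, rate: float = 0.2, decay: float = 0.1,
             up: float = 0.2, down: float = 0.2,
             inhibit_phoneme: float = 0.6, inhibit_word: float = 0.3):
    # One phoneme unit per (position, symbol) that appears in the input or lexicon.
    phonemes = set(bottom_up)
    for word in lexicon:
        phonemes.update(enumerate(word))
    phonemes = sorted(phonemes)
    act = {unit: 0.0 for unit in phonemes + lexicon}

    def update(a: float, net: float) -> float:
        # Interactive-activation style update that keeps activation within [0, 1].
        delta = net * (1.0 - a) if net > 0 else net * a
        return a + rate * (delta - decay * a)

    for _ in range(steps):
        new = {}
        for pos, sym in phonemes:
            net = bottom_up.get((pos, sym), 0.0)
            # Top-down excitation from words that have this phoneme at this position.
            net += down * sum(act[w] for w in lexicon if pos < len(w) and w[pos] == sym)
            # Within-level inhibition from competing phonemes at the same position.
            net -= inhibit_phoneme * sum(
                act[(p, s)] for p, s in phonemes if p == pos and s != sym)
            new[(pos, sym)] = update(act[(pos, sym)], net)
        for w in lexicon:
            # Bottom-up excitation from the word's phonemes, minus rival-word inhibition.
            net = up * sum(act[(pos, sym)] for pos, sym in enumerate(w))
            net -= inhibit_word * sum(act[o] for o in lexicon if o != w)
            new[w] = update(act[w], net)
        act = new  # synchronous update: every unit used the previous step's values
    return act


if __name__ == "__main__":
    # Ambiguous first phoneme (/g/ or /k/) followed by clear /i/, /f/, /t/.
    ambiguous = {(0, "g"): 0.4, (0, "k"): 0.4, (1, "i"): 0.8, (2, "f"): 0.8, (3, "t"): 0.8}
    final = simulate(ambiguous)
    print("g:", round(final[(0, "g")], 3), "k:", round(final[(0, "k")], 3))
    print("gift:", round(final["gift"], 3), "kiss:", round(final["kiss"], 3))
```

Although /g/ and /k/ receive identical bottom-up input, /g/ ends up more active, because the word unit "gift" collects more support than "kiss" and feeds activation back down; within-level inhibition then amplifies the difference.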
             
Like humans, TRACE cannot identify a word until it has heard part of the next word. It can, however, better determine where a word begins when it is preceded by a word rather than a non-word. Although the model is influenced by word beginnings, it can recover from underspecification or distortion of a word's beginning. It can also use the activations of phoneme units in one part of the network to adjust the connection strengths that determine which feature activates which phoneme. The model is called TRACE because the pattern of activation left by a speech input is a trace of the analysis of that input at each of the levels (Elman and McClelland, 1986).
             
              Resistance of Speech to Corrupting Influences
             
One factor that can greatly affect speech perception is background noise. For satisfactory communication, the signal-to-noise ratio should be about +6 dB. When the ratio falls below this, speech perception drops sharply. Moore (1997) stated that at a signal-to-noise ratio of 0 dB, word articulation scores are about 50%.
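
Signal-to-noise ratios in decibels compare the power (or RMS level) of the speech with that of the noise: SNR = 10 log10(Ps / Pn) = 20 log10(As / An), where P is power and A is RMS amplitude. A ratio of +6 dB therefore means the speech power is about four times the noise power (10^0.6 is about 3.98), and 0 dB means the two are equal. The sketch below computes this and rescales a noise signal to a chosen SNR; the pure tone standing in for speech, the Gaussian noise, the sampling rate, and the function names are assumptions for illustration.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, computed from RMS amplitudes."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def scale_noise_to_snr(signal, noise, target_db):
    """Return a copy of `noise` rescaled so that snr_db(signal, result) == target_db."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (target_db / 20.0))
    return [gain * n for n in noise]


if __name__ == "__main__":
    rate = 16000  # assumed sampling rate in Hz
    # A 500 Hz tone stands in for speech; Gaussian noise stands in for background noise.
    speech = [math.sin(2 * math.pi * 500 * n / rate) for n in range(rate)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]
    for target in (6.0, 0.0):
        scaled = scale_noise_to_snr(speech, noise, target)
        print(f"target {target:+.0f} dB -> measured {snr_db(speech, scaled):+.2f} dB")
```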
             
A second factor that may affect speech perception is a change in the frequency spectrum. Many transmission systems pass only a limited range of frequencies. Because the information carried by the speech wave is not confined to any particular frequency range, such a system may leave out part of that information.
             
A third factor is peak clipping. If an amplifier is overloaded, the peaks of the waveform may be flattened off, so that part of the speech signal is lost. This degrades the quality and naturalness of speech but does not greatly affect its intelligibility (Moore, 1997).
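
Peak clipping is simple to express: any sample whose magnitude exceeds a threshold is replaced by the threshold value with the same sign. The sketch below uses a pure tone, an assumed sampling rate, and an arbitrary threshold for illustration. One property it makes visible is that clipping flattens the peaks but leaves the sign of every sample, and therefore the pattern of zero crossings, unchanged, so the timing of the waveform's oscillations survives even though its shape is distorted.

```python
import math


def peak_clip(samples, threshold):
    """Symmetric peak clipping, as in an overloaded amplifier: values beyond
    +/- threshold are flattened to +/- threshold; smaller values pass unchanged."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [max(-threshold, min(threshold, x)) for x in samples]


def zero_crossings(samples):
    """Count sign changes between successive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))


if __name__ == "__main__":
    rate = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
    clipped = peak_clip(tone, 0.3)
    changed = sum(1 for x, y in zip(tone, clipped) if x != y)
    print("samples altered:", changed, "of", len(tone))
    print("zero crossings before/after:", zero_crossings(tone), zero_crossings(clipped))
```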
             
              Conclusion
             
When discussing speech perception, one is seldom concerned with the perception of speech alone; the real concern is with essential aspects of language. Speech is a complex stimulus that varies in both frequency and time. A basic problem in the study of speech perception is to relate the properties of the speech wave to specific linguistic units. A second problem is to find cues in the acoustic waveform that clearly indicate a particular linguistic unit. Often a phoneme is identified correctly only if information from the surrounding syllable or word is used. Speech is perceived and processed differently from non-speech stimuli, in what is called the speech mode. Speech intelligibility is relatively unaffected by severe distortions of the signal, and speech is an effective method of communication that remains reliable under difficult conditions (Moore, 1997).