August 13, 2015
Written
by Maximus Peperkamp, M.S. Verbal Engineer
Dear Reader,
This writing is my thirteenth response to the paper “Talker-specific learning in
speech perception” by Nygaard and Pisoni (1998). The researchers are unknowingly homing in on
what I call Sound Verbal Behavior (SVB), when they state “The crucial research question
then becomes whether, given experience with the particular aspects of the
speech signal relating to talker identity, it follows that listeners also
become sensitive to talker-specific linguistic properties.” What is said in SVB is more easily understood
because of how it is said, and how the
speaker sounds determines whether what the speaker says makes sense to the
listener. However, the opposite is true of Noxious Verbal Behavior (NVB), in
which “particular aspects of the speech signal relating to talker identity”, the sound of the speaker’s voice, because of its aversive nature, gets more
attention than the “talker-specific linguistic properties.” It requires the listener's effort to relate these “aspects of talker identity” with
“talker-specific linguistic properties”, if the speaker expresses NVB, but if
the speaker has SVB, no such effort is needed.
Unaware of the SVB/NVB
distinction, the researchers interpret various investigations done by others by
stating “Taken together, these practice effects with synthetic and compressed
speech suggest that the speech processing system is capable of adjusting to a variety
of distortions, both synthetic and natural, that occur in the acoustic signal.”
Only in NVB is there, because of the aversive-sounding voice of the speaker, a
need for the listener to adjust “to a variety of distortions”, but in SVB there
is no need to adjust to the speaker’s “acoustic signal” as there is no
aversive stimulation.
In SVB the speaker’s voice only has an appetitive effect
on the listener. Not capturing this affective quality of the speaker’s
vocalization, these authors believe that “Each talker’s vocal style shapes the acoustic
realization of linguistic constituents in different but systematic and
predictable ways.” With knowledge about the SVB/NVB distinction, however, it is clear that SVB speakers
shape in an SVB way and NVB speakers shape in an NVB way. How the
speaker sounds influences the negative or positive emotional experiences of the
listener and indeed shapes “the acoustic realization of linguistic constituents
in different but systematic and predictable ways.”
Tossing out the emotional
influence of how we talk is characteristic of NVB. In SVB, listeners can recognize
this influence, because they can, as speakers, articulate and explore the positive
or the negative emotions, which they experience because of how the speaker
sounds. In other words, in SVB there is feedback from the listener to the
speaker. Rather than the listener making an effort to adjust to the speaker,
the SVB speaker effortlessly adjusts to the listener. Effortless adjustment of
the speaker to the listener occurs because the speaker listens to him or
herself while he or she speaks.
As the speaker listens
to him or herself and experiences
him or herself, he or she estimates how listeners other than the speaker are experiencing him or her. As the speaker listens
to him or herself more often, he or she becomes increasingly accurate in assessing whether
his or her voice is having a positive or a negative effect on the experience of other
listeners. As a consequence, SVB is continued, even
under the worst circumstances. This writer has reached this point and believes that others can reach it too.
What used to be a big problem turned around and made his
life enjoyable. He used to feel very troubled and upset by how others were
sounding, but as he now recognizes NVB as NVB and SVB as SVB, he is able to
withdraw from NVB and approach and attract SVB. The problems of NVB can be avoided only if
we recognize NVB as NVB. For most of us this is not the case since we also don’t
recognize SVB as SVB. Once we discriminate NVB from SVB, NVB will decrease and
SVB will increase. Our ignorance about this distinction prevents SVB and
perpetuates NVB.
These researchers don’t know the SVB/NVB distinction and their reasoning is by default based on NVB. Since they are scientists, they find some pieces of
the puzzle. “Nevertheless, perceptual adaptation to individual talkers’ voices,
as mentioned previously, has traditionally been cast as a problem of
eliminating variation due to individual differences in speakers’ voices from
underlying linguistic constants, rather than as a perceptual learning process
in which listeners become attuned to properties of the speech signal which
subserve both talker identification and linguistic processing.” This “perceptual learning process in which listeners become attuned” describes SVB. Note that the “problem of eliminating variation due to
individual differences in speakers’ voices” is not the speaker’s problem, but
the listener’s problem. It is, of course, also the speaker’s problem, but it doesn't seem that way as long as he or she can get away with forcing others to listen to him or her and not listening to him or herself. The speaker’s problem
becomes the listener’s problem in NVB. When the speaker recognizes his or her
own NVB, he or she changes his or her sound and positively influences both the listener who is
him or herself and other listeners.
We keep going around in circles as long
as we treat speakers and listeners as separate entities. In SVB, in which the
speaker listens to him or herself while he or she speaks, the speaker is the listener. When the listener within the same skin
is not listening to the speaker, the speaker produces NVB. It is not the listener outside of the skin of the speaker who needs to become “attuned to
properties of the speech signal which subserve both talker identification and
linguistic processing,” but the listener within the same skin as the
speaker who needs to become “attuned.”
In the discussion section of the paper,
the authors mention that “Individual listener performance across training
groups ranged from 28% correct for the poorest learner to 97% correct for the
best learner, after 9 days of training. This finding suggests that simple
exposure to the set of voices over the 9-day period was not sufficient for
perceptual learning of talkers’ voices to occur.” Much more is
involved in real life when we, as listeners, adjust to how the speaker sounds.
Generally speaking, someone who has experienced more SVB in his or her
behavioral history adjusts to someone with NVB less easily than someone
with more NVB. However, someone with more NVB in his or her behavioral
history will more easily adjust to someone who has more SVB in his or her
behavioral history. The authors clearly have no clue about this. “Given
that these listeners differed in their ability to learn the voices, it was
possible to characterize some of the listeners as “good”
voice learners and others as “poor” voice learners.”
What counts as a “good” or a
“poor” voice learner depends on whether one reasons from an SVB or an NVB
perspective. From an SVB perspective, the “poor” voice learner, although he or
she may be impaired by the speaker, recognizes NVB as NVB and is therefore
a “good” voice learner. In other words, failure in NVB implies success in SVB. Many things are upside down because of how we talk. Only from an NVB perspective is the person who is distracted by
the NVB speaker called a “poor” voice learner.
Without the SVB/NVB
distinction we are bound to draw many wrong conclusions. ““Good” learners
improved to a much greater extent than did “poor” learners. This divergence
suggests that through practice in categorizing and explicitly identifying voices,
“good” learners become “attuned” to the fine acoustic–phonetic details that
distinguish each talker’s voice.” From an SVB perspective, however, the fact
that “poor” learners improved to a much lesser extent indicates that they were
“attuned” to and upset about, distracted by and therefore negatively affected by the
not-so-fine aversive acoustic-phonetic details of the NVB speaker’s voice. Such
a distraction would never occur with an SVB speaker.
Many people are classified as
“poor” learners simply because they respond with fear, anxiety and stress to an
NVB speaker. ““Poor” learners do not seem to acquire the same kind of
perceptual sensitivity using these voice dimensions during this type of
laboratory training task.” However, the NVB speaker doesn’t acquire
“perceptual sensitivity” to the listener as long as he or she can continue to force others
to listen to him or her. Thus, the conclusion that ““Poor” learners do not seem to acquire
the same kind of perceptual sensitivity” is confounded by poor NVB
speakers. Yet, there is hope, as the authors found that “it appears that both
talker-specific and listener-specific variables contribute to the eventual
identification of a talker’s voice.”
Moreover, the finding
that “Perceptual learning of a set of novel talkers’ voices caused listeners to be
better able to recover the linguistic content of the signal” experimentally
demonstrates that “the perceptual mechanisms responsible for analyzing talker
identity are not independent from the mechanisms responsible for extracting the
lexical content of an utterance from the speech wave form.” Can we finally admit that how the speaker
sounds always affects the listener’s ability to understand the speaker?
As SVB and NVB are universal subclasses of vocal verbal behavior within each language,
the authors inadvertently make indirect references to this distinction. “One explanation
of these results is that the “poor” learners did not receive sufficient training
to “fine tune” or adjust their attentional mechanisms to the relevant
talker-specific information in the signal.” Not receiving “sufficient training to ‘fine tune’” of course means that they didn’t engage in SVB often enough to be able to recognize NVB. Furthermore, the insufficient training also indicates that something is wrong on the
speaker’s side and not on the listener’s side. In NVB, however, it is always the listener
who is blamed for not understanding the speaker. Interestingly, the authors add
“It should be noted that the “poor” learners did not necessarily have
difficulty processing speech from a variety of talkers, but rather, when the
perceptual system was taxed, as when words were presented in noise, they were
unable to utilize their prior knowledge of each talkers’ idiosyncratic style of
speech to help recover the phonetic content and lexical information in the
signal.” They seem to be saying that in NVB, speakers produce some kind of voice-noise,
which taxes the listeners’ perceptual system with their “idiosyncratic style of
speech.”