full-fledged pidgin


Exercise 1: imagined syllables

Posted in lsa317, phonetics by clunis on July 12th, 2007

blow

  • [l] duration: 85.517ms
  • [ə] duration: n/a

below

  • [l] duration: 74.833ms
  • [ə] duration: 26.088ms

Making ‘below’ sound like ‘blow’ is a fairly simple matter of removing pulses from the [l] until my percept changes. Amazingly (to me), removing the [l] completely results in a perfectly normal-sounding production of ‘blow’. Removing 4 pulses (for an [l] duration of 46.441ms) results in a stimulus that begins to sound convincingly like ‘blow’, but removing a 5th pulse (for an [l] duration of 37.03ms, essentially halving the original segment), without modifying the [ə] in any way, results in a stimulus that my suitemates and I consistently hear as ‘blow’.
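As a quick check on the “essentially halving” remark, this short Python sketch expresses each shortened [l] as a fraction of the original 74.833ms [l] in ‘below’. It uses only the durations reported above; the variable names are my own and purely illustrative.

```python
# [l] durations (ms) from the 'below' token, as measured above.
original_l = 74.833
after_4_pulses = 46.441   # begins to sound like 'blow'
after_5_pulses = 37.03    # consistently heard as 'blow'

for label, dur in [("4 pulses removed", after_4_pulses),
                   ("5 pulses removed", after_5_pulses)]:
    # Fraction of the original [l] that remains after editing.
    print(f"{label}: {dur:.3f} ms = {dur / original_l:.1%} of original")
```

The 5-pulse version keeps just under half of the original [l], which matches the description in the text.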

From the experience with ‘below’, it stands to reason that extending the [l] in ‘blow’ by ~15 milliseconds (bringing it up to the original combined duration of [ə] + [l] in ‘below’) should force the percept to change. And, in fact, my perception of the stimulus switched over from ‘blow’ to ‘below’ when I added the pulse that extended the duration from 94.098ms to 103.486ms. My listeners experienced the same perceptual change. I resisted the urge to set up a same/different task with the various stimuli, but I’d be really interested to see the results.
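The ~15ms figure comes from simple subtraction. This Python sketch (names are illustrative only) recomputes the target duration from the measurements listed above, and also shows that the combined [ə] + [l] duration of ‘below’ falls inside the single pulse whose addition flipped the percept.

```python
# Durations in milliseconds, as measured above.
BLOW_L = 85.517
BELOW_L = 74.833
BELOW_SCHWA = 26.088

# Target: the combined [ə] + [l] duration of the original 'below'.
target = BELOW_SCHWA + BELOW_L
extension = target - BLOW_L

# The pulse whose addition flipped 'blow' to 'below' (ms before and after).
before_flip, after_flip = 94.098, 103.486
pulse_length = after_flip - before_flip

print(f"target [ə]+[l]:   {target:.3f} ms")
print(f"extension needed: {extension:.3f} ms")
print(f"added pulse:      {pulse_length:.3f} ms")
print("target inside flipping pulse:", before_flip < target < after_flip)
```

This is only arithmetic on the reported values; it doesn’t say anything about where exactly a listener’s boundary lies within that pulse.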

driver

  • [ɹ] duration: 42.350ms
  • [ə] duration: n/a

deriver

  • [ɹ] duration: 84.487ms
  • [ə] duration: 0ms

My listeners heard ‘deriver’ as ‘driver’ even before I’d made any changes (or measurements!). I suspect that this is due to (i) the fact that I played ‘deriver’ first, (ii) the fact that I didn’t let them see the words written out at any point, and (iii) the vastly mismatched distributional frequency of these tokens: for example, there are 200,000,000 Google hits for ‘driver’ versus 785,000 for ‘deriver’, and many of the latter appear to be puns on the similarity of ‘deriver’ to ‘driver’ (e.g. deriver’s ed, deriver’s license, and backseat deriver). As such, I predicted, prior to measuring, that the modifications required to turn ‘deriver’ into ‘driver’ would be minimal and that the changes required to convert ‘driver’ into ‘deriver’ would be correspondingly huge.
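To put a number on “vastly mismatched”, this Python sketch turns the two Google hit counts quoted above into a ratio and a gap in orders of magnitude. Raw hit counts are a rough proxy for word frequency at best; the variable names are illustrative.

```python
import math

# Google hit counts quoted above.
hits_driver = 200_000_000
hits_deriver = 785_000

ratio = hits_driver / hits_deriver
# Difference in orders of magnitude (log10 scale).
log_gap = math.log10(hits_driver) - math.log10(hits_deriver)

print(f"'driver' has ~{ratio:.0f}x as many hits "
      f"({log_gap:.2f} orders of magnitude)")
```

So ‘driver’ turns up roughly 250 times as often as ‘deriver’, before even discounting the puns.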

If there is a [ə] or schwa-like reduced vowel in this recording of ‘deriver’, I can’t find it; F3 descends almost immediately after the release burst of the [d]. There is clearly some additional component present in the first 3 pulses of the [ɹ] that lowers the overall amplitude of the waveform, but I’m hard-pressed to describe it as distinct from the rest of the consonant. Still, I was quite surprised to find that the salient difference between these two tokens seems to be simply that the [ɹ] segment in ‘deriver’ is twice as long as the [ɹ] segment in ‘driver’ (84.487ms versus 42.350ms). This suggests to me that simply halving the segment in ‘deriver’ or doubling it in ‘driver’ should change the percept of those tokens (to the extent that they were perceived differently to begin with).
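The halving/doubling prediction can be written out explicitly. This Python sketch (illustrative names, durations from the lists above) computes the ratio of the two original [ɹ] segments and the durations the prediction implies for each switch.

```python
# [ɹ] durations (ms) from the two original tokens.
driver_r = 42.350
deriver_r = 84.487

ratio = deriver_r / driver_r
predicted_deriver_to_driver = deriver_r / 2   # halve 'deriver'
predicted_driver_to_deriver = driver_r * 2    # double 'driver'

print(f"deriver/driver [ɹ] ratio: {ratio:.3f}")
print(f"halved 'deriver' [ɹ]:  {predicted_deriver_to_driver:.3f} ms")
print(f"doubled 'driver' [ɹ]:  {predicted_driver_to_deriver:.3f} ms")
```

Under this naive prediction each word would switch at roughly the other word’s original [ɹ] duration (about 42ms and 85ms).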

In practice, though, ‘deriver’ became ‘driver’ for me when I’d shortened the [ɹ]-like segment to 65.734ms. By contrast, ‘driver’ did not switch over to ‘deriver’ for me until I’d increased the [ɹ]’s duration to 113.099ms. I note that this is in line with my original expectations (‘deriver’ was harder to manufacture than ‘driver’), but not especially out of line with my revised predictions after measuring. The [ɹ] duration required to turn ‘driver’ into ‘deriver’ is almost twice the duration required to turn ‘deriver’ into ‘driver’: the values aren’t what I’d anticipated, but the ratio isn’t terribly far off from that of the original tokens.
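To compare the observed switch points with the original tokens, this Python sketch (illustrative names; all values from the measurements above) computes both ratios, plus how far each token had to move from its own starting duration.

```python
# Original [ɹ] durations (ms).
driver_r, deriver_r = 42.350, 84.487
# [ɹ] durations (ms) at which my percept switched.
deriver_became_driver = 65.734   # shortened 'deriver'
driver_became_deriver = 113.099  # lengthened 'driver'

original_ratio = deriver_r / driver_r
threshold_ratio = driver_became_deriver / deriver_became_driver

print(f"original token ratio: {original_ratio:.2f}")
print(f"switch-point ratio:   {threshold_ratio:.2f}")
print(f"'deriver' kept {deriver_became_driver / deriver_r:.0%} of its [ɹ]")
print(f"'driver' needed {driver_became_deriver / driver_r:.2f}x its [ɹ]")
```

The switch-point ratio comes out at about 1.72 against roughly 2 for the original tokens. The lengthened ‘driver’ also needed an [ɹ] longer than the one in the original ‘deriver’ (113.099ms versus 84.487ms), which fits the asymmetry I expected going in.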

lightning

  • [n] duration: 73.825ms
  • closure: 70.441ms

lightening

  • [n] duration: 96.651ms
  • closure: 58.000ms

Taken together, the stop closure + nasal durations for ‘lightning’ and ‘lightening’ are quite similar (144.266ms total for ‘lightning’ and 154.651ms for ‘lightening’). There appears to be more voicing in the closure for ‘lightening’ than for ‘lightning’, and I initially suspected that this might be what is perceived as the reduced vowel in the extra syllable of that word. I decided to test this hypothesis by adding 10ms of silence to the stop closure in ‘lightning’.
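The totals above are just nasal + closure sums. This Python sketch recomputes them, taking the [n] in ‘lightning’ as 73.825ms (the value consistent with the 144.266ms total). It also shows that the gap between the two words is about 10ms, close to the amount of silence I added. Names are illustrative.

```python
# Durations (ms): (nasal [n], stop closure)
lightning = (73.825, 70.441)
lightening = (96.651, 58.000)

total_lightning = sum(lightning)
total_lightening = sum(lightening)
gap = total_lightening - total_lightning

print(f"lightning:  {total_lightning:.3f} ms")
print(f"lightening: {total_lightening:.3f} ms")
print(f"difference: {gap:.3f} ms")
```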

In fact, adding roughly ten milliseconds of silence to the closure (for 81.512ms of closure) does change my percept of ‘lightning’ to ‘lightening’ when played in an A/B comparison, and a slightly longer closure of 83.625ms also changed it for my listeners. Editing ‘lightening’ down to ‘lightning’ should be similarly straightforward, but I found removing a selection from this aperiodic section somewhat more difficult than removing an [l] or an [ɹ]. I was eventually able to remove 12ms of closure, though, and the percept of ‘lightening’ (both for me and for my listeners) changed to ‘lightning’.
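One way to read these results is in terms of the combined closure + nasal duration. This Python sketch (illustrative names; it assumes only the closure was edited and the [n] was left untouched, taking the ‘lightning’ [n] as 73.825ms) computes the combined duration of each edited stimulus. Each switched stimulus crossed the other word’s original total. Three data points can’t tell us whether that is a real cue or a coincidence.

```python
# Nasal [n] durations (ms), assumed unchanged during editing.
N_LIGHTNING = 73.825
N_LIGHTENING = 96.651
# Original combined closure + [n] durations (ms).
TOTAL_LIGHTNING = N_LIGHTNING + 70.441
TOTAL_LIGHTENING = N_LIGHTENING + 58.000

# Combined durations (ms) of the edited stimuli that switched percept.
edits = {
    "lightning + silence (me)": N_LIGHTNING + 81.512,
    "lightning + silence (listeners)": N_LIGHTNING + 83.625,
    "lightening - 12 ms closure": N_LIGHTENING + (58.000 - 12),
}

for label, total in edits.items():
    print(f"{label}: {total:.3f} ms "
          f"(originals: {TOTAL_LIGHTNING:.3f} / {TOTAL_LIGHTENING:.3f})")
```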

discussion

I’m going to have to go back and take another look at Steve Parker’s dissertation, “Quantifying the sonority hierarchy”, because this exercise dramatically reduces the (already very low) credulity with which I’m willing to entertain such concepts as “sonority” and “syllables”. These experiments do make it clear that something like the syllable is perceptually real. Since we can change one percept into another simply by inserting an intervening segment of the appropriate duration (in one case, nothing but silence!), our minds must use some combination of acoustic content and duration when making perceptual distinctions. What surprises me is the apparent importance of duration when so much phonological theory (and work in speech recognition and synthesis) emphasizes the primacy of the waveform.

I wonder if it would be possible to come up with minimal-pair-type experiments like these that would test the salience of acoustic versus articulatory cues for perceptual distinctions. I can’t think of a minimal pair, though, that might allow me to contrast the two.
