rule 34?
apparently they make an ultrasound specifically for checking to see if your pig is pregnant. what I can’t decide is whether this falls under rule 34 or whether I’m meant to be genuinely surprised.
I feel compelled.
I have to complete an ethics quiz to be allowed to do my research, but I don’t think this says what IRB means it to say: “Only qualified scientists must conduct research.”
Now of course I’m pretty sure they intend this to mean that research must only be conducted by qualified research scientists — there are to be no amateurs with clipboards in the lab. What it actually says, though, is that only those individuals who are qualified scientists are compelled to conduct research. Carpenters, plumbers, lawyers, and poets are free to conduct research at their discretion.
So in the elitist, pedantic spirit of the original rule I propose my own: Only qualified linguists may compose sentences. It’s every bit as full of shit as the original, but at least I got the modal right.
seeking voice talent
I’m currently looking for someone who might be interested in working with me on a speech synthesis research project. The gig would involve some time in the recording studio recording transcripts intended to elicit as-nearly-complete-as-possible a selection of English speech sounds in different contexts, different emotional conditions, and at different paces. Speaker should be comfortable reading copy into a mic (acting experience/training a huge plus), interested in learning something about how his/her vocal tract works, and willing to participate in a number of articulatory measurements (think tubes, electrodes, ultrasound wand placed submentally (beneath your jaw)) for surprisingly little compensation. :)
There would be two sessions:
- Session 1: high-quality sound recording as described above.
- Session 2: recording of same transcripts in a phonetics laboratory with various instrumentation installed and attached.
When we’re done you get an open-source speech synthesizer that talks in your voice. Please comment or e-mail me if you’re interested (or know someone who might be).
Exercise 4: the only applicable answer
(I’m finishing these projects out of order: WaveSurfer took me longer to deal with than Praat or Audacity, so the two WaveSurfer posts have languished on the largely-completed-but-unpublished stack for a while now. Exercise 5 is below. Anyone who might be reading this and who is not, in fact, one of the professors for LSA 317: Experimental Phonology should probably move along. I’m not going to provide enough context for this to make sense, sorry.)
I can definitely see that WaveSurfer would be an incredibly useful application if it worked well. As it is, though, I’ve found it wildly frustrating (even on a Mac).
how I measure speech sounds.
Every time I measure speech sounds for some project or to answer some question I find myself wishing that the LSA web site (or some similar organization — ASA?) had a few pages devoted to the nuts and bolts of measuring speech sounds: what trade-offs people make, how to make sure you’re being consistent, etc. Then Keith Johnson, I think jokingly, suggested we, the students in his LSA 07 Experimental Phonology class, should post on our blogs about how we take measurements with illustrative screenshots. I’m pretty sure he was kidding, so I’ve decided to do it.
Disclaimer: I believe that I’m following what Pam Beddor taught me, but any terrible decisions or ridiculous errors are entirely my own.
Exercise 5: Cue Trading In Speech Perception
I’m posting this exercise out of order because the other two are in states of not-yet-quite-completion. Maybe when they’re done and posted I’ll figure out how to re-sort the main page.
A. Guinea pigging
toons --> twos
| %twos | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| twos | 1 | 2 | 2 | 2 | 2 | 2 | | | | | | | | | | |
| toons | 1 | 2 | 2 | 2 | 2 | 2 | 2 | | | | | | | | | |
rhinoglottophilia
this is just a reminder to myself to investigate more about this when I have time. How would a direct realist (or even a motor theorist) respond to a sound change where V becomes nasalized after a voiceless fricative (in the absence of any nasal consonant) due to rhinoglottophilia? Does L&M talk about any such changes?
Come to think of it, the development of tones from voiced/voiceless stops presents a very similar problem to these perception theories.
Exercise 3: Hampster Dance
(this is the last of the LSA 317 exercises I’ll be posting here. the post date/time should reflect when I started working on this exercise, but I didn’t actually end up finishing it until late night on the last night of the institute.)
Exercise 2: Cutting up a damp skunk
Question 1
What does “skunk” sound like with /s/ gone? How do you account for this
transformation?
/skʌŋk/ with the /s/ removed now sounds like [gʌŋk]. This is because the so-called `voiced’ velar plosive in (at least American) English is actually a voiceless, unaspirated velar plosive with a lower fundamental frequency.
Question 2
What does “damp” sound like with the /s/ in front of it? How do you account for
this transformation?
/s/ + /dæmp/ now sounds fairly convincingly like [stæmp] (convincingly enough to fool my linguist roommates). I suspect two possible explanations for this phenomenon:
- English phonotactics prevent the listener from hearing /d/ even though it is acoustically distinct from /t/.
- The expected /d/ and /t/ sounds are, as described above, more similar than they are dissimilar and the listener uses context to distinguish the two.
These are surprisingly difficult to tease apart with the data I have available. One approach might be just to collect a few hundred /d/ and /t/ productions in various contexts, measure them, and then quantify the similarity or dissimilarity of the two sounds for the speaker. As it is, though, I don’t even have one example of a /t/ for this speaker. Another approach might be to record a speaker saying the minimal pair /tæmp/ and /dæmp/ (in a frame like ‘say /tæmp/ again’ and with some distractor words so he doesn’t artificially enunciate them). Again, though, I feel I should be able to make this distinction with the available data.
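The first approach could be sketched roughly as follows. This assumes the measurement of interest is something like voice onset time and that a standardized mean difference (Cohen's d) is an acceptable stand-in for "similarity"; both choices are assumptions made for illustration, and the numbers in the usage lines are invented placeholders, not data from this speaker.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized difference between two samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Placeholder values in ms, invented purely to show the call.
hypothetical_t = [20.0, 25.0, 30.0]
hypothetical_d = [10.0, 15.0, 20.0]
print(cohens_d(hypothetical_t, hypothetical_d))
```

A value near zero would mean the two categories overlap heavily on that measurement for this speaker; a large value would mean they are acoustically well separated, which would favor the phonotactic explanation.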
A third approach might be to move some other sound in front of the /dæmp/ to see if another sound (e.g. /æ/) also causes the /d/ to sound like a [t]. I did this and it does not, for me, make /d/ sound like [t], but this also isn’t a real word so perhaps I’m not listening to it as speech.
I think there’s a clue to this puzzle in the fact that moving the /s/ too close to the /d/ (anywhere from 0 to about 40ms in my informal testing) fails to cause the changed perception: /d/ still sounds like /d/. This /d/ does not differ from a /t/ in terms of voicing and the solution isn’t one or the other of my two stated possibilities but the unification of the two. This perceptual shift is the same as /kʌŋk/ → /gʌŋk/ in question 1. With the context removed (or, in this case, added) we hear the /k/ as a /g/ because, for all intents and purposes, that context is what’s different between the two sounds.
Question 3
Swap the stops. What do the words sound like now? How do you account for these
transformations? Remember, you did nothing to interchange the nasals in these words.
I suspect that it is now supposed to sound like [stæŋk'gʌmp], but I get something more like [stæŋk'gʌŋmp]. /stæmk/ definitely sounds like [stæŋk] to me, but I can still clearly hear what sounds like an /ŋ/ before the [mp] in `gump’. This clearly, I think, is interference from English phonotactics. There’s nothing /ŋ/-like about this /m/ and there’s nothing /m/-like about this /ŋ/. Incidentally, if I remove one of the pulses from the nasal at the end of `gump’ the /ŋ/ disappears and all I can hear is the illusory /m/. This is incredibly cool.
Question 4
Is it still “damp”?
No, but now it is [stæŋk] (I think that logically follows).
Question 5
When you delete the final burst of “damp” what does the word sound like? Where
did the velar nasal go?
Now it sounds just like /dæmp/ with an unreleased /p/ (I can’t get the upper corner/unreleased diacritic to work). Since there are no longer any signs of the /kʰ/ release burst there is no phonotactic motivation to perceive the /m/ as an engma.
Question 6
Can you get it to sound like “damn”? What does it sound like? Why shouldn’t this
procedure succeed in getting “damn”?
No, I can’t get it to sound like `damn’. Three (related) reasons: (1) `damn’ has a longer /m/ consonant in it than `damp’ does. (2) There is too much coarticulatory information from the /p/ closure on this /m/. The way to get rid of it would be to shorten the consonant (see also reason #1). (3) One might expect that we could just chop off the end of the existing /m/, extend the remaining consonant by a dozen or so milliseconds, and have `damn’. Unfortunately, there’s also a lower pitch in the /m/ of `damn’ than in the /m/ of `damp’. I believe this is related to the duration of the segment: in the shorter /m/ there’s just no time to set up a standing wave in the oral cavity, so the /m/ in `damp’ is nasal but not as bilabial as the /m/ in `damn’.
Question 7
Can you get it to sound like “gun”? What does it sound like? Why shouldn’t this
procedure succeed in getting “gun”?
No, I can’t get this to sound like “gun” either. I think the explanation is the same: the portion of the vocal tract resonating for the /ŋ/ in `skunk’ is less complex than in the longer /n/ of `gun’.
Exercise 1: imagined syllables
blow
- [l] duration: 85.517ms
- [ə] duration: n/a
below
- [l] duration: 74.833ms
- [ə] duration: 26.088ms
Making `below’ sound like `blow’ is a fairly simple matter of removing pulses from the [l] until my percept changes. Amazingly (to me), removing the [l] completely results in a perfectly normal-sounding production of `blow’. Removing 4 pulses (for an [l] duration of 46.441ms) results in a stimulus that begins to sound convincingly like `blow’, but removing a 5th pulse (for an [l] duration of 37.03ms, essentially halving the original segment), without modifying the [ə] in any way, results in a stimulus that I and my suitemates consistently hear as `blow’.
From the experience with `below’ it stands to reason that extending the [l] in `blow’ by ~15 milliseconds (bringing it up to the original duration of [ə] + [l] in `below’) will force the percept to change. And, in fact, I found that my perception of the stimulus “switched-over” from `blow’ to `below’ when I added the pulse that extended the duration from 94.098ms to 103.486ms. My listeners experienced the same perceptual change. I resisted the urge to set up a same/different task with the various stimuli, but I’d be really interested to see the results.
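The size of that extension follows from the measurements listed above. A minimal sketch of the arithmetic (the variable names are just labels for this example, not anything from Praat or WaveSurfer):

```python
# Durations in milliseconds, copied from the measurements above.
blow_l = 85.517
below_l = 74.833
below_schwa = 26.088

# Combined [ə] + [l] duration in `below'.
below_total = below_schwa + below_l

# How much the [l] in `blow' has to grow to match that total.
extension_needed = below_total - blow_l

print(f"[ə] + [l] in below: {below_total:.3f} ms")
print(f"extension needed:   {extension_needed:.3f} ms")

# The percept switched between these two edited [l] durations.
switch_low, switch_high = 94.098, 103.486
print(switch_low < below_total < switch_high)
```

The combined duration comes out at about 100.9 ms, which sits inside the 94.098 to 103.486 ms interval where the percept actually flipped.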
driver
- [ɹ] duration: 42.350ms
- [ə] duration: n/a
deriver
- [ɹ] duration: 84.487ms
- [ə] duration: 0ms
My listeners heard `deriver’ as `driver’ even before I’d made any changes (or measurements!). I suspect that this is due to (a) the fact that I played `deriver’ first (2) I didn’t let them see the words written out at any point and (iii) the vastly mismatched distributional frequency between these tokens (for example, 200,000,000 google hits for `driver’ versus 785,000 for `deriver’ (and many of the latter appear to be puns on the similarity of `deriver’ to `driver’ — e.g. deriver’s ed and deriver’s license and backseat deriver)). As such, I predicted, prior to measuring, that the modifications required to turn `deriver’ into `driver’ will be minimal and the required changes to convert `driver’ to `deriver’ will be correspondingly huge.
If there is a [ə] or schwa-like reduced vowel in this recording of `deriver’ I can’t find it; F3 descends almost immediately after the release burst of the [d]. There is clearly some additional component present in the first 3 pulses of the [ɹ] that lowers the overall amplitude of the waveform, but I’m hard-pressed to describe this as distinct from the rest of the consonant. Still, I was quite surprised to find that the salient difference between these two tokens seems to be simply that the [ɹ] segment in `deriver’ is twice as long as the [ɹ] segment in `driver’. This suggests to me that simply halving the segment in `deriver’ or doubling it in `driver’ should change the percept of those tokens (to the extent that they were differently-perceived to begin with).
In practice, though, `deriver’ became `driver’ for me when I’d shortened the [ɹ]-like segment to 65.734ms. `driver’, by contrast, did not switch over to `deriver’ for me until I’d increased the [ɹ]’s duration to 113.099ms. I note that this is in line with my original expectations (`deriver’ was harder to manufacture than `driver’), and not especially out of line with my revised predictions after measuring. The duration required to turn `driver’ into `deriver’ is almost twice the duration required to turn `deriver’ into `driver’; the values aren’t what I’d anticipated, but the ratio isn’t terribly far off from the original tokens.
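The two ratios being compared here can be worked out from the durations given above. A small sketch (the variable names are made up for this example):

```python
# Durations in milliseconds, from the measurements and edits described above.
driver_r_original = 42.350
deriver_r_original = 84.487
deriver_to_driver_at = 65.734   # [ɹ] shortened until `deriver' became `driver'
driver_to_deriver_at = 113.099  # [ɹ] lengthened until `driver' became `deriver'

# Ratio between the [ɹ] durations in the unedited tokens.
original_ratio = deriver_r_original / driver_r_original

# Ratio between the two switch-over durations.
switch_ratio = driver_to_deriver_at / deriver_to_driver_at

print(f"original tokens ratio: {original_ratio:.2f}")
print(f"switch-over ratio:     {switch_ratio:.2f}")
```

This gives roughly 1.99 for the original tokens and roughly 1.72 for the switch-over points, so the second ratio is somewhat smaller than the first but in the same neighborhood.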
lightning
- [n] duration: 73.825ms
- closure: 70.441ms
lightening
- [n] duration: 96.651ms
- closure: 58.000ms
Taken together, the stop closure + nasal durations for `lightning’ and `lightening’ are quite similar (144.266ms total for `lightning’ and 154.651ms for `lightening’). There appears to be more voicing in the closure for `lightening’ than for `lightning’ and I initially suspected that this might be what is perceived as the reduced vowel in the extra syllable of that word. I decided to test this hypothesis by adding 10ms of silence to the stop closure in `lightning’.
In fact, adding ten milliseconds of silence to the closure (for 81.512ms of closure) does change my percept of `lightning’ to `lightening’ when played in an A/B comparison and a slightly longer closure of 83.625ms also changed it for my listeners. Editing `lightening’ down to `lightning’ should be similarly straightforward, but I found removing a selection from this aperiodic section somewhat more difficult than removing an [l] or an [ɹ]. I was eventually able to remove 12ms of closure, though, and the percept of `lightening’ (both for me and for my listeners) changed to `lightning’.
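The closure + nasal totals, and the roughly 10ms gap between them that motivated the added silence, can be reproduced from the listed measurements. A short sketch, taking the [n] duration in `lightning’ to be 73.825ms (an assumption: it is the value implied by the 144.266ms total quoted in the text):

```python
# Durations in milliseconds. The `lightning' [n] value of 73.825 is assumed;
# it is the value implied by the 144.266 ms total quoted in the text.
lightning = {"n": 73.825, "closure": 70.441}
lightening = {"n": 96.651, "closure": 58.000}

def total(token):
    """Stop closure plus nasal duration for one token."""
    return token["closure"] + token["n"]

print(f"lightning:  {total(lightning):.3f} ms")
print(f"lightening: {total(lightening):.3f} ms")
print(f"difference: {total(lightening) - total(lightning):.3f} ms")
```

The difference works out to a little over 10ms, which is about the amount of silence added to the `lightning’ closure.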
discussion
I’m going to have to go back and take another look at Steve Parker’s dissertation “Quantifying the sonority hierarchy” because this exercise dramatically reduces the (already very low) credulity with which I’m willing to entertain such concepts as “sonority” and “syllables”. It’s clearly the case from these experiments that something like the syllable is perceptually real. Since we can change one percept to another simply by inserting an intervening segment of the appropriate duration (one of which was padded with silence!), it must be the case that our minds use some combination of acoustic content and duration when making perceptual distinctions. What surprises me is the apparent importance of duration when so much phonological theory (and work in speech recognition and synthesis) emphasizes the primacy of the waveform.
I wonder if it would be possible to come up with minimal-pair type experiments like this that would make it possible to test the salience of acoustic versus articulatory cues for perceptual distinctions. I can’t think of a minimal pair, though, that might allow me to contrast these.