Well, there has certainly been a lot of publicity for the laurel/yanny clip recently. It is great to have so many people discussing speech and speech perception – but also a little disheartening that so much misinformation gets accepted as valid phonetics.
For those who don’t know (where have you been?!?), it is all about an audio clip of a man saying the word ‘laurel’ (listen below) – and the fact that a surprising number of people claim to hear it not as ‘laurel’ but as ‘yanny’.
The original audio came from vocabulary.com, a site aiming to help people learn the meaning and pronunciation of lots of words. Here’s the relevant clip. Notice that it was pronounced clearly (by an opera singer, no less) in isolation, and recorded under good conditions.
Let’s get a few of the misconceptions out of the way, before going on to consider some possible explanations and discuss some broader issues, especially how (if at all) this example relates to forensic transcription.
- This is not an ambiguous audio clip. The word ‘laurel’ is not likely to be consistently confused with ‘yanny’ even in a noisy recording – and this is a nice clear recording. In its original form on vocabulary.com, it is quite clearly ‘laurel’ (the many unclear versions floating around the internet have been created by audio geeks trying to explain the phenomenon by manipulating the audio – see more on this below).
- Differences in how people hear this clip do not and could not relate to general interpersonal variation in habitual ‘style of hearing’. Basically someone who genuinely and consistently confused words like ‘laurel’ and ‘yanny’ would be unable to speak English, since the differences between these words are the very ones that allow us to differentiate hundreds and thousands of other common words.
- Differences in how people hear this clip do not and could not relate to age-related hearing loss. Age-related hearing loss is real and important – but it’s not at play here.
  - For one thing, all the relevant acoustic information in this word is well below 5 kHz (see spectrogram below), which is way lower than the frequencies normally affected by age-related hearing loss.
  - For another thing, even if it were true (which it isn’t) that young people were generally better able than older folks to hear the frequencies of ‘y’-like sounds (like the beginning of ‘yanny’), the age-related-hearing-loss hypothesis requires us to believe that young people are less able to hear ‘l/r’-like frequencies, which is kind of absurd.
Here’s a spectrogram of the recording in the video above. Notice how nice and clear it is. Notice the very definite /l/ at the beginning, with a nice release burst before the vowel. Notice the dramatic dip in the third formant, classic for /r/. Notice the first and second formants are close together throughout, expected for /l/ and /ɔ/.
So what happened to make this nice clear ‘laurel’ into ‘yanny’?
(As gleaned from various websites – see links below).
A high-school student in Georgia, USA (Katie Hetzel), doing her homework, was listening to words from her lesson on vocabulary.com. When she clicked to hear the next word on the list (so she could define it), she was surprised to hear ‘yanny’, which was not part of her homework. Then she was more surprised to find the word was supposed to say ‘laurel’. At this point I have no definite explanation for why she initially heard ‘yanny’ (but see below for a conjecture).
She then asked her classmates what they heard. They all said they heard ‘laurel’ – except one who agreed with Katie that it sounded like ‘yanny’. For fun they put the clip on social media to ask more friends. Then someone put a ‘vote laurel or yanny’ poll over the top. Next thing – with the help of a YouTube ‘influencer’ named Cloe Feldman – the poll is going viral. Hear Cloe’s account of the whole thing here and here (Katie’s account is between 5 and 9 mins in the second video – excerpted for you in the clip below).
In no time, votes were coming in showing approx 40-45% voting for each of ‘laurel’ and ‘yanny’, with the remainder saying they alternated (at will or spontaneously) between the two.
Various explanations appeared, most attributing the differences to different listeners focusing on different frequencies in the signal.
Soon a variety of different versions of the audio appeared, manipulating different frequencies of the audio clip – of course all of these degraded and distorted the quality of the original clip. The New York Times put up a slider that allowed listeners to move through various different versions, progressively more and more different from the original.
Explaining the illusion
Laurel/yanny is not a normal phonetic confusion. There’s no way an unbiased perception test would yield anything close to 50% of participants consistently hearing ‘yanny’. However, many responsible people report hearing ‘yanny’ some of the time, usually sporadically. I confess it has happened to me a few times too. It is a weird sensation. How can we explain it? I don’t know for sure but here are some thoughts.
It is notable that the third formant in the spectrogram (the horizontal black bar that starts around 2720 Hz, then dips down and rises again) has a very roughly similar shape to that of the second formant of a word like ‘yanny’.
If, for whatever reason, a listener is led to interpret the third formant as the second, and the close first and second formants as the first, they might, in the right context, be led to an illusion of hearing something a bit like ‘yanny’. Maybe this is what happened to Katie when she first heard the word, and to the rest of us subsequently, influenced by her suggestion. Or maybe there’s a better explanation (if you have a suggestion, please let me know).
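To make the conjecture concrete, here is a minimal sketch in Python of the relabelling it describes. Everything in it is an assumption for illustration: the formant values are made up (only the F3 starting point of about 2720 Hz comes from the spectrogram description above), the name `reinterpret` is hypothetical, and collapsing F1 and F2 to their midpoint is just one crude way to model hearing two close formants as one.

```python
# Hypothetical formant tracks in Hz, sampled at four points across the word.
# These numbers are illustrative only, not measured from the clip; the one
# value taken from the text is the F3 starting point of roughly 2720 Hz.
LAUREL_TRACKS = {
    "F1": [400, 500, 450, 400],
    "F2": [900, 1000, 1100, 950],
    "F3": [2720, 1800, 1700, 2600],
}


def reinterpret(tracks):
    """Relabel formants the way the conjecture describes.

    The close F1 and F2 are collapsed into one perceived F1 (crudely, their
    midpoint, which is an assumption), and F3 is heard as F2.
    """
    merged_f1 = [(f1 + f2) / 2 for f1, f2 in zip(tracks["F1"], tracks["F2"])]
    return {"F1": merged_f1, "F2": list(tracks["F3"])}


perceived = reinterpret(LAUREL_TRACKS)
for label, values in perceived.items():
    print(label, values)
```

In this toy version the perceived second formant keeps the dip-and-rise shape of the original third formant, which is the resemblance the conjecture relies on. It says nothing about whether listeners actually do this.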
Of course, to be confident of this or any explanation, we’d need to be able to replicate it properly — not by distorting the sound, like the audio geeks do, but by consistently and predictably manipulating people’s perception of clear recordings of this and similar words.
Explaining the social phenomenon
Whatever the explanation for the perceptual illusion, it is unusual and sporadic, not a consistent ambiguity. The cues for ‘laurel’ are really much stronger than those for ‘yanny’. Repeated close listening should, under normal conditions, soon lead to recognition that ‘laurel’ is the right interpretation (even in the absence of external information about the origin of the clip).
Explaining the current phenomenon — a near 50/50 split in interpretations — involves some perceptual reasons but mostly psycho-social reasons, such as team spirit, peer pressure, confusion, reluctance to change from a first impression, desire to get on Ellen (see Cloe and Katie’s account above, from about 7min, and Ellen video below, from start), and so on.
What does all this have to do with forensic transcription?
Some of the phenomena that are being discussed along with laurel/yanny have heaps to do with forensic transcription. Here’s one example from Ellen:
The way a sound can go from meaningless static to a clear phrase under the influence of a prime is exactly what happens with lots of indistinct covert recordings, like the one that got a man convicted of murder based on words he didn’t say.
And the fact that priming seems so amazing and unfathomable to so many people is exactly what causes so many problems when police are allowed to present their transcripts of covert recordings to a court.
To solve those problems we really do need to get to a point where priming is considered a normal, non-amazing occurrence that is understood, at least in general terms, by everyone.
But this laurel/yanny thing is quite different – as discussed a bit above – not least because it is a very clear recording. It’s a shame these different phenomena are all being rolled together into one big incomprehensible ‘thing’.
‘There’s no right answer – everyone can just interpret it the way they want to’
That’s a fine philosophy in lots of situations – but not when it comes to interpreting forensic audio (as I discuss also in relation to the Randy Newman example).
With forensic audio, we need to distinguish reliably between ‘audio given the right interpretation’, ‘audio given the wrong interpretation’, ‘audio whose interpretation I personally don’t know for sure’ and ‘audio whose interpretation, in principle, no one can know for sure’. Of course, only the first of these should be put before a jury.
If the laurel/yanny example teaches us anything about forensic transcription, it is to warn us about the dangers of letting interpretation of forensic audio be a matter for social negotiation, and about the need not to simply accept a ‘first impression’.
Some of the more useful links (roughly in date order)
and see also links embedded above.
The twitter thread has a lot of profanity (try viewing in reverse order, as it gets worse as days pass) but the odd glimpse of real humour. Among other gems, I have to admit I laughed at Donald Trump saying he heard covfefe.