How to pronounce GIF
Possible spoiler: I don’t have an easy answer for you.
I've been piecing together a post about AI development and why claims that it's "higher order abstraction" or "natural language programming" are… well, unconvincing.
This is not that post. But while drafting the other one, I realized it was inevitable I would go off on a tangent about this subject. To spare you (and me), I figured I could just make it its own post.
The debate
Technically, "GIF" is an acronym for Graphics Interchange Format, a specific image encoding format.
Originally – and we're talking very originally here – "GIF" was pronounced with a soft g (/dʒ/) as in "giraffe." This was GIF team lead Steve Wilhite's preference. But the word "graphics" has a hard g (/ɡ/) sound – so a lot of folks use /ɡ/.
This debate has raged for years. I hoped we would eventually drop it as the image format became obsolete. Unfortunately, the term "GIF" has now entered popular use to mean a small, looping video typically used for memes or reaction images in chat. Many things people call "gifs" aren't actually GIFs; they are PNG, WebP, AVIF, or even very short MPEG-4 videos. "GIF" now has meaning beyond its technical purpose, so we will be debating pronunciation for the foreseeable future.
What no one seems to ask is: why do we have two different pronunciations ("phonemes") for the letter "g" in the first place? Why use g for /dʒ/ and not j, which is almost always /dʒ/?
Tale of Two Gs
Language is not an engineered construct; words and sounds change over time and geography. The above justifications for different pronunciations are not necessarily rational decisions. I suspect they are rationalizations: explanations invented after the fact for a choice we didn't make intentionally. This may even be the case where Wilhite is concerned; he felt that GIF should be pronounced with /dʒ/, then justified that to himself later.
If you go looking for words with the "gi" construction in English, you'll find that pronunciation seems random:
- Giant
- Girl
- Giraffe
- Ginger
- Gift
The difference in pronunciation between "girl" and "giraffe" is particularly interesting because both begin with gir. There is a pattern, however.
English is an odd bird of a language because it's strongly influenced by two separate root languages. Old English – a language we would not recognize today, even written down – was a Germanic language.
When England was conquered by the Duke of Normandy (a.k.a. "William the Conqueror"), Norman French became the language of the educated elite and of business. French is a Romance language, which means its root is Latin, and Latin was the language of the Church in England.
Under this two-pronged, heavy-handed cultural influence, somewhere between 70% and 80% of Old English words fell out of use, often replaced by French equivalents.
This matters when we talk about the phonemes of g because, before vowels like e and i, the pronunciation of g drifted from /ɡ/ to /dʒ/ in the Romance languages but not in the Germanic ones. "Girl" and "gift" are Germanic holdovers; the soft-g words came to us through Romance languages.
What this means for "GIF" is this: both /ɡ/ and /dʒ/ feel natural to English speakers when followed by the letter "i", and it's something of a roll of the dice which one a given speaker will settle on first.
But Wilhite said…
Few people care.
Most of us don't know where the words we use come from. I had to look up the etymology of gift, girl, giraffe, ginger, and giant myself, and those words came through so many transformations you can't put your finger on who the "inventor" was. I suspect most people who use the word "gif" today don't know (or care) it refers to a very specific way of encoding images, much less what Wilhite's preferences were. Language just does not work that way.
Both pronunciations are broadly acceptable, although apparently /ɡ/ is somewhat more common. Many words have different pronunciations based on cultural identity, geography, or even snobbery. Attempts to enforce a specific pronunciation are rooted not in fact but in a desire to dominate and control other people's language, an impulse we see (for example) in correcting Southern Americans' pronunciation of "nuclear" ("nook-you-ler") or the American pronunciation and spelling of "aluminum."
For my part, I tend to use the two interchangeably, adopting the pronunciation of the group I am with. This is called "accommodation," and it arises from a desire or need to fit in with one's peer group. My subconscious tendency to do this with many words (like "aunt") likely comes from my childhood growing up on military bases surrounded by people who spoke with many accents and dialects; it's a behavior that's not uncommon among military brats.
The "divergence" approach – intentionally maintaining a pronunciation separate from your peers – can be seen either as an attempt to maintain an identity that feels threatened or as an aggressive attempt to dominate (depending on the context).
It has nothing to do with which pronunciation is correct. It has a lot more to do with your own background and identity.
What this has to do with AI programming
We seem to have strayed pretty far from the idea of natural language programming. I mean, I told you it was a tangent. I'm working on a blog post that pulls together several different threads, this being just one of them. Woven together, I think they make the argument that natural language programming is impossible and arguably not even desirable.
Until then, however, you can reflect on this:
Programming is an activity that requires extreme detail and relies on complete – almost mathematical – clarity. An argument about how to pronounce a single word involves two millennia of history, a dedicated academic discipline, and forays into sociology and philosophy.
What does that say about the suitability of natural language for this task?