Tuesday, December 6, 2022

Which filler words commonly occur in spoken English?

That’s a curious question. I started thinking about it after finding a Wikipedia page titled Most common words in English, which contains a list titled 100 Most Common Words [in Written English]. That list is shown above as a table. Ah is not on it, but So is at #41.

I looked around and eventually found a book with an answer. It came out in 2001, was written by Geoffrey Leech, Paul Rayson, and Andrew Wilson (all from Lancaster University), and is titled Word Frequencies in Written and Spoken English: Based on the British National Corpus. That corpus has 100 million words, 90% of them written, which leaves about 10 million spoken words. Starting on page 144 of the book is List 2.2 – Rank frequency list: spoken English (not lemmatized).

The first and second fifty words from that list are shown above as a pair of bar charts. Two pure filler words (highlighted in red) appear in the Top Fifty: Er at #17 and Erm at #27. Well and So, which might be fillers too (highlighted in green), are #32 and #33. In the Second Fifty, Say, which also might be a filler word, is at #83. But Ah isn’t there at all, which may disappoint Toastmasters whose club meetings have an Ah-Counter. That role is discussed by Kate McClare in her article Counting on the Ah-Counter in the June 2021 Toastmaster magazine.

How do the lists for spoken and written English compare? As shown above via a pair of tables for the Top 20, they are not the same: only eleven of the spoken words are also on the written list.
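
If you want to try this kind of comparison on your own text, here is a minimal Python sketch. It is not how the book's authors built their lists, and the two sample sentences are made up for illustration rather than taken from the British National Corpus. The function names rank_frequency and shared_top_words are my own hypothetical helpers. The sketch counts each word form as it appears (not lemmatized), ranks the words by frequency, and then reports which words from one Top N also appear in the other Top N.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return word forms ordered from most to least frequent (not lemmatized)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered]


def shared_top_words(ranks_a, ranks_b, n):
    """Return the words from the top n of ranks_a that are also in the top n of ranks_b."""
    top_b = set(ranks_b[:n])
    return [word for word in ranks_a[:n] if word in top_b]


# Made-up samples, NOT corpus data: one speech-like, one prose-like.
spoken_sample = "er well I think er the thing is erm you know the thing is so"
written_sample = (
    "the report shows that the results of the study are clear "
    "and that the method is sound"
)

spoken_ranks = rank_frequency(spoken_sample)
written_ranks = rank_frequency(written_sample)
shared = shared_top_words(spoken_ranks, written_ranks, 5)

print("Spoken top 5: ", spoken_ranks[:5])
print("Written top 5:", written_ranks[:5])
print("On both lists:", shared)
```

With a real transcript and a real written sample in place of the two strings, raising n to 20 would give the same kind of Top 20 overlap count described above.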

 

Corpus is a rather obscure word. The first definition in the Merriam-Webster Dictionary is:

“the body of a human or animal especially when dead”

 

The third is:

“a collection or body of knowledge or evidence - Especially: a collection of recorded utterances used as a basis for the descriptive analysis of a language”

 

Is that a corpus or a porpoise?

 

