| Page 2
Using n-gram on kickback, graft, bribe and corruption – Their historical occurrences compared
First of all, let's check our dictionaries. According to the Merriam-Webster dictionary:
A kickback is a "payment made to someone who has (illicitly) facilitated a transaction or appointment."
To graft means to "(actively) make money by shady or dishonest means."
To bribe means to "persuade (someone; the bribe-taker) to act (passively) in one's favor, typically illegally or dishonestly, by a gift of money or other inducement."
And, corruption refers to "dishonest or fraudulent conduct by those in power, typically involving
bribery. Or, the action of making someone or something morally depraved or the state of being so."
OK, understood, pretty straightforward.
But when did all these start getting associated with our lives?
While it may take quite a bit of time to search through historical references for evidence,
we can, with the help of the latest search technology, get a rough idea of
how frequently these words have appeared in books, magazines and journals
(only the ones published in English, I mean) over the last 200 years.
Photo © Lorenzo Colloreta
So let's compare the historical trends in the occurrences of the words
"kickback", "graft", "bribe" and "corruption". This time we will try our luck with
Google Labs' N-gram Database, available at
Using the n-gram database (see Note 1), we are able to compare,
in a very convenient way, the occurrences of individual words, more technically called unigrams or 1-grams,
as they appeared in books, magazines and journals since the early 1800s.
Using a smoothing value of 3 (see Note 2), the relative historical
occurrences of the four words are plotted against time (in
years) in the following graphs.
Figure 1. Historical occurrence of "kickback", from 1800 to 2009
Figure 1 shows the occurrence of the word "kickback". It was clearly not
a common word before 1900. But before I move on, I have to stress
that a 0% occurrence in the n-gram database does not necessarily indicate that
a particular word cannot be found. It simply indicates, according to Google Labs, that
the word has not appeared in at least forty books, magazines or journals (see Note 3)
or, in other words, that the word does not appear to have been used substantially during that
period of time. And, obviously enough, it also does not mean that there was no such
thing as a kickback before 1900. There might have been another (possibly more common)
word or phrase for it back then that we simply have not looked into (or have overlooked).
Figure 2. Comparing historical occurrences of "kickback" (blue) and "graft" (red)
So, the occurrence of the word "kickback" (the blue line) does not appear to be substantial,
at least not until after 1900. In fact, according to Britannica, the word did not even
get officially listed in dictionaries until the period 1930-1935.
This trend is somewhat similar to that of the word "graft", which
saw a significant increase in use after 1900, even though it is in fact a very old word
that has been around since 1350-1400 (see Note 4). It should be emphasized that
the word "graft" (the red line), according to the n-gram statistics, is a much more
popular word than "kickback": you can see that the blue line
lies almost flat throughout Figure 2 when the red line for "graft" is drawn with it.
Figure 3. "kickback" (blue), "graft" (red) and "bribe" (green)
What about "bribe" then? The word "bribe" (the green line), as shown in Figure 3 above,
appears to have been used at a fairly constant rate, with only a mild decrease over
the last 200 years. It is definitely used much more than the word "kickback" (the blue line)
and shows much less fluctuation than "graft" (the red line).
Figure 4. "kickback" (blue), "graft" (red), "bribe" (green) and "corruption" (yellow)
Now, let's add "corruption" to the picture.
Among the four words, the word "corruption" (the yellow line) is the one that
has been used most extensively. It seems to have reached its maximum
as early as the early 1830s, with a local minimum around the 1930s (that's 100 years apart).
Looking at the lines for all four words, one gets the impression that "kickback" is the least
commonly used word. The words "graft" and "corruption" went up together from around 1910 until
they parted in the 1990s, with the word "graft" trending down and
"corruption" going up all by itself. Such a steady increase in the use of the word "corruption"
may be partially due to the fact that the term "anti-corruption" has been adopted by
mainstream journalists over the last 50 years and that many anti-graft agencies around the world
(especially those in Asia) are using the word in their names as well (see Note 5).
The word "bribe", interestingly, seems to have maintained its share quite steadily
for the whole period in this study, i.e. from 1810 to 2009.
So, if we were to pick one for further analysis, which one would you pick?
The word "corruption" appears most often. But if I were to pick the one with the most stable
occurrence over a 200-year period, I would say "bribe" is the candidate to go for.
The words "graft" and "kickback", for obvious reasons, failed to make their way into our semi-final.
Note 1: Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William
Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant,
Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. "Quantitative analysis of culture using millions of
digitized books." Science, published online ahead of print: 2010/12/16, see http://ngrams.googlelabs.com/
Note 2: When data is viewed as a moving average, trends often become more
apparent. A smoothing value of 1 means that the data shown for 1950 will be an
average of the raw count for 1950 plus 1 value on either side: (count for
1949 + count for 1950 + count for 1951), divided by 3. So a smoothing
of 10 means 21 values will be averaged: 10 on either side, plus the target
value in the center of them. At the left and right edges of the graph,
fewer values are averaged. With a smoothing of 3, the leftmost value (pretend
it's the year 1950) will be calculated as (count for 1950 + count for 1951
+ count for 1952 + count for 1953), divided by 4. A smoothing of 0
means no smoothing at all: the raw data is simply presented as is.
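The moving average described in Note 2 can be sketched in a few lines of Python. This is only an illustration of the description above, not the n-gram viewer's actual code: the function name smooth and the sample counts are made up for the example.

```python
def smooth(counts, smoothing):
    """Centered moving average; the window is clipped at both edges.

    counts    -- one raw value per year, in chronological order
    smoothing -- number of neighbours taken on each side (0 = raw data)
    """
    if smoothing < 0:
        raise ValueError("smoothing must be 0 or greater")
    smoothed = []
    for i in range(len(counts)):
        # Clip the window so it never runs past the first or last year.
        start = max(0, i - smoothing)
        end = min(len(counts), i + smoothing + 1)
        window = counts[start:end]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up yearly counts, for illustration only (not real n-gram data).
raw_counts = [4, 8, 6, 2, 10, 12, 0, 6]

no_smoothing = smooth(raw_counts, 0)     # identical to the raw data
three_point = smooth(raw_counts, 1)      # interior years average 3 values
seven_point = smooth(raw_counts, 3)      # leftmost year averages only 4 values
```

With a smoothing of 3, the first entry of seven_point averages only the first four counts, which matches the left-edge case described in the note.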
Note 3: According to Google Labs, the n-gram viewer only considers n-grams
that occurred in at least 40 books, magazines or journals.
Note 4: The word "graft" originated in 1350-1400 as the earlier word "graff".
The Middle English words "graffe" and "craffe" are said to have been borrowed from the Old French words
"graife", "greffe", and "graffe".
Note 5: Anti-corruption agencies established in Asia in or before 1999 having the word "corruption" in their names include:
Corrupt Practices Investigation Bureau, Singapore (1952),
Independent Commission Against Corruption, Hong Kong (1974),
Agency Against Corruption, Taiwan (1989),
Commission Against Corruption, Macao (1999),
Anti Corruption Bureau, India (1988),
Commission to Investigate Allegations of Bribery or Corruption, Sri Lanka (1994),
Anti Corruption Bureau, Brunei (1982),
Anti-Corruption Unit, Cambodia (1999), and
National Anti-Corruption Commission of Thailand (1997).
It is interesting to note that there are (as far as I can tell) only four anti-corruption agencies
in Asia whose names do not include any variant of either the word "bribe" or the word "corruption".
They are (and none of them was established before 1999):
Japan Financial Intelligence Center, Japan (2007),
Commission for the Investigation of Abuse of Authority, Nepal (2007),
National Accountability Bureau, Pakistan (1999),
and the Presidential Anti-Graft Commission of the Philippines (established in 2001, abolished in 2010).