Gini index and Zipf law in online communities

#online community #network
  1. Gini index, Lorenz curve, and Zipf’s law

Figure cited from Wikipedia.

The Gini coefficient (G) is defined from the Lorenz curve L(F), which plots the share of total income (y axis) cumulatively earned by the bottom fraction F of the population (x axis, 0 < F < 1). Let A be the area between the 45-degree line of perfect equality and the Lorenz curve, and let B be the area under the Lorenz curve. Then G = A / (A + B). Since A + B = 0.5 (the area of the triangle under the 45-degree line), G = 1 - 2B. In theory, G ranges from 0 (complete equality) to 1 (complete inequality).
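The definition G = 1 - 2B translates directly into code: sort the values, build the empirical Lorenz curve, and estimate B with the trapezoid rule. The following is a minimal Python sketch (the post itself contains no code), assuming non-negative values with a positive total. The function names are illustrative.

```python
def lorenz_curve(values):
    """Empirical Lorenz curve as a list of (F, L) points, starting at (0, 0).

    Values are sorted increasingly, so L is the share of the total held by
    the bottom fraction F of the population. Assumes a positive total.
    """
    xs = sorted(values)
    n, total = len(xs), float(sum(xs))
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(values):
    """G = 1 - 2B, where B (area under the Lorenz curve) uses the trapezoid rule."""
    pts = lorenz_curve(values)
    area_b = sum((f1 - f0) * (l0 + l1) / 2.0
                 for (f0, l0), (f1, l1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area_b


print(gini([1, 1, 1, 1]))   # complete equality
print(gini([1, 2, 3, 4]))
print(gini([0, 0, 0, 10]))  # one person holds everything
```

With n values, the largest attainable G is 1 - 1/n (one person holds everything). The theoretical bound of 1 is therefore approached only as the population grows.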

Readers familiar with Zipf’s law will immediately connect it with the Lorenz curve. As Adamic points out, Zipf’s law is essentially the inverse of the Pareto distribution, which is widely observed in income distributions. Below we show how to relate the exponent of Zipf’s law (beta) to G.

As mentioned, we have

$$G = 1 - 2B = 1 - 2\int_0^1 L(F)\,dF. \qquad (1)$$

Meanwhile, the cumulative distribution function (CDF) of the Pareto distribution is

$$F(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}, \quad x \ge x_m, \qquad (2)$$

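in which the left side is the fraction F of the population earning less than x, the right side depends on the income x, x_m > 0 is the minimum income, and alpha is the shape parameter. Inverting the Pareto CDF, we get

$$x(F) = x_m\,(1 - F)^{-1/\alpha}, \qquad (3)$$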

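where the left side (y axis) is the income x and the right side (x axis) is the population fraction F, obtained by sorting the population in increasing order of income. The Lorenz curve, which plots the cumulative share of income against the bottom fraction F of the population, can then be expressed as

$$L(F) = \frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'} = 1 - (1 - F)^{1 - 1/\alpha}, \quad \alpha > 1. \qquad (4)$$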

The figure below shows Eq.(4) for four values of alpha. The figure labels the shape parameter “k”; read it as alpha.

Figure cited from Wikipedia: Lorenz curves of the Pareto distribution for four values of alpha.
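Eq.(4) can be checked numerically by integrating the quantile function of Eq.(3) from 0 to F and dividing by the mean income x_m * alpha / (alpha - 1). This sketch assumes alpha > 1 so that the mean is finite, and uses x_m = 1 by default because the scale cancels. The midpoint rule and step count are arbitrary choices, not part of the original derivation.

```python
def pareto_quantile(F, alpha, x_m=1.0):
    """Eq.(3): income at population fraction F, with the population sorted increasingly."""
    return x_m * (1.0 - F) ** (-1.0 / alpha)


def lorenz_closed_form(F, alpha):
    """Eq.(4): Lorenz curve of a Pareto distribution; needs alpha > 1."""
    return 1.0 - (1.0 - F) ** (1.0 - 1.0 / alpha)


def lorenz_numeric(F, alpha, x_m=1.0, steps=20_000):
    """Share of total income held by the bottom fraction F, by integrating Eq.(3).

    The numerator uses the midpoint rule on [0, F]; the denominator is the
    exact mean x_m * alpha / (alpha - 1), i.e. the integral of Eq.(3) over [0, 1].
    """
    h = F / steps
    partial = h * sum(pareto_quantile((i + 0.5) * h, alpha, x_m) for i in range(steps))
    mean = x_m * alpha / (alpha - 1.0)
    return partial / mean


for alpha in (1.5, 2.0, 3.0):
    for F in (0.2, 0.5, 0.8):
        print(alpha, F, round(lorenz_closed_form(F, alpha), 6), round(lorenz_numeric(F, alpha), 6))
```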

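Substituting Eq.(4) into Eq.(1), we have

$$G = 1 - 2\int_0^1 \left[1 - (1 - F)^{1 - 1/\alpha}\right] dF = 1 - 2\left(1 - \frac{1}{2 - 1/\alpha}\right) = \frac{1}{2\alpha - 1}. \qquad (5)$$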

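For a Pareto distribution with shape alpha > 1, the Gini index is G = 1/(2 alpha - 1). Heavier tails (alpha closer to 1) push G toward 1, and G falls toward 0 as alpha grows. The following quick cross-check integrates the Lorenz curve of Eq.(4) numerically and compares the result with this closed form. The midpoint rule and step count are arbitrary choices for this sketch.

```python
def lorenz_pareto(F, alpha):
    """Eq.(4): Lorenz curve of a Pareto distribution (alpha > 1)."""
    return 1.0 - (1.0 - F) ** (1.0 - 1.0 / alpha)


def gini_numeric(alpha, steps=100_000):
    """G = 1 - 2B, with B = integral of Eq.(4) over [0, 1] by the midpoint rule."""
    h = 1.0 / steps
    area_b = h * sum(lorenz_pareto((i + 0.5) * h, alpha) for i in range(steps))
    return 1.0 - 2.0 * area_b


def gini_closed_form(alpha):
    """Eq.(5): G = 1 / (2 * alpha - 1), valid for alpha > 1."""
    return 1.0 / (2.0 * alpha - 1.0)


for alpha in (1.2, 1.5, 2.0, 3.0, 10.0):
    print(alpha, gini_closed_form(alpha), gini_numeric(alpha))
```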
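In Eq.(3), if we sort the population in decreasing rather than increasing order and replace the ratio F with the rank r (r = 1 for the richest of N people, so that 1 - F = r/N), we get the rank-ordered curve

$$x(r) = x_m \left(\frac{r}{N}\right)^{-1/\alpha} \propto r^{-1/\alpha}, \qquad (6)$$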

which is the classic Zipf’s law when the exponent equals 1. Therefore, the exponent beta of Zipf’s law, x(r) ∝ r^(-beta), is related to the exponent alpha of the Pareto distribution by

$$\beta = \frac{1}{\alpha}. \qquad (7)$$

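The relation beta = 1/alpha can be checked by simulation. Draw Pareto samples by inverse-transform sampling (Eq.(3) with F uniform on [0, 1)), rank them in decreasing order, and fit the slope of log x against log r. This sketch makes the following assumptions, all our own choices rather than anything from the post:

- The slope is fitted by ordinary least squares.
- The fit uses a middle range of ranks (hypothetically 100 to 10,000 out of 100,000 samples) to avoid the noisy top ranks.
- The seed is fixed.

Least squares on a log-log plot is a simple estimator, not a statistically optimal one.

```python
import math
import random


def sample_pareto(n, alpha, x_m=1.0, seed=0):
    """Inverse-transform sampling: Eq.(3) applied to F ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [x_m * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def zipf_exponent(values, r_min=1, r_max=None):
    """Estimate beta as minus the OLS slope of log x vs log r (rank 1 = largest value)."""
    ranked = sorted(values, reverse=True)
    r_max = r_max or len(ranked)
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(ranked[r - 1]) for r in range(r_min, r_max + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return -sxy / sxx


alpha = 2.0
values = sample_pareto(100_000, alpha, seed=42)
print(zipf_exponent(values, r_min=100, r_max=10_000), 1.0 / alpha)
```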
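Combining Eq.(7) and Eq.(5), we get

$$G = \frac{1}{2/\beta - 1} = \frac{\beta}{2 - \beta}, \quad 0 < \beta < 1. \qquad (8)$$

Eq.(8) holds for 0 < beta < 1 (equivalently alpha > 1, where the mean income is finite). G increases with beta and approaches 1 as beta approaches 1.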

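The relation G = beta / (2 - beta) can be compared with the Gini index of an idealized rank-ordered sequence x(r) = r^(-beta), r = 1..n, which is the rank-ordered curve with the constant dropped. This sketch uses the closed-form discrete Gini index. It is algebraically the same as applying the trapezoid rule to the empirical Lorenz curve. The choice n = 100,000 is arbitrary.

```python
def gini(values):
    """Discrete Gini index: 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, x sorted increasingly."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def gini_from_zipf(beta):
    """Eq.(8): Gini index implied by a Zipf exponent 0 < beta < 1."""
    return beta / (2.0 - beta)


n = 100_000
for beta in (0.3, 0.5, 0.7):
    ranked = [r ** -beta for r in range(1, n + 1)]  # idealized Zipf curve x(r) = r^(-beta)
    print(beta, gini_from_zipf(beta), gini(ranked))
```

Agreement is close for small beta. However, the finite-size correction shrinks only like n^(beta - 1), so for beta near 1 the empirical value stays noticeably below beta / (2 - beta) unless n is very large.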
  2. The Lorenz curve, Gini index, and Zipf’s law in online communities

We investigated users’ daily answering activity on stackoverflow.com and calculated the Zipf exponent and the Gini index. It turns out that the community went through two stages of development:

- In the first stage (2009-2012), a small group of experts contributed an increasing number of answers.
- In the second stage (2012-2014), “the mass” took over and played an increasingly important role.

ginievolution.png

ginizipf.png
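The post does not describe how the Stack Overflow data were processed. The following is a hedged sketch of the kind of per-day computation involved. It supposes, hypothetically, that the data are a list of (day, user_id) pairs, one per posted answer. For each day it counts answers per user, computes the Gini index of those counts, and estimates the Zipf exponent beta by least squares on the log-log rank plot. The data layout, function names, and fitting method are assumptions for illustration, not necessarily what produced the figures.

```python
import math
from collections import Counter


def gini(values):
    """Discrete Gini index of non-negative counts (sorted increasingly)."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def zipf_exponent(values):
    """beta = minus the OLS slope of log(count) vs log(rank); rank 1 = most active user."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    if len(ranked) < 2:
        return float("nan")  # slope undefined with fewer than two users
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return -sxy / sxx


def daily_metrics(answers):
    """answers: iterable of (day, user_id) pairs (hypothetical layout).

    Returns {day: (gini, beta)} from each day's per-user answer counts.
    """
    per_day = {}
    for day, user in answers:
        per_day.setdefault(day, Counter())[user] += 1
    return {day: (gini(list(c.values())), zipf_exponent(list(c.values())))
            for day, c in sorted(per_day.items())}


toy = [("d1", "a"), ("d1", "b"), ("d1", "c"),
       ("d2", "a"), ("d2", "a"), ("d2", "a"), ("d2", "b")]
print(daily_metrics(toy))
```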