There’s often talk of how many thousand words you need to know to cover a given percentage of an average text. A book I was reading had statistics for a few languages.
English
1000 words covers 80.5%
2000 words covers 86.6%
5000 words covers 93.5%
French
1000 words covers 83.5%
2000 words covers 89.4%
5000 words covers 96.0%
Spanish
1000 words covers 81.0%
2000 words covers 86.6%
5000 words covers 92.5%
Chinese (Mandarin)
1000 words covers 73.0%
2000 words covers 82.2%
5000 words covers 91.64%
Japanese
1000 words covers 60.5%
2000 words covers 70.0%
5000 words covers 81.7%
Korean
1000 words covers 73.9%
2000 words covers 81.2%
5000 words covers 89.3%
Russian
1000 words covers 67.46%
2000 words covers 80.0%
5000 words covers 92.0%
German
1000 words covers 69.2%
2000 words covers 75.52%
5000 words covers 83.13%
The table in the book I was reading is reproduced in this document (Japanese, pdf):
http://tinyurl.com/3gsc69o
Book I was reading (on Amazon):
(actually it’s a library book)
There’s quite a bit of variety there. This was a Japanese book, and the Japanese statistics seem to show that you need a large vocabulary for everyday usage. The book states that you need 10,000 words to cover 91.7%, which is less than English, French and Spanish reach with only 5,000. I am always a bit doubtful of these figures, because when I’ve seen statistics like this for Japanese they are usually printed along with some text about the unique richness of Japanese vocabulary and a few examples of how inferior English is, which usually just demonstrate the writer’s poor grasp of English. The book actually leaves out the stats for Russian and German, which I presume didn’t go along with the point they were trying to make; the Russian and German figures above are taken from the linked PDF. Japanese does have a very rich and varied vocabulary, but I am surprised that it uses a larger number of words than other major languages, and probably even more surprised that German is comparable.
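For anyone who wants to sanity-check figures like these on their own reading material: as far as I understand it, "coverage" is just the share of running words in a text that belong to the N most frequent words. Here is a rough Python sketch of that idea. The function name `coverage` and the sample sentence are my own, and the tokenizer is a naive assumption (it splits on non-letters), so it would need a proper word segmenter for Japanese or Chinese. It is not the method used in the book.

```python
import re
from collections import Counter

def coverage(text, top_n):
    """Fraction of running words covered by the top_n most frequent words."""
    # Naive tokenizer (an assumption for illustration): lowercase runs of letters.
    # Languages written without spaces need a real segmenter instead.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Add up the occurrences of the top_n most frequent words.
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)

sample = "the cat sat on the mat and the dog sat on the rug"
for n in (1, 2, 5):
    print(n, "words cover", round(coverage(sample, n) * 100, 1), "%")
```

In the toy sentence, "the" alone accounts for 4 of the 13 words, so one word already covers about 31%. How you count "a word" (inflected forms, compounds, particles) changes the result a lot, which may be part of why the figures differ so much between languages.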
Anyway, this is quite interesting for learners of languages. Numbers aside, how have you found different languages to be? The levels for the avatar on LingQ differ between languages, but they do not correspond with these statistics. Have you generally found that you can understand a similar amount between languages when knowing a certain number of words? Or does it differ wildly? Have you seen any other statistics that contradict these? Or agree with them?