The Pareto Principle & Language Learning
80% of your time, more, 90% of your time, chasing 10% of the words.
Hi there, Steve Kaufmann here.
Uh, today I want to talk, of course about language learning, but I want to talk
about the 80/20 rule or Pareto's Principle as it applies to language learning.
Remember, if you enjoy these videos, please subscribe, click on the bell for
notifications, and if you follow me on a podcast service, please leave a comment.
I do appreciate it.
We've all, I shouldn't say we all, but many of us are familiar with the idea
that, let's say if you're in a company, uh, a small number of customers will
account for most of your business.
You know, uh, a small number of criminals do most of the crime.
Uh, there are many situations. If you Google, uh, Pareto's Principle,
you will find all kinds of examples of how a small number of any sort
of series of events will account for the lion's share of these events.
So people say, well, this also applies to language learning.
By the way, I should point out that there was a comment
about the quality of my audio.
I'm always sensitive to this.
I want to make sure I do the best I can.
Uh, I've tried a number of different speakers, microphones
rather, and it always seems that the best sound comes from my Mac.
And so I discontinued using these other microphones, but then I had the
criticism again, so I decided that I would plug this, uh, uh, microphone
in and see if the sound is any better.
Please let me know.
Now, maybe again, it's a small number of people complaining who
account for all the complaints.
However, if I can actually improve the quality of the
audio, I'm very happy to do so.
So you often hear people say, and it's quite true to say that a small number
of words, a thousand words, depending on, you know, whatever statistics
are used, 500 words, a thousand words account for, let's say 70% of the
word count in a given bit of content.
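A minimal sketch of what "account for 70% of the word count" means in practice: count how often each word occurs in a piece of content, then ask what share of all occurrences the N most frequent words make up. The function name `coverage_of_top_n` and the toy sentence are illustrative assumptions, not something taken from LingQ or from any real corpus.

```python
from collections import Counter


def coverage_of_top_n(words, n):
    """Return the fraction of all word occurrences made up by the n most frequent words."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # most_common(n) returns (word, count) pairs, highest count first
    return sum(count for _, count in counts.most_common(n)) / total


# Toy content (an assumption for illustration, far too small to be representative)
sample_text = "the cat sat on the mat and the dog sat on the rug"
words = sample_text.split()

print(coverage_of_top_n(words, 3))
```

Run on a real book or transcript instead of the toy sentence, the same function shows how much of the running text a given number of top words covers for that particular content.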
Now, first of all, let's say that this Pareto Principle 80/20 rule,
it doesn't mean exactly 80 and 20.
It can be 70/30, it can be 90/10, but it just means that a small number
of problems account for most of the inconvenience that we experience.
Maybe in language learning, you make the same mistakes over and over again.
So it's a small number of grammar problems that give you most of
your difficulty, let's say with the language you're learning.
And people say, you know, if you just learn the hundred most common
words, then you'll be able to deal with most language contexts.
Unfortunately, when it comes to language learning, that's not the case.
In other words, you can know and it's relatively easy to get to know the most
common thousand words because they show up, you know, as I've said before, the
frequency is very high with these words.
You get to know them pretty quickly, uh, as long as you continue studying.
Unfortunately, the other 30%, even 20% of words that you need in
order to understand any meaningful context is a very large number.
And so in English, or at least in language learning, you end up spending
basically 80% of your time, more, 90% of your time chasing 10% of the words.
So it's kind of an inverse Pareto Principle.
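A rough sketch of why the remaining words are "a very large number". It assumes an idealized Zipf-style distribution, where a word's frequency is proportional to 1 divided by its rank, and a hypothetical vocabulary of 60,000 words. Neither assumption comes from the talk, and real texts only approximate this shape, so treat the output as an illustration rather than a measurement.

```python
def zipf_coverage(top_n, vocab_size):
    """Share of word occurrences covered by the top_n ranks when frequency ~ 1/rank."""
    harmonic_top = sum(1.0 / rank for rank in range(1, top_n + 1))
    harmonic_all = sum(1.0 / rank for rank in range(1, vocab_size + 1))
    return harmonic_top / harmonic_all


def words_needed(target, vocab_size):
    """Smallest number of top-ranked words whose coverage reaches target (between 0 and 1)."""
    harmonic_all = sum(1.0 / rank for rank in range(1, vocab_size + 1))
    running = 0.0
    for rank in range(1, vocab_size + 1):
        running += 1.0 / rank
        if running / harmonic_all >= target:
            return rank
    return vocab_size


VOCAB_SIZE = 60_000  # hypothetical vocabulary size, an assumption for this sketch

print(zipf_coverage(1_000, VOCAB_SIZE))  # coverage from the first thousand words
print(words_needed(0.9, VOCAB_SIZE))     # words required to reach 90% coverage
```

Under these assumptions the first thousand words cover about 65% of occurrences, while reaching 90% takes on the order of 19,000 words, which is the "inverse" pattern in numbers.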
And to illustrate this, I'll show you my statistics.
So let's look at how the Pareto Principle, or really the reverse Pareto
Principle applies to language learning.
If I look at, say, Farsi, which is where I am right now, and I go to my
profile, and I recommend that you do that from time to time, uh, if I look
at sort of my known words for the last, uh, you know, all time, for example,
uh, I have say 12,000 known words.
However, um, and, and I have, if I look at LingQs created, uh, I have, um, 32,000.
So 32,000 words I have attempted to learn.
Most of them I still haven't learned.
However, if we say that a thousand words accounts for 60, 70% of any given
context, the overwhelming majority of my words, uh, are words that are outside
the sort of scope of those most frequent, you know, that Pareto Principle should
account for the bulk of any context.
I need all these words.
And similarly, if I change this to, let's take Czech, for example.
Uh, I mean, I've created 60,000 LingQs in Czech, 60,000.
And probably the most common thousand words, uh, will account for the bulk
of any text that I'm gonna read.
But I need all these other words.
And I could show you this in every language that I've learned on LingQ.
And I, I think it applies to all of us.
We need a lot of words.
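The arithmetic behind those profile numbers can be sketched as follows. The totals are the rounded figures quoted above (12,000 known words and 32,000 LingQs in Farsi, 60,000 LingQs in Czech), and the thousand-word cutoff is the same round number used throughout. The names `TOP_FREQUENT` and `share_of_top` are illustrative only.

```python
TOP_FREQUENT = 1_000  # "the most common thousand words"

# Rounded figures as quoted in the talk, not exact profile data
totals = {
    "Farsi known words": 12_000,
    "Farsi LingQs created": 32_000,
    "Czech LingQs created": 60_000,
}


def share_of_top(total, top=TOP_FREQUENT):
    """Fraction of a learner's word total that the top-frequency words represent."""
    return top / total


for label, total in totals.items():
    pct = 100 * share_of_top(total)
    print(f"{label}: the top {TOP_FREQUENT} words are {pct:.1f}% of {total}")
```

In each case the thousand most frequent words are under a tenth of the total, so the overwhelming majority of the words being chased sit outside that high-frequency group.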
And so the most frequent words, it's nice to think that you're gonna
have 60, 70% of any context, uh, but you're gonna learn those words anyway.
So the idea that you can ace the language by focusing on the most common
words according to some kind of Pareto Principle actually is not the case.
You know, if, if we first of all accept the fact that the thousand most
frequent words in any language are going to amount or, or cover, you know,
60, 70, 80% of the word count in any text, depending on the nature of the
text, depending on the language, but there's a very high proportion of high
frequency words that show up in any text.
Uh, however, as I've said before, the frequency declines very, very quickly.
But you look at, at my numbers, uh, for Arabic, for Czech for literally any
language that I've learned at LingQ, and you can go and do this yourself, go look
through if you're studying one language or several languages, and you'll see
just how large a vocabulary you need in order to be fluent in order to start to
understand interesting content in, in...
from books or from movies or from any source.
Uh, maybe content that you need for your work, you're gonna find that you need...
even for movies where the vocabulary level is lower.
But if you get into technical subjects, the vocabulary
required is very specialized.
And in order to acquire this vocabulary, you have to do a
lot of reading and listening.
So you end up spending 90% of your time pursuing, you know, in the case of, I, I'm
going from memory here, but if, if even in Czech, I think I have 60,000 words,
but the most frequent thousand words will account for most of the content.
So there is no shortcut.
I guess that's the message.
There is no shortcut because language is about learning words.
I haven't, I tried to Google to find out if there's some sense of how many
grammatical structures or patterns there are in a given, you know, in English
or in other languages, and how many of them tend to show up all the time,
and how many of them are less frequent or conversely, are there certain patterns
that give the greatest difficulty to learners of a certain language.
And, and that might be an example of the Pareto Principle, you know, maybe ser and
estar in Spanish continue to give a lot of difficulty or, uh, in Russian or in
Slavic languages, the case endings or, uh, you know, other things like this.
So, but the structures that cause the most difficulty may also be the
structures that show up the most often, and maybe it's just through continuing
to listen and read and allowing the brain to gradually get to, to get used to these
structures that you get better at them.
Because I've had the experience of reading the explanations over and over again and
still not being able to nail, you know, case endings in Russian or, um, you know,
verbs of motion or this kind of thing.
So I just wanted to raise this issue and I'd be interested in,
in, uh, the reaction from people.
Uh, to what extent is this Pareto Principle applicable?
Is there something that I'm missing here?
Is there some way that we can take advantage of the Pareto Principle
to speed up our language learning?
Unfortunately, I suspect that this is not the case, and I thought I'd leave you
with a couple of videos that you can, uh, you know, add to or follow up on if you
want more information on this subject.
I did one on the Pareto, Pareto Principle some while ago, which you can check out.
And again, uh, part of my sort of, what I find so important in this on the subject
is that we do need a lot of words.
So please check out the video that I did on that subject as well.
Bye for now.