30K known words was my dream number! (Was)
I might not be right, but I have always kind of questioned the number of words known as a way of measuring your level in the language, simply because if you mark a word as known you can easily forget it. At least that’s how it’s been for me. Instead, I track words read and hours listened to show my progress.
With any of the numbers it's difficult to really say what level you're at and as others have mentioned elsewhere certificates don't necessarily mean much either.
I think using the known words count is still useful. It measures progress. One just has to know that there is x% of that list that they've already forgotten or are very shaky on the meanings. We know there is progress though because we can read and listen to more complicated stuff than we could before.
There's also a problem when measuring any of these levels...what do they really mean? If I read a lot of scientific books, I could have a large vocabulary, but it may be a lot of words that aren't useful to everyday conversation. I might be intermediate, or advanced, but not be able to understand a conversation, even written down.
I mostly use the known words counter as a means of setting goals. The upside is it really makes me do the work. The downside is my work tends to be too imbalanced, too much reading as opposed to listening, writing and conversing.
See I started this year with Known Word goals, but what I realized is I can't control when I will learn words. It just kind of happens with enough exposure.
This caused me to shift my goals to reading, speaking and listening in 2021 as follows:
- 350 Hours Listening
- 2,500,000 Words Read
- 50 Hours Speaking
One result of this is that I blew past my known word goals that I originally set in half the time, but I can stay "motivated" to keep just doing the things I am enjoying doing. That is engaging in native content in my target language(s).
I hope everyone here is aware of how talking about the known words count in LingQ is not the same as your actual known word count or what linguists talk about when they talk about known words. I really prefer to say "word forms" when I´m talking about my LingQ stats to people not familiar with LingQ, cause that´s what the known words in LingQ are. Every form you know of every word counts in that stat.
I think people need a better understanding that LingQ is counting word forms. I know Steve has mentioned it, but I don't think people who do not study linguistics understand.
Bike, biked, and biking are all forms of the same word, and LingQ counts each of them separately. So if someone marks three forms of one word as known, they still only know one word. 30,000 word forms might only translate down to 12,000 known words, and within those words there might be a high level of cognates.
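The gap between word forms and base words can be made concrete with a minimal sketch. It assumes a tiny hand-written lookup table (`BASE_FORM`, a hypothetical name used only for illustration; this is not how LingQ stores anything) that maps each form to its base word, and then counts the same list both ways.

```python
# Hypothetical lookup table: word form -> base word (illustration only).
BASE_FORM = {
    "bike": "bike",
    "biked": "bike",
    "biking": "bike",
    "gehe": "gehen",
    "gehst": "gehen",
    "geht": "gehen",
    "gehen": "gehen",
}


def count_forms_and_base_words(known_forms):
    """Return (number of distinct forms, number of distinct base words)."""
    forms = set(known_forms)
    # Forms missing from the table are treated as their own base word.
    base_words = {BASE_FORM.get(form, form) for form in forms}
    return len(forms), len(base_words)


known = ["bike", "biked", "biking", "gehe", "gehst", "geht", "gehen"]
forms, base_words = count_forms_and_base_words(known)
print(forms, base_words)  # 7 forms, 2 base words
```

A real implementation would need a full dictionary or lemmatizer per language, which is presumably why counting raw forms is the simpler option.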
Personally, I'm not interested in the "Known Words" stats for my L2s. I'm more interested in the overall number of words read, the overall number of hours listened, etc.
But, if someone wants to use the "Known Words" stats for measuring progress, it's probably best to divide it by 4, 5 or 6 (depending on the language).
In German, for example, you have sometimes between 10-20 different forms for a single verb like "gehen" (to go): "Ich gehe, du gehst, er / sie / es geht, wir gehen, ihr geht, sie gehen."
And that's only the present tense... It's the same in Romance languages, for instance.
Consequently, L2 learners usually know far fewer words than LingQ's "Known Words" stats indicate. There are also other problems involved in the "Known Words" stats: tens of thousands of collocations, proper names, etc.
LingQ has chosen the "word forms" (rokkvi) approach because this solution is easier to implement. That's all.
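As a rough illustration of the "divide by 4, 5 or 6" rule of thumb above, here is a small sketch. The function name `estimate_base_words` is made up for this example, and the divisors are only the poster's suggested range, not measured values.

```python
def estimate_base_words(known_word_forms, forms_per_word):
    """Rough estimate: known word forms divided by an assumed
    average number of forms per base word."""
    return round(known_word_forms / forms_per_word)


# Apply the suggested divisors to a "Known Words" count of 30,000.
for divisor in (4, 5, 6):
    print(divisor, estimate_base_words(30_000, divisor))
```

With 30,000 word forms this gives 7,500, 6,000 and 5,000 base words respectively, so the result is only ever a ballpark figure.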
" LingQ has chosen the "word forms" (rokkvi) approach because this solution is easier to implement. That's all." - very true. Otherwise they would have to have functionality to group words by their base word, so to speak, the ability of users to link the word forms to the base word (or some ways for the program itself to draw upon external resources to do it on it´s own, which is not all that simple, even if the resources exist in proper form) and functionality that would count the base words. This would also mean the word count would be skewed for all the word forms that hadn´t been linked to a base word already.
One thing I wonder about is how LingQ could implement a morpheme prediction algorithm such that it could work with highly synthetic languages. As is, the "Known Word Count" seems to work pretty well for analytical languages that have minimal inflection, and works to varying degrees with synthetic languages.
Type ahead or next word prediction software has similar problems with highly synthetic languages when it has been designed around a concept of "predicting the next word".
A word/morpheme prediction algorithm like this had to have been implemented for east Asian languages, but I have no experience with that on LingQ so I don't really know how well it works (or doesn't work).
"Known Morpheme Count" probably doesn't have the same ring though.
When I started LingQ, understanding the tool and the numbers was my main hurdle and a source of anxiety. It made me question myself and my progress.
It wasn't until I started focusing on the language instead of the tool that I had the tranquility to understand the numbers for what they are: subjective, relative tools that measure some aspect of progress. Also, I am no longer so focused on entering the numbers into LingQ for listening or reading. What I do outside LingQ is simply outside of LingQ and not in the numbers.
I feel that I can measure real progress by my understanding of freely available audio, video and articles. There is no number to measure progress in absolute sense, but there is a subjective way to determine how much of a document, conversation or movie you understand. And that is what counts for me. If I can follow, understand and enjoy the content I am more than happy with my progress.
And that is the problem in a nutshell. A beginning lingq user lacks trust in the process and clings to the numbers. I did. I suspect most beginners do. When I started a post about this, I got the extremely good advice to just enjoy the process, enjoy the path of learning a new language. Of course I ignored it (bad advice, right?). Only later did I realize its value and now I mostly ignore the numbers, except for one thing: my lingqs have to increase by at least 50. The known words will follow. Also, I listen a lot to random talking. For some reason that helps.
This is just my opinion. If it helps only one student, I will be happy.
I hate to burst the bubble, but 100K is closer to being fluent, with 50K in the target areas that you enjoy. As a teacher I need about 20K words for talking with parents, students, and teachers in Mexico about education in general. Currently, I have about 5K words for education, but I need a lot more for my content area. I enjoy swimming, hiking, mountain bike riding, camping, and boating. For these areas I need 15K because there is a lot of crossover of words. I also garden and cook a lot, and that requires about 5K worth of words. For each content area you will need about 10K words. Now, those words also include past, present, and future forms: biked, biking, bike.
Don't focus on the word count, just focus on what you want and need to communicate for your own success.
I think the OP's question at the end is what this is all about: "Is [30K words] fluent enough for you guys?"
The collective experience of LingQ users who have reached that level seems to be yes, it's good enough for potential fluency. However, the consensus on LingQ and elsewhere is that "fluency" isn't enough for certain professional applications in the foreign country where the language is spoken. I would think that a teacher working in that professional environment might be one of those instances where one needs to be not only fluent but also very advanced, and where mistakes wouldn't be tolerated. In such instances, a "near native" proficiency is probably called for, but this is the exception, not the rule.
Out of curiosity, where did you come up with your numbers? I'm interested in both the ones you say you have and the ones you say you will need.
What will happen when you get into the higher numbers is that you will need an ever greater number of known words to close ever smaller gaps in your knowledge of the language. Climbing from 30k to 40k might have a huge effect. Going from 100k to 110k less so. So please don't plan on your overall progress being proportional to the number of known words. I have more than 96k known words in Czech, which enables me to understand about 93-95% of most texts. But to close as much as possible of the remaining 5-7% I will most likely have to double my known word count. So the last 5% or so is going to take the same amount of work and number of known words as the first 95%, and of course you can never really arrive at an absolute 100% understanding of the language.
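To put rough numbers on the diminishing returns described above, here is a back-of-the-envelope sketch. It takes the poster's own figures at face value (about 95% coverage at 96k known words, and another 96k words assumed necessary for the rest); the helper name is hypothetical and the calculation only compares averages.

```python
def coverage_per_thousand_words(coverage_gain_percent, words_needed):
    """Average percentage points of text coverage gained per 1,000 known words."""
    return coverage_gain_percent / (words_needed / 1000)


# Assumed figures from the post: first 96k words -> ~95% coverage,
# next 96k words -> the remaining ~5%.
first = coverage_per_thousand_words(95, 96_000)
last = coverage_per_thousand_words(5, 96_000)

print(round(first, 2))       # about 0.99 points per 1,000 words
print(round(last, 3))        # about 0.052 points per 1,000 words
print(round(first / last))   # the early words pay off about 19 times more
```

Under these assumptions, each additional thousand words at the top end buys roughly one nineteenth of what it did early on.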
If you are assuming that 50k is going to be the optimal number of known words for fluency you are headed for a disappointment. The highest achiever in German on this platform has a known word count of 284069. More than 5 times the number that you envision. He is followed by several people with around 120k words.
I agree, because the law of "diminishing returns" and Pareto's 80:20 principle apply here. In other words, striving for a (near-) native level of reading comprehension is probably not a good investment of our learning time.
For many languages, reading about 2 million words should be enough to reach a higher level of reading comprehension.
For Slavic languages, it might be 2-3 million words (I'm not sure).
Whether it is a good investment of your learning time depends solely on your own goals. If you have very diverse interests like Steve Kaufmann, then learning a bit of everything will enrich your life greatly. I, however, am a foreigner in the Czech Republic, so getting as close as possible to a native speaker's level is important to me. I have already accepted that I will spend the next years working on closing the remaining 5%, even if that means sacrificing other interests.
Personally, I've given up on the idea of becoming a "hyperglot" with 10, 11, etc. languages under my belt.
Instead, I'm going to focus on a few languages (English, French, Spanish, and Portuguese) where I can reach the highest possible level, because that's more in line with my personality and professional goals.
But, I'd like to add a few other L2s like Japanese, Chinese, Russian, Dutch, and Hebrew (at an intermediate level) to my core languages.
Unfortunately, life is short, and there's simply too much to learn :-)
Does the way people use LingQ change at all when they get into higher numbers? Most of the new words will be low-frequency ones, won't they? So at some point it might be more useful to pay attention to increasing the active vocabulary in some way than to getting more and more words into the passive vocabulary. I saw a youtube video (not related to lingq) on this a while ago.
I'm just returning to the forums after several months away, so I haven't read all comments in this thread, and some of this might be duplicative. I saw that t-harangi commented, so I skipped right to that and read it and, as I suspected, we are 100 percent in agreement, as I am with the handful of other comments I've read.
When Master Steve has talked about "20-30,000 words" and "fluency," he is talking about Romance languages. This is the likely threshold where learners can 1) UNDERSTAND most of what native speakers are TALKING about, especially to you in conversation or on whatever topics you have read a lot about/are familiar with; 2) reach "potential fluency" in that you can convert much of that 20-30k of passive vocabulary into active vocabulary; 3) really reach a breakthrough/turning point where you have enough of the language in you to really improve your skills by using it, and you can come back to it without having forgotten it, even if you are a little rusty at first.
Again, this is for Romance languages. In English it's maybe 10-15K words. In Russian/Slavic languages, it's 90-100K words (where Steve is/was). I think this is interesting to note because, although I haven't watched as much of Steve's channel in the past two years, I would say that Russian, and to a lesser extent Czech, Korean, and Ukrainian, are the last languages that Steve really studied to these serious (and potential) levels here on LingQ. Once he started getting into Greek, Arabic, Farsi, etc., he was really just taking serious long "peeks" at the languages. He's not going for "fluency" anymore with these languages. He will struggle a lot when he speaks, and certainly won't read unassisted. Rather, he's exploring these languages, pursuing the interest, crossing them off the bucket list, etc. In other words, he's dabbling a lot more, and I wouldn't be surprised if a lot of that has to do with his viewers and customers. Not that the language man is under pressure to perform, but I think that, interacting with all of us and being confident he truly "knows" many languages, he can move around a lot. The most extreme version of this is Moses McCormick. If and when Steve does this, he treats language learning like a buffet; for Benny Lewis, it's like a tapas bar. For Moses, it's like living off the samples at Costco. He speaks dozens of languages, but he's not holding 2 hour conversations in Hmong or Tibetan.
In my above post, I mentioned being away from things for a while. That was also why I referred to Moses in the present tense. I just learned today, after starting to catch up on Master Steve's videos, that Moses has recently "moved on."
He was such a wonderful linguist. He loved languages, their study, and most of all, using them to connect to so many people--and not just his interlocutors, but all his followers and subscribers that he inspired.
I cannot recall who originally said it, but there is an apocryphal saying, sometimes wrongly attributed to Mark Twain, that runs "The two most important days of your life are the day you were born and the day you find out why." Whether he knew it or not, for 20 years Moses was doing what he was meant to do, and sharing it with us. As he and I are the same age, it is a truth that makes me admire him all the more.
I don´t have the time to read all comments here so I am sorry if everything I say here has already been posted.
1) The number of known words means something very, very different for each language, for a lot of reasons. A language like German, due to more complex grammar, and even more so due to a tendency of having composite words, has more versions of many words than English for example, so you need a higher number for the same level of understanding. For example "ich, mich, mir, mein, meine, meiner, meinen, meines" vs "I, me, my, mine" and "car, door" vs "Auto, Tür, Autotür". When Icelandic gets added, you can bet it´s going to be even more extreme with this than German, both with the increased number of word forms and perhaps even with composite words.
2) I think in most normal, Western European languages, you´d usually need more than 30K known words to be a fluent reader. Steve did say it was about 40K in some youtube video where, he, Mark and some LingQ users had a discussion. I agree with him on that and I´d estimate it to be somewhere between 40-50K known words.
3) Not all people mark known words in the same way. Some will be overconfident about knowing a word and some will be less confident or just prefer to only mark words as known when they know them really, really well and are very unlikely to forget them. People also have different principles when they mark words as known. I mark everything as known except words from other languages (when I know they are at least). Some other people may try not to mark composite words etc. as known, only the base form. I´d mark "words" like " don´t, you´d, l'eau, l'automobile, s'appelle, t'me " as known for example, so for me the number to reach fluency would be higher than for someone who x-ed out words like that.
I can´t really tell you why people stop at a certain number, but it may just be that LingQ isn´t the only source they are using. They may find it enough to get to sub-fluency here and then find people to converse with and become fluent that way or just feel they have enough of a head start to just get into reading physical books.
I think I can count myself as a pretty successful polyglot and I like to get well above 30K, even with languages I know to some degree before starting with them here. I think I could have stopped at 30K or even 20K with Dutch and then jumped to other learning methods, since I was already fluent in a few related languages and had by then learned most of the most commonly used words - and what I´d learned with LingQ would have still served me well enough, but I like using LingQ, so I chose to keep at it.
You seem to only be discussing fluent literacy here, but I also really advise you not to ignore listening.
The German language, too, is absorbing more and more foreign words, including English ones.
The "un-word of the year" (Unwort des Jahres) is actually the English word "Lockdown", which it feels like I read or hear in the news 25 times (at least). So everyone worldwide knows the word, and its meaning too.
But as I said, I too keep discovering words in the German newspaper whose meaning isn't clear to me (technical words, words from the construction industry or the computer industry), which I then look up in the Duden (the dictionary of German vocabulary) and perhaps forget again three days later, simply because I don't need them.
I have also listened to or watched "Steve videos" and had to conclude that German is not his strongest language and that he has to "struggle" quite a bit for words there.
But "sorry", I don't want to pass judgment here. In English I also reach my limits, and I will probably need a few more years before I can express myself reasonably well.
How about we push each other a little and take turns chatting in German or English? - Reply welcome.
Oh, and in my opinion 50 thousand known German words are far from covering everything. Remember, we are all only human and remain limited. - Nobody is perfect! Bye for now.
I feel you, I'm 30k into German and I do not feel comfortable yet. Some languages need larger numbers I guess, I will stick to it until I do feel comfortable, no matter how long it takes.
Good luck with your learning, and let's enjoy the road!
I am feeling much more comfortable at 38K. I found some novels by Heinz G. Konsalik which seem to fit this level (30k-ish)
Apart from the obvious question of whether you mark all words as known, including proper names etc., there is also the matter of whether you unmark words once you forget them. It is not uncommon for me to read a text and actually decrease my known words as a result. The stricter you are with your statistics, the more they reflect your level.
Having said that, I prefer to estimate my level by the number of words read and listened to. In my opinion and experience, you may communicate with people knowing only 5k words, if they are really most common ones plus you have spent already hours on conversations. Novels are indeed most demanding in terms of vocabulary needed. But here also time spent on reading can facilitate intuitive understanding of sentences despite some words unknown. Like in our native languages we keep encountering unknown words, but our intuition is amazing.
I recently saw an article on this forum, or maybe it was a Steve-tube, where someone compared 60K Russian words to 10K English words (all LingQ). Also, they said something about the language structure influencing the multiplication factor, like in Finnish you have 15 cases, which results in a larger multiplication factor. I think German was mentioned as well. I am wondering whether the language gurus among us could make this systematic. In other words, could LingQ users have a table with an approximate multiplier, with English as the reference? That would be really cool, and also some kind of a reference for learners. Just asking.
There is this table: https://www.lingq.com/en/help/avatar/#avatar_2
but the multiplication factor is much smaller there between English and Russian. I don't know what the table is based on. Or how this would work with Finnish, which not only has 15 cases but also many verb forms (finite and non-finite) and suffixes that modify the meaning of nouns and verbs.
Thanks. That is indeed what I mean. The numbers appear to be low. And unfortunately the beta languages are missing (Finnish). I really hope the betas will be included and the numbers reviewed by the relevant experts. Thanks for the pointer!
no problem. I suppose that when reading Russian or Finnish texts, you would encounter only a fraction of different possible forms for each noun or verb, so not all of those forms would be counted as Known Words, even if you would actually "know" most of them, if required. So the multiplier wouldn't be that large after all, especially as the word count in English (or Dutch) would probably fail to include many phrasal verbs.
That would be the reason to use experts and not any automated tool. Although Finnish has quite some logic to it. Still, the table does need some revision. I hope the lingq-experts are reading too!
Looking at Korean (35K vs English 30K), I have to wonder what the basis of the numbers was.
I think the numbers are just motivational. They give beginners a goal that feels doable instead of overwhelming them immediately. These metrics don't mean anything to me.
In my opinion, the "known words" count is rather "useless". For several reasons:
1. Take a simple verb like "gehen" in German or "aller" in French. If you conjugate these verbs in the present tense, you get:
- German: Ich gehe, du gehst, er / sie / es geht, wir gehen, ihr geht, sie gehen
- French: je vais, tu vas, il / elle / on va, nous allons, vous allez, ils / elles vont
And there are "many" more variations when you consider future, perfect / imperfect, and subjunctive / Konjunktiv forms! So a single infinitive can have countless variations, all counted as different words.
2. It's the same with singular and plural forms that are counted as different words. For example:
- the table = der Tisch / la table, plural: the tables = die Tische / les tables
- the wall = die Wand / le mur, plural: die Wände / les murs
3. Then, as PerpetualTraveler correctly noted, you have proper names, city names, etc., or words from other languages included.
4. Apart from that, focusing on single words is deeply flawed, because native speakers don't build their sentences from single words, but they are heavy users of tens of thousands of highly conventional word groups, i.e. "collocations".
So, for example, it doesn't make sense to learn a simple word equation à la "erhalten / bekommen" (German) = "get", when you have countless collocations with "get" in English:
- Get a call
- Get a chance
- Get a clue
- Get a cold
- Get a degree/ a diploma
- Get a job
- Get a joke
- Get a letter (receive)
- Get a shock
- Get a splitting headache
- Get a tan
Not to mention phrasal verbs in this context with very different meanings beyond "get = erhalten / bekommen" like: "get in, get out, get off, get down, etc." (German equivalent with "gehen": prefix + verb constructions such as "aus, auseinander, hinein, an, ab, etc. + gehen").
To handle all this, LingQ would need a much more sophisticated implementation, esp. of the string tokenization process, so that the "known words metric" is more useful. But, this implementation would also be "much" harder.
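To illustrate why tokenization matters here, the following is a minimal sketch, not LingQ's actual implementation. It contrasts a naive single-word tokenizer with a greedy longest-match pass over a tiny, hypothetical `COLLOCATIONS` list; all names and the list itself are assumptions made up for the example.

```python
import re

# Hypothetical list of multi-word units; a real system would need
# tens of thousands of entries per language.
COLLOCATIONS = {("get", "a", "job"), ("get", "a", "cold"), ("get", "out")}
MAX_LEN = max(len(c) for c in COLLOCATIONS)


def naive_tokens(text):
    """Split into lowercase single words, as a simple word counter would."""
    return re.findall(r"[a-zäöüß']+", text.lower())


def collocation_tokens(text):
    """Greedy longest-match grouping of listed collocations into single units."""
    words = naive_tokens(text)
    units, i = [], 0
    while i < len(words):
        for n in range(MAX_LEN, 0, -1):
            chunk = tuple(words[i:i + n])
            # Fall back to the single word when no collocation matches.
            if n == 1 or chunk in COLLOCATIONS:
                units.append(" ".join(chunk))
                i += len(chunk)
                break
    return units


sentence = "I need to get a job before I get a cold"
print(naive_tokens(sentence))
print(collocation_tokens(sentence))
```

Even this toy version depends on a curated collocation list and ignores inflected or separated forms ("got a job", "get the job"), which hints at how much harder a full implementation would be.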
Therefore, I concur with RJDavies:
"your target should not be to hit a certain known-word target but to set daily targets for how much exposure per day/week in terms of hours. I think 1-million words of reading is a better milestone for learners than X amount of know-words"
The number of "words read" is a much better metric because it shows how much a learner has been exposed to a target language. And this is important if reading functions as a kind of natural SRS process!
However, the one million "words read" count is still very low. A learner had better aim for around 2 million words read to feel reasonably confident in his / her target language.
But even with 2 million words read, learners of German can't expect to be able to read "everything" without the assistance of an AudioReader, dictionary, etc. Take, for example, Robert Musil ("The Man Without Qualities") and Thomas Mann ("Buddenbrooks", "Der Zauberberg", etc.). These two "word magicians" are among the most sophisticated German-speaking authors ever. If you don't have a (near) native speaker level in German, you will have a hard time reading their books, because the 2 million "words read" mark is still far from a (near) native speaker level.
It's the same for other target languages...
One German author I find really hard is WG Sebald. I have almost given up on his Die Ringe des Saturn.
The problem with the Words Read count is that it doesn't reflect what's read and learned outside LingQ. The Known Words count is a somewhat better metric for this. For example, I have learned Portuguese without using LingQ much, but I can quickly gain a lot of known words when I start using LingQ more with Portuguese. The Known Words count will then give some estimate of my level in Portuguese compared to, for example, Spanish.
Yes, Sebald might be a tough nut to crack:
His distinctive and innovative novels were written in an intentionally somewhat old-fashioned and elaborate German (one passage in Austerlitz famously contains a sentence that is 9 pages long). https://en.wikipedia.org/wiki/W._G._Sebald
Speaking of "unreadable" novels in German... So if you German learners think that after reading all Harry Potter novels and watching 1000 episodes of "Die Tagesschau" in German, you need a "new" challenge to keep you busy for the next 5-10 years of your life, here it is:
https://www.welt.de/kultur/literarischewelt/article158478214/Diese-sechs-unlesbaren-Romane-muessen-Sie-lesen.html [nice that Proust is also made a German-speaking author :-)]
Fluency is really a byproduct of speaking practice and technically, I think one could be fluent with 30K known words with a lot of speaking practice.
But conversely, as a LingQ user, your primary engagement with the language is probably reading and listening, so to feel like you're "fluent" in those activities you will end up needing more words than if you were just hanging out and chatting with your friends in Germany.
So, yeah, 50K known words is a good benchmark for unassisted reading and listening.
An interesting thing about this is that really only LingQ users are able to use this metric. Most language learners engage with all these activities at various levels without having any idea how many words they might actually know.
But from my experience, I felt like I could speak English fluently well before I was able to read a book unassisted. But doing the reading listening method, one can end up on the opposite side, having a massive passive vocab, but not enough active practice to be "fluent."
No you cannot be fluent with 30k words. You might find ways to express yourself understandably but you cannot control what the other person will answer. So even if you yourself might be able to express yourself with 30k words (which is already doubtful) you are not going to be able to understand the response of a native speaker, who uses a vocabulary 10 times as big.
I think we have all been in the situation where we have proudly constructed a meaningful sentence only to be responded to with an incomprehensible sequence of strange sounds.
To be fluent you don't have to speak like a native speaker but you do have to understand them effortlessly. This cannot be done with 30k known words.
I'm sorry, but I'm just gonna have to disagree with this statement of "No fluency with 30K words," just based on personal experiences accumulated over studying different languages and living in different countries.
First off, of course this will depend on the language. I see you're studying Czech, and yeah, I'm sure that's different from English or French. I don't doubt you'll need more words in a slavic language.
I'm basing my claim on two distinct data points from personal experience:
1. My level of reading comprehension in French, Spanish, and German at 30K known words marked on LingQ, to me feels roughly equal to:
2. My level of reading comprehension after moving to the US and speaking fairly fluent English in regular conversations after a few months of being here.
As I advance with different languages, I often see parallels with hitting different milestones, and the parallel of the 30K reading comprehension seems very consistent to me with these languages, namely that around 30K is when unassisted reading of a regular paper book is possible, albeit not completely comfortable yet. Back in the day, we didn't have LingQ or Kindle, so paper books were the only thing I could read, and I remember very clearly when I reached that same level of being able to really process a book -- a feeling I equate to about the 30K benchmark.
Prior to coming here, I had studied English in high school, and my abilities were roughly at the level of someone completing both Assimil books in French with a fair amount of speaking practice -- another parallel.
IF YOU WANNA PROVE ME WRONG, here is the experiment I'd recommend you do:
1. Have a person study French, German, or Spanish using Assimil, book 1 and 2. (Roughly 7500 unique words including proper nouns.) Then:
2. Have this person move to a country of said TL. and begin conversing on a daily basis.
3. At the same time, have this person read and listen to books on LingQ and mark words on a daily basis.
My hypothesis is that this person will hit similar milestones as I have, will feel conversationally fluent before hitting the 30K known words mark on LingQ, and will begin to move on to paper books when they hit the Advanced 2 level here, which is 32-34K depending on the language.
And again, this will probably not be the case with a slavic language, but the OP was talking about German, and I'm confident that this experiment would prove me right in Germany.
Well, there are levels of fluency. I think one could probably get to 30K, leave LingQ and manage to start reading normal books with a low literate fluency, which would then increase through time. I can´t really say exactly when I felt fluently literate in the languages I´ve passed 30K in, because it wouldn´t really count. That´s because I knew quite a bit or a lot of them already or knew other similar languages. I feel I had a low level fluency in French at about 20-30K, ok fluency at 30-40 and almost complete fluency beyond 40K. But I could already fluently read simple books like Little Nicholas in French before I started LingQ.
I see your points, rokkvi, but for the sake of clarity, I usually don't like to use the word "fluency" when it comes to reading -- though I understand the need to relate it somehow to reading ability and ease of comprehension, so I usually refer to that as assisted vs. unassisted reading.
To me, fluency has always been a term applied to conversational speaking ability and in my responses this is what I mean by it.
I believe conversational fluency is possible with 30K known words, as well as reading of paper books -- though that may be a bit more sluggish. I started reading paper books in French at 30K and it wasn't super smooth, but it was doable. I'm about to hit 30K words in Spanish, and I've been testing myself with reading more on Kindle with minimal lookups (contemporary fiction by a Spanish author) and I'm feeling I could definitely make the jump if I had to.
But yes, for truly unassisted, comfortable reading, my French 40K + words feel a lot better than 30K. But if you took my LingQ and Kindle away at 30K, with any of these languages, I think I would do pretty well with paper.
Yes it´s not that easy to define what the bar for "fluency" is for reading, it isn´t really easy to define it for conversing either and these two are far from being the same thing.
To contribute to the collective wisdom here.....
I like the term "assisted vs. unassisted reading" and think I'll use it from now on. I've never used "fluency" for anything other than easy, flowing back and forth conversation between two peers.
In response to a question I once put to Master Steve regarding what I would now call "unassisted reading," he replied that it all has to do with "how much uncertainty you are comfortable with." He is a lot more tolerant of it than I am. When I was at about 30K or so in Spanish, I could comfortably read a nonfiction book I had purchased years before and would periodically test myself to see how well I did. At 30ishK or a little less I went through it no problem, with only a little not understood, but fiction was more challenging. I tried reading La Reina del Sur and might have gotten through it, but it was too much for me. Another person, like a Steve or tharangi, would have been fine.
I remember asking when I could go "LingQless" and people suggesting that around 40-45K (I think it was Francisco) would be a more realistic known word count for reading fiction comfortably unassisted. One of the things that got me there was Francisco's suggestion that I import some of the novels I was thinking of reading. As promised, my word count, which was slowing, did indeed "skyrocket" as he said, and I was much better able to tackle LingQless novel reading after that. I've yet to actually try it, because I want to do all my reading in LingQ to preserve the stats, but I read the first two chapters of La Sombra del Viento and anything that wasn't understood didn't really hold me back.
I'm not sure if you saw my comment above, where I discuss this more at length (and t.harangi with practical experience). You are correct in principle, in that you are describing exactly why our "passive" vocabulary needs to be higher than our "active" vocabulary, but are totally off on the scale and application.
30,000 passive "known words" that you can understand is about the threshold for Romance languages and German where you can indeed understand a native speaker of the foreign language. It will be higher for Slavic languages and lower (say 15,000 words) for English. Your active vocabulary that you use in conversations will be much lower.
It is called "potential fluency" because, although you'll be able to understand the other person, you need to work on your half of the conversation and convert some of that 30K to active vocabulary you know how to use.
I just wanted to chime in about the known word count, especially to any LingQ newbies reading this. When I first started out here, I was improving my Portuguese and reading tons of imported articles. I created LingQs for those vocabulary words I didn't know and then clicked to the next screen and eventually to the Complete Lesson button.
At that time, I didn't appreciate that all the proper names (first and last names of people + geographic locations) as well as foreign text (usually English) were being added to my known word count, badly distorting it. After a couple of months, I realized my error but it was too late. My KWC is over 33K now, but the actual number must be several thousand lower. I'll never know for sure.
When I started a brand new language (Greek), I was extremely careful to Ignore (using the X shortcut key) any proper names or foreign words and now I know my KWC is accurate. If you're a nerd for statistics, like I am, it's worth the extra effort to exclude these words that do not belong in your count.
Have you done some analysis on the frequency of those extra words? Several thousand out of 33K sounds like a lot to me.
I don't know any way of seeing a list of your known words, so how would I analyze or even identify them? Several thousand does not seem unrealistic to me in this case. I read a lot of articles about books, films, history, current events, etc. and they were loaded with proper names.
All saved words are listed in the database, aren't they? Alternatively, one could take a sample of texts and count how many words are proper names etc. Might be too much effort though.
I don't personally worry about this though. Even 10% inaccuracy is pretty good in my field.
I think Steve estimated it at 10% at one time.
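For anyone who wants to try the sampling idea suggested above, here is a rough sketch in Python. It is only a heuristic and rests on assumptions of mine: it treats any capitalized word that is not the first word of a sentence as a proper-name candidate (so it would badly overcount in German, where all nouns are capitalized), and it counts unique word forms. The function names and the sample sentence are made up for illustration and have nothing to do with LingQ itself.

```python
import re

# Matches a run of letters (any alphabet), allowing inner hyphens or apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")

def proper_name_share(text):
    """Return (name_candidates, total) counted as unique lowercase word forms.

    Heuristic (an assumption, not a LingQ feature): a word that is
    capitalized anywhere other than the start of a sentence is treated
    as a proper-name candidate.
    """
    all_forms = set()
    name_forms = set()
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = WORD.findall(sentence)
        all_forms.update(w.lower() for w in words)
        # Skip the first word, which is capitalized no matter what it is.
        name_forms.update(w.lower() for w in words[1:] if w[0].isupper())
    return len(name_forms), len(all_forms)

def adjusted_count(known_words, share):
    """Scale a known-word count down by the estimated share of proper names."""
    return round(known_words * (1 - share))

# Hypothetical sample text; in practice you would paste in a few imported articles.
sample = "Ontem a Maria viajou para Lisboa. O voo saiu de Toronto muito cedo."
names, total = proper_name_share(sample)
print(f"{names} of {total} unique forms look like proper names")

# Using the 10% figure mentioned in this thread on a 33K count:
print(adjusted_count(33000, 0.10))
```

A tiny sample like this says nothing by itself; the share only becomes meaningful over a good number of articles of the kind you actually read.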
The problem with proper nouns is that when you reach the upper 20K and above word count, these nouns become an increasingly higher percentage of the "new words" you encounter, since you have likely already learned the most common 20K + words in a given language.
Let's say you read a series like the James Bond books -- the author will likely use similar vocab throughout and each book will have fewer and fewer unknown words in it, BUT each book will take place in different countries with different cities, different names, etc. so those nouns will continue to be "new and unknown" despite the rest of the words being more and more familiar.
I'm reading a Norwegian book in French right now. A lot of Norwegian names and streets would be the only blue words I would have in it. If I then read a Polish book in French, proper nouns would again be the only blue words in it. I think after a lot of reading, these words marked as "known" can dilute your word count by more than 10%.
Since the creation of the Netflix import feature, I have been importing all of the shows and movies I've watched there over the years. Many thousands of blue/new words to clear and then either lingq, add to known, or ignore.
I started doing this when I was well north of 30K words and it seems that I am adding words right to "known," lingqing less, and using the ignore button A LOT more.
I'm not sure I would need LingQ any more after reaching such high levels. Or maybe I would change topics so that there's more new vocabulary to learn. I wouldn't import many books from the same author, unless it's Shakespeare or some other with a rich vocabulary.
I'm glad you're bringing up the proper noun thing. So many people don't realize how this can inflate your known word count over the course of a few books.
I know I don't count proper nouns as I feel that a name is a basic given. Proper nouns would inflate numbers along with cognates.
I mark proper nouns as known. Shame on me. The thought behind that is that it is of some value to know that a word is a name. It usually is of very low value though and it differs from word to word really. If I´m learning French it is of good value to know that Germany is "Allemagne" and London is "Londres" and of some value to know that "François" is a masculine name and "Françoise" is a feminine name and that neither of these words means "French" as a language.
But it really isn´t of much use to your French to know that "John" is a masculine name if you already speak English, or to know the name of a place where the name is the same in French as in some other language you know ("Toronto" for example). It also isn´t of great use to see some names that are right out of languages you don´t know at all and simply realize they are names of people, for example: "The African Union is facing a backlash after terminating the appointment of Arikana Chihombori-Quao, its ambassador to the United States" - I´m going to get that Chihombori-Quao is a name and mark it as "known", but I´m not going to remember the word or recognize it if I see it in its native language. The words that make up a name might even have a specific meaning in its native language, for all I know this name could mean "brave-runner" or "honest-merchant", which I wouldn´t know and then again the words don´t belong to the language I´m learning anyway.
Optimally one should really differentiate between proper nouns like I´ve mentioned in the first paragraph and the ones like in the second paragraph. Mark most or all of the first kind as known and none of the others. It´s just too much of a bother to me. It would slow me down and also change the way I´ve been doing it so far. It does skew my word count, but in the end the word count is a flawed measure no matter what, as others have pointed out already. I see it as something that correlates with how many words I know and how much and how diverse reading I´ve done, rather than directly measuring any of these things. I also see it as a means of setting goals, which can admittedly get out of hand when you focus on "known words" and ignore listening, speaking and writing.
Interesting post. During lockdown (which is when I started using LingQ/Input), I've been obsessing over my known-word count; like you and many others I made the connection between increasing known-word count and general comprehension. But it's important to remember the number we speak of is only the product of input: it is the input itself which determines (or at least is heavily correlated with) your comprehension.
If you take your German stats and my Greek for example, my known-word count is 10-12K lower than yours; however, my input stats (reading/LingQ'ing/listening) are a lot higher. I'm approaching a million words read and cannot comprehend news broadcasts, I can just about grasp broad themes/scenes in children's books. And I should say I spend a lot of time watching films, listening to podcasts off LingQ too. I also attended classes for 3-4 months before I even made a LingQ account (so I hit the ground running).
The known-word count is probably best used to measure monthly progress provided you don't change the way you count words. I suggest your target should not be to hit a certain known-word target but to set daily targets for how much exposure per day/week in terms of hours. I think 1 million words of reading is a better milestone for learners than X amount of known words, since it's measuring your input and not the product of it.
Anyway, awesome post. I saw a post I can no longer find about 10K/20K/30K milestones, I wish there were more of them :)
It was yours! Cheers :)
In my experience, German grammar is much harder to read and requires you to know so many grammar rules and words, in comparison to Spanish and French. Even though I have fewer words in French, I can understand it way more than I can understand any German text. Moreover, LingQ's translate system doesn't work well with long German sentences either, which is another factor.
I'm experiencing that a lot as I read novels. Most of the words I know, but they are ordered in some "intricate" ways that don't seem to make sense on first read. After translating the whole sentence it often makes sense, but to get some of those things to click can be difficult.
Almost 2 years ago when I started German here at LingQ my expectations were the same - 20-30K known words and I'm a fluent listener. I thought such a number would allow me to read and listen to everyday content with no trouble at all - things like Wikipedia, internet forums, news bulletins and movies.
30K didn't bring it to me. Although I could read and listen, it wasn't comfortable at all. Going through the news was a pain.
Only now, with 60K, I can just listen to Tagesschau (German news) and enjoy it. But still, today's 20min Tagesschau news broadcast contains 65 (6%) unknown words! And often I know all the words but still can't grasp the meaning behind it.
I'm very optimistic about reaching my language goals, the progress that is possible via LingQ never ceases to amaze me. Though I don't think I'm ever going to be satisfied with the results.
I think Steve was already fluent in like 9 languages when he created LingQ, so he pretty much just dabbles in languages now, with fluency not being the goal. He will even say that he isn't fluent anymore in some of the languages he learned in the past. I did read in another thread on here that 50K known words in German is close to fluency. Looks to be a good goal to shoot for.
I feel similarly. I had the initial goal of reaching Advanced 2 in French (around 32K). I am currently at around 35K and I agree that I need to push my target up to the 40-50K range to really feel comfortable.
I can read most news articles but novels are tougher. I can get the main point of a novel without looking things up but I definitely miss details. Similarly, I can get the main events of TV shows with French subtitles but definitely can't understand without them. As for conversations with my friend, I am getting better at holding basic conversations but get lost when the topics become more specialized. I'm curious to hear how other people in the 30K range are feeling.
Do you find you are seeing a lot of yellow words still? Encountering a lot of blue still? Are you forgetting known words? I'm not at 30K so I can't say for sure whether you should feel a little more fluent than you do. That's essentially double where I'm at though and I feel like I'll be pretty well off doubling the words known from where I'm at, but who knows. I think German will require a little more than say a Spanish or a similar language. Stopping at 25K may be just fine for a language like that. German might be closer to 40K to 50K. I'll be interested to hear what others say.
Also it seems that you really don't need any more than 50K known words if you're not interested in reading novels.
But it's quite possible to read and enjoy novels well before having 50K Known Words in LingQ, isn't it? I recently read my first novel in Spanish (El niño con el pijama de rayas by John Boyne), even though I had less than 5K known words back then. The question is then how easy it has to be.
Yes, it is more than possible. Most writers use the same 1,000 words, some a little more. And when I am talking about words, most of them are nouns, pronouns, adjectives, and verbs. If the novel is in the target area that you have been studying or are familiar with you will be just fine.
1000 words is the vocabulary of a 4 year old, not that of a literary writer. This is low level A2 vocabulary.
Here's an article on the subject of English vocabulary: https://www.lextutor.ca/cover/papers/nation_2006.pdf
"2,000 provides coverage of 87.83%, 4,000 plus proper nouns – 94.8%, 9,000 plus proper nouns – 98.24%, proper nouns 1.53%. A vocabulary of 8,000 to 9,000 words is needed to read a novel, and even then, 1 word in 50 will be unfamiliar. A few of these will be repeated topic words, but most will occur only once or twice."
I think it talks about word families, not unique words, so I'm not sure how it translates to the LingQ known words count. Those are averages over five novels used in the study. They should be compared with the numbers of unique words in the same novels.
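To make those coverage figures a bit more tangible, here is a small Python sketch that turns a coverage percentage into "one unfamiliar word in every N" and into unfamiliar words per page. The coverage numbers are the ones quoted above; the 300 words per page is purely my own assumption for illustration, and the function names are made up.

```python
def unknown_one_in(coverage_percent):
    """Average number of running words read per unfamiliar word."""
    unknown_share = 1 - coverage_percent / 100
    return 1 / unknown_share

def unknown_per_page(coverage_percent, words_per_page=300):
    """Expected unfamiliar words on a page (300 words/page is an assumption)."""
    return words_per_page * (1 - coverage_percent / 100)

# (word families, coverage %) as quoted from the article above;
# the 4,000 and 9,000 figures include proper nouns.
quoted = [(2000, 87.83), (4000, 94.8), (9000, 98.24)]

for families, coverage in quoted:
    print(
        f"{families} word families: 1 unfamiliar word in about "
        f"{unknown_one_in(coverage):.0f}, roughly "
        f"{unknown_per_page(coverage):.1f} per 300-word page"
    )
```

Keep in mind these are word families averaged over the novels in the study, so the results do not map directly onto a LingQ known words count.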
At about 1,000 words a person can read in a second language if they are highly fluent (Masters Degree or extremely well read) in their first language. That excludes small function words such as as, is, are, he, she, they and so forth.
Again when I am referring to words I am counting words that are the focus of a novel. Most writers do not write at a high literary level. And most people do not read at a high literary level.
I teach English as a Second Language and my students can read English at about a 3rd grade level within 6 months. It takes them about 2 years to read collections such as The Hunger Games, The City of Ember, and Divergent. They have instruction for 2 hours a day plus additional homework. Books are best understood if there is a series of books so that a learner becomes invested in the characters.
The majority of my students are from Mexico. The second language must have a similar base vocabulary and include a large number of cognates (cognates are also not included in the 1,000 word count).