Known word goal

Hello. Since the 35,000 known words that LingQ calls Advanced 2 for highly inflected languages are really far from being advanced, I was wondering what other LingQ users’ actual end goal is when it comes to known words. In my language, Czech, I guess 140,000 is a reasonable goal for a good level of literacy.

I don’t really have a great idea, but I’ll point to Steve Kaufmann’s stats. He’s at 67,000 in Czech…90,000+ in Russian. I’m not sure how he considers his level in Czech, but I think he’s quite fluent in Russian.

I guess what is your definition of a good level of literacy? Someone who is fluent and can take part in conversations on a wide variety of topics and read and understand just about anything they pick up (other than a highly specialized tech or science book)? I would think 140,000 is way more than you would need.

Again, not really sure, just thought I’d throw a couple of datapoints in for further discussion by others who probably know a lot more than I.

By good level of literacy I mean to be able to read content and literature on a native level. How this would translate into fluency in the real world is a different question.

I think measuring literacy based on known words will have a huge amount of variance between languages.

I’m currently close to 36k known words for Swedish. I’ve been comfortably reading fairly serious books since about 15-20k known words. This includes novels written for adults and pop psychology books. This is after 800k+ words read. Swedish is not very inflected compared to Slavic or Romance languages. Also, much of the more complicated technical vocabulary is compound words or basically the same as in English, which in some ways makes reading more complex texts easier than more basic ones.

At 31k known words in Korean, novels for children are still a slog and I come across tons of new words anytime I try to read simple news articles. Subtitles for K-dramas are just starting to feel at an okay difficulty. This is after 220k+ words read. I’m hoping by the time I’m at 500k words read in Korean things will be going smoother and I’m comfortably reading more complex books, but it wouldn’t surprise me if I needed to be at 100k+ known words, 1.5M words read or more for my Korean to feel similar to where my Swedish is today. Since the more complex words are much more likely to have native Korean or Chinese roots than English/Latin, I don’t anticipate a similar speedup with bigger words.

I am at almost 61,000 known words in Czech now, but many more complex texts still have up to 5-6% unknown words in them. By that I mean the share of words that I LingQed in a lesson after reading it and marking all the words I actually know. I often hear that the first 3,000 words cover 95% of all speech. From that I can only surmise that the 61,000 words as counted on LingQ don’t amount to a lot when it comes to word groups. I would also assume that a big chunk of the words are only recognizable to me in written form, in context. So the 61,000 known LingQ words might convert to only 2,000 word groups that I actually know in real life, away from the computer. So even though I have almost twice the number of words that LingQ considers highly advanced, I am actually barely scratching B1.

On the plus side, reading is becoming easier and easier. I am just about to finish a 550-page novel in Czech, which is becoming quite a comfortable activity, even if it still takes me much longer than it would in English or my mother tongue, German.

When I think back to my first steps in this language, when I did not understand a single word and everything about the language frustrated me, and compare that to now, when I am reading novels and watching television and deriving genuine entertainment from it, I have to say that it has been a hugely satisfying journey.

I think those statements where they say X # of words covers 95% or 80% of the language are misleading. I think they really mean that you will recognize 95% or 80% of the words you read in everyday text (news, pop novels, etc). However that last 5% or 20% is not 5 or 20% of the rest of the language. It’s far more than that. By that I mean…let’s go with your numbers that 3000 words covers the 95% most used words. That remaining 5% is not just 150 or so words…It’s more like 10,000 (maybe more or less who knows). That 5% that you don’t understand is a bunch of rare interchangeable words that don’t appear frequently.
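To make that concrete, here is a minimal sketch, assuming word frequencies follow an idealized Zipf distribution (frequency proportional to 1/rank). The vocabulary size of 100,000 and the exponent of 1.0 are assumptions picked for illustration, not measurements of any real language, so the absolute numbers it prints mean little. The point is only that each extra percentage point of coverage costs more words than the last.

```python
def zipf_coverage(vocab_size: int, exponent: float = 1.0) -> list[float]:
    """Cumulative share of running words covered by the top-k words,
    ASSUMING frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    coverage, running = [], 0.0
    for weight in weights:
        running += weight
        coverage.append(running / total)
    return coverage


def words_needed(coverage: list[float], target: float) -> int:
    """Smallest number of top-ranked words whose coverage reaches target."""
    for k, share in enumerate(coverage, start=1):
        if share >= target:
            return k
    return len(coverage)


# Hypothetical vocabulary size, chosen only for the illustration.
coverage = zipf_coverage(vocab_size=100_000)
for target in (0.80, 0.95, 0.98):
    print(f"{target:.0%} coverage needs about {words_needed(coverage, target):,} words")
```

Real word-frequency lists will give different figures, but the same pattern of rising cost per percentage point should show up.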

I saw some “experiment” where they show you in English (my native tongue) what it means to understand 80% or 95% or whatever it was. It’s quite a shocker what that really means and how little you know with just 80%. Can’t remember if they had a 95% example. I should see if I can dig that website up.

Here is the article I was trying to find:

What 80% Comprehension Feels Like - Sinosplice

Thanks, those example paragraphs in English with nonsense words exactly illustrate the issue; reading that felt exactly like reading a Russian news story for me. The bulk of the sentences consist of common, everyday words that learners will know well, but it’s the specifics of the situation that require specific, less common vocabulary. So even though you know 95% of the words, you do not get 95% of the impact.

BTW, in software/product development we often refer to the final 5% that takes 95% of the time/effort.

I’m not sure what you mean by “advanced,” but I have found that completing “Advanced Level 2” in Spanish, which was reaching 33,200 Known Words (previously 32,500 for some reason), was more than enough for “fluency or potential fluency” (solid B2.)

Interesting article. That is exactly what I meant. An ever greater number of words will be needed to close tinier and tinier gaps in vocab.

So 61,000 known LingQ words got me to around 95% understanding, but the lion’s share of the known words I still need to acquire will go toward closing that 5% gap as well as possible (though it will of course never be 100%).

It follows that even a high word count of 61,000 is only a fraction of what is needed for an advanced level. So how LingQ can state that 35,500 known words are highly advanced, I don’t quite understand. I am hungry to get my count over 100,000 and higher. This website is becoming an addiction :)

I’ve read some posts saying that the levels roughly correspond to the CEFR levels (A1, A2, B1, B2, etc.). I don’t know to what degree that is true…and certainly this only applies to reading/listening as it pertains to LingQ (more so on the reading end).

I just recently achieved Intermediate 2 which might then say I’m a B2 at reading (probably not listening). I suspect I’m still more B1 than B2 based off this “self assessment”:

https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=090000168045bb52

So maybe the “Advanced levels” you can knock down a peg or two as it relates to the CEFR levels.

Anyway…in the end it doesn’t matter. You keep going to the level that satisfies what you need or want out of the language…these levels are all guides or markers (maybe just feel-good goals).

It is addictive! I agree. I keep wanting that next 1,000…next 10,000 perhaps now since that’s when my next “level” is.

Maybe this helps: I studied Russian on another account here and started reading Harry Potter at around 14k known words. It was hard, with around 25-30% unknown words, but I kept at it. I finished all the books within 6 weeks (I no-lifed it), and I remember being at around 46k before quitting, with 1.5M words read (I didn’t quit Russian, just LingQ). After that I read 7 books by Brandon Sanderson, and after the 3rd it really became easy to read; maybe a couple of words per page were new to me.
Right now I’m trying out an audiobook in Russian and it seems quite hard, but I can understand the majority of it.
Hope this was useful, as Russian should need a similar number of known words to Czech, since they are both inflected languages.

How has your experience with Russian here on LingQ been? I’m at 46k words right now and I’m comfortably reading books (mostly non-fiction).

Kinda arbitrary goal, but I am aiming to get into the top 10 on this site for French, which is around 58k words. It will take a while though, since most content has a fairly limited # of unknown words. I’m guessing it will take around 20 more books (almost all my reading content on here is books these days). That should put me somewhere between 3-3.5M words read, I think.

I am already transitioning to reading most of my content outside of LingQ tho so I am at a pretty advanced level reading-wise at this point.

Ultimately my goal is to get to the magic 98% comprehension level that allows for unassisted reading. I don’t know about other languages, but with French and German, clearing Advanced 2 got me pretty close to being able to do that. With French I was able to tackle paper books after 30K known words (not perfectly, but it was manageable). And by 40K in French I feel I’m well past 98% for most of the books I read, and I’m even able to manage Les Miserables on a Kindle, which I think is considered the most complex book in French as far as unique words go.

It’s worth noting that LingQ’s “known %” indicators are calculated differently from the academic definition. Academic studies refer to running words: if one out of every 50 running words you read is unknown (2 out of 100), you have 98% comprehension. LingQ, on the other hand, identifies the unique words within a lesson and tells you the percentage of those unique words that are unknown (not the unknown running words), and it includes proper nouns, which the academic studies dismiss.

Anyway, if you’re reading a paperback and you only have to look up 2-4 words on a page, you’re in really good shape. And 35-40K known words on LingQ will allow you to do that.
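The difference between the two measures can be sketched in a few lines of Python. This is only an illustration of the distinction described above: the tokenizer (lowercased runs of letters) and the function name `unknown_rates` are assumptions for the example, not how LingQ or any particular study actually counts.

```python
import re


def unknown_rates(text: str, known: set[str]) -> tuple[float, float]:
    """Return (unknown share of running words, unknown share of unique words).

    Tokenization is an ASSUMPTION: lowercased runs of letters, so proper
    nouns are counted like any other word.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0, 0.0
    unique = set(tokens)
    unknown_running = sum(1 for token in tokens if token not in known)
    unknown_unique = sum(1 for word in unique if word not in known)
    return unknown_running / len(tokens), unknown_unique / len(unique)


# Toy text: 11 running words, 6 unique words, only "heron" is unknown.
text = "the cat saw the dog and the cat saw the heron"
known = {"the", "cat", "saw", "dog", "and"}
running, unique = unknown_rates(text, known)
print(f"unknown running words: {running:.1%}")
print(f"unknown unique words:  {unique:.1%}")
```

Because common words repeat and rare ones mostly don’t, the unique-word percentage will usually come out higher than the running-word percentage for the same text.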

I am surprised by the low numbers of needed words given by you and everybody else on this site. I had a look at a finished lesson from the Czech version of Master and Margarita. The lesson has 1,068 unique words (how many are proper nouns I don’t know). Out of these I created 96 LingQs. That is 8.99% unknown words with a current count of 61,713 known words. Unassisted reading would not be possible. With 40k known words it would be incomprehensible. I am still creating LingQs left and right even with this high number of known words. Not that I am complaining. I want to keep learning and am happy that all my material keeps giving me loads of input. I have no desire to get complacent with overly comfortable material, but when people tell me that with 30-40k they feel good to go and have reached fluent literacy, I still wonder why my experience is so different from theirs.

Your experience is different because the language is different. From what I remember, it is squarely in line with what others studying the Slavic languages have reported.

When I wrote my reply a few days ago I was only speaking to my experiences “in Spanish.” And what others like t.harangi have commented for French seems to jibe with that.

My personal experience in Spanish, and my opinion based on others’ experiences in other languages over the years, is that the LingQ Avatar targets are only reasonable for Romance languages. They are decent, but were formerly better, for English too. The reason is that the staff at LingQ have changed the thresholds up and down to correspond to more attainable levels when someone is doing a 90 Day Challenge or other LingQ-focused activity. I don’t think this was a good idea.

Define what your goal is. For most people, and me, the goal was fluency. That comes with speaking and writing a lot. To get there you need a base of passive vocabulary (“Known Words” in LingQ Land) that you can more or less understand in context and which you activate during the speaking and writing stages.

In English, that number is somewhere between 10-15,000 words. In Romance languages like Spanish and French, that is 20-30,000 words. In highly inflected languages, like Russian and presumably Czech, that number is 90-100,000 words based on learners I’ve spoken to. I remember ftornay and jaliscostate telling me this, and that something special, which I can’t remember, happens around 60,000. I saved that advice, but don’t have my notes handy. The latter is not active much anymore, but ftornay and khardy are here, so try them. If it helps, I remember that when Master Steve was learning Czech (watch his 5 Days to Fluency videos) he said that at 28K known words and 20K LingQs, he was able to read newspaper articles on particular subjects he was interested in (probably politics, history, etc.).

English is the most researched language on this front. We have good data for it, but less so for other languages. I have long said, but done nothing about it, that someone needs to take the time to go through stories like “Who is She?”, Steve’s book, and other stories on LingQ that are common to many languages. They need to paste all that text into Microsoft Word or another program and come up with the number of words in the same stories across multiple languages. From there we can have at least a back-of-envelope/vaguely scientific “conversion factor.” Even if we don’t use that for the Avatar growing stages, we can at least use it for real learning milestones.

Using English as our base of 10-15K known words, we know that the conversion number is 2-3 for Romance languages. We suspect it’s 6-9 for the Slavic languages, and even greater for Korean it seems.
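The copy-and-paste idea could be roughed out in Python instead of a word processor. This is a minimal sketch under stated assumptions: the function names are hypothetical, every distinct surface form counts as its own word (roughly how a known-words counter treats inflected forms), and the two toy strings stand in for the same story pasted in two languages.

```python
import re


def unique_word_forms(text: str) -> int:
    """Count distinct lowercased word forms; each inflected form counts separately."""
    return len(set(re.findall(r"[^\W\d_]+", text.lower())))


def conversion_factor(base_text: str, other_text: str) -> float:
    """Ratio of unique word forms in other_text to those in base_text."""
    base_count = unique_word_forms(base_text)
    if base_count == 0:
        raise ValueError("base_text contains no words")
    return unique_word_forms(other_text) / base_count


# Toy stand-ins, NOT real parallel texts: the second string mimics a
# language where the same word shows up in three different forms.
base_story = "walk walk walk"
inflected_story = "walk walks walked"
print(conversion_factor(base_story, inflected_story))
```

A ratio from one short story would only be a starting point. It would need long texts and several stories before it said much about known-word targets.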

Someday I will figure this out, but if someone with more time or energy beats me to it, God bless them!

This makes complete sense. Also, when we go with your higher numbers (15k x 9 = 135k), we almost reach the 140k that I gave as an estimate for fluent literacy in my original post.
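Spelling out the multiplication for the ranges quoted above, as a rough sketch: the base range and the factors are the forum estimates from this thread, not measured values, and the names in the code are made up for the example.

```python
# Forum estimates from this thread, not measured data.
BASE_ENGLISH = (10_000, 15_000)
FACTORS = {"Romance": (2, 3), "Slavic": (6, 9)}


def scaled_range(base: tuple[int, int], factor: tuple[int, int]) -> tuple[int, int]:
    """Low end times low factor, high end times high factor."""
    return base[0] * factor[0], base[1] * factor[1]


ranges = {family: scaled_range(BASE_ENGLISH, factor) for family, factor in FACTORS.items()}
for family, (low, high) in ranges.items():
    print(f"{family}: {low:,} to {high:,} known words")
```

The Slavic upper bound is the 135k figure mentioned above.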

Where did you find these conversion factors? I read something on a ling forum a while back with conversion numbers etc. and I can’t find it. Are you currently doing this copy-paste method? I wish there was a comparison like 10,000 Spanish words equals x Chinese and x German etc. These kinds of conversions would be extremely helpful.

If you read about “conversion factors” on the forum, it was probably me saying exactly the same thing (or thereabouts) as I did on this forum, because I bring it up every few years, usually when I’m wondering myself or someone else is asking.

The numbers that I cited are from my own experience with Spanish as well as other LingQ learners’ experience with their languages. I think we can be very confident with the Romance languages because of all the collective experience here, and we can have an idea for the Slavic languages as well, again from that collective experience. However, I believe we could make this a little more scientific and reliable using the copy-and-paste method I described, along with a few other things, including discussions with forum members who are experienced in these languages. I’m not currently doing this, but I will someday. If someone other than me, with A LOT more time and motivation, should take the initiative, that’d be okay by me.