I think as you approach a very large number of known words (for Russian that would be, say, above 60K - Steve has about 90K and that's enough to crack the top 10 on the leaderboard) the new words added become less and less meaningful. In many books, news articles, etc. - especially if you upload them yourself - there will be proper names, words from other languages, acronyms, etc. Eventually most of the words you add will come from these groups, rather than simply being rarer words from your target language.
This is absolutely true. The law of diminishing returns starts kicking in pretty hard once you get into the neighborhood of Advanced 3, which I think is Advanced 2 + 10K words -- something like that. This is why I decided to "Graduate" from LingQ after I reach certain thresholds, which for German was actually at 50K known words. Though of course one can keep going with it further, especially if they're working with specialty materials and vocab. But I feel that for reading books, which is what I'm primarily concerned with, there is an actual benefit to switching over to unassisted reading at a certain threshold. (Much like turning off subtitles on movies.)
Also, with all these word counts, it should be pointed out that adding proper nouns -- names of people and places -- to one's known word count can significantly inflate one's numbers after a while. I know some people do this, but when you read books, the number of genuine unique TL words you encounter will diminish, but the number of unique proper nouns can remain constant from book to book -- different locations, different people etc. So, if you mark proper nouns as "known," at some point after 25-30K, a larger percentage of the known words you'd be marking would start to become proper nouns, and thus can inflate your word count pretty significantly after a while.
All that just to say that, yes, after 40K of conscientiously marked individual words, those diminishing returns will start kicking in for sure.
I have 88,485 known Russian words. Despite this, I have found a course of 4 lessons which have 88 (25%), 66 (22%), 90 (23%) and 93 (21%) unknown words. In the 1st lesson, just 8 of the 88 unknown words are names; all the others are meaningful Russian words.
I still have lessons with more than 25% unknown words in German, despite my 69,635 known German words.
I remember a Russian course with interesting topics recorded on radio. Each lesson there was 40+ minutes long and had hundreds of unknown words, even though I had marked hundreds as known in every previous lesson of the course.
So, this is where it gets interesting, because LingQ calculates the percentage of unknown words in a lesson differently than the academic readability studies do. If you have a lesson of a total of 2,000 words, it's not calculating the unknown % against the 2,000 but against the unique words within the 2,000, so if it says 88 (25%), that means that there were 352 unique words in the 2,000 running words (88 = 25%, so 100% = 88 x 4 = 352).
BUT, academic readability is calculated against running words, so 88 = 4.4% of 2,000 -- and if you take out the 8 names you end up with a clean 4%, meaning you had a 96% readability of that lesson, and with German, my bet would be some of those will be easily understood compound words, getting you closer to the 98% readability mark.
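For anyone who wants to redo this arithmetic with their own lessons, here is a rough Python sketch. The function names are made up for illustration, the 2,000-word lesson length is the hypothetical figure from the post (LingQ's stat does not show it), and it makes the same simplification as the post: each unknown word is counted once in the running text.

```python
def unique_words_from_stat(unknown: int, unknown_share: float) -> int:
    """Back out the lesson's unique-word total from a stat like '88 (25%)'."""
    return round(unknown / unknown_share)


def running_word_coverage(unknown: int, running_words: int) -> float:
    """Share of running words that are known, assuming each unknown word occurs once."""
    return 1 - unknown / running_words


unknown = 88           # unknown words shown for the lesson
names = 8              # proper names among the unknown words
running_words = 2000   # assumed lesson length, not a real LingQ figure

unique_total = unique_words_from_stat(unknown, 0.25)
coverage_all = running_word_coverage(unknown, running_words)
coverage_no_names = running_word_coverage(unknown - names, running_words)

print(unique_total)                   # 352
print(f"{coverage_all:.1%}")          # 95.6%
print(f"{coverage_no_names:.1%}")     # 96.0%
```

If unknown words repeat within the lesson, the real running-word coverage will be lower than this estimate, so treat the result as an upper bound.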
There are of course many reasons and benefits to continuing with LingQ past the Advanced stages, but I've always been interested in the number crunching aspects of it. And my main goal has always been reaching the unassisted reading levels.
Thanks for sharing this, I like how you think because you're searching for real conversion on what you do which is exactly what I'm trying to figure out now for my learning process.
I'm at 500 words for German, so I need two more zeros (00) to reach you. Ha ha.
But to be honest, I'm not even interested in the numbers but just in the conversion process and in reaching the level I want for each language I practice.
So, keep sharing your strategies and intuitions as they could be really interesting to read.
Thanks for the input, that's indeed very motivating. Your feedback also perfectly fits my own experience at LingQ: massive exposure is the way, keeping hammering content exposure is the key to language acquisition.
Moreover, I completely agree with you regarding targets: they seem to be totally off when it comes to inflected languages, especially Slavic languages. I reached the 30,000 milestone in Russian today and I feel nowhere near a C1 or C2 comprehension level, even though I am starting to understand more and more. My comprehension has skyrocketed during the last few months, and I have been adding 700 to 1,000+ known words a day. I guess we should shoot for around 100,000 words for an actual C1 comprehension level in Russian.
In my opinion LingQers' experience and feedback are more accurate and relevant. A way higher word count for upper targets, namely Advanced 1 and 2, would be more precise and motivating. I get extremely motivated by seeing people reaching high milestones and describing their language evolution, it really stimulates emulation.
Viel Erfolg beim weiteren Spracherwerb!
I don´t understand how your activity rating can be over 44K, looking at your stats. Mine was about 13K when I learned about 1K words a day for 3 weeks straight and listened a fair bit too. Wonder how it´s calculated.
For anyone comparing read words to known words and trying to figure out what is normal and what is believable, this is not a simple thing. You can't just compare your ratio to someone else's and say it's unbelievable if they are vastly different. It depends on many factors. I could think of at least the following, each of which, the higher it is, would increase your ratio of known words to read words.
1) How much of the language did you know before studying it on LingQ? 2) How much alike is this language and other languages you know (and to which degree do you know them)? 3) How talented and experienced are you in language learning? 4) How well can you concentrate when you study? This both relates to your ability (talent) to concentrate and whether there are distractions like background noise (are you alone at home in peace or doing it on the train on your way to work?). 5) How good is your memory? (Technically this is really a part of number 3 anyway). 6) How much are you listening, speaking and writing the language and how much are you reading it outside of LingQ?
Now I was already fluent in 5 languages (Icelandic, English, Danish, Swedish, German) in November when I started using LingQ and I knew enough French to read a book like Le Petit Nicolas. I studied French and then went full on into Dutch at the start of April with almost no experience and about 600 known words from looking at it briefly in LingQ.
My French is now 46K known words, having read about 1.43M words.
My Dutch is now 26K known words, having read about 470K words.
How can my Dutch known words ratio be so much higher than the French, even when I had learned no Dutch prior, but plenty of French? Well mostly cause Dutch is like an alternative form of German with some English, Nordic words and French words thrown in there. I almost knew it before studying it.
Are my known words to read words ratios strange or unbelievable? Only if you have fixed ideas of what is normal in that category, I think.
Actually, I was a little sleepy when I wrote that post and there are some mistakes in it. The biggest reason why German will have more "words" to learn than most other European languages is probably that German likes to make compound words where many other languages don't, e.g. "Familienkrise" (DE) but "family crisis" (EN).
Also, the fact that I have a much higher known/read word ratio in Dutch is not only because of how similar it is to other languages I already knew, but also because the ratio will probably drop after you have learned a high enough number of words. The ratio of unknown words will simply start dropping the more words you know. You will also pick up commonly used words first for the most part, because they will appear repeatedly in your reading material. After that, fewer and fewer of the words you don't know are common and more of them are rare, which makes them harder to learn, because they don't repeat much.
I have 9,646 known words in Italian having read 343,733 words.
For Dutch, it's 8,382 known words having read 397,636 words.
I'm a native German speaker and I also speak English fluently. I've also been learning French and Spanish for many years, but I'm certainly not fluent in them. And yet my Italian ratio is better. Why? Dutch doesn't have as much material, so I've been using Intermediate 2 or Advanced 1/2 material for quite some time, whereas I'm still using Intermediate 1 material for Italian. Lessons tend to be shorter (I review words more) and easier. Also, I'm usually reading literature in Dutch these days. If I know what a word means, but the spelling is outdated, I just ignore the word.
I find that I understand the majority of Dutch words in writing the first time I see them.
Interesting that you mention outdated spelling and that would explain a lot. I quite often come across Dutch words in old literature like Anna Karenina or Don Quichot that google translate doesn´t have a translation for at all or has a translation that can´t possibly be correct for the text I read it in, but I understand the word because of how similar it is to a word in another language I know.
I came across this text in Anna Karenina for example: ".... was een grote, knappe jongen, met op zijn hoofd een Schotse mute, ..." and mute (NL) is just translated to mute (EN) / stumm (DE). But to me this was obviously just some Dutch version of the German word "Mütze" (cap in English) and even more obviously you don't carry a "Scottish inability to speak" on your head.
Funny, I'm currently reading Anna Karenina, too. I read somewhere that they had a spelling reform in the 1930s or 40s, that would mean that it's not very likely to find free classics using modern spelling.
Yes, even when I adapt the spelling of some words, I don't always find them in dictionaries. I don't recall which work it was, but one work was so full of them that it almost became too much.
But even if I understand them just fine, if the spelling isn't the same, I'll only mark the word as familiar. In the past, I'd only mark it as recognised. Do you mark them as known then?
The version of Anna Karenina I´m reading is not the one available in LingQ in Dutch. I found that one in a course called "novels" and it doesn´t have all the chapters. What I´m reading now I found on the internet in a pdf and I´ve been importing parts of it into LingQ by copying the text. That translation is not the same as the one available in LingQ and they certainly do not match.
I mark these words as known, to answer your question. I mostly try to mark words from other languages with the x - do not count or show, but I usually mark personal names and names of places as known words. That in itself is not necessarily completely off if you think about it. With personal names, at least you know it is a name and even a little more about it. I know that Klaus is a name, that it's probably German and almost certainly a man's, to name an example. Names of places may or may not be unique to the language. It seems ridiculous to mark "Washington" as a known word in German, French, Dutch, Norwegian etc., but what about places that have unique names in different languages, like for example: The Netherlands (EN), Nederland (NL), die Niederlande (DE), les Pays-Bas (FR), Holland (IS), or New Zealand, Nieuw-Zeeland, Neuseeland, Nouvelle-Zélande, Nýja Sjáland.
Even compound words, which I will mark as known all the time, can have different meanings than the sum of the words they are made from, like how "Ohrfeige" (DE) does not mean ear-fig. If you think about it you also sometimes lose known words (especially in languages that don´t do compound words so often) when two words or more in sequence don´t mean just the meanings of the separate words combined, for example you don´t necessarily know what a "hot dog" is when you know the meanings of "hot" and "dog".
I see, thanks.
Yes, I also tend to mark loan words with the x, unless I don't know the meaning. I learnt that "fast food" is used for a fast food restaurant in French. Since that wasn't transparent, I added it. With personal names, I ignore them if they are identical to versions I already know. I'll ignore José in Portuguese because I already know it from Spanish, but I'll add personal names where I didn't know before what they mean. Country names I add if they differ from German.
I agree about compound words, in Romance languages also because you can't be sure how exactly they'll word it. Even if the compound has an equivalent in e.g. German, they'll word it differently.
I think you put a lot of thought into which words you mark in which way and are quite concerned to measure your known words correctly. Seems like a very German thing to do to me.
I don´t put that much thought into it, since I´m here to learn, not so much to precisely measure how many words I know and how well I know them, although that is a nice bonus to monitor progress (in literacy at least). I just accept that the total number will be somewhat inflated and is not to be taken too literally.
Nonetheless I sometimes find myself getting carried away a bit in a competitive spirit in trying to get as many known words as I can. That probably gives me something of a positive bias in when to mark words as known, but mostly I feel it makes me neglect listening in favour of reading.
I can't reply to your other post, so I'll do it here:
Haha, no, I'm not really that consistent, actually, I often hesitate between recognised and familiar enough and I'm not always consistent when it comes to making them known. But I do like the stats and I don't like to inflate them as they help me stay motivated even when I feel like not progressing at all. But I also noticed that being too strict started to have a negative effect on my motivation, so I haven't found the right balance yet.
I prefer reading over listening, too, so I have to make sure I don't neglect it.
I actually completely overlooked one thing that relates to read / known words ratio and it´s how much you review your lingQs, probably because I almost never do it myself. That should of course push the ratio up the more you do it.
Extremely inspiring! Well done and thanks for the post. I am currently learning Russian and improving my knowledge of French and German. I have set my words read per day so that I should be at 1.5 million words read in all 3 languages by the end of next year. I am hoping this will get me near to the Advanced 2 stage. I am particularly inspired by how well you understood the films with a vocab of 50k, I will definitely set my sights on this within the next few years.
On LingQ the person with the highest amount of known words is around the 200K area.
The person with the highest known words in French is only in the 100K area.
This makes me feel like LingQ's milestone numbers are off.
To reach Advanced 2 in German on LingQ takes 30,250 Known Words. But French takes 33,200. I wonder how LingQ came up with these numbers to represent milestones.
They're off. I've read or heard somewhere that these 30.250 and 33.200 known words are somewhat arbitrary numbers, since it's hard to evaluate what is Advanced 2 and how many words in each language are necessary to reach it, but people need targets and these values help.
All of the languages that have declensions/conjugations or other complex features (e.g. compound words) should have higher target value for Advanced 2 (e.g. Italian should have higher set value than English, but lower than German or Russian).
There's a caveat, though. If LingQ puts a more realistic 120.000 known words target for Advanced 2, how many LingQers will actually reach this level? Lots of people will be demotivated to see how far they should go, so these modest values of 30k words help to get people going, and then they understand that the values are off, but keep working towards their goals.
I understood that the numbers are wrong when I passed the Intermediate 2 level on LingQ and I was nowhere near that level in reality. Then I saw Steve's video where he says that at 30-35k in German you can start speaking quite comfortably. What I heard him saying was: "Your speaking level will be B1/B2 at 30-35k". And that's true for me.
I'd have to disagree with the Advanced 2 milestones being off. In my experience, clearing Advanced 2 here on LingQ will put you in the minimum comprehension levels for a C2 exam. It will also allow you to read paper books with manageable ease, and you can start understanding movies without subs around the same time -- and movies are usually the hardest thing to comprehend in a language.
I don't think you need to mark 120,000 words for these. I think if you can read and watch movies without assistance, you're Advanced 2, everything else is just gravy.
Also, I'm a big fan of Steve, but him saying "Your speaking level will be B1/B2 at 30-35k" -- I'd take this with a grain of salt. First off, it's really hard to tie speaking ability to known word counts alone. Fluency is a combination of comprehension and practice, and 30-35K with a lot of listening and regular speaking practice can easily make you fluent past the B2 level.
I exaggerated with 120,000 :) I just do not know where the C2 threshold is for me, somewhere between 80k and 120k, but the Advanced 2 milestone was more like a confirmation of B2 reading & listening ability. At least for me. Only now do I feel like I'm nearing C1 in passive knowledge.
Sorry, my writing caused confusion. Steve said in one of his old videos that at 30-35k in German you can start speaking quite comfortably. In my head I interpreted what he said as "Your speaking level will be B1/B2 at 30-35k", since at the time I saw that video I was at 25k and I felt I had reached B1.
The reason for this is that some languages have more cases/declensions/conjugations than others. Thus it's much easier to reach these high numbers of known words for such languages. German has different and diverse forms of the same words because of cases/conjugations to a larger extent than French does, although the fact that LingQ will also count homme, l'homme and d'homme as 3 words does push the "known words" number up quite a bit for French. So yes, you are right that the LingQ bars for each level in each language are sometimes off.
Mark, the funny thing is also that I looked at the person with the highest amount of French words and it´s a native French speaker. Not so much learning French as just documenting how many words they know in their native tongue. I remember that the person who was #1 in known French words previously had 80K+ words, then all of a sudden the current leader with the 100K+ words started on LingQ and very quickly got to the 100K (think they had 30K+ in a week once).
Congrats on the milestone and, I have to say, my experience with Spanish has been very similar to what you described above. I have reached about 60k known words in Spanish and I'm still adding known words regularly, so the plateau really hasn't been a factor yet. I have read about 4,000,000 words in Spanish, so the overall ratio is about 66:1 read/known words, although progress has been slowing down recently. These days it takes about 20,000 words read to gain 100 known words, so the current ratio is 200:1, but my known word count is still increasing! Can't wait to chase your stats in German one day.
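The two ratios in the post above measure different things: one is the lifetime average, the other is the current rate. A quick Python sketch with the figures from the post (the function name is just for illustration):

```python
def read_per_known(words_read: int, known_words: int) -> float:
    """Words read for each known word gained over the same period."""
    return words_read / known_words


# Lifetime figures: about 4,000,000 words read, about 60,000 known words.
cumulative_ratio = read_per_known(4_000_000, 60_000)

# Recent figures: about 20,000 words read per 100 new known words.
recent_ratio = read_per_known(20_000, 100)

print(f"cumulative: {cumulative_ratio:.1f}:1")  # cumulative: 66.7:1
print(f"recent: {recent_ratio:.0f}:1")          # recent: 200:1
```

A recent ratio well above the cumulative one is what diminishing returns look like in the stats: each new known word now costs about three times as much reading as the lifetime average.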
1. "It gets easier along the way." I found that true with Russian, my first foreign language. I now recognize lots of roots, the prefixes make intuitive sense, the context is clearer, etc.
2. Interesting idea. I do that regularly with English and with one or two particularly slow-paced Russian YT channels, but I've not tried it broadly with Russian.
4. It's curious that you find German farther removed from Russian than Romance languages. I've never really studied any of those, though I did have some Latin. Some Latin grammatical concepts certainly are similar to Russian. I've just started German (just ~1k words at this writing), and I've been surprised by some words that I recognize from Russian, whether borrowed or from a common PIE source: Leute, probieren, etc. On the other hand, while reading German I have been thinking that it must be a challenge for Russians. There are so many little helper words! Why use just one word when three or four, scattered throughout the sentence, will do? )) Also, the role of the words in a sentence doesn't seem as clear from their form as it does in Russian, though not as bad perhaps as in English. It's certainly easier to recognize the grammatical gender of Russian words than German.
As a native English speaker who has been studying Russian, German feels like a weird but familiar uncle. I see so many similarities: Familiar verb tenses; similar verb conjugation (if thou knowest what I mean); German separable verbs vis-à-vis English phrasal verbs; lots of cognates; the articles -- the blessed, glorious articles: It's perfectly clear whether it's a girl or the girl! And so on. I almost pity the native English speaker for whom German is his first foreign language, as he may notice mainly the differences and not appreciate the similarities.
Congratulations! It's always interesting to see how people with similar approaches have similar experiences and results. The ballpark of 2 million words for 50K is pretty much how it went for me with German. And as you said, there's no such thing as a plateau, with the reading method.
With other languages, I think the ratio of words read / known is different, just because German has so many compound words. After about 45K, most of the stuff I was encountering in German were compound words. French is different for me, I'm closing in on 40K after 2.1 million read.
Really? I checked and I have 3,220,340 words of reading, 20,000 known words, and 48,054 LingQs (terms) that are not white. I don't count a lot of compounds or genitive words though.
You are a big inspiration for us. If you can do it, then everyone can do it. We all have 24 hours in a day. So no excuses. Could you share your journey for German language? How did you start off? How did you get your hands dirty? How long have you been learning the German language? How many hours did you spend on a daily basis? Anything you would like to share.. Thanks
I have been learning since July 2018 and have used LingQ for German since February 2019.
The minimum is 2 hours a day (with a few exceptions when I am too busy). I started learning with Duolingo, Anki, and Deutsche Welle, then added Italki lessons after 3 months and unsuccessful attempts to watch movies/TV series. In Jan 2019 I noticed that I was not progressing past A2: not what I expected after learning Italian to B1+ in 10 months through speaking. I dropped all the useless tools, reduced Italki lessons from 3 to 1.5 hours/week and focused on listening and reading. And I stopped worrying that my speaking is weaker in German than in Italian even though I spent 3-4x more time on learning German.
Podcasts: Everything for A2/B1 level from evgueny40 in the beginning, then EasyGerman Podcast until it gets too easy, Finanzfluss, Der Finanzwesir Rockt, Mach es einfach.
Books: Remarque (Die Nacht von Lissabon, Liebe deinen Nächsten), Zusak (Die Bücherdiebin), Exupéry (Der kleine Prinz), Lindgren (Karlsson vom Dach, Karlsson fliegt wieder), Gebrüder Grimm (Märchen).
After having done intensive reading and listening for a while, did you go back to attempt to watch TV series in German and how are you coping with it at the moment? Any feedback on this.
Last week I watched 3 movies to test what I'm capable of. In brackets is the comprehension level in %: Pulp Fiction (95), Ziemlich Beste Freunde (98), Der Pianist (98). All of these movies I saw in Russian in the past. What I did not like is that everything is slow compared to books, audiobooks or podcasts. The content density is low and a lot of the time you just sit and wait for heroes to get from A to B before they start talking. So, I will not use movies to learn, only to have fun and relax.
Thanks for sharing your experience.
Truly? I have a hard time believing this. Now, to be fair, I only count words that are active, and I don't count a lot of compound words or genitive forms. There are also other words I just do not count, and now I am not counting any more because I like the number 20,000. I have been doing this for 4 years, and just a month ago I was doing 8 to 12 hours a day on LingQ for German. I checked and I have 3,220,340 words of reading, 20,000 known words, and 48,054 LingQs (terms) that are not white. I have classes and such and have long since been able to understand most of what I hear, so it just seems weird.
For me the known words are the ones I recognize. So I mark all of these as known: all conjugations, all compound words, singular/plural forms. The exceptions are personal names and places - these I put into Ignored. It means that if I tried to count only roots, my "real" word count could drop into the 5,000-7,000 range. Where am I exactly? I cannot evaluate.
You, on the other hand, are very specific in how you choose your known words. For this reason your 20,000 could be much closer to your "real" word count.
There's so much variety in how people could use LingQ that it's hard to compare results, since we all have our own idea of how to learn the language.
I think marking compound words and conjugations is totally fine. On LingQ, we count individual words, which I think is a better way of doing it than marking root words. And that is part of the reason why the thresholds are in the 20-30K range, because you need to mark that many individual words in order to truly understand the ~10K root words that they're based on.
Hey Oxygen. If you don't count compound words, names, conjugations, declensions etc., your word count will not make sense if you compare it to that of others who mark all of these as known, which I suspect is the case for most people on LingQ. People also have different standards for when they mark words as known. Some will do it as soon as they sort of recognise them, others only when they really know them well. I couldn't possibly tell you if it's strange that your word count is so much lower than Serge's when you both seem to describe yourselves as being on a similar level. Many things may play into it and you may not actually be on similar levels in reality.
As you say, there is a huge difference between passive and active vocabulary. Many words that I know in context, I would never be able to produce in speech. Also, I wish LingQ didn’t count each different conjugation of the verb as a separate “known” word. I could really boost my word-count by uploading some conjugation tables. LOL
It is the same with every language. Our passive vocabulary is larger than active vocabulary. However, with passive vocabulary you can cherish wonderful literature and enjoy complex conversations. I know some people are very fluent with their limited vocabulary however they can not enjoy the same wonderful literature with limited vocabulary and avoid reading books altogether.
Congratulations and thanks for an interesting post. I'm also learning German, although nowhere near 50k (approaching 6000 words). I'm pretty much using the same method as you are, and I feel the same way about German :)