Feature request: the ability to see which frequency group a word belongs to

mentecuerpo · February 8, 2020, 1:36am

For example, look at this paper by Nation.
How much input do you need to learn the most frequent 9,000 words?
http://nflrc.lll.hawaii.edu/rfl/October2014/articles/nation.pdf

Group   Words in group
1       1,000
2       1,000
3       1,000
4       1,000
5       1,000
6       1,000
7       1,000
8       1,000
9       1,000
10      1,000

It would be nice if every word in the top languages could be assigned to a frequency group from 1 to 10, with group 10 being the lowest-frequency word list.

For example, the LingQ team could use ten books in every language as a baseline to build the frequency list, or use any other method. Then learners could see which group a word belongs to.
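A minimal sketch of the idea, assuming plain-text books as input: count every word across the books, rank the words by count, and put ranks 1 to 1,000 in group 1, ranks 1,001 to 2,000 in group 2, and so on up to group 10. The function names and the simple letters-only tokenizer are illustrative assumptions, not anything LingQ actually uses.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase, keep runs of letters (accented letters included).
    return re.findall(r"[^\W\d_]+", text.lower())

def frequency_groups(texts: list[str], group_size: int = 1000,
                     num_groups: int = 10) -> dict[str, int]:
    """Map each word to a frequency group: 1 = most frequent, num_groups = least."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    groups = {}
    # most_common() sorts by count, highest first; ties keep first-seen order.
    for rank, (word, _) in enumerate(counts.most_common()):
        group = rank // group_size + 1
        if group > num_groups:
            break  # words beyond the top group_size * num_groups get no group
        groups[word] = group
    return groups

books = ["Il gatto dorme. Il cane dorme.", "Il gatto mangia."]
print(frequency_groups(books, group_size=2))
# {'il': 1, 'gatto': 1, 'dorme': 2, 'cane': 2, 'mangia': 3}
```

With real books you would keep the default group_size of 1,000; the tiny group size here just makes the grouping visible.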

The problem I have with Italian, for example, is that my unknown words are all low-frequency words, and it would help me to know just how low in frequency each one is.

I think the LingQ developers could add this information to learners’ databases.

Thanks.

aronald · February 8, 2020, 3:08pm

A nice feature would be to show us the % coverage of each word, so we could just add them up to get a total coverage. I thought about doing this myself, but LingQ won’t give me my known-word list.
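The arithmetic behind that suggestion is simple: a word’s coverage is its share of all running words in a corpus, and your total coverage is the sum of those shares over the words you know. A small sketch, assuming you already have word counts from some corpus and a known-word set from somewhere (which, as noted, LingQ does not export); both inputs here are made up.

```python
from collections import Counter

def word_coverage(counts: Counter) -> dict[str, float]:
    # Percentage of all running words (tokens) accounted for by each word.
    total = sum(counts.values())
    return {word: 100 * n / total for word, n in counts.items()}

def total_coverage(counts: Counter, known_words: set[str]) -> float:
    # Add up the per-word percentages of the words you know.
    cov = word_coverage(counts)
    return sum(cov[w] for w in known_words if w in cov)

counts = Counter({"the": 50, "cat": 30, "sat": 20})
print(total_coverage(counts, {"the", "sat"}))  # 70.0
```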

A roundabout way to figure this out yourself would be to load separate text documents that have words organized by frequency. So, get a 50k word list here:

https://en.m.wiktionary.org/wiki/Wiktionary:Frequency_lists

sort it by frequency, split it into separate text files by frequency band, and load those into LingQ to see what percentage of each group you know.
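One way to do the splitting, sketched under an assumption: that each line of the downloaded list holds a word followed by its count (for example `ciao 12345`). Check your file and adjust the parsing if it differs. The file and folder names in the example are hypothetical.

```python
from pathlib import Path

def chunk_frequency_list(lines, group_size=1000):
    """Turn 'word count' lines into lists of words, group_size words per group."""
    entries = []
    for line in lines:
        parts = line.split()
        # Assumed format: word first, count last; skip headers or malformed lines.
        if len(parts) >= 2 and parts[-1].isdigit():
            entries.append((parts[0], int(parts[-1])))
    # Most frequent first, in case the file is not already sorted.
    entries.sort(key=lambda e: e[1], reverse=True)
    words = [word for word, _ in entries]
    return [words[i:i + group_size] for i in range(0, len(words), group_size)]

def write_group_files(groups, out_dir):
    """Write group_01.txt, group_02.txt, ... to import as separate lessons."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for number, words in enumerate(groups, start=1):
        path = out / f"group_{number:02d}.txt"
        path.write_text("\n".join(words) + "\n", encoding="utf-8")

# Example (hypothetical file names):
# lines = Path("it_50k.txt").read_text(encoding="utf-8").splitlines()
# write_group_files(chunk_frequency_list(lines), "italian_groups")
```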

I’ve loaded the top-50k word lists into my lessons to see how many of the words I know come from those lists. It’s nice to watch the number of unknown words keep dropping.

mentecuerpo · February 8, 2020, 4:38pm

Great idea.

Before I load them into LingQ, I can split them into groups of 1k words each and give each group a title, GROUP #, so I can see at a glance which frequency group a given word belongs to.

I can put all the words into one big text for a single upload (I wonder if it will take all 5k?), or create five separate uploads of 1k each.
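For the single-upload option, a short sketch that builds one text with a "GROUP #" heading before each block of words, assuming the word list is already sorted from most to least frequent. One caveat: LingQ will most likely treat the word "GROUP" in the headings as a word of the lesson too.

```python
def grouped_text(words: list[str], group_size: int = 1000) -> str:
    """One text with a 'GROUP n' heading before each block of group_size words."""
    sections = []
    for i in range(0, len(words), group_size):
        number = i // group_size + 1
        body = "\n".join(words[i:i + group_size])
        sections.append(f"GROUP {number}\n{body}")
    # A blank line between groups keeps them visually separate.
    return "\n\n".join(sections)

print(grouped_text(["di", "e", "il"], group_size=2))
```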

aronald · February 8, 2020, 6:28pm

I would probably do both. LingQ usually breaks text into parts of 2,000 words, but for some reason some of these 50k-word uploads stayed as one lesson. I never actually open these lessons because I like knowing how many of the words I’ve come across naturally through reading, and I don’t want to throw off my words-read count. Also, it’s interesting to watch how the number of blue and yellow words changes over time.

Your original article is based on word families, while the wiki page lists word forms. So just use % coverage and create groups based on that.
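A sketch of grouping by coverage instead of by fixed 1,000-word bands: walk down the frequency list, keep a running total of how much of the text the words cover, and start a new group whenever that total passes the next threshold. The thresholds (80, 90, 95, 98, 100 %) are hypothetical choices, not values from the paper or the thread.

```python
from collections import Counter

def coverage_groups(counts: Counter,
                    thresholds=(80, 90, 95, 98, 100)) -> dict[str, int]:
    """Group 1 = words that together cover the first 80 % of running text, etc."""
    total = sum(counts.values())
    groups = {}
    running = 0
    group = 1
    for word, n in counts.most_common():
        running += n
        # Move to the next band once this word pushes coverage past the current cap.
        # Comparing running * 100 with threshold * total avoids rounding errors.
        while group < len(thresholds) and running * 100 > thresholds[group - 1] * total:
            group += 1
        groups[word] = group
    return groups

print(coverage_groups(Counter({"a": 8, "b": 1, "c": 1}), thresholds=(80, 90, 100)))
```

A word that straddles a boundary goes into the band where its cumulative coverage ends.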

Also, keep in mind that those wiki lists are based on movie subtitles, so they include proper nouns. You can go through the list and remove them, or just leave them in and remember that your actual coverage will be slightly better than what LingQ shows.

t_harangi · February 8, 2020, 7:55pm

One problem is that the paper you quote talks about word families, while LingQ and the Wiktionary frequency lists use individual words as their unit of measurement, which is the better way to do it.
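To see why the two measures give different numbers, here is a toy comparison. The small form-to-family mapping is hypothetical and hand-made; real word-family lists are far larger and follow specific affix rules. The same text contains five word forms but only two families.

```python
from collections import Counter

# Hypothetical mapping from word forms to a family headword.
FAMILY = {"run": "run", "runs": "run", "running": "run", "ran": "run",
          "happy": "happy", "happily": "happy", "happiness": "happy"}

def count_units(tokens):
    # Word forms: every distinct spelling counts separately.
    forms = Counter(tokens)
    # Word families: forms are folded into their headword (unknown forms stay as-is).
    families = Counter(FAMILY.get(t, t) for t in tokens)
    return forms, families

forms, families = count_units(["run", "ran", "running", "happy", "happily"])
print(len(forms), len(families))  # 5 2
```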

Ultimately, the answer you seek is in the paper you quote: if you keep track of your total words read, you should have a decent idea of where you stand on the frequency scale.
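A rough way to turn a words-read total into a position on the frequency scale, using Zipf’s law as a swapped-in approximation (not Nation’s method, which works from real texts): the word at rank r makes up about 1 / (r · H_V) of running text, where H_V is the harmonic number for a vocabulary of V words. The vocabulary size and the number of encounters you consider enough to learn a word are hypothetical parameters, and real text only roughly follows Zipf’s law.

```python
def harmonic(n: int) -> float:
    # H_n = 1 + 1/2 + ... + 1/n
    return sum(1.0 / i for i in range(1, n + 1))

def expected_encounters(rank: int, words_read: int, vocab_size: int = 50_000) -> float:
    # Zipf's law with exponent 1: share of running text for rank r is 1 / (r * H_V).
    return words_read / (rank * harmonic(vocab_size))

def deepest_rank(words_read: int, min_encounters: int, vocab_size: int = 50_000) -> int:
    # Largest rank r whose expected encounters are still >= min_encounters.
    return min(vocab_size, int(words_read / (min_encounters * harmonic(vocab_size))))

# How far down the frequency list would 1 million words of reading reach,
# if you needed (hypothetically) 10 encounters per word?
print(deepest_rank(1_000_000, 10))
```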

But I think that with an input-based method, it’s a bit pointless to worry about where words land on the frequency scale; you will learn them in the order you encounter them.

And in a practical, non-academic sense, it’s kind of pointless to have 10 different levels. In real life you really only have 3: the first 3,000 word families, which get you to a conversational level; the 6k-7k range, which gets you to upper intermediate; and then everything else.
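Those three levels can be expressed as a lookup on a word family’s frequency rank. This sketch treats 7,000 as the upper edge of the middle level; that cutoff is an assumption, since the post gives a 6k-7k range rather than one number.

```python
from bisect import bisect_left

# Upper edges of the first two levels, as word-family ranks (7,000 is assumed).
EDGES = [3000, 7000]
LABELS = ["conversational", "upper intermediate", "everything else"]

def level_for_rank(rank: int) -> str:
    # bisect_left returns 0 for ranks <= 3000, 1 for 3001..7000, 2 above that.
    return LABELS[bisect_left(EDGES, rank)]

print(level_for_rank(2500), "|", level_for_rank(6500), "|", level_for_rank(12000))
```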

And LingQ will tell you where you are on the Beginner-to-Advanced scale based on your total known words.