Finnish Parser

As someone who uses LingQ almost exclusively for Finnish, I am struggling with the lack of Finnish parsing, specifically the failure to identify the same word in its different forms. For a language as morphologically rich as Finnish, inadequate parsing renders the whole concept of making LingQs to review later almost useless.

Since any noun in Finnish can theoretically appear in over 100 different forms, albeit with varying frequencies, the lack of any identification of words in different forms greatly impacts the ease of use as well as the utility of LingQ as anything beyond an automated pop-up dictionary for Finnish.

I understand that making an accurate parser for Finnish is no easy task, but there are a number of dictionaries currently available that could be used to identify certain forms. For example, at www.sanakirja.fi one can type in any conjugated verb or inflected noun and the base form of that word will be suggested. This seems to be one of the more accurate sources that I’ve found. www.wiktionary.org has similar functionality, although it is by no means perfect and is less accurate than the subscription-based resource above.

Beyond making LingQs much more useful, a decent Finnish parser would open the door to many more features that are currently implemented for other languages at LingQ. One such feature is frequency analysis. As it stands, without any identification of Finnish forms, frequency analysis is useless at best and impossible at worst, since the large number of forms will likely make even extremely common words seem much rarer than they actually are.
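To make the frequency point concrete, here is a minimal Python sketch. The `LEMMAS` table is hand-written and purely hypothetical (it is not anything LingQ or a dictionary site provides); a real parser would get these mappings from a morphological analyser. It only shows how counts change once forms are grouped under their base form.

```python
from collections import Counter

# Hypothetical hand-written lookup: inflected form -> base form.
# A real system would derive this from a morphological analyser or dictionary.
LEMMAS = {
    "koira": "koira",
    "koirassa": "koira",
    "koiralle": "koira",
    "koiraan": "koira",
    "koirissa": "koira",
    "talo": "talo",
    "talossa": "talo",
}

def surface_counts(tokens):
    """Count every inflected form as a separate word."""
    return Counter(t.lower() for t in tokens)

def lemma_counts(tokens):
    """Count by base form; forms missing from the table count as themselves."""
    return Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)

tokens = "Koira koirassa koiralle koiraan koirissa talo talossa".split()
print(surface_counts(tokens).most_common(3))
print(lemma_counts(tokens).most_common(3))
```

Counted by surface form, every word in this toy text looks equally rare (each appears once). Counted by base form, "koira" accounts for five of the seven tokens.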

Finally, this post wouldn’t be complete without an illustration of the problem for those who don’t have any experience with Finnish.

Koira - A/The dog
Koirassa - In a/the dog.
Koiralle - To a/the dog.
Koiraan - Into a/the dog.
Koiraansa - His/her dog (partitive).
Koirissa - In (the) dogs.
Koirissamme - In our dogs.
etc.

The following meme illustrates the issue ad absurdum.

I would love to discuss the possibility of Finnish parsing at LingQ.



Hi. I study Finnish too. I understand your point. There are more words and we have to look up all of them.
But why is that a problem? What do you need an automatic parser for if immersion is meant to help your subconscious parse the text better and better? How would it help me acquire the language better?

There is also the fact that LingQ supports many languages, and Finnish is only in beta. If they decide to develop a parser, it would take a long time and come at the cost of other development. Also, should they limit this to a Finnish-only parser? Or should every language have a parser? And is it more important than, say, a good frequency analyzer for Finnish? Or improvements to the user interface and other functionality?

Although I understand your point, I don’t agree yet. I don’t see what a parser would add to the immersion method of learning Finnish.

I agree to some extent.

  • Having the option to reduce the forms of a lexeme (all forms of a word that share the same meaning, e.g. go, went, gone) to the lemma form (the one you’d use to look up a word in a standard dictionary, e.g. “go” for go, went, gone) when exporting a vocab list would be immensely helpful. As it is, I have to remove lexeme duplicates manually, and that’s not always easy when you’re new to a language and can’t always spot the lemma by looking at a word (as would be the case with the irregular verb “go” and its forms “went” and “gone” in English).

  • However, within the review process of LingQ I like that the LingQs are the individual forms, context included. It’s less a vocab learning tool than a word-in-use learning tool (less “what does ‘go’ mean” and more “so, do I use ‘go’ or ‘went’ here?”).
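As a rough illustration of the export idea, the Python sketch below groups an exported list under lemmas while keeping the individual forms attached. The `LEMMAS` table and the `collapse_to_lemmas` helper are hypothetical names made up for this example, not an existing LingQ export feature.

```python
# Hypothetical lookup table: surface form -> lemma (not a real LingQ feature).
LEMMAS = {
    "go": "go",
    "went": "go",
    "gone": "go",
    "koirassa": "koira",
    "koiralle": "koira",
}

def collapse_to_lemmas(vocab):
    """Group exported forms under their lemma, in first-seen order.

    Forms that are not in the table are kept as their own entry.
    """
    grouped = {}
    for form in vocab:
        lemma = LEMMAS.get(form.lower(), form.lower())
        grouped.setdefault(lemma, []).append(form)
    return grouped

exported = ["went", "koirassa", "go", "koiralle", "gone", "talo"]
for lemma, forms in collapse_to_lemmas(exported).items():
    print(lemma, "<-", ", ".join(forms))
```

The export ends up with one entry per lemma, but the original forms stay listed under it, so the word-in-use information is not lost.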