Readability formula: learning languages with LIX
I decided to learn Russian four years ago when, heavily pregnant, I had nothing better to do. I bought a book called Colloquial Russian off the internet and I read it. Then I turned on my computer and discovered that there were web pages in Russian. Free reading practice, there for the taking! I couldn’t understand them, though. Apart from the pictures, I had no idea whether the content would be worth the time it would take me to translate it. I needed material I was reasonably familiar with to start with.
I went to an academic bookshop in search of their Russian section. The choice was limited, but I found a copy of “The Hobbit”. I couldn’t read it yet; I could barely read the title. Nevertheless I was encouraged by the thought that, when I got good enough, there was a whole book of material I would want to read.
But when would I be good enough? The Hobbit, as I recall, was written for children. Does that mean that it is an easy read? How many months, how many years, would it take me before I could pick it up and read it comfortably?
When I discovered LingQ I was delighted. There was a vast library of authentic material, with audio and transcript, graded by natives for ease of reading. Furthermore I could import my own reading material, and it would keep track for me of the words that I knew and the words I was meeting for the first time. I’d never heard of anything so sensible.
It didn’t help me with reading my Russian copy of “The Hobbit” though. If only there was some kind of formula, some calculation you could do, using a pencil and maybe a calculator, to tell you whether the book in your hand was written in easy, moderate or difficult language.
I remember when I trained to write technical documents. We were briefly shown something called the Gunning Fog Index, which is a simple little formula to calculate how easy a document is to read. You take a sample of a few sentences and work out the average sentence length and the percentage of long words (three or more syllables). Do the sums and you end up with a number. 15 means easy to read, 25 means moderately hard to read (your reader needs to have a good school education), 30 means you had better rewrite it if you want anyone else to understand it.
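For reference, the sums can be sketched in a few lines of Python. This uses the commonly published form of the Gunning Fog Index, which scales the result by 0.4 so that the score roughly tracks years of schooling; scores on that scale may not line up with the thresholds recalled above. The function name and the idea of passing in pre-counted totals are assumptions made for illustration.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index from pre-counted totals (hypothetical helper).

    complex_words: words of three or more syllables, counted by hand.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    average_sentence_length = words / sentences
    percent_complex = 100 * complex_words / words
    return 0.4 * (average_sentence_length + percent_complex)


# A 100-word sample with 5 sentences and 10 complex words:
# 0.4 * (20 + 10), i.e. roughly 12
print(gunning_fog(100, 5, 10))
```

Counting syllables is the awkward part, which is why the sketch leaves that step to the reader.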
Maybe this would work for Russian? Then again, Russian seemed to have more long words and shorter sentences than English. Also (this was frustrating), although document readability statistics were built into the version of Word I was running, the program refused to recognise Russian text as a language and kept returning answers of zero.
Someone I raised this with (possibly on the LingQ forum) pointed out that there were several well-known readability formulas, and they were all designed to work on English language documents only. So what do the Russians use to determine how easy to read their documents are? Even Russians didn’t know.
Googling in Russian didn’t produce any results. Maybe I wasn’t using the right keywords. Maybe I was spelling them wrong.
I did eventually find, to my surprise, a result in Swedish. It turns out that, back in the fifties and sixties, a Swede called Björnsson did exactly this. He produced a readability formula called LIX, and tested it for eleven different Western European languages. He found that, although the norms are slightly different across languages, you can use the same formula to decide whether a piece of French is easy or difficult to read, as you can for a piece of Greek. You do NOT need to be able to read French or Greek to be able to use it.
Why didn’t I know this before? Because, it seems, hardly anyone was interested. Back in the sixties there wasn’t the computer power to automate the calculations, and besides, for everyone in the English-speaking world, there were already a whole handful of formulas to choose from.
Anderson, however, was interested. Despite the name, he was not another Swede but an Australian academic working in educational research. He studied the LIX and published his findings in the 1980s. In brief he found the LIX to work for English, German, French and Greek, and also proposed an alternative index: the RIX. Given that the RIX is simpler to calculate I’m guessing he didn’t have computer power back then either.
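RIX, as Anderson's index is usually stated, drops the sentence-length term and simply divides the number of long words (seven or more letters) by the number of sentences. A minimal sketch, with a hypothetical function name and pre-counted totals as input:

```python
def rix(long_words: int, sentences: int) -> float:
    """RIX from pre-counted totals (hypothetical helper).

    long_words: words of seven or more letters in the sample.
    """
    if sentences <= 0:
        raise ValueError("need at least one sentence")
    return long_words / sentences


# Example: 11 long words spread over 3 sentences.
print(round(rix(11, 3), 2))
```

With only one division and no word count, this is easy to do with a pencil.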
It looks like the LIX formula is exactly what I have been looking for. I take a sample paragraph from my Russian copy of “The Hobbit”, count the sentences, count the words, count the long words (seven letters or more, so I don’t even have to work out the number of syllables in each word), and plug them into this formula:
LIX = (number of words) / (number of sentences) + (number of long words) × 100 / (number of words)
Based on this text:
“Жил-был в норе под землей хоббит. Не в какой-то там мерзкой грязной сырой норе, где со всех сторон торчат хвосты червей и противно пахнет плесенью, но и не в сухой песчаной голой норе, где не на что сесть и нечего съесть. Нет, нора была хоббичья, а значит –”
Words: 49, Long words: 11, Sentences: 3, Characters: 223
LIX = 49/3 + 11 × 100/49 ≈ 16.33 + 22.45 ≈ 38.8
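The calculation above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names are made up, and the tokenising rules (words are runs of letters, so hyphenated words split in two; sentences end at '.', '!' or '?') are guesses that may give slightly different counts from the ones quoted above.

```python
import re


def lix_from_counts(words: int, sentences: int, long_words: int) -> float:
    """LIX from pre-counted totals: words/sentences + long_words*100/words."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return words / sentences + long_words * 100 / words


def lix_from_text(text: str, long_word_length: int = 7) -> float:
    """LIX for a text sample, using simple assumed tokenising rules."""
    # Runs of letters in any alphabet; digits, hyphens and punctuation split words.
    words = re.findall(r"[^\W\d_]+", text)
    # Each run of sentence-ending punctuation counts as one sentence (at least 1).
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    long_words = sum(1 for word in words if len(word) >= long_word_length)
    return lix_from_counts(len(words), sentences, long_words)


# The counts quoted above: 49 words, 3 sentences, 11 long words.
print(round(lix_from_counts(49, 3, 11), 1))  # 38.8

# A text sample works the same way, whatever the language.
print(round(lix_from_text("The cat sat. The elephant wandered slowly."), 1))
```

Because the function only looks at letters and punctuation, it needs no dictionary and no knowledge of the language being measured.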
Is that high for Russian? Well, that’s a good question. To understand these results we need to calibrate them. Ideally we would run everything in the LingQ Russian library through this formula, and come up with a table, for each level from Beginner 1 to Advanced 1, of the representative LIX ranges. Then I could say, with an air of authority:
“This Russian translation of The Hobbit is written in low Intermediate 2 level language. A good Intermediate 2 student should be able to read it comfortably.”
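One way such a table could be built, assuming every lesson in the library has already been scored, is to group the scores by level and summarise each group. The sketch below is only an outline: the function name is hypothetical, and the numbers in the example are placeholders, not measurements from LingQ.

```python
from statistics import median


def lix_ranges(
    scores_by_level: dict[str, list[float]],
) -> dict[str, tuple[float, float, float]]:
    """Summarise LIX scores per level as (lowest, median, highest)."""
    return {
        level: (min(scores), median(scores), max(scores))
        for level, scores in scores_by_level.items()
        if scores  # skip levels with no scored texts
    }


# Placeholder values for illustration only, not real results.
example = {"Beginner 1": [20.0, 25.0, 30.0], "Intermediate 2": [36.0, 40.0]}
for level, (low, mid, high) in lix_ranges(example).items():
    print(f"{level}: {low:.1f} to {high:.1f} (median {mid:.1f})")
```

A real calibration would need many texts per level and some thought about how much the ranges overlap.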
I don’t think I can go that far, because it would require more time and a better grasp of statistics than I have at my disposal. Nevertheless, Ilya has very kindly written me a program which should calculate the LIX for any input language. I’m currently testing it with the first chapter of each of the 7 books in J. K. Rowling’s Harry Potter series, in each LingQ language. I shall report back on the results.