Would this be helping the community?

In a previous thread, I mentioned how I got a Spanish frequency list, used Google Translate to translate a couple of thousand Spanish words, and then uploaded some of them as a .csv list. Would I be helping if I did this for languages that don’t have many already-defined words popping up when you’re LingQing? Or is the process of selecting a definition based on the context of the lesson you’re studying the more important part? I know it saves a lot of time to have a definition already there. Any opinions on whether or not this would be worthwhile? It really wouldn’t take that long to do this for a few thousand words.

I think it would help. Those who want to go through the search process would still be able to do so.

Please explain a little more about your process, R.J. Do you have to copy each meaning for each word, or how do you get the Hint for each of the thousands of words? I presume this process does not generate a phrase fragment.

We are going to (I hope) launch the expanded lesson page this week, which makes LingQing a little easier. Let’s have a look at that and then decide whether your process is worth the effort. If it is, then it is something that we may want to do at LingQ for all of the learners’ languages.

@ rjtrudel

Richard, you wrote: “In a previous thread, I had mentioned how I got a Spanish frequency list and used google translate to translate a couple of thousand words in spanish…I know it saves a lot of time to have a definition already there.”

I know it saves a lot of time because I use a system of that kind. Do you mean that you translate only a subset of the words with Google, say the less frequent words? Or, on the contrary, does Google translate only the most frequent words for you? If you have already described it in the previous thread, where is that thread? Sorry for missing it, and thanks.

P.S. I had not yet seen Steve’s comment when I was typing mine. Steve and Mark have seen my system. It automatically adds hints to movie captions using frequency lists and Google Translate. We discussed the possibility of using similar ideas in LingQ just a month ago ;-)

@rjtrudel - I think that sounds like a good idea. After all, users have the option to use your suggested hint, choose another or create their own. Phrase fragments are taken from the text they are in.

(1) I got a list of the 10,000 most common Spanish words here: Wiktionary:Frequency lists - Wiktionary. It has other languages as well, and I know there are other places on the internet to find lists in many languages. Click on the 1-1000 part. (You would have to do this in batches of 1000.)
(2) Click on the part that says printable version, then copy the list.
(3) In OpenOffice (Calc), use paste special, HTML formatted, to paste it into the first column.
(4) Delete all the cells which don’t have anything to do with the list of words.
I ended up with the 1000 most common Spanish words in the first column of an OpenOffice document.
(5) Save in .xls format (Excel 5.0). Click keep current format.
(6) Go to translate.google.com and click on translate a document.
(7) Browse to where you saved your document and then click translate. A window pops up with a table of 1000 translated words.
(8) Copy all of these, then paste them into the OpenOffice document. Delete the column that has the numbers 1-1000 in it. At this point it looks like this: http://db.tt/YG5rT8B . The only minor problem is that the Spanish word is listed in the translation as well.
(9) Set the columns up for the upload to LingQ. For the phrase fragment column I just put “none” in every cell. (So you are right, there are no phrase fragments.)
(10) Save as a .csv file, then upload. I’m not sure if your system has a limit on how many you can upload at once. I think I did 400 a couple of times.
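For anyone who would rather script the last few steps, here is a rough Python sketch of steps (8) to (10). It is not what I actually did (I used the spreadsheet), and several things in it are assumptions: the function names are made up, the three-column layout (word, hint, the word none) is just the one described in step (9), and I am guessing that the repeated Spanish word shows up at the start of the translated cell. The batch size of 400 is only the number that happened to work for me, not a known limit.

```python
import csv
import io


def clean_hint(word, translation):
    """Drop the source word when the translated cell repeats it before the hint.

    Assumption: the repeated word comes first, e.g. "casa house".
    """
    parts = translation.split()
    if len(parts) > 1 and parts[0].lower() == word.lower():
        parts = parts[1:]
    return " ".join(parts)


def build_rows(pairs):
    """Turn (word, translation) pairs into (word, hint, phrase) rows."""
    return [(word, clean_hint(word, translation), "none")
            for word, translation in pairs]


def batches(rows, size=400):
    """Split the rows into upload-sized chunks."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_csv_text(rows):
    """Render rows as CSV text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


pairs = [("casa", "casa house"), ("perro", "dog"), ("de", "de of")]
for batch in batches(build_rows(pairs)):
    print(to_csv_text(batch))
```

Each printed batch could be saved as its own .csv file and uploaded separately.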

It took me much longer to type that than to actually do it, so it’s pretty quick.

Thank you very much. It looks like a great idea.

@IIya_L are you using subs2srs? I’m using it to turn the closed captioning/subtitles of movies into line-by-line files with translations. Check out the results: http://db.tt/m930Kav . Just load these onto your iPod and each line from the movie “Death Becomes Her” will be played one at a time, with the Spanish lyrics on top and the English translations (by Google) on the bottom. How does your system work? I have also done this for the material on LingQ. Here is the manifesto in Spanish, line by line, with English translation… http://db.tt/Lfemo6c

Hi Richard,

Thanks very much for the detailed description of your “pre-hinting” process and the links related to subs2srs. It is very interesting and seems straightforward. I am sure LingQ would only gain from implementing your suggestions internally, for every language.

I am currently not using subs2srs (but I try to keep an eye on such things).
Subs2srs, as far as I understand, is mostly meant to help create rich flash cards, with hints/translations in various languages and audio/video fragments. Please correct me if I am wrong.

Our system is mostly a regular movie player with special functionality intended for the learner. The player also allows you to create flash cards from the movies it plays, and to play the cards internally or export them, but that is just a fraction of its functionality.

Whether it plays the movies or the cards, the player can display the subtitles in the language being played and, optionally, a controllable amount of hints for the subtitles. The “controllable amount” of hints eventually boils down to this: only the hints the user needs are displayed automatically, next to the words they explain, while any other hint can still be invoked “on user demand”. The hints can be displayed in any language supported by Google Translate. There is a mixture of open source and proprietary software and algorithms behind it.
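To give a feeling for what “only the hints the user needs” could mean in practice, here is a deliberately simplified Python sketch. It is not the player’s actual algorithm (that part is proprietary); it only assumes a frequency list mapping each word to its rank and a hypothetical setting, `known_up_to`, saying how far down the list the learner is comfortable. Words ranked beyond that point, or missing from the list, get an automatic hint.

```python
def words_to_hint(caption, rank, known_up_to):
    """Return the caption words that would get an automatic hint.

    rank: dict mapping a lowercase word to its frequency rank (1 = most common).
    known_up_to: hypothetical learner setting; words ranked above it get hints.
    """
    hinted = []
    for raw in caption.split():
        word = raw.strip(".,!?;:\"'()").lower()
        if not word:
            continue
        # Words missing from the list are treated as rare, so they get a hint.
        if rank.get(word, float("inf")) > known_up_to:
            hinted.append(word)
    return hinted


rank = {"the": 1, "is": 2, "house": 150, "haunted": 4200}
print(words_to_hint("The house is haunted!", rank, known_up_to=1000))
# prints ['haunted']
```

Lowering `known_up_to` shows more hints, raising it shows fewer, which is one simple way to get a “controllable amount”.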

To display our subtitles, I do not use the known subtitle formats (such as those used in subs2srs), because I have chosen to have interactive, “click-able” subtitles. Many of the player controls are provided within these non-standard interactive subtitles. For instance, by clicking near a subtitle you would replay the corresponding sound, and by clicking a word that has no (automatically shown) hint you would see the hint below the clicked word.

And of course, we want to cooperate with LingQ - ;). The hope is that the player will be widely available within the next year.

@ Richard and everybody. I am also beginning to appreciate parallel subtitles as a nice accelerator for learning. It really does speed you up in a new language if you already know the movie in another language (and still like it :-)

Here, for example, is a link to a site with subtitles for popular feature-length movies; some of these movies are subtitled in tens of languages:

http://subscene.com/

However, I am surprised to find that the audio tracks in two languages may differ significantly, much more than the textual subtitles in the same two languages. It looks like the subtitles are translated more closely to the originals than the corresponding audio tracks are dubbed…

Is anybody aware of interesting films whose two or more audio tracks are much closer in meaning?

As to movies, nowadays it seems that nothing is getting cheaper than having any movie at home. Much cheaper than pizza! Yesterday I googled “legal movie downloads” (not BitTorrent, not p2p) and then studied the site www.hippomovies.com.

I paid 99 cents for their 24-hour trial. The trial allowed me to download 10 feature-length English movies, all from the list I wanted, in a variety of formats. The formats were DivX, DVD and better (HD). (All the formats played nicely in our player!) I believe hippomovies is just cheating its users; they can’t be legal at that price ;-). But the user won’t care.

I forgot to mention that the movies downloaded during the 24-hour trial remain with the user (and keep working) forever, unlike with the more famous services such as cinemanow.com, Apple, netflix.com and a few others. (And those famous services are also more expensive.)

Is anybody aware of other legal movie download services, especially ones with more than just English movies? Sorry for going off-topic, and thanks.

Hehe, yeah, that hippomovies.com doesn’t look too legal to me, especially since it has new movies available that are still in theaters and whatnot. Have you tried movies from iTunes? My friend uses it a lot. It’s supposed to be OK. Not sure about non-English movies though.

If you want to find movies in your target language, most countries seem to have legal websites with streaming content that is either free once you sign up or requires a paid membership. The problem is that most of the streaming stuff isn’t downloadable (unless you have some ripping software to download it). However, you can usually watch it as much as you want. For Korean there’s lots of free legal content if you sign up to the right websites, but it’s a bit hard without having Korean connections. If I were studying any language other than Korean, I’d ask a native speaker of the target language to find me some legal websites to sign up to. Lots of good free media out there.

@ilya. HBO Latino dubs their movies in Spanish and the audio exactly matches the closed captioning. I use ccextractor to create srt subtitles from the closed captioning, then opensubtitle translator to translate them to English via Google Translate. That’s how I made the Death Becomes Her file. Did you download it? Also, I would be interested in getting hold of your player. Is it for sale? Is it similar to the Yabla player at http://www.lomastv.com ?
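Once you have both srt files (the Spanish one from the closed captioning and the translated English one), lining them up is simple. The Python sketch below is only an illustration and is not part of ccextractor or subs2srs; the function names are made up, and it assumes plain srt text with "\n" line endings where the translated file keeps the same caption numbers as the original.

```python
def parse_srt(text):
    """Return a list of (index, timing, caption) tuples from srt text."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        cues.append((int(lines[0]), lines[1], " ".join(lines[2:])))
    return cues


def pair_captions(source_srt, translated_srt):
    """Pair each source caption with the translation that has the same number."""
    translated = {index: caption
                  for index, _, caption in parse_srt(translated_srt)}
    return [(caption, translated.get(index, ""))
            for index, _, caption in parse_srt(source_srt)]


spanish = "1\n00:00:01,200 --> 00:00:03,900\nHola.\n\n2\n00:00:04,000 --> 00:00:06,000\nAdiós.\n"
english = "1\n00:00:01,200 --> 00:00:03,900\nHello.\n\n2\n00:00:04,000 --> 00:00:06,000\nGoodbye.\n"
for original, translation in pair_captions(spanish, english):
    print(original, "|", translation)
```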

Here’s the link to the demo of the Yabla player for lomastv: http://bit.ly/cJebv3

@ keroro and everybody:
Thanks a lot for the suggestions, I will look into them.

@ Richard.
Thank you RJ, great!

=>“That’s how I made the Death Becomes Her file. Did you download it?”
I downloaded it yesterday on my PC (not an iPod, as you had recommended). I got a sequence of mp3 files and played them one by one with the VLC player. The sound fragments were good. In my haste, I saw neither the subtitles nor the subtitle files, though I wanted to see them. I shall look at it again in the evening.

There must be a subtle problem if one relies on the timing of each caption as specified in an arbitrary subtitle file (the timing which subs2srs seems to extract). The timing would only loosely delineate the actual starts and ends of the corresponding sound fragments. It may be especially noticeable if you extract the sound fragments from natural or fast speech, where the captions follow closely one after another. (The traditional captioning standards were designed for “hearing disabled” people after all, who didn’t care about the sound fragments ;-).) Have you noticed such an effect (not necessarily in your “Death Becomes Her” files)? What do you think about it?

=>“Also, I would be interested in getting hold of your player. Is it for sale?”
We shall send it to you for free when it is possible. Currently it is neither distributed nor sold. I had to come to Vancouver to show it to Steve and Mark in person. (And I greatly enjoyed our discussion and am thankful to them for it.)

=>“Is it similar to the Yabla player at http://www.lomastv.com ?”
Yabla is at least the closest thing I know of. I do not know many like Yabla. Do you or anyone else know of any? Yabla currently offers short movies, 3-5 minutes long, predominantly of an amateur YouTube level. Our player is designed to also handle commercial full-length feature films, with full screen, television and HiDef quality: films such as Harry Potter, The Godfather, 12 Angry Men, and popular TV shows.

@Pierre. I also like and choose films with a lot of talking, though our player can skip the fragments without talking if you wish.

=>“I think the difference … come from the need to more or less makes the words fit with the mouth moves…”
I think the same. It must be much more difficult to make an exact, good dub than an exact, good translation of the textual caption; that’s why the dubs turn out to be less exact. And I agree that neither the dubs nor the translations may be exact.

@IIya_L
“I downloaded it yesterday on my PC (not an iPod, as you had recommended). I got a sequence of mp3 files and played them one by one with the VLC player. The sound fragments were good. In my haste, I saw neither the subtitles nor the subtitle files, though I wanted to see them. I shall look at it again in the evening.”

VLC doesn’t support the common id3 tags for lyrics (pretty much all mp3 players do, though). There is a plug-in for it somewhere. That’s why I want you to check it out on your iPod if you get a chance. Remember that lyrics aren’t turned on by default; while the mp3 is playing you need to touch the center of the screen to bring them up. Here’s a before http://db.tt/KxqBZ8B and after http://db.tt/VAGzRG3 .

“There must be a subtle problem if one relies on the timing of each caption as specified in an arbitrary subtitle file (the timing which subs2srs seems to extract). The timing would only loosely delineate the actual starts and ends of the corresponding sound fragments. It may be especially noticeable if you extract the sound fragments from natural or fast speech, where the captions follow closely one after another. (The traditional captioning standards were designed for “hearing disabled” people after all, who didn’t care about the sound fragments ;-).) Have you noticed such an effect (not necessarily in your “Death Becomes Her” files)? What do you think about it?”

As for the timing issue: HBO Latino does a fantastic job of timing what is said to what is onscreen. Also, subs2srs has a tool where you can offset the time of individual files or of every file. I usually have to offset by about half a second, and there’s a preview mode that lets you test the effect of the offset in seconds. In other words, you don’t have to make all the files and then realize they were off. It works unbelievably well. Occasionally a sentence will be off by a fraction of a second, but it still works great. Also, occasionally the iPod will stay on a file for a microsecond too long, but again it works great.
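If you want to see what such an offset amounts to outside of subs2srs, here is a small Python sketch that shifts every timestamp in srt text by a fixed number of milliseconds. It is not how subs2srs does it internally (I don’t know that); the function name is made up, and it assumes the usual srt timestamp form HH:MM:SS,mmm. Note that it would also shift anything inside a caption that happens to look like a timestamp.

```python
import re

# Matches srt timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text, offset_ms):
    """Shift every timestamp in srt text by offset_ms (negative = earlier)."""
    def shift(match):
        hours, minutes, seconds, millis = (int(g) for g in match.groups())
        total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        total = max(0, total + offset_ms)  # never go below zero
        hours, rest = divmod(total, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

    return TIMESTAMP.sub(shift, text)


cue = "1\n00:00:01,200 --> 00:00:03,900\nHola.\n"
print(shift_srt(cue, 500))
# the timing line becomes 00:00:01,700 --> 00:00:04,400
```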

update
I updated my original instructions on how to import word frequency lists into the vocab section of LingQ. When I was originally doing it, I tried to Google Translate the .csv file. That did not work. You DO NOT need Microsoft Office to do this. OpenOffice works by itself.

(1) I got a list of the 10,000 most common Spanish words here: Wiktionary:Frequency lists - Wiktionary. It has other languages as well, and I know there are other places on the internet to find lists in many languages. Click on the 1-1000 part. (You would have to do this in batches of 1000 or less, depending on how many LingQs the system can handle.)
(2) Click on the part that says printable version, then copy the list.
(3) In OpenOffice (Calc), use paste special, HTML formatted, to paste it into the first column.
(4) Delete all the cells which don’t have anything to do with the list of words.
I ended up with the 1000 most common Spanish words in the first column of an OpenOffice document.
(5) Go to translate.google.com and click on translate a document.
(6) Browse to where you saved your document and then click translate. A window pops up with a table of 1000 translated words.
(7) Copy all of these, then paste special (HTML) into the OpenOffice document. Delete the column that has the numbers 1-1000 in it. At this point it looks like this: http://db.tt/YG5rT8B . The only minor problem is that the Spanish word is listed in the translation as well.
(8) Set the columns up for the upload to LingQ: column one is the Spanish word, column two is the translation/hint, and column three is the word “none” (in every cell).
(9) Save as a .csv file, then upload. I’m not sure if your system has a limit on how many you can upload at once. I think I did 400 a couple of times.
When you are doing languages that use non-English scripts, you need to find out which encoding works for which language. For Russian I used Cyrillic (ISO 8859-5) and it seemed to work.
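On the encoding point, here is a small Python sketch of how you could check whether a word list fits a given encoding before saving it, and then write the .csv in that encoding. The function names and the file name are made up for the example; `iso8859_5` is Python’s name for the ISO 8859-5 Cyrillic encoding. Whether LingQ actually expects this encoding is not something I know for sure, it is just what seemed to work for me.

```python
import csv
import os
import tempfile


def can_encode(rows, encoding):
    """Return True if every cell in every row fits the given encoding."""
    try:
        for row in rows:
            for cell in row:
                cell.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def write_wordlist(path, rows, encoding="iso8859_5"):
    """Write (word, hint, phrase) rows as a .csv file in the given encoding."""
    with open(path, "w", newline="", encoding=encoding) as handle:
        csv.writer(handle).writerows(rows)


rows = [("дом", "house", "none"), ("собака", "dog", "none")]
print(can_encode(rows, "iso8859_5"))  # Cyrillic fits ISO 8859-5
print(can_encode(rows, "latin-1"))    # but not Latin-1

# Write to a temporary folder just to show the round trip.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "russian_batch.csv")
    write_wordlist(path, rows)
    with open(path, encoding="iso8859_5") as handle:
        print(handle.read())
```

If `can_encode` returns False for the encoding you picked, some characters in the list would be lost or rejected on save, so it is worth trying another encoding first.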

Thanks a lot for the great idea!
I have done it for my personal English-Russian dictionary.
http://www.travmatik.com/10000_english_words.xls