Import Lesson - keep carriage returns/new lines for large amount of text

Trying to import all the subtitles from a Netflix movie.
Getting the subtitles took a bit of time to figure out. Now that I’m trying to import all the edited dialogue, the lesson importer is jumbling my text together without carriage returns / new lines, making it difficult to read.

Is there a way to import a large amount of text and retain the new lines?

Sample text:

0: Mais t’es parfaite.

1: Qu’est-ce qu’on peut faire de plusavec toi ?

2: [rires d’enfants]

3: On a la robe,mais la princesse est malade.

4: C’est pas vrai.

5: - Comment on va faire ?- On va la passer à une autre.

6: [rires d’enfants]

7: [maîtresse]Chut !

8: Qui veut porter le costumede Blanche-Neige ?

9: Moi !

10: [rires][Damien] J’avais cinq ans.

11: Mon cœur battait pour la plus belle de la classe qui ne m’avait jamais regardé.

12: Tu vas nous sauver ![les rires continuent]

13: Je sentais enfin ses yeux posés sur moi.

14: - [chuchoté] Vas-y. - C’était merveilleux.

15: [les applaudissements continuent]

16: [rires]

17: [rires moqueurs]

18: [les rires diminuent]

19: Elle s’appelait Aurore.

20: Et j’ai cru mourir.


Is retaining formatting an added cost for developers?

ref: Paste from Word | Docs | TinyMCE
ref: how to paste <pre> into tinymce and preserve the formatting? - Stack Overflow

I usually just use sentence mode. You can prepend [] to each line, which LingQ will treat as an empty timestamp, if I remember correctly. I don’t think you can control formatting in non-sentence mode, though.
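
The [] trick above can be applied to a whole file with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the exact marker (with or without a trailing space) is an assumption, since the post above is from memory.

```python
def prepend_empty_timestamps(text: str, marker: str = "[]") -> str:
    """Prefix every non-blank line with an empty-timestamp marker.

    Assumption: the importer treats a leading "[]" as an empty timestamp
    and therefore keeps each line as its own sentence.
    """
    lines = (line.strip() for line in text.splitlines())
    # Blank lines are dropped; each remaining line gets the marker.
    return "\n".join(f"{marker} {line}" for line in lines if line)


if __name__ == "__main__":
    sample = "0: Mais t’es parfaite.\n\n4: C’est pas vrai.\n"
    print(prepend_empty_timestamps(sample))
```

Paste the printed result into the lesson text field and check in sentence mode whether the line breaks survive.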

Did you use the browser extension to import this content from Netflix? If imported using the extension, formatting should work better.

I never tested it. I don’t believe this would work.

Grabbing the subtitles from Netflix required scraping via the Developer Tools in Chrome.

i.e. Reddit - Dive into anything

May have found a solution.

  1. Copy from Textmate (the text editor I’m using to wrangle data)

  2. Paste into empty Google Docs

  3. Copy into Lesson input text field

The LingQ plugin does it automatically, using an approach similar to the one used by vanIvan/netflix-subtitle-loader on GitHub (a Chrome extension for loading subtitles from Netflix web series): it grabs the webvtt-lssdh-ios8 subtitles.
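
For anyone who already has a WebVTT subtitle file and wants plain text with one subtitle per line, here is a rough Python sketch. It is not how the LingQ plugin or netflix-subtitle-loader actually work; the function name is hypothetical, and it only handles the basic cue layout (optional identifier, timing line, text rows). Rows inside one cue are joined with a space so words are not fused together across a line break.

```python
import re

# A cue timing line looks like "00:00:01.000 --> 00:00:02.000 [settings]".
TIMING = re.compile(r"^\S+\s+-->\s+\S+")
# Inline markup such as <c.white>...</c.white> or <i>...</i>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_lines(vtt: str) -> list[str]:
    """Return one plain-text line per WebVTT cue."""
    result = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n")):
        rows = block.strip().split("\n")
        for i, row in enumerate(rows):
            if TIMING.match(row):
                # Everything after the timing line is the cue text.
                parts = [TAG.sub("", r).strip() for r in rows[i + 1:]]
                text = " ".join(p for p in parts if p)
                if text:
                    result.append(text)
                break  # blocks without a timing line are skipped
    return result


if __name__ == "__main__":
    with open("subtitles.vtt", encoding="utf-8") as fh:  # hypothetical file
        print("\n\n".join(vtt_to_lines(fh.read())))
```

Printing with a blank line between cues gives the same layout as the sample text in the first post.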

Oh wow! That is impressive!

LingQ should be advertising that feature for Netflix more!

SOLVED: As mark and mescyn pointed out, the LingQ Importer already does a perfect job of grabbing the subtitles from Netflix, which is where my original text was coming from.

Is retaining formatting of imported text planned at all?

We are looking at that. I assume you mean when using the extension?

Yes. At least, I tried it out at asahi.com, where all formatting is discarded.

I managed to “fix” the problem by importing the articles through the LingQ REST API instead, using a Python script.
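
The script itself was not posted, so the following is only a guess at what such an import could look like. Everything LingQ-specific here is an assumption to verify against the LingQ API documentation: the endpoint URL (passed in as a parameter), the "Token" authorization scheme, and the "title" / "text" field names. The point is simply that the text is sent as-is, with its new lines intact.

```python
import json
import urllib.request


def build_lesson_request(endpoint: str, api_key: str,
                         title: str, text: str) -> urllib.request.Request:
    """Build a POST request that carries the lesson text unchanged.

    Assumptions (not confirmed in this thread): JSON body with "title"
    and "text" fields, and an "Authorization: Token <key>" header.
    """
    payload = {"title": title, "text": text}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def import_lesson(endpoint: str, api_key: str, title: str, text: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_lesson_request(endpoint, api_key, title, text)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Call import_lesson with the lessons endpoint for your language and your own API key. Paragraphs separated by "\n\n" in the text are passed through untouched.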

We’ll see what we can do.