Solving the YouTube Transcription End of Line Problem

There is one small detail to watch for when you One-Click Import from YouTube:

The subtitles very often contain an End of Line character that splits a sentence across two lines.

Currently, this prevents LingQ from translating such sentences and creating a LingQ from them, even though they have fewer than 9 words.

A somewhat crude workaround I’ve been using when importing longer videos is Ctrl + H (Find and Replace) in a Document Editor.

1 - First, Copy and Paste the entire subtitle transcript into the Editor, without timestamps.

2 - Then Find every End of Line character and Replace All of them with a Space character. Make sure the Regular Expressions box is checked, like so:

(the End of Line character is indicated by “$”)
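If you would rather not do this by hand, the same Find and Replace step can be sketched in a few lines of Python. This is only a rough equivalent of the editor method above, not anything LingQ provides; the function name `join_lines` and the sample text are made up for illustration.

```python
import re

def join_lines(transcript: str) -> str:
    """Replace every line break in a pasted transcript with a single space."""
    # Treat any run of line breaks (plus the spaces around it) as one space,
    # so no double spaces are left behind where a line ended.
    joined = re.sub(r"\s*[\r\n]+\s*", " ", transcript)
    # Drop the leading/trailing space left by a break at either end.
    return joined.strip()

# Hypothetical sample: one sentence split over three subtitle lines.
sample = "so today we are going to\ntalk about how to\r\nimport subtitles"
print(join_lines(sample))
```

You would paste the transcript (without timestamps) into a string or a text file, run it through the function, and import the result as usual.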

This is how the imported text looks after:

You can make the formatting even nicer with the same method, for instance by adding an End of Line after every question or sentence. That makes the text easier and more enjoyable to read.
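As a rough sketch of that second pass, assuming the text already has its line breaks removed and contains punctuation, a line break can be added after every sentence or question like this. The function name `break_after_sentences` is made up for illustration, and the rule is deliberately naive: it will also break after abbreviations such as "Mr." and does nothing for subtitles that have no punctuation.

```python
import re

def break_after_sentences(text: str) -> str:
    """Put each sentence or question on its own line."""
    # After a ".", "?" or "!" that is followed by whitespace,
    # replace that whitespace with a single line break.
    return re.sub(r"([.?!])\s+", r"\1\n", text.strip())

# Hypothetical sample text, already joined into one line.
print(break_after_sentences("How are you? I am fine. Great!"))
```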





@happy wheels This is indeed the solution I seek, thanks for the guidance!

I agree this is the quick and dirty solution, but it really depends on the content you have. If your content has punctuation, sure. Unfortunately, a lot of YouTube content has subtitles without punctuation. The real solution would be for LingQ to pass all the content through an AI, telling it to fix the spacing and punctuation. This, however, is no easy task.