I'm using cygwin to connect to a tiny VM with limited RAM (512M).

I'm also trying to import a 4 GB CSV file (8,717,201 lines in total) into a sqlite3 database, and the import works without problems except for 2 lines.

It seems that I have a control-M character (^M) on 2 lines, which breaks the CSV format and makes the import fail.

When I try sed 's|,^M|,|' file.csv, the control-M character is written as literal ASCII text (two characters, ^ and M), so the search and replace doesn't match.

When I do the same on a test file opened in vi, I can see that it is entered as a control code (a blue ^M that behaves like a single character).

How can I fix the CSV file? (Or how can I type the control-M sequence in Cygwin?)

Example problematic line:

$ cat -e test
keyword3,keyword1,keyword4$
keyword1,keyword2,keyword3^M$
,keyword4$
keyword5,keyword1,keyword2$

How it should be:

$ cat -e test
keyword3,keyword1,keyword4$
keyword1,keyword2,keyword3,keyword4$
keyword5,keyword1,keyword2$

PS: As you can see, English is not my native language, so sorry for any mistakes ¯\_(ツ)_/¯

  • Related: What is ^M and how do I get rid of it? In the Cygwin terminal, you should be able to use Ctrl-V then Enter. At least with GNU sed, you can also use \r in place of ^M – steeldriver Jun 08 '21 at 00:03
  • This recent question How to remove \n in a string is also relevant since you are both dealing with line endings that are (legally) embedded within [CT]SV files. – steeldriver Jun 08 '21 at 00:23
  • Why the comma in sed 's|,^M|,|' file.csv? It should be sed 's|^M||' file.csv – matzeri Jun 09 '21 at 06:09
  • If the control character is only on two lines, as you say, it might make sense with such a large file to only do substitution on the affected lines, e.g. sed '1,2 s|^M||' if they were on first two lines. As for the first sentence of your question, is that related in any way? If that's a separate issue you should create a new question for it. – B Layer Jun 09 '21 at 21:39

2 Answers

Actually, that carriage return helps you identify wrong line breaks:

sed '/^M$/{N;s/^M\n//;}' test

As steeldriver wrote, you can usually type that ^M by pressing Ctrl-V followed by Ctrl-M.

The command means:

  • /^M$/{...}: on lines with a carriage return at the end of the line, execute the commands in curly brackets
  • N (Next) appends the next line to the pattern space, with the newline between the two lines embedded
  • s/^M\n// substitutes the carriage return + newline with nothing (removes the line break)
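To try the steps above without typing a literal ^M, here is a minimal sketch that rebuilds the sample file from the question. It assumes GNU sed (the one Cygwin ships), which accepts \r for the carriage return; the variable names tmp and fixed are only for illustration.

```shell
# Recreate the sample file; \r is the carriage return that cat -e shows as ^M.
tmp=$(mktemp)
printf 'keyword3,keyword1,keyword4\nkeyword1,keyword2,keyword3\r\n,keyword4\nkeyword5,keyword1,keyword2\n' > "$tmp"

# Assumption: GNU sed, so \r can stand in for the literal ^M.
# On a line ending in CR: append the next line (N), then delete CR + newline.
fixed=$(sed '/\r$/{N;s/\r\n//;}' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```

On the real file you would redirect the output to a new file instead of capturing it in a variable, since sed processes the input one line at a time here.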

This N-based script assumes that a line is broken at most once. Otherwise you'd need something like the following, which collects the whole file in the hold space before substituting:

sed 'H;1h;$!d;x;s/^M\n//g' file
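Because the whole-file variant keeps every line in memory, it may not suit a 4 GB file on a 512M machine. A possible alternative, sketched here under the assumption of GNU sed (for \r), is to loop on the current line until it no longer ends in a carriage return. The sample data and the names tmp and joined are made up for the demonstration.

```shell
# Hypothetical sample: one record split across three physical lines.
tmp=$(mktemp)
printf 'a,b\r\n,c\r\n,d\ne,f,g\n' > "$tmp"

# :a       label to jump back to
# /\r$/{   only when the pattern space ends in CR
# N        append the next input line
# s/\r\n// drop the CR + newline that split the record
# ba       jump back and re-check the joined line
joined=$(sed -e ':a' -e '/\r$/{' -e 'N' -e 's/\r\n//' -e 'ba' -e '}' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Only the lines belonging to one broken record are held at a time, so memory use stays small as long as the breaks are rare.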
Philippos

One way to enter a literal ^M (the carriage return sent by the Enter key) for a replacement in sed or vi is to type:

Ctrl-V Enter
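If the key combination is awkward in a given terminal, the carriage return can also be produced by printf and stored in a variable. This is only a sketch: the variable names cr and cleaned and the sample input are assumptions, not part of the question.

```shell
# Store a real carriage return (what ^M stands for) in a variable.
cr=$(printf '\r')

# Use it inside a double-quoted sed script; \$ keeps the end-of-line anchor literal.
cleaned=$(printf 'keyword3\r\n' | sed "s/${cr}\$//")

printf '%s\n' "$cleaned"
```

In bash, the quoting form $'\r' gives the same character without the helper variable.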