I have a Perl script that parses data sent to me from a bunch of school districts. I'm adding a new school and have run into a problem I've never faced before. When I do $line = <INPUT>, it slurps up the whole file instead of one line.
If I run file on the file, it returns "UTF-8 Unicode text, with CRLF, CR line terminators". All my other files return "ASCII text, with CRLF line terminators". I've run it through dos2unix, but it still operates as one long string. When I edit it in emacs, it still shows ^M for the line endings.
What can I do to convert these line endings into usable line endings?
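A minimal sketch of one way to convert the endings, assuming the file fits in memory: read the raw bytes, turn every CRLF pair or lone CR into LF, and write a copy. The subroutine names normalize_line_endings and normalize_file are hypothetical. Working on raw bytes is safe for UTF-8 data, because the bytes 0x0D and 0x0A never occur inside a multibyte UTF-8 character.

```perl
use strict;
use warnings;

# Turn every CRLF pair or lone CR into a single LF.
# \r\n? takes the CR together with a following LF when there is one,
# so a CRLF pair becomes one LF rather than two.
sub normalize_line_endings {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Read a file as raw bytes, normalize its line endings, and write the
# result to a new file. Both paths are supplied by the caller.
sub normalize_file {
    my ($in_path, $out_path) = @_;
    open my $in, '<:raw', $in_path or die "Cannot read $in_path: $!";
    my $data = do { local $/; <$in> } // '';   # undefined $/ slurps the whole file
    close $in;
    open my $out, '>:raw', $out_path or die "Cannot write $out_path: $!";
    print {$out} normalize_line_endings($data);
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

For example, normalize_file('district.txt', 'district_unix.txt') (hypothetical file names). The same substitution also works as a one-liner, perl -0777 -pe 's/\r\n?/\n/g' in.txt > out.txt, where -0777 makes Perl read the whole file as a single record.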
Update: The vendor sent me another file with different line endings, which still don't work. The file command reports them as CRLF, LF. I've extracted a few sample lines.
Here are some snippets from my code:
    $line = <INPUT> if ($schooldistricts{$schooldistrict}{'header'});
    LINE: foreach $line (<INPUT>) {
        next LINE unless ($line =~ /\S/);
        # ... do stuff ...
    }
The file does have a header which gets stripped off correctly. Then in the foreach loop it reads the first line successfully and then that's it -- it's like the rest of the file is empty.
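If converting the file first isn't convenient, the loop itself can be made indifferent to the terminator: read the whole file and split on Perl's \R escape (Perl 5.10 and later), which matches a CRLF pair as one unit or any single vertical-whitespace character such as LF or CR. A minimal sketch, using an in-memory string in place of the real INPUT handle; the sample data, the $has_header flag (standing in for the header setting) and split_lines are hypothetical.

```perl
use strict;
use warnings;

# Split decoded text into lines. \R matches a CRLF pair as one unit, or any
# single vertical-whitespace character (LF, CR, VT, FF, NEL, U+2028, U+2029).
sub split_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

# Hypothetical sample standing in for the district file: a header, then
# records ending in CRLF, a lone CR and LF, plus one blank line.
my $contents   = "id\tname\r\n1\tAlice\r2\tBob\r\n\r3\tCarol\n";
my $has_header = 1;   # stand-in for the per-district 'header' setting

my @lines = split_lines($contents);
shift @lines if $has_header;   # drop the header line

my @records;
LINE: foreach my $line (@lines) {
    next LINE unless $line =~ /\S/;   # skip blank lines, as the original loop does
    push @records, [ split /\t/, $line ];
}
printf "%d records\n", scalar @records;
```

For the real file, the text would come from something like open my $in, '<:encoding(UTF-8)', $path followed by a slurp. Decoding first matters: on undecoded bytes \R also matches the byte 0x85, which can appear inside a UTF-8 character (for example, Å is encoded as C3 85).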
I tried setting $/ to \r\n\n, but then the script does nothing. Same if I try \r\n. Is there a way to definitively see which characters are used for the line endings?
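Besides od -c or hexdump -C, the terminators can be counted directly. Note that $/ holds a literal string, not a regex, so when a file mixes CRLF with lone CRs no single value of $/ will split every line. A minimal sketch that reads a file as raw bytes and counts CRLF pairs, lone CRs and lone LFs; the subroutine name count_line_endings and the script name in the comment are hypothetical.

```perl
use strict;
use warnings;

# Count each kind of line terminator in a string of raw bytes.
# The two-byte CRLF alternative is tried first, so a CRLF pair is not
# also counted as a lone CR followed by a lone LF.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (CRLF => 0, CR => 0, LF => 0);
    while ($bytes =~ /(\r\n|\r|\n)/g) {
        my $kind = $1 eq "\r\n" ? 'CRLF'
                 : $1 eq "\r"   ? 'CR'
                 :                'LF';
        $count{$kind}++;
    }
    return \%count;
}

# Command-line use: perl count_endings.pl FILE  (script name is hypothetical)
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    my $count = count_line_endings($bytes);
    printf "%-4s %d\n", $_, $count->{$_} for qw(CRLF CR LF);
}
```

Nonzero counts in more than one row mean the file mixes terminators.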
Second update: As an experiment, I brought the file into Excel, split it out, and saved it as a tab-delimited file. On the server, I ran dos2unix. The Perl script still won't parse after the second line. Running file now returns "UTF-8 Unicode text, with CRLF line terminators". That's the right line ending, so that leaves Unicode as the issue. Is there something different about how Unicode encodes line endings?
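UTF-8 does not change CR and LF themselves: they are still the single bytes 0x0D and 0x0A. What UTF-8 data can carry is a byte order mark at the start of the file, which would end up glued to the first header field, or Unicode line-break characters such as NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), which Perl's default $/ ("\n") does not treat as line ends. A minimal sketch that checks for these, assuming the file is small enough to slurp; the subroutine name unicode_report is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Look for UTF-8 features that a reader splitting on "\n" would not expect.
sub unicode_report {
    my ($bytes) = @_;
    my %report;
    # A UTF-8 byte order mark is the bytes EF BB BF at the very start.
    $report{bom} = ($bytes =~ /\A\xEF\xBB\xBF/) ? 1 : 0;
    # Decode to characters; malformed sequences become U+FFFD.
    my $text = decode('UTF-8', $bytes);
    # Unicode line breaks that the default $/ does not split on.
    $report{nel}       = () = $text =~ /\x{0085}/g;
    $report{line_sep}  = () = $text =~ /\x{2028}/g;
    $report{para_sep}  = () = $text =~ /\x{2029}/g;
    # Any non-ASCII character at all, which is why file says "UTF-8".
    $report{non_ascii} = () = $text =~ /[^\x00-\x7F]/g;
    return \%report;
}

# Command-line use: perl unicode_report.pl FILE  (script name is hypothetical)
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    my $report = unicode_report($bytes);
    printf "%-9s %d\n", $_, $report->{$_} for sort keys %$report;
}
```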
If you look at the file with od -c, it should show CRs as \r and LFs as \n. And you can use the same escapes with e.g. printf; printf 'one\rtwo\nthree\r\n' would print stuff with three different CR/LF combinations. (Also I'm not sure if you tried the solutions you got in answers.) – ilkkachu May 12 '23 at 19:34
Would awk, sed, ruby, raku, python, etc. also fit the bill? – jubilatious1 May 19 '23 at 21:52
Run the file through hexdump -C and tell us what you see for line endings. Additionally, if you think your problem is Unicode-related, have a look at: https://stackoverflow.com/q/13836352/7270649 – jubilatious1 May 23 '23 at 04:24