
Sorry if this is answered somewhere; I have no idea what to search for. I received a series of reports from a bank that I'm supposed to process, and they seem to be... badly encoded?

First two lines in Vim:

     1 ^M^@
     2 ^@:^@2^@0^@:^@3^@0^@4^@0^@7^@1^@9^@^M^@

Same two lines in e.g. gedit:

     1 
     2 :20:3040719

Can anyone tell me what's going on? It makes no difference whether I open the file with fenc=utf8 or fenc=cp1250 (the encoding these files are supposed to use). I even tried fenc=ucs-bom because I thought it had something to do with endianness, but that doesn't change anything either. I know ^@ is a null byte and ^M is a carriage return (the CR of a Windows-style CRLF line ending), but switching between ff=dos and ff=unix doesn't help either.

I have an older file from the same bank (from before some changes they introduced) and it works fine. The file command reports it as extended-ASCII text, while the broken file is reported as data:

$ file *sta
20220411_182719.sta: Non-ISO extended-ASCII text, with CRLF line terminators
20220412_071916.sta: data

I can replace those characters in Vim and then process the file, but I need to automate this for thousands of files a day with PHP, so I can't really use Vim. Ideally I'd like to just tell the bank's support what they've messed up.

cprn

1 Answer

Ok, found it. It's UTF-16 Little Endian.

:e ++enc=utf16le
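
For anyone wondering why the file looked the way it did: in UTF-16LE every ASCII character is stored as its own byte followed by a null byte, so a CRLF becomes four bytes (0D 00 0A 00). An editor that reads the file as a single-byte encoding breaks lines at the 0A byte, which leaves the 00 that belongs to the line feed at the start of the next line. Setting fenc after the file is loaded only affects how the buffer is written back, which would explain why it changed nothing, while :e ++enc=... re-reads the file. The Python sketch below reproduces the view from the question; the sample text is just the two lines shown there, and the absence of a BOM is an assumption based on the Vim output.

```python
# Sample text taken from the question: an empty line, then ":20:3040719".
text = "\r\n:20:3040719\r\n"

# Encode as UTF-16 little endian without a BOM (assumption: the bank's
# files carry no BOM, since none is visible in the Vim output).
raw = text.encode("utf-16-le")

# Split on the raw LF byte, the way a single-byte reader would.
lines = raw.split(b"\n")

# First "line": a CR followed by its NUL high byte, shown as ^M^@ in Vim.
print(lines[0])

# Second "line": starts with the NUL left over from the LF, then each
# character followed by NUL, shown as ^@:^@2^@0... in Vim.
print(lines[1])

# Decoding with the right codec restores the original text.
print(repr(raw.decode("utf-16-le")))
```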

I can convert it properly now while processing.
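
Since both old (cp1250) and new (UTF-16LE) files may show up in the same batch, the conversion step could pick the codec per file. Below is a minimal sketch in Python rather than PHP; the function name decode_report, the 200-byte sample size, and the "more than half of the odd-position bytes are NUL" heuristic are all assumptions for illustration, not something the bank's format guarantees.

```python
def decode_report(raw: bytes) -> str:
    """Decode a report as UTF-16LE if it looks like UTF-16LE, else as cp1250.

    Hypothetical helper: the detection below is a heuristic, not a spec.
    """
    # A UTF-16LE byte order mark, if present, settles the question.
    if raw.startswith(b"\xff\xfe"):
        return raw[2:].decode("utf-16-le")

    # Mostly-ASCII text in UTF-16LE has a NUL in nearly every odd position.
    odd_bytes = raw[:200][1::2]
    if odd_bytes and odd_bytes.count(0) > len(odd_bytes) // 2:
        return raw.decode("utf-16-le")

    # Fall back to the encoding the files were originally supposed to use.
    return raw.decode("cp1250")


new_style = ":20:3040719\r\n".encode("utf-16-le")
old_style = ":20:3040719\r\n".encode("cp1250")

# Both inputs decode to the same text, which can then be re-encoded
# as UTF-8 for further processing.
print(repr(decode_report(new_style)))
print(repr(decode_report(old_style)))
utf8_bytes = decode_report(new_style).encode("utf-8")
```

The same idea should carry over to PHP, for example with mb_convert_encoding($data, 'UTF-8', 'UTF-16LE') once the file has been identified as UTF-16LE.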

cprn