94

I have a bash script which iterates over every *.php file in a directory and applies iconv to it. The converted text is written to STDOUT.

Adding the -o parameter (in my experience) actually writes a blank file, probably before the conversion takes place. How can I adjust my script so that it does the conversion and then overwrites the input file?

for file in *.php
do
    iconv -f cp1251 -t utf8 "$file"
done

7 Answers

94

This isn't working because iconv first creates the output file (since the file already exists, it truncates it), then starts reading its input file (which is now empty). Most programs behave this way.
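
A minimal sketch of that truncation, using tr and a shell redirection to stand in for iconv -o. The scratch file from mktemp is only there for illustration:

```shell
# Scratch file for the demonstration (hypothetical; any small text file works)
demo=$(mktemp)
printf 'hello\n' >"$demo"

# The shell opens "$demo" for writing, which truncates it, before tr starts.
# By the time tr reads its input, there is nothing left to read.
tr a-z A-Z <"$demo" >"$demo"

# Print the resulting size in bytes
wc -c <"$demo"
```

The file ends up empty, which is the effect the question describes for iconv -o.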

Create a new, temporary file for the output, then move it into place.

for file in *.php
do
    iconv -f cp1251 -t utf8 -o "$file.new" "$file" &&
    mv -f "$file.new" "$file"
done

If your platform's iconv doesn't have -o, you can use a shell redirection to the same effect.

for file in *.php
do
    iconv -f cp1251 -t utf8 "$file" >"$file.new" &&
    mv -f "$file.new" "$file"
done
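
The same pattern can be wrapped in a small function. This is only a sketch: the name convert_in_place and its argument order are made up for illustration, and it assumes a mktemp that accepts a template path.

```shell
# Hypothetical helper: convert one file in place via a temporary file.
# Usage: convert_in_place FROM_ENCODING TO_ENCODING FILE
convert_in_place() {
    # Create the temporary file next to the target, so that mv is a rename
    # within one filesystem rather than a copy across filesystems.
    _tmp=$(mktemp "$3.XXXXXX") || return 1
    if iconv -f "$1" -t "$2" "$3" >"$_tmp"; then
        mv -f "$_tmp" "$3"
    else
        # Conversion failed: remove the partial output, keep the original.
        rm -f "$_tmp"
        return 1
    fi
}

# Example call, matching the loop above:
# for file in *.php; do convert_in_place cp1251 utf8 "$file"; done
```

Note that mktemp creates its file with restrictive permissions, so the converted file may end up with a different mode than the original. Restore it with chmod if that matters.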

Colin Watson's sponge utility (included in Joey Hess's moreutils) automates this:

for file in *.php
do
    iconv -f cp1251 -t utf8 "$file" | sponge "$file"
done

This answer applies not just to iconv but to any filter program. A few special cases are worth mentioning:

  • GNU sed and Perl -p have a -i option to replace files in place.
  • If your file is extremely large, your filter only modifies or removes parts of it without ever adding anything (e.g. grep, tr, sed 's/long input text/shorter text/'), and you like living dangerously, you may want to genuinely modify the file in place. The other solutions mentioned here create a new output file and move it into place at the end, so the original data is unchanged if the command is interrupted for any reason; a genuine in-place modification gives no such guarantee.
  • 3
    I'm not quite sure whether the authorship of sponge should be attributed exclusively to Joey Hess; it's the package moreutils that includes sponge that he maintains, but as regards the origin of sponge, by following the links from the homepage of moreutils, I've found it originally posted and suggested for inclusion by Colin Watson: "Joey writes about the lack of new tools that fit into the Unix philosophy. My favourite of such things I've written is sponge" (Mon, 06 Feb 2006). – imz -- Ivan Zakharyaschev Mar 30 '11 at 14:13
  • 4
    I use Mac OS, where iconv has no -o option, so I had to change iconv -f cp1251 -t utf8 -o "$file.new" "$file" to iconv -f cp1251 -t utf8 "$file" > "$file.new". – code4j Sep 06 '14 at 20:17
  • Some commands, like sort, are pretty smart about the -o parameter: if they detect that the output file is the same as the input, they manage a temporary file internally, so it just works. – jesjimher Apr 18 '18 at 12:55
71

An alternative is recode, which uses the libiconv library for some conversions. Its behavior is to replace the input file with the output, so this will work:

for file in *.php
do
    recode cp1251..utf8 "$file"
done

As recode accepts multiple input files as parameters, you can skip the for loop:

recode cp1251..utf8 *.php
manatwork
  • 3
    Thanks, this deserves more upvotes. Just wondering where the manual states the two dots between the encodings... – neurino Nov 20 '12 at 21:14
  • 2
    “REQUEST often looks like BEFORE..AFTER, with BEFORE and AFTER being charsets.” That manual is indeed hard to follow with all those double dots (which are part of the syntax) and triple dots (which mean more of this). A piece of advice: try info recode instead. It is more verbose. – manatwork Nov 21 '12 at 06:46
  • Note that the recode program expects cp1251-encoded files to have CR-LF line endings. If they don't, you have to run the unix2dos program first. – AleXoundOS Apr 23 '20 at 01:16
  • Perfect! It also works for the classic problem of Windows monopoly and non-compliance: recode WINDOWS1252..UTF8 *.csv – Peter Krauss Jun 03 '20 at 15:00
4

For now

find . -name '*.php' -exec iconv -f CP1251 -t UTF-8 {} -o {} \;

works like a charm

  • 8
    At first, I indeed thought it works. But it appears the output exceeding 32K is cut off, and with even more input it triggers core dumps. – x-yuri Dec 18 '14 at 19:31
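
The truncation described in this comment can be avoided by writing to a temporary file inside the find command as well. A sketch, assuming a POSIX sh and a find that supports -exec ... {} +; the function name convert_tree and the .new suffix are illustrative choices:

```shell
# Hypothetical wrapper: convert every *.php file under a directory
# (default: current directory) from CP1251 to UTF-8.
convert_tree() {
    find "${1:-.}" -type f -name '*.php' -exec sh -c '
        for f do
            # Write to a separate file first; replace the original
            # only if iconv reported success.
            if iconv -f CP1251 -t UTF-8 "$f" >"$f.new"; then
                mv -f "$f.new" "$f"
            else
                rm -f "$f.new"
            fi
        done' sh {} +
}
```

Calling convert_tree . converts the current directory tree. Because the input file is never opened for writing while iconv is still reading it, the size of the file does not matter.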
2

You can use find, at least this worked for me on Raspbian Stretch:

find . -type f -name '*.php' -execdir iconv -f cp1251 -t UTF-8 '{}' -o '{}'.tmp \; -execdir mv '{}'.tmp '{}' \;
jesse_b
rannala
1

Here is a simple example. It should give you enough info to get started.

#!/bin/bash
#conversor.sh
#Author.....: dede.exe
#E-mail.....: dede.exe@gmail.com
#Description: Convert all files to another format
#             It's not a safe way to do it...
#             Just a desperate script to save my life...
#             Use it as a last resort...

#Spelled the way "file --mime-encoding" reports it, so the comparison below works
to_format="utf-8"
file_pattern="*.java"

echo "==================== CONVERTING ===================="

#Try to convert all files in the structure
#(NUL-delimited so that file names containing spaces survive)
find . -type f -name "${file_pattern}" -print0 |
while IFS= read -r -d '' file_name
do
        #Get file format, e.g. "iso-8859-1" or "utf-8"
        file_format=$(file -b --mime-encoding "$file_name")

        if [ "$file_format" != "$to_format" ]; then

                file_tmp="${file_name}.tmp"

                #Write the converted text to a temporary file and
                #replace the original only if iconv succeeded
                if iconv -f "$file_format" -t "$to_format" "$file_name" > "$file_tmp"; then
                        mv "$file_tmp" "$file_name"
                else
                        #Conversion failed: remove the temporary file, keep the original
                        rm -f "$file_tmp"
                        continue
                fi

                echo "File Name...: $file_name"
                echo "From Format.: $file_format"
                echo "To Format...: $to_format"
                echo "---------------------------------------------------"

        fi
done
slm
1

You can use Vim in Ex mode:

ex -sc '%!iconv -f cp1251 -t utf8' -cx "$file"
  1. % select all lines

  2. ! run command

  3. x save and close

Zombo
1

One option is to use perl's interface to iconv and its -i mode for inplace editing:

perl -MText::Iconv -i -pe '
  BEGIN{$i=Text::Iconv->new(qw(cp1252 UTF-8));$i->raise_error(1)}
  $_ = $i->convert($_)' ./*.php

With GNU awk, you can also do something like¹:

gawk -v cmd='iconv -f cp1252 -t utf-8' -i /usr/share/awk/inplace.awk '
  {print | cmd}; ENDFILE {close(cmd)}' ./*.php

The ksh93 shell also has a >; operator for this: it stores the output in a temporary file, which is renamed to the redirected file if the command succeeds:

for f in *.php; do
  iconv -f cp1252 -t utf-8 < $f >; $f
done

¹ Do not use -i inplace: gawk tries to load the inplace extension (as inplace or inplace.awk) from the current working directory first, where someone could have planted malware. The path of the inplace extension supplied with gawk may vary with the system; see the output of gawk 'BEGIN{print ENVIRON["AWKPATH"]}'.