Convert file contents to lower case

Question

I have a temp file with mixed lower-case and upper-case contents.

Input

Contents of my temp file:

hi
Jigar
GANDHI
jiga

I want to convert all upper to lower.

Command

I tried the following command:

sed -e "s/[A-Z]/[a-z]/g" temp

but got wrong output.

Output

I want it as:

hi
jigar
gandhi
jiga

What needs to be in the substitute part of argument for sed?

See also How to convert UTF-8 txt files to all uppercase in bash? — Stéphane Chazelas, Dec 05 '14 at 13:32

score 174 · Accepted Answer · edited Sep 30 '19 at 06:06

If your input only contains ASCII characters, you could use tr like:

tr A-Z a-z < input

or, using character classes (harder to remember and type, IMO, but in principle not limited to ASCII Latin letters; some implementations, including GNU tr, still handle only single-byte characters, so in UTF-8 locales they remain limited to ASCII letters):

tr '[:upper:]' '[:lower:]' < input

if you have to use sed:

sed 's/.*/\L&/g' < input

(here assuming the GNU implementation).

With POSIX sed, you'd need to specify all the transliterations and then you can choose which letters you want to convert:

sed 'y/AǼBCΓDEFGH.../aǽbcγdefgh.../' < input
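As a minimal sketch of the POSIX y approach, restricted here to the 26 ASCII letters: both lists must have the same length and are matched position by position. The function name lower_ascii is only for illustration and is not part of the answer.

```shell
# Hypothetical helper: lower-case ASCII letters only, using POSIX sed's y command.
# Any character not listed on the left-hand side passes through unchanged.
lower_ascii() {
    sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

# Sample input taken from the question.
printf 'hi\nJigar\nGANDHI\njiga\n' | lower_ascii
```

To cover more letters you would extend both lists in step, which is exactly the "choose which letters you want to convert" trade-off.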

With awk:

awk '{print tolower($0)}' < input
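A small sketch that runs the tr and awk variants on the question's sample and writes each result to a separate file. The scratch directory and the file names temp.tr and temp.awk are assumptions made for the example.

```shell
# Recreate the sample input from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf 'hi\nJigar\nGANDHI\njiga\n' > "$tmpdir/temp"

# Write to a new file: redirecting output back onto the input file
# would truncate it before it is read.
tr '[:upper:]' '[:lower:]' < "$tmpdir/temp" > "$tmpdir/temp.tr"
awk '{ print tolower($0) }' < "$tmpdir/temp" > "$tmpdir/temp.awk"

# On pure-ASCII input both tools should produce identical files.
cmp "$tmpdir/temp.tr" "$tmpdir/temp.awk" && cat "$tmpdir/temp.tr"
```

On input with non-ASCII letters the two may disagree, depending on the implementation and locale, so it is worth checking on your own data.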

Stéphane Chazelas

544,893

answered Dec 05 '14 at 07:43

Anthon

79,293

4

Please note that \L is a GNU extension. – Anthon Dec 05 '14 at 07:48
\L works well for me so far. Could you enlighten me on the point you are making about it being a GNU extension? – JigarGandhi Dec 05 '14 at 07:56
2

@JigarGandhi. sed is a Unix command. Different systems have different variants with different behaviour and functionality. Thankfully, nowadays, there's a standard that most conform to so you can count on a minimum set of features common to all. \L is not among them and was introduced by GNU sed (matches the same operator in standard ex/vi) and is generally not available in other implementations. – Stéphane Chazelas Dec 05 '14 at 13:25
A-Z matches the characters in between A to Z which unless you're in the C locale doesn't make much sense. – Stéphane Chazelas Dec 05 '14 at 13:27
11

Note that some tr implementations like GNU tr don't work properly in multi-byte locales (most of them are nowadays, try echo STÉPHANE | tr '[:upper:]' '[:lower:]' for instance). On GNU systems, you may prefer the sed variant or awk's tolower(). – Stéphane Chazelas Dec 05 '14 at 13:29
5

Slight correction: sed 's/.*/\L&/g' < input. The \1 reference to the matched substring won't work unless you specify the substring with parenthesis as wurtle does in his. However, it's slightly cleaner to use & to represent the whole match, as shown – Edward Brown Nov 04 '16 at 15:30
@EdwardBrown +1 but, also, the g isn't required, since .* picks up the whole line. – Marcelo Cantos Nov 20 '16 at 18:47
tr '[:upper:]' '[:lower:]' < input works a charm, might want to add the following to the command tr '[:upper:]' '[:lower:]' < input > output – MitchellK Apr 29 '18 at 11:10
I wanted this as a command I can use, so I popped it into a shell script: #!/bin/sh\ntr A-Z a-z in ~/bin and chmod +x – ThorSummoner Aug 30 '18 at 21:10
@MarceloCantos, . won't match non-characters, so .* would not match the whole line if it contained sequences of bytes that don't form valid characters in the locale. Try for instance printf 'ABC\200DEF\n' | sed 's/.*/\L&/' with GNU sed in a UTF-8 locale. – Stéphane Chazelas Sep 30 '19 at 06:10
On MacOS Catalina, tr '[:lower]' '[:upper:]' fails to convert a line containing only 7asdf to uppercase. Must be a word-level conversion. Interestingly tr a-z A-Z converts that line to uppercase, so it must be operating at character level. Important difference! – nclark Dec 30 '20 at 19:15

TankorSmash · Answer 2 · 2016-08-16T16:05:06.220

35

Using vim, it's super simple:

$ vim filename
gg0guGZZ

vim filename opens the file; gg goes to the first line and 0 to the first column. guG then lower-cases every character from there to the end of the file, and ZZ saves and exits.

It should handle just about anything you throw at it; it'll ignore numbers and it'll handle non-ASCII.

If you wanted to do the opposite, turn the lower cased letters into upper case, swap the u out for a U: gg0gUGZZ and you're set.

edited Aug 16 '16 at 16:05

answered Dec 05 '14 at 18:20

TankorSmash

850

24

Lol "super simple" – blambert Feb 17 '17 at 21:50
this obviously doesn't scale well for many files – Corey Goldberg Nov 25 '17 at 13:40
2

@CoreyGoldberg vim file1 file2 fileetc and then something like :bufdo gg0guG:w<CR> would probably work for any number of files. Have not tested that though! – TankorSmash Apr 22 '18 at 00:18
@TankorSmash that still doesn't scale to a large number of files – Corey Goldberg Apr 22 '18 at 15:39

mikeserv · Answer 3 · 2019-09-30T02:13:41.620

19

I like dd for this, myself.

<<\IN LC_ALL=C 2<>/dev/null \
dd conv=lcase
hi
Jigar
GANDHI
jiga
IN

...gets...

hi
jigar
gandhi
jiga

The LC_ALL=C is to protect any multibytes in input - though any multibyte capitals will not be converted. The same is true for (GNU) tr - both apps are prone to input mangling in any non-C locale. iconv can be combined with either for a comprehensive solution.

The 2>/dev/null redirect discards dd's default status report, along with anything else it writes to stderr. Without it, dd would finish a job like the one above by printing statistics such as how many bytes were processed.
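The same conversion can also be done file to file with dd's if= and of= operands instead of a here-document. This is only a sketch; the scratch directory and the output name temp.lower are assumptions.

```shell
# Recreate the sample input from the question in a scratch directory.
tmpdir_dd=$(mktemp -d)
printf 'hi\nJigar\nGANDHI\njiga\n' > "$tmpdir_dd/temp"

# conv=lcase lower-cases while copying. 2>/dev/null hides the transfer
# summary (and any error messages, so drop it when debugging).
LC_ALL=C dd if="$tmpdir_dd/temp" of="$tmpdir_dd/temp.lower" conv=lcase 2>/dev/null

cat "$tmpdir_dd/temp.lower"
```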

edited Sep 30 '19 at 02:13

answered Dec 05 '14 at 09:54

mikeserv

58,310

This solution is way faster than tr when handling large files, thanks! – WhiteWinterWolf Aug 18 '16 at 13:02

MvG · Answer 4 · 2014-12-05T09:46:23.830

You can also use Perl 5:

perl -pe '$_=lc' temp

The option -p tells perl to run the specified expression once for each line of input, printing the result, i.e. the final value of $_. -e indicates that the program will be the next argument, as opposed to a file containing the script. lc converts to lowercase. Without an argument, it will operate on $_. And $_= saves that again so it will get printed.

A variation of that would be

perl -ne 'print lc' temp

Using -n is like -p except that $_ won't get printed in the end. So instead of saving to that variable, I'm including an explicit print statement.

One benefit of Perl in contrast to sed is that you don't need any GNU extensions. There are projects which have to be compatible with non-GNU environments but which already have Perl as a dependency. Compared with tr, it might be that Perl's lc can be more easily made locale-aware. See the perllocale man page for details.

score 9 · Answer 5 · answered Dec 05 '14 at 07:46

You need to capture the matched pattern and then use it in the replacement with a modifier:

sed 's/\([A-Z]\)/\L\1/g' temp

The \(...\) "captures" the enclosed matched text; the first capture goes to \1, the next to \2, etc. Captures are numbered by the order of their opening parentheses, which matters when they are nested.

The \L converts the captured pattern to lower case, there's also \U for upper case.
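To make the difference concrete, here is a short sketch assuming GNU sed (\L and \U are not available in most other implementations). It first shows what the question's attempt produces, since the replacement side of s is literal text rather than a pattern, and then the capture, & and \U forms. The locale is pinned to C so that [A-Z] means exactly the ASCII capitals.

```shell
# Pin the C locale so the range A-Z covers exactly the 26 ASCII capitals.
export LC_ALL=C

# The question's attempt: each capital is replaced by the literal text "[a-z]".
printf 'Jigar\n' | sed 's/[A-Z]/[a-z]/g'             # prints: [a-z]igar

# GNU sed only: \L lower-cases what follows it in the replacement.
printf 'Jigar GANDHI\n' | sed 's/\([A-Z]\)/\L\1/g'   # prints: jigar gandhi

# & stands for the whole match, so the capture group is optional.
printf 'Jigar GANDHI\n' | sed 's/[A-Z]/\L&/g'        # prints: jigar gandhi

# \U is the upper-case counterpart.
printf 'jigar\n' | sed 's/.*/\U&/'                   # prints: JIGAR
```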

wurtel

16,115

3

You don't need to do this; the whole pattern is always caught in & – mikeserv Dec 05 '14 at 09:41
True, but then I would have missed the opportunity to explain capturing matches :-) – wurtel Dec 05 '14 at 12:47

score 2 · Answer 6 · answered Mar 31 '18 at 10:09

Further to MvG's answer, you could also use Perl 6:

perl6 -pe .=lc temp

Here $_ is implicit, and you don't need the single quotes to protect it from expansion by the shell ($_ being a special Bash parameter; see: https://www.gnu.org/software/bash/manual/html_node/Special-Parameters.html)

ozzy

845

The www.gnu.org link you provide does not list Perl6's (now Raku's) $_ "topic" variable as being used by Bash, at least not as a "Bash: Special Parameter". – jubilatious1 Nov 12 '21 at 07:57

score 1 · Answer 7 · answered Feb 21 '21 at 08:13

Using Emacs, you could first select all text in your buffer. Then invoke

M-x downcase-region

nondeterministic

191
