Convert underscore to PascalCase, i.e. UpperCamelCase

Question

If I have a string that looks like this:

"this_is_the_string"

Inside a bash script, I would like to convert it to PascalCase (i.e. UpperCamelCase), so that it looks like this:

"ThisIsTheString"

I found that converting to lowerCamelCase can be done like this:

echo "this_is_the_string" | sed -r 's/([a-z]+)_([a-z])([a-z]+)/\1\U\2\L\3/'

Unfortunately I am not familiar enough with regexes to modify this.

(1) This doesn’t really matter, as far as this question (and the answers presented so far) are concerned, but, FYI, \U\2 inserts the found text from the second group, converted to ALL CAPS. Compare to \u\2, which inserts the text in Sentence case, with only the first character capitalized. (2) All of the examples given below will translate “this_is_a_string” to “ThisIsAString” — which is what you asked for, but is slightly hard to read. You might want to revise your requirements for the special case of a one-letter word (substring). … (Cont’d) — Scott - Слава Україні, Apr 14 '15 at 19:58
(Cont’d) … (3) Do you have only one such string per line? And is it always the first (or the only) text on the line? If you have a string that’s not at the beginning of the line, the below answers will convert it to lowerCamelCase. To fix, take Janis’s answer and change (^|_) to (\<|_). — Scott - Слава Україні, Apr 14 '15 at 19:58
inverse: http://stackoverflow.com/questions/28795479/awk-sed-script-to-convert-a-file-from-camelcase-to-underscores — Ciro Santilli OurBigBook.com, Feb 01 '16 at 17:06
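
Scott's third point above can be checked directly. The following is a minimal sketch, assuming GNU sed (both \U and \< are GNU extensions); the variable names line, anchored and word_start are made up for illustration.

```shell
# A string that does not start at the beginning of the line.
line="foo this_is_the_string"

# (^|_) only anchors at the start of the line or after an underscore.
anchored=$(printf '%s\n' "$line" | sed -E 's/(^|_)([a-z])/\U\2/g')

# (\<|_) also anchors at the start of every word.
word_start=$(printf '%s\n' "$line" | sed -E 's/(\<|_)([a-z])/\U\2/g')

printf '%s\n' "$anchored" "$word_start"
```

Here anchored comes out as Foo thisIsTheString and word_start as Foo ThisIsTheString. Note that both versions also capitalize the unrelated word foo.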

Janis · Accepted Answer · 2015-04-14T19:46:58.547

57

$ echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g'
ThisIsTheString

Substitute pattern
(^|_) at the start of the string or after an underscore - first group
([a-z]) single lower case letter - second group
by
\U\2 uppercasing second group (the first group, and with it the underscore, is dropped)
g globally.

edited Apr 14 '15 at 19:46

answered Apr 14 '15 at 19:09

Janis

Note: \U is a GNU extension to POSIX. – Ciro Santilli OurBigBook.com Nov 19 '17 at 10:47
Just a note: you should capture digits too, with sed -r 's/(^|[-_ ]+)([0-9a-z])/\U\2/g', so that strings like "this_is_2nd_string" work too. – pinkeen Jul 01 '19 at 23:43
How can I achieve this with non-GNU sed? – Cameron Hudson Feb 14 '20 at 19:26
Not working well on Mac, where bash --version reports "GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)": echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g' prints UthisUisUtheUstring – Nir O. Mar 26 '23 at 15:58
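
On the non-GNU sed question above: one portable option is to leave sed and use awk, whose toupper() and substr() are part of POSIX. This is a minimal sketch, not taken from any of the answers; the function name to_pascal_awk is hypothetical.

```shell
# Hypothetical helper: split on underscores, capitalize each word, join.
to_pascal_awk() {
  printf '%s\n' "$1" | awk -F_ '{
    out = ""
    for (i = 1; i <= NF; i++)
      # Uppercase the first character of the field, keep the rest as-is.
      out = out toupper(substr($i, 1, 1)) substr($i, 2)
    print out
  }'
}

to_pascal_awk "this_is_the_string"
```

For the example string this prints ThisIsTheString. Empty fields (from a leading, trailing or doubled underscore) contribute nothing to the result.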

don_crissti · Answer 2 · 2015-04-16T21:58:53.143

Since you're using bash, if you stored your string in a variable you could also do it shell-only:

uscore="this_is_the_string_to_be_converted"
arr=(${uscore//_/ })
printf %s "${arr[@]^}"
ThisIsTheStringToBeConverted

${uscore//_/ } replaces all _ with space, ( ... ) splits the string into an array, ${arr[@]^} converts the first letter of each element to upper case (this case-modification expansion needs bash 4.0 or later), and then printf %s ... prints all elements one after another.
You can store the camel-cased string into another variable:

printf -v ccase %s "${arr[@]^}"

and use/reuse it later, e.g.:

printf '%s\n' "$ccase"
ThisIsTheStringToBeConverted

Or, with zsh:

uscore="this_is_the_string_to_be_converted"
arr=(${(s:_:)uscore})
printf %s "${(C)arr}"
ThisIsTheStringToBeConverted

(${(s:_:)uscore}) splits the string on _ into an array, (C) capitalizes the first letter of each element and printf %s ... prints all elements one after another.
To store it in another variable you could use (j::) to join the elements:

ccase=${(j::)${(C)arr}}

and use/reuse it later:

printf %s\\n $ccase
ThisIsTheStringToBeConverted

This seems a great solution, but unfortunately doesn't work on mac whose bash version is stuck at 3.2.57 because of license issues. — wlnirvana, Aug 05 '20 at 13:51
@wlnirvana, AFAIK macOS has always come with zsh (even used to be /bin/sh there and it's the default interactive shell in newer versions I'm told) where it's just ${(j[])${(s[_]C)string}} or ${${(C)string}//_} — Stéphane Chazelas, Nov 01 '20 at 16:15
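
For the bash 3.2 case mentioned above, here is a rough sketch that avoids ${var^} and arrays, using only POSIX parameter expansion plus cut and tr. The function name to_pascal_sh and its variables are hypothetical, the variables are global because POSIX sh has no local, and taking the first character with cut -c1 assumes single-byte (ASCII) input.

```shell
# Hypothetical helper; intended to work in bash 3.2 and other POSIX shells.
to_pascal_sh() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    word=${rest%%_*}               # text before the first underscore
    case $rest in
      *_*) rest=${rest#*_} ;;      # drop that word and its underscore
      *)   rest= ;;                # last word: nothing left to process
    esac
    [ -n "$word" ] || continue     # skip empty words produced by "__"
    # Uppercase only the first character of the word.
    first=$(printf '%s' "$word" | cut -c1 | tr '[:lower:]' '[:upper:]')
    out=$out$first${word#?}
  done
  printf '%s\n' "$out"
}

to_pascal_sh "this_is_the_string_to_be_converted"
```

It forks cut and tr once per word, so it is slower than the array version, but it does not depend on the shell version.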

score 13 · Answer 3 · answered Apr 14 '15 at 19:37

Here's a Perl way:

$ echo "this_is_the_string" | perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
ThisIsTheString

It can deal with strings of arbitrary length:

$ echo "here_is_another_larger_string_with_more_parts" | 
    perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
HereIsAnotherLargerStringWithMoreParts

It will match any character (.) that comes after either the start of the string or an underscore ((^|_)) and replace it with the upper case version of itself (uc($&)). The $& is a special variable that contains whatever was just matched. The e at the end of s///ge allows the use of expressions (the uc() function in this case) within the substitution and the g makes it replace all occurrences in the line. The second substitution removes the underscores.

terdon

Speaking of perl, there's also a perl module String::CamelCase that "camelizes" underscored text. – don_crissti Apr 15 '15 at 12:01
@don_crissti ooh, sounds perfect for this. Thanks. – terdon Apr 15 '15 at 12:06
Shorter Perl: perl -pe 's/(^|_)([a-z])/uc($2)/ge' – Jan 12 '18 at 22:29
Or: perl -pe's/_*([^_]*)/\u$1/g' – Stéphane Chazelas Nov 01 '20 at 15:07
And how do we assign the output to another variable, to call it without echo? (Sorry, a newbie.) – Rahul Gandhi Jul 29 '21 at 15:12
@RahulGandhi please see How can I assign the output of a command to a shell variable? – terdon Jul 29 '21 at 15:14

score 6 · Answer 4 · edited Apr 15 '15 at 11:51

It is not necessary to represent the entire string in a regular expression match -- sed has the /g modifier that allows you to walk over multiple matches and replace each of them:

echo "this_is_the_string" | sed 's/_\([a-z]\)/\U\1/g;s/^\([a-z]\)/\U\1/g'

The first regex is _\([a-z]\), which matches each letter after an underscore; the second one matches the first letter in the string.

answered Apr 14 '15 at 19:08

myaut

ctrl-alt-delor · Answer 5 · 2015-04-14T21:25:44.733

6

I only put in this answer because it is shorter and simpler than any other so far.

sed -re "s~(^|_)(.)~\U\2~g"

It says: uppercase the character following a _ or the start of the line. Non-letters are not changed, as they have no case.

edited Apr 14 '15 at 21:25

answered Apr 14 '15 at 21:18

ctrl-alt-delor

"Everything should be made as simple as possible, but not simpler." – Albert Einstein. This is not equivalent to the other answers; your answer will convert "FOO_BAR" to "FOOBAR", while the other answers will leave it alone. – Scott - Слава Україні Apr 14 '15 at 21:51
@scott Ah yes, I did not think of that. – ctrl-alt-delor Apr 14 '15 at 21:56
@Scott Isn't that the desired behavior? I guess that ideally, it should become FooBar but the underscore should be removed as per instructions. As I understand the instructions anyway. – terdon Apr 15 '15 at 10:24
@terdon: “Isn’t that the desired behavior?” (1) I don’t know. And I don’t believe that we can know what the OP wants unless he tells us; the question is insufficiently explicit. (2) I occasionally chastise people for making unwarranted assumptions about the potential input from the example(s) presented. But, considering that the question is about case conversion, I believe it’s valid to extrapolate (from the fact that the example is all lower case) to the assumption that the OP wants to manipulate only lower case strings. … (Cont’d) – Scott - Слава Україні Apr 16 '15 at 04:33
(Cont’d) … (3) I think it’s somewhat clear that the spirit of the question is to transform a string so that word breaks indicated by underscores (_) are instead indicated by case transitions. Given that, “FOO_BAR” → “FOOBAR” is clearly wrong (as it discards the word break information), although “FOO_BAR” → “FooBar” may be correct. (4) Similarly, a mapping that causes collisions seems to be contrary to the spirit of the question. For example, I believe that an answer that converts “DO_SPORTS” and “DOS_PORTS” to the same target is wrong. – Scott - Слава Україні Apr 16 '15 at 04:34
(Cont’d again) … (5) In the spirit of not causing collisions, it seems to me that “foo_bar” and “FOO_BAR” should not map to the same thing, so therefore I object to “FOO_BAR” → “FooBar”. (6) I think the bigger issue is namespaces. I haven’t programmed in Pascal since Blaise was alive, but in C/C++, by convention, identifiers that are primarily in lower case (to include snake_case and CamelCase) are generally the domain of the compiler, while identifiers in upper case are the domain of the pre-processor. So that’s why I think that the OP didn’t want ALL_CAPS identifiers to be considered. – Scott - Слава Україні Apr 21 '15 at 05:05
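
To make the FOO_BAR discussion above concrete, here is a small sketch, assuming GNU sed (for \U and \L). It compares the (.) pattern from this answer with the ([a-z]) pattern from the accepted answer, plus one possible way to get FooBar by lowercasing first. The variable names are hypothetical.

```shell
s="FOO_BAR"

# (.) matches any character, so the underscore is removed: FOOBAR
any_char=$(printf '%s\n' "$s" | sed -E 's/(^|_)(.)/\U\2/g')

# ([a-z]) needs a lower-case letter, so nothing matches: FOO_BAR
lower_only=$(printf '%s\n' "$s" | sed -E 's/(^|_)([a-z])/\U\2/g')

# Lowercase the whole line first, then convert: FooBar
normalized=$(printf '%s\n' "$s" | sed -E 's/.*/\L&/; s/(^|_)([a-z])/\U\2/g')

printf '%s\n' "$any_char" "$lower_only" "$normalized"
```

Which of the three is right depends on what the OP wants for upper-case input, which the question does not say.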

score 4 · Answer 6 · answered Sep 26 '18 at 21:22

In perl:

$ echo 'alert_beer_core_hemp' | perl -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
AlertBeerCoreHemp

This is also i18n-able:

$ echo 'алерт_беер_коре_хемп' | perl -CIO -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
АлертБеерКореХемп

score 1 · Answer 7 · answered Sep 27 '19 at 19:29

I did it this way:

echo "this_is_the_string" | sed -r 's/(\<|_)([[:alnum:]])/\U\2/g'

and got this result:

ThisIsTheString

Fábio Roberto Teodoro

score 0 · Answer 8 · answered Nov 01 '20 at 12:15

My choice is:

echo "this_is-the_string-2.0" |  perl -pe 's/(?:^|[^a-z])([a-z0-9])/\u$1/g'

Which results in:

ThisIsTheString20

drAlberT

Or perl -pe 's/([a-z0-9]+)|./\u$1/g' – Stéphane Chazelas Nov 01 '20 at 15:10
nice, but I find it a bit cryptic in fact – drAlberT Nov 01 '20 at 15:33
