38

If I have a string that looks like this:

"this_is_the_string"

Inside a bash script, I would like to convert it to PascalCase, ie UpperCamelCase to look like this:

"ThisIsTheString"

I found that converting to lowerCamelCase can be done like this:

echo "this_is_the_string" | sed -r 's/([a-z]+)_([a-z])([a-z]+)/\1\U\2\L\3/'

Unfortunately I am not familiar enough with regexes to modify this.

  • (1) This doesn’t really matter, as far as this question (and the answers presented so far) are concerned, but, FYI, \U\2 inserts the found text from the second group, converted to ALL CAPS.  Compare to \u\2, which inserts the text in Sentence case, with only the first character capitalized.  (2) All of the examples given below will translate “this_is_a_string” to “ThisIsAString” — which is what you asked for, but is slightly hard to read.  You might want to revise your requirements for the special case of a one-letter word (substring).  … (Cont’d) – Scott - Слава Україні Apr 14 '15 at 19:58
  • (Cont’d) …  (3) Do you have only one such string per line?  And is it always the first (or the only) text on the line?  If you have a string that’s not at the beginning of the line, the below answers will convert it to lowerCamelCase.  To fix, take Janis’s answer and change (^|_) to (\<|_). – Scott - Слава Україні Apr 14 '15 at 19:58
  • 1
    inverse: http://stackoverflow.com/questions/28795479/awk-sed-script-to-convert-a-file-from-camelcase-to-underscores – Ciro Santilli OurBigBook.com Feb 01 '16 at 17:06

8 Answers

57
$ echo "this_is_the_string" | sed -r 's/(^|_)([a-z])/\U\2/g'            
ThisIsTheString

Substitute pattern
(^|_) at the start of the string or after an underscore - first group
([a-z]) single lower case letter - second group
by
\U\2 uppercasing second group
g globally.
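
As a rough sketch, the substitution can be wrapped in a function so it is easy to try on other inputs. The function names to_pascal and to_pascal_u are made up for illustration, and both assume GNU sed, since -r and the \U / \u escapes are GNU extensions. Because the second group is a single letter, \U\2 and \u\2 should give the same result here.

```shell
# Hypothetical helper names; assumes GNU sed (-r, \U and \u are GNU extensions).
to_pascal() {
    # \U upper-cases everything that follows in the replacement (here only \2).
    printf '%s\n' "$1" | sed -r 's/(^|_)([a-z])/\U\2/g'
}

to_pascal_u() {
    # \u upper-cases only the next character, which is all of \2 anyway.
    printf '%s\n' "$1" | sed -r 's/(^|_)([a-z])/\u\2/g'
}

to_pascal   "this_is_the_string"   # ThisIsTheString
to_pascal_u "this_is_the_string"   # ThisIsTheString
to_pascal   "this_is_a_string"     # ThisIsAString
```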

Janis
  • 14,222
15

Since you're using bash, if you stored your string in a variable you could also do it shell-only:

uscore="this_is_the_string_to_be_converted"
arr=(${uscore//_/ })
printf %s "${arr[@]^}"
ThisIsTheStringToBeConverted

${uscore//_/ } replaces every _ with a space, (....) splits the result into an array, ${arr[@]^} converts the first letter of each element to upper case (this expansion needs bash 4 or later) and then printf %s .. prints all elements one after another.
You can store the camel-cased string into another variable:

printf -v ccase %s "${arr[@]^}"

and use/reuse it later, e.g.:

printf '%s\n' "$ccase"
ThisIsTheStringToBeConverted
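
One possible variation on the array approach above, sketched as a function: the unquoted ${uscore//_/ } is also subject to filename expansion, so a word such as * could be replaced by file names. Splitting with read and IFS=_ avoids that. The function name snake_to_pascal is hypothetical, and the sketch still assumes bash 4 or later for the ^ expansion.

```shell
# Hypothetical helper; assumes bash 4+ (needed for the ${var^} expansion).
snake_to_pascal() {
    local -a parts
    # Split on underscores only; read does not perform glob expansion.
    IFS=_ read -r -a parts <<< "$1"
    # printf reuses the %s format for every element, so they are concatenated.
    printf '%s' "${parts[@]^}"
    printf '\n'
}

snake_to_pascal "this_is_the_string_to_be_converted"   # ThisIsTheStringToBeConverted
snake_to_pascal 'a_*_b'                                # A*B
```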

Or, with zsh:

uscore="this_is_the_string_to_be_converted"
arr=(${(s:_:)uscore})
printf %s "${(C)arr}"
ThisIsTheStringToBeConverted

(${(s:_:)uscore}) splits the string on _ into an array, (C) capitalizes the first letter of each element and printf %s ... prints all elements one after another.
To store it in another variable you could use (j::) to join the elements:

ccase=${(j::)${(C)arr}}

and use/reuse it later:

printf %s\\n $ccase
ThisIsTheStringToBeConverted
don_crissti
  • 82,805
  • 1
    This seems a great solution, but unfortunately doesn't work on mac whose bash version is stuck at 3.2.57 because of license issues. – wlnirvana Aug 05 '20 at 13:51
  • @wlnirvana, AFAIK macOS has always come with zsh (even used to be /bin/sh there and it's the default interactive shell in newer versions I'm told) where it's just ${(j[])${(s[_]C)string}} or ${${(C)string}//_} – Stéphane Chazelas Nov 01 '20 at 16:15
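
For shells without the ^ expansion (such as the bash 3.2 mentioned in the comment above), a loop over the words is one possible workaround. This is only a sketch and not part of the answer: the function name snake_to_pascal_compat is made up, and it swaps the array technique for plain parameter expansion plus tr, which costs one extra process per word.

```shell
# Hypothetical helper; avoids arrays and ${var^}, so it should not need bash 4.
snake_to_pascal_compat() {
    local rest out word first
    rest=$1
    out=
    while [ -n "$rest" ]; do
        word=${rest%%_*}                 # text before the first underscore
        case $rest in
            *_*) rest=${rest#*_} ;;      # drop that word and its underscore
            *)   rest= ;;                # last word: nothing left afterwards
        esac
        [ -n "$word" ] || continue       # skip empty words (e.g. "__")
        first=${word%"${word#?}"}        # first character of the word
        first=$(printf '%s' "$first" | tr '[:lower:]' '[:upper:]')
        out=$out$first${word#?}          # upper-cased first char + remainder
    done
    printf '%s\n' "$out"
}

snake_to_pascal_compat "this_is_the_string"   # ThisIsTheString
```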
13

Here's a Perl way:

$ echo "this_is_the_string" | perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
ThisIsTheString

It can deal with strings of arbitrary length:

$ echo "here_is_another_larger_string_with_more_parts" | 
    perl -pe 's/(^|_)./uc($&)/ge;s/_//g'
HereIsAnotherLargerStringWithMoreParts

It will match any character (.) that comes after either the start of the string or an underscore ((^|_)) and replace it with the upper case version of itself (uc($&)). The $& is a special variable that contains whatever was just matched. The e at the end of s///ge allows the use of expressions (the uc() function in this case) within the substitution and the g makes it replace all occurrences in the line. The second substitution removes the underscores.

terdon
  • 242,166
6

It is not necessary to represent the entire string in a regular expression match -- sed has the /g modifier that allows you to walk over multiple matches and replace each of them:

echo "this_is_the_string" | sed 's/_\([a-z]\)/\U\1/g;s/^\([a-z]\)/\U\1/g'

The first regex is _\([a-z]\) -- each letter after underscore; the second one matches the first letter in a string.

myaut
  • 1,431
6

I only put in this answer because it is shorter and simpler than any other so far.

sed -re "s~(^|_)(.)~\U\2~g"

It says: upper-case the character that follows a _ or the start of the line. Non-letters are not changed, as they have no case.

  • 1
    "Everything should be made as simple as possible, but not simpler." – Albert Einstein.  This is not equivalent to the other answers; your answer will convert "FOO_BAR" to "FOOBAR", while the other answers will leave it alone. – Scott - Слава Україні Apr 14 '15 at 21:51
  • @scott Ah yes, I did not think of that. – ctrl-alt-delor Apr 14 '15 at 21:56
  • 1
    @Scott Isn't that the desired behavior? I guess that ideally, it should become FooBar but the underscore should be removed as per instructions. As I understand the instructions anyway. – terdon Apr 15 '15 at 10:24
  • @terdon: “Isn’t that the desired behavior?”  (1) I don’t know.  And I don’t believe that we can know what the OP wants unless he tells us; the question is insufficiently explicit.  (2) I occasionally chastise people for making unwarranted assumptions about the potential input from the example(s) presented.  But, considering that the question is about case conversion, I believe it’s valid to extrapolate (from the fact that the example is all lower case) to the assumption that the OP wants to manipulate only lower case strings.  … (Cont’d) – Scott - Слава Україні Apr 16 '15 at 04:33
  • 2
    (Cont’d) …  (3) I think it’s somewhat clear that the spirit of the question is to transform a string so that word breaks indicated by underscores (_) are instead indicated by case transitions.  Given that, “FOO_BAR” → “FOOBAR” is clearly wrong (as it discards the word break information), although “FOO_BAR” → “FooBar” may be correct.  (4) Similarly, a mapping that causes collisions seems to be contrary to the spirit of the question.  For example, I believe that an answer that converts “DO_SPORTS” and “DOS_PORTS” to the same target is wrong. – Scott - Слава Україні Apr 16 '15 at 04:34
  • 1
    (Cont’d again) …  (5) In the spirit of not causing collisions, it seems to me that “foo_bar” and “FOO_BAR” should not map to the same thing, so therefore I object to “FOO_BAR” → “FooBar”.  (6) I think the bigger issue is namespaces.  I haven’t programmed in Pascal since Blaise was alive, but in C/C++, by convention, identifiers that are primarily in lower case (to include snake_case and CamelCase) are generally the domain of the compiler, while identifiers in upper case are the domain of the pre-processor.  So that’s why I think that the OP didn’t want ALL_CAPS identifiers to be considered. – Scott - Слава Україні Apr 21 '15 at 05:05
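
To make the difference discussed in these comments concrete, here is a small sketch that runs both patterns on the same inputs. The function names any_char and lower_only are made up for illustration. GNU sed is assumed, and LC_ALL=C is an added assumption so that [a-z] means ASCII lower-case letters only.

```shell
# Hypothetical helper names; assumes GNU sed. LC_ALL=C is an added assumption
# that keeps [a-z] limited to ASCII lower-case letters.
any_char()   { printf '%s\n' "$1" | LC_ALL=C sed -r 's~(^|_)(.)~\U\2~g'; }
lower_only() { printf '%s\n' "$1" | LC_ALL=C sed -r 's/(^|_)([a-z])/\U\2/g'; }

any_char   "FOO_BAR"     # FOOBAR   (underscore removed, word break lost)
lower_only "FOO_BAR"     # FOO_BAR  (left alone)
any_char   "DO_SPORTS"   # DOSPORTS
any_char   "DOS_PORTS"   # DOSPORTS (same output: a collision)
```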
4

In perl:

$ echo 'alert_beer_core_hemp' | perl -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
AlertBeerCoreHemp

This is also i18n-able:

$ echo 'алерт_беер_коре_хемп' | perl -CIO -pe 's/(?:\b|_)(\p{Ll})/\u$1/g'
АлертБеерКореХемп
1

I did it this way:

echo "this_is_the_string" | sed -r 's/(\<|_)([[:alnum:]])/\U\2/g'

and got this result:

ThisIsTheString
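
A quick sketch of how \< differs from ^ when the string is not at the start of the line. The function names anchor_line and anchor_word are hypothetical, and GNU sed is assumed. Note that \< matches at the start of every word, so the other words on the line are capitalized too.

```shell
# Hypothetical helper names; assumes GNU sed (\< is a GNU regex extension).
anchor_line() { printf '%s\n' "$1" | sed -r 's/(^|_)([[:alnum:]])/\U\2/g'; }
anchor_word() { printf '%s\n' "$1" | sed -r 's/(\<|_)([[:alnum:]])/\U\2/g'; }

anchor_line "see this_is_the_string here"   # See thisIsTheString here
anchor_word "see this_is_the_string here"   # See ThisIsTheString Here
```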
0

My choice is:

echo "this_is-the_string-2.0" |  perl -pe 's/(?:^|[^a-z])([a-z0-9])/\u$1/g'

Which results in:

ThisIsTheString20
drAlberT
  • 101