5

In Analyzing Scripts webpage of TLDP, the following code is provided for analysis:

export SUM=0
for f in $(find src -name "*.java"); do
    export SUM=$(($SUM + $(wc -l $f | awk '{ print $1 }')))
done
echo $SUM

I understand that it calculates the sum of the number of lines of all *.java files in the directory src. What I do not understand is the reason for using the export keyword, which is described thus:

The export command makes available variables to all child processes of the running script or shell.

Since SUM is never accessed by a child process, is there any reason for exporting it?

Quasímodo
  • 18,865
  • 4
  • 36
  • 73

2 Answers2

11

I understand that it calculates the sum of the number of lines of all *.java files in the directory src.

This is not necessarily completely true. It calculates the sum of the number of lines of all *.java files in the directory tree rooted at src (i.e. src and all its child directories). But it will fail for any file paths containing whitespace or when there are directory names ending with .java.

Since SUM is never accessed by a child process, is there any reason for exporting it?

No.

I would probably write the snippet of code like this, making it filename-safe in the process:

find src -type f -name '*.java' -exec wc -l {} \; | awk '{ s += $1 } END { print s }'

A better solution would probably be this:

find src -type f -name '*.java' -exec cat {} + | wc -l
Chris Davies
  • 116,213
  • 16
  • 160
  • 287
  • 2
    +1 which has the nice side-effect of eliminating the need to remove the filename field (since wc is reading stdin rather than a file) – steeldriver Jan 29 '20 at 16:23
  • 1
    -exec utility {} + is standard syntax, not just GNU. – Kusalananda Jan 29 '20 at 16:47
  • 1
    "it will fail for ... directory names ending with .java" -- Seems to just give an error message from wc for me, not strictly failing (with Bash). – ilkkachu Jan 30 '20 at 16:05
  • My guess is that the reason it was exported is the person who wrote it thought the $SUM inside the double parentheses was running in a subshell, and therefore would not be defined. This is not true. Code inside single parentheses does run in a subshell. – Monty Harder Jan 30 '20 at 16:23
  • @MontyHarder But even then you don't need to export, because subshells inherit shell variables. Environment variables are only needed for external programs. – Barmar Jan 30 '20 at 17:10
  • 1
    Filenames in Java have to reflect class names, so you're bound to valid specifiers and conventions by Java itself ([A-Z][a-z0-9_]*) which makes it highly unlikely to find a Java file anywhere, which could cause problems. Directories reflect package names, which in turn are separated by dots inside Java source, so the probability of a directory name, containing a dot in a Java source dir is negligible, too. – user unknown Mar 05 '20 at 15:27
  • find src -name "*.java" -exec wc -l {} + will print a summary, which can be piped to tail -n 1 to suppress the individual sizes. Of course, if the bare number has to be extracted for further commands, cat might be the most elegant solution. – user unknown Mar 05 '20 at 16:05
2

You're right, there is no need to use export here. There are more problems with this code though:

$  ~/.cabal/bin/shellcheck e.sh

In e.sh line 4:
for f in $(find src -name "*.java"); do
         ^------------------------^ SC2044: For loops over find output are fragile. Use find -exec or a while read loop.


In e.sh line 5:
    export SUM=$(($SUM + $(wc -l $f | awk '{ print $1 }')))
                  ^--^ SC2004: $/${} is unnecessary on arithmetic variables.
                                 ^-- SC2086: Double quote to prevent globbing and word splitting.

Did you mean:
    export SUM=$(($SUM + $(wc -l "$f" | awk '{ print $1 }')))

For more information:
  https://www.shellcheck.net/wiki/SC2044 -- For loops over find output are fr...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
  https://www.shellcheck.net/wiki/SC2004 -- $/${} is unnecessary on arithmeti...

And it's a good practice to use lowercase variable names in the scripts, see this for an explanation.

In general, tldp.org is not recommended, see this or this. I've also seen people not recommending tldp.org here at StackExchange.

  • 1
    The fragment is about script analysis - what does it do and how to make it better. I would really hope that it was a contrived solution that could be easily and straightforwardly corrected and optimised. – Chris Davies Jan 29 '20 at 16:24
  • Filenames in Java have to reflect class names, so you're bound to valid specifiers and conventions by Java itself ([A-Z][a-z0-9_]*) which makes it highly unlikely to find a Java file anywhere, which could cause problems. – user unknown Mar 05 '20 at 15:29
  • @userunknown: are you talking about variable names or their values? – Arkadiusz Drabczyk Mar 05 '20 at 16:07
  • @ArkadiuszDrabczyk: About class names, but interface names too. A Java file, specifying the public class FileInputStream, must be named FileInputStream.java, no spaces allowed. The same holds for Interfaces. Both are neither variable names, nor values. Visit github to see real world examples. – user unknown Mar 05 '20 at 23:57
  • @userunknown: sorry but I have no idea what are you talking about. What part of my answer are you referring to? – Arkadiusz Drabczyk Mar 06 '20 at 00:01
  • The command find src -name "*.java" will in almost all cases only find files without blanks, tabs, newlines, vertical tabs and formfeeds in their name, because there are restrictions in the Java specification, which narrow the possible file names indirectly. Blindly following shellcheck can be misleading. – user unknown Mar 06 '20 at 00:33
  • @userunknown: will in almost all cases only find files without blanks, tabs, newlines, vertical tabs and formfeeds in their name - yeah, almost all. What's your point? Do you suggest not quoting variables because it will almost always work? – Arkadiusz Drabczyk Mar 06 '20 at 00:42
  • Almost all means in this case: I bet if you visit 1000 companies, writing Java code, you will not find a single source or src folder, containing a file which does not follow the rule, because else the compiler will report an error, the build is broken. Of course you may now create such a file intentionally to prove me wrong. Find just a single file of that kind in the git repositories. Having a head crash on your hard drive while using find is by order of magnitude more common. – user unknown Mar 06 '20 at 00:53
  • You're completely wrong. What stops me from doing touch 'a b c.java? What if OP acquired more skills and decided to use the same find command with other file types, say .c? What if Java standards changed to be "more easy for newcomers" or whatever? What if someone had a .java file that contained a whitespace despite of all the standards because someone else who has seen IDE for the first time in their life created "my program.java" file because they didn't know about the standards? – Arkadiusz Drabczyk Mar 06 '20 at 00:59
  • And why do you want to break this command that would work in all cases and make it work in almost all cases as you said yourself? Is almost all better than all? – Arkadiusz Drabczyk Mar 06 '20 at 00:59
  • Or what if this command was supposed to be used by people who write Java compilers to test if their compiler correctly discards files that contain a whitespace? – Arkadiusz Drabczyk Mar 06 '20 at 01:00