I have a JavaScript file with deeply nested comments. JavaScript uses C-style comments:

//this is a single line comment

And

/* this is
a multi-line
comment */

However, the actual code looks closer to this:

/* blah blah
// [NOTE] blah blah
// blah
// blah blah blah
// blah
// blah blah
function("string", false);
// 1: blah blah blah blah blah blah
// [1] https://example.com
function("string", true);
/* blah blah
* blah blah //* /
function("string", false);
// * * * /
// 18: blah blah blah blah blah blah
// [-] (part1) https://example.org/43
function("string", 0);
// * * * /
// 20: blah blah blah blah blah blah
// [NOTE] blah blah blah blah blah blah blah blah blah blah blah blah
// [-] https://example.org/62015
function("some_string", "string"); // (comment)
// 0301: blah blah blah blah blah blah
// [-] https://example.org/205
// function("string", false);
// 040: blah blah blah blah blah blah
// What is this?
// [-] https://example.org/58917
function("string", true);
// 050: blah blah blah blah blah blah
// [-] https://example.org/57226
function("string", false);
// 103: blah blah blah blah blah blah
// [-] https://example.org/53751
// function("string", false);
// 203: blah blah blah blah blah blah
// [WARNING] This may break
// [-] https://example.org/70082
function("string", false);
// 27: blah blah blah blah blah blah
// [-] https://example.org/57170
// function("string", 90); // default: 90
// 55: blah blah blah blah blah blah
// [-] https://example.org/73595
// function("string", true);
// * * * /
// ***/
function("string", 99); //comment comment

The answers I found, like this one, deal with simpler situations.

How can I remove all comments from a file? (Unix & Linux Stack Exchange)

The main problem I ran into was that my regexes were too greedy, for example matching everything from the very first /* to the very last */. I did not try Perl because I'm not familiar with it. Unfortunately, the tools I did try did not support all the regex syntax I wanted to use. I'm not sure which simple tool is best suited to this task.
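
To illustrate the greediness problem described above, here is a minimal sketch in JavaScript. The `sample` string is a made-up input, not taken from the real file. It only contrasts a greedy quantifier with a lazy one; neither pattern knows about string literals, so this is not a complete comment stripper.

```javascript
// Hypothetical sample input: two separate block comments around one statement.
const sample = '/* first */ keep(); /* second */';

// Greedy: `[\s\S]*` runs to the LAST `*/`, so the code in between is deleted too.
const greedy = sample.replace(/\/\*[\s\S]*\*\//g, '');

// Lazy: `[\s\S]*?` stops at the FIRST `*/` after each `/*`.
const lazy = sample.replace(/\/\*[\s\S]*?\*\//g, '');

console.log(JSON.stringify(greedy)); // ""
console.log(JSON.stringify(lazy));   // " keep(); "
```

The lazy form matches how JavaScript itself reads block comments, which do not nest: a `/*` always ends at the first `*/` that follows it.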

MountainX
  • Regular expressions cannot parse out comments in that format because it does not represent a regular language. Non-regular regexps (like Perl's) might have a chance. – Michael Homer Dec 06 '18 at 03:42
  • Is this a specific or general problem? In the specific, some variant of minify | prettify is probably good enough (that is, using actual JavaScript parsers). – Michael Homer Dec 06 '18 at 03:43
  • I'm guessing the comments don't always start at the beginning of the line, right? Related: https://j11y.io/javascript/removing-comments-in-javascript/ – Panki Dec 06 '18 at 07:10
  • This is a specific problem. Following the advice suggested in the comments, I used this approach to solve my specific problem: https://stackoverflow.com/a/2394040/463994 – MountainX Dec 06 '18 at 14:54
  • Complex example still lacks cases like function("string //"). – Kaz Dec 16 '18 at 17:07
  • This is practically not achievable with any regular expression tool, nor with Perl. To do this reliably you would need to rewrite a good part of the JavaScript interpreter to handle its syntax. Of course you can theoretically do that, but it is a huge task and not worth the effort. – jimmij Dec 16 '18 at 18:08
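
The comments above point at the hard part: a comment marker inside a string, as in function("string //"), must not be treated as a comment. The following is a rough sketch of a character-by-character scanner that tracks that one distinction. The name `stripComments` is hypothetical, and the sketch assumes the input contains no regex literals and no `${}` expressions inside template literals; a real JavaScript parser (the minify/prettify route suggested above) handles those cases.

```javascript
// Sketch: strip // and /* */ comments while leaving quoted strings alone.
// Assumption: no regex literals and no ${...} nesting in template literals.
function stripComments(src) {
  let out = '';
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '"' || ch === "'" || ch === '`') {
      // Copy the string literal verbatim, honoring backslash escapes.
      const quote = ch;
      let j = i + 1;
      while (j < src.length && src[j] !== quote) {
        if (src[j] === '\\') j++; // skip the escaped character
        j++;
      }
      out += src.slice(i, j + 1);
      i = j + 1;
    } else if (ch === '/' && next === '/') {
      // Line comment: drop everything up to, but not including, the newline.
      while (i < src.length && src[i] !== '\n') i++;
    } else if (ch === '/' && next === '*') {
      // Block comment: drop through the first closing */ (or to end of input).
      const end = src.indexOf('*/', i + 2);
      i = end === -1 ? src.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

console.log(stripComments('function("string //"); // (comment)'));
// function("string //"); 
```

Because block comments do not nest, the scanner treats everything from a `/*` to the first `*/` as one comment, including any `//` lines in between.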

1 Answer

You can use cpp -undef -P your_file.js or cc -undef -E -P -xc your_file.js for that.

If you don't think cpp or cc are "common" enough tools, tough luck. They really should be.

  • I used gcc, as per this answer: https://stackoverflow.com/a/2394040/463994 – MountainX Dec 16 '18 at 22:27
  • Yes, you should use the -undef or -fpreprocessed options to inhibit macros; thanks for the heads-up. –  Dec 16 '18 at 23:34