How to find all characters, newlines included?

Question

I have this LaTeX code:

%%% 10
\bibitem{bib2}
W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
 et~al., \emph{Status of the neutron imaging and diffraction instrument
 IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
 Procedia} {\bfseries 69} (2015) 71}.

For work I have to regenerate this bibitem using a script (I call it "bib-script") that checks if references are available on a database (inSPIRE). If there are no results on inSPIRE, bib-script returns my bibitem as a comment, i.e.

...
%%% 10
\bibitem{bib10}
%
%W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
% et~al., \emph{Status of the neutron imaging and diffraction instrument
% IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
% Procedia} {\bfseries 69} (2015) 71}%%% 11
\bibitem{bib11}
...

So, I have to regenerate my bibitem by hand in order to obtain the following result:

%%% 10
\bibitem{bib10}
%
%W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
% et~al., \emph{Status of the neutron imaging and diffraction instrument
% IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
% Procedia} {\bfseries 69} (2015) 71}
W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
 et~al., \emph{Status of the neutron imaging and diffraction instrument
 IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
 Procedia} {\bfseries 69} (2015) 71}.
%%% 11
\bibitem{bib11}
...

(I chose this record at random, so it may well be on inSPIRE, but that doesn't matter.)

I have no problem if my reference is written on a single line, i.e.

%%% 10
\bibitem{bib10}
%
%W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes et~al., \emph{Status of the neutron imaging and diffraction instrument IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\ Procedia} {\bfseries 69} (2015) 71}%%% 11

In this case I use this code:

 (setq a (make-marker))
 (set-marker a (search-forward "\\begin{thebibliography}"))

 (setq z (make-marker))
 (set-marker z (search-forward "\\end{document}"))

 ;; do not ignore case in searches
 (setq old-case-fold-search case-fold-search)
 (setq case-fold-search nil)

 (perform-replace "\\\\bibitem{\\(.*\\)}
 %
 %\\(.*\\)%%% \\([0-9]+\\)" "\\\\bibitem{\\1}
 %
 %\\2
 \\2.
 %%% \\3" t t nil 1 nil a z)

Otherwise, I have trouble handling the newlines. I know about \n and ^J (C-q C-j), but something like this:

 M-x RET query-replace-regexp RET \\bibitem{\(.*\)}^J%^J%\([.^J ]*\)%%% \([0-9]+\) RET \\bibitem{\1}^J%^J%\2^J\2.^J%%% \3

does not work. It is surely wrong, but I wrote it only to explain my idea: match every character, newlines included, using . for any character and ^J for a newline, and then * to extend the search up to %%% \([0-9]+\).

How can I use perform-replace to solve my problem? Is there a better way?

Remarks

I am sorry that my question is not clear, so I will try to clarify it.

I have some bibitems like this:

 ...
 %%% 10
 \bibitem{bib10}
 %
 %W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
 % et~al., \emph{Status of the neutron imaging and diffraction instrument
 % IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
 % Procedia} {\bfseries 69} (2015) 71}%%% 11
 \bibitem{bib11}
 ...

So, I want to replace the previous code with the following:

 %%% 10
 \bibitem{bib10}
 %
 %W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
 % et~al., \emph{Status of the neutron imaging and diffraction instrument
 % IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
 % Procedia} {\bfseries 69} (2015) 71}
 W.~Kockelmann, G.~Burca, J.F.~Kelleher, S.~Kabra, S.-Y.~Zhang, N.J.~Rhodes
 et~al., \emph{Status of the neutron imaging and diffraction instrument
 IMAT}, \href{https://doi.org/10.1016/j.phpro.2015.07.010}{\emph{Phys.\
 Procedia} {\bfseries 69} (2015) 71}.
 %%% 11
 \bibitem{bib11}
 ...

How can I do this replacement with perform-replace?

The question is hard to parse as it contains many details that are not necessary to understand it. Please simplify by showing a dead-simple example of what you want. — Damien Cassou, Apr 18 '18 at 13:39
@Drew I would like to edit my "\bibitem" as shown in my question. I meant all characters, newlines included. I used "with" badly, sorry. — Onner Irotsab, Apr 18 '18 at 18:37

phils · Answer 1 · 2018-04-18T10:59:16.060

I struggled to parse that question, but I think you're asking for the regexp syntax which means "Any character -- even a newline character."

You have tried using this:

[.^J ]

(where ^J here (and in every appearance below) is where you correctly typed C-q C-j, as you were doing this interactively.)

However, as C-h i g (elisp)Regexp Special RET tells us:

‘[ ... ]’
     is a “character alternative”, which begins with ‘[’ and is
     terminated by ‘]’.  In the simplest case, the characters between
     the two brackets are what this character alternative can match.
     ...
     Note also that the usual regexp special characters are not special
     inside a character alternative.  A completely different set of
     characters is special inside character alternatives: ‘]’, ‘-’ and
     ‘^’.
     ...

So the . in your [.^J ] is not special; hence that construct actually only matches a literal ., a newline, or a space.
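The point above can be checked directly from Lisp. This is only a small sketch: `my-demo-char-class` is a made-up name, and the `\n` inside the Lisp string stands for the newline that is typed as ^J interactively.

```elisp
;; Hypothetical name, used only for this illustration.
(defconst my-demo-char-class "[.\n ]"
  "The character alternative from the question, written as a Lisp string.")

;; Inside [...] the dot is literal, so only ".", newline and space match.
(string-match-p my-demo-char-class "abc")     ; nil: none of the three occurs
(string-match-p my-demo-char-class "a.c")     ; 1: the literal "."
(string-match-p my-demo-char-class "ab\ncd")  ; 2: the newline
(string-match-p my-demo-char-class "ab cd")   ; 2: the space
```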

Outside of character alternatives, a . matches "any character other than a newline", and a literal newline matches "a newline", so what I believe you were looking for was:

\(?:.\|^J\)

Which matches either "any character other than a newline", or "a newline".
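As a quick illustration of the difference, here is a minimal sketch that compares the two forms on a three-line string. The function name `my-demo-span-lines` and the sample text are assumptions made up for this example.

```elisp
(defun my-demo-span-lines ()
  "Compare a plain `.*' with the newline-aware alternative.
Hypothetical helper, for illustration only."
  (let ((text "start\nmiddle\nend"))
    (list
     ;; "." never matches a newline, so ".*" cannot reach "end".
     (string-match-p "start.*end" text)
     ;; Allowing a newline as an alternative lets the match span lines.
     (string-match-p "start\\(?:.\\|\n\\)*end" text))))

(my-demo-span-lines) ; => (nil 0)
```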

Note that \(?: ... \) is just a "shy" (or "non-capturing") group which is the same as \( ... \) except that it can't be referenced subsequently with \DIGIT (which I'm assuming you won't need for this).
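For the multi-line bibitems in the question, one possible way to put this together is sketched below. It is not the only approach, and it rests on a few assumptions: the function name `my-bib-restore-commented-entries` is made up, the commented block is assumed to always start with a lone `%` line, and it is assumed to end where the next `%%% NUMBER` marker begins. The shy group sits inside an ordinary capturing group so the commented text can be retrieved, and the non-greedy `*?` stops at the first marker rather than the last. A Lisp loop is used instead of a plain replacement string because the captured text still carries the leading `%` on its continuation lines, which have to be stripped for the uncommented copy.

```elisp
(defun my-bib-restore-commented-entries ()
  "Append an uncommented copy after each commented-out bibitem body.
Hypothetical helper; returns the number of entries changed."
  (interactive)
  (let ((case-fold-search nil)          ; do not ignore case in searches
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              ;; Group 1: commented body (may span lines, non-greedy).
              ;; Group 2: number of the next "%%% N" marker.
              "^\\\\bibitem{.*}\n%\n%\\(\\(?:.\\|\n\\)*?\\)%%% \\([0-9]+\\)$"
              nil t)
        ;; Read the match data first: `replace-regexp-in-string' clobbers it.
        (let* ((body-end (match-end 1))
               (whole-end (match-end 0))
               (commented (match-string 1))
               (number (match-string 2))
               ;; Drop the "%" that starts each continuation line.
               (plain (replace-regexp-in-string "^%" "" commented)))
          (delete-region body-end whole-end)
          (goto-char body-end)
          (insert "\n" plain ".\n%%% " number)
          (setq count (1+ count)))))
    count))
```

Run it with `M-x my-bib-restore-commented-entries` in the buffer that holds the bibliography. To restrict it to the bibliography as in the question, narrowing the buffer first should work.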

I am sorry for the confusion; I am not a native English speaker. Yes, I want to find and replace some characters, newlines included. I had already tried `\(.*^J\)`, but this matches line by line. I tried your suggestion and got the same result: `%\(.*\|^J\)%%% \([0-9]+\)` finds every line that ends with the sequence "%%% number". Have I misunderstood your suggestion? — Onner Irotsab, Apr 18 '18 at 13:00
A line ending with "%%% number" would be `.*%%% [0-9]+$` (replace the `$` with a newline if you want to include the newline in what is matched). — phils, Apr 18 '18 at 13:05
You may find `M-x re-builder` useful. See https://emacs.stackexchange.com/q/5568 regarding the different kinds of regexp syntax it accepts, as the default syntax is not what you would use interactively. — phils, Apr 18 '18 at 13:10
