Note that if you remove every newline character from a file, even the last one, then it's no longer a text file (unless the file ends up being empty), since a text file is a sequence of text lines and each text line is terminated by a newline character.
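To make that concrete, here is a minimal sketch of a check for whether a file still ends in a newline. The helper name ends_with_newline is made up for illustration, and the check treats an empty file as fine, in line with the remark above.

```shell
# Hypothetical helper: succeeds if the file is empty or its last byte is a newline.
ends_with_newline() {
  [ ! -s "$1" ] && return 0
  # wc -l counts newline characters, so the last byte yields 1 only if it is a newline.
  [ "$(tail -c 1 "$1" | wc -l | tr -d '[:space:]')" = 1 ]
}

tmp=$(mktemp)
printf 'abc\n' > "$tmp"
ends_with_newline "$tmp" && with_nl=yes || with_nl=no
printf 'abc' > "$tmp"
ends_with_newline "$tmp" && without_nl=yes || without_nl=no
rm -f "$tmp"
printf 'with newline: %s, without: %s\n' "$with_nl" "$without_nl"
```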
Now, to remove everything but alphabetical characters (of any alphabet), as @Kusalananda said, POSIXly, you'd use tr -cd '[:alpha:]'.
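As a quick illustration on ASCII-only input, something like the following should do. The function name letters_only is hypothetical, and LC_ALL=C is forced here only so that [:alpha:] means ASCII letters and the result is predictable.

```shell
# Hypothetical helper: keep only alphabetic characters from standard input.
# LC_ALL=C restricts [:alpha:] to ASCII letters for this demonstration.
letters_only() {
  LC_ALL=C tr -cd '[:alpha:]'
}

# Digits, spaces, punctuation and newlines are all deleted.
result=$(printf 'ab 1\ncd-2\n' | letters_only)
printf '%s\n' "$result"
```

Note that the newlines go too, since they are not alphabetical either.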
Unfortunately, with some tr implementations, including GNU tr, that doesn't work for multibyte characters. In UTF-8 locales, that means all characters but the ASCII ones.
On GNU systems, you can use GNU awk or GNU sed, which do support multibyte characters:
<file sed 's/[^[:alpha:]]//g' | tr -d '\n'
<file awk -v ORS= '{gsub(/[^[:alpha:]]/, ""); print}'
That syntax is not GNU-specific, but you'll find some non-GNU sed/awk implementations that don't support multibyte characters. Beware that GNU sed/awk at least will not remove sequences of bytes that don't form valid characters (like the output of printf 'à b \200\n' in a UTF-8 locale).
With uconv from the ICU project, you could do:
<file uconv -i -x '[^[:Letter:]]>;'
Here -i tells uconv to skip input it can't decode.
That only works for UTF-8 data, though. Note that it uses Unicode character properties (from some version of Unicode) to decide what is alphabetical, as opposed to whatever your locale says.
With GNU grep, you could use:
<file grep -o '[[:alpha:]]' | tr -d '\n'
Or if built with PCRE support (using Unicode properties):
<file grep -Po '\pL' | tr -d '\n'
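A small sketch of the bracket-expression form on ASCII input, assuming a grep that supports -o (it is not a POSIX option). LC_ALL=C is only there to make the result predictable.

```shell
# grep -o prints each match (here, each letter) on its own line;
# tr then removes the newlines to join the letters back together.
letters=$(printf 'a1 b2\nc3\n' | LC_ALL=C grep -o '[[:alpha:]]' | tr -d '\n')
printf '%s\n' "$letters"
```

Note the doubled brackets: [:alpha:] is only meaningful inside a bracket expression.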
With GNU awk, another approach that skips the invalid input is to use RS:
<file gawk -v RS='[[:alpha:]]' -v ORS= '{print RT}'
To modify the files in place, you can use gawk's inplace module:
gawk -i /usr/share/awk/inplace.awk -v RS='[[:alpha:]]' -v ORS= '{print RT}' file
Do not use -i inplace, as gawk tries to load the inplace extension (as inplace or inplace.awk) from the current working directory first, where someone could have planted malware. The path of the inplace extension supplied with gawk may vary with the system; see the output of gawk 'BEGIN{print ENVIRON["AWKPATH"]}'.
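One way to look the path up rather than hard-coding it could be the sketch below. The function name find_inplace is made up, and it assumes the extension is installed as a file named inplace.awk in one of the AWKPATH directories, which may not hold for every gawk version. Relative entries such as . are skipped on purpose.

```shell
# Hypothetical helper: print the first absolute AWKPATH directory that
# contains a readable inplace.awk; fail if gawk is missing or none is found.
find_inplace() {
  command -v gawk >/dev/null 2>&1 || return 1
  awkpath=$(gawk 'BEGIN{print ENVIRON["AWKPATH"]}')
  found=
  old_ifs=$IFS
  IFS=:
  set -f                      # no globbing while splitting the path list
  for dir in $awkpath; do
    case $dir in
      # Only absolute directories are trusted; "." and other relative entries are ignored.
      /*) [ -r "$dir/inplace.awk" ] && { found=$dir/inplace.awk; break; } ;;
    esac
  done
  set +f
  IFS=$old_ifs
  [ -n "$found" ] && printf '%s\n' "$found"
}

inplace_path=$(find_inplace) || inplace_path=
printf 'inplace.awk: %s\n' "${inplace_path:-not found}"
```

If a path is found, it can then be passed as gawk -i "$inplace_path" in place of the hard-coded /usr/share/awk/inplace.awk.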