I am processing a big list of domains to convert to IDN with the following command:
cat list | idn > clean
list format example:
президент.рф
mañana.com
bücher.com
café.fr
cliché.com
hualañe.cl
köln-düsseldorfer-rhein-main.de
mūsųlaikas.lt
sendesık.com
sushicorner-würzburg.de
domain.com
# almost 1 M lines
But I get the following message:
idn: idna_to_ascii_4z (big list): Output would be too big or too small
So I need to make sure that no entry in my list falls outside the allowed limits (too big or too small).
I found this:
According to RFC 1035, the length of an FQDN is limited to 255 characters, and each label (a node delimited by dots in the hostname) is limited to 63 characters
and
a lower limit of 1 character per label (example: t.co)
Question: How do I remove, from the command line, the domains in my list that have a label longer than 63 characters or shorter than 1 character? (The goal is to run idn from bash without the error.)
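The limits quoted above apply per label, so one way to check them is to split each line on dots instead of measuring the whole line. The following is only a minimal sketch under stated assumptions: filter_labels is a hypothetical helper name, the 63 and 255 values are taken from the limits quoted above, and lengths are measured on the raw input. The label limit is applied by IDNA to the encoded ASCII form, so a non-ASCII label that passes this pre-filter may still be rejected after conversion. How awk counts non-ASCII characters (bytes or characters) also depends on the awk implementation and locale.

```shell
# filter_labels: hypothetical helper, not part of idn.
# Reads one domain per line on stdin and prints only the lines in which
# every dot-separated label is 1..63 characters long and the whole name
# is at most 255 characters long (limits as quoted in the question).
filter_labels() {
  awk -F. -v max_label=63 -v max_name=255 '
    length($0) == 0 || length($0) > max_name { next }  # empty or overlong name
    {
      for (i = 1; i <= NF; i++) {
        n = length($i)
        if (n < 1 || n > max_label) next   # empty label (a..b) or overlong label
      }
      print
    }'
}
```

Assuming the input file is called list as in the question, it could be used as filter_labels < list > out2. A name with a trailing dot is also dropped by this sketch, because the part after the last dot counts as an empty label.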
Actions: I have tried the following (although I would prefer a single command) (partial source):
sed -n '/.\{63\}/p' list > out
grep -vi -f <(sed 's:^\(.*\)$:\\\1\$:' out) list | sort -u > out2
But when I run the idn command, the same message comes up:
cat out2 | idn
idn: idna_to_ascii_4z (big list): Output would be too big or too small
I appreciate any help
PS: Maybe the problem is related to idn and the size of the list (which is very large); I do not know. I have found no information on whether idn limits the number of lines, domains, or hostnames it can process. The help file does not say much on this point.
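One way to tell whether the list size or a single entry is the cause is to convert one domain at a time and set aside the entries that fail. The sketch below assumes that idn exits with a non-zero status when a conversion fails; convert_each and the rejected file name are hypothetical. Starting one process per line is slow for a list of almost 1 M lines, so this is meant for diagnosis rather than as a final pipeline.

```shell
# convert_each: hypothetical helper, not part of idn.
# Runs the given converter command once per input line, so that a failure
# on one domain does not stop the whole run.
#   $1        file that collects the domains the converter rejected
#   $2...     converter command (reads one domain on stdin)
convert_each() {
  rejects=$1
  shift
  : > "$rejects"                      # start with an empty reject file
  while IFS= read -r domain; do
    [ -n "$domain" ] || continue      # skip blank lines
    if out=$(printf '%s\n' "$domain" | "$@" 2>/dev/null) && [ -n "$out" ]; then
      printf '%s\n' "$out"
    else
      printf '%s\n' "$domain" >> "$rejects"
    fi
  done
}

# Usage (assumption: idn returns non-zero on a failed conversion):
#   convert_each rejected idn < list > clean
```

After such a run, the rejected file would list the entries to inspect, for example names with empty labels or labels that grow past 63 characters once encoded.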
Update: The problem was solved, but the correct answer was deleted by its author, @cas, apparently due to a spam incident. I am voting to close.
hello..com. You may do that with grep -F '..' list. – Kusalananda Aug 27 '19 at 15:47