tr -c \\n 1 <testfile | #first transform every [^\n] char to a 1
grep -nF '' | #next get line numbers
paste -d: - testfile | #then paste that together with the original file
sort -t: -nk2,2 #then sort numerically on the second field
...and the winner is... line 2, it would seem.
2:1111:4for
4:11111:five!
1:1111111:seven/7
3:11111111:8 eight?
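For reference, the testfile behind that output (reconstructed from the pasted lines above) looks like:
seven/7
4for
8 eight?
five!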
But the problem with that is that every line must more than double in length for it to work - so LINE_MAX is effectively halved. The cause is that it uses - what, base 1? - to represent the length of the line. A similar - and perhaps tidier - approach might be to compress that information in-stream. The first idea along those lines that occurs to me is that I ought to unexpand it:
tr -c \\n \ <testfile | #transform all [^\n] to <space>
unexpand -t10 | #squeeze every series of 10 to one tab
grep -nF '' | #and get the line numbers
sed 's/:/!d;=;:/;h;:big #sed compares sequential lines
$P;$!N; /\(:[^ ]*\)\( *\)\n.*\1.*\2/!D #newest line is shorter or...
g;/:./!q;b big' | #not; quit input entirely for blank line
sed -f - -e q testfile #print only first occurrence of shortest line
That prints...
2
4for
Another one, just sed:
sed -n '/^\n/D;s/\(.\)\(\n.*\)*/\1/g
$p;h; s// /g;G;x;n;//!g;H;s// /g
G; s/^\( *\)\(\n \1 *\)\{0,1\}\n//
D' <infile >outfile
The syntax is standards compliant - but that is no guarantee that any old sed will handle the \(reference-group\)\{counts\} construct correctly - many do not.
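One quick, throwaway way to probe a given sed for that (not part of the script itself) is to hand it an interval applied to a group that carries a backreference:
echo aab | sed 's/\(a\)\(\1b\)\{0,1\}/X/' #should print only X
A sed that mishandles the interval will either complain or print aab untouched.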
It basically applies the same regexp to the input repeatedly - which can be very beneficial when it is time to compile it. That pattern is:
\(.\)\(\n.*\)*
Which matches different strings in different ways. For example:
string1\nstring2\nstring3
...is matched with s in \1 and '' - the null string - in \2.
1\nstring2\nstring3
...is matched with 1 in \1 and \nstring2\nstring3 in \2.
\nstring2\nstring3
...is matched with \n in \1 and '' - the null string - in \2.
This would be problematic if there was any chance of a newline occurring at the head of the pattern space - but the /^\n/D and //!g commands are used to prevent this. I did use [^\n] but other needs for this little script made portability a concern and I wasn't satisfied with the many ways it is often misinterpreted. Plus, . is faster.
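As an aside, the empty // patterns - the //!g above and the script's s// /g - simply reuse whatever regexp was applied last. For example:
echo foo | sed 's/o/0/;s//0/' #the empty // reuses o, so this prints f00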
\nstring2
string1
...match \n and s again in \1, and both get the '' null string in \2. Empty lines don't match at all.
When the pattern is applied globally the two biases - both the left-most standard bias and the lesser right-side newline bias - are counter-balanced to effect a skip. A few examples:
s/\(.\)\(\n.*\)*/\1:\2/g
s/\(.\)\(\n.*\)*/\2\1:/g
s/\(.\)\(\n.*\)*/\1:/g
s/\(.\)\(\n.*\)*/ :\2/g
...if each is applied (not in succession) to the following string...
string1\nstring2
...will transform it to...
s:t:r:i:n:g:1:\nstring2
s:t:r:i:n:g:\nstring21:
s:t:r:i:n:g:1:
: : : : : : :\nstring2
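Those are easy enough to try at a prompt - the N here just pulls both lines into one pattern space so the regexp has an embedded newline to work with (just a quick demonstration):
printf 'string1\nstring2\n' | sed 'N;s/\(.\)\(\n.*\)*/\1:\2/g'
That prints the first of the four results above - s:t:r:i:n:g:1: on one output line and string2 on the next.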
Basically I use the regexp to always handle only the first line in any pattern-space to which I apply it. That enables me to juggle two different versions of both a retained shortest-match-so-far line and the most recent line without resorting to test loops - every substitution applied handles the entire pattern-space at once.
The different versions are necessary for literal string/string comparisons - so there must be a version of each line where all characters are guaranteed to be equal. But of course if one or the other should wind up actually being the earliest occurring shortest line in input, then the line printed to output should probably be the original version of the line - not the one I've sanitized/homogenized for comparison's sake. And so I need two versions of each.
It is unfortunate that another necessity is a lot of buffer switching to handle all of this - but at least neither buffer ever holds more than the four lines needed to stay current - and so maybe it is not terrible.
Anyway, for each cycle the first thing that happens is a transformation on the remembered line - because the only copy actually saved is the literal original - into...
^ \nremembered line$
...and afterward the next input line overwrites any old buffer. If it does not contain at least a single character it is effectively ignored. It would be far easier just to quit at the first occurring blank line, but, well, my test data had a lot of those and I wanted to handle multiple paragraphs.
And so if it does contain a character its literal version is appended to the remembered line and its spaced comparison version is positioned at head of pattern space, like this:
^ \n \nremembered line\nnew$
Last a substitution is applied to that pattern space:
s/^\( *\)\(\n \1 *\)\{0,1\}\n//
So if the new line can fit within the space needed to contain the remembered line with at least one char to spare then the first two lines are substituted away, else only the first.
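That substitution is easy to poke at on its own. Faking up the four-line pattern space described above with printf and a few Ns (just a demonstration rig using the sample data):
printf '%s\n' '    ' '       ' 'seven/7' '4for' | #4 spaces for 4for, 7 for seven/7, then both literals
sed 'N;N;N;s/^\( *\)\(\n \1 *\)\{0,1\}\n//'
Because the four spaces fit inside the seven with room to spare, both space lines are removed and only seven/7 and 4for come out; swap the two widths and only the first line would go.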
Regardless of the outcome the first line in pattern space is always deleted by the D at end-of-cycle before starting again. This means that if the new line is shorter than the last the string...
new
...is sent back to the first substitution in the cycle which will always strip only from the first newline char on - and so it remains whole. But if it is not then the string...
remembered line\nnew
...will begin the next cycle instead, and the first substitution will strip from it the string...
\nnew
...every time.
On the very last line the remembered line is printed to standard out, and so for the example data given, it prints:
4for
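To tie the walkthrough together, here is the same script once more with my reading of each step spelled out as comment lines - the commands and their order are untouched, only the comments (and a little whitespace) are new:
sed -n '
#guard: a newline at the head of pattern space means the remembered line is
#empty - trim it off and restart the cycle
/^\n/D
#keep only the first line in pattern space - strip from the first newline on
s/\(.\)\(\n.*\)*/\1/g
#on the very last input line, print the survivor
$p
#save the literal line, space out the copy, append the literal back, and swap -
#the hold space now holds the spaced and literal remembered line
h; s// /g; G; x
#fetch the next input line; if it is blank, fall back to the held copy instead
n; //!g
#append its literal to the hold space, then space out the copy in pattern space
H; s// /g
#append the hold space: spaced new, spaced remembered, remembered, new
G
#if the new line fits in the remembered one with a char to spare, drop both
#space lines; otherwise drop only the first
s/^\( *\)\(\n \1 *\)\{0,1\}\n//
#Delete through the first newline and restart the cycle on whatever remains
D' <infile >outfile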
But, seriously, use tr.