86

I have a text file whose contents I'm reading into a variable in my shell script. However, I only need the first 50 characters.

I've tried using cat ${filename} cut -c1-50 but I'm getting far more than the first 50 characters. That may be because cut works line by line (not 100% sure), while this text file could be one long string; it really depends.

Is there a utility out there I can pipe into to get the first X characters from a cat command?

Ramesh
  • 39,297
jkj2000
  • 1,149

8 Answers

111
head -c 50 file

This returns the first 50 bytes.

Note that this command is not implemented identically on every OS. It behaves this way on Linux and macOS. On Solaris 11 you need to use the GNU version in /usr/gnu/bin/.
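Since -c is not available everywhere, a script can probe for it and fall back to dd. The following is a minimal sketch, not a definitive implementation: first_bytes is a hypothetical helper name, the sample data is made up, and mktemp is assumed to be installed (it is common but not required by POSIX).

```shell
#!/bin/sh
# first_bytes N FILE: print the first N bytes of FILE.
# (first_bytes is a hypothetical helper name, not a standard utility.)
first_bytes() {
    n=$1
    file=$2
    # Probe for -c support by running head on empty input.
    if head -c 1 </dev/null >/dev/null 2>&1; then
        head -c "$n" <"$file"
    else
        # Fallback: read one block of N bytes; dd's statistics go to stderr.
        dd if="$file" bs="$n" count=1 2>/dev/null
    fi
}

# Usage: capture the first 50 bytes of a 62-byte sample file in a variable.
tmp=$(mktemp)
printf '%s' 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' >"$tmp"
snippet=$(first_bytes 50 "$tmp")
printf '%s\n' "$snippet"
rm -f "$tmp"
```

Reading the file through a redirection rather than passing its name to head also sidesteps problems with file names that start with a dash.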

DisplayName
  • 11,688
  • head has no -c option. I’d go for dd(1) instead. – mirabilos Nov 14 '14 at 09:20
  • 8
    Note that this answer assumes that the file contains only ASCII characters, as the OP asked for the first X characters, not bytes. – Calimo Nov 14 '14 at 09:29
  • 3
    @mirabilos It might not be portable, but my version (GNU coreutils 5.97) does. – Yossarian Nov 14 '14 at 11:57
  • Yeah, my OS X (BSD) also does. – DisplayName Nov 14 '14 at 12:00
  • 1
    POSIX doesn't define -c as a valid option, however, so it is definitely dependent on your local environment. http://www.unix.com/man-page/posix/1/head – Jules Nov 14 '14 at 12:59
  • 1
    @Calimo Yes, I know, but I tried making a text file with 100 characters then running my command and it printed 50 characters. But you're right about ASCII, but since OP flagged this as answered there were none in his case. – DisplayName Nov 14 '14 at 13:28
  • @Yossarian that’s exactly what I’m saying. GNU does. (@DisplayName: OSX uses GNU tools.) But Unix doesn’t. – mirabilos Nov 14 '14 at 14:20
  • @mirabilos Yes, it does use some GNU tools, but not in this case: running man head tells me that it's BSD head. – DisplayName Nov 14 '14 at 14:54
  • An overwhelming majority of OS X tools are Unix, maybe 85%. – DisplayName Nov 14 '14 at 14:56
  • BSD head(1) does not have -c either. – mirabilos Nov 14 '14 at 14:59
  • `HEAD(1) BSD General Commands Manual HEAD(1)

    NAME head -- display first lines of a file

    SYNOPSIS head [-n count | -c bytes] [file ...] `

    – DisplayName Nov 14 '14 at 15:05
  • I misunderstood you; I meant FreeBSD, see http://www.freebsd.org/cgi/man.cgi?query=head – DisplayName Nov 14 '14 at 15:10
  • @DisplayName see, even the BSDs diverge (FreeBSD is often called a "little Linux" in the BSD scene). Anyway, all manpages using the -mdoc nroff macropackage have, by default, “BSD something Manual” on top of them, so that is no indicator. (The presence of this option in FreeBSD is, but, as I said, the option is nōn-standard.) – mirabilos Nov 15 '14 at 14:28
  • No they don't, emacs and gcc for example, say GNU. – DisplayName Nov 15 '14 at 15:04
  • For binary data use hexdump; here's my answer containing both head and hexdump: https://unix.stackexchange.com/a/544247/114401 – Gabriel Staples Sep 28 '19 at 23:01
38

Your cut command works if you use a pipe to pass data to it:

cat ${file} | cut -c1-50 

Or, avoiding a useless use of cat and making it a little safer:

cut -c1-50 < "$file"

Note that the commands above will print the first 50 characters (or bytes, depending on your cut implementation) of each input line. It should do what you expect if, as you say, your file is one huge line.
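The per-line behaviour is easy to see on a small multi-line input. This is an illustrative sketch with made-up sample text; putting head -n 1 in front of cut is one possible way to limit the output to the first line, not something the original commands do.

```shell
# Three-line sample input; cut is applied to each line independently.
sample=$(printf '%s\n' 'first line of text' 'second line' 'third')

# Per line: every line is truncated to 5 characters, so 3 lines come out.
per_line=$(printf '%s\n' "$sample" | cut -c1-5)

# First line only: keep one line, then truncate it.
first_only=$(printf '%s\n' "$sample" | head -n 1 | cut -c1-5)

printf '%s\n' "$per_line"
printf -- '--\n'
printf '%s\n' "$first_only"
```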

terdon
  • 242,166
9
dd status=none bs=1 count=50 if=${filename}

This returns the first 50 bytes.

doneal24
  • 5,059
  • dd has no status=none flag. Use 2>/dev/null instead (and quote properly): dd if="$filename" bs=1 count=50 2>/dev/null (even so, consider using bs=50 count=1 to reduce the number of syscalls involved). – mirabilos Nov 14 '14 at 09:18
  • 2
    @mirabilos dd does have status=none when using Ubuntu 14.04, coreutils 8.21, but you're right to use 2>/dev/null if using an earlier version. – doneal24 Nov 14 '14 at 16:19
  • There is no GNU coreutils (nor does dd(1) have a version) on most Unix systems, which is why I urge you to use the portable version. After all, this is not the “Ask Ubuntu” SO site, but the “Unix & Linux” one. – mirabilos Nov 14 '14 at 16:48
  • 1
    @mirabilos Most Linux distros use GNU coreutils as does FreeBSD and other BSDs. It is available on Solaris as package gnu-coreutils. Yes, this is "Unix & Linux" and both Unix and Linux systems use GNU coreutils. – doneal24 Nov 14 '14 at 18:02
  • 2
    No, Unix systems do not generally use GNU utilities. GNU is an acronym for “GNU is not Unix”, even. Please stick to portable solutions, or, if you must give GNU-only solutions, state so, and, if at all possible, show an equivalent portable solution. – mirabilos Nov 14 '14 at 19:23
  • FreeBSD does not use GNU coreutils. dd on recent versions of FreeBSD does, however, support status=none for compatibility with GNU dd. – Stéphane Chazelas Apr 06 '17 at 15:37
5

Most answers so far assume that 1 byte = 1 character, which may not be the case if you are using a non-ASCII locale.

A slightly more robust way to do it:

testString=$(head -c 200 < "${filename}") &&
  printf '%s\n' "${testString:0:50}"

Note that this assumes:

  1. You are using ksh93, bash (or a recent zsh or mksh (though the only multi-byte charset supported by mksh is UTF-8 and only after set -o utf8-mode)) and a version of head that supports -c (most do nowadays, but not strictly standard).
  2. The current locale is set to the same encoding as the file (type locale charmap and file -- "$filename" to check that); if not, set it with e.g. LC_ALL=en_US.UTF-8.
  3. The first 200 bytes of the file, taken with head, are enough: in the worst case UTF-8 encodes every character on 4 bytes, so 50 characters need at most 200 bytes. This should cover most cases I can think of.
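The gap between bytes and characters can be checked directly. The sketch below is only an illustration with made-up data: it writes three copies of "é" (2 bytes each in UTF-8) using octal escapes, and assumes mktemp and a head that supports -c are available. The wc -m result depends on the current locale, so no particular value is claimed for it.

```shell
# Build a 6-byte file holding three copies of "é" (UTF-8 bytes 0xC3 0xA9),
# written with octal escapes so the script itself stays pure ASCII.
tmp=$(mktemp)
printf '\303\251\303\251\303\251' >"$tmp"

# Byte counts: the whole file, and what head -c 3 returns.
total_bytes=$(wc -c <"$tmp" | tr -d ' ')
cut_bytes=$(head -c 3 <"$tmp" | wc -c | tr -d ' ')

# head -c 3 yields one complete character plus half of the next one.
printf 'file: %s bytes, head -c 3: %s bytes\n' "$total_bytes" "$cut_bytes"

# Character count as seen by the current locale (3 only under UTF-8).
printf 'characters in this locale: %s\n' "$(wc -m <"$tmp" | tr -d ' ')"
rm -f "$tmp"
```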
Calimo
  • 280
  • Of course, this also assumes GNU head, or another implementation of it which adds the nōn-standard -c option. But you’re requiring GNU bash already. (Note: mksh’s UTF-8 mode could do this for UTF-8 encoded files.) I’d ask the OP if they require octets or multibyte characters, just “characters” is a vague/generic term. – mirabilos Nov 14 '14 at 15:02
  • That also assumes $filename or $testString doesn't contain blanks, newlines or wildcards, or start with -. – Stéphane Chazelas Nov 21 '14 at 10:32
  • The ${var:offset:length} construct you're using here actually comes from ksh93 and is also supported by recent versions of zsh (zsh has its own $testString[1,50]). You need ${testString:0:50} in ksh93 and zsh however. – Stéphane Chazelas Nov 21 '14 at 10:35
  • Just edited my answer to address the comments above – Calimo Apr 06 '17 at 15:15
2
grep -om1 "^.\{50\}" "${filename}"

Another variant (for the first line of the file):

(IFS= read -r line <"${filename}"; printf '%s\n' "${line:0:50}")
Costas
  • 14,916
  • 1
    This is abuse of high-level tools – and prone to not doing what you want, e.g. if they’re locale-aware. – mirabilos Nov 14 '14 at 09:19
  • @mirabilos What do you mean under high-level tools: read and echo ? Or bash expansion ? – Costas Nov 14 '14 at 10:06
  • grep (regexp), and yes, the use of shell here (hint: the first line may be large). (That being said, the bashism is also not in POSIX, but most shells implement that.) – mirabilos Nov 14 '14 at 11:51
1

1. For ASCII files, do as @DisplayName says:

head -c 50 file.txt

will print out the first 50 chars of file.txt, for example.

2. For binary data, use hexdump to print it out as hex chars:

hexdump -n 50 -v file.bin

will print out the first 50 bytes of file.bin, for example.

Note that without the -v verbose option, hexdump would replace repeated lines with an asterisk (*) instead. See here: https://superuser.com/questions/494245/what-does-an-asterisk-mean-in-hexdump-output/494613#494613.
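Where hexdump is not installed, od can produce a similar byte dump. This is a hedged sketch and not part of the answer above: od is my substitution for hexdump, the sample bytes are made up, and mktemp is assumed to be available.

```shell
# Sample binary-ish data: "ABCD" followed by a NUL byte and 0xFF.
tmp=$(mktemp)
printf 'ABCD\000\377' >"$tmp"

# -A n: no offset column; -t x1: one hex byte per unit; -N 4: stop after 4 bytes.
hex=$(od -A n -t x1 -N 4 "$tmp")
printf '%s\n' "$hex"
rm -f "$tmp"
```

The exact spacing of the od output varies between implementations, so strip whitespace before comparing it with anything.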

1

To read and output 50 characters (not bytes), with zsh, you can do:

read -eu0 -k50 < $file

If the input contains sequences of bytes that don't form valid characters in the current locale, each of those bytes will be counted as one character.

  • -e: echoes what is read instead of storing it in a variable.
  • -k50: reads 50 characters. read -k was initially meant for reading key presses on the terminal (and would put the terminal in the correct mode to get one keypress at a time), but when used with -u<fd>, it reads characters from the corresponding file descriptor instead.
  • -u0: reads those characters from file descriptor 0 (stdin), which here we redirect from the file.
-1

You can use sed for this, which handles the problem fairly easily:

sed -E 's/^(.{0,50}).*/\1/' yourfile

-E allows us to use Extended regular expressions, instead of basic regular expressions, so we don't have to use backslashes to escape the more advanced regular expression operators.

s/x/y/ substitutes x with y in each line, where x is a regular expression and y is an expression which can contain literal values or references to capture groups.

^(.{0,50}) matches up to the first 50 characters of each line and marks it as a capture group.

.* matches the rest of the line (if there were more than 50 characters), since we want to replace the whole thing.

\1 is a backreference referring to the first capture group.
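Because the substitution runs on every line, the command prints up to 50 characters of each line. If only the first line matters, one possible variation is to add the q command so that sed stops after the first line. This is a sketch with made-up sample input, assuming a sed that accepts -E (GNU and BSD sed both do).

```shell
# Sample input: a 60-character first line followed by a short second line.
sample=$(printf '%s\n' \
    '123456789012345678901234567890123456789012345678901234567890' \
    'short')

# Truncate the line to at most 50 characters, then q prints it and exits,
# so later lines are never processed.
result=$(printf '%s\n' "$sample" | sed -E 's/^(.{0,50}).*/\1/;q')

printf '%s\n' "$result"
```

This still stops at the first newline, so it does not return characters that span several short lines.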

philraj
  • 421
munkeyoto
  • 107
  • 1
    Curious to know how this got downvoted if it solves the OP's question: "I only need the first 50 characters" This accomplishes what was requested without UUOC (Useless Use of Cat) – munkeyoto Nov 14 '14 at 15:28
  • 1
    This answer gives the first fifty characters of each line in the file, not just the first 50 of the file. Also doesn't print anything at all if all the lines are less than 50 characters long. Your solution would work better with sed -n -e '1s/^\(.\{50\}\).*/\1/p' ${filename} – doneal24 Nov 14 '14 at 16:22
  • Understood could have just: head -n 1 | sed -e 's/^(.{50}).*/\1/' ... And it would have solved the issue. OP stated: "only need the first 50 characters" – munkeyoto Nov 14 '14 at 16:42
  • 2
    Nope. If the first line is only 49 characters long it would output nothing. – doneal24 Nov 14 '14 at 18:03
  • 2
    Doug I understood this the first time around yet the OP mentioned nothing about printing if the line contained less than 50 chars, so I still fail to see your point, nor the point of this being downvoted since again it fell into what would have worked with head: head -n 1 ${filename} | sed -n -e '1s/^(.{50}).*/\1/p' – munkeyoto Nov 14 '14 at 18:15
  • 1
    If the file contains 100 lines, each being 10 characters long, then your solution prints nothing. The OP would like to see the first 5 lines of this file. In the case where the entire file is only 49 characters long it is unclear if the OP wants to see nothing or wants to see all characters up to a limit of 50 characters. – doneal24 Nov 14 '14 at 18:34
  • As a non-linux sed expert, yet a software developer, this answer, without a detailed explanation to accompany it, is ridiculous. And to use the word "easily" in how you describe the use of this command is equally comical. – Gabriel Staples Sep 28 '19 at 23:04
  • In my testing, .{n} in sed will capture lines even if they are less than n characters in length, even though that's not to spec. However, even if this weren't the case, this answer would only need the slight alteration of .{0,50} to do the job. The answer's score is a bit harsh. I'll submit an edit. – philraj Jul 06 '20 at 04:47
  • @GabrielStaples this answer (before I edited it) was no less confusing than most of the other answers, especially if you have a basic understanding of regular expressions, which is a very important skill to have as a developer, because they're so useful even if just searching through your code or making batch replacements. Calling it ridiculous was unwarranted. – philraj Jul 07 '20 at 14:43
  • 1
    @philraj, thanks for the clarification. That's very helpful for people. – Gabriel Staples Jul 11 '20 at 19:16
  • @munkeyoto, if you 1) point out in your answer that this solution returns the first 50 characters of each line, not just the first 50 chars of the entire file, and 2) also present an alternative solution, in addition to the one you currently have, to just print out the first 50 chars of the entire file, whether or not they span multiple lines, then I will upvote this answer. It has its merits for sure. – Gabriel Staples Jul 11 '20 at 19:18