Questions tagged [text-processing]

Manipulating or examining text with programs, scripts, etc.

Unix systems tend to favor text files, often consisting of one record per line. Most Unix configuration files are text files. Unix systems come with many tools to manipulate such files. Most tools process the file as a stream: read a line, process it, emit the corresponding output. This makes it possible to chain tools with pipes.
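
As a rough sketch of this streaming style, the pipeline below counts the most frequent words in its input. The function name word_freq and the sample text are made up for illustration; each stage only reads lines from the previous stage and writes lines to the next.

```shell
# word_freq: read text on stdin, print the N most frequent words with counts.
# N defaults to 3. The name and defaults are illustrative assumptions.
word_freq() {
    n=${1:-3}
    tr -cs '[:alpha:]' '\n' |          # split into one word per line
        tr '[:upper:]' '[:lower:]' |   # normalize case
        sort |                         # bring identical words together
        uniq -c |                      # count each run of identical words
        sort -k1,1nr -k2 |             # highest count first, ties by word
        head -n "$n"                   # keep only the top N
}

# Usage with made-up input:
printf 'the cat and the dog\nThe end\n' | word_freq 2
```

The exact padding of the counts printed by uniq -c differs between implementations, so scripts that parse this output should split on whitespace rather than rely on column positions.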

Use this tag when your question is about processing text files and you're not sure which tool to use. If your question is about a specific tool, use its tag. If your question is about multiple tools, include this tag and the tags for the other tools.

When asking a text processing question, you should always:

  • explain the task you need to do
  • include a reasonable part of your input file (preformatted by indenting with four spaces)
  • include the expected output for this input data (also preformatted)
  • describe your attempt to solve the problem and what didn't work (this is not to embarrass you; it helps answerers explain the solution, so you'll learn to help yourself next time)

Text processing utilities

  • sed: a simple line-by-line text processor, mostly used for regexp substitutions
  • awk: a scripting language dedicated to text file processing
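
A minimal sketch of both styles, using sed and awk (the standard tools these two descriptions fit). The sample data and the function names are invented for illustration.

```shell
# sample: made-up input, one "name score" record per line.
sample() {
    printf 'alice 10\nbob 20\ncarol 30\n'
}

# sed: a regexp substitution applied to every line.
# Replaces the trailing run of digits with "(redacted)".
mask_scores() {
    sed 's/[0-9][0-9]*$/(redacted)/'
}

# awk: field-aware processing with state kept across lines.
# Adds up the second field and prints the total after the last line.
sum_scores() {
    awk '{ total += $2 } END { print total }'
}

sample | mask_scores
sample | sum_scores
```

The sed function rewrites each line independently, while the awk function carries a running total, which is the kind of task where a small scripting language is more convenient than a substitution.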

Text processing often involves combining many single-purpose tools, such as:

  • cut: select fields on each line
  • diff: compare two files line by line
  • grep: search for a pattern in text files
  • head: show the first few lines of a file
  • od: display binary files in decimal, octal or hexadecimal
  • sort: sort lines or fields alphabetically
  • split: split a file into fixed-size pieces
  • tail: show the last few lines of a file; tail -f keeps the file open in case more data arrives
  • tee: replicate the output of a command and send it to several destinations
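
One possible combination, sketched with cut, sort, head and tee (tools that match several of the descriptions above). The function name first_names and the passwd-like sample records are assumptions made for this example.

```shell
# first_names: read colon-separated records on stdin and print the N
# first fields that sort first alphabetically (N defaults to 2).
first_names() {
    n=${1:-2}
    cut -d: -f1 |       # select the first field of each line
        sort |          # sort the names alphabetically
        head -n "$n"    # keep only the first N lines
}

# Made-up sample data in a "name:uid:shell" layout.
copy=$(mktemp)
printf 'carol:1002:/bin/sh\nalice:1000:/bin/bash\nbob:1001:/bin/sh\n' |
    first_names 2 |
    tee "$copy"         # print the result and also save a copy to a file
rm -f "$copy"
```

Each tool does one small job, and the pipe order decides the result: sorting before head gives the alphabetically first names, whereas head before sort would give the first lines of the input in sorted order.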

For a list of many text utilities and more, check out busybox commands or GNU coreutils.

Other related tags

  • text processing is usually performed by shell scripts that call the tools described above
  • many tasks require chaining several tools
  • the collection of GNU utilities (text processing and others), for regular Linux systems
  • a collection of utilities (text processing and others) for embedded Linux systems
  • when the going gets tough, it's better to switch to more general languages

8413 questions
215 votes, 7 answers

How can I wrap text at a certain column size?

I know that I can use something like cat test.txt | pr -w 80 to wrap lines to 80 characters wide, but that puts a lot of space on the top and bottom of the printed lines and it does not work right on some systems. What's the best way to force a text…
cwd
  • 45,389
163 votes, 13 answers

How do I remove the first 300 million lines from a 700 GB txt file on a system with 1 TB disk space?

How do I remove the first 300 million lines from a 700 GB text file on a system with 1 TB disk space total, with 300 GB available?  (My system has 2 GB of memory.)  The answers I found use sed, tail, head: How do I delete the first n lines of a…
Kris
  • 1,283
142 votes, 23 answers

Is there a way to get the min, max, median, and average of a list of numbers in a single command?

I have a list of numbers in a file, one per line. How can I get the minimum, maximum, median and average values? I want to use the results in a bash script. Although my immediate situation is for integers, a solution for floating-point numbers…
Peter.O
  • 32,916
135 votes, 20 answers

How to count the number of a specific character in each line?

I was wondering how to count the number of a specific character in each line by some text processing utilities? For example, to count " in each line of the following text "hello!" Thank you! The first line has two, and the second line has 0.…
Tim
  • 101,790
112 votes, 8 answers

Show all the file up to the match

grep --before-context 5 shows 5 lines before the match. I want to show everything before the match. Doing grep --before-context 99999999 would work but it is not very... professional. How to show all the file up to the match?
101 votes, 11 answers

Remove last character from line

I want to remove last character from a line: [root@ozzesh ~]#df -h | awk '{ print $5 }' Use% 22% 1% 1% 59% 51% 63% 5% Expected result: Use 22 1 1 59 51 63 5
Özzesh
  • 3,669
75 votes, 5 answers

Print file content without the first and last lines

Is there a simple way I can echo a file, skipping the first and last lines? I was looking at piping from head into tail, but for those it seems like I would have to know the total lines from the outset. I was also looking at split, but I don't see a…
user394
  • 14,404
74 votes, 2 answers

Find any lines exceeding a certain length

Is it possible to find any lines in a file that exceed 79 characters?
rowantran
  • 1,865
66 votes, 4 answers

Printing every Nth line out of a large file into a new file

I am trying to print every Nth line out of a file with more than 300,000 records into a new file. This has to happen every Nth record until it reaches the end of the file.
Terisa
  • 707
42 votes, 5 answers

Is there a command line spell to drop a column in a CSV-file?

Having a file of the following contents: 1111,2222,3333,4444 aaaa,bbbb,cccc,dddd I seek to get a file equal to the original but lacking an n-th column like, for n = 2 (or may it be 3) 1111,2222,4444 aaaa,bbbb,dddd or, for n = 0 (or may it be…
Ivan
  • 17,708
39 votes, 7 answers

How to count the times a specific character appears in a file?

For example, we want to count all quote (") characters; we just worry if files have more quotes than they should. For…
yael
  • 13,106
38 votes, 3 answers

How to find unmatched brackets in a text file?

Today I learned that I can use perl -c filename to find unmatched curly brackets {} in arbitrary files, not necessarily Perl scripts. The problem is, it doesn't work with other types of brackets () [] and maybe <>. I also had experiments with…
phunehehe
  • 20,240
34 votes, 6 answers

Insert a new line after every N lines?

How can I use text-processing tools to insert a new line after every N lines? Example for N=2: INPUT: sadf asdf yxcv cxv eqrt asdf OUTPUT: sadf asdf yxcv cxv eqrt asdf
LanceBaynes
  • 40,135
32 votes, 2 answers

Horizontal file concatenation

Is there a Linux command like cat that joins files with the same number of lines horizontally?
user4518
30 votes, 1 answer

reformatting output with aligned columns

Possible Duplicate: A shell tool to “tablify” input data I have a program outputting something like this: abc defgh ijklm nopqr stu vw xyza bcde fghi which I'd like to clean up to get something like this: abc defgh ijklm nopqr stu …
gregseth
  • 516