Piping Basics

Apr 13, 2012

In a UNIX environment, if you wanted to find all files last modified in April, you might do:

ls -l | grep Apr

The first part of this command, ls -l, returns a listing like:

-rw-r--r--  1 karl  us   2573 Jan 20 09:44 2011-1-10-Why-Markdown-And-Why-Not-Word.html
-rw-r--r--  1 karl  us   8119 Mar 30 16:52 2011-7-5-Rethink-your-Data-Model.html
-rw-r--r--  1 karl  us   3594 Mar 30 16:49 2012-02-03-Node-Require-and-Exports.html
-rw-r--r--  1 karl  us   2816 Apr  3 18:32 2012-4-2-Is-Kindle-The-Next-Rim.html
-rw-r--r--  1 karl  us   3504 Apr  4 19:04 2012-4-4-You-Really-Should-Log-Client-Side-Error.html

The second part, grep Apr, reads lines from its standard input (think doing a Console.ReadLine from code) and prints only those that match the provided pattern (in this case Apr). Note that it matches anywhere on the line, so a file with Apr in its name would show up too. The pipe operator | connects the standard output of one command to the standard input of the next. Therefore, the above output from ls becomes the input for grep.
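
To see that grep really is just reading whatever arrives on standard input, here's a small sketch that skips ls entirely and pipes in a few hand-written lines. The sample lines and the matches variable are made up for illustration.

```shell
# Pipe three hand-written lines into grep; no files or ls involved.
matches=$(printf '%s\n' 'Jan 20 notes.html' 'Apr  3 kindle.html' 'Apr  4 logging.html' | grep Apr)

# Only the two lines containing "Apr" survive.
echo "$matches"
```

grep doesn't know or care that the lines didn't come from ls; it only sees text on its input.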

To better understand this, let's look at something that won't work. Say you wanted to delete all markdown files. You might be tempted to try:

find . -iname "*.md" | rm

The first part does what we expect: it finds all files with a markdown extension. However, piping find's output to rm doesn't delete anything; rm only reports that it was given no files to remove (on many systems by printing its usage message). Why is that? Remember, a pipe connects one program's output to another program's input. rm, however, does not read file names from standard input. It takes them as command-line arguments. In C#, that's the difference between Console.ReadLine() and using the args[] parameter.
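
The same distinction can be sketched in the shell itself. The two functions below are hypothetical stand-ins, not real utilities: show_args only looks at its arguments (like rm), while show_stdin only looks at its standard input (like grep with no file).

```shell
# show_args reports only what it receives as command-line arguments.
show_args() {
  echo "args: $#"
}

# show_stdin reports only what arrives on standard input.
show_stdin() {
  count=0
  while IFS= read -r line; do
    count=$((count + 1))
  done
  echo "stdin lines: $count"
}

# Pipe the same two names into each function.
piped_to_args=$(printf '%s\n' a.md b.md | show_args)
piped_to_stdin=$(printf '%s\n' a.md b.md | show_stdin)

echo "$piped_to_args"    # args: 0
echo "$piped_to_stdin"   # stdin lines: 2
```

show_args sees nothing because piped data never shows up as arguments, which is exactly the situation rm is in.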

The solution to this problem is to use a special utility which turns standard input into command-line arguments. This is what xargs does. Unfortunately, the options xargs supports can be quite different from platform to platform, but all we need right now is the simplest thing:

find . -iname "*.md" | xargs rm

If you've been following along, you can guess that xargs takes data from the standard input (hence data can be piped to it) and converts that to command-line parameters for whatever program you specify (rm in this case).
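
A safe way to watch xargs work is to put echo in front of the command, so the command line gets printed instead of run. The sketch below does that first, then deletes files in a throwaway directory. It assumes your find and xargs support -print0 and -0 (the GNU and BSD versions do), which separates names with a null byte so that a file name containing a space isn't split in two. The directory and file names are made up for illustration.

```shell
# Dry run: xargs appends the piped names to "echo rm" as arguments.
built=$(printf '%s\n' a.md b.md c.md | xargs echo rm)
echo "$built"   # rm a.md b.md c.md

# Real run, in a scratch directory so nothing important is touched.
dir=$(mktemp -d)
touch "$dir/plain.md" "$dir/with space.md" "$dir/keep.txt"

# -print0 / -0 pass names separated by null bytes instead of whitespace.
find "$dir" -iname "*.md" -print0 | xargs -0 rm

remaining=$(ls "$dir")
echo "$remaining"   # keep.txt
```

Without -print0 and -0, the plain version would hand rm the two arguments "with" and "space.md" for the second file, so it's worth reaching for them whenever file names aren't under your control.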

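How do you know if a program takes data from standard input vs the command-line? Well, if you look at the help message from grep, you'll see that it takes its input from [FILE], whereas on BSD and OS X rm takes it from file. The square brackets mark the argument as optional, and when grep isn't given a file it falls back to reading standard input. It's a subtle difference, and not every usage message follows the convention (GNU rm also lists [FILE]), so when in doubt, the man page will tell you whether a program reads standard input.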
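
Here's a quick sketch of grep's optional file argument in action: the same search once with a file named on the command-line and once with the file's contents piped in. The temporary file and its two lines are made up for illustration.

```shell
# Build a small listing to search.
list=$(mktemp)
printf '%s\n' 'Mar 30 model.html' 'Apr  3 kindle.html' > "$list"

# File given as an argument: grep opens and reads it itself.
from_file=$(grep Apr "$list")

# No file argument: grep reads whatever is piped to its standard input.
from_stdin=$(cat "$list" | grep Apr)

echo "$from_file"
echo "$from_stdin"
```

Both commands print the same single line; the only difference is how the data reached grep.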

To wrap it up, we can also look at the redirection operator >. Rather than sending standard output to another program's standard input like a pipe does, the redirection operator sends standard output to a file, creating it if needed and overwriting any existing contents (you can append instead using >>). If we wanted to save the list of markdown files (rather than delete them), we'd do:

find . -iname "*.md" > markdown.list
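
The difference between > and >> is easy to check with a scratch file. This is a minimal sketch; the file comes from mktemp and the words written to it are arbitrary.

```shell
out=$(mktemp)

echo first > "$out"     # creates or truncates the file, then writes
echo second > "$out"    # overwrites: "first" is gone
echo third >> "$out"    # appends after "second"

contents=$(cat "$out")
echo "$contents"
# second
# third
```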