COMS 3157 Advanced Programming

Recitation 6: Pipes and Redirection in the Shell

In this recitation, we will look at how to apply redirection and pipes to various tasks when working in the shell. Though you are not expected to memorize each of these commands, what they do, and the flags they support, this exercise will hopefully inspire you to use some of these commands and their features in your daily workflows, in this course and beyond.

Before getting started, make sure you have a copy of our Wordle solutions. You can obtain those solutions following our Lab Workflow guide, or just by cloning a fresh copy of the skeleton repo (which now contains the solutions) somewhere outside of your ~/cs3157/ directory:

git clone ~j-hui/cs3157-pub/lab1
cd lab1/solutions/part2

6.1 Revisiting the basics

For the first part of this recitation, we will look at several commands you should have seen and used by this point. To answer these questions, read each command’s man page, and try running the commands yourself.

ls (“LiSt”): list the files in the current directory
- (6.1.1.1) What does the -l flag do?
- (6.1.1.2) What does the -a flag do?
- (6.1.1.3) The default behavior of ls is to list files horizontally, but if it detects that stdout isn’t a terminal (e.g., if you run it in a pipeline like ls | other-cmd), then it will list each file on a separate line. What flag can you pass to force ls to print each file on its own line?
echo: print arguments to stdout
- (6.1.2.1) What does echo "\n" print? Is it what you expect?
cat (“conCATenate”): concatenate file contents and print to stdout
- (6.1.3.1) What does cat normally take as its (non-flag) arguments?
- (6.1.3.2) What is the significance of the - argument?
- (6.1.3.3) What is the default behavior of cat?

6.2 Teaching an old `cat` new tricks

We normally just use cat to display the contents of text files, but it can do far more than that, despite being an extremely simple shell utility.

(6.2.1) How can we use cat to copy a file? For instance, how do we do something like cp game.c game2.c, without using cp?
(6.2.2) How can we use cat to write to a text file? For instance, how can we write the following multi-line text into a file named cat.txt, without using a text editor (e.g., vim)?
```
They say, you can't teach an old dog new tricks.
But maybe an old cat can do what an old dog cannot.

- John
```

6.3 Word count

wc is a handy utility used to count the number of lines, words, or bytes in files. Like cat, wc normally takes a list of file names as arguments, but treats the - file name specially.

wc normally reports all of those counts, but we can ask it to only report the number of lines by passing it the -l flag. For example, we can verify that there are indeed 1000 words in the common1000 words list:

$ wc -l < words/common1000
1000

(6.3.1) How long is the longest word in the words/common1000 list?
(6.3.2) What’s the difference between the following invocations of wc?
```
wc -l game.c
wc -l < game.c
cat game.c | wc -l
```
Are any of them equivalent?
(6.3.3) By default, echo adds a newline after whatever it prints; e.g., if you run echo | wc -l, wc will report that it counted 1 newline. What flag can you pass to echo to suppress that behavior?
(6.3.4) What arguments can you pass to echo to make it so that it prints two newlines? Verify using echo <args..> | wc -l.
(6.3.5) How many processes are running on CLAC right now? (Hint: running the ps -A command displays all running processes.)
(6.3.6) How many students have (or had) an account on CLAC? (Hint: every student’s home directory is in /students/.)

6.4 Who needs games when you’ve got `grep`?

grep is another staple of UNIX shell utilities, and is used to search through and filter lines of text. Like cat and wc, it can be used to search through a list of files, or through input coming from stdin.

For example, to search for any words that contain the letter m in our short list of words:

grep m words/short-list

Or to search for any words that contain the substring gre in common1000:

grep gre words/common1000

grep has a lot of features, which you can read about using man grep. One of the more useful flags is the -r “recursive” flag, which asks grep to search every file in every subdirectory:

grep -r hello

Another useful flag is -v, which inverts the grep query and only prints out lines that do not contain your search term.
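
To see a plain search and -v side by side without touching the lab files, here is a small sketch. The scratch word list and its contents are made up for illustration:

```shell
# Scratch directory and word list are invented for this demo.
tmp=$(mktemp -d)
printf 'game\nword\nmeme\nshell\n' > "$tmp/tiny-list"

# Lines that contain the letter m.
with_m=$(grep m "$tmp/tiny-list")
# Lines that do NOT contain the letter m (-v inverts the match).
without_m=$(grep -v m "$tmp/tiny-list")
# grep reads from stdin when it is given no file names.
from_stdin=$(grep m < "$tmp/tiny-list")

echo "with m:"
echo "$with_m"
echo "without m:"
echo "$without_m"

rm -rf "$tmp"
```

Every line of the input lands in exactly one of the two outputs, which is what makes -v handy for filtering out lines you do not want.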

(6.4.1) How many words contain the letter m in the common1000 list?
(6.4.2) How many student UNIs on CLAC contain the number 2?
(6.4.3) How many instances of Vim are running on CLAC right now?
(6.4.4) In the Wordle solutions, how many times is print_words() used, and where? What about valid_word()? And what about the WORD_SIZE macro?

6.5 Real Wordle problems

If you haven’t already, compile the Wordle solutions by running make.

In shell syntax, you can expand the output of one command into another, using $(). For example, if you run:

wc -l $(echo game.c words.c)

echo game.c words.c outputs game.c words.c, so the above command will effectively evaluate:

wc -l game.c words.c

The run-tests.sh script uses this to automatically run each test case. For example, for the test case named hello1, it runs:

./wordle $(cat tests/hello1.test) < tests/hello1.in > test-output/hello1.out

The $(cat tests/hello1.test) expands to the arguments needed for the hello1 test case, while < tests/hello1.in redirects test input to ./wordle’s stdin.
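
The same pattern can be tried without the Wordle binary. The sketch below is only an illustration: grep stands in for ./wordle, and demo.test, demo.in, and demo.out are made-up names, not files from the lab.

```shell
# All file names here are hypothetical; grep stands in for ./wordle.
tmp=$(mktemp -d)
printf '%s\n' '-v m' > "$tmp/demo.test"          # command-line arguments
printf 'game\nword\nshell\n' > "$tmp/demo.in"    # what the program reads on stdin

# $(cat ...) expands to "-v m", which the shell splits into two arguments,
# so this effectively runs: grep -v m < demo.in > demo.out
grep $(cat "$tmp/demo.test") < "$tmp/demo.in" > "$tmp/demo.out"

result=$(cat "$tmp/demo.out")
echo "$result"

rm -rf "$tmp"
```

Because the substitution is left unquoted, the shell splits its output on whitespace, which is what turns one line of the .test file into separate arguments.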

(6.5.1) Write your own new test case for Wordle. You should not have to write your own output .out file. (Tip: you may find the tee command useful, but it’s not necessary.)
(6.5.2) What are the line counts of all files in the Wordle solutions tracked by Git? (Hint: git ls-files lists all files tracked by Git.)

There’s also a system-wide word list installed at /usr/share/dict/words that (supposedly) contains every word in the English dictionary, with each word on its own line.

(6.5.3) How many lines are there in /usr/share/dict/words?
(6.5.4) /usr/share/dict/words contains duplicate entries to account for the fact that some words may be followed by 's. For example, it contains both meme and meme's. It does this so that spellcheckers using this file don’t need to consider these edge cases, even though humans would not really consider these distinct words. How many “actual” words are there in /usr/share/dict/words, i.e., those that don’t contain an apostrophe?
(6.5.5) What’s the longest “actual” word in /usr/share/dict/words?
(6.5.6) You can actually use /usr/share/dict/words as the words file for a game of Wordle. Give this a try.
(6.5.7) You can also use /usr/share/dict/words to “brute force” your way through a game of Wordle, provided you have unlimited guesses. Run ./wordle with the -g - flag to allow unlimited guesses, and use /usr/share/dict/words to feed guesses (whether valid or invalid) into your game.

Solutions

Note that the answers to some of these questions don’t actually matter (some of them don’t even have a fixed answer). What’s more important are the commands you need to run to obtain those answers, and the thought process that goes into figuring out what command to run. What’s most important is understanding that thought process, and how you incorporate it into your command-line workflow.

(6.1.1) See man ls/try running it yourself.
- (6.1.1.1) List files in long format, which also shows other metadata like the owner, permissions, and last modified timestamp of each file.
- (6.1.1.2) List all files, including hidden ones (whose file names begin with a .).
- (6.1.1.3) The -1 flag will force each file to be printed on its own line. Use of this flag isn’t usually necessary, but it is generally considered good practice when writing shell scripts with ls.
(6.1.2) See man echo/try running it yourself.
- (6.1.2.1) echo "\n" just prints \n, not a newline. This might be surprising at first, but it’s important that echo sees "\n" as the characters \ and n, not as a newline character. Use the -e flag to tell echo to interpret \n as a newline.
  
  By the way, we had to write "\n" because \ means something special in shell syntax, outside of the quotation marks.
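
One way to check what the shell actually hands to a command is to count bytes. This sketch uses printf (a command not covered above) because its %s format prints an argument verbatim, without interpreting escapes:

```shell
# "\n" inside double quotes reaches the command as two characters: \ and n.
two_chars=$(printf '%s' "\n" | wc -c)
# Here printf itself interprets \n in its format string as one newline byte.
one_newline=$(printf '\n' | wc -c)

printf 'bytes in the quoted argument: %s\n' "$two_chars"
printf 'bytes in a real newline: %s\n' "$one_newline"
```

The first count is 2 and the second is 1, so any translation from \n to a newline is done by the command, not by the shell.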
(6.1.3) See man cat/try running it yourself.
- (6.1.3.1) cat takes file names as its non-flag arguments, whose contents it concatenates.
- (6.1.3.2) The - argument tells cat to read from stdin instead of a file. It can be mixed with other arguments, e.g., cat myfile - will first output the contents of myfile, then whatever it reads from stdin.
- (6.1.3.3) If you run cat with no arguments, it defaults to reading from stdin, i.e., as if you had run cat -.
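
A quick way to watch the - argument at work is to pipe some text into cat while also naming files. The file names and contents below are invented for this sketch:

```shell
# File names and contents are made up for this demo.
tmp=$(mktemp -d)
printf 'first file\n' > "$tmp/head.txt"
printf 'last file\n' > "$tmp/tail.txt"

# cat outputs head.txt, then whatever arrives on stdin (-), then tail.txt.
combined=$(printf 'from stdin\n' | cat "$tmp/head.txt" - "$tmp/tail.txt")
# With no arguments at all, cat just copies stdin to stdout.
passthrough=$(printf 'from stdin\n' | cat)

echo "$combined"

rm -rf "$tmp"
```

The position of - among the arguments decides where the stdin text appears in the output.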
(6.2.1) cat game.c > game2.c reads from game.c and prints it to stdout, which is redirected to game2.c, effectively copying the contents of game.c into game2.c. cat < game.c > game2.c also works.
(6.2.2) If you run cat or cat -, cat will just read from stdin (i.e., what you type in with your keyboard), and print it to stdout. If you redirect stdout to a file, you essentially get a crude text editor. So you can do something like cat > cat.txt, and then type in the text.
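
Typing into cat works interactively, but inside a script there is no keyboard. A here-document (shell syntax not covered in this recitation) is one way to sketch both cat tricks non-interactively. The scratch directory and the cat2.txt name are placeholders:

```shell
tmp=$(mktemp -d)

# "Crude text editor": everything up to the EOF marker becomes cat's stdin.
cat > "$tmp/cat.txt" <<'EOF'
They say, you can't teach an old dog new tricks.
But maybe an old cat can do what an old dog cannot.

- John
EOF

# "Copy": cat prints the file to stdout, which is redirected to a new file.
cat "$tmp/cat.txt" > "$tmp/cat2.txt"

# cmp -s exits with status 0 only if the two files are byte-for-byte identical.
if cmp -s "$tmp/cat.txt" "$tmp/cat2.txt"; then copy_ok=yes; else copy_ok=no; fi
line_count=$(wc -l < "$tmp/cat2.txt")
echo "copy identical: $copy_ok, lines: $line_count"

rm -rf "$tmp"
```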
(6.3.1) According to man wc, the -L flag counts the maximum line length, so either wc -L words/common1000 or wc -L < words/common1000 will work here. You should find that the longest word/line is 16 characters long.
(6.3.2) By running these commands, you’ll find that wc -l game.c prints out the name of the file (game.c), while the other two don’t; since they’re reading from stdin, there’s no actual “file” whose name wc could show. wc -l < game.c and cat game.c | wc -l are equivalent, because both use the contents of game.c as the stdin of wc -l. In fact, for any command that simply reads its stdin from start to finish, cmd < file produces the same result as cat file | cmd.
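
The difference is easy to see on a small file. In this sketch, game-demo.c is a made-up stand-in for game.c:

```shell
# game-demo.c is a hypothetical stand-in for game.c.
tmp=$(mktemp -d)
printf 'int main(void) {\n    return 0;\n}\n' > "$tmp/game-demo.c"

by_name=$(wc -l "$tmp/game-demo.c")          # wc opens the file itself
by_redirect=$(wc -l < "$tmp/game-demo.c")    # the shell opens it as wc's stdin
by_pipe=$(cat "$tmp/game-demo.c" | wc -l)    # cat feeds it through a pipe

echo "file argument: $by_name"
echo "redirection:   $by_redirect"
echo "pipe:          $by_pipe"

rm -rf "$tmp"
```

Only the first form prints a file name next to the count, because only there does wc know which file it is reading.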
(6.3.3) According to man echo, you can use the -n flag to suppress the newline. You should find that echo -n | wc -l reports 0.
(6.3.4) According to man echo, the -e flag tells echo to interpret escape sequences like \n, so echo -e "\n" | wc -l will report 2 (one line for the interpreted \n, one line that echo adds by default).
(6.3.5) You can find out by running ps -A | wc -l, and subtracting 1 from the line count. You should run ps -A on its own first, to get an idea of its output format; it outputs a header row showing the meaning of each column, and then outputs a line per running process. If you want more information about those processes, you can run ps -A -f, which will tell you things like exactly what command was run to start that process, and which user ran it.
(6.3.6) Each student account has a home directory associated with it in /students/. You can use ls /students/ to list all the students’ home directories in /students/, so running ls -1 /students/ | wc -l will count the number of home directories (the -1 is not strictly necessary because ls will figure out that you are piping its output elsewhere, and automatically output each file name on a separate line). At the time of writing, the number was 466 (though not all of those accounts are active).
(6.4.1) grep m outputs all lines containing the letter m from stdin; grep m < words/common1000 | wc -l indicates that there are 134.
(6.4.2) You can use grep 2 as a filter for any lines containing the character 2, so ls -1 /students/ | grep 2 | wc -l will tell you the number of student UNIs that contain a 2. At the time of writing, there are 358 such UNIs. You can run ls -1 /students/ | grep 2 to see what they are.
(6.4.3) You can use ps -A to list all the processes, and grep vim to filter for only vim processes, so running ps -A | grep vim | wc -l counts the number of Vim processes. You can even use ps -A -f | grep vim to peek at what files others are editing.
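
If you are not on CLAC, the same three-stage pipeline can be tried on a scratch directory. The directory and the UNI-like names below are invented for illustration:

```shell
# The directory and UNI-like names below are made up for this demo.
tmp=$(mktemp -d)
mkdir "$tmp/students"
mkdir "$tmp/students/ab1234" "$tmp/students/cd5678" "$tmp/students/ef9012"

total=$(ls -1 "$tmp/students" | wc -l)              # one line per entry
with_2=$(ls -1 "$tmp/students" | grep 2 | wc -l)    # keep only names containing 2

echo "total: $total, containing 2: $with_2"

rm -rf "$tmp"
```

Each stage only sees lines of text from the stage before it, so the grep filter can be added or removed without changing the rest of the pipeline.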
(6.4.4) You can look for these using the -r flag:
```
grep -r print_words
grep -r valid_word
grep -r WORD_SIZE
```
If grep tells you that you are getting matches in binary files, you can skip them using the -I flag.
(6.5.1) The tee command replicates input from stdin, to stdout and to each file you specify as an argument. I use it to capture the input I type and the output generated by Wordle.

Here’s what I did to create a new test. First, I wrote my mytest.test file, which contains the arguments I run wordle with (and I made sure to specify the -n argument to make sure that the test behaves deterministically, rather than choosing a random word). Then, I ran:
```
tee tests/mytest.in | ./wordle $(cat tests/mytest.test) | tee tests/mytest.out
```
tee tests/mytest.in captures the input that I type into stdin, and saves it in tests/mytest.in; tee tests/mytest.out captures the output emitted by Wordle, and saves it to tests/mytest.out, while also showing it on the screen.

Of course, this way of creating tests only works because I have a working implementation of Wordle. You should always closely inspect the output of Wordle to make sure that it is what you expect. If you are writing your own test cases before you have a working solution, you will need to write these files manually.
(6.5.2) Running wc -l $(git ls-files) counts all the lines in the current repo; git ls-files expands to all the tracked files, which are then all given to wc -l as parameters. Note that running this is not the same as running git ls-files | wc -l, which will just count the number of tracked files (i.e., the number of lines of git ls-files).
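
To see what each tee captures without running Wordle, here is a non-interactive sketch. tr (which upper-cases its input) stands in for ./wordle, and demo.in and demo.out are made-up file names:

```shell
# tr stands in for ./wordle here; demo.in and demo.out are hypothetical names.
tmp=$(mktemp -d)

# First tee saves what goes INTO the program; second tee saves what comes OUT.
shown=$(printf 'hello\nworld\n' \
    | tee "$tmp/demo.in" \
    | tr 'a-z' 'A-Z' \
    | tee "$tmp/demo.out")

saved_in=$(cat "$tmp/demo.in")
saved_out=$(cat "$tmp/demo.out")
echo "$shown"

rm -rf "$tmp"
```

The input file holds the original lower-case text, while the output file matches what was shown on the screen.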
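
The contrast between substitution and piping can be reproduced outside a Git repo. In this sketch, list_files is a hypothetical stand-in for git ls-files, and the two text files are invented:

```shell
# list_files is a made-up stand-in for `git ls-files`.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/one.txt"
printf 'd\ne\n' > "$tmp/two.txt"
list_files() { printf '%s\n' "$tmp/one.txt" "$tmp/two.txt"; }

# Command substitution: the file NAMES become arguments, so wc counts
# the lines inside each file and prints a total.
per_file=$(wc -l $(list_files))
# Pipe: wc reads the LIST itself on stdin, so it counts the file names.
name_count=$(list_files | wc -l)

echo "$per_file"
echo "number of files: $name_count"

rm -rf "$tmp"
```

The substitution form reports 3 and 2 lines with a total of 5, while the piped form reports 2, the number of names in the list.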
(6.5.3) There are lots of words. wc -l < /usr/share/dict/words tells us there are 104334 to be exact (for the current version of the dictionary).
(6.5.4) You can filter out lines containing ' using grep -v "'"; note that the quotation marks around ' are necessary because otherwise your shell will interpret ' as starting a string, in shell syntax. So the command to run here is grep -v "'" < /usr/share/dict/words | wc -l, which should give 74744 (for the current version of the dictionary).
(6.5.5) Building on 6.5.4, you can use wc -L to count the longest line, so grep -v "'" < /usr/share/dict/words | wc -L tells us the longest actual word is 22 characters long.
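
wc -L reports only the length of the longest line, and it is not part of POSIX wc, so it may be missing on some systems. To see the word itself, one option is a short awk program. awk is not covered in this recitation, so treat this as an optional sketch, run here on a made-up mini dictionary:

```shell
# mini-dict is an invented word list, not the real /usr/share/dict/words.
tmp=$(mktemp -d)
printf "meme\nmeme's\nshell\nshell's\nredirection\n" > "$tmp/mini-dict"

# Drop the possessive entries, then count what is left.
actual_count=$(grep -v "'" < "$tmp/mini-dict" | wc -l)

# awk remembers the longest line seen so far and prints it at the end.
longest=$(grep -v "'" < "$tmp/mini-dict" \
    | awk 'length($0) > max { max = length($0); word = $0 } END { print word }')

echo "actual words: $actual_count, longest: $longest"

rm -rf "$tmp"
```

If several words tie for the longest length, this version prints only the first one it encounters.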
(6.5.6) Just run ./wordle -f /usr/share/dict/words (:
(6.5.7) You can use the words file to drive a “dictionary attack” on Wordle with ./wordle -g - -f words/common100 < /usr/share/dict/words. This should also work when you use -f /usr/share/dict/words.