Stream Manipulation With grep, sed, and xargs
It’s not unusual for a developer or data analyst to need to make the same change across many files at once. Scenario: say you have 100 files in your project that contain the text “ORANGE” and you want to replace every occurrence with “APPLE.” A novice approach would be to open each file by hand, search for ORANGE, and replace it with APPLE. But we can do better with the following UNIX command:
grep -rl "ORANGE" . | xargs sed -i '' 's/ORANGE/APPLE/g'
Let’s see what’s going on here:
- grep is the UNIX command for searching text in files: ‘-r’ searches through files recursively; ‘-l’ makes grep print only the names of the matching files rather than the matching lines; “ORANGE” is our search pattern; and finally ‘.’ specifies where to start searching, in this case the current directory
- xargs: grep normally writes its results to stdout. Here we pipe (‘|’) them to xargs, which takes the list of file names from the pipe and invokes another Unix command with those names as its arguments
- sed (the stream editor) takes each of those file names and applies a regular-expression search/replace to it. The ‘-i’ option enables in-place editing; if it is given a suffix, sed keeps a backup copy with that extension, and here we pass it an empty string ('') so no backup is made (make sure you know what you are doing, because this is irreversible). Note that the ‘-i ''’ form is what BSD/macOS sed expects; GNU sed on Linux takes a plain ‘-i’. The ‘g’ flag makes the replacement apply to every match on a line, not just the first;
And that’s it. We have replaced the text across all 100 files with one simple command.
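Because the in-place edit is irreversible, a cautious workflow is to preview before applying. Here is a minimal sketch (assuming BSD/macOS sed as in the command above; on GNU/Linux sed, drop the empty string after -i). The file name some_file.txt is only a placeholder:

# 1. See which files would be touched
grep -rl "ORANGE" .
# 2. Preview the substitution on one file without modifying it
sed 's/ORANGE/APPLE/g' some_file.txt | less
# 3. Apply the change in place (no backups are kept)
grep -rl "ORANGE" . | xargs sed -i '' 's/ORANGE/APPLE/g'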
Bonus Scenario:
There are many .tmp files lying around in your project. How do you delete them all?
find . -iname "*.tmp" | xargs rm
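One caveat: file names containing spaces or newlines can confuse a plain pipe into xargs. A slightly safer sketch, assuming your find and xargs support the -print0/-0 options (GNU and BSD versions do), is:

find . -iname "*.tmp" -print0 | xargs -0 rm

Or skip xargs entirely with find’s own -delete action, where available:

find . -iname "*.tmp" -delete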
Super Bonus Scenario: This question comes up often in software developer job interviews: “How would you extract phone numbers from many huge files?” My approach would be to use grep with a clever regular expression. How complicated does the pattern have to be to avoid under-matching and over-matching? That opens up an interesting conversation between the interviewer and the interviewee.
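As a starting point, here is a rough sketch assuming North-American-style 10-digit numbers; the pattern is illustrative only and will both miss some formats and pick up some strings that are not phone numbers:

grep -rEoh '\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}' .

Here ‘-E’ enables extended regular expressions, ‘-o’ prints only the matching text rather than the whole line, and ‘-h’ suppresses the file-name prefix so you get a clean list of candidate numbers. Refining the pattern (country codes, extensions, separators) is exactly the under-match/over-match discussion the interviewer is fishing for.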