Follow-Up on Command-Line Finding and Filtering

A simpler solution that doesn’t require tr… if you have the GNU utilities or modern alternatives like ripgrep and fd.

Assumed audience: 90% myself in the future, when I (inevitably) ask this question again—but also anyone else who hits this particular question about command-line invocations.

Epistemic status: Slightly higher than the previous post on the subject, courtesy of the requested reader feedback!

In my previous post, I used the tr utility to transform newlines into null characters. However, just as I hoped when I asked for a better approach in that post’s Epistemic Status qualifier, a reader emailed me with one!
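(For reference, that pipeline was shaped roughly like this, with tr doing the newline-to-null conversion so that xargs -0 could consume the list; I’m reconstructing it here, so see the previous post for the exact invocation.)

$ find notes -name "*.md" |\
  grep "notes/2020" |\
  tr '\n' '\0' |\
  xargs -0 wc -w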

If you’re using the GNU version of grep, it has a --null-data flag (shortened to -z) which makes grep treat both its input and its output as null-character-separated rather than newline-separated. You can combine that with find’s -print0 flag to get the same results as I got with tr (presumably with better performance, since it doesn’t require doing the replacement in a separate tool):

$ find notes -name "*.md" -print0 |\
  grep --null-data "notes/2020" |\
  xargs -0 wc -w

This reminded me that ripgrep has the same feature, with the same --null-data flag. Similarly, fd has a --print0 (-0) option. You can combine these and (if you like) cw¹ to get the same effect:

$ fd --print0 ".md" notes |\
  rg --null-data 'notes/2020' |\
  xargs -0 cw -w

Huzzah for versions of tools that understand these things and make this simpler than the solution I posted yesterday (and thanks to my reader for sending in that note)!


Notes

  1. cw is nice because, with especially large sets of data, its ability to run across multiple threads becomes very handy. If I word-count all of my notes with it (currently 667 files and just shy of 150,000 words), using 4 threads instead of 1 (the default, and all you get with wc) takes about 6–8 milliseconds off the run time. Not important at this scale… but if you’re dealing with very large amounts of data, it might be; there’s a sketch of the threaded invocation below. ↩︎
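
The threaded run I’m timing there looks roughly like the following. I believe the option is spelled --threads, but I’m going from memory here, so check cw --help to confirm the exact flag on your version:

$ fd --print0 ".md" notes |\
  xargs -0 cw -w --threads 4   # --threads is my best guess at the flag name; confirm with cw --help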