Pcregrep to the rescue

Posted on ‐ Tagged #grep, #pcregrep, #search, #regex, #tip

I was faced with an issue today that seemed extremely simple at first, but threw up one unexpected hurdle. A number of sites built by my employer needed to have a specific module updated, and all I had to do was generate a list of candidates. Sounds easy enough, right?

I figured I’d just run (git) grep on all the Git repositories (I already had all repositories checked out on one of our virtual servers, but more on that in a future installment) to compile a list of sites running the old version and presto, my part in the endeavor was done.

The problem

I wouldn’t be writing this post if it were really that simple. It turns out the version information in the plugins’ files was split across multiple lines, and grep, which matches one line at a time, wasn’t cut out for the task. PCRE (Perl Compatible Regular Expressions) can match across multiple lines if you explicitly tell it to, but the manual page of grep states:

-P, --perl-regexp
   Interpret PATTERN as a Perl regular expression.  This is highly experimental and grep -P may warn of unimplemented features.

That just didn’t make me feel entirely confident about using it, so instead, I set out looking for another way.
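To make the underlying problem concrete, here is a minimal sketch in Python. It uses Python's re module rather than PCRE or grep, and the file contents and pattern are made up for illustration, not taken from the actual plugins. It shows the same pattern failing when each line is tested on its own and succeeding when the whole text is searched at once.

```python
import re

# Hypothetical file contents: the key and its value sit on separate lines.
SAMPLE = 'name = example_module\nversion =\n  "1.2"\n'

# Hypothetical pattern. \s also matches newlines, so it can bridge lines,
# but only if the regex engine is handed more than one line at a time.
PATTERN = re.compile(r'version\s*=\s*"1\.2"')


def matches_line_by_line(text):
    """Test each line separately, the way a line-based tool would."""
    return any(PATTERN.search(line) for line in text.splitlines())


def matches_whole_text(text):
    """Test the full text in one go, so a match may span lines."""
    return PATTERN.search(text) is not None


print(matches_line_by_line(SAMPLE))  # False
print(matches_whole_text(SAMPLE))    # True
```

The pattern itself never changes; only the unit of text handed to the regex engine does.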

Introducing pcregrep

After reading some suggestions on Stack Overflow to use awk (which wasn’t going to work in my case, at least not nearly as easily), I stumbled across pcregrep. With pcregrep I could simply pass --multiline to let the PCRE engine match across line boundaries and voilà, half an hour after kicking it off (there were a lot of files to go through) I had my list of files.
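For readers without pcregrep at hand, the same idea can be roughly approximated in a few lines of Python. This is only a sketch under assumptions: the function name files_matching is hypothetical, it uses Python's re module (similar to PCRE, but not identical), and it reads each file fully into memory, which is not a description of how pcregrep works internally.

```python
import re
from pathlib import Path


def files_matching(root, pattern):
    """Return sorted paths under root whose full contents match pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Read the whole file so a match may span several lines.
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Skip files that cannot be read (permissions and the like).
            continue
        if regex.search(text):
            hits.append(path)
    return sorted(hits)
```

Calling files_matching("/path/to/checkouts", r'version\s*=\s*"1\.2"') with a directory and pattern of your own would return the files in which the version string appears, even when it is split over two lines.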

Pcregrep wasn’t something I’d heard of before, though having found it, I’m not at all surprised that it exists. It is, however, one of those tools I think you should know about, in case you run into the same issue I did.