2 May 2013

.NET Regex - Match something that's NOT there, with a Regular Expression Negative LookAhead Assertion

I wanted to search through a bunch of HTML and stick the text "http://" in wherever I detect an anchor element whose href URL is missing the "http" bit.

Naturally, I want a Regex.Replace, but here's the thing. What I want to do here is match something that is not there, i.e. the absence of the "http" in the URL. I was dimly aware of doing something like this before in Perl but didn't know how to do it in .NET.

So it turns out, what I wanted is called a Zero-Width Negative LookAhead Assertion. You define it in your pattern with a (?!MyString) like so.

text = Regex.Replace(text, @"<a href=""(?!http)", @"<a href=""http://");
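To see what that one-liner does to real input, here is a minimal sketch that wraps it in a method. The class and method names (HrefFixer, AddMissingScheme) and the sample URLs are just mine for illustration; the pattern and replacement are the same as above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefFixer
{
    // Hypothetical helper: wraps the Regex.Replace call from the post.
    public static string AddMissingScheme(string html)
    {
        // Matches the literal text <a href=" only when the characters that
        // follow it are NOT "http". The lookahead itself consumes nothing,
        // so only <a href=" is replaced and the URL is left in place.
        return Regex.Replace(html, @"<a href=""(?!http)", @"<a href=""http://");
    }
}
```

Calling HrefFixer.AddMissingScheme on <a href="www.example.com">x</a> gives <a href="http://www.example.com">x</a>, while an anchor that already starts with http:// or https:// comes back unchanged. Bear in mind the pattern is literal: it only catches anchors written exactly as <a href=" and it will also prefix anything else that doesn't start with "http", such as relative links.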

The Zero-Width part means that the assertion consumes no characters: whatever it checks for doesn't appear in the matched text. The Negative LookAhead part means that it looks ahead from the current position and succeeds only if the given string is NOT found there. (A Positive LookAhead, written (?=MyString), succeeds only if the text IS found.)
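Here is a small sketch to make the zero-width behaviour visible by looking at what the match actually contains. LookaheadDemo and FirstMatch are names I made up for this example; Regex.Match, Match.Success and Match.Value are the standard .NET members.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns the text consumed by the first match,
    // or null when the pattern does not match anywhere in the input.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

With the input href="www.example.com", the negative pattern href="(?!http) matches and the matched text is just href=" (the "www" that the lookahead inspected is not part of it), while the positive pattern href="(?=http) finds nothing. With the input href="http://example.org" it is the other way round: the negative pattern finds nothing and the positive one matches href=" without including the "http".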

More info can be found here: http://msdn.microsoft.com/en-us/library/az24scfc.aspx