txt2re: Tool to relieve headaches caused by Regexp (Regular Expressions)

Posted Nov 25th, 2008 by David Calhoun in javascript, regular expressions

Since I’m in the middle of transitioning from beginner Javascript to what I consider intermediate Javascript, I’ve often been faced with the intimidating task of using Regular Expressions.

It’s no easy task. The syntax is ridiculously difficult to read and probably harder to write, at least for the uninitiated (i.e. normal people like me). JavaScript guru Douglas Crockford agrees, as can be seen in part 4 of his video “The JavaScript Programming Language” and in Chapter 7 of his book “JavaScript: The Good Parts”.

A LOT of people, including myself, learn best by doing.  That means a lot of trial-and-error.  And a good way to learn by doing is by using the great online tool txt2re.

The interface leaves a lot to be desired, but it’s still very useful.  For example, I can type the following string into a textbox on the site: Example “text”!

And I get this kind of bewildering and colorful output:

Notice my original text at the very top.  It took me a while to see it (due to the bad UI), but once I saw it, it started to make sense.  Notice that each character is contained in its own colored box.  By clicking the links below the characters, you can pick out the specific pieces you want to match.  For instance, I can click the quotation mark link ” below each quotation mark to add a rule for it.

txt2re will take my two new rules to filter out the quotation marks and spit back the Regex code in whatever language I so desire: Perl, PHP, Python, Java, Javascript, ColdFusion, C, C++, Ruby, VB, VBScript, J#.net, C#.net, C++.net, or VB.net.

I happen to only be interested in Javascript at the moment, so I click on the Javascript tab and get my code:

var txt='Example "text"!';

var re1='.*?';	// Non-greedy match on filler
var re2='(")';	// Any Single Character 1
var re3='.*?';	// Non-greedy match on filler
var re4='(")';	// Any Single Character 2

var p = new RegExp(re1+re2+re3+re4,["i"]);
var m = p.exec(txt);
if (m.length>0)
{
var c1=m[1];
var c2=m[2];
document.write("("+c1.replace(/</,"&lt;")+")"+"("+c2.replace(/</,"&lt;")+")"+"\n");
}

One of the things I especially love about this is that txt2re doesn’t cram the whole expression onto one line, which would make it far harder to decode.  Instead it takes each rule you specified and leaves them separated (notice the re1, re2, re3, re4 in the code).

By leaving them separated AND commented, you can, in theory, go back later and manually change your Regexp if necessary.  Intuitively, keeping the rules separated increases maintainability.
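To make that concrete, here is a rough sketch of what editing one separated rule might look like. The helper buildPattern and the variable names are my own inventions for illustration, not something txt2re generates. The idea is to swap the second filler rule for a capturing group so the text between the quotes comes back too, and to check for null before indexing, since exec() returns null when nothing matches.

```javascript
// Hypothetical helper (not part of txt2re's output): join separate rule
// fragments into one RegExp so each rule can still be edited on its own.
function buildPattern(parts, flags) {
  return new RegExp(parts.join(""), flags);
}

var filler = ".*?";     // non-greedy match on filler
var quote = '(")';      // capture a double quote
var between = "(.*?)";  // assumed new rule: capture whatever sits between the quotes

// Same four rules as the generated code.
var original = buildPattern([filler, quote, filler, quote], "i");
var match = original.exec('Example "text"!');

// exec() returns null on no match, so guard before reading the groups.
var captures = match === null ? [] : match.slice(1);

// Changing one rule only means swapping one array entry.
var modified = buildPattern([filler, quote, between, quote], "i");
var m2 = modified.exec('Example "text"!');
var inner = m2 === null ? null : m2[2];

console.log(captures, inner);
```

With the rules held in an array, swapping a rule is a one-entry change and the comments stay next to the piece they describe.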

Or if you want, you can put all the rules on one line.  Just try to maintain and modify this code without having it break!

Image courtesy of cackhanded, from Douglas Crockford’s presentation “The JavaScript Programming Language” (Part 4)
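For comparison, here is a small sketch of the four generated rules collapsed into a single regex literal, next to the separated version. The variable names are mine, chosen for illustration. Both forms should describe the same pattern; the only difference is how easy each one is to read and change later.

```javascript
var txt = 'Example "text"!';

// The four rules from the generated code, joined piece by piece.
var separated = new RegExp(".*?" + '(")' + ".*?" + '(")', "i");

// The same pattern written as a single regex literal.
var oneLine = /.*?(").*?(")/i;

// Compare the pattern text and the matched substring of both versions.
var sameSource = separated.source === oneLine.source;
var a = separated.exec(txt);
var b = oneLine.exec(txt);
var sameMatch = a !== null && b !== null && a[0] === b[0];

console.log(sameSource, sameMatch);
```

With only four rules the one-liner is still readable, but every rule you add makes the single-line form harder to pick apart.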
