the blog for developers

Fluent Interface for Regular Expressions

Some years ago when we migrated Radeox – a wiki engine – to a new Regular Expression engine, we had to change lots of Regular Expressions. Today I would do it in a different way, with Fluent Interfaces. Joshua has a solution for building Regular expressions with a Fluent Interface, about which I wrote before.

Regex socialSecurityNumberCheck =
  new Regex(Pattern.With.AtBeginning
    .Digit.Repeat.Exactly(3)
    .Literal("-").Repeat.Optional
    .Digit.Repeat.Exactly(2)
    .Literal("-").Repeat.Optional
    .Digit.Repeat.Exactly(4)
    .AtEnd);

Not only is this solution easier to understand than using a String for regular expressions, migrating from ORO Regex to JDK Regex for example would be easy with this solution. The solution we chose was to create two engine implementations for ORO and JDK regex: Compiler (JdkCompiler,OroCompiler), MatchResult (JdkMatchResult,OroMatchResult), Matcher (JdkMatcher,OroMatcher) and so on. For example:

public class OroCompiler extends Compiler {
  private Perl5Compiler internalCompiler;

  private boolean multiline;

  public OroCompiler() {
    this.internalCompiler = new Perl5Compiler();
  }

  public void setMultiline(boolean multiline) {
    this.multiline = multiline;
  }

  public Pattern compile(String regex) {
    org.apache.oro.text.regex.Pattern pattern = null;
    try {
      pattern = internalCompiler.compile(regex,
         (multiline ? Perl5Compiler.MULTILINE_MASK : Perl5Compiler.SINGLELINE_MASK) | Perl5Compiler.READ_ONLY_MASK);
    } catch (MalformedPatternException e) {
      // handle later
    }

    return new OroPattern(regex, multiline, pattern);
  }
}

The actual regular expressions are stored in Java .properties files, to make them easily exchangable. The bold expression looks like this for example:

filter.bold.match=(^|>|[\\p{Punct}\\p{Space}]+)__(.*?)__([\\p{Punct}\\p{Space}]+|<|$)

With the migration of Radeox to a new regex implementation I had to write and implement several interfaces and it wasn't very easy to both support ORO and the JDK implementation. It wan't easy either to change the slightly different pattern dialact from ORO to JDK. With a Fluent Interface, just change the Fluent Interface, no need to think about different Java Regex APIs and to think about different syntaxes.

Thanks for listening.

You can leave a Reply here. Of course, you should follow me on twitter here.

You can share this post!
Do you want to tell others about this article? Use the social bookmark icons to submit this artice to the service of your choice. Thanks.

About the author: Stephan Schmidt is head of development at brands4friends. He has more than 15 years of internet technology experience and 10 years experience in agile. He was head of development, consultant and CTO and is a speaker, author and blog writer. He specializes in organizing and optimizing software development helping companies by increasing productivity with lean software development and agile methodologies. Want to know more? All views are only his own.
Leave a reply.

Leave a Reply

What people wrote somewhere else:

Additional comments powered by BackType