AppSense Regular Expression for Microsoft Office

I recently needed to add a new rule to AppSense that triggers on process start.  I wanted the rule to run only when a Microsoft Office application was launched.

Now I usually eat regular expressions for breakfast (with some ketchup on top for good measure).  However, I noticed that my regular expression wasn’t working in AppSense, and the cause turned out to be the flavour of regular expression that it uses.

You see, I tend to use JavaScript or .NET regular expressions.  But AppSense was presumably written in C++ and uses ATL's CAtlRegExp class, which is... lame.  Its grouping syntax is different, and so is its character matching syntax.

To test my regular expressions, rather than update the AppSense policy and wait for it to deploy to the machine, I just downloaded the regular expression tester from here.

So this was my first attempt – the MfcRegex tool said it was a successful match!  So I plonked it into AppSense:

.*\\Microsoft Office\\Office\d\d?\\((WINWORD)|(EXCEL)|(POWERPNT)|(MSACCESS)|(OUTLOOK)|(VISIO)|(WINPROJ))\.EXE$

But wait!  AppSense tries to be clever and escapes the brackets with preceding backslashes (I noticed this in the client debug logs), so this RegEx was failing because AppSense was evaluating it to this:

.*\\Microsoft Office\\Office\d\d?\\\(\(WINWORD\)|\(EXCEL\)|\(POWERPNT\)|\(MSACCESS\)|\(OUTLOOK\)|\(VISIO\)|\(WINPROJ\)\)\.EXE$
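The effect of that escaping can be reproduced outside AppSense.  The sketch below uses .NET's Regex class rather than CAtlRegExp, so it only illustrates the escaping problem, not the ATL engine itself.  The sample path and the EscapeBrackets helper are made up for illustration.  Once the brackets are escaped they only match literal bracket characters, so a real path no longer matches:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedBracketsDemo
{
    // The pattern as typed into AppSense (verbatim string, so backslashes are literal).
    public const string Typed =
        @".*\\Microsoft Office\\Office\d\d?\\((WINWORD)|(EXCEL)|(POWERPNT)|(MSACCESS)|(OUTLOOK)|(VISIO)|(WINPROJ))\.EXE$";

    // Hypothetical helper that mimics what the client debug logs showed:
    // every round bracket gets a preceding backslash.
    public static string EscapeBrackets(string pattern)
    {
        return pattern.Replace("(", @"\(").Replace(")", @"\)");
    }

    public static void Main()
    {
        // Made-up example path for an Office 2010 install.
        string path = @"C:\Program Files\Microsoft Office\Office14\WINWORD.EXE";

        Console.WriteLine(Regex.IsMatch(path, Typed));                 // True
        Console.WriteLine(Regex.IsMatch(path, EscapeBrackets(Typed))); // False
    }
}
```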

So by this point I was close to throwing my computer out of the window, until finally I used this syntax which works like a charm:

.*\\Microsoft Office\\Office\d\d?\\{WINWORD}|{EXCEL}|{POWERPNT}|{MSACCESS}|{OUTLOOK}|{VISIO}|{WINPROJ}\.EXE$

Notice that I have swapped the round brackets for curly braces (which CAtlRegExp uses for match groups) and dropped the outer pair of brackets.  If you want to limit the rule to a specific version of Office (2010 in my case), you can use a regular expression similar to this:

.*\\Microsoft Office\\Office14\\{WINWORD}|{EXCEL}|{POWERPNT}|{MSACCESS}|{OUTLOOK}|{VISIO}|{WINPROJ}\.EXE$

 

 

Strip out style attributes in HTML

This post describes the process I use to strip out style attributes in HTML code using a regular expression.

My website presents data from a field in SharePoint.  This field uses HTML and CSS style attributes to construct the note.  A user enters the data via a SharePoint website, and my .NET website presents it elsewhere.  The trouble is, when my site presents this data the message can look like a right mess: different fonts, different sizes and different colours (you’ve met those idiots before who like to use Comic Sans in a professional environment, right?).  So before I present the data in a Literal control, I decided to write some regular expressions to strip out any style and class attributes.  Here is the .NET function (which I have in a class):

    //function to strip CSS styles etc from SharePoint notes
    //requires: using System.Text.RegularExpressions;
    public static string stripStyles(string message)
    {

        //replace non-ascii with empty string
        message = Regex.Replace(message, @"[^\u0000-\u007F]", string.Empty);

        //replace 3 or more BR with one BR
        message = Regex.Replace(message, "(?:\\s*<br[/\\s]*>\\s*){3,}", "<br />");

        //remove any style attributes (double- or single-quoted values)
        message = Regex.Replace(message, "style=(\"[^\"]*\"|'[^']*')", "");

        //remove any class attributes
        message = Regex.Replace(message, "class=(\"[^\"]*\"|'[^']*')", "");

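        //remove empty p tags (the second alternative contains a zero-width space, written here as \u200B so it is visible)
        message = Regex.Replace(message, "(<p>\\s*</p>|<p>\\s*\\u200B\\?</p>)", "");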
        
        //remove font tags
        message = Regex.Replace(message, "</?(font)[^>]*>", "");

        return message;

    }

It won’t produce perfect results.  There are also uses of the <font> tag scattered about in these messages, and I suspect they may be used to highlight (bold/colour) certain words (auto-generated by the WYSIWYG editor).  The last replacement strips them along with everything else; remove that line if you would rather leave them alone.
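For a quick sanity check outside the website, something like the following console sketch could be used.  It is only an illustration: the StripAttributes helper is a hypothetical cut-down variant that covers just the style and class steps in a single pattern, and the sample markup is invented.  It matches each quote style separately, on the assumption that attribute values may contain round brackets such as rgb(...).

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripAttributesDemo
{
    // Hypothetical helper: removes style="..." and class="..." attributes,
    // including the whitespace in front of them. Double- and single-quoted
    // values are matched separately, so a value may contain the other quote
    // character or round brackets.
    public static string StripAttributes(string html)
    {
        return Regex.Replace(
            html,
            "\\s+(?:style|class)\\s*=\\s*(?:\"[^\"]*\"|'[^']*')",
            string.Empty,
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Invented sample of the kind of markup a rich-text field might hold.
        string note = "<p class=\"ms-rteElement-P\" style=\"color: rgb(255, 0, 0);\">"
                    + "Server <span style='font-family: Comic Sans MS'>down</span></p>";

        Console.WriteLine(StripAttributes(note));
        // <p>Server <span>down</span></p>
    }
}
```

Requiring whitespace before the attribute name keeps the pattern from matching inside longer names such as data-style, although it will still match the same text if it happens to appear outside a tag.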