Google
 

Sunday, December 21, 2008

Regular Expressions Cheat Sheet (V2)

This post doesn't have a lot to do with PerformancePoint specifically, however there are lots of possibilities to use Regular Expressions to solve some of your business requirements.

With the previously blogged-about plugin for Excel (morefunc) you have the ability to use Regular Expressions as formulas in Excel.  You can also use Regular expressions in Excel with some VBA code.

What are Regular Expressions?  They're anything but regular.  They allow you to design expressions to apply to tasks such as searching for patterns of data, data validation, etc.  Rather than just searching for the @ sign to find a "possible" email address, you could use regular expressions to find an exact interpretation of an email address with this regular expression.

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

You could also use this as a data validation rule to ensure that you are cleanly entering email addresses into a form for later import into the application using CSV.

These expressions look very cryptic until you get the gist of the syntax.

These types of tools could be handy for cleansing data before they enter the PerformancePoint staging database or within an input form.  They could also be useful for mapping and profiling data.

For example, labels are required to be no more than 40 characters in length, member properties are restricted to 100 characters, and there are a series of restricted characters and reserved words.  You could design a regular expression that checks for these values within your data before it is loaded into the system.  You could build something to do this in Excel or use SSIS, or even in a stored procedure before moving the data into a label table.

Another application would be to "tokenize" or split a pattern into a comma-delimited string. (eg. Account-Dept-Location), Lexical Analysis in geekspeak terms.

http://effbot.org/zone/xml-scanner.htm

http://en.csharp-online.net/CSharp_Regular_Expression_Recipes%E2%80%94A_Better_Tokenizer

It is easy to tokenize a value with 3 hyphens without learning regular expressions, but what if you needed to parse something like this to get the numbers out?

123ab2c
1234abc

Regular expressions might be useful...

How about a search/replace, but with intelligence?

http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx

Within SSIS, you can use this to perform data profiling and to classify data into buckets.

SSIS Regex Component
SSIS Regular Expression Transformation

You can use it in SQL

SQL and Regular Expressions
Regular Expressions with PATINDEX in SQL
Regular Expressions with OACREATE in SQL
Working with crummy data in SQL

It's about as handy as knowing XSLT and MDX

Regex is as forgettable as XSLT and MDX

The most useful tools for learning Regular Expressions:

Regulator and Regulazy

Some 3rd-party implementations of Regex:

Regular Expressions in Matlab
Regular Expressions in SAP Routines (Converting dates, thousands separator,

A deep dive into Regular Expressions.

Regular Expressions in Wikipedia.
All you ever need to know from 4guysfromrolla

I can see this one being useful to translate the XML format that PEL rules are stored in back to a readable text interpretation... without using an HTML parser.

Removing All HTML Tags from a String Using Regular Expressions

http://www.4guysfromrolla.com/webtech/073000-1.shtml

These type of statements are great for enforcing some of your business rules/validation logic using a data-driven approach. 

UPDATE updateTable
SET ut.field1 = dbo.regexReplace( ut.field1, rt.regex, rt.replace, 1, 1 )
,ut.field2 = dbo.regexReplace( ut.field2, rt.regex, rt.replace, 1, 1 )
FROM updateTable AS ut
CROSS JOIN regexTable AS rt

DECLARE @strTest varchar(8000) SELECT @strTest = '\\servername\foldername99999\filename.ext' SELECT [Position]=PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest), [Before]=SUBSTRING(@strTest, 1, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest)-1), [After]=SUBSTRING(@strTest, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @strTest), LEN(@strTest))

Regular Expressions to decompose MDX query

Regular expressions can be useful to decompose a MDX query (with elements explicitly defined, i.e. not using MDX functions) and retrieve involved objects (i.e. dimensions, levels, members, etc.). I often used them in performance improvement processes or for MDX pre-validation purpose.

How about being able to parse your calculated measures out of the MDX script used by the cube?

http://geekswithblogs.net/darrengosbell/archive/2006/08/10/87561.aspx

Or building some pattern filtering functionality using MDX stored procedures?

http://blogs.conchango.com/christianwade/archive/2006/04/30/3896.aspx

There's nothing stopping you from using the System.Text.RegularExpressions libary in Reporting Services functions either.

http://msdn.microsoft.com/en-us/library/ms157328.aspx

Here is a good resource to better understand Regular Expressions.  There are plenty of possibilities - just make sure when you get this hammer, everything doesn't look like a nail.

Regular Expressions Cheat Sheet The Regular Expressions cheat sheet is a one-page reference sheet. It is a guide to patterns in regular expressions, and is not specific to any single language.

Regular Expressions Cheat Sheet (V2) - Cheat Sheets - Added Bytes

No comments: