Friday, September 12, 2008

Regex to strip spaces and SGML tags


A DBA coworker wanted a regex to strip SGML tags and spaces from text stored in a database, for some reports he was doing.
He wanted to turn this: 
BiScO<sub>3</sub>:<html_ent glyph="@nbsp;" ascii=" "></html_ent> Centrosymmetric BiMnO<sub>3</sub>-type Oxide
into:
BiScO3:CentrosymmetricBiMnO3-typeOxide
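So, I came up with this pattern: <(.|\n)*?>|\s and tested it at http://www.regextester.com. The first alternative lazily matches from a < to the nearest >, and the (.|\n) part lets a tag span line breaks. The second alternative, \s, matches any single whitespace character. Replacing every match with an empty string produces the output above.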
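Here is a minimal sketch of the same substitution using Python's re module. Python stands in for the regextester.com and Oracle engines here, which may differ in small details. The function name strip_tags_and_spaces is hypothetical, and the sample uses straight quotes in the attributes:

```python
import re

# The pattern from the post: either a lazy match from "<" to the nearest ">"
# ((.|\n) lets a tag span line breaks, since "." alone skips newlines),
# or any single whitespace character.
TAG_OR_SPACE = re.compile(r"<(.|\n)*?>|\s")


def strip_tags_and_spaces(text: str) -> str:
    """Delete every SGML-style tag and every whitespace character."""
    return TAG_OR_SPACE.sub("", text)


sample = ('BiScO<sub>3</sub>:<html_ent glyph="@nbsp;" ascii=" "></html_ent> '
          'Centrosymmetric BiMnO<sub>3</sub>-type Oxide')
print(strip_tags_and_spaces(sample))
# BiScO3:CentrosymmetricBiMnO3-typeOxide
```

The tag alternative is tried first at each "<". Because of that, spaces inside a tag, such as ascii=" ", are consumed along with the tag instead of being matched on their own. On this input, <[^>]*>|\s would behave the same, and it avoids the alternation.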
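Apparently, Oracle 10g added regular expression functions to SQL. REGEXP_REPLACE handles transformations like this one, where each match is replaced with an empty string. REGEXP_LIKE, REGEXP_INSTR and REGEXP_SUBSTR handle lookups.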
