Wednesday, February 20, 2013


Oracle Regular Expression
Regular expressions allow you to perform powerful context searches in variable-length strings. They provide a a powerful set of pattern matching capabilities by combining the following
Character Classes
are groups of possible characters at a point in the search
Collation Classes
are sets of characters and are treated like a range
Metacharacters
are operators that specify search algorithms
Metasequences
are operators created by two metacharacters or literals
Literals
are characters, character sets and words

Character Classes
Character classes are groups or ranges of possible characters, they are traditionally delimited by square brackets [], You use a dash "-" to specifiy a range i.e [1-9]. You can use the posix character class inside the brackets [:alpha:] see below for more examples
[:alnum:]
All alphanumeric characters
[:alpha:]
All alphabetic characters
[:cntrl:]
All non-printable control characters
[:digit:]
All numeric digits
[:graph:]
all [:digit:], [:lower:], [:punct:] and [:upper:] portable character class
[:lower:]
All lower alphabetic characters
[:print:]
All printable characters
[:punct:]
All punctuation characters
[:space:]
All nonprinting space characters
[:upper:]
All upper alphabetic characters
[:xdigit:]
All hexidecimal characters
Now for an example
Character class example:

DECLARE
  counter            NUMBER := 1;
  source_string   VARCHAR2(12)    := 'A1';
  pattern1           VARCHAR2(12)   := '[[:alpha:]]';  -- use the posix character class
  pattern2           VARCHAR2(12)   := '[[:alnum:]]';
BEGIN
  -- Compare using standard character class ranges.
  FOR i IN 1..LENGTH(source_string)
  LOOP
    IF REGEXP_INSTR(SUBSTR(source_string,counter,i),pattern1) = i
    THEN
        dbms_output.put(REGEXP_SUBSTR(SUBSTR(source_string,counter,i),pattern1));
    ELSE
        dbms_output.put_line(REGEXP_SUBSTR( SUBSTR(source_string,counter,i),pattern2));                                                           
    END IF;
    counter := counter + 1;
  END LOOP;
END;
/

This class is new and is designed to allow you to collate languages that require a collating element. You define a collation class by using [..]
Collation Classes
Collation class example
[a-[.ch.]]
Note: allows you to find element that is between an a or a ch

Metacharacters
A metacharacter provides some mechanics for performing pattern matching, lots of programming and scripting languages use the same syntax.
Metacharacter
Name
Type
Description
()
parentheses
Delimiter
act as a constraint on the scope of comparsion
{m}
exact
Interval
matches exactly m occurences of the preceding subexpression or character
{m, }
at least
Interval
matches at least m occurences of the preceding subexpression or character
{m,n}
between
Interval
matches exactly m occurences but no more than n of the preceding subexpression or character
|
OR
Logical
acts as a logical OR operator
.
dot
Matching
matches any one character
^
caret
Matching
matches the beginning of a line
$
dollar
Matching
matches the end of a line
[^]
caret
Negation
negates when used inside square brackets (like the NOT operator)
-
dash
Range
specifies a range when inside square brackets
?
question mark
Repetition
makes the proceding character optional
*
asterisk
Repetition
matches any instance of zero to many characters
+
plus
Repetition
matches at least once or many times the preceding character.
The best way to learn these is to google regular expression on the web, as i said before many programming languages and scripting languages use metacharacters.

Metasequences
Those of you who have used Perl before will under the metasequences, they are a character combined with a backslash to represent a pattern or characters.
Metasequence
Name
Type
Description
\n
backreference
Posix
matches the nth preceding subexpression
\d
digit
Perl
matches any digit
\D
nondigit
Perl
matches any non-digit
\w
word character
Perl
matches any word characters
\W
nonword character
Perl
matches any non-word characters
\s
whitespace character
Perl
matches any whitespace character
\S
nonwhitespace character
Perl
matches any non-whitespace character
\A
beginning of a string
Perl
matches a beginning of a string
\Z
end of a sting
Perl
matches the end of a string
\z
end of a string
Perl
matches the end of a string

Literals
Literals values are simply string literals, they may consist of one or many characters.

Oracle 11g Regular Expression
Oracle 11g has implemented a number of regular expression functions, so far there are 5 in total
Regular Expression Functions
REGEXP_COUNT
lets you count the number of times a specific pattern is found in a string
REGEXP_INSTR
lets you find a position index value in a string
REGEXP_LIKE
lets you find a regular expression match inside a string, it's like the LIKE operator
REGEXP_REPLACE
lets you find and replace a substring inside of a string
REGEXP_SUBSTR
lets you find a substring inside a string
Match Type Flags
i
sets the search to case-insensitive matching, overriding the nls_sort parameter
c
sets the search to case-sensitive matching, overriding the nls_sort parameter
n
Enables the dot (.) to truly match any character, including the newline
m
Enables a search to recognise multiple lines inside a string
x
sets the search to ignore any whitespace characters
Examples
REGEXP_COUNT
DECLARE
   mystring varchar2(24) := 'abcdefABCDEFabcdefABCDEF';
   mycount number;
BEGIN
   -- match a patterm - should result in 2
   select regexp_count(mystring, 'abcdef')
       into mycount
     from dual;
   dbms_output.put_line('Total Matches: ' || mycount);
   -- match a single pattern starting from position 7 - should result in 1
   select regexp_count(mystring, 'abcdef',7) 
       into mycount
     from dual;
   dbms_output.put_line('Total Matches: ' || mycount);
   -- match a pattern but ignore case - should result in 4
   select regexp_count(mystring, 'abcdef',1,'i')
      into mycount
     from dual;
   dbms_output.put_line('Total Matches: ' || mycount);
END;
/
REGEXP_INSTR
create table employees (emp_id        number
                                    ,emp_name  varchar2(30)
                                    );
insert
into employees values (1, 'Paul Valle');
select regexp_instr(emp_name,'Valle',1,1,0,'i') as Begin
    regexp_instr(emp_name,'Valle',1,1,1,'i') - 1 as End
from employees
REGEXP_LIKE
create table employees ( emp_id       number
                                    ,emp_name  varchar2(30)
                                    );
insert
into employees values (1, 'Paul Valle');
select emp_id
 from employees
where regexp_like(emp_name, 'Valle');
REGEXP_REPLACE
create table employees (emp_id       number
                                    ,emp_name  varchar2(30)
                                    );
insert
into employees values (1, 'Paul Valle');
select regexp_replace(emp_name, 'Paul', 'Lorraine',1)
  from employees;
REGEXP_SUBSTR
create table employees ( emp_id       number
                                     ,emp_name  varchar2(30)
                                    );
insert
into employees values (1, 'Paul Valle');
select regexp_substr(emp_name, 'V[:alpha:]+e',1)
  from employees;   -- using a character class


No comments:

Post a Comment