Regular expressions, those oddities that live between two forward slashes, are very powerful and quite mysterious. Staring at something like /([abcdef0123456789]+)/i all day can give you a headache. With a little luck and a bit of hard work, you’ll know exactly what the previous expression means.
For this post, all examples will be using Perl.
Text Search
A regular expression, or regex, in its simplest form is a text search. Here’s an example:
my $var = "Hello World";
if ( $var =~ m/Hello/ ) {
    print "Match\n";
}
In Perl, the operator =~ is used to run a regex against a variable. The pattern m/Hello/ will match if the variable contains “Hello” anywhere.
To make the match case-insensitive, simply add an i after the last forward slash. So change the regex to m/Hello/i to match “Hello”, “HeLlO” and “hello”.
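To see the i modifier at work, here is a minimal sketch that runs the same pattern against a few spellings. The helper name says_hello is made up for this illustration and is not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): returns 1 if $text contains
# "hello" in any mix of upper and lower case, 0 otherwise.
sub says_hello {
    my ($text) = @_;
    return $text =~ m/Hello/i ? 1 : 0;
}

for my $text ("Hello World", "HeLlO there", "say hello", "Goodbye") {
    printf "%-12s %s\n", $text, says_hello($text) ? "Match" : "No match";
}
```

Only “Goodbye” should report no match, since the other three all contain the letters h-e-l-l-o in some mix of case.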
Carets and Dollar Signs
A caret (^) at the beginning of a regex represents the beginning of a string. Here’s an example:
my $var = "Hello World";
if ( $var =~ m/^Hello/ ) {
    print "Match\n";
}
A dollar sign ($) at the end of a regex represents the end of a string. Another example:
my $var = "Hello World";
if ( $var =~ m/World$/ ) {
    print "Match\n";
}
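The two anchors can also be combined so that the pattern has to cover the whole string. The sketch below assumes a made-up helper called is_exact_greeting, which is just a name chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only if the string is "Hello World"
# from start (^) to end ($), 0 otherwise.
sub is_exact_greeting {
    my ($text) = @_;
    return $text =~ m/^Hello World$/ ? 1 : 0;
}

for my $text ("Hello World", "Oh, Hello World", "Hello World!") {
    print "$text: ", (is_exact_greeting($text) ? "Match" : "No match"), "\n";
}
```

Only the first string should match. One detail worth knowing: in Perl, $ also matches just before a trailing newline, so "Hello World\n" would match as well.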
If you want to match one of these special characters, put a backslash before it.
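# Single quotes keep Perl from treating $World as a variable.
my $var = 'Hello^ $World';
# The caret follows the "o" in "Hello", so the pattern starts at "o".
if ( $var =~ m/o\^ \$W/ ) {
    print "Match\n";
}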
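When the text you want to find comes from somewhere else, adding the backslashes by hand gets tedious. Perl's built-in quotemeta function adds them for you. The following is only a sketch, and the helper name contains_literal is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $needle appears in $text exactly as
# written, with none of its characters treated as special.
sub contains_literal {
    my ($text, $needle) = @_;
    my $escaped = quotemeta($needle);   # '^ $W' becomes '\^\ \$W'
    return $text =~ m/$escaped/ ? 1 : 0;
}

my $found = contains_literal('Hello^ $World', '^ $W');
printf "%s\n", $found ? "Match" : "No match";
```

Single quotes are used for the sample strings so that Perl does not try to interpolate $World or $W as variables.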
Brackets
Putting a list of characters inside square brackets “[]” will match any one of these characters.
my $var = "Hello World";
if ( $var =~ m/[aeiou]/ ) {
    print "There is a vowel.\n";
}
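Keep in mind that the list is case-sensitive unless you say otherwise. Here is a small sketch using a made-up helper named has_vowel to show the effect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $word contains a lowercase vowel.
sub has_vowel {
    my ($word) = @_;
    return $word =~ m/[aeiou]/ ? 1 : 0;
}

for my $word ("Hello", "rhythm", "APPLE") {
    printf "%-7s %s\n", $word, has_vowel($word) ? "vowel" : "no vowel";
}
```

“APPLE” reports no vowel because the list only contains lowercase letters. Listing both cases, as in [aeiouAEIOU], or adding the i modifier would catch it.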
You can even tell if a string contains a hexadecimal value. This example uses the special character +, which means that the previous item (here, the bracketed list) must appear one or more times.
my $var = "0x157afde";
if ( $var =~ m/^0x[0123456789abcdef]+$/ ) {
    print "It is hexadecimal\n";
}
Within the brackets, instead of listing every possible character, you can specify a range to be matched. For instance, 0-9 will match any digit 0 through 9. Here’s a slightly shorter version of the same example:
my $var = "0x157afde";
if ( $var =~ m/^0x[0-9a-f]+$/ ) {
    print "It is hexadecimal\n";
}
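Several ranges can sit side by side in one list. As a sketch, the made-up helper is_hex below adds A-F so that uppercase digits are accepted too.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text is "0x" followed by one or
# more hexadecimal digits in either case, 0 otherwise.
sub is_hex {
    my ($text) = @_;
    return $text =~ m/^0x[0-9a-fA-F]+$/ ? 1 : 0;
}

for my $text ("0x157afde", "0x157AFDE", "0x", "0xg1") {
    printf "%-10s %s\n", $text, is_hex($text) ? "hexadecimal" : "not hexadecimal";
}
```

The bare “0x” fails because + requires at least one digit after the prefix, and “0xg1” fails because g is outside every range in the list.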
The caret (^) is also a special character within brackets, but it does a different job there. Putting one at the beginning of the brackets will match any character except those listed inside the brackets.
my $var = "0x157afdg";
# Unanchored: one character outside the list is enough to match.
if ( $var =~ m/[^0-9a-fx]/ ) {
    print "It is not hexadecimal\n";
}
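It matters whether a negated list is anchored. Between ^ and $ it demands that every character fall outside the list, while on its own it only needs one such character. The sketch below compares the two with made-up helper names, all_outside_hex and any_outside_hex.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if EVERY character is outside 0-9 and a-f.
sub all_outside_hex {
    my ($text) = @_;
    return $text =~ m/^[^0-9a-f]+$/ ? 1 : 0;
}

# Hypothetical helper: 1 if AT LEAST ONE character is outside 0-9 and a-f.
sub any_outside_hex {
    my ($text) = @_;
    return $text =~ m/[^0-9a-f]/ ? 1 : 0;
}

for my $text ("157afde", "157afdg", "wxyz") {
    printf "%-8s all=%d any=%d\n", $text,
        all_outside_hex($text), any_outside_hex($text);
}
```

For “157afdg” only the unanchored version matches, which is usually what you want when checking for invalid input.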
Periods and Asterisks
A period (.) will match any single character except a newline.
my $var = "Hello World";
if ( $var =~ m/^H.llo W.+$/ ) {
    print "Match\n";
}
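By default the period stands for exactly one character and will not match a newline. Here is a small sketch of both points, using a made-up helper named dot_between.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if "Hello" and "World" are separated by exactly
# one character that is not a newline, 0 otherwise.
sub dot_between {
    my ($text) = @_;
    return $text =~ m/^Hello.World$/ ? 1 : 0;
}

my %samples = (
    "space"   => "Hello World",
    "dash"    => "Hello-World",
    "nothing" => "HelloWorld",
    "newline" => "Hello\nWorld",
);

for my $label (sort keys %samples) {
    printf "%-8s %s\n", $label, dot_between($samples{$label}) ? "Match" : "No match";
}
```

The space and dash versions match. “HelloWorld” fails because the period needs a character to consume, and the newline version fails because a period skips newlines unless the s modifier is added after the last slash.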
An asterisk (*) is similar to a plus sign (+), but an asterisk will match 0 or more of the previous character.
my $var = "Hello World";
if ( $var =~ m/^Hello .*$/ ) {
    print "Saying hello\n";
}
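The difference between + and * only shows up when there is nothing left to match. As a sketch, the made-up helpers with_plus and with_star below run both versions against the same strings.

```perl
use strict;
use warnings;

# Hypothetical helper: "Hello " followed by ONE or more characters.
sub with_plus {
    my ($text) = @_;
    return $text =~ m/^Hello .+$/ ? 1 : 0;
}

# Hypothetical helper: "Hello " followed by ZERO or more characters.
sub with_star {
    my ($text) = @_;
    return $text =~ m/^Hello .*$/ ? 1 : 0;
}

for my $text ("Hello World", "Hello ", "Hello") {
    printf "'%s': plus=%d star=%d\n", $text, with_plus($text), with_star($text);
}
```

“Hello ” with nothing after the space matches only the asterisk version. Plain “Hello” matches neither, because both patterns still require the space.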
Tomorrow
Tomorrow: more special characters, including whitespace characters, non-whitespace characters, and matching parentheses.