Python Regular Expressions

Regular expressions are a useful technique for extracting information from text such as code, spreadsheets, documents, or log files. The first thing to keep in mind when working with regular expressions is that everything is treated as a character: programmers write patterns to match a specific sequence of characters (a string).

Defining Regular Expression

A regular expression is a sequence of characters, written in a specialized syntax, that forms a pattern for finding other strings or sets of strings. Python supports regular expressions through the ‘re’ module of the standard library, which ships with every Python installation.

Here, we will learn about the main functions used to handle regular expressions. Many characters have a special meaning when they are used in a regular expression. Regular expressions are also heavily used in UNIX tools such as grep and sed.

Raw Strings In Python

It is recommended to use raw strings instead of regular strings for patterns. A raw string begins with the prefix ‘r’, for example r"\d+". Python does not process backslash escapes inside a raw string, so backslashes and special meta-characters are passed through to the regular expression engine unchanged.
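A minimal sketch of why the raw-string prefix matters, using only the standard ‘re’ module. The sample text and variable names are illustrative assumptions, not part of the original tutorial.

```python
import re

text = "cat concat cat"  # illustrative sample text

# Regular string: Python turns "\b" into a single backspace character
# before the regex engine sees it, so this pattern finds nothing here.
plain_pattern = "\bcat\b"
# Raw string: the backslash is kept, so the engine receives the
# word-boundary token \b.
raw_pattern = r"\bcat\b"

plain_hits = re.findall(plain_pattern, text)
raw_hits = re.findall(raw_pattern, text)

print(len(plain_pattern), len(raw_pattern))  # the raw pattern is two characters longer
print(plain_hits)
print(raw_hits)
```

Only the raw-string version finds the standalone word, because only there does the backslash reach the engine intact.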

match Function

The re.match() function tests whether a regular expression matches at the beginning of a string. It returns None if the pattern does not match; otherwise it returns a match object containing information about which part of the string was matched.

Syntax:

re.match(pattern, string, flags=0)

Here, all the parts are explained below:

match(): the function being called.
pattern: the regular expression, which uses meta-characters to describe what strings can be matched.
string: the text to test; the pattern is matched only at the beginning of this string.
flags: optional modifiers; several flags can be combined using the bitwise OR operator ‘|’.

Example:

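#!/usr/bin/python

import re

# simple structure of re.match()
pattern = r"\d+"          # example pattern: one or more digits
input_str = "2024 logs"   # example input string
matchObject = re.match(pattern, input_str, flags=0)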
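The flags parameter described above can be sketched as follows. This is a small illustration, assuming a made-up input string; re.IGNORECASE and re.DOTALL are standard flags of the ‘re’ module, combined here with the bitwise OR operator.

```python
import re

input_str = "Hello\nworld"  # illustrative input containing a newline

# Without flags, matching is case-sensitive, so "hello" does not match "Hello".
no_flags = re.match(r"hello", input_str)

# re.IGNORECASE makes matching case-insensitive; re.DOTALL lets "." match a newline.
# The two flags are combined with the bitwise OR operator "|".
with_flags = re.match(r"hello.world", input_str, flags=re.IGNORECASE | re.DOTALL)

print(no_flags)                    # None
print(repr(with_flags.group(0)))
```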

A program using re.match:

Example:

#!/usr/bin/python

import re

words = ["mouse", "cat", "dog", "no-match"]
# Loop starts here
for element in words:
    # Match a word that starts with 'd' followed by one or more word characters
    m = re.match(r"(d\w+)", element)
    # Check for matching
    if m:
        print(m.groups())

In the above example, the pattern uses meta-characters to describe what strings it can match. Here ‘\w’ means a word character and the ‘+’ (plus) symbol denotes one or more.

Most of the power of regular expressions comes from the patterns themselves.
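A short sketch of the information a match object carries, assuming an illustrative input string. The methods group(), groups() and span() belong to the standard match object.

```python
import re

# Illustrative input: the pattern captures two words separated by one space.
m = re.match(r"(\w+) (\w+)", "regular expressions in Python")

if m:
    whole = m.group(0)   # the full matched text
    first = m.group(1)   # text captured by the first pair of parentheses
    both = m.groups()    # all captured groups as a tuple
    where = m.span()     # (start, end) positions of the match in the string
    print(whole, first, both, where)
```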

search Function

It works differently from match. Both functions take a pattern, but ‘search’ attempts the match at every possible starting point in the string: it scans through the input string and returns the first location where the pattern matches.

Syntax:

re.search(pattern, string, flags=0)

Program to show how it is used:

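#!/usr/bin/python
import re

value = "cyberdyne"
# search() finds the pattern anywhere in the string
g = re.search(r"(dy.*)", value)
if g:
    print("search:", g.group(1))
# match() only tries the beginning of the string, so this finds nothing
s = re.match(r"(vi.*)", value)
if s:
    print("match:", s.group(1))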

Output:

search: dyne
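To make the difference between the two functions concrete, here is a minimal sketch that runs the same pattern through both. The variable names are illustrative.

```python
import re

value = "cyberdyne"

# match() only tries position 0, so a pattern that occurs later returns None.
at_start = re.match(r"dyne", value)
# search() scans forward and reports the first position where the pattern fits.
anywhere = re.search(r"dyne", value)
# Anchoring the search pattern with ^ restricts it to the start of the string again.
anchored = re.search(r"^dyne", value)

print(at_start)          # None
print(anywhere.span())
print(anchored)          # None
```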

split Function

The re.split() function accepts a pattern that specifies the delimiter. Using it, we can match a pattern and split text data wherever it occurs. A ‘split()’ method is also available directly on strings, but it accepts only a literal separator, not a regular expression.

Program to show how to use split():

Example:

#!/usr/bin/python
import re

value = "two 2  four 4  six 6"
# separate on runs of non-digit characters
res = re.split(r"\D+", value)
# print the result
for element in res:
    print(element)

Output:

2
4
6

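In the above program, \D+ represents one or more non-digit characters. Because the string starts with non-digit characters, the first element of the result is an empty string, which prints as a blank line before the digits.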
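The contrast between the string method and re.split() can be sketched as below. This is an illustration on the same input; the variable names are assumptions made for the example.

```python
import re

value = "two 2  four 4  six 6"

# str.split() takes a literal separator (here two spaces), not a pattern.
literal_parts = value.split("  ")
# re.split() takes a pattern; \D+ treats every run of non-digits as a delimiter.
pattern_parts = re.split(r"\D+", value)
# Drop the empty string produced by the delimiter at the start of the input.
digits = [part for part in pattern_parts if part]

print(literal_parts)
print(pattern_parts)
print(digits)
```

Filtering out empty strings is one simple way to handle input that begins or ends with a delimiter.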
