Python Regular Expressions
Regular expressions are a useful technique for extracting information from text such as code, spreadsheets, documents, or log files. The first thing to keep in mind when working with regular expressions is that everything is treated as a character: programmers write patterns that match specific sequences of characters (strings).
Defining Regular Expression
A regular expression is a sequence of characters, written in a specialized syntax, that defines a pattern for finding other strings or sets of strings. Python supports regular expressions through the ‘re’ module of its standard library, which ships with every Python installation.
Here, we will learn about the main functions used to handle regular expressions. Many characters have a special meaning when they appear in a regular expression. Similar pattern syntax is widely used by UNIX tools such as grep and sed.
Raw Strings In Python
It is recommended to write patterns as raw strings rather than regular strings. A raw string begins with the prefix ‘r’. Inside it, Python does not process backslashes as escape sequences, so backslashes and special meta-characters pass through to the regular-expression engine unchanged.
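The difference can be seen in a minimal sketch. The sample text and variable names below are illustrative assumptions, not part of the original tutorial; ‘\b’ is used because Python treats it as a backspace character in a regular string, while the regular-expression engine treats it as a word boundary.

```python
import re

# In a regular string, "\b" is converted by Python into one backspace character.
regular = "\bcat\b"
# In a raw string, r"\b" stays as a backslash followed by "b",
# which the regex engine reads as a word boundary.
raw = r"\bcat\b"

text = "a cat sat"  # illustrative sample text

regular_hit = re.search(regular, text)  # looks for literal backspace characters
raw_hit = re.search(raw, text)          # looks for the whole word "cat"

print(len(regular), len(raw))  # 5 7
print(regular_hit)             # None
print(raw_hit.group())         # cat
```

Only the raw-string pattern finds the word, because the regular string no longer contains the backslashes by the time the ‘re’ module sees it.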
match Function
The re.match() function tests whether a regular expression matches at the beginning of a string. It returns None if the pattern does not match; otherwise it returns a match object with additional information about which part of the string was matched.
Syntax:
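re.match(pattern, string, flags=0)
Here, ‘pattern’ is the regular expression to apply, ‘string’ is the text to test, and ‘flags’ is an optional value (such as re.IGNORECASE) that modifies how the pattern is matched.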
Example:
#!/usr/bin/python
import re
# Simple structure of re.match(); the pattern and input below are sample values
pattern = r"\w+"
input_str = "hello world"
matchObject = re.match(pattern, input_str, flags=0)
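As a rough illustration of the two possible return values, the sketch below runs re.match() on one sample string. The text and variable names are assumptions made for this example; group() and span() are standard methods of the match object.

```python
import re

text = "Hello world"  # illustrative sample text

# re.match() only tries the pattern at the start of the string.
hit = re.match(r"hello", text, flags=re.IGNORECASE)  # matches "Hello"
miss = re.match(r"world", text)                      # "world" is not at the start

if hit:
    # group() is the matched text; span() is its (start, end) position.
    print(hit.group(), hit.span())  # Hello (0, 5)

print(miss)  # None
```

Because a failed match returns None, checking the result with ‘if’ before calling group() avoids an AttributeError.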
A program using re.match():
Example:
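#!/usr/bin/python
import re

# "dog days" is a sample entry added so that one string matches the pattern
words = ["mouse", "cat", "dog days", "no-match"]

# Loop starts here
for element in words:
    # Two words starting with 'd', separated by one non-word character
    m = re.match(r"(d\w+)\W(d\w+)", element)
    # Check for matching
    if m:
        print(m.groups())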
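In the above example, the pattern uses meta-characters to describe which strings it can match. Here ‘\w’ means a word character, ‘\W’ means a non-word character, and the + (plus) symbol means one or more of the preceding item. The parentheses mark groups, which m.groups() returns as a tuple.
Most of the control that regular expressions offer comes from how the “pattern” is written.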
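To see what each meta-character picks out, here is a small sketch. It uses re.findall(), a standard ‘re’ function that this tutorial does not otherwise cover, and the sample string is an assumption chosen for illustration.

```python
import re

sample = "dog-days 2024"  # illustrative sample text

# \w+ : one or more word characters (letters, digits, underscore)
word_runs = re.findall(r"\w+", sample)
# \W  : a single non-word character
non_word = re.findall(r"\W", sample)
# \d+ : one or more digits
digit_runs = re.findall(r"\d+", sample)

print(word_runs)   # ['dog', 'days', '2024']
print(non_word)    # ['-', ' ']
print(digit_runs)  # ['2024']
```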
search Function
It works differently from match. Both functions take a pattern, but ‘search’ tries the pattern at every possible starting point in the string: it scans through the input and returns the first location where the pattern matches.
Syntax:
re.search(pattern, string, flags=0)
Program to show how it is used:
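#!/usr/bin/python
import re

value = "cyberdyne"

# search() finds the pattern anywhere in the string
g = re.search(r"(dy.*)", value)
if g:
    print("search:", g.group(1))

# match() only looks at the start of the string, so this pattern does not match
s = re.match(r"(vi.*)", value)
if s:
    print("match:", s.group(1))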
Output:
search: dyne
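A side-by-side comparison may make the difference clearer. The sketch below applies the same two patterns with both functions; the patterns and the ‘results’ dictionary are assumptions introduced for this illustration.

```python
import re

value = "cyberdyne"
results = {}

for pattern in (r"cy\w*", r"dy\w*"):
    at_start = re.match(pattern, value)   # tries position 0 only
    anywhere = re.search(pattern, value)  # tries every position
    results[pattern] = (
        at_start.group() if at_start else None,
        anywhere.group() if anywhere else None,
    )
    print(pattern, results[pattern])

# cy\w* ('cyberdyne', 'cyberdyne')
# dy\w* (None, 'dyne')
```

Both functions agree when the pattern matches at the start of the string; only ‘search’ succeeds when the match begins later.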
split Function
The re.split() function accepts a pattern that specifies the delimiter. Using it, we can match a pattern and split text data at each match. A ‘split()’ method is also available directly on strings, but it splits on a fixed delimiter and does not handle regular expressions.
Program to show how to use split():
Example:
#!/usr/bin/python
import re

value = "two 2 four 4 six 6"

# Separate on runs of non-digit characters
res = re.split(r"\D+", value)

# Print the result, skipping the empty string produced before the first delimiter
for element in res:
    if element:
        print(element)
Output:
2
4
6
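In the above program, \D+ represents one or more non-digit characters. Because the string starts with non-digit characters, the list returned by re.split() begins with an empty string.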
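The contrast between the string method and re.split() can be sketched as follows. The variable names are assumptions for this example; the input string is the one used in the program above.

```python
import re

value = "two 2 four 4 six 6"

# str.split() uses a fixed delimiter and knows nothing about patterns.
plain = value.split(" ")

# re.split() treats every run of non-digit characters as a delimiter.
by_regex = re.split(r"\D+", value)

# Drop the empty string that appears because the text starts with a delimiter.
digits = [part for part in by_regex if part]

print(plain)     # ['two', '2', 'four', '4', 'six', '6']
print(by_regex)  # ['', '2', '4', '6']
print(digits)    # ['2', '4', '6']
```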