Python re module (regular expressions)
2020-12-13 03:10
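Introduction to regular expressions (RE)
re is Python's module for working with regular expressions. Its main functions are:
1. compile()
2. findall(pattern, string, flags=0)
3. match(pattern, string, flags=0): matches from the start of the string. It returns a match object on success and None on failure. The flags argument takes the module's flag constants, for example re.MULTILINE.
4. search(pattern, string, flags=0): scans the whole string for the first match and returns None if nothing matches.
5. sub(pattern, repl, string, count=0, flags=0): replaces the matched portions of the string.
6. split(pattern, string, maxsplit=0, flags=0)
7. group() and groups(): the two main methods of a match object. group() returns the whole match, or one specific subgroup when given its number; if the pattern has no subgroups, it returns the whole match. groups() returns a tuple containing the single subgroup or all subgroups; if the pattern has no subgroups, it returns an empty tuple.
Original article: https://www.cnblogs.com/51try-again/p/10255210.html
r"""Support for regular expressions (RE).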

This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P
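The difference between the greedy and non-greedy quantifiers listed above is easiest to see side by side. The following is a minimal sketch; the sample strings are made up for illustration and do not come from the original article.

```python
import re

text = "<a><b>"  # hypothetical sample input

# Greedy: ".*" takes as many characters as it can, so the match
# runs from the first "<" to the last ">".
greedy = re.search(r"<.*>", text).group()

# Non-greedy: ".*?" takes as few characters as it can, so the match
# stops at the first ">".
lazy = re.search(r"<.*?>", text).group()

# The same idea applies to {m,n} and {m,n}?.
most = re.search(r"a{2,3}", "aaaa").group()
least = re.search(r"a{2,3}?", "aaaa").group()

print(greedy)  # <a><b>
print(lazy)    # <a>
print(most)    # aaa
print(least)   # aa
```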
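Although using regular expressions in Python takes several steps, each step is fairly simple.
1. Import the regular expression module with import re.
2. Create a Regex object with the re.compile() function (remember to use a raw string).
3. Pass the string you want to search to the Regex object's search() method. It returns a Match object, or None if nothing matches.
4. Call the Match object's group() method to get the string of actual matched text.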
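The four steps above can be sketched as follows. The variable names and the sample sentence are assumptions chosen for illustration; only the phone-number pattern appears in the article.

```python
# Step 1: import the regular expression module.
import re

# Step 2: compile the pattern (written as a raw string) into a Regex object.
phone_regex = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

# Step 3: search() returns a Match object, or None if nothing matches.
mo = phone_regex.search("My number is 415-555-4242.")  # hypothetical input

# Step 4: group() returns the text that actually matched.
if mo is not None:
    print(mo.group())  # 415-555-4242
```

Checking for None before calling group() matters in practice, because calling a method on a failed search raises AttributeError.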
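Passing raw strings to re.compile()
Python uses the backslash (\) as its escape character. The string '\n' represents a single newline character,
not a backslash followed by a lowercase n. You have to type the escape sequence \\ to get a single backslash,
so '\\n' represents a backslash followed by a lowercase n. Putting an r before the opening quote, however,
marks the string as a raw string, in which backslashes are not treated as escape characters.
Because regular expressions use backslashes so often, it is convenient to pass raw strings to re.compile()
rather than typing extra backslashes.
Typing r'\d\d\d-\d\d\d-\d\d\d\d'
is much easier than typing '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'.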
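A quick way to confirm that the raw and the escaped spellings describe the same pattern is to compare them directly. This is a small sketch; the variable names and the sample phone number are illustrative assumptions.

```python
import re

raw = r"\d\d\d-\d\d\d-\d\d\d\d"
escaped = "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d"

# Both literals produce exactly the same string object contents.
same_pattern = raw == escaped

# '\n' is one character (a newline); '\\n' and r'\n' are two
# characters (a backslash and the letter n).
lengths = (len("\n"), len("\\n"), len(r"\n"))

# Either spelling compiles to a pattern that matches a phone number.
found = re.compile(raw).search("415-555-4242").group()

print(same_pattern)  # True
print(lengths)       # (1, 2, 2)
print(found)         # 415-555-4242
```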
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.
    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
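The docstring's rule about capturing groups changes the shape of the list that findall() returns. The sketch below shows the three cases; the sample text is made up for illustration.

```python
import re

text = "Cell: 415-555-9999 Work: 212-555-0000"  # hypothetical input

# No groups: a list of the full matched strings.
no_groups = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# One group: a list of that group's text only.
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)

# Two or more groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```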
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
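Both match() and search() return a match object, whose group() and groups() methods give access to the matched text. The following is a minimal sketch of how the two methods differ; the pattern and input are assumptions for illustration.

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "call 555-4242 now")  # hypothetical input

whole = m.group()    # the entire match
first = m.group(1)   # the first subgroup only
both = m.groups()    # a tuple of all subgroups

# A pattern without subgroups still has a whole match,
# but groups() returns an empty tuple.
plain = re.search(r"\d+", "abc 123")
plain_whole = plain.group()
plain_groups = plain.groups()

print(whole, first, both)         # 555-4242 555 ('555', '4242')
print(plain_whole, plain_groups)  # 123 ()
```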
search() vs. match()
# Python offers two different primitive operations based on regular expressions:
# re.match() checks for a match only at the beginning of the string,
# while re.search() checks for a match anywhere in the string (this is what Perl does by default).
>>> re.match("c", "abcdef")    # No match
>>> re.search("c", "abcdef")   # Match
<re.Match object; span=(2, 3), match='c'>
# Regular expressions beginning with '^' can be used with search() to restrict the match to the beginning of the string:
# re.match('str', "string") is equivalent to re.search('^str', "string")
>>> re.match("c", "abcdef")    # No match
>>> re.search("^c", "abcdef")  # No match
>>> re.search("^a", "abcdef")  # Match
<re.Match object; span=(0, 1), match='a'>
# Note however that in MULTILINE mode match() only matches at the beginning of the string,
# whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.
# In other words, MULTILINE does not change where match() looks, while search() with a '^' pattern tries the start of every line.
>>> re.match('X', 'A\nB\nX', re.MULTILINE)    # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<re.Match object; span=(4, 5), match='X'>
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
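The docstring mentions two kinds of repl, a string and a callable, plus the count limit. The sketch below exercises all three; the sample text and the helper next_year are hypothetical names introduced only for this illustration.

```python
import re

text = "2020-12-13 and 2019-01-10"  # hypothetical input

# String repl: backslash escapes such as \1 refer to captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Callable repl: it receives the match object and must return a string.
def next_year(match):
    return str(int(match.group(0)) + 1)

bumped = re.sub(r"\d{4}", next_year, text)

# count=1 replaces only the leftmost occurrence.
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)

print(swapped)     # 13/12/2020 and 10/01/2019
print(bumped)      # 2021-12-13 and 2020-01-10
print(first_only)  # YYYY-12-13 and 2019-01-10
```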
Splitting a string by regular expression matches
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings. If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list. If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)
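The effect of capturing parentheses and of maxsplit is easier to see on a concrete string. This is a minimal sketch with a made-up input.

```python
import re

text = "a, b;c"  # hypothetical input

# Plain split: the separators are dropped.
plain = re.split(r"[,;]\s*", text)

# Capturing parentheses: the captured separator text is kept in the result.
with_seps = re.split(r"([,;])\s*", text)

# maxsplit=1: only one split occurs; the rest comes back as the last element.
limited = re.split(r"[,;]\s*", text, maxsplit=1)

print(plain)      # ['a', 'b', 'c']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
print(limited)    # ['a', 'b;c']
```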
Article link: http://soscw.com/essay/27127.html