Literal strings
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "apple";
            String input = "applet";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            // m.group() returns the matched text; it holds the same characters as
            // "applet".subSequence(m.start(), m.end()).
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [apple] starting at 0 and ending at 4
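The comment in the listing says that m.group() holds the same characters as input.subSequence(m.start(), m.end()). A minimal sketch that checks this claim for every match follows; the class and method names (GroupSliceCheck, groupEqualsSlice) are invented for illustration and are not part of the original demo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSliceCheck {
    // Returns true when at least one match exists and, for every match,
    // group() has the same characters as input.subSequence(start, end).
    static boolean groupEqualsSlice(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean sawMatch = false;
        while (m.find()) {
            sawMatch = true;
            CharSequence slice = input.subSequence(m.start(), m.end());
            if (!m.group().contentEquals(slice)) {
                return false;
            }
        }
        return sawMatch;
    }

    public static void main(String[] args) {
        System.out.println(groupEqualsSlice("apple", "applet")); // prints true
    }
}
```

Note that m.end() is exclusive, which is why the demo prints m.end() - 1 as the index of the last matched character.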
Metacharacters
The period (.) matches any single character (by default, any character except a line terminator).
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = ".ox";
            String input = "The quick brown fox jumps over the lazy ox.";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [fox] starting at 16 and ending at 18
Found [ ox] starting at 39 and ending at 41
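Because the period is a metacharacter, matching a literal period requires escaping it. The sketch below compares an unescaped period, an escaped period, and a period wrapped with Pattern.quote on the same input; the class and helper names (LiteralPeriodDemo, matchStarts) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralPeriodDemo {
    // Collects the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "The quick brown fox jumps over the lazy ox.";
        // Period as a metacharacter: matches "ox " and "ox."
        System.out.println(matchStarts("ox.", input));   // [17, 40]
        // Escaped period: matches only the literal "ox."
        System.out.println(matchStarts("ox\\.", input)); // [40]
        // Pattern.quote produces a literal pattern for its argument
        System.out.println(matchStarts("ox" + Pattern.quote("."), input)); // [40]
    }
}
```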
Character classes
Simple character class
[ ] denotes a set of characters; the class matches when the input character is any one of them.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [c] starting at 2 and ending at 2
Found [s] starting at 5 and ending at 5
Found [w] starting at 6 and ending at 6
Negation character class
A leading ^ negates the class: any character other than those listed matches.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[^csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [d] starting at 3 and ending at 3
Found [e] starting at 4 and ending at 4
Range character class
A hyphen (-) denotes a range: the characters on either side of the hyphen and every character between them.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[a-c]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2
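Union character class
A union nests one or more classes inside another; a character matches if it belongs to any of them.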
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[a-c[f-k]]";
            String input = "abcdeflm";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2
Found [f] starting at 5 and ending at 5
Intersection character class
Only characters that belong to the classes on both sides of && match.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[aeiouy&&[y]]";
            String input = "party";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [y] starting at 4 and ending at 4
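Subtraction character class
Subtraction is written as an intersection with a negated class: [a-f&&[^a-c]&&[^e]] matches a through f, minus a through c and minus e.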
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "[a-f&&[^a-c]&&[^e]]";
            String input = "abcdefg";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [d] starting at 3 and ending at 3
Found [f] starting at 5 and ending at 5
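The union, intersection, and subtraction forms can be compared side by side with a small helper that concatenates every matched character. This is only a sketch; CharClassOpsDemo and matchedChars are hypothetical names, and the last pattern (lowercase letters minus vowels) is an extra example not taken from the listings above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassOpsDemo {
    // Concatenates every match of regex in input into one string.
    static String matchedChars(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matchedChars("[a-c[f-k]]", "abcdeflm"));         // union: abcf
        System.out.println(matchedChars("[aeiouy&&[y]]", "party"));         // intersection: y
        System.out.println(matchedChars("[a-f&&[^a-c]&&[^e]]", "abcdefg")); // subtraction: df
        System.out.println(matchedChars("[a-z&&[^aeiou]]", "party"));       // subtraction: prty
    }
}
```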
Predefined character classes
\d: A digit. Equivalent to [0-9].
\D: A nondigit. Equivalent to [^0-9].
\s: A whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S: A nonwhitespace character. Equivalent to [^\s].
\w: A word character. Equivalent to [a-zA-Z_0-9].
\W: A nonword character. Equivalent to [^\w].
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "\\w";
            String input = "aZ.8 _";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [a] starting at 0 and ending at 0
Found [Z] starting at 1 and ending at 1
Found [8] starting at 3 and ending at 3
Found [_] starting at 5 and ending at 5
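The other predefined classes behave analogously. As a rough illustration, the sketch below extracts digit runs with \d+, finds the nonword characters in the same input used above with \W, and splits on whitespace with \s+. PredefinedClassDemo, findAll, and the sample strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Returns the text of every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Backslashes are doubled because the regex is written as a Java string literal.
        System.out.println(findAll("\\d+", "Order 66 shipped in 2024")); // [66, 2024]
        System.out.println(findAll("\\W", "aZ.8 _"));                    // the period and the space
        System.out.println(Arrays.asList("a b\tc".split("\\s+")));       // [a, b, c]
    }
}
```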
Line terminators
Pattern's SDK documentation refers to the period metacharacter as a predefined character class that matches any character except for a line terminator (a one- or two-character sequence identifying the end of a text line). The period matches line terminators only when dotall mode (discussed later) is in effect. Pattern recognizes the following line terminators:
The carriage-return character (\r)
The new-line (line feed) character (\n)
The carriage-return character immediately followed by the new-line character (\r\n)
The next-line character (\u0085)
The line-separator character (\u2028)
The paragraph-separator character (\u2029)
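To see the list above in action, the following sketch counts how many characters the period matches in a string that contains each of the listed terminators, first in the default mode and then with the (?s) dotall flag. The class name, helper name, and sample input are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // Counts the matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Six letters separated by \n, \r\n, \u0085, \u2028, and \u2029 (12 chars in total).
        String input = "a\nb\r\nc\u0085d\u2028e\u2029f";
        System.out.println(countMatches(".", input));     // 6: only the letters
        System.out.println(countMatches("(?s).", input)); // 12: every character
    }
}
```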
Capturing groups
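Parentheses ( ) treat their contents as a single unit: all the characters inside must match, in the same order. Each group also captures the text it matched, and groups are numbered by their opening parenthesis from left to right.
\2 is a back reference: it matches the same text that capturing group 2 matched.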
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "(Java( language)\\2)";
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [Java language language] starting at 4 and ending at 25
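The demo prints only the whole match. The captured text of each individual group can be read with m.group(int) and m.groupCount(), as in this sketch; GroupDemo and groupsOfFirstMatch are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 0..groupCount() of the first match, or an empty list if there is none.
    // Group 0 is the whole match; groups 1 and up are the capturing groups.
    static List<String> groupsOfFirstMatch(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = groupsOfFirstMatch("(Java( language)\\2)", "The Java language language");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + i + " = [" + groups.get(i) + "]");
        }
    }
}
```

Here group 1 is the outer parenthesized expression and group 2 is the inner one, so \2 repeats the text " language".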
Boundary matchers
^: The beginning of a line
$: The end of a line
\b: A word boundary
\B: A non-word boundary
\A: The beginning of the text
\G: The end of the previous match
\Z: The end of the text, except for the final line terminator (if any)
\z: The end of the text
The regex ^The\w* (written "^The\\w*" as a Java string literal) requires the line to begin with The.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "^The\\w*";
            // String input = " The Java language language"; // no match
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [The] starting at 0 and ending at 2
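The remaining boundary matchers can be explored the same way. The sketch below contrasts \b with \B and \Z with \z; the class name, helper names, and sample inputs are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Reports whether regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \b: "is" must stand alone as a word, so only the one at index 5 matches.
        System.out.println(matchStarts("\\bis\\b", "This is it")); // [5]
        // \B: "is" must not start at a word boundary, so only the one inside "This" matches.
        System.out.println(matchStarts("\\Bis", "This is it"));    // [2]
        // \Z tolerates a final line terminator; \z does not.
        System.out.println(found("abc\\Z", "abc\n")); // true
        System.out.println(found("abc\\z", "abc\n")); // false
    }
}
```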
Zero-length matches
package com.sheting.basic.regex;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "\\b\\b";
            String input = "Java is";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            System.err.println(Arrays.asList("Java is".split(regex)));
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
[Java, , is]
Found [] starting at 0 and ending at -1
Found [] starting at 4 and ending at 3
Found [] starting at 5 and ending at 4
Found [] starting at 7 and ending at 6
Quantifiers
Quantifiers are categorized as greedy, reluctant, or possessive:
A greedy quantifier (?, *, or +) attempts to find the longest match. Specify X? to find one or no occurrences of X, X* to find zero or more occurrences of X, X+ to find one or more occurrences of X, X{n} to find n occurrences of X, X{n,} to find at least n (and possibly more) occurrences of X, and X{n,m} to find at least n but no more than m occurrences of X.
A reluctant quantifier (??, *?, or +?) attempts to find the shortest match. Specify X?? to find one or no occurrences of X, X*? to find zero or more occurrences of X, X+? to find one or more occurrences of X, X{n}? to find n occurrences of X, X{n,}? to find at least n (and possibly more) occurrences of X, and X{n,m}? to find at least n but no more than m occurrences of X.
A possessive quantifier (?+, *+, or ++) is similar to a greedy quantifier except that a possessive quantifier only makes one attempt to find the longest match, whereas a greedy quantifier can make multiple attempts. Specify X?+ to find one or no occurrences of X, X*+ to find zero or more occurrences of X, X++ to find one or more occurrences of X, X{n}+ to find n occurrences of X, X{n,}+ to find at least n (and possibly more) occurrences of X, and X{n,m}+ to find at least n but no more than m occurrences of X.
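greedy
The match extends through the last ox in the input.
regex = .*ox
input = fox box pox
Found [fox box pox] starting at 0 and ending at 10
reluctant
Each match ends at the first ox encountered.
regex = .*?ox
input = fox box pox
Found [fox] starting at 0 and ending at 2
Found [ box] starting at 3 and ending at 6
Found [ pox] starting at 7 and ending at 10
possessive
regex = .*+ox
input = fox box pox
No match is found: .*+ consumes the entire input and never backtracks, so nothing is left for ox to match.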
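The three quantifier behaviors can be reproduced with one small program. This is a sketch only; QuantifierDemo and findAll are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text of every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "fox box pox";
        // Greedy: .* grabs everything, then backtracks just far enough to match the last "ox".
        System.out.println(findAll(".*ox", input));  // [fox box pox]
        // Reluctant: .*? expands one character at a time and stops at the first "ox".
        System.out.println(findAll(".*?ox", input)); // [fox,  box,  pox]
        // Possessive: .*+ grabs everything and never gives anything back.
        System.out.println(findAll(".*+ox", input)); // []
    }
}
```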
Zero-length matches
regex = a?
input = abaa
Found [a] starting at 0 and ending at 0
Found [] starting at 1 and ending at 0
Found [a] starting at 2 and ending at 2
Found [a] starting at 3 and ending at 3
Found [] starting at 4 and ending at 3
Embedded flag expressions
(?i): enables case-insensitive pattern matching.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "(?i)tree";
            String input = "Treehouse";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [Tree] starting at 0 and ending at 3
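An embedded flag expression has a counterpart in the flags argument of Pattern.compile(String, int). The sketch below, with hypothetical names CaseInsensitiveDemo and firstMatch, compares (?i) with Pattern.CASE_INSENSITIVE and with no flag at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Returns the first match of p in input, or null when nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        Pattern embedded = Pattern.compile("(?i)tree");
        Pattern viaFlag = Pattern.compile("tree", Pattern.CASE_INSENSITIVE);
        Pattern plain = Pattern.compile("tree");

        System.out.println(firstMatch(embedded, "Treehouse")); // Tree
        System.out.println(firstMatch(viaFlag, "Treehouse"));  // Tree
        System.out.println(firstMatch(plain, "Treehouse"));    // null
    }
}
```

The embedded form is handy when the regex arrives as a plain string (for example from a command line), while the flag constant keeps the pattern text itself unchanged.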
(?x): permits whitespace and comments beginning with the # metacharacter to appear in a pattern. A matcher ignores both.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = ".at(?x)#match hat, cat, and so on";
            String input = "matter";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [mat] starting at 0 and ending at 2
(?s): enables dotall mode in which the period metacharacter matches line terminators in addition to any other character.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "(?s).";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [
] starting at 0 and ending at 0
The following test, which omits (?s), finds no match:
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = ".";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input); // no match
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
(?m): enables multiline mode in which ^ matches the beginning of every line and $ matches the end of every line.
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        try {
            String regex = "(?m)^abc$";
            String input = "abc\nabc";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}
output
Found [abc] starting at 0 and ending at 2
Found [abc] starting at 4 and ending at 6
Note: without (?m), the regex ^abc$ finds no matches in the input abc\nabc.
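The effect of multiline mode can be checked by counting matches with and without it. In this sketch the mode is enabled both through the embedded (?m) expression and through the Pattern.MULTILINE constant; MultilineDemo and countMatches are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Counts the matches of p in input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "abc\nabc";
        // Default mode: ^ and $ anchor to the whole input, so neither line matches on its own.
        System.out.println(countMatches(Pattern.compile("^abc$"), input));                    // 0
        // Multiline mode via the embedded flag expression.
        System.out.println(countMatches(Pattern.compile("(?m)^abc$"), input));                // 2
        // Multiline mode via the flag constant.
        System.out.println(countMatches(Pattern.compile("^abc$", Pattern.MULTILINE), input)); // 2
    }
}
```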
(?u): enables Unicode-aware case folding. This flag works with (?i) to perform case-insensitive matching in a manner consistent with the Unicode Standard. The default setting is case-insensitive matching that assumes only characters in the US-ASCII character set match.
(?d): enables Unix lines mode in which a matcher recognizes only the \n line terminator in the context of the ., ^, and $ metacharacters. Non-Unix lines mode is the default: a matcher recognizes all terminators in the context of the aforementioned metacharacters.
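Neither (?u) nor (?d) comes with a listing, so here is a minimal sketch of both. It assumes the accented letters é (\u00e9) and É (\u00c9) as a non-ASCII test pair and a lone carriage return as a non-Unix line terminator; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeAndUnixLinesDemo {
    // Reports whether regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?i) alone folds case for US-ASCII letters only, so é does not match É.
        System.out.println(found("(?i)\u00e9", "\u00c9"));  // false
        // Adding (?u) makes the case folding Unicode-aware.
        System.out.println(found("(?iu)\u00e9", "\u00c9")); // true

        // By default \r is a line terminator, so the period does not match it.
        System.out.println(found(".", "\r"));     // false
        // In Unix lines mode only \n is a line terminator.
        System.out.println(found("(?d).", "\r")); // true
        System.out.println(found("(?d).", "\n")); // false
    }
}
```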