Literal strings

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "apple";
            String input = "applet";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            // m.group() returns the matched text; it is equivalent to "applet".subSequence(m.start(), m.end()).
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [apple] starting at 0 and ending at 4

Metacharacters

. matches any single character (by default, any character except a line terminator).

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".ox";
            String input = "The quick brown fox jumps over the lazy ox.";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [fox] starting at 16 and ending at 18
Found [ ox] starting at 39 and ending at 41

Character classes

Simple character class

[ ] defines a set of characters; the class matches if the input character is any one member of the set.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [c] starting at 2 and ending at 2
Found [s] starting at 5 and ending at 5
Found [w] starting at 6 and ending at 6

Negation character class

^ at the start of a class negates it: every character except the listed ones matches.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[^csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [d] starting at 3 and ending at 3
Found [e] starting at 4 and ending at 4

Range character class

- specifies a range: the class matches the characters on both sides of the hyphen and every character between them.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-c]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2

Union character class

Nesting one class inside another forms a union: a character matches if it belongs to either set.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-c[f-k]]";
            String input = "abcdeflm";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2
Found [f] starting at 5 and ending at 5

Intersection (&&) character class

&& forms an intersection: only characters that belong to the classes on both sides match.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[aeiouy&&[y]]";
            String input = "party";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [y] starting at 4 and ending at 4
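
Subtraction character class

There is no dedicated subtraction operator; a set is subtracted by intersecting with its negation. [a-f&&[^a-c]&&[^e]] matches a through f, minus a through c, minus e.
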
package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-f&&[^a-c]&&[^e]]";
            String input = "abcdefg";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [d] starting at 3 and ending at 3
Found [f] starting at 5 and ending at 5

Predefined character classes

\d: A digit. Equivalent to [0-9].
\D: A nondigit. Equivalent to [^0-9].
\s: A whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S: A nonwhitespace character. Equivalent to [^\s].
\w: A word character. Equivalent to [a-zA-Z_0-9].
\W: A nonword character. Equivalent to [^\w].

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "\\w";
            String input = "aZ.8 _";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [Z] starting at 1 and ending at 1
Found [8] starting at 3 and ending at 3
Found [_] starting at 5 and ending at 5
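
The other predefined classes can be tried on the same input. The following is a minimal sketch; the class name PredefinedClassDemo and the helper findAll are illustrative names, not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aZ.8 _";
        System.out.println(findAll("\\d", input)); // digits only: the 8
        System.out.println(findAll("\\s", input)); // whitespace only: the single space
        System.out.println(findAll("\\W", input)); // everything \w rejects: the period and the space
    }
}
```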

Line terminators
Pattern's SDK documentation refers to the period metacharacter as a predefined character class that matches any character except a line terminator (a one- or two-character sequence identifying the end of a text line). The period matches line terminators only when dotall mode (discussed later) is in effect. Pattern recognizes the following line terminators:

The carriage-return character (\r)
The new-line (line feed) character (\n)
The carriage-return character immediately followed by the new-line character (\r\n)
The next-line character (\u0085)
The line-separator character (\u2028)
The paragraph-separator character (\u2029)
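
The list above can be checked directly. This sketch tests each single-character terminator against the period, first in the default mode and then with the Pattern.DOTALL compile flag; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // True if the pattern "." matches the whole of s in the default mode.
    static boolean dotMatches(String s) {
        return Pattern.matches(".", s);
    }

    // Same check with dotall mode switched on through Pattern.DOTALL.
    static boolean dotallMatches(String s) {
        return Pattern.compile(".", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        // The two-character sequence \r\n is left out because "." matches one character at a time.
        String[] terminators = {"\r", "\n", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            System.out.printf("U+%04X  default: %b  dotall: %b%n",
                    (int) t.charAt(0), dotMatches(t), dotallMatches(t));
        }
    }
}
```

Each terminator is rejected in the default mode and accepted in dotall mode.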

Capturing groups

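() groups the enclosed characters into a single unit that must be matched in full and in order. The text it matches is captured, and groups are numbered by the position of their opening parenthesis.
\2 is a back reference: it matches the same text that capturing group 2 matched earlier. In the regex below, group 2 is ( language), so \2 requires a second " language".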

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(Java( language)\\2)";
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [Java language language] starting at 4 and ending at 25
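
To see what each group captured, the groups can be read back by number with Matcher.group(int) and Matcher.groupCount(). A minimal sketch, with GroupDemo and groups as illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by each numbered capturing group
    // of the first match, or an empty array if the regex does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(Java( language)\\2)", "The Java language language");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = [" + g[i] + "]");
        }
    }
}
```

Group 1 spans the whole match because its parenthesis opens first; group 2 holds only " language", which is the text the back reference \2 must repeat.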

Boundary matchers

^: The beginning of a line
$: The end of a line
\b: A word boundary
\B: A non-word boundary
\A: The beginning of the text
\G: The end of the previous match
\Z: The end of the text, except for the final line terminator (if any)
\z: The end of the text
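
The less familiar matchers in this list can be compared side by side. The sketch below uses two illustrative helpers (firstMatchStart and countMatches are not from the original article) to contrast \b with \B, \Z with \z, and to show how \G chains matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Start index of the first match, or -1 if there is none.
    static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    // Number of successive matches found by repeated find() calls.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \b requires a word boundary, so the "is" inside "This" is skipped.
        System.out.println(firstMatchStart("\\bis\\b", "This is")); // 5
        // \B requires the absence of a boundary, so only the "is" inside "This" qualifies.
        System.out.println(firstMatchStart("\\Bis", "This is"));    // 2
        // \Z tolerates one final line terminator; \z does not.
        System.out.println(firstMatchStart("abc\\Z", "abc\n"));     // 0
        System.out.println(firstMatchStart("abc\\z", "abc\n"));     // -1
        // \G anchors each match to the end of the previous one, so the chain stops at x.
        System.out.println(countMatches("\\Gab", "ababxab"));       // 2
    }
}
```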

The regex "^The\\w*" matches only when the line begins with The:

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "^The\\w*";
            // String input = " The Java language language"; // no match
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [The] starting at 0 and ending at 2
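
Zero-length matches

A zero-length match is an empty match: m.start() equals m.end(), so the "ending at" value printed below is one less than the start. Here \b\b matches at each word boundary without consuming any characters.
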
package com.sheting.basic.regex;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "\\b\\b";
            String input = "Java is";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            // Splitting on the same zero-length pattern cuts the input at each word boundary.
            System.out.println(Arrays.asList(input.split(regex)));
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

[Java,  , is]
Found [] starting at 0 and ending at -1
Found [] starting at 4 and ending at 3
Found [] starting at 5 and ending at 4
Found [] starting at 7 and ending at 6

Quantifiers

Quantifiers are categorized as greedy, reluctant, or possessive:
A greedy quantifier (?, *, or +) attempts to find the longest match.
Specify X? to find one or no occurrences of X,
X* to find zero or more occurrences of X,
X+ to find one or more occurrences of X,
X{n} to find exactly n occurrences of X,
X{n,} to find at least n (and possibly more) occurrences of X,
and X{n,m} to find at least n but no more than m occurrences of X.

A reluctant quantifier (??, *?, or +?) attempts to find the shortest match.
Specify X?? to find one or no occurrences of X,
X*? to find zero or more occurrences of X,
X+? to find one or more occurrences of X,
X{n}? to find n occurrences of X,
X{n,}? to find at least n (and possibly more) occurrences of X,
and X{n,m}? to find at least n but no more than m occurrences of X.

A possessive quantifier (?+, *+, or ++) is similar to a greedy quantifier, except that it makes only one attempt to find the longest match: once it has consumed its characters it never gives any back, whereas a greedy quantifier backtracks and retries with shorter matches.
Specify X?+ to find one or no occurrences of X,
X*+ to find zero or more occurrences of X,
X++ to find one or more occurrences of X,
X{n}+ to find n occurrences of X,
X{n,}+ to find at least n (and possibly more) occurrences of X,
and X{n,m}+ to find at least n but no more than m occurrences of X.
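
The counted forms {n}, {n,} and {n,m} are easiest to follow on runs of digits. A minimal sketch, assuming an illustrative helper findAll and a made-up input string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountedQuantifierDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "1 22 333 4444";
        System.out.println(findAll("\\d{3}", input));    // [333, 444]
        System.out.println(findAll("\\d{2,}", input));   // [22, 333, 4444]
        System.out.println(findAll("\\d{2,3}", input));  // greedy: [22, 333, 444]
        System.out.println(findAll("\\d{2,3}?", input)); // reluctant: [22, 33, 44, 44]
    }
}
```

The greedy {2,3} takes three digits wherever it can, while the reluctant {2,3}? stops at two, which is why 4444 splits into two matches instead of one match plus a leftover digit.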

greedy
.* first consumes the whole input, then backtracks until the last ox matches.

regex = .*ox
input = fox box pox
Found [fox box pox] starting at 0 and ending at 10

reluctant
.*? consumes as little as possible, so each match ends at the nearest ox.

regex = .*?ox
input = fox box pox
Found [fox] starting at 0 and ending at 2
Found [ box] starting at 3 and ending at 6
Found [ pox] starting at 7 and ending at 10

possessive

regex = .*+ox
input = fox box pox
No match: .*+ consumes the entire input and never backtracks, so nothing is left for ox to match.
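
The possessive case prints nothing, which is easy to misread as a mistake. The sketch below runs the greedy and possessive patterns on the same input, then adds a third pattern, [^o]*+ox, chosen here only to show a possessive quantifier that does succeed; PossessiveDemo and firstMatch are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Returns the text of the first match, or null if the regex does not match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "fox box pox";
        // Greedy: .* takes everything, then gives back two characters so ox can match.
        System.out.println(firstMatch(".*ox", input));     // fox box pox
        // Possessive: .*+ takes everything and gives nothing back, so ox can never match.
        System.out.println(firstMatch(".*+ox", input));    // null
        // Possessive works when the quantified part cannot swallow the tail it must leave behind.
        System.out.println(firstMatch("[^o]*+ox", input)); // fox
    }
}
```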
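
Zero-length matches

A quantifier that allows zero occurrences can match the empty string, as a? does at indexes 1 and 4 here: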
regex = a?
input = abaa
Found [a] starting at 0 and ending at 0
Found [] starting at 1 and ending at 0
Found [a] starting at 2 and ending at 2
Found [a] starting at 3 and ending at 3
Found [] starting at 4 and ending at 3

Embedded flag expressions

(?i): enables case-insensitive pattern matching.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?i)tree";
            String input = "Treehouse";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [Tree] starting at 0 and ending at 3

(?x): permits whitespace and comments beginning with the # metacharacter to appear in a pattern. A matcher ignores both.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".at(?x)#match hat, cat, and so on";
            String input = "matter";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [mat] starting at 0 and ending at 2

(?s): enables dotall mode in which the period metacharacter matches line terminators in addition to any other character.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?s).";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [
] starting at 0 and ending at 0

Without (?s), the same input produces no match:

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input); // NO Match
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

(?m): enables multiline mode in which ^ matches the beginning of every line and $ matches the end of every line.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?m)^abc$";
            String input = "abc\nabc";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [abc] starting at 0 and ending at 2
Found [abc] starting at 4 and ending at 6

Note: without (?m), the regex "^abc$" finds no match in "abc\nabc".
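
The same modes can also be requested at compile time instead of inside the regex. The sketch below uses the Pattern.MULTILINE and Pattern.CASE_INSENSITIVE constants of java.util.regex.Pattern, which the article's listings do not use; CompileFlagDemo and countMatches are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileFlagDemo {
    // Number of successive matches of an already compiled pattern in input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "abc\nabc";
        // Default mode: ^ and $ anchor to the whole input, so nothing matches.
        System.out.println(countMatches(Pattern.compile("^abc$"), input)); // 0
        // Pattern.MULTILINE has the same effect as the embedded flag (?m).
        System.out.println(countMatches(Pattern.compile("^abc$", Pattern.MULTILINE), input)); // 2
        // Flags are bit masks and combine with |, like writing (?im) in the regex.
        int flags = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
        System.out.println(countMatches(Pattern.compile("^ABC$", flags), input)); // 2
    }
}
```

Embedded flags keep the mode visible in the regex itself; compile flags keep the regex shorter and let the caller decide the mode.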

(?u): enables Unicode-aware case folding. This flag works with (?i) to perform case-insensitive matching in a manner consistent with the Unicode Standard. By default, case-insensitive matching considers only characters in the US-ASCII character set.
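
The difference only shows up with non-ASCII letters. This sketch compares (?i) alone with (?iu) on the accented letter e (U+00E9 and its uppercase form U+00C9), written as Unicode escapes so the source file encoding does not matter; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // True if regex matches the whole of input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // e with acute accent, lowercase
        String upper = "\u00c9"; // e with acute accent, uppercase
        // ASCII letters fold under (?i) alone.
        System.out.println(matches("(?i)e", "E"));            // true
        // A non-ASCII letter does not fold unless (?u) is added.
        System.out.println(matches("(?i)" + lower, upper));   // false
        System.out.println(matches("(?iu)" + lower, upper));  // true
    }
}
```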

(?d): enables Unix lines mode in which a matcher recognizes only the \n line terminator in the context of the ., ^, and $ metacharacters. Non-Unix lines mode is the default: a matcher recognizes all terminators in the context of the aforementioned metacharacters.
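
A carriage return makes the effect visible, because it is a line terminator in the default mode but an ordinary character in Unix lines mode. A minimal sketch with illustrative names:

```java
import java.util.regex.Pattern;

class UnixLinesDemo {
    // True if regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Default mode: \r is a line terminator, so the period skips it.
        System.out.println(found(".", "\r"));          // false
        // Unix lines mode: only \n is a terminator, so the period matches \r.
        System.out.println(found("(?d).", "\r"));      // true
        // With (?m), ^ matches after \r by default...
        System.out.println(found("(?m)^b", "a\rb"));   // true
        // ...but not in Unix lines mode, where only \n starts a new line.
        System.out.println(found("(?dm)^b", "a\rb"));  // false
        System.out.println(found("(?dm)^b", "a\nb"));  // true
    }
}
```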
