Regular expressions in Java, Part 1

Literal strings

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "apple";
            String input = "applet";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            // m.group() returns the matched text; it is equivalent to "applet".subSequence(m.start(), m.end()).
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [apple] starting at 0 and ending at 4
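
The catch block in the listing above only runs when the pattern itself is malformed. As a minimal sketch (the class name BadPatternDemo, the helper describeError, and the deliberately broken pattern "(apple" are illustrative assumptions, not part of the original listing), the following shows one way to trigger it:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BadPatternDemo {
    // Returns the description reported for an invalid pattern,
    // or null if the pattern compiles without error.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException pse) {
            return pse.getDescription();
        }
    }

    public static void main(String[] args) {
        // A valid literal pattern compiles, so no description is returned.
        System.out.println(describeError("apple"));
        // The opening parenthesis is never closed, so compilation fails.
        System.out.println(describeError("(apple"));
    }
}
```

The exact wording of the description depends on the JDK, so the sketch only distinguishes between "compiled" and "did not compile".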

Metacharacters

The period (.) matches any single character, except a line terminator unless dotall mode is in effect.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".ox";
            String input = "The quick brown fox jumps over the lazy ox.";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [fox] starting at 16 and ending at 18
Found [ ox] starting at 39 and ending at 41

Character classes

Simple character class

[ ] denotes a set of characters; the class matches any single character that is a member of the set.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [c] starting at 2 and ending at 2
Found [s] starting at 5 and ending at 5
Found [w] starting at 6 and ending at 6

Negation character class

A leading ^ negates the class: it matches any character except those listed.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[^csw]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [d] starting at 3 and ending at 3
Found [e] starting at 4 and ending at 4

Range character class

- denotes a range: the class matches the two endpoint characters and every character between them.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-c]";
            String input = "abcdesw";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2

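Union character class

A character class nested inside another forms a union: a character matches if it belongs to either set.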

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-c[f-k]]";
            String input = "abcdeflm";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [b] starting at 1 and ending at 1
Found [c] starting at 2 and ending at 2
Found [f] starting at 5 and ending at 5

Intersection character class

&& intersects two sets: only characters common to both sides match.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[aeiouy&&[y]]";
            String input = "party";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [y] starting at 4 and ending at 4

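Subtraction character class

Subtraction is written as an intersection with a negated class: [a-f&&[^a-c]&&[^e]] matches a through f, minus a through c, minus e.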

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "[a-f&&[^a-c]&&[^e]]";
            String input = "abcdefg";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [d] starting at 3 and ending at 3
Found [f] starting at 5 and ending at 5

Predefined character classes

\d: A digit. Equivalent to [0-9].
\D: A nondigit. Equivalent to [^0-9].
\s: A whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S: A nonwhitespace character. Equivalent to [^\s].
\w: A word character. Equivalent to [a-zA-Z_0-9].
\W: A nonword character. Equivalent to [^\w].

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "\\w";
            String input = "aZ.8 _";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [a] starting at 0 and ending at 0
Found [Z] starting at 1 and ending at 1
Found [8] starting at 3 and ending at 3
Found [_] starting at 5 and ending at 5
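
The listing above exercises only \w. As a hedged sketch on the same input, the following checks a few of the other predefined classes; the class name PredefinedClassDemo and the helper findAll are illustrative names, not part of the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aZ.8 _";
        System.out.println(findAll("\\d", input));        // [8]
        System.out.println(findAll("\\s", input).size()); // 1 (the space at index 4)
        System.out.println(findAll("\\W", input).size()); // 2 (the period and the space)
    }
}
```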

Line terminators

Pattern's SDK documentation refers to the period metacharacter as a predefined character class that matches any character except for a line terminator (a one- or two-character sequence identifying the end of a text line), unless dotall mode (discussed later) is in effect; in dotall mode, the period also matches line terminators. Pattern recognizes the following line terminators:

The carriage-return character (\r)
The new-line (line feed) character (\n)
The carriage-return character immediately followed by the new-line character (\r\n)
The next-line character (\u0085)
The line-separator character (\u2028)
The paragraph-separator character (\u2029)
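
To make the list above concrete, here is a small sketch that tests the period against each single-character terminator in the default mode. The class name LineTerminatorDemo and the helper dotMatches are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class LineTerminatorDemo {
    // True if the period metacharacter (default mode) matches the whole string s.
    static boolean dotMatches(String s) {
        return Pattern.compile(".").matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] terminators = {"\r", "\n", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            // Each of these prints false: the period skips line terminators.
            System.out.printf("U+%04X matched by period: %b%n", (int) t.charAt(0), dotMatches(t));
        }
        // An ordinary character is matched.
        System.out.println("x matched by period: " + dotMatches("x")); // true
    }
}
```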

Capturing groups

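() groups the enclosed characters into a single unit (a capturing group); all of them must match, in the same order.
\2 is a back reference: it matches the same text that the second capturing group, ( language), has just matched.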

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(Java( language)\\2)";
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [Java language language] starting at 4 and ending at 25
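
The listing above prints only the overall match. As a rough sketch of how the individual groups can be inspected (GroupDemo and its groups helper are hypothetical names), Matcher.group(int) returns the text captured by each numbered group, with group 0 being the whole match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by each capturing group
    // for the first match, or an empty array if nothing matches.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String regex = "(Java( language)\\2)";
        String[] g = groups(regex, "The Java language language");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = [" + g[i] + "]");
        }
        // Without the repeated word the back reference cannot match.
        System.out.println(groups(regex, "The Java language").length); // 0
    }
}
```

Groups are numbered by the position of their opening parenthesis, which is why ( language) is group 2 here.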

Boundary matchers

^: The beginning of a line
$: The end of a line
\b: A word boundary
\B: A non-word boundary
\A: The beginning of the text
\G: The end of the previous match
\Z: The end of the text, except for the final line terminator (if any)
\z: The end of the text

String regex = "^The\\w*"; matches only when the line begins with The.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "^The\\w*";
            // String input = " The Java language language"; // no match
            String input = "The Java language language";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [The] starting at 0 and ending at 2
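
The listing above covers only ^. The following sketch, with illustrative inputs and a hypothetical count helper, probes a few of the other boundary matchers from the list: \b, \Z versus \z, and \G.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how many times regex is found in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Word boundary: the Java inside JavaScript is followed by a word character.
        System.out.println(count("\\bJava\\b", "Java JavaScript")); // 1
        // Upper-case Z tolerates one final line terminator, lower-case z does not.
        System.out.println(count("abc\\Z", "abc\n")); // 1
        System.out.println(count("abc\\z", "abc\n")); // 0
        // The G matcher forces each match to start where the previous one ended.
        System.out.println(count("\\Ga", "aab a")); // 2
    }
}
```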

Zero-length matches

package com.sheting.basic.regex;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "\\b\\b";
            String input = "Java is";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            System.err.println(Arrays.asList("Java is".split(regex)));
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

[Java,  , is]
Found [] starting at 0 and ending at -1
Found [] starting at 4 and ending at 3
Found [] starting at 5 and ending at 4
Found [] starting at 7 and ending at 6

Quantifiers

Quantifiers are categorized as greedy, reluctant, or possessive:
A greedy quantifier (?, *, or +) attempts to find the longest match.
Specify X? to find one or no occurrences of X,
X* to find zero or more occurrences of X,
X+ to find one or more occurrences of X,
X{n} to find n occurrences of X,
X{n,} to find at least n (and possibly more) occurrences of X,
and X{n,m} to find at least n but no more than m occurrences of X.

A reluctant quantifier (??, *?, or +?) attempts to find the shortest match.
Specify X?? to find one or no occurrences of X,
X*? to find zero or more occurrences of X,
X+? to find one or more occurrences of X,
X{n}? to find n occurrences of X,
X{n,}? to find at least n (and possibly more) occurrences of X,
and X{n,m}? to find at least n but no more than m occurrences of X.

A possessive quantifier (?+, *+, or ++) is similar to a greedy quantifier except that a possessive quantifier only makes one attempt to find the longest match, whereas a greedy quantifier can make multiple attempts.
Specify X?+ to find one or no occurrences of X,
X*+ to find zero or more occurrences of X,
X++ to find one or more occurrences of X,
X{n}+ to find n occurrences of X,
X{n,}+ to find at least n (and possibly more) occurrences of X,
and X{n,m}+ to find at least n but no more than m occurrences of X.
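
The counted forms are not exercised elsewhere in this article, so here is a minimal sketch of how X{n,m} behaves in each mode. The class name BoundedQuantifierDemo, the helper findAll, and the input aaaaa are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedQuantifierDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aaaaa";
        System.out.println(findAll("a{2}", input));    // [aa, aa]
        System.out.println(findAll("a{2,}", input));   // [aaaaa]
        System.out.println(findAll("a{2,3}", input));  // greedy: [aaa, aa]
        System.out.println(findAll("a{2,3}?", input)); // reluctant: [aa, aa]
        System.out.println(findAll("a{2,3}+", input)); // possessive: [aaa, aa]
    }
}
```

With nothing after the quantifier, the possessive form gives the same result as the greedy one; the difference only shows when the rest of the pattern needs characters back.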

greedy
.* keeps consuming input, so the match extends to the last ox.

regex = .*ox
input = fox box pox
Found [fox box pox] starting at 0 and ending at 10

reluctant
.*? stops as soon as the next ox is found.

regex = .*?ox
input = fox box pox
Found [fox] starting at 0 and ending at 2
Found [ box] starting at 3 and ending at 6
Found [ pox] starting at 7 and ending at 10

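possessive
.*+ consumes the entire input and never gives characters back, so nothing is left for ox and there is no match.

regex = .*+ox
input = fox box pox
(no matches)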
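
The three runs above are shown without code. A sketch that reproduces them with the same regexes and input follows; the class name QuantifierModeDemo and the helper findAll are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModeDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "fox box pox";
        // Greedy: one match covering the whole input.
        System.out.println(findAll(".*ox", input).size());   // 1
        // Reluctant: three matches, "fox", " box", and " pox".
        System.out.println(findAll(".*?ox", input).size());  // 3
        // Possessive: the quantifier swallows everything and never backtracks.
        System.out.println(findAll(".*+ox", input).size());  // 0
    }
}
```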

Zero-length matches

regex = a?
input = abaa
Found [a] starting at 0 and ending at 0
Found [] starting at 1 and ending at 0
Found [a] starting at 2 and ending at 2
Found [a] starting at 3 and ending at 3
Found [] starting at 4 and ending at 3
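
A zero-length match can be recognized in code because its start and end offsets are equal. The sketch below replays the a? run; the class name ZeroLengthDemo and the "empty@index" labelling are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Describes each match as text@start, or empty@start for a zero-length match.
    static List<String> describe(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.start() == m.end()) {
                out.add("empty@" + m.start());
            } else {
                out.add(m.group() + "@" + m.start());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [a@0, empty@1, a@2, a@3, empty@4]
        System.out.println(describe("a?", "abaa"));
    }
}
```

The empty matches occur at the b, where a? matches zero occurrences, and at the end of the input.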

Embedded flag expressions

(?i): enables case-insensitive pattern matching.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?i)tree";
            String input = "Treehouse";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [Tree] starting at 0 and ending at 3

(?x): permits whitespace and comments beginning with the # metacharacter to appear in a pattern. A matcher ignores both.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".at(?x)#match hat, cat, and so on";
            String input = "matter";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [mat] starting at 0 and ending at 2

(?s): enables dotall mode in which the period metacharacter matches line terminators in addition to any other character.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?s).";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [
] starting at 0 and ending at 0

Without (?s), the following test finds no match:

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = ".";
            String input = "\n";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input); // NO Match
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

(?m): enables multiline mode in which ^ matches the beginning of every line and $ matches the end of every line.

package com.sheting.basic.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {

        try {
            String regex = "(?m)^abc$";
            String input = "abc\nabc";
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                System.out.println("Found [" + m.group() + "] starting at " + m.start() + " and ending at " + (m.end() - 1));
            }
        } catch (PatternSyntaxException pse) {
            System.err.println("Bad regex: " + pse.getMessage());
            System.err.println("Description: " + pse.getDescription());
            System.err.println("Index: " + pse.getIndex());
            System.err.println("Incorrect pattern: " + pse.getPattern());
        }
    }
}

output

Found [abc] starting at 0 and ending at 2
Found [abc] starting at 4 and ending at 6

Note: without (?m), the regex "^abc$" applied to the input abc\nabc finds no matches.

(?u): enables Unicode-aware case folding. This flag works with (?i) to perform case-insensitive matching in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched.
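
As a hedged illustration of the difference, the sketch below compares (?i) alone with (?iu) on a word that starts with a non-ASCII letter. The class name UnicodeCaseDemo, the helper matches, and the sample word are assumptions; the accented letters are written as Unicode escapes so the source file's encoding does not matter.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9cole"; // lower-case e with acute accent, then "cole"
        String upper = "\u00c9COLE"; // upper-case E with acute accent, then "COLE"

        // ASCII-only folding: the accented letters are not treated as a case pair.
        System.out.println(matches("(?i)" + lower, upper));  // false
        // Unicode-aware folding: the accented letters now match each other.
        System.out.println(matches("(?iu)" + lower, upper)); // true
        // Plain ASCII text is folded by (?i) alone.
        System.out.println(matches("(?i)tree", "TREE"));     // true
    }
}
```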

(?d): enables Unix lines mode in which a matcher recognizes only the \n line terminator in the context of the ., ^, and $ metacharacters. Non-Unix lines mode is the default: a matcher recognizes all terminators in the context of the aforementioned metacharacters.
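
A small sketch of Unix lines mode, limited to the period metacharacter, is shown below. The class name UnixLinesDemo and the helper matches are illustrative assumptions.

```java
import java.util.regex.Pattern;

class UnixLinesDemo {
    // True if the whole input matches regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Default mode: a carriage return is a line terminator, so the period skips it.
        System.out.println(matches(".", "\r"));          // false
        // Unix lines mode: only the new-line character counts as a terminator.
        System.out.println(matches("(?d).", "\r"));      // true
        System.out.println(matches("(?d).", "\u2028"));  // true
        System.out.println(matches("(?d).", "\n"));      // false
    }
}
```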

