JavaScript | regex

Background

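Regex is a general concept in programming. Python, C, Java, PHP and JavaScript all have it, and while some of the syntax is shared across languages, some of it is not.

So most of the time the best approach is to master the basics and the common patterns, then write the regex on the spot for the specific problem and language in front of you.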

Regular Expressions in 10 Different Languages

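Quick summary

This summary is mainly based on the site RegexOne. I tried several tutorials and found this one the easiest to follow (at least for my level of English).

The site is organized as short lessons, each followed by an exercise, so this summary is built around the exercises. Each text in an exercise stands for a piece of real-world input. The answer is a regex that captures what the texts to be matched have in common, so that it selects all of them and none of the texts to be skipped.

Note that the RegexOne exercises accept answers for several languages at once, so the answers below are written under that assumption and are not guaranteed to work in JavaScript.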

intro

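Regular expression (often abbreviated regex, regexp or RE). Although its official Chinese name is 正则表达式, I don't find that translation very apt; both 规则表达式 ("rule expression") and 通配式 ("wildcard pattern") seem more fitting to me.

It feels like comparing two names for one and the same thing, as with "一拳打死十个拳" versus "可洛斯拳".

In short, regex is a pattern-recognition system for batch find-and-replace. The system itself is small, yet it can handle a huge variety of cases. It is a remarkably well-designed tool.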

Exercise 1: Matching Characters

task	text
match	abcdefg
match	abcde
match	abc

abc

This is the most basic case, no different from pressing Ctrl+F in Word (Office 365) and typing abc.

Exercise 1½: Matching Digits

task	text
match	abc123xyz
match	define “123”
match	var g = 123;

123

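Same as above, this is very simple, but real work is rarely this simple.

\d*3

This also works, and contrary to my first hunch it is valid JavaScript regex syntax too: \d* matches zero or more digits, followed by a literal 3.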

\d{3}

The character \d can be used in place of any digit from 0 to 9.

\d stands for one digit between 0 and 9, and {3} means exactly three of them in a row, so any three consecutive digits such as 091, 221, 111 or 788 will match.

\D

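If, instead of matching the digits, you want to capture a different common trait, namely that every text contains at least one non-digit character, then \D (any non-digit character) is also an option.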
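A quick way to check these answers is to run them in a JavaScript runtime such as Node.js or a browser console. This is a minimal sketch using only the built-in RegExp.prototype.test and String.prototype.match; the variable names are illustrative.

```javascript
// The three Exercise 1½ texts; every one of them should match.
const texts = ["abc123xyz", "define “123”", "var g = 123;"];

// Candidate answers discussed above.
const patterns = [/123/, /\d*3/, /\d{3}/, /\D/];

for (const re of patterns) {
  // test() returns true if the pattern occurs anywhere in the string.
  const allMatch = texts.every((t) => re.test(t));
  console.log(re.source, allMatch);
}

// match() returns the substring that \d{3} actually selected.
console.log("abc123xyz".match(/\d{3}/)[0]); // prints 123
```

Every pattern prints true, which also confirms that \d*3 is accepted by JavaScript.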

Exercise 2: Matching With Wildcards

task	text
match	cat.
match	896.
match	?=+.
skip	abc1

...\.

The dot (.) matches any single character. To match a literal period, escape the dot with a backslash: \.

...\W

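\W matches any non-word character, i.e. any character outside [A-Za-z0-9_] (so the underscore also counts as a word character). Three dots match any three characters, followed by one character that is neither a letter, a digit nor an underscore. This also passes the test: every text to match ends with a period, while abc1 contains no such character at all.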
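Both answers can be checked against the match and skip texts in a few lines. This is a minimal sketch for Node.js or a browser console; the array names are illustrative.

```javascript
// Exercise 2: three texts to match and one to skip.
const shouldMatch = ["cat.", "896.", "?=+."];
const shouldSkip = ["abc1"];

// \. is a literal period; \W is any character outside [A-Za-z0-9_].
const answers = [/...\./, /...\W/];

for (const re of answers) {
  const ok =
    shouldMatch.every((t) => re.test(t)) && !shouldSkip.some((t) => re.test(t));
  console.log(re.source, ok); // prints true for both answers
}

// Without the backslash the last dot is a wildcard too, so "abc1" slips through.
console.log(/..../.test("abc1")); // true
```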

Exercise 3: Matching Characters

task	text
match	can
match	man
match	fan
skip	dan
skip	ran
skip	pan

[fcm]an

You can match one of a specific set of characters by listing them inside square brackets: [fcm] matches exactly one f, c or m.

[^drp]an

You can also exclude specific characters by putting ^ (the hat) right after the opening square bracket.
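A minimal sketch that runs both character-class answers against the Exercise 3 texts, assuming a JavaScript runtime such as Node.js; the names are illustrative.

```javascript
const matchWords = ["can", "man", "fan"];
const skipWords = ["dan", "ran", "pan"];

// [fcm] lists the allowed first letters; [^drp] lists the forbidden ones.
const classAnswers = [/[fcm]an/, /[^drp]an/];

for (const re of classAnswers) {
  const ok =
    matchWords.every((w) => re.test(w)) && !skipWords.some((w) => re.test(w));
  console.log(re.source, ok); // true for both
}
```

The first is an allow-list and the second a deny-list; with only these six texts both give the same result.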

Exercise 4: Excluding Characters

task	text
match	hog
match	dog
skip	bog

[^b]og

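This is the reference answer.

[hd^b]og

Careful: ^ only means "not" when it is the first character inside the brackets. In [hd^b] the ^ is a literal caret, so the class matches h, d, ^ or b, and this pattern would also match bog and fail the skip. Moving h and d after the ^, as in [^bhd], does not work either: they would share b's fate and be excluded too. In this example [hd]og works just as well. Once a regex gets complicated, keeping it readable and knowing where to break it into parts is a skill in itself.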
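The caret rule is easy to verify. This minimal sketch, for Node.js or a browser console, compares the negated class with the class where ^ is not in first position.

```javascript
// ^ only negates when it is the first character inside [ ].
const negated = /[^b]og/;   // any one character except b, then "og"
const literal = /[hd^b]og/; // h, d, a literal ^, or b, then "og"

console.log(negated.test("bog"));    // false: skipped as intended
console.log(literal.test("bog"));    // true: b is in the class, so the skip fails
console.log(literal.test("^og"));    // true: the caret is just a character here
console.log(/[^bhd]og/.test("hog")); // false: h and d are now excluded as well
console.log(/[hd]og/.test("dog"));   // true: the simple allow-list works
```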

Exercise 5: Matching Character Ranges

task	text
match	Ana
match	Bob
match	Cpc
skip	aax
skip	bby
skip	ccz

[A-Z][a-z]{2}, [A-Z]\w+

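The first: the first character is one uppercase letter from A-Z, followed by exactly {2} lowercase letters from [a-z]. This one targets the exact shape of the texts to match.

The second: \w is equivalent to the character range [A-Za-z0-9_], and + means one or more. This one matches from the angle of "starts with a capital letter".

[^(a-c)]{2}, [^(a|b|c)]{2}

The first: this one works from the skip side. Inside square brackets, parentheses are ordinary characters rather than a group, so [^(a-c)] matches any single character except a, b, c, ( and ). The pattern therefore needs two consecutive characters outside that set, which none of the skip texts contain.

The second: likewise, | is a literal character inside brackets, so [^(a|b|c)] excludes a, b, c, (, | and ) and behaves just like the first. Both only pass because the skip texts happen to be built from a, b and c: add a skip text such as ddf and they would obviously fail, since dd contains no a, b or c. The plain form [^a-c]{2} expresses the same idea more clearly.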
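The four Exercise 5 answers, and the fact that parentheses are literal inside a character class, can be checked like this. A minimal sketch for Node.js or a browser console; the helper name passes is illustrative.

```javascript
const names = ["Ana", "Bob", "Cpc"];
const lowers = ["aax", "bby", "ccz"];

// Hypothetical helper: true if re matches every name and no skip text.
const passes = (re) =>
  names.every((t) => re.test(t)) && !lowers.some((t) => re.test(t));

console.log(passes(/[A-Z][a-z]{2}/), passes(/[A-Z]\w+/));    // true true
console.log(passes(/[^(a-c)]{2}/), passes(/[^(a|b|c)]{2}/)); // true true

// Inside [ ], ( ) and | are literal characters, not grouping or alternation.
console.log(/[^(a-c)]/.test("("));    // false: "(" is in the excluded set
console.log(/[^a-c]{2}/.test("ddf")); // true: a skip text like ddf would slip through
```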

Exercise 6: Matching Repeated Characters

task	text
match	wazzzzzup
match	wazzzup
skip	wazup

waz{3,5}up, waz{3,}\w+

The first matches wa, then z repeated 3, 4 or 5 times, followed by up.

The second matches wa, then z repeated at least 3 times, followed by one or more (+) characters from the range [A-Za-z0-9_].
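A minimal sketch checking both repetition answers in Node.js or a browser console; the variable names are illustrative.

```javascript
const zs = ["wazzzzzup", "wazzzup"];
const tooFew = "wazup";

for (const re of [/waz{3,5}up/, /waz{3,}\w+/]) {
  console.log(re.source, zs.every((t) => re.test(t)) && !re.test(tooFew));
}

// {n,m} applies only to the token right before it, here the single letter z.
console.log("wazzzzzup".match(/waz{3,5}/)[0]); // wazzzzz
```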

Exercise 7: Matching Repeated Characters

task	text
match	aaaabcc
match	aabbbbc
match	aacc
skip	a

a+b*c+, a{2,4}b{0,4}c{1,2}

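The first: a is repeated + times, meaning one or more; b is repeated * times, meaning zero or more; c is again repeated one or more times.

The second is a subset of the first (a stricter sub-match): it puts an upper limit on each count.

In everyday writing we often use * to stand for arbitrary content (for example to mask censored words), but that is not what it means in regex. There, * is a quantifier much closer to +: it means zero or more of the preceding token, and the "match anything" wildcard is .* instead. The two symbols even look alike; both are axially symmetric.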
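The difference between *, + and .* shows up quickly in a console. This is a minimal sketch for Node.js or a browser; apart from the Exercise 7 texts, the test strings are made up for illustration.

```javascript
// * means "zero or more of the previous token", + means "one or more".
console.log(/ab*c/.test("ac")); // true: zero b's is fine
console.log(/ab+c/.test("ac")); // false: at least one b is required

// The everyday "anything" wildcard is .* (any character, zero or more times).
console.log("file_record_transcript.pdf".match(/file.*\.pdf/)[0]);

const texts7 = ["aaaabcc", "aabbbbc", "aacc"];
for (const re of [/a+b*c+/, /a{2,4}b{0,4}c{1,2}/]) {
  console.log(re.source, texts7.every((t) => re.test(t)) && !re.test("a"));
}
```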

Exercise 8: Matching Optional Characters

task	text
match	1 file found?
match	2 files found?
match	24 files found?
skip	No files found.

\d+ files? found\?

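This matches digits, one or more (+) of them: 1, 2 and 24 all work, and so would 12325543992. The space is written as a literal space. In files?, the ? makes the character right before it (the s) optional, so both file and files match.

Then comes found, and \? uses the backslash exactly as \. does: it turns a special character back into an ordinary one.

For this trick I'd call \ the "back-to-true-form slash", or "Secret Technique: Reveal True Form".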
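A minimal sketch for Node.js or a browser console showing the optional s and the escaped question mark; the variable names are illustrative.

```javascript
const found = ["1 file found?", "2 files found?", "24 files found?"];
const re = /\d+ files? found\?/;

console.log(found.every((t) => re.test(t))); // true
console.log(re.test("No files found."));     // false: no digits, and "." is not "?"

// Without the backslash, ? would make the "d" of "found" optional instead.
console.log(/found?/.test("foun")); // true
```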

Exercise 9: Matching Whitespaces

task	text
match	1. abc
match	2. abc
match	3. abc
skip	4.abc

\d+\.\s+abc

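\d+: starts with a digit, one or more of them.

\.: a literal period.

\s+: whitespace, one or more characters.

abc: the three literal characters abc.

\s is the general way to match whitespace: it matches any single whitespace character, including the space, tab (\t), newline (\n) and carriage return (\r). Each of these is one character in its own right; a tab is not made up of spaces, even though editors often display it as four columns wide.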

\d+\.[^\S]+abc

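\S matches any non-whitespace character. [^\S] is a double negative, "any character that is not a non-whitespace character", so it is equivalent to \s.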
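Both Exercise 9 answers, plus a variant with a tab, can be compared side by side. This is a minimal sketch for Node.js or a browser console; the "2.\tabc" and "3.   abc" strings are made-up inputs.

```javascript
const re1 = /\d+\.\s+abc/;
const re2 = /\d+\.[^\S]+abc/; // [^\S] = "not non-whitespace" = whitespace

for (const t of ["1. abc", "2.\tabc", "3.   abc", "4.abc"]) {
  // JSON.stringify makes the tab visible as \t in the output.
  console.log(JSON.stringify(t), re1.test(t), re2.test(t));
}

// A tab is one character, and \s matches it on its own.
console.log(/^\s$/.test("\t"), "\t".length); // true 1
```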

Exercise 10: Matching Lines

task	text
match	Mission: successful
skip	Last Mission: unsuccessful
skip	Next Mission: successful upon capture of target

^Mission: successful$

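It is best practice to write regular expressions that are as specific as possible.

This pattern describes both the start and the end of the line using the special ^ (hat) and $ (dollar sign).

Whatever sits between ^ and $ has to make up the entire line. Here it is basically the literal string, but it feels like you should be able to put regex syntax in there too, and indeed you can: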

^([A-Z]is{2}ion:\ssuccessful)$

It works, but it is unnecessary and only makes things more complicated.
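In JavaScript, ^ and $ refer to the start and end of the whole string unless the m (multiline) flag is set, in which case they also match at line breaks. This minimal sketch joins the three Exercise 10 lines into one string to show the difference; the variable name log is illustrative.

```javascript
// The three Exercise 10 lines joined into one multi-line string.
const log = [
  "Mission: successful",
  "Last Mission: unsuccessful",
  "Next Mission: successful upon capture of target",
].join("\n");

// Unanchored, the phrase is also found inside the third line.
console.log(log.match(/Mission: successful/g).length); // 2

// With the m flag, ^ and $ match at the start and end of each line.
console.log(log.match(/^Mission: successful$/gm).length); // 1

// Without m, ^ and $ refer to the whole string, which spans three lines.
console.log(/^Mission: successful$/.test(log)); // false

// The more elaborate version accepts the same line.
console.log(/^([A-Z]is{2}ion:\ssuccessful)$/.test("Mission: successful")); // true
```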

Exercise 11: Matching Groups

task	text	Capture Groups
capture	file_record_transcript.pdf	file_record_transcript
capture	file_08927340329.pdf	file_08927340329
skip	test_fake.pdf.tmp

^(file.+)\.pdf$

Any subpattern inside a pair of parentheses () will be captured as a group.

The difference between ^(IMG\d+\.png)$ and ^(IMG\d+)\.png$ is what the group captures: the first group includes the trailing .png, the second does not.
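In JavaScript, String.prototype.match (without the g flag) returns an array whose index 0 is the whole match and whose later indexes are the captured groups, or null when nothing matches. A minimal sketch; the file name IMG42.png is made up for illustration.

```javascript
const m = "file_record_transcript.pdf".match(/^(file.+)\.pdf$/);
console.log(m[1]); // file_record_transcript

console.log("test_fake.pdf.tmp".match(/^(file.+)\.pdf$/)); // null

// Where the parentheses sit decides what the group captures.
console.log("IMG42.png".match(/^(IMG\d+\.png)$/)[1]); // IMG42.png
console.log("IMG42.png".match(/^(IMG\d+)\.png$/)[1]); // IMG42
```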

Exercise 12: Matching Nested Groups

task	text	Capture Groups
capture	Jan 1987	Jan 1987, 1987
capture	May 1969	May 1969, 1969
capture	Aug 2011	Aug 2011, 2011

^(.{3}\s(\d{4}))$

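Any three characters, followed by one whitespace character, then four digits. The whole thing is wrapped in an outer group, and the extra ^...$ anchors it to the full line, so the pattern is applied line by line.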

(\w+ (\d+))

This is the reference answer: wrap the year in a group, then wrap everything in another group.
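Nested groups are numbered by the position of their opening parenthesis, left to right, so the outer group comes first. A minimal sketch for Node.js or a browser console, using array destructuring to skip the whole-match entry.

```javascript
const dates = ["Jan 1987", "May 1969", "Aug 2011"];

for (const d of dates) {
  // Group 1 is the outer pair of parentheses, group 2 the inner (year) pair.
  const [, whole, year] = d.match(/^(.{3}\s(\d{4}))$/);
  console.log(whole, year); // e.g. Jan 1987 1987
}

// The reference answer captures the same two groups.
console.log("Aug 2011".match(/(\w+ (\d+))/).slice(1).join(", ")); // Aug 2011, 2011
```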

Exercise 13: Matching Nested Groups

task	text	Capture Groups
capture	1280x720	1280, 720
capture	1920x1600	1920, 1600
capture	1024x768	1024, 768

(\d+)x(\d+)

By this point, this one is easy to write, haha.

(\d*)x(\d{3}\d?)

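You can also make it a bit more elaborate: the second group captures 3 digits plus an optional fourth. Note that it only rejects 5-digit values when the pattern is anchored, e.g. ^(\d+)x(\d{3}\d?)$; unanchored, a 5-digit value still matches and the group simply captures its first four digits. The same idea works whenever you need to restrict image sizes to a range.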
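A minimal sketch comparing the unanchored pattern with an anchored variant. The strict pattern and the 800x12345 input are illustrative assumptions, not part of the exercise; the strict version also uses \d+ so that an empty width such as x768 is rejected.

```javascript
// Second group: 3 digits plus an optional 4th.
const loose = /(\d*)x(\d{3}\d?)/;
// Hypothetical stricter variant: anchored, and at least one digit for the width.
const strict = /^(\d+)x(\d{3}\d?)$/;

for (const size of ["1280x720", "1920x1600", "1024x768"]) {
  console.log(size.match(strict).slice(1).join(", ")); // e.g. 1280, 720
}

// Unanchored, a 5-digit height still matches; the group just stops after 4 digits.
console.log("800x12345".match(loose)[2]); // 1234
console.log("800x12345".match(strict));   // null
```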

Exercise 14: Matching Conditional Text

task	text
match	I love cats
match	I love dogs
skip	I love logs
skip	I love cogs

I\slove\s(cats|dogs)

Use | (logical OR, a.k.a. the pipe) to denote different possible sets of characters. Wrapping the alternatives in parentheses limits the OR to just cats or dogs.
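A minimal sketch of the alternation answer in Node.js or a browser console, plus what happens when the parentheses are left out; the string hotdogs is a made-up input.

```javascript
const love = /I\slove\s(cats|dogs)/;

for (const t of ["I love cats", "I love dogs", "I love logs", "I love cogs"]) {
  const m = t.match(love);
  console.log(t, "->", m ? m[1] : "skip");
}

// Without the parentheses, | splits the whole pattern in two:
// "I love cats" OR "dogs" anywhere, so even "hotdogs" matches.
console.log(/I love cats|dogs/.test("hotdogs")); // true
```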

Links

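The links below will serve as sources for extending this article or looking things up later. I'll update it when I run into problems or cases worth writing about; otherwise it won't be updated. Having finished this quick summary, I feel a bit too pleased with myself.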