官术网_书友最值得收藏!

Possessive quantifiers

Possessive quantifiers are quantifiers that are greedy when matching text like greedy quantifiers do. Both greedy and possessive quantifiers try to match as many characters as possible. The important difference, however, is that the possessive quantifiers do not backtrack (go back) unlike greedy quantifiers; therefore, it is possible that the regex match fails if the possessive quantifiers go too far.

This table shows all the three types of quantifiers, side by side:

Let's take an example input string, a1b5, and see the behavior of the greedy, lazy, and possessive quantifiers.

If we apply a regex using the greedy quantifier, \w+\d, then it will match a1b (the longest match before backtracking starts) using \w+, and 5 will be matched using \d; thus, the full match will be a1b5.

Now, if we apply a regex using the non-greedy quantifier, \w+?\d, then it will match a (the shortest match before expanding starts) using \w+?, and then the adjacent digit 1 will be matched using \d. Thus, the first full match will be a1. If we let the regex execute again, then it will find another match, b5.

Finally, if we apply a regex using the possessive quantifier, \w++\d, then it will match all the characters a1b5 (the longest possible match without giving back) using \w++ . Due to this, \d remains unmatched, and hence the regex fails to find any match.

Let's take another example. The requirement is to match a string that starts with lowercase English alphabets or hyphen. The string can have any character after the alphabets/hyphens, except a colon. There can be any number of any characters of any length after the colon until the end.

An example of a valid input is as-df999 and that of an invalid input is asdf-:123.

Now, let's try solving this regex problem using a greedy quantifier regex:

    ^[a-z-]+[^:].*$

Unfortunately, this is not the right regex pattern because this regex will match both the aforementioned valid and invalid inputs. This is because of the backtracking behavior of the regex engine in greedy quantifiers. The [a-z-]+ pattern will find the longest possible match in the form of asdf-, but due to the negated character class pattern [^:] , the regex engine will backtrack one position to asdf and will match the next hyphen for [^:]. All the remaining text, that is, :123, will be matched using .*.

Let's try to solve this regex problem using the following possessive quantifier regex:

    ^[a-z-]++[^:].*$

This regex pattern will still match our valid input, but it will fail to match an invalid input because there is no backtracking in possessive quantifiers; hence, the regex engine will not go back any position after matching asdf- in the second example string. Since the next character is a colon and our regex sub-pattern is [^:], the regex engine will stop matching and correctly declare our invalid input a failed match.

Possessive quantifiers are good for the performance of the underlying regex engine because the engine does not have to keep any backtracking information in memory. The performance increase is even more when a regex fails to match because possessive quantifiers fail faster. So, remember that the benefit of possessive quantifiers is to improve the regex performance, especially when using nested quantifiers.

主站蜘蛛池模板: 英山县| 佛山市| 淄博市| 平乡县| 高尔夫| 游戏| 台东县| 伊吾县| 呈贡县| 封开县| 来安县| 潞城市| 玉溪市| 喀什市| 威海市| 府谷县| 长乐市| 宝丰县| 久治县| 灵璧县| 华坪县| 平凉市| 常山县| 昭觉县| 合水县| 新沂市| 莱西市| 上饶县| 莒南县| 龙山县| 华蓥市| 布拖县| 白河县| 庆城县| 遂溪县| 奉化市| 富裕县| 左云县| 那坡县| 渑池县| 赤城县|