I'm stumped that with a regular expression like "((blah)*(xxx))+" I can't seem to get at the second occurrence of ((blah)*(xxx)), should it exist, or at the second embedded xxx.

The tag name: span. It can be used with multiple captured parts.

'between-open'c)+ to the string ooccc.

Let's add parentheses for them: <(([a-z]+)\s*([^>]*))>.

Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.

With the negative lookahead (?:(?!src).

In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.

Inside a character class, different rules apply.

Before the engine can enter this balancing group, it must check whether the subtracted group "open" has captured …

I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem: before, I wanted to avoid setting non-standard attributes on the replacement span.

In Part II the balancing group is explained in depth and applied to a couple of concrete examples.

As a result, JavaScript can never be found, just because Java is checked first.

This becomes important when capturing groups are nested.

Update: As @morja pointed out, your solution is to move the first .*?

This time, \1 matches one as captured by the last iteration …

If you are an experienced RegEx developer, please feel free to skip ahead to the part "The Push-down Automata."

The most complete solution, which will work in the vast majority of cases, is to use a capturing group with a lazy dot matching pattern.

Two substrings per match are necessarily captured and saved; these are useless to you.

When nested references are supported, this regex also matches oneonetwo.

This is usually just the order of the capturing groups themselves.

For instance, when searching a tag in <span class="my"> we may be interested in: the tag content as a whole: span class="my".
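The question about "((blah)*(xxx))+" comes down to how a repeated capturing group behaves: in most flavors, including Python's re module, the group keeps only the text from its last iteration. The following is a minimal sketch in Python (chosen here for illustration; the sample string and the variable names are assumptions, not taken from the original question). It shows the behavior, plus one possible workaround: scanning for each occurrence of the inner pattern separately.

```python
import re

# Hypothetical sample input: two occurrences of (blah)*(xxx) back to back.
text = "blahblahxxxblahxxx"

# A repeated capturing group retains only its final iteration.
m = re.fullmatch(r"((blah)*(xxx))+", text)
last_outer = m.group(1)   # text of the last ((blah)*(xxx)) iteration
last_blah = m.group(2)    # last "blah" matched inside that iteration
last_xxx = m.group(3)     # last "xxx"

# Workaround sketch: match one occurrence at a time and collect them all.
# The inner (blah)* becomes a non-capturing group wrapped in a single
# capturing group, so each occurrence yields (blah-run, xxx).
occurrences = re.findall(r"((?:blah)*)(xxx)", text)

print(last_outer)
print(occurrences)
```

Here occurrences[1] is the second occurrence that the single-match approach cannot reach. Flavors that record every capture of a group (.NET does, through its capture collections) offer another route, but that is outside what this sketch covers.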
Character class: when used outside a character class, [ begins a character class.

Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex.

In this part, I'll study the balancing group and the .NET Regex class and related objects, again using nested constructions as my main focus.

I don't use PCRE much, as I generally use the real thing ;), but PCRE's docs show the same as Perl's: SUBPATTERNS.

Learn the simplicity of lazy and greedy matching.

Let's apply the regex (?'open'o)+(?

Just for completeness: /
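The numbering rule stated above (groups are numbered by the position of their opening parenthesis, regardless of nesting) can be checked with a short Python sketch. It uses the tag pattern <(([a-z]+)\s*([^>]*))> quoted earlier. The src pattern and both sample strings are assumptions made for illustration only.

```python
import re

# Tag pattern from the text: three groups, two of them nested in the first.
tag_pattern = re.compile(r"<(([a-z]+)\s*([^>]*))>")
tag_match = tag_pattern.search('<span class="my">')

# Groups are numbered left to right by their opening parenthesis:
#   1 -> whole tag content, 2 -> tag name, 3 -> attributes
tag_groups = tag_match.groups()

# Hypothetical src pattern: the outer group wraps the whole attribute,
# the inner lazy group (.*?) stops at the first closing quote.
src_pattern = re.compile(r'(src="(.*?)")')
src_match = src_pattern.search('<img src="a.png" alt="x">')
whole_attr = src_match.group(1)
value_only = src_match.group(2)

print(tag_groups)
print(whole_attr, value_only)
```

With a greedy .* in place of the lazy .*?, the inner group would run on to the last double quote in the string, which is why the lazy form is the usual choice for quoted values.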