
All modern regular expression engines support capturing groups, which are numbered from left to right, starting with one. The numbers can then be used in backreferences to match the same text again in the regular expression, or to use part of the regex match for further processing. In a complex regular expression with many capturing groups, the numbering can get a little confusing.
Python's re module was the first to offer a solution: named capture. By assigning a name to a capturing group, you can reference it by name instead of by number. (?P<name>group) captures the match of group into the backreference "name". Inside the regex, you can reference the contents of the group with its numbered backreference (\1 if it is the first group) or with the named backreference (?P=name).
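As a minimal sketch of the Python syntax described above, the following pattern finds a doubled word. The group name "word" and the sample text are made up for illustration.

```python
import re

# (?P<word>...) creates a capturing group named "word".
# (?P=word) is a named backreference: it matches the same text again.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

match = doubled.search("This is is a test")

print(match.group("word"))  # look the group up by name
print(match.group(1))       # the same group is also group number 1
```

Both calls return the same text, because a named group in Python still receives an ordinary group number.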
The open source PCRE library has followed Python's example, and offers named capture using the same syntax. The PHP preg functions offer the same functionality, since they are based on PCRE.
Python's sub() function allows you to reference a named group in the replacement text as \1 or \g<name>. This does not work in PHP. In PHP, you can read the named group from the $regs array you passed to preg_match(), as $regs['name'], or as {$regs['name']} inside a double-quoted string.
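A short sketch of both replacement styles in Python, assuming a hypothetical date pattern with groups named "year", "month" and "day":

```python
import re

# Three named groups; they are also numbered 1, 2 and 3.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Reference the groups by name in the replacement text ...
by_name = date.sub(r"\g<day>/\g<month>/\g<year>", "2008-01-27")

# ... or by number. Both produce the same result.
by_number = date.sub(r"\3/\2/\1", "2008-01-27")

print(by_name)
print(by_number)
```

The named form is usually easier to keep correct when groups are later added to or removed from the regex.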
The regular expression classes of the .NET framework also support named capture. Unfortunately, the Microsoft developers decided to invent their own syntax, rather than follow the one pioneered by Python. Apart from the JGsoft engine discussed at the end of this page, no other regex flavor currently supports Microsoft's version of named capture.
Here is an example with two capturing groups in .NET style: (?<first>group)(?'second'group). As you can see, .NET offers two syntaxes to create a named capturing group: one using angle brackets, and the other using single quotes. The first syntax is preferable in strings, where single quotes may need to be escaped. The second syntax is preferable in ASP code, where angle brackets are used for HTML tags. You can use the angle bracket and single quote variants interchangeably.
To reference a capturing group inside the regex, use \k<name> or \k'name'. Again, you can use the two syntactic variations interchangeably.
When doing a search-and-replace, you can reference the named group with the familiar dollar sign syntax: ${name}. Simply use a name instead of a number between the curly braces.
The .NET framework allows multiple groups in the regular expression to have the same name. If you do so, both groups will store their matches in the same Group object. You won't be able to distinguish which group captured the text. This can be useful in regular expressions with multiple alternatives to match the same thing. E.g. if you want to match "a" followed by a digit 0..5, or "b" followed by a digit 4..7, and you only care about the digit, you could use the regex a(?'digit'[0-5])|b(?'digit'[4-7]). The group named "digit" will then give you the digit 0..7 that was matched, regardless of the letter.
Python and PCRE do not allow multiple groups to use the same name. Doing so will give a regex compilation error.
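The sketch below tries to compile the duplicate-name regex from the .NET example in Python, and then shows one possible workaround. The names d1, d2 and matched_digit are hypothetical, chosen only for this illustration.

```python
import re

# Using the same group name twice is rejected when the regex is compiled.
try:
    re.compile(r"a(?P<digit>[0-5])|b(?P<digit>[4-7])")
    duplicate_allowed = True
except re.error as exc:
    duplicate_allowed = False
    print("compile error:", exc)

# Workaround: give each alternative its own name and pick whichever matched.
pattern = re.compile(r"a(?P<d1>[0-5])|b(?P<d2>[4-7])")

def matched_digit(text):
    """Return the digit from either alternative, or None if there is no match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    # A group that did not take part in the match yields None.
    return m.group("d1") or m.group("d2")

print(matched_digit("a3"))
print(matched_digit("b7"))
```

Unlike the .NET version, this keeps track of which alternative matched, at the cost of an extra step to merge the two groups.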
Here is where things get a bit ugly. Python and PCRE treat named capturing groups just like unnamed capturing groups, and number both kinds from left to right, starting with one. The regex (a)(?P<x>b)(c)(?P<y>d) matches abcd as expected. If you do a search-and-replace with this regex and the replacement \1\2\3\4, you will get abcd. All four groups were numbered from left to right, from one to four. Easy and logical.
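The Python numbering can be checked directly. This sketch uses the regex from the paragraph above; groupindex is the attribute of Python's compiled pattern objects that maps each group name to its number.

```python
import re

pattern = re.compile(r"(a)(?P<x>b)(c)(?P<y>d)")

# Named and unnamed groups share one left-to-right numbering.
result = pattern.sub(r"\1\2\3\4", "abcd")

# Map each group name to the number it was assigned.
numbers = dict(pattern.groupindex)

print(result)
print(numbers)
print(pattern.groups)  # total number of capturing groups
```

Here the named groups x and y occupy positions 2 and 4, exactly where they appear in the regex.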
Things are quite a bit more complicated with the .NET framework. The regex (a)(?<x>b)(c)(?<y>d) again matches abcd. However, if you do a search-and-replace with $1$2$3$4 as the replacement, you will get acbd. Probably not what you expected.
The .NET framework does number named capturing groups from left to right, but only after all the unnamed groups have been numbered. So the unnamed groups (a) and (c) get numbered first, from left to right, starting at one: (a) is group 1 and (c) is group 2. Then the named groups get their numbers, continuing where the unnamed groups left off: (?<x>b) is group 3 and (?<y>d) is group 4.
To make things simple, when using .NET's regex support, just assume that named groups do not get numbered at all, and reference them by name exclusively. To keep things compatible across regex flavors, I strongly recommend that you do not mix named and unnamed capturing groups at all. Either give a group a name, or make it non-capturing as in (?:nocapture). Non-capturing groups are more efficient, since the regex engine does not need to keep track of their matches.
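A minimal sketch of that recommendation in Python syntax: every group is either named or non-capturing. The key/value pattern and the sample text are assumptions made for illustration.

```python
import re

# Named groups for the parts we want; (?:...) where we only need grouping.
pattern = re.compile(r"(?P<key>\w+)\s*(?:=|:)\s*(?P<value>\w+)")

m = pattern.search("timeout = 30")

print(m.groupdict())   # only the named groups appear
print(pattern.groups)  # the non-capturing group is not counted
```

Because no unnamed capturing groups exist, the question of how the flavor numbers them never comes up.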
The JGsoft regex engine supports both .NET-style and Python-style named capture. Python-style named groups are numbered along with the unnamed ones, as in Python. .NET-style named groups are numbered afterwards, as in .NET. You can mix both styles in the same regex. The JGsoft engine allows multiple groups to use the same name, regardless of the syntax used.
Page URL: http://www.Regular-Expressions.info/named.html
Page last updated: 27 January 2008
Site last updated: 06 June 2008
Copyright © 2003-2008 Jan Goyvaerts. All rights reserved.