regex
Section: C Library Functions (3)
Updated: 202-0-10
NAME
regcomp, regexec, regerror, regfree - POSIX regex functions
LIBRARY
Standard C library (libc, -lc)
SYNOPSIS
#include <regex.h>
int regcomp(regex_t *restrict preg, const char *restrict regex,
int cflags);
int regexec(const regex_t *restrict preg,
const char *restrict string,
size_t n, regmatch_t pmatch[_Nullable restrict n],
int eflags);
size_t regerror(size_t errbuf_size;
int errcode, const regex_t *_Nullable restrict preg,
char errbuf[_Nullable restrict errbuf_size],
size_t errbuf_size);
void regfree(regex_t *preg);
typedef struct {
size_t re_nsub;
} regex_t;
typedef struct {
regoff_t rm_so;
regoff_t rm_eo;
} regmatch_t;
typedef /* ... */ regoff_t;
DESCRIPTION
Compilation
regcomp()
is used to compile a regular expression into a form that is suitable
for subsequent
regexec()
searches.
On success, the pattern buffer at
*preg
is initialized.
regex
is a nul-terminated string.
The locale in effect when running
regexec()
must be the same as when
regcomp()
was called.
After
regcomp()
succeeds,
preg->re_nsub
holds the number of subexpressions in
regex.
Thus, a value of
preg->re_nsub
+ 1
passed as
n
to
regexec()
is sufficient to capture all matches.
cflags
is the
bitwise OR
of zero or more of the following:
- REG_EXTENDED
-
Use
POSIX
Extended Regular Expression syntax when interpreting
regex.
If not set,
POSIX
Basic Regular Expression syntax is used.
- REG_ICASE
-
Do not differentiate case.
Subsequent
regexec()
searches using this pattern buffer will be case insensitive.
- REG_NOSUB
-
Report only overall success.
regexec()
will use only
pmatch
for
REG_STARTEND,
ignoring
n.
- REG_NEWLINE
-
Match-any-character operators don't match a newline.
-
A nonmatching list
([^...])
not containing a newline does not match a newline.
-
Match-beginning-of-line operator
(^)
matches the empty string immediately after a newline, regardless of
whether
eflags,
the execution flags of
regexec(),
contains
REG_NOTBOL.
-
Match-end-of-line operator
($)
matches the empty string immediately before a newline, regardless of
whether
eflags
contains
REG_NOTEOL.
Matching
regexec()
is used to match a nul-terminated string
against the compiled pattern buffer in
*preg,
which must have been initialized with
regcomp().
eflags
is the
bitwise OR
of zero or more of the following flags:
- REG_NOTBOL
-
The match-beginning-of-line operator always fails to match (but see the
compilation flag
REG_NEWLINE
above).
This flag may be used when different portions of a string are passed to
regexec()
and the beginning of the string should not be interpreted as the
beginning of the line.
- REG_NOTEOL
-
The match-end-of-line operator always fails to match (but see the
compilation flag
REG_NEWLINE
above).
- REG_STARTEND
-
Match
[string + pmatch[0].rm_so, string + pmatch[0].rm_eo)
instead of
[string, string + strlen(string)).
This allows matching embedded NUL bytes
and avoids a
strlen(3)
on known-length strings.
If any matches are returned
(REG_NOSUB
wasn't passed to
regcomp(),
the match succeeded, and
n
> 0), they overwrite
pmatch
as usual, and the match offsets remain relative to
string
(not
string + pmatch[0].rm_so).
This flag is a BSD extension, not present in POSIX.
Match offsets
Unless
REG_NOSUB
was passed to
regcomp(),
it is possible to
obtain the locations of matches within
string:
regexec()
fills
n
elements of
pmatch
with results:
pmatch[0]
corresponds to the entire match,
pmatch[1]
to the first subexpression, etc.
If there were more matches than
n,
they are discarded;
if fewer,
unused elements of
pmatch
are filled with
-1s.
Each valid match returned
(that is, one whose offsets are not -1)
corresponds to the range
[string + rm_so, string + rm_eo).
regoff_t
is a signed integer type
capable of storing the largest value that can be stored in either a
ptrdiff_t
type or a
ssize_t
type.
Error reporting
regerror()
is used to turn the error codes that can be returned by both
regcomp()
and
regexec()
into error message strings.
If
preg
isn't a null pointer,
errcode
must be the latest error returned from an operation on
preg.
If
errbuf_size
isn't 0, up to
errbuf_size
bytes are copied to
errbuf;
the error string is always nul-terminated, and truncated to fit.
Freeing
regfree()
deinitializes the pattern buffer at
*preg,
freeing any associated memory;
*preg
must have been initialized via
regcomp().
RETURN VALUE
regcomp()
returns zero for a successful compilation or an error code for failure.
regexec()
returns zero for a successful match or
REG_NOMATCH
for failure.
regerror()
returns the size of the buffer required to hold the string.
ERRORS
The following errors can be returned by
regcomp():
- REG_BADBR
-
Invalid use of back reference operator.
- REG_BADPAT
-
Invalid use of pattern operators such as group or list.
- REG_BADRPT
-
Invalid use of repetition operators such as using '*'
as the first character.
- REG_EBRACE
-
Unmatched brace interval operators.
- REG_EBRACK
-
Unmatched bracket list operators.
- REG_ECOLLATE
-
Invalid collating element.
- REG_ECTYPE
-
Unknown character class name.
- REG_EEND
-
Nonspecific error.
This is not defined by POSIX.
- REG_EESCAPE
-
Trailing backslash.
- REG_EPAREN
-
Unmatched parenthesis group operators.
- REG_ERANGE
-
Invalid use of the range operator;
for example,
the ending point of the range
occurs prior to the starting point.
- REG_ESIZE
-
Compiled regular expression requires a pattern buffer larger than 64 kB.
This is not defined by POSIX.
- REG_ESPACE
-
The regex routines ran out of memory.
- REG_ESUBREG
-
Invalid back reference to a subexpression.
ATTRIBUTES
For an explanation of the terms used in this section, see
attributes(7).
| Interface | Attribute | Value |
| regcomp(), regexec() | Thread safety | MT-Safe locale |
| regerror() | Thread safety | MT-Safe env |
| regfree() | Thread safety | MT-Safe |
STANDARDS
POSIX.1-2008.
HISTORY
POSIX.1-2001.
Prior to POSIX.1-2008,
regoff_t
was required to be
capable of storing the largest value that can be stored in either an
off_t
type or a
ssize_t
type.
CAVEATS
re_nsub
is only required to be initialized if
REG_NOSUB
wasn't specified, but all known implementations initialize it regardless.
Both
regex_t
and
regmatch_t
may (and do) have more members, in any order.
Always reference them by name.
EXAMPLES
#include <stdcountof.h>   /* countof() (C2y) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

static const char *const str =
    "1) John Driverhacker;\n2) John Doe;\n3) John Foo;\n";
static const char *const re = "John.*o";

int main(void)
{
    static const char *s = str;
    regex_t regex;
    regmatch_t pmatch[1];
    regoff_t off, len;

    if (regcomp(&regex, re, REG_NEWLINE))
        exit(EXIT_FAILURE);

    printf("String = \"%s\"\n", str);
    printf("Matches:\n");

    for (unsigned int i = 0; ; i++) {
        if (regexec(&regex, s, countof(pmatch), pmatch, 0))
            break;

        /* Offsets in pmatch are relative to s; convert to str. */
        off = pmatch[0].rm_so + (s - str);
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        printf("#%u:\n", i);
        printf("offset = %jd; length = %jd\n", (intmax_t) off,
               (intmax_t) len);
        /* The precision argument for %.*s must be an int. */
        printf("substring = \"%.*s\"\n", (int) len, s + pmatch[0].rm_so);

        s += pmatch[0].rm_eo;
    }

    regfree(&regex);
    exit(EXIT_SUCCESS);
}
SEE ALSO
grep(1),
regex(7)
The glibc manual section,
Regular Expressions