/* Void Main's man pages */

{ phpMan } else { main(); }

Command: man perldoc info search(apropos)  


REGCOMP(3P)                                         POSIX Programmer's Manual                                        REGCOMP(3P)



PROLOG
       This  manual  page is part of the POSIX Programmer's Manual.  The Linux implementation of this interface may differ (con-
       sult the corresponding Linux manual page for details of Linux behavior), or the  interface  may  not  be  implemented  on
       Linux.

NAME
       regcomp, regerror, regexec, regfree - regular expression matching

SYNOPSIS
       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
              int cflags);
       size_t regerror(int errcode, const regex_t *restrict preg,
              char *restrict errbuf, size_t errbuf_size);
       int regexec(const regex_t *restrict preg, const char *restrict string,
              size_t nmatch, regmatch_t pmatch[restrict], int eflags);
       void regfree(regex_t *preg);


DESCRIPTION
       These  functions  interpret  basic  and  extended  regular  expressions  as  described  in the Base Definitions volume of
       IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.

       The regex_t structure is defined in <regex.h> and contains at least the following member:

                                   Member Type  Member Name  Description
                                   size_t       re_nsub      Number of parenthesized subexpressions.

       The regmatch_t structure is defined in <regex.h> and contains at least the following members:

                                   Member Type Member Name Description
                                   regoff_t    rm_so       Byte offset from start of string to
                                                           start of substring.
                                   regoff_t    rm_eo       Byte offset from start of string of the
                                                           first character after the end of sub-
                                                           string.

       The  regcomp()  function  shall compile the regular expression contained in the string pointed to by the pattern argument
       and place the results in the structure pointed to by preg.  The cflags argument is the bitwise-inclusive OR  of  zero  or
       more of the following flags, which are defined in the <regex.h> header:

       REG_EXTENDED
              Use Extended Regular Expressions.

       REG_ICASE
              Ignore case in match. (See the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.)

       REG_NOSUB
              Report only success/fail in regexec().

       REG_NEWLINE
              Change the handling of <newline>s, as described in the text.


       The default regular expression type for pattern is a Basic Regular Expression. The application can specify Extended Regu-
       lar Expressions using the REG_EXTENDED cflags flag.

       If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to the number of  parenthesized  subexpres-
       sions (delimited by "\(\)" in basic regular expressions or "()" in extended regular expressions) found in pattern.

       The  regexec() function compares the null-terminated string specified by string with the compiled regular expression preg
       initialized by a previous call to regcomp().  If it finds a match, regexec() shall return 0; otherwise, it  shall  return
       non-zero  indicating  either no match or an error. The eflags argument is the bitwise-inclusive OR of zero or more of the
       following flags, which are defined in the <regex.h> header:

       REG_NOTBOL
              The first character of the string pointed to by string is not the beginning of the line. Therefore, the circumflex
              character ( '^' ), when taken as a special character, shall not match the beginning of string.

       REG_NOTEOL
              The last character of the string pointed to by string is not the end of the line. Therefore, the dollar sign ( '$'
              ), when taken as a special character, shall not match the end of string.


       If nmatch is 0 or REG_NOSUB was set in the cflags argument to regcomp(), then regexec() shall ignore the pmatch argument.
       Otherwise,  the  application  shall ensure that the pmatch argument points to an array with at least nmatch elements, and
       regexec() shall fill in the elements of that array with offsets of the substrings of string that correspond to the paren-
       thesized  subexpressions  of  pattern:  pmatch[ i]. rm_so shall be the byte offset of the beginning and pmatch[ i]. rm_eo
       shall be one greater than the byte offset of the end of substring i. (Subexpression i begins  at  the  ith  matched  open
       parenthesis, counting from 1.) Offsets in pmatch[0] identify the substring that corresponds to the entire regular expres-
       sion. Unused elements of pmatch up to pmatch[ nmatch-1] shall be filled with -1. If there are more than nmatch subexpres-
       sions  in  pattern ( pattern itself counts as a subexpression), then regexec() shall still do the match, but shall record
       only the first nmatch substrings.

       When matching a basic or extended regular expression, any given parenthesized subexpression of pattern might  participate
       in  the match of several different substrings of string, or it might not match any substring even though the pattern as a
       whole did match. The following rules shall be used to determine which substrings to report in pmatch when matching  regu-
       lar expressions:

        1. If  subexpression i in a regular expression is not contained within another subexpression, and it participated in the
           match several times, then the byte offsets in pmatch[ i] shall delimit the last such match.

        2. If subexpression i is not contained within another subexpression, and it did not participate in an otherwise success-
           ful match, the byte offsets in pmatch[ i] shall be -1. A subexpression does not participate in the match when: '*' or
           "\{\}" appears immediately after the subexpression in a basic regular expression, or '*', '?', or "{}" appears  imme-
           diately  after  the  subexpression  in an extended regular expression, and the subexpression did not match (matched 0
           times)

       or: '|' is used in an extended regular expression to select this subexpression or another, and  the  other  subexpression
       matched.

        3. If subexpression i is contained within another subexpression j, and i is not contained within any other subexpression
           that is contained within j, and a match of subexpression j is reported in pmatch[ j], then the match or non-match  of
           subexpression  i  reported in pmatch[ i] shall be as described in 1. and 2.  above, but within the substring reported
           in pmatch[ j] rather than the whole string. The offsets in pmatch[ i] are still relative to the start of string.

        4. If subexpression i is contained in subexpression j, and the byte offsets in pmatch[ j] are -1, then the  pointers  in
           pmatch[ i] shall also be -1.

        5. If subexpression i matched a zero-length string, then both byte offsets in pmatch[ i] shall be the byte offset of the
           character or null terminator immediately following the zero-length string.

       If, when regexec() is called, the locale is different from when the regular expression was compiled, the result is  unde-
       fined.

       If  REG_NEWLINE is not set in cflags, then a <newline> in pattern or string shall be treated as an ordinary character. If
       REG_NEWLINE is set, then <newline> shall be treated as an ordinary character except as follows:

        1. A <newline> in string shall not be matched by a period outside a bracket expression or by any form of a  non-matching
           list (see the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions).

        2. A  circumflex  (  '^'  )  in  pattern,  when used to specify expression anchoring (see the Base Definitions volume of
           IEEE Std 1003.1-2001, Section 9.3.8, BRE Expression Anchoring), shall match the zero-length string immediately  after
           a <newline> in string, regardless of the setting of REG_NOTBOL.

        3. A dollar sign ( '$' ) in pattern, when used to specify expression anchoring, shall match the zero-length string imme-
           diately before a <newline> in string, regardless of the setting of REG_NOTEOL.

       The regfree() function frees any memory allocated by regcomp() associated with preg.

       The following constants are defined as error return values:

       REG_NOMATCH
              regexec() failed to match.

       REG_BADPAT
              Invalid regular expression.

       REG_ECOLLATE
              Invalid collating element referenced.

       REG_ECTYPE
              Invalid character class type referenced.

       REG_EESCAPE
              Trailing '\' in pattern.

       REG_ESUBREG
              Number in "\digit" invalid or in error.

       REG_EBRACK
              "[]" imbalance.

       REG_EPAREN
              "\(\)" or "()" imbalance.

       REG_EBRACE
              "\{\}" imbalance.

       REG_BADBR
              Content of "\{\}" invalid: not a number, number too large, more than two numbers, first larger than second.

       REG_ERANGE
              Invalid endpoint in range expression.

       REG_ESPACE
              Out of memory.

       REG_BADRPT
              '?', '*', or '+' not preceded by valid regular expression.


       The regerror() function provides a mapping from error codes returned by regcomp() and regexec() to unspecified  printable
       strings.  It generates a string corresponding to the value of the errcode argument, which the application shall ensure is
       the last non-zero value returned by regcomp() or regexec() with the given value of preg. If errcode is not such a  value,
       the content of the generated string is unspecified.

       If  preg  is a null pointer, but errcode is a value returned by a previous call to regexec() or regcomp(), the regerror()
       still generates an error string corresponding to the value of errcode, but it might not be as detailed under some  imple-
       mentations.

       If  the  errbuf_size  argument  is not 0, regerror() shall place the generated string into the buffer of size errbuf_size
       bytes pointed to by errbuf. If the string (including the terminating null) cannot fit in  the  buffer,  regerror()  shall
       truncate the string and null-terminate the result.

       If  errbuf_size  is  0, regerror() shall ignore the errbuf argument, and return the size of the buffer needed to hold the
       generated string.

       If the preg argument to regexec() or regfree() is not a compiled regular expression returned by regcomp(), the result  is
       undefined. A preg is no longer treated as a compiled regular expression after it is given to regfree().

RETURN VALUE
       Upon successful completion, the regcomp() function shall return 0. Otherwise, it shall return an integer value indicating
       an error as described in <regex.h>, and the content of preg is undefined. If a code is returned, the interpretation shall
       be as given in <regex.h>.

       If regcomp() detects an invalid RE, it may return REG_BADPAT, or it may return one of the error codes that more precisely
       describes the error.

       Upon successful completion, the regexec() function shall return 0. Otherwise, it shall return REG_NOMATCH to indicate  no
       match.

       Upon  successful completion, the regerror() function shall return the number of bytes needed to hold the entire generated
       string, including the null termination. If the return value is greater than errbuf_size, the string returned in the  buf-
       fer pointed to by errbuf has been truncated.

       The regfree() function shall not return a value.

ERRORS
       No errors are defined.

       The following sections are informative.

EXAMPLES
              #include <regex.h>


              /*
               * Match string against the extended regular expression in
               * pattern, treating errors as no match.
               *
               * Return 1 for match, 0 for no match.
               */


              int
              match(const char *string, char *pattern)
              {
                  int    status;
                  regex_t    re;


                  if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                      return(0);      /* Report error. */
                  }
                  status = regexec(&re, string, (size_t) 0, NULL, 0);
                  regfree(&re);
                  if (status != 0) {
                      return(0);      /* Report error. */
                  }
                  return(1);
              }

       The  following  demonstrates  how  the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that
       match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)


              (void) regcomp (&re, pattern, 0);
              /* This call to regexec() finds the first match on the line. */
              error = regexec (&re, &buffer[0], 1, &pm, 0);
              while (error == 0) {  /* While matches found. */
                  /* Substring found between pm.rm_so and pm.rm_eo. */
                  /* This call to regexec() finds the next match. */
                  error = regexec (&re, buffer + pm.rm_eo, 1, &pm, REG_NOTBOL);
              }

APPLICATION USAGE
       An application could use:


              regerror(code,preg,(char *)NULL,(size_t)0)

       to find out how big a buffer is needed for the generated string, malloc() a buffer to hold  the  string,  and  then  call
       regerror()  again  to  get the string. Alternatively, it could allocate a fixed, static buffer that is big enough to hold
       most strings, and then use malloc() to allocate a larger buffer if it finds that this is too small.

       To match a pattern as described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching
       Notation, use the fnmatch() function.

RATIONALE
       The  regexec()  function must fill in all nmatch elements of pmatch, where nmatch and pmatch are supplied by the applica-
       tion, even if some elements of pmatch do not correspond to subexpressions in pattern. The application writer should  note
       that there is probably no reason for using a value of nmatch that is larger than preg-> re_nsub+1.

       The  REG_NEWLINE flag supports a use of RE matching that is needed in some applications like text editors. In such appli-
       cations, the user supplies an RE asking the application to find a line that matches the given expression.  An  anchor  in
       such  an  RE  anchors at the beginning or end of any line. Such an application can pass a sequence of <newline>-separated
       lines to regexec() as a single long string and specify REG_NEWLINE to regcomp() to get the desired behavior. The applica-
       tion  must  ensure  that there are no explicit <newline>s in pattern if it wants to ensure that any match occurs entirely
       within a single line.

       The REG_NEWLINE flag affects the behavior of regexec(), but it is in the cflags parameter to regcomp() to allow flexibil-
       ity  of  implementation.  Some  implementations will want to generate the same compiled RE in regcomp() regardless of the
       setting of REG_NEWLINE and have regexec() handle anchors differently based on the setting of the flag. Other  implementa-
       tions will generate different compiled REs based on the REG_NEWLINE.

       The  REG_ICASE  flag supports the operations taken by the grep -i option and the historical implementations of ex and vi.
       Including this flag will make it easier for application code to be written that does the same thing as these utilities.

       The substrings reported in pmatch[] are defined using offsets from the start of the string rather  than  pointers.  Since
       this  is  a new interface, there should be no impact on historical implementations or applications, and offsets should be
       just as easy to use as pointers. The change to offsets was made to facilitate future extensions in which the string to be
       searched is presented to regexec() in blocks, allowing a string to be searched that is not all in memory at once.

       The  type  regoff_t  is used for the elements of pmatch[] to ensure that the application can represent either the largest
       possible  array  in  memory  (important  for  an  application  conforming  to  the  Shell   and   Utilities   volume   of
       IEEE Std 1003.1-2001)  or  the  largest  possible  file (important for an application using the extension where a file is
       searched in chunks).

       The standard developers rejected the inclusion of a regsub() function that would  be  used  to  do  substitutions  for  a
       matched  RE.  While  such a routine would be useful to some applications, its utility would be much more limited than the
       matching function described here. Both RE parsing and substitution are possible to implement without support  other  than
       that  required  by  the  ISO C standard, but matching is much more complex than substituting.  The only difficult part of
       substitution, given the information supplied by regexec(), is finding the next character in a string when  there  can  be
       multi-byte characters. That is a much larger issue, and one that needs a more general solution.

       The errno variable has not been used for error returns to avoid filling the errno name space for this feature.

       The interface is defined so that the matched substrings rm_sp and rm_ep are in a separate regmatch_t structure instead of
       in regex_t. This allows a single compiled RE to be used simultaneously in several contexts; in main() and a  signal  han-
       dler,  perhaps,  or  in  multiple threads of lightweight processes. (The preg argument to regexec() is declared with type
       const, so the implementation is not permitted to use the structure to store intermediate  results.)  It  also  allows  an
       application  to  request an arbitrary number of substrings from an RE. The number of subexpressions in the RE is reported
       in re_nsub in preg.  With this change to regexec(), consideration was given to dropping the REG_NOSUB flag since the user
       can  now  specify  this with a zero nmatch argument to regexec().  However, keeping REG_NOSUB allows an implementation to
       use a different (perhaps more efficient) algorithm if it knows in regcomp() that no subexpressions need be reported.  The
       implementation  is only required to fill in pmatch if nmatch is not zero and if REG_NOSUB is not specified. Note that the
       size_t type, as defined in the ISO C standard, is unsigned, so the description of regexec() does not need to address neg-
       ative values of nmatch.

       REG_NOTBOL  was added to allow an application to do repeated searches for the same pattern in a line. If the pattern con-
       tains a circumflex character that should match the beginning of a line, then the pattern should only match  when  matched
       against  the  beginning of the line. Without the REG_NOTBOL flag, the application could rewrite the expression for subse-
       quent matches, but in the general case this would require parsing the expression. The  need  for  REG_NOTEOL  is  not  as
       clear; it was added for symmetry.

       The  addition of the regerror() function addresses the historical need for conforming application programs to have access
       to error information more than "Function failed to compile/match your RE for unknown reasons".

       This interface provides for two different methods of dealing with error conditions. The specific error codes (REG_EBRACE,
       for  example),  defined  in <regex.h>, allow an application to recover from an error if it is so able. Many applications,
       especially those that use patterns supplied by a user, will not try to deal with specific error cases, but will just  use
       regerror() to obtain a human-readable error message to present to the user.

       The regerror() function uses a scheme similar to confstr() to deal with the problem of allocating memory to hold the gen-
       erated string. The scheme used by strerror() in the ISO C standard was considered unacceptable since it creates difficul-
       ties for multi-threaded applications.

       The  preg argument is provided to regerror() to allow an implementation to generate a more descriptive message than would
       be possible with errcode alone. An implementation might, for example, save the character offset of the offending  charac-
       ter of the pattern in a field of preg, and then include that in the generated message string. The implementation may also
       ignore preg.

       A REG_FILENAME flag was considered, but omitted. This flag caused regexec() to match patterns as described in  the  Shell
       and Utilities volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation instead of REs. This service is now
       provided by the fnmatch() function.

       Notice that there is a difference in philosophy between the ISO POSIX-2:1993 standard and IEEE Std 1003.1-2001 in how  to
       handle  a  "bad"  regular  expression.  The  ISO POSIX-2:1993  standard  says that many bad constructs "produce undefined
       results", or that "the interpretation is undefined". IEEE Std 1003.1-2001, however, says that the interpretation of  such
       REs  is  unspecified.  The  term "undefined" means that the action by the application is an error, of similar severity to
       passing a bad pointer to a function.

       The regcomp() and regexec() functions are required to accept any null-terminated string as the pattern argument.  If  the
       meaning of the string is "undefined", the behavior of the function is "unspecified".  IEEE Std 1003.1-2001 does not spec-
       ify how the functions will interpret the pattern; they might return error codes, or they might  do  pattern  matching  in
       some completely unexpected way, but they should not do something like abort the process.

FUTURE DIRECTIONS
       None.

SEE ALSO
       fnmatch(), glob(), Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching Notation, Base Def-
       initions volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions, <regex.h>, <sys/types.h>

COPYRIGHT
       Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1, 2003  Edition,  Standard  for
       Information  Technology -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 6, Copy-
       right (C) 2001-2003 by the Institute of Electrical and Electronics Engineers, Inc and The Open Group. In the event of any
       discrepancy  between this version and the original IEEE and The Open Group Standard, the original IEEE and The Open Group
       Standard  is  the  referee   document.   The   original   Standard   can   be   obtained   online   at   http://www.open-
       group.org/unix/online.html .



IEEE/The Open Group                                           2003                                                   REGCOMP(3P)

Valid XHTML 1.0!Valid CSS!