PCRE — Perl-compatible regular expressions
In normal use of PCRE, if the subject string that is
      passed to pcre_exec() or pcre_dfa_exec() matches as
      far as it goes, but is too short to match the entire pattern,
      PCRE_ERROR_NOMATCH is returned. There are circumstances where
      it might be helpful to distinguish this case from other cases
      in which there is no match.
Consider, for example, an application where a human is
      required to type in data for a field with specific formatting
      requirements. An example might be a date in the form
      ddmmmyy, defined by
      this pattern:
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
If the application sees the user's keystrokes one by one, and can check that what has been typed so far is potentially valid, it is able to raise an error as soon as a mistake is made, possibly beeping and not reflecting the character that has been typed. This immediate feedback is likely to be a better user interface than a check that is delayed until the entire string has been entered.
PCRE supports the concept of partial matching by means of
      the PCRE_PARTIAL option, which can be set when calling
      pcre_exec() or
      pcre_dfa_exec().
      When this flag is set for pcre_exec(), the return code
      PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at
      any time during the matching process the last part of the
      subject string matched part of the pattern. Unfortunately,
      for non-anchored matching, it is not possible to obtain the
      position of the start of the partial match. No captured data
      is set when PCRE_ERROR_PARTIAL is returned.
When PCRE_PARTIAL is set for pcre_dfa_exec(), the return
      code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL
      if the end of the subject is reached, there have been no
      complete matches, but there is still at least one matching
      possibility. The portion of the string that provided the
      partial match is set as the first matching string.
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the last literal byte in a pattern, and abandons matching immediately if such a byte is not present in the subject string. This optimization cannot be used for a subject string that might match only partially.
Because of the way certain internal optimizations are
      implemented in the pcre_exec() function, the
      PCRE_PARTIAL option cannot be used with all patterns. These
      restrictions do not apply when pcre_dfa_exec() is used. For
      pcre_exec(),
      repeated single characters such as
a{2,4}
      and repeated single metasequences such as
\d+
are not permitted if the maximum number of occurrences is greater than one. Optional items such as \d? (where the maximum is one) are permitted. Quantifiers with any values are permitted after parentheses, so the invalid examples above can be coded thus:
 (a){2,4}
 (\d)+
      These constructions run more slowly, but for the kinds of application that are envisaged for this facility, this is not felt to be a major restriction.
If PCRE_PARTIAL is set for a pattern that does not conform
      to the restrictions, pcre_exec() returns the error
      code PCRE_ERROR_BADPARTIAL (-13). You can use the
      PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out
      if a compiled pattern can be used for partial matching.
If the escape sequence \P is present in a pcretest data line, the PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that uses the date example quoted above:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ data> 25jun04\P 0: 25jun04 1: jun data> 25dec3\P Partial match data> 3ju\P Partial match data> 3juj\P No match data> j\P No match
The first data string is matched completely, so
      pcretest shows
      the matched substrings. The remaining four strings do not
      match the complete pattern, but the first two are partial
      matches. The same test, using pcre_dfa_exec() matching (by
      means of the \D escape sequence), produces the following
      output:
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ data> 25jun04\P\D 0: 25jun04 data> 23dec3\P\D Partial match: 23dec3 data> 3ju\P\D Partial match: 3ju data> 3juj\P\D No match data> j\P\D No match
Notice that in this case the portion of the string that was matched is made available.
When a partial match has been found using pcre_dfa_exec(), it is
      possible to continue the match by providing additional
      subject data and calling pcre_dfa_exec() again with
      the same compiled regular expression, this time setting the
      PCRE_DFA_RESTART option. You must also pass the same working
      space as before, because this is where details of the
      previous partial match are stored. Here is an example using
      pcretest, using
      the \R escape sequence to set the PCRE_DFA_RESTART option (\P
      and \D are as above):
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ data> 23ja\P\D Partial match: 23ja data> n05\R\D 0: n05
The first call has "23ja" as the subject, and requests partial matching; the second call has "n05" as the subject for the continued (restarted) match. Notice that when the match is complete, only the last part is shown; PCRE does not retain the previously partially-matched string. It is up to the calling program to do that if it needs to.
You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue
      partial matching over multiple segments. This facility can be
      used to pass very long subject strings to pcre_dfa_exec(). However,
      some care is needed for certain types of pattern.
1. If the pattern contains tests for the beginning or end of a line, you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the subject string for any call does not contain the beginning or end of a line.
2. If the pattern contains backward assertions (including \b or \B), you need to arrange for some overlap in the subject strings to allow for this. For example, you could pass the subject in chunks that are 500 bytes long, but in a buffer of 700 bytes, with the starting offset set to 200 and the previous 200 bytes at the start of the buffer.
3. Matching a subject string that is split into multiple
      segments does not always produce exactly the same result as
      matching over one single long string. The difference arises
      when there are multiple matching possibilities, because a
      partial match result is given only when there are no
      completed matches in a call to pcre_dfa_exec(). This means
      that as soon as the shortest match has been found,
      continuation to a new subject segment is no longer possible.
      Consider this pcretest example:
re> /dog(sbody)?/ data> do\P\D Partial match: do data> gsb\R\P\D 0: g data> dogsbody\D 0: dogsbody 1: dog
The pattern matches the words "dog" or "dogsbody". When the subject is presented in several parts ("do" and "gsb" being the first two) the match stops when "dog" has been found, and it is not possible to continue. On the other hand, if "dogsbody" is presented as a single string, both matches are found.
Because of this phenomenon, it does not usually make sense to end a pattern that is going to be matched in this way with a variable repeat.
4. Patterns that contain alternatives at the top level which do not all start with the same pattern item may not work as expected. For example, consider this pattern:
1234|3789
If the first part of the subject is "ABC123", a partial match of the first alternative is found at offset 3. There is no partial match for the second alternative, because such a match does not start at the same point in the subject string. Attempting to continue with the string "789" does not yield a match because only those alternatives that match at one point in the subject are remembered. The problem arises because the start of the second alternative matches within the first alternative. There is no problem with anchored patterns or patterns such as:
1234|ABCD
where no string can be a partial match for both alternatives.
Last updated: 04 June 2007 Copyright (c) 1997-2007 University of Cambridge.
| COPYRIGHT | 
|---|
| This manual page is taken from the PCRE library, which is distributed under the BSD license. |