sync code with last improvements from OpenBSD
commit
88965415ff
26235 changed files with 29195616 additions and 0 deletions
121
app/xedit/lisp/re/README
Normal file
|
@ -0,0 +1,121 @@
|
|||
$XFree86: xc/programs/xedit/lisp/re/README,v 1.3 2002/09/23 01:25:41 paulo Exp $
|
||||
|
||||
LAST UPDATED: $Date: 2006/11/25 20:35:00 $
|
||||
|
||||
This is a small regex library for fast matching of tokens in text. It was
built to be used by xedit and its syntax highlighting code. It is not
compliant with IEEE Std 1003.2, but is meant to be used where very fast
matching is required and exotic patterns will not be used.
|
||||
|
||||
To understand what kind of patterns this library is expected to be used with,
|
||||
see the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and some
|
||||
samples in the file tests.txt, with comments for patterns that will not work,
|
||||
or may give incorrect results.
|
||||
|
||||
The library is not built upon the standard regex library by Henry Spencer;
it is written completely from scratch. Its syntax, however, is heavily based
on that library, and the only reason for it to exist is that, unfortunately,
the standard version does not fit the requirements of xedit.
Anyway, I would like to thank Henry for his regex library; it is a really
useful tool.
|
||||
|
||||
Small description of understood tokens:
|
||||
|
||||
M A T C H I N G
|
||||
------------------------------------------------------------------------
|
||||
. Any character (won't match newline if compiled with RE_NEWLINE)
|
||||
\w Any word letter (shortcut to [a-zA-Z0-9_])
|
||||
\W Not a word letter (shortcut to [^a-zA-Z0-9_])
|
||||
\d Decimal number
|
||||
\D Not a decimal number
|
||||
\s A space
|
||||
\S Not a space
|
||||
\l A lower case letter
|
||||
\u An upper case letter
|
||||
\c A control character, currently the range 1-32 (minus tab)
|
||||
\C Not a control character
|
||||
\o Octal number
|
||||
\O Not an octal number
|
||||
\x Hexadecimal number
|
||||
\X Not a hexadecimal number
|
||||
\< Beginning of a word (matches an empty string)
|
||||
\> End of a word (matches an empty string)
|
||||
^ Beginning of a line (matches an empty string)
|
||||
$ End of a line (matches an empty string)
|
||||
[...] Matches one of the characters inside the brackets
|
||||
ranges are specified separating two characters with "-".
|
||||
If the first character is "^", matches only if the
|
||||
character is not in this range. To add a "]" make it
|
||||
the first character, and to add a "-" make it the last.
|
||||
\1 to \9 Backreference, matches the text that was matched by a group,
|
||||
that is, text that was matched by the pattern inside
|
||||
"(" and ")".
|
||||
|
||||
|
||||
O P E R A T O R S
|
||||
------------------------------------------------------------------------
|
||||
() Any pattern inside works as a backreference, and is also
|
||||
used to group patterns.
|
||||
| Alternation, allows choosing different possibilities, like
|
||||
character ranges, but allows patterns of different lengths.
|
||||
|
||||
|
||||
R E P E T I T I O N
|
||||
------------------------------------------------------------------------
|
||||
<re>* <re> may occur any number of times, including zero
|
||||
<re>+ <re> must occur at least once
|
||||
<re>? <re> is optional
|
||||
<re>{<e>} <re> must occur exactly <e> times
|
||||
<re>{<n>,} <re> must occur at least <n> times
|
||||
<re>{,<m>} <re> must not occur more than <m> times
|
||||
<re>{<n>,<m>} <re> must occur at least <n> times, but no more than <m>
|
||||
|
||||
|
||||
Note that "." is a special character, and when used with a repetition
operator its meaning changes completely. For example, ".*" matches
anything up to the end of the input string (unless the pattern was compiled
with RE_NEWLINE, in which case it will match anything but a newline).
|
||||
|
||||
|
||||
Limitations:
|
||||
|
||||
o Only minimal matches are supported. The engine has only one level of
  "backtracking", so it also only does minimal matches, to allow
  backreferences to work properly and to avoid failing to match depending
  on the input.
|
||||
|
||||
o Only one level of "grouping" is supported. For example, with the pattern:
	(a(b)c)
  If "abc" is anywhere in the input, it will be in "\1", but there will
  be no "\2" for "b".
|
||||
|
||||
o Some "special repetitions" were not implemented. These are:
|
||||
.{<e>}
|
||||
.{<n>,}
|
||||
.{,<m>}
|
||||
.{<n>,<m>}
|
||||
|
||||
o Some patterns will never match, for example:
|
||||
\w*\d
|
||||
Since "\w*" already includes all possible matches of "\d", "\d" will
|
||||
only be tested when "\w*" failed. There are no plans to make such
|
||||
patterns work.
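
The "\w*\d" case above can be illustrated with a small stand-alone sketch.
This is not code from the library: the helper names is_word and
match_word_star_digit are hypothetical, and the function only mimics the
behaviour described above, assuming that "\w*" consumes every word
character and never gives one back.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: is c matched by \w? */
static int is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/*
 * Try to match "\w*\d" at the start of s without ever giving
 * characters back. Returns the match length, or -1 for no match.
 */
static long match_word_star_digit(const char *s)
{
    size_t i = 0;

    /* \w* consumes every word character, digits included. */
    while (s[i] != '\0' && is_word((unsigned char)s[i]))
        i++;

    /* \d is only tried here, where s[i] is by construction not a
     * word character, and therefore can never be a digit. */
    if (s[i] != '\0' && isdigit((unsigned char)s[i]))
        return (long)(i + 1);

    return -1;
}
```

With this sketch, inputs such as "abc1" or "42" both report no match, which
is why a pattern like "\w*\d" has to be rewritten (for example as a
character class followed by a separate test) when used with this kind of
engine.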
|
||||
|
||||
|
||||
Some of these limitations may be addressed in future versions of the library,
but this is not what the library is expected to do, and adding support for
correct handling of these cases would probably make the library slower, which
defeats the reason for it to exist in the first place.
|
||||
|
||||
If you need a "true" regex, then this library is not for you. But if all
you need is support for very quickly finding simple patterns, then this
library can be a very powerful tool: on some patterns it can run more
than 200 times faster than "true" regex implementations! And this is
the reason it was written.
|
||||
|
||||
|
||||
|
||||
Send comments and code to me (paulo@XFree86.Org) or to the XFree86
|
||||
mailing/patch lists.
|
||||
|
||||
--
|
||||
Paulo
|
2649
app/xedit/lisp/re/re.c
Normal file
File diff suppressed because it is too large
123
app/xedit/lisp/re/re.h
Normal file
|
@ -0,0 +1,123 @@
|
|||
/*
|
||||
* Copyright (c) 2002 by The XFree86 Project, Inc.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||||
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
|
||||
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*
|
||||
* Except as contained in this notice, the name of the XFree86 Project shall
|
||||
* not be used in advertising or otherwise to promote the sale, use or other
|
||||
* dealings in this Software without prior written authorization from the
|
||||
* XFree86 Project.
|
||||
*
|
||||
* Author: Paulo César Pereira de Andrade
|
||||
*/
|
||||
|
||||
/* $XFree86: xc/programs/xedit/lisp/re/re.h,v 1.1 2002/09/08 02:29:50 paulo Exp $ */
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <ctype.h>
|
||||
|
||||
#ifndef _re_h
|
||||
#define _re_h
|
||||
|
||||
/*
|
||||
* Defines
|
||||
*/
|
||||
|
||||
/* Compile flags options */
|
||||
#define REG_BASIC 0000 /* Not used */
|
||||
#define REG_EXTENDED 0001 /* Not used, only extended supported */
|
||||
|
||||
#define RE_ICASE 0002
|
||||
#define RE_NOSUB 0004
|
||||
#define RE_NEWLINE 0010
|
||||
#define RE_NOSPEC 0020
|
||||
#define RE_PEND 0040
|
||||
#define RE_DUMP 0200
|
||||
|
||||
|
||||
|
||||
/* Execute flag options */
|
||||
#define RE_NOTBOL 1
|
||||
#define RE_NOTEOL 2
|
||||
#define RE_STARTEND 4
|
||||
#define RE_TRACE 00400 /* Not used/supported */
|
||||
#define RE_LARGE 01000 /* Not used/supported */
|
||||
#define RE_BACKR 02000 /* Not used/supported */
|
||||
|
||||
/* Value returned by reexec when match fails */
|
||||
#define RE_NOMATCH 1
|
||||
/* Compile error values */
|
||||
#define RE_BADPAT 2
|
||||
#define RE_ECOLLATE 3
|
||||
#define RE_ECTYPE 4
|
||||
#define RE_EESCAPE 5
|
||||
#define RE_ESUBREG 6
|
||||
#define RE_EBRACK 7
|
||||
#define RE_EPAREN 8
|
||||
#define RE_EBRACE 9
|
||||
#define RE_EBADBR 10
|
||||
#define RE_ERANGE 11
|
||||
#define RE_ESPACE 12
|
||||
#define RE_BADRPT 13
|
||||
#define RE_EMPTY 14
|
||||
#define RE_ASSERT 15
|
||||
#define RE_INVARG 16
|
||||
#define RE_ATOI 255 /* Not used/supported */
|
||||
#define RE_ITOA 0400 /* Not used/supported */
|
||||
|
||||
|
||||
/*
|
||||
* Types
|
||||
*/
|
||||
|
||||
/* (re)gular expression (mat)ch result */
|
||||
typedef struct _re_mat {
|
||||
long rm_so;
|
||||
long rm_eo;
|
||||
} re_mat;
|
||||
|
||||
/* (re)gular expression (cod)e */
|
||||
typedef struct _re_cod {
|
||||
unsigned char *cod;
|
||||
int re_nsub; /* Public member */
|
||||
const char *re_endp; /* Support for RE_PEND */
|
||||
} re_cod;
|
||||
|
||||
|
||||
/*
|
||||
* Prototypes
|
||||
*/
|
||||
/* compile the given pattern string
|
||||
* returns 0 on success, error code otherwise */
|
||||
int recomp(re_cod *preg, const char *pattern, int flags);
|
||||
|
||||
/* execute the compiled pattern on the string.
|
||||
* returns 0 if matched, RE_NOMATCH if failed, error code otherwise */
|
||||
int reexec(const re_cod *preg, const char *string,
|
||||
int nmat, re_mat pmat[], int flags);
|
||||
|
||||
/* formats an error message for the given code in ebuffer */
|
||||
int reerror(int ecode, const re_cod *preg, char *ebuffer, int ebuffer_size);
|
||||
|
||||
/* frees the given parameter */
|
||||
void refree(re_cod *preg);
|
||||
|
||||
|
||||
#endif /* _re_h */
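
The compile flags in re.h are written in octal, and each one occupies a single bit, so they can be combined with a bitwise OR before being passed as the `flags` argument of `recomp`. The following is a minimal sketch of that idea, not code from the library: the macro values are copied from the header above, while `flag_names` is a hypothetical helper introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from the compile flag defines in re.h (octal). */
#define RE_ICASE   0002
#define RE_NOSUB   0004
#define RE_NEWLINE 0010
#define RE_NOSPEC  0020
#define RE_PEND    0040
#define RE_DUMP    0200

/*
 * Hypothetical helper: write the names of the flags set in `flags`
 * into buf, separated by '|'. Names that do not fit are dropped.
 * Returns buf.
 */
static char *flag_names(int flags, char *buf, size_t size)
{
    static const struct { int bit; const char *name; } table[] = {
        { RE_ICASE,   "RE_ICASE"   },
        { RE_NOSUB,   "RE_NOSUB"   },
        { RE_NEWLINE, "RE_NEWLINE" },
        { RE_NOSPEC,  "RE_NOSPEC"  },
        { RE_PEND,    "RE_PEND"    },
        { RE_DUMP,    "RE_DUMP"    },
    };
    size_t i, used = 0;

    if (size == 0)
        return buf;
    buf[0] = '\0';

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        size_t len;

        if (!(flags & table[i].bit))
            continue;
        len = strlen(table[i].name);
        /* Room for an optional separator, the name and the NUL. */
        if (used + len + 2 > size)
            break;
        if (used)
            buf[used++] = '|';
        memcpy(buf + used, table[i].name, len + 1);
        used += len;
    }

    return buf;
}
```

Because the values are octal, `RE_NEWLINE` (0010) is 8 in decimal, not 10; a caller asking for case-insensitive, newline-aware matching would pass `RE_ICASE | RE_NEWLINE`.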
|
1015
app/xedit/lisp/re/rec.c
Normal file
File diff suppressed because it is too large
685
app/xedit/lisp/re/reo.c
Normal file
|
@ -0,0 +1,685 @@
|
|||
/*
|
||||
* Copyright (c) 2002 by The XFree86 Project, Inc.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||||
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
|
||||
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*
|
||||
* Except as contained in this notice, the name of the XFree86 Project shall
|
||||
* not be used in advertising or otherwise to promote the sale, use or other
|
||||
* dealings in this Software without prior written authorization from the
|
||||
* XFree86 Project.
|
||||
*
|
||||
* Author: Paulo César Pereira de Andrade
|
||||
*/
|
||||
|
||||
/* $XFree86: xc/programs/xedit/lisp/re/reo.c,v 1.8 2002/09/29 02:55:01 paulo Exp $ */
|
||||
|
||||
#include "rep.h"
|
||||
|
||||
/*
|
||||
* This file is a placeholder to add code to analyse and optimize the
|
||||
* intermediate data structure generated in rec.c.
|
||||
* Character ranges are optimized while being generated.
|
||||
*/
|
||||
|
||||
/*
|
||||
* Types
|
||||
*/
|
||||
typedef struct _orec_inf {
|
||||
rec_alt *alt; /* Main alternatives list */
|
||||
rec_grp *grp; /* Current group pointer */
|
||||
int flags;
|
||||
int ecode;
|
||||
} orec_inf;
|
||||
|
||||
/*
|
||||
* Prototypes
|
||||
*/
|
||||
static int orec_alt(orec_inf*, rec_alt*);
|
||||
static int orec_pat(orec_inf*, rec_pat*);
|
||||
static int orec_grp(orec_inf*, rec_grp*);
|
||||
static int orec_pat_bad_rpt(orec_inf*, rec_pat*);
|
||||
static int orec_pat_bad_forward_rpt(orec_inf*, rec_pat*);
|
||||
static int orec_pat_rng(orec_inf*, rec_pat*);
|
||||
static int orec_pat_cse(orec_inf*, rec_pat*);
|
||||
static int orec_pat_cse_can(orec_inf*, rec_pat*);
|
||||
static int orec_str_list(orec_inf*, rec_alt*, int, int);
|
||||
|
||||
/*
|
||||
* Initialization
|
||||
*/
|
||||
extern unsigned char re__alnum[256];
|
||||
extern unsigned char re__odigit[256];
|
||||
extern unsigned char re__ddigit[256];
|
||||
extern unsigned char re__xdigit[256];
|
||||
extern unsigned char re__control[256];
|
||||
|
||||
/*
|
||||
* Implementation
|
||||
*/
|
||||
int
|
||||
orec_comp(rec_alt *alt, int flags)
|
||||
{
|
||||
orec_inf inf;
|
||||
|
||||
inf.alt = alt;
|
||||
inf.grp = NULL;
|
||||
inf.flags = flags;
|
||||
inf.ecode = 0;
|
||||
|
||||
orec_alt(&inf, alt);
|
||||
|
||||
return (inf.ecode);
|
||||
}
|
||||
|
||||
void
|
||||
orec_free_stl(rec_stl *stl)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; i < stl->nstrs; i++) {
|
||||
if (stl->lens[i] > 2)
|
||||
free(stl->strs[i]);
|
||||
}
|
||||
|
||||
free(stl->lens);
|
||||
free(stl->strs);
|
||||
free(stl);
|
||||
}
|
||||
|
||||
|
||||
static int
|
||||
orec_alt(orec_inf *inf, rec_alt *alt)
|
||||
{
|
||||
if (alt) {
|
||||
rec_alt *ptr = alt;
|
||||
int ret, count = 0, str = 1, cstr = 1, lits = 0, clits = 0;
|
||||
|
||||
/* Check if can build a string list */
|
||||
if (ptr->next) {
|
||||
/* If more than one alternative */
|
||||
while (ptr && (str || cstr)) {
|
||||
if (ptr->pat == NULL || ptr->pat->rep != NULL) {
|
||||
cstr = str = 0;
|
||||
break;
|
||||
}
|
||||
if ((inf->flags & RE_ICASE)) {
|
||||
if (!(ret = orec_pat_cse_can(inf, ptr->pat))) {
|
||||
cstr = str = 0;
|
||||
break;
|
||||
}
|
||||
if (ret == 1)
|
||||
++lits;
|
||||
else if (ret == 2)
|
||||
++clits;
|
||||
}
|
||||
else if (ptr->pat->next == NULL) {
|
||||
if (ptr->pat->type != Rep_String) {
|
||||
if (ptr->pat->type != Rep_Literal) {
|
||||
str = 0;
|
||||
if (ptr->pat->type != Rep_CaseString) {
|
||||
if (ptr->pat->type != Rep_CaseLiteral)
|
||||
cstr = 0;
|
||||
else
|
||||
++clits;
|
||||
}
|
||||
else if (strlen((char*)ptr->pat->data.str) >= 255)
|
||||
str = cstr = 0;
|
||||
}
|
||||
else
|
||||
++lits;
|
||||
}
|
||||
else if (strlen((char*)ptr->pat->data.str) >= 255)
|
||||
str = cstr = 0;
|
||||
}
|
||||
else {
|
||||
str = cstr = 0;
|
||||
break;
|
||||
}
|
||||
if (++count >= 255)
|
||||
str = cstr = 0;
|
||||
ptr = ptr->next;
|
||||
}
|
||||
|
||||
if (str || cstr) {
|
||||
if (inf->flags & RE_ICASE) {
|
||||
for (ptr = alt; ptr; ptr = ptr->next) {
|
||||
if (orec_pat_cse(inf, ptr->pat))
|
||||
return (inf->ecode);
|
||||
}
|
||||
str = 0;
|
||||
}
|
||||
return (orec_str_list(inf, alt, str, count));
|
||||
}
|
||||
}
|
||||
else if (alt == inf->alt && alt->pat && alt->pat->rep == NULL) {
|
||||
/* If the toplevel single alternative */
|
||||
switch (alt->pat->type) {
|
||||
/* One of these will always be true for RE_NOSPEC,
|
||||
* but can also be optimized for simple patterns */
|
||||
case Rep_Literal:
|
||||
alt->pat->type = Rep_SearchLiteral;
|
||||
break;
|
||||
case Rep_CaseLiteral:
|
||||
alt->pat->type = Rep_SearchCaseLiteral;
|
||||
break;
|
||||
case Rep_String:
|
||||
alt->pat->type = Rep_SearchString;
|
||||
break;
|
||||
case Rep_CaseString:
|
||||
alt->pat->type = Rep_SearchCaseString;
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
while (alt) {
|
||||
orec_pat(inf, alt->pat);
|
||||
alt = alt->next;
|
||||
}
|
||||
}
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
static int
|
||||
orec_pat(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
rec_pat *next;
|
||||
|
||||
while (pat) {
|
||||
switch (pat->type) {
|
||||
case Rep_AnyAnyTimes:
|
||||
if (pat->next == NULL) {
|
||||
rec_grp *grp = inf->grp;
|
||||
|
||||
next = NULL;
|
||||
while (grp) {
|
||||
next = grp->parent->next;
|
||||
/* Cannot check if is .*$ as the input
|
||||
* may be a substring */
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
if (next == NULL) {
|
||||
/* <re>.* */
|
||||
pat->type = Rep_AnyEatAnyTimes;
|
||||
grp = inf->grp;
|
||||
while (grp) {
|
||||
--grp->comp;
|
||||
next = grp->parent->next;
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, next))
|
||||
return (inf->ecode);
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, pat->next))
|
||||
return (inf->ecode);
|
||||
break;
|
||||
case Rep_AnyMaybe:
|
||||
if (pat->next == NULL) {
|
||||
rec_grp *grp = inf->grp;
|
||||
|
||||
next = NULL;
|
||||
while (grp) {
|
||||
next = grp->parent->next;
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
if (next == NULL) {
|
||||
/* <re>.? */
|
||||
pat->type = Rep_AnyEatMaybe;
|
||||
grp = inf->grp;
|
||||
while (grp) {
|
||||
--grp->comp;
|
||||
next = grp->parent->next;
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, next))
|
||||
return (inf->ecode);
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, pat->next))
|
||||
return (inf->ecode);
|
||||
break;
|
||||
case Rep_AnyAtLeast:
|
||||
if (pat->next == NULL) {
|
||||
rec_grp *grp = inf->grp;
|
||||
|
||||
next = NULL;
|
||||
while (grp) {
|
||||
next = grp->parent->next;
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
if (next == NULL) {
|
||||
/* <re>.+ */
|
||||
pat->type = Rep_AnyEatAtLeast;
|
||||
grp = inf->grp;
|
||||
while (grp) {
|
||||
--grp->comp;
|
||||
next = grp->parent->next;
|
||||
if (next)
|
||||
break;
|
||||
grp = grp->pgrp;
|
||||
}
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, next))
|
||||
return (inf->ecode);
|
||||
}
|
||||
else if (orec_pat_bad_rpt(inf, pat->next))
|
||||
return (inf->ecode);
|
||||
break;
|
||||
case Rep_Range:
|
||||
case Rep_RangeNot:
|
||||
orec_pat_rng(inf, pat);
|
||||
break;
|
||||
case Rep_Group:
|
||||
orec_grp(inf, pat->data.grp);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
pat = pat->next;
|
||||
}
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
static int
|
||||
orec_pat_bad_rpt(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
switch (pat->type) {
|
||||
/* Not really errors, but these are not supported by the library.
|
||||
* Includes: .*.*, .+<re>? .*<re>*, (.*)(<re>*), etc.
|
||||
*/
|
||||
|
||||
/* Not a repetition, but matches anything... */
|
||||
case Rep_Any:
|
||||
|
||||
/* Zero length matches */
|
||||
case Rep_Eol:
|
||||
if (!(inf->flags & RE_NEWLINE))
|
||||
break;
|
||||
case Rep_Bol:
|
||||
case Rep_Bow:
|
||||
case Rep_Eow:
|
||||
|
||||
/* Repetitions */
|
||||
case Rep_AnyAnyTimes:
|
||||
case Rep_AnyMaybe:
|
||||
case Rep_AnyAtLeast:
|
||||
inf->ecode = RE_BADRPT;
|
||||
break;
|
||||
|
||||
/* Check if the first group element is a complex pattern */
|
||||
case Rep_Group:
|
||||
if (pat->rep == NULL) {
|
||||
if (pat->data.grp->alt) {
|
||||
for (pat = pat->data.grp->alt->pat; pat; pat = pat->next) {
|
||||
if (orec_pat_bad_rpt(inf, pat))
|
||||
break;
|
||||
}
|
||||
}
|
||||
break;
|
||||
}
|
||||
/*FALLTHROUGH*/
|
||||
default:
|
||||
if (pat->rep)
|
||||
inf->ecode = RE_BADRPT;
|
||||
break;
|
||||
}
|
||||
|
||||
if (!inf->ecode && pat && pat->next)
|
||||
orec_pat_bad_forward_rpt(inf, pat->next);
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
static int
|
||||
orec_pat_bad_forward_rpt(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
if (pat->rep) {
|
||||
switch (pat->rep->type) {
|
||||
case Rer_MinMax:
|
||||
if (pat->rep->mine > 0)
|
||||
break;
|
||||
case Rer_AnyTimes:
|
||||
case Rer_Maybe:
|
||||
case Rer_Max:
|
||||
inf->ecode = RE_BADRPT;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
else if (pat->type == Rep_Group &&
|
||||
pat->data.grp->alt &&
|
||||
pat->data.grp->alt->pat)
|
||||
orec_pat_bad_forward_rpt(inf, pat->data.grp->alt->pat);
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
static int
|
||||
orec_grp(orec_inf *inf, rec_grp *grp)
|
||||
{
|
||||
rec_grp *prev = inf->grp;
|
||||
|
||||
inf->grp = grp;
|
||||
orec_alt(inf, grp->alt);
|
||||
/* Could also just say: inf->grp = grp->gparent */
|
||||
inf->grp = prev;
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
static int
|
||||
orec_pat_rng(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
int i, j[2], count;
|
||||
rec_pat_t type = pat->type;
|
||||
unsigned char *range = pat->data.rng->range;
|
||||
|
||||
for (i = count = j[0] = j[1] = 0; i < 256; i++) {
|
||||
if (range[i]) {
|
||||
if (count == 2) {
|
||||
++count;
|
||||
break;
|
||||
}
|
||||
j[count++] = i;
|
||||
}
|
||||
}
|
||||
|
||||
if (count == 1 ||
|
||||
(count == 2 &&
|
||||
((islower(j[0]) && toupper(j[0]) == j[1]) ||
|
||||
(isupper(j[0]) && tolower(j[0]) == j[1])))) {
|
||||
free(pat->data.rng);
|
||||
if (count == 1) {
|
||||
pat->data.chr = j[0];
|
||||
pat->type = type == Rep_Range ? Rep_Literal : Rep_LiteralNot;
|
||||
}
|
||||
else {
|
||||
pat->data.cse.upper = j[0];
|
||||
pat->data.cse.lower = j[1];
|
||||
pat->type = type == Rep_Range ? Rep_CaseLiteral : Rep_CaseLiteralNot;
|
||||
}
|
||||
}
|
||||
else {
|
||||
if (memcmp(re__alnum, range, 256) == 0)
|
||||
type = type == Rep_Range ? Rep_Alnum : Rep_AlnumNot;
|
||||
else if (memcmp(re__odigit, range, 256) == 0)
|
||||
type = type == Rep_Range ? Rep_Odigit : Rep_OdigitNot;
|
||||
else if (memcmp(re__ddigit, range, 256) == 0)
|
||||
type = type == Rep_Range ? Rep_Digit : Rep_DigitNot;
|
||||
else if (memcmp(re__xdigit, range, 256) == 0)
|
||||
type = type == Rep_Range ? Rep_Xdigit : Rep_XdigitNot;
|
||||
else if (memcmp(re__control, range, 256) == 0)
|
||||
type = type == Rep_Range ? Rep_Control : Rep_ControlNot;
|
||||
|
||||
if (type != pat->type) {
|
||||
free(pat->data.rng);
|
||||
pat->type = type;
|
||||
}
|
||||
}
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
/* Join patterns if required; will only fail on memory allocation failure. */
|
||||
static int
|
||||
orec_pat_cse(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
rec_pat_t type;
|
||||
int i, len, length;
|
||||
rec_pat *ptr, *next;
|
||||
unsigned char *str, *tofree;
|
||||
|
||||
if (pat->next == NULL && pat->type == Rep_CaseString)
|
||||
return (inf->ecode);
|
||||
|
||||
type = Rep_CaseString;
|
||||
|
||||
/* First calculate how many bytes will be required */
|
||||
for (ptr = pat, length = 1; ptr; ptr = ptr->next) {
|
||||
switch (ptr->type) {
|
||||
case Rep_Literal:
|
||||
length += 2;
|
||||
break;
|
||||
case Rep_String:
|
||||
length += strlen((char*)ptr->data.str) << 1;
|
||||
break;
|
||||
case Rep_CaseLiteral:
|
||||
length += 2;
|
||||
break;
|
||||
case Rep_CaseString:
|
||||
length += strlen((char*)ptr->data.str);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if ((str = malloc(length)) == NULL)
|
||||
return (inf->ecode = RE_ESPACE);
|
||||
|
||||
for (ptr = pat, length = 0; ptr; ptr = next) {
|
||||
tofree = NULL;
|
||||
next = ptr->next;
|
||||
switch (ptr->type) {
|
||||
case Rep_Literal:
|
||||
str[length++] = ptr->data.chr;
|
||||
str[length++] = ptr->data.chr;
|
||||
break;
|
||||
case Rep_String:
|
||||
tofree = ptr->data.str;
|
||||
len = strlen((char*)tofree);
|
||||
for (i = 0; i < len; i++) {
|
||||
str[length++] = tofree[i];
|
||||
str[length++] = tofree[i];
|
||||
}
|
||||
break;
|
||||
case Rep_CaseLiteral:
|
||||
str[length++] = ptr->data.cse.lower;
|
||||
str[length++] = ptr->data.cse.upper;
|
||||
break;
|
||||
case Rep_CaseString:
|
||||
tofree = ptr->data.str;
|
||||
len = strlen((char*)tofree);
|
||||
memcpy(str + length, tofree, len);
|
||||
length += len;
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
if (tofree)
|
||||
free(tofree);
|
||||
if (ptr != pat)
|
||||
free(ptr);
|
||||
}
|
||||
str[length] = '\0';
|
||||
|
||||
pat->type = type;
|
||||
pat->data.str = str;
|
||||
pat->next = NULL;
|
||||
|
||||
return (inf->ecode);
|
||||
}
|
||||
|
||||
/* Return 0 if the patterns in the list cannot be merged, 1 if the result
 * will be a simple string, 2 if a case string.
 * This is useful when building an alternative list that is composed
 * only of strings, but the regex is case insensitive, in which case
 * the first pass may have split some patterns, but if it is a member
 * of an alternatives list, the cost of using a string list is smaller */
|
||||
static int
|
||||
orec_pat_cse_can(orec_inf *inf, rec_pat *pat)
|
||||
{
|
||||
int ret;
|
||||
|
||||
if (pat == NULL)
|
||||
return (0);
|
||||
|
||||
for (ret = 1; pat; pat = pat->next) {
|
||||
if (pat->rep)
|
||||
return (0);
|
||||
switch (pat->type) {
|
||||
case Rep_Literal:
|
||||
case Rep_String:
|
||||
break;
|
||||
case Rep_CaseLiteral:
|
||||
case Rep_CaseString:
|
||||
ret = 2;
|
||||
break;
|
||||
default:
|
||||
return (0);
|
||||
}
|
||||
}
|
||||
|
||||
return (ret);
|
||||
}
|
||||
|
||||
|
||||
/* XXX If everything is a (case) byte, the pattern should be
|
||||
* [abcde] instead of a|b|c|d|e (or [aAbBcCdDeE] instead of aA|bB|cC|dD|eE)
|
||||
* as a string list works fine, but as a character range
|
||||
* should be faster, and maybe could be converted here. But not
|
||||
* very important, if performance is required, it should have already
|
||||
* been done in the pattern.
|
||||
*/
|
||||
static int
|
||||
orec_str_list(orec_inf *inf, rec_alt *alt, int str, int count)
|
||||
{
|
||||
rec_stl *stl;
|
||||
rec_pat *pat;
|
||||
rec_alt *ptr, *next;
|
||||
int i, j, tlen, len, is;
|
||||
|
||||
if ((stl = calloc(1, sizeof(rec_stl))) == NULL)
|
||||
return (inf->ecode = RE_ESPACE);
|
||||
|
||||
if ((stl->lens = malloc(sizeof(unsigned char) * count)) == NULL) {
|
||||
free(stl);
|
||||
return (inf->ecode = RE_ESPACE);
|
||||
}
|
||||
|
||||
if ((stl->strs = malloc(sizeof(char*) * count)) == NULL) {
|
||||
free(stl->lens);
|
||||
free(stl);
|
||||
return (inf->ecode = RE_ESPACE);
|
||||
}
|
||||
|
||||
if ((pat = calloc(1, sizeof(rec_pat))) == NULL) {
|
||||
free(stl->strs);
|
||||
free(stl->lens);
|
||||
free(stl);
|
||||
return (inf->ecode = RE_ESPACE);
|
||||
}
|
||||
|
||||
pat->data.stl = stl;
|
||||
pat->type = Rep_StringList;
|
||||
stl->type = str ? Resl_StringList : Resl_CaseStringList;
|
||||
for (i = tlen = 0, ptr = alt; i < count; i++) {
|
||||
next = ptr->next;
|
||||
switch (ptr->pat->type) {
|
||||
case Rep_Literal:
|
||||
is = len = 1;
|
||||
break;
|
||||
case Rep_CaseLiteral:
|
||||
is = len = 2;
|
||||
break;
|
||||
default:
|
||||
is = 0;
|
||||
len = strlen((char*)ptr->pat->data.str);
|
||||
break;
|
||||
}
|
||||
tlen += len;
|
||||
stl->lens[i] = len;
|
||||
if (!is) {
|
||||
if (len > 2)
|
||||
stl->strs[i] = ptr->pat->data.str;
|
||||
else {
|
||||
if (len == 1)
|
||||
stl->strs[i] = (void*)(long)(ptr->pat->data.str[0]);
|
||||
else
|
||||
stl->strs[i] = (void*)(long)
|
||||
(ptr->pat->data.str[0] |
|
||||
((int)ptr->pat->data.str[1] << 8));
|
||||
free(ptr->pat->data.str);
|
||||
}
|
||||
}
|
||||
else {
|
||||
if (is == 1)
|
||||
stl->strs[i] = (void*)(long)ptr->pat->data.chr;
|
||||
else
|
||||
stl->strs[i] = (void*)(long)
|
||||
(ptr->pat->data.cse.lower |
|
||||
(ptr->pat->data.cse.upper << 8));
|
||||
}
|
||||
free(ptr->pat);
|
||||
if (i)
|
||||
free(ptr);
|
||||
ptr = next;
|
||||
}
|
||||
stl->tlen = tlen;
|
||||
stl->nstrs = count;
|
||||
|
||||
alt->pat = pat;
|
||||
alt->next = NULL;
|
||||
|
||||
{
|
||||
int li, lj;
|
||||
unsigned char ci, cj, *str;
|
||||
|
||||
/* Don't need a stable sort, there shouldn't be duplicated strings,
|
||||
* but don't check for it either. Only need to make sure that all
|
||||
* strings that start with the same byte are together */
|
||||
for (i = 0; i < count; i++) {
|
||||
li = stl->lens[i];
|
||||
ci = li > 2 ? stl->strs[i][0] : (long)stl->strs[i] & 0xff;
|
||||
for (j = i + 1; j < count; j++) {
|
||||
lj = stl->lens[j];
|
||||
cj = lj > 2 ? stl->strs[j][0] : (long)stl->strs[j] & 0xff;
|
||||
if ((count >= LARGE_STL_COUNT && cj < ci) ||
|
||||
(cj == ci && lj > li)) {
|
||||
/* If both strings start with the same byte,
|
||||
* put the longer first */
|
||||
str = stl->strs[j];
|
||||
stl->strs[j] = stl->strs[i];
|
||||
stl->strs[i] = str;
|
||||
stl->lens[j] = li;
|
||||
stl->lens[i] = lj;
|
||||
li ^= lj; lj ^= li; li ^= lj;
|
||||
ci ^= cj; cj ^= ci; ci ^= cj;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return (inf->ecode);
|
||||
}
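
One detail of `orec_str_list` that is easy to miss: strings of one or two bytes are not kept as heap pointers. Their bytes are packed into the pointer-sized `strs[i]` slot itself, which is why `orec_free_stl` only frees entries whose length is greater than 2. The sketch below shows the same idea in isolation. It is an illustration under stated assumptions, not the library's code: the `slot` type and the `slot_*` functions are hypothetical, and it uses `uintptr_t` where the original casts through `long`. Converting an integer to a pointer is implementation-defined in C, so this relies on the usual behaviour of common platforms.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot: one entry of a string list. */
typedef struct {
    unsigned char *str;   /* heap copy if len > 2, packed bytes otherwise */
    unsigned char len;
} slot;

/* Store text in the slot. Returns 0 on success, -1 on failure. */
static int slot_set(slot *s, const char *text)
{
    size_t len = strlen(text);

    if (len == 0 || len >= 255)
        return -1;
    s->len = (unsigned char)len;

    if (len > 2) {
        /* Long strings own a heap copy. */
        s->str = malloc(len + 1);
        if (s->str == NULL)
            return -1;
        memcpy(s->str, text, len + 1);
    } else {
        /* Short strings live in the pointer value itself:
         * first byte in bits 0-7, second byte in bits 8-15. */
        uintptr_t packed = (unsigned char)text[0];

        if (len == 2)
            packed |= (uintptr_t)(unsigned char)text[1] << 8;
        s->str = (unsigned char *)packed;
    }
    return 0;
}

/* First byte of the stored string, whichever representation is used. */
static unsigned char slot_first(const slot *s)
{
    return s->len > 2 ? s->str[0]
                      : (unsigned char)((uintptr_t)s->str & 0xff);
}

/* Only heap-backed entries are freed, as in orec_free_stl. */
static void slot_free(slot *s)
{
    if (s->len > 2)
        free(s->str);
    s->str = NULL;
    s->len = 0;
}
```

The benefit is that short alternatives such as `if` or `do` cost no allocation and no pointer dereference when the engine compares first bytes; the price is that every reader of the slot must check the length before deciding how to interpret it.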
|
369
app/xedit/lisp/re/rep.h
Normal file
|
@ -0,0 +1,369 @@
|
|||
/*
|
||||
* Copyright (c) 2002 by The XFree86 Project, Inc.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||||
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
|
||||
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*
|
||||
* Except as contained in this notice, the name of the XFree86 Project shall
|
||||
* not be used in advertising or otherwise to promote the sale, use or other
|
||||
* dealings in this Software without prior written authorization from the
|
||||
* XFree86 Project.
|
||||
*
|
||||
* Author: Paulo César Pereira de Andrade
|
||||
*/
|
||||
|
||||
/* $XFree86: xc/programs/xedit/lisp/re/rep.h,v 1.2 2002/11/15 07:01:33 paulo Exp $ */
|
||||
|
||||
#include "re.h"
|
||||
|
||||
#ifndef _rep_h
|
||||
#define _rep_h
|
||||
|
||||
/*
|
||||
* Local defines
|
||||
*/
|
||||
|
||||
#ifdef MIN
|
||||
#undef MIN
|
||||
#endif
|
||||
#define MIN(a, b) ((a) < (b) ? (a) : (b))
|
||||
|
||||
#ifdef MAX
|
||||
#undef MAX
|
||||
#endif
|
||||
#define MAX(a, b) ((a) > (b) ? (a) : (b))
|
||||
|
||||
/* This value cannot be larger than 255. A depth value is the nesting of
 * repetition operations and alternatives. The number of nested parentheses
 * does not matter, but a repetition on the pattern inside the parentheses
 * does. Note also that you cannot have more than 9 parenthesis pairs in
 * an expression.
 * Depth is always at least 1, so for MAX_DEPTH 8 only 7 complex
 * repetitions are allowed. A complex repetition is a dot followed by a
 * repetition operator. It is called a complex repetition because dot
 * matches anything but the empty string, so the engine needs to test
 * all possible combinations until the end of the string is found.
 * Repetitions like .* use one depth until the end of the string is found,
 * for example a.*b.*c.*d has depth 4, while a*b*c*d has depth 2.
 */
|
||||
#define MAX_DEPTH 8
|
||||
|
||||
/* Minimum number of strings to generate a "large" string list, that is,
|
||||
* sort the strings and allocate 512 extra bytes to map the first string
|
||||
* with a given initial byte. */
|
||||
#define LARGE_STL_COUNT 16
|
||||
|
||||
/*
|
||||
* Local types
|
||||
*/
|
||||
/* Intermediate compilation types declaration */
|
||||
/* (r)egular (e)xpression (c)ompile (c)a(se) */
|
||||
typedef struct _rec_cse rec_cse;
|
||||
|
||||
/* (r)egular (e)xpression (c)ompile (r)a(ng)e */
|
||||
typedef struct _rec_rng rec_rng;
|
||||
|
||||
/* (r)egular (e)xpression (c)ompile (pat)tern */
|
||||
typedef struct _rec_pat rec_pat;
|
||||
|
||||
/* (r)egular (e)xpression (c)ompile (rep)etition */
|
||||
typedef struct _rec_rep rec_rep;
|
||||
|
||||
/* (r)egular (e)xpression (c)ompile (gr)ou(p) */
|
||||
typedef struct _rec_grp rec_grp;
|
||||
|
||||
/* (r)egular (e)xpression (c)ompile (alt)ernatives */
|
||||
typedef struct _rec_alt rec_alt;
|
||||
|
||||
|
||||
/* Optimization types */
|
||||
/* (r)egular (e)xpression (c)ompile (st)ring (l)ist */
|
||||
typedef struct _rec_stl rec_stl;
|
||||
|
||||
/* Final compilation and execution types */
|
||||
/* (re)gular expression (inf)ormation */
|
||||
typedef struct _re_inf re_inf;
|
||||
|
||||
/* (re)gular expression (eng)ine */
|
||||
typedef struct _re_eng re_eng;
|
||||
|
||||
|
||||
/* Codes used by the engine */
|
||||
typedef enum {
|
||||
/* Grouping */
|
||||
Re_Open, /* ( */
|
||||
Re_Close, /* ) */
|
||||
Re_Update, /* Like Re_Close, but is inside a loop */
|
||||
|
||||
/* Alternatives */
|
||||
Re_Alt, /* Start alternative list, + next offset */
|
||||
Re_AltNext, /* Next alternative, + next offset */
|
||||
Re_AltDone, /* Finish alternative list */
|
||||
|
||||
/* Repetition */
|
||||
Re_AnyTimes, /* * */
|
||||
Re_Maybe, /* ? */
|
||||
Re_AtLeast, /* +, at least one */
|
||||
|
||||
/* Repetition like */
|
||||
Re_AnyAnyTimes, /* .*<re> */
|
||||
Re_AnyMaybe, /* .?<re> */
|
||||
Re_AnyAtLeast, /* .+<re> */
|
||||
|
||||
Re_AnyEatAnyTimes, /* Expression ends with .* */
|
||||
Re_AnyEatMaybe, /* Expression ends with .? */
|
||||
Re_AnyEatAtLeast, /* Expression ends with .+ */
|
||||
|
||||
/* Repetition with arguments */
|
||||
Re_Exact, /* {e} */
|
||||
Re_Min, /* {n,} */
|
||||
Re_Max, /* {,m} */
|
||||
Re_MinMax, /* {n,m} */
|
||||
|
||||
/* Repetition helper instruction */
|
||||
Re_RepJump, /* Special code, go back to repetition */
|
||||
Re_RepLongJump, /* Jump needs two bytes */
|
||||
/* After the repetition data, all repetitions have an offset
|
||||
* to the code after the repetition */
|
||||
|
||||
/* Matching */
|
||||
Re_Any, /* . */
|
||||
Re_Odigit, /* \o */
|
||||
Re_OdigitNot, /* \O */
|
||||
Re_Digit, /* \d */
|
||||
Re_DigitNot, /* \D */
|
||||
Re_Xdigit, /* \x */
|
||||
Re_XdigitNot, /* \X */
|
||||
Re_Space, /* \s */
|
||||
Re_SpaceNot, /* \S */
|
||||
Re_Tab, /* \t */
|
||||
Re_Newline, /* \n */
|
||||
Re_Lower, /* \l */
|
||||
Re_Upper, /* \u */
|
||||
Re_Alnum, /* \w */
|
||||
Re_AlnumNot, /* \W */
|
||||
Re_Control, /* \c */
|
||||
Re_ControlNot, /* \C */
|
||||
Re_Bol, /* ^ */
|
||||
Re_Eol, /* $ */
|
||||
Re_Bow, /* \< */
|
||||
Re_Eow, /* \> */
|
||||
|
||||
/* Range matching information */
|
||||
Re_Range, /* + 256 bytes */
|
||||
Re_RangeNot, /* + 256 bytes */
|
||||
|
||||
/* Matching with arguments */
|
||||
Re_Literal, /* + character */
|
||||
Re_CaseLiteral, /* + lower + upper */
|
||||
Re_LiteralNot, /* + character */
|
||||
Re_CaseLiteralNot, /* + lower + upper */
|
||||
Re_String, /* + length + string */
|
||||
Re_CaseString, /* + length + string in format lower-upper */
|
||||
|
||||
/* These are useful to start matching, or when RE_NOSPEC is used. */
|
||||
Re_SearchLiteral,
|
||||
Re_SearchCaseLiteral,
|
||||
Re_SearchString,
|
||||
Re_SearchCaseString,
|
||||
|
||||
Re_StringList, /* + total-length + lengths + strings */
|
||||
Re_CaseStringList, /* + total-length + lengths + strings */
|
||||
|
||||
Re_LargeStringList, /* + total-length + lengths + map + strings */
|
||||
Re_LargeCaseStringList, /* + total-length + lengths + map + strings */
|
||||
|
||||
/* Backreference */
|
||||
Re_Backref, /* + reference number */
|
||||
|
||||
/* The last codes */
|
||||
Re_DoneIf, /* Done if at end of input */
|
||||
Re_MaybeDone, /* Done */
|
||||
Re_Done /* If this code found, finished execution */
|
||||
} ReCode;
|
||||
|
||||
|
||||
/* (r)egular (e)xpression (pat)tern (t)ype */
|
||||
typedef enum _rec_pat_t {
|
||||
Rep_Literal = Re_Literal,
|
||||
Rep_CaseLiteral = Re_CaseLiteral,
|
||||
Rep_LiteralNot = Re_LiteralNot,
|
||||
Rep_CaseLiteralNot = Re_CaseLiteralNot,
|
||||
Rep_Range = Re_Range,
|
||||
Rep_RangeNot = Re_RangeNot,
|
||||
Rep_String = Re_String,
|
||||
Rep_CaseString = Re_CaseString,
|
||||
Rep_SearchLiteral = Re_SearchLiteral,
|
||||
Rep_SearchCaseLiteral = Re_SearchCaseLiteral,
|
||||
Rep_SearchString = Re_SearchString,
|
||||
Rep_SearchCaseString = Re_SearchCaseString,
|
||||
Rep_Any = Re_Any,
|
||||
Rep_AnyAnyTimes = Re_AnyAnyTimes,
|
||||
Rep_AnyEatAnyTimes = Re_AnyEatAnyTimes,
|
||||
Rep_AnyMaybe = Re_AnyMaybe,
|
||||
Rep_AnyEatMaybe = Re_AnyEatMaybe,
|
||||
Rep_AnyAtLeast = Re_AnyAtLeast,
|
||||
Rep_AnyEatAtLeast = Re_AnyEatAtLeast,
|
||||
Rep_Odigit = Re_Odigit,
|
||||
Rep_OdigitNot = Re_OdigitNot,
|
||||
Rep_Digit = Re_Digit,
|
||||
Rep_DigitNot = Re_DigitNot,
|
||||
Rep_Xdigit = Re_Xdigit,
|
||||
Rep_XdigitNot = Re_XdigitNot,
|
||||
Rep_Space = Re_Space,
|
||||
Rep_SpaceNot = Re_SpaceNot,
|
||||
Rep_Tab = Re_Tab,
|
||||
Rep_Newline = Re_Newline,
|
||||
Rep_Lower = Re_Lower,
|
||||
Rep_Upper = Re_Upper,
|
||||
Rep_Alnum = Re_Alnum,
|
||||
Rep_AlnumNot = Re_AlnumNot,
|
||||
Rep_Control = Re_Control,
|
||||
Rep_ControlNot = Re_ControlNot,
|
||||
Rep_Bol = Re_Bol,
|
||||
Rep_Eol = Re_Eol,
|
||||
Rep_Bow = Re_Bow,
|
||||
Rep_Eow = Re_Eow,
|
||||
Rep_Backref = Re_Backref,
|
||||
Rep_StringList = Re_StringList,
|
||||
Rep_Group = Re_Open
|
||||
} rec_pat_t;
|
||||
|
||||
|
||||
/* (r)egular (e)xpression (rep)etition (t)ype */
|
||||
typedef enum _rec_rep_t {
|
||||
Rer_AnyTimes = Re_AnyTimes,
|
||||
Rer_AtLeast = Re_AtLeast,
|
||||
Rer_Maybe = Re_Maybe,
|
||||
Rer_Exact = Re_Exact,
|
||||
Rer_Min = Re_Min,
|
||||
Rer_Max = Re_Max,
|
||||
Rer_MinMax = Re_MinMax
|
||||
} rec_rep_t;
|
||||
|
||||
|
||||
/* Decide at re compilation time what is lowercase and what is uppercase */
|
||||
struct _rec_cse {
|
||||
unsigned char lower;
|
||||
unsigned char upper;
|
||||
};
|
||||
|
||||
|
||||
/* A rec_rng is used only during compilation, just a character map */
|
||||
struct _rec_rng {
|
||||
unsigned char range[256];
|
||||
};
|
||||
|
||||
|
||||
/* A rec_pat is used only during compilation, and can be viewed as
 * a regular expression element like a match to any character, a match
 * to the beginning or end of the line, etc.
 * It is implemented as a linked list, and does not have nesting.
 * The data field can contain:
 *	chr:	the value of a single character to match.
 *	cse:	the upper and lower case value of a character to match.
 *	rng:	a character map to match or not match.
 *	grp:	a group, that is, an expression enclosed in parentheses.
 *	str:	a simple string or a string where every two bytes
 *		represents the character to match, in lower/upper
 *		case sequence.
 *	stl:	a string list, built by the optimization step.
 * The rep field is not used for strings; in that case the string is
 * broken at its last character. That is, strings are just a concatenation
 * of several character matches.
 */
struct _rec_pat {
    rec_pat_t type;
    rec_pat *next, *prev;    /* Linked list information */
    union {
        unsigned char chr;
        rec_cse cse;
        rec_rng *rng;
        rec_grp *grp;
        unsigned char *str;
        rec_stl *stl;
    } data;
    rec_rep *rep;            /* Pattern repetition information */
};
|
||||
|
||||
|
||||
/* A rec_rep is used only during compilation, and can be viewed as:
 *
 *	? or * or + or {<e>} or {<m>,} or {,<M>} or {<m>,<M>}
 *
 * where <e> is "exact", <m> is "minimum" and <M> is "maximum".
 * In the compiled step it can also be just a NULL pointer, which
 * is equivalent to {1}.
 */
struct _rec_rep {
    rec_rep_t type;
    short mine;    /* minimum or exact number of matches */
    short maxc;    /* maximum number of matches */
};
|
||||
|
||||
|
||||
/* A rec_alt is used only during compilation, and can be viewed as:
|
||||
*
|
||||
* <re>|<re>
|
||||
*
|
||||
* where <re> is any regular expression. The expressions are nested
|
||||
* using the grp field of the rec_pat structure.
|
||||
*/
|
||||
struct _rec_alt {
|
||||
rec_alt *next, *prev; /* Linked list information */
|
||||
rec_pat *pat;
|
||||
};
|
||||
|
||||
|
||||
/* A rec_grp is a placeholder for expressions enclosed in parentheses
 * and is linked to the compilation data by a rec_pat structure. */
struct _rec_grp {
    rec_pat *parent;    /* Reference to parent pattern */
    rec_alt *alt;       /* The pattern information */
    rec_alt *palt;      /* Parent alternative */
    rec_grp *pgrp;      /* Nested groups */
    int comp;           /* (comp)lex repetition pattern inside group */
};
|
||||
|
||||
|
||||
/* Optimization compilation types definition */
|
||||
/* (r)egular (e)xpression (c)ompile (st)ring (l)ist (t)ype */
|
||||
typedef enum {
|
||||
Resl_StringList = Re_StringList,
|
||||
Resl_CaseStringList = Re_CaseStringList
|
||||
} rec_stl_t;
|
||||
|
||||
struct _rec_stl {
|
||||
rec_stl_t type;
|
||||
int nstrs; /* Number of strings in list */
|
||||
int tlen; /* Total length of all strings */
|
||||
unsigned char *lens; /* Vector of string lengths */
|
||||
unsigned char **strs; /* The strings */
|
||||
};
|
||||
|
||||
|
||||
/*
|
||||
* Prototypes
|
||||
*/
|
||||
/* rep.c */
|
||||
rec_alt *irec_comp(const char*, const char*, int, int*);
|
||||
void irec_free_alt(rec_alt*);
|
||||
|
||||
/* reo.c */
|
||||
int orec_comp(rec_alt*, int);
|
||||
void orec_free_stl(rec_stl*);
|
||||
|
||||
#endif /* _rep_h */
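
The comments on Re_CaseString and on the str member of rec_pat describe a case-insensitive string stored as lower/upper byte pairs. The following is a minimal sketch of that encoding and of a matcher over it. The names pairs_encode and pairs_match are hypothetical and are not part of the library, and the real compiled layout (length prefix, instruction codes) is not reproduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand src into lower/upper byte pairs.
 * dst must have room for 2 * strlen(src) bytes.
 * Returns the number of pairs written. */
static size_t
pairs_encode(const char *src, unsigned char *dst)
{
    size_t i, len;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    for (i = 0; i < len; i++) {
        dst[2 * i]     = (unsigned char)tolower((unsigned char)src[i]);
        dst[2 * i + 1] = (unsigned char)toupper((unsigned char)src[i]);
    }
    return len;
}

/* Hypothetical helper: returns nonzero if each of the first npairs
 * bytes of text equals the lower or the upper byte of its pair.
 * Stops at the terminating NUL, so text is never read past its end. */
static int
pairs_match(const unsigned char *pairs, size_t npairs, const char *text)
{
    size_t i;

    assert(pairs != NULL && text != NULL);
    for (i = 0; i < npairs; i++) {
        unsigned char ch = (unsigned char)text[i];

        if (ch == '\0' || (ch != pairs[2 * i] && ch != pairs[2 * i + 1]))
            return 0;
    }
    return 1;
}
```

Deciding the lower and upper values once, when the pattern is compiled, means the matcher only needs two byte comparisons per character and never calls tolower or toupper on the input.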
|
199
app/xedit/lisp/re/tests.c
Normal file
|
@ -0,0 +1,199 @@
|
|||
/*
|
||||
* Copyright (c) 2002 by The XFree86 Project, Inc.
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||||
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
|
||||
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
* SOFTWARE.
|
||||
*
|
||||
* Except as contained in this notice, the name of the XFree86 Project shall
|
||||
* not be used in advertising or otherwise to promote the sale, use or other
|
||||
* dealings in this Software without prior written authorization from the
|
||||
* XFree86 Project.
|
||||
*
|
||||
* Author: Paulo César Pereira de Andrade
|
||||
*/
|
||||
|
||||
/* $XFree86$ */
|
||||
|
||||
/*
|
||||
* Compile with: cc -o tests tests.c -L. -lre
|
||||
*/
|
||||
|
||||
#include <stdio.h>
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include "re.h"
|
||||
|
||||
int
|
||||
main(int argc, char *argv[])
|
||||
{
|
||||
re_cod cod;
|
||||
re_mat mat[10];
|
||||
int line, ecode, i, len, group, failed;
|
||||
long eo, so;
|
||||
char buf[8192];
|
||||
char str[8192];
|
||||
FILE *fp = fopen("tests.txt", "r");
|
||||
|
||||
if (fp == NULL) {
|
||||
fprintf(stderr, "failed to open tests.txt\n");
|
||||
exit(1);
|
||||
}
|
||||
|
||||
ecode = line = group = failed = 0;
|
||||
cod.cod = NULL;
|
||||
while (fgets(buf, sizeof(buf), fp)) {
|
||||
++line;
|
||||
if (buf[0] == '#' || buf[0] == '\n')
|
||||
continue;
|
||||
else if (buf[0] == '/') {
|
||||
char *ptr = strrchr(buf, '/');
|
||||
|
||||
if (ptr == buf) {
|
||||
fprintf(stderr, "syntax error at line %d\n", line);
|
||||
break;
|
||||
}
|
||||
else {
|
||||
int flags = 0;
|
||||
|
||||
refree(&cod);
|
||||
for (*ptr++ = '\0'; *ptr; ptr++) {
|
||||
if (*ptr == 'i')
|
||||
flags |= RE_ICASE;
|
||||
else if (*ptr == 'n')
|
||||
flags |= RE_NEWLINE;
|
||||
}
|
||||
ecode = recomp(&cod, buf + 1, flags);
|
||||
failed = ecode;
|
||||
}
|
||||
}
|
||||
else if (buf[0] == '>') {
|
||||
if (cod.cod == NULL) {
|
||||
fprintf(stderr, "no previous pattern at line %d\n", line);
|
||||
break;
|
||||
}
|
||||
len = strlen(buf) - 1;
|
||||
buf[len] = '\0';
|
||||
strcpy(str, buf + 1);
|
||||
for (i = 0, --len; i < len - 1; i++) {
|
||||
if (str[i] == '\\') {
|
||||
memmove(str + i, str + i + 1, len);
|
||||
--len;
|
||||
switch (str[i]) {
|
||||
case 'a':
|
||||
str[i] = '\a';
|
||||
break;
|
||||
case 'b':
|
||||
str[i] = '\b';
|
||||
break;
|
||||
case 'f':
|
||||
str[i] = '\f';
|
||||
break;
|
||||
case 'n':
|
||||
str[i] = '\n';
|
||||
break;
|
||||
case 'r':
|
||||
str[i] = '\r';
|
||||
break;
|
||||
case 't':
|
||||
str[i] = '\t';
|
||||
break;
|
||||
case 'v':
|
||||
str[i] = '\v';
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
group = 0;
|
||||
ecode = reexec(&cod, str, 10, &mat[0], 0);
|
||||
if (ecode && ecode != RE_NOMATCH) {
|
||||
reerror(ecode, &cod, buf, sizeof(buf));
|
||||
fprintf(stderr, "%s, at line %d\n", buf, line);
|
||||
break;
|
||||
}
|
||||
}
|
||||
else if (buf[0] == ':') {
|
||||
if (failed) {
|
||||
len = strlen(buf) - 1;
|
||||
buf[len] = '\0';
|
||||
if (failed == RE_EESCAPE && strcmp(buf, ":EESCAPE") == 0)
|
||||
continue;
|
||||
if (failed == RE_ESUBREG && strcmp(buf, ":ESUBREG") == 0)
|
||||
continue;
|
||||
if (failed == RE_EBRACK && strcmp(buf, ":EBRACK") == 0)
|
||||
continue;
|
||||
if (failed == RE_EPAREN && strcmp(buf, ":EPAREN") == 0)
|
||||
continue;
|
||||
if (failed == RE_EBRACE && strcmp(buf, ":EBRACE") == 0)
|
||||
continue;
|
||||
if (failed == RE_EBADBR && strcmp(buf, ":EBADBR") == 0)
|
||||
continue;
|
||||
if (failed == RE_ERANGE && strcmp(buf, ":ERANGE") == 0)
|
||||
continue;
|
||||
if (failed == RE_ESPACE && strcmp(buf, ":ESPACE") == 0)
|
||||
continue;
|
||||
if (failed == RE_BADRPT && strcmp(buf, ":BADRPT") == 0)
|
||||
continue;
|
||||
if (failed == RE_EMPTY && strcmp(buf, ":EMPTY") == 0)
|
||||
continue;
|
||||
reerror(failed, &cod, buf, sizeof(buf));
|
||||
fprintf(stderr, "Error value %d doesn't match: %s, at line %d\n",
|
||||
failed, buf, line);
|
||||
break;
|
||||
}
|
||||
else if (!ecode) {
|
||||
fprintf(stderr, "found match when shouldn't, at line %d\n", line);
|
||||
break;
|
||||
}
|
||||
}
|
||||
else {
|
||||
if (failed) {
|
||||
reerror(failed, &cod, buf, sizeof(buf));
|
||||
fprintf(stderr, "%s, at line %d\n", buf, line);
|
||||
break;
|
||||
}
|
||||
if (sscanf(buf, "%ld,%ld:", &so, &eo) != 2) {
|
||||
fprintf(stderr, "expecting match offsets at line %d\n", line);
|
||||
break;
|
||||
}
|
||||
else if (ecode) {
|
||||
fprintf(stderr, "didn't match, at line %d\n", line);
|
||||
break;
|
||||
}
|
||||
else if (group >= 10) {
|
||||
fprintf(stderr, "syntax error at line %d (too many groups)\n",
|
||||
line);
|
||||
break;
|
||||
}
|
||||
else if (so != mat[group].rm_so || eo != mat[group].rm_eo) {
|
||||
fprintf(stderr, "match failed at line %d, got %ld,%ld: ",
|
||||
line, mat[group].rm_so, mat[group].rm_eo);
|
||||
if (mat[group].rm_so < mat[group].rm_eo)
|
||||
fwrite(str + mat[group].rm_so,
|
||||
mat[group].rm_eo - mat[group].rm_so, 1, stderr);
|
||||
fputc('\n', stderr);
|
||||
break;
|
||||
}
|
||||
++group;
|
||||
}
|
||||
}
|
||||
|
||||
fclose(fp);
|
||||
|
||||
return (ecode);
|
||||
}
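
The escape handling in the '>' branch of main can be isolated into a helper. The sketch below is an assumption about how one might restructure it, not code from the library: it uses a two-pointer in-place copy instead of memmove, and unescape is a hypothetical name. Like the loop in main, it leaves a trailing lone backslash untouched and simply drops the backslash in front of any unrecognized character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode C-style backslash escapes in place.
 * Returns the length of the decoded string. */
static size_t
unescape(char *s)
{
    char *src = s, *dst = s;

    assert(s != NULL);
    while (*src != '\0') {
        if (*src == '\\' && src[1] != '\0') {
            ++src;                          /* skip the backslash */
            switch (*src) {
            case 'a': *dst++ = '\a'; break;
            case 'b': *dst++ = '\b'; break;
            case 'f': *dst++ = '\f'; break;
            case 'n': *dst++ = '\n'; break;
            case 'r': *dst++ = '\r'; break;
            case 't': *dst++ = '\t'; break;
            case 'v': *dst++ = '\v'; break;
            default:  *dst++ = *src; break; /* keep the character as is */
            }
            ++src;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Because the output is never longer than the input, the write pointer never overtakes the read pointer, which is what makes the in-place copy safe.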
|
470
app/xedit/lisp/re/tests.txt
Normal file
|
@ -0,0 +1,470 @@
|
|||
#
|
||||
# Copyright (c) 2002 by The XFree86 Project, Inc.
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a
|
||||
# copy of this software and associated documentation files (the "Software"),
|
||||
# to deal in the Software without restriction, including without limitation
|
||||
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
# and/or sell copies of the Software, and to permit persons to whom the
|
||||
# Software is furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in
|
||||
# all copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
# THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
|
||||
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
|
||||
# OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
# SOFTWARE.
|
||||
#
|
||||
# Except as contained in this notice, the name of the XFree86 Project shall
|
||||
# not be used in advertising or otherwise to promote the sale, use or other
|
||||
# dealings in this Software without prior written authorization from the
|
||||
# XFree86 Project.
|
||||
#
|
||||
# Author: Paulo César Pereira de Andrade
|
||||
#
|
||||
#
|
||||
# $XFree86: xc/programs/xedit/lisp/re/tests.txt,v 1.1 2002/09/08 02:29:50 paulo Exp $
|
||||
|
||||
# Some tests for the library:
# lines starting with # are comments
# lines starting with / are a regular expression pattern
#    The pattern must end with / and may be followed by:
#        i -> ignore case
#        n -> create newline sensitive regex
# lines starting with > are a string input to the last pattern
#    To test newline sensitive matching, add \n to the string.
# lines starting with a number are the expected result, in the form
#    <start>,<end>: <matched text>
#    If more than one line, every subsequent line is the
#    value of a "subresult".
# :NOMATCH means that the string input should not match
# :<error> (for example :BADRPT) means that the pattern should fail to
#    compile with that error
|
||||
|
||||
# Simple string
|
||||
/abc/
|
||||
>abc
|
||||
0,3: abc
|
||||
>aaaaaaaaaaaaaaabc
|
||||
14,17: abc
|
||||
>xxxxxxxxxxxxxxaaaaaaaaaaaaaaaaabcxx
|
||||
30,33: abc
|
||||
|
||||
# String list
|
||||
/abc|bcd|cde/
|
||||
>abc
|
||||
0,3: abc
|
||||
>aabc
|
||||
1,4: abc
|
||||
>xxxbcdef
|
||||
3,6: bcd
|
||||
>abdzzzcdabcde
|
||||
8,11: abc
|
||||
>xxxxabdecdabdcde
|
||||
13,16: cde
|
||||
|
||||
# Complex string
|
||||
/a?bc|ab?c|abc?/
|
||||
>abc
|
||||
0,3: abc
|
||||
>xxxb
|
||||
:NOMATCH
|
||||
>xxxbc
|
||||
3,5: bc
|
||||
>sssssab
|
||||
5,7: ab
|
||||
|
||||
# Another complex string
|
||||
/a*bc|ab*c|abc*/
|
||||
>aaaaaaabc
|
||||
0,9: aaaaaaabc
|
||||
>xaaaaaaabc
|
||||
1,10: aaaaaaabc
|
||||
>xyzaaaaaaabc
|
||||
3,12: aaaaaaabc
|
||||
>abbc
|
||||
0,4: abbc
|
||||
>xxabbbbbc
|
||||
2,9: abbbbbc
|
||||
>abcccccccccc
|
||||
0,12: abcccccccccc
|
||||
>abccccccccccd
|
||||
0,12: abcccccccccc
|
||||
>xxxxxxxaaaaaaaaaabbbbbbbbbbbccccccccccc
|
||||
16,29: abbbbbbbbbbbc
|
||||
>xxxbbbbbbbbbc
|
||||
11,13: bc
|
||||
|
||||
# Another complex string
|
||||
/a+bc|ab+c|abc+/
|
||||
>xxxbc
|
||||
:NOMATCH
|
||||
>xaaabc
|
||||
1,6: aaabc
|
||||
>zzzzaaaaabbc
|
||||
8,12: abbc
|
||||
>zzzzaaaabbbbbbcccc
|
||||
7,15: abbbbbbc
|
||||
|
||||
# Simple pattern
|
||||
/a.c/
|
||||
>abc
|
||||
0,3: abc
|
||||
>aaac
|
||||
1,4: aac
|
||||
>xac
|
||||
:NOMATCH
|
||||
>xaac
|
||||
1,4: aac
|
||||
>xxabc
|
||||
2,5: abc
|
||||
>xxxaxc
|
||||
3,6: axc
|
||||
|
||||
# Another simple pattern
|
||||
/a*c/
|
||||
>c
|
||||
0,1: c
|
||||
>xxxxxxxxc
|
||||
8,9: c
|
||||
>xxxxxxxcc
|
||||
7,8: c
|
||||
>ac
|
||||
0,2: ac
|
||||
>aaaac
|
||||
0,5: aaaac
|
||||
>xac
|
||||
1,3: ac
|
||||
>xxxaac
|
||||
3,6: aac
|
||||
>xxac
|
||||
2,4: ac
|
||||
>xxxxac
|
||||
4,6: ac
|
||||
|
||||
# Another simple pattern
|
||||
/a+c/
|
||||
>xxaac
|
||||
2,5: aac
|
||||
>xxxaaaac
|
||||
3,8: aaaac
|
||||
>xaaaabac
|
||||
6,8: ac
|
||||
>xxxc
|
||||
:NOMATCH
|
||||
>xxxxaaaaccc
|
||||
4,9: aaaac
|
||||
|
||||
# Another simple pattern
|
||||
/a{4}b/
|
||||
>xabxxaabxxxaaabxxxxaaaab
|
||||
19,24: aaaab
|
||||
>aaabaaaab
|
||||
4,9: aaaab
|
||||
|
||||
# Another simple pattern
|
||||
/a{4,}b/
|
||||
>xxxaaaab
|
||||
3,8: aaaab
|
||||
>zaaabzzzaaaaaaaaaaaaaaaab
|
||||
8,25: aaaaaaaaaaaaaaaab
|
||||
|
||||
# Another simple pattern
|
||||
/a{,4}b/
|
||||
>b
|
||||
0,1: b
|
||||
>xxxxxxxxb
|
||||
8,9: b
|
||||
>xaaaaaaaaab
|
||||
6,11: aaaab
|
||||
>xxxab
|
||||
3,5: ab
|
||||
>aaaaaxaaab
|
||||
6,10: aaab
|
||||
|
||||
# Another simple pattern
|
||||
/a{2,4}b/
|
||||
>xab
|
||||
:NOMATCH
|
||||
>xaab
|
||||
1,4: aab
|
||||
>xaaab
|
||||
1,5: aaab
|
||||
>xxaaaab
|
||||
2,7: aaaab
|
||||
>xxxaaaaab
|
||||
4,9: aaaab
|
||||
|
||||
# Some simple grouping tests
|
||||
/foo(bar|baz)fee/
|
||||
>feebarbazfoobarfee
|
||||
9,18: foobarfee
|
||||
12,15: bar
|
||||
>foofooobazfeefoobazfee
|
||||
13,22: foobazfee
|
||||
/f(oo|ee)ba[rz]/
|
||||
>barfoebaz
|
||||
:NOMATCH
|
||||
>bazfoobar
|
||||
3,9: foobar
|
||||
4,6: oo
|
||||
>barfeebaz
|
||||
3,9: feebaz
|
||||
4,6: ee
|
||||
/\<(int|char)\>/
|
||||
>aint character int foo
|
||||
15,18: int
|
||||
15,18: int
|
||||
|
||||
# Some complex repetitions
|
||||
/foo.*bar/
|
||||
>barfoblaboofoobarfoobarfoobar
|
||||
11,17: foobar
|
||||
/foo.+bar/
|
||||
>foobar
|
||||
:NOMATCH
|
||||
>fobbarfooxbarfooybar
|
||||
6,13: fooxbar
|
||||
/foo.?bar/
|
||||
>xfoobar
|
||||
1,7: foobar
|
||||
>xxfooxxbar
|
||||
:NOMATCH
|
||||
>yyyfootbar
|
||||
3,10: footbar
|
||||
|
||||
# Some nested complex repetitions
|
||||
/a.*b.*c/
|
||||
>abc
|
||||
0,3: abc
|
||||
>xxxxxxxxxabbbbbbbccaaaaabbbc
|
||||
9,18: abbbbbbbc
|
||||
/a.+b.*c/
|
||||
>xxxabc
|
||||
:NOMATCH
|
||||
>xxaxbbc
|
||||
2,7: axbbc
|
||||
/a.+b.?c/
|
||||
>xaabc
|
||||
1,5: aabc
|
||||
>xxaabbc
|
||||
2,7: aabbc
|
||||
|
||||
# Very complex repetitions
|
||||
/(foo.*|bar)fee/
|
||||
# XXX NOTE
|
||||
# This pattern does not return the correct offset for the group.
|
||||
# Support for this may or may not be added.
|
||||
|
||||
>barfoofee
|
||||
3,9: foofee
|
||||
>foobarfee
|
||||
0,9: foobarfee
|
||||
>xxfobarfee
|
||||
4,10: barfee
|
||||
>barfooooooobarfee
|
||||
3,17: fooooooobarfee
|
||||
>xxfobarfeefoobar
|
||||
4,10: barfee
|
||||
/(foo.+|bar)fee/
|
||||
>barfoofee
|
||||
:NOMATCH
|
||||
>barfooxfee
|
||||
3,10: fooxfee
|
||||
/(foo.?|bar)fee/
|
||||
>foobar
|
||||
:NOMATCH
|
||||
>bafoofee
|
||||
2,8:foofee
|
||||
>bafooofeebarfee
|
||||
2,9: fooofee
|
||||
>bafoofeebarfee
|
||||
2,8: foofee
|
||||
|
||||
# Simple backreference
|
||||
/(a|b|c)\1/
|
||||
>aa
|
||||
0,2: aa
|
||||
0,1: a
|
||||
/(a|b|c)(a|b|c)\1\2/
|
||||
>acac
|
||||
0,4: acac
|
||||
0,1: a
|
||||
1,2: c
|
||||
>xxxxacac
|
||||
4,8: acac
|
||||
4,5: a
|
||||
5,6: c
|
||||
>xxacabacbcacbbacbcaaccabcaca
|
||||
24,28: caca
|
||||
24,25: c
|
||||
25,26: a
|
||||
>xyabcccc
|
||||
4,8: cccc
|
||||
4,5: c
|
||||
5,6: c
|
||||
|
||||
# Complex backreference
|
||||
/(a*b)\1/
|
||||
>xxxaaaaabaaaaab
|
||||
3,15: aaaaabaaaaab
|
||||
3,9: aaaaab
|
||||
/(ab+c)\1/
|
||||
>xaaabbbcabbbc
|
||||
3,13: abbbcabbbc
|
||||
3,8: abbbc
|
||||
/(ab?c)\1/
|
||||
>abcac
|
||||
:NOMATCH
|
||||
>abcacabcabc
|
||||
5,11: abcabc
|
||||
5,8: abc
|
||||
>abcacac
|
||||
3,7: acac
|
||||
3,5: ac
|
||||
|
||||
# Very complex backreference
|
||||
/a(.*)b\1/
|
||||
>xxxab
|
||||
3,5: ab
|
||||
4,4:
|
||||
>xxxxazzzbzzz
|
||||
4,12: azzzbzzz
|
||||
5,8: zzz
|
||||
|
||||
# Case testing
|
||||
/abc/i
|
||||
>AbC
|
||||
0,3: AbC
|
||||
/[0-9][a-z]+/i
|
||||
>xxx0aaZxYT9
|
||||
3,10: 0aaZxYT
|
||||
/a.b/i
|
||||
>aaaaaaaaaaaxB
|
||||
10,13: axB
|
||||
/a.*z/i
|
||||
>xxxAaaaaZ
|
||||
3,9: AaaaaZ
|
||||
>xxaaaZaaa
|
||||
2,6: aaaZ
|
||||
/\<(lambda|defun|defmacro)\>/i
|
||||
>    (lambda
|
||||
5,11: lambda
|
||||
5,11: lambda
|
||||
/\<(nil|t)\>/i
|
||||
>it Nil
|
||||
3,6: Nil
|
||||
3,6: Nil
|
||||
/\<(begin|end)\>/i
|
||||
>beginning the ending EnD
|
||||
21,24: EnD
|
||||
21,24: EnD
|
||||
|
||||
# Some newline tests
|
||||
/a.*/n
|
||||
>a\naaa
|
||||
0,1:a
|
||||
>xyza\naa
|
||||
3,4: a
|
||||
/a.+/n
|
||||
>a\naaa
|
||||
2,5: aaa
|
||||
>xyza\naa
|
||||
5,7: aa
|
||||
/a.?/n
|
||||
>a\naaa
|
||||
0,1: a
|
||||
>xyza\naa
|
||||
3,4: a
|
||||
|
||||
# Newline tests involving complex patterns
|
||||
/a.*b.*c/n
|
||||
>xxaa\nzyacb\nabc
|
||||
11,14: abc
|
||||
>xxxab\nabc\nc
|
||||
6,9: abc
|
||||
/a.+b.*c/n
|
||||
>ab\nbc\nabbc
|
||||
6,10: abbc
|
||||
/a.?b.*c/n
|
||||
>ab\ncabbc\ncc
|
||||
4,8: abbc
|
||||
/^foo$/n
|
||||
>bar\nfoobar\nfoo
|
||||
11,14: foo
|
||||
|
||||
# Not so complex test involving a newline...
|
||||
/^\s*#\s*(define|include)\s+.+/n
|
||||
>#define\n#include x
|
||||
8,18: #include x
|
||||
9,16: include
|
||||
|
||||
# Check if large strings are working
|
||||
/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
|
||||
>zzzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxzzz
|
||||
3,259: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~/
|
||||
>String here: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~/
|
||||
13,333: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~
|
||||
|
||||
|
||||
# Some complex repetitions not supported
|
||||
# Listed here only to make sure the library is not crashing on these
|
||||
# Repetitions that can produce an empty match (that is, match the empty
# string) cannot follow a complex repetition. A complex repetition is:
|
||||
# .* or .+ or .?
|
||||
# .{...} is not supported.
|
||||
/(.*)(\d*)/
|
||||
:BADRPT
|
||||
/(.*).(\d*)/
|
||||
:BADRPT
|
||||
/(.*)\<(\d*)/
|
||||
:BADRPT
|
||||
/(.*)\s(\d*)/
|
||||
:BADRPT
|
||||
/(.*)\D(\d*)/
|
||||
:BADRPT
|
||||
|
||||
# This is a clearer pattern, and it partially works
|
||||
/(.*)\D(\d+)/
|
||||
>abcW12
|
||||
0,6: abcW12
|
||||
0,3: abc
|
||||
4,6: 12
|
||||
>abcW12abcW12
|
||||
0,6: abcW12
|
||||
0,3: abc
|
||||
4,6: 12
|
||||
# This wasn't working in the previous version, but now with only minimal
|
||||
# matches supported, it works.
|
||||
>abcW12abcW12a
|
||||
0,6: abcW12
|
||||
0,3: abc
|
||||
4,6: 12
|
||||
|
||||
# Note the minimal match
|
||||
/.*\d/
|
||||
>a1a1a1aaaaaaa
|
||||
0,2: a1
|
||||
# Check match offsets
|
||||
/(.*)\d/
|
||||
>a1a1a1aaaaaaa
|
||||
0,2: a1
|
||||
0,1: a
|
||||
/.*(\d)/
|
||||
>a1a1a1aaaaaaa
|
||||
0,2: a1
|
||||
1,2: 1
|
||||
|
||||
/.*(\d+)/
|
||||
:BADRPT
|
||||
|
||||
# Regression fix, was matching empty string
|
||||
/\\\d{3}|\\./
|
||||
>\\
|
||||
:NOMATCH
|
||||
|
||||
/\\.|\\\d{3}/
|
||||
>\\
|
||||
:NOMATCH
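
The line types described in the header of tests.txt map directly onto the first character of each line, which is how tests.c dispatches them. As a sketch only, with classify_line and the line_kind enumeration being hypothetical names that do not exist in the sources, the dispatch can be written as:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical classification of one line of tests.txt. */
typedef enum {
    LINE_SKIP,      /* comment or blank line */
    LINE_PATTERN,   /* /pattern/flags */
    LINE_INPUT,     /* >input string */
    LINE_STATUS,    /* :NOMATCH or :<error> */
    LINE_OFFSETS,   /* start,end: text */
    LINE_INVALID
} line_kind;

/* Hypothetical helper: classify a line by its first character.
 * For LINE_OFFSETS the parsed offsets are stored in *so and *eo;
 * for any other result their contents are unspecified. */
static line_kind
classify_line(const char *line, long *so, long *eo)
{
    assert(line != NULL && so != NULL && eo != NULL);
    switch (line[0]) {
    case '#':
    case '\n':
    case '\0':      /* assumption: treat an empty string like a blank line */
        return LINE_SKIP;
    case '/':
        return LINE_PATTERN;
    case '>':
        return LINE_INPUT;
    case ':':
        return LINE_STATUS;
    default:
        /* Same format string that tests.c uses for expected results. */
        if (sscanf(line, "%ld,%ld:", so, eo) == 2)
            return LINE_OFFSETS;
        return LINE_INVALID;
    }
}
```

The text after the colon on a result line is never parsed, so it serves only as a human-readable annotation of the expected match.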
|