sync code with last improvements from OpenBSD

purplerain 2023-08-28 05:57:34 +00:00
commit 88965415ff
Signed by: purplerain
GPG key ID: F42C07F07E2E35B7
26235 changed files with 29195616 additions and 0 deletions

121
app/xedit/lisp/re/README Normal file

@@ -0,0 +1,121 @@
$XFree86: xc/programs/xedit/lisp/re/README,v 1.3 2002/09/23 01:25:41 paulo Exp $
LAST UPDATED: $Date: 2006/11/25 20:35:00 $
This is a small regex library for fast matching of tokens in text. It was built
to be used by xedit and its syntax highlighting code. It is not compliant with
IEEE Std 1003.2, but is meant to be used where very fast matching is
required and exotic patterns will not be used.
To understand what kind of patterns this library is expected to be used with,
see the file <XRoot>xc/programs/xedit/lisp/modules/progmodes/c.lsp and some
samples in the file tests.txt, with comments for patterns that will not work,
or may give incorrect results.
The library is not built upon the standard regex library by Henry Spencer;
it was written completely from scratch. Its syntax, however, is heavily based
on that library, and the only reason for it to exist is that, unfortunately,
the standard version does not fit the requirements of xedit.
Anyway, I would like to thank Henry for his regex library; it is a really
useful tool.
Small description of understood tokens:
M A T C H I N G
------------------------------------------------------------------------
.               Any character (won't match newline if compiled with RE_NEWLINE)
\w              Any word letter (shortcut to [a-zA-Z0-9_])
\W              Not a word letter (shortcut to [^a-zA-Z0-9_])
\d              Decimal digit
\D              Not a decimal digit
\s              A space
\S              Not a space
\l              A lower case letter
\u              An upper case letter
\c              A control character, currently the range 1-32 (minus tab)
\C              Not a control character
\o              Octal digit
\O              Not an octal digit
\x              Hexadecimal digit
\X              Not a hexadecimal digit
\<              Beginning of a word (matches an empty string)
\>              End of a word (matches an empty string)
^               Beginning of a line (matches an empty string)
$               End of a line (matches an empty string)
[...]           Matches one of the characters inside the brackets.
                Ranges are specified by separating two characters with "-".
                If the first character is "^", matches only if the
                character is not in this range. To add a "]" make it
                the first character, and to add a "-" make it the last.
\1 to \9        Backreference, matches the text that was matched by a group,
                that is, text that was matched by the pattern inside
                "(" and ")".
O P E R A T O R S
------------------------------------------------------------------------
()              Any pattern inside works as a backreference, and is also
                used to group patterns.
|               Alternation, allows choosing different possibilities, like
                character ranges, but allows patterns of different lengths.
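Character classes such as \w and bracket ranges are cheap to test when they are stored as a 256-entry byte map, which is also how reo.c refers to its re__alnum and related tables. The following is a minimal sketch of that idea, not the library's code: build_word_map and is_word_byte are hypothetical names, and the loops assume an ASCII-compatible character set where the letters and digits are contiguous.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: set map[c] to 1 exactly for the bytes in
 * [a-zA-Z0-9_], and to 0 for every other byte.
 * Assumes ASCII, where each of the three ranges is contiguous. */
void build_word_map(unsigned char map[256])
{
    int c;

    assert(map != NULL);
    memset(map, 0, 256);
    for (c = 'a'; c <= 'z'; c++)
        map[c] = 1;
    for (c = 'A'; c <= 'Z'; c++)
        map[c] = 1;
    for (c = '0'; c <= '9'; c++)
        map[c] = 1;
    map['_'] = 1;
}

/* Testing \w against one input byte is a single array lookup. */
int is_word_byte(const unsigned char map[256], unsigned char c)
{
    assert(map != NULL);
    return map[c] != 0;
}
```

A negated class such as \W can reuse the same map and simply invert the result of the lookup.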
R E P E T I T I O N
------------------------------------------------------------------------
<re>*           <re> may occur any number of times, including zero
<re>+           <re> must occur at least once
<re>?           <re> is optional
<re>{<e>}       <re> must occur exactly <e> times
<re>{<n>,}      <re> must occur at least <n> times
<re>{,<m>}      <re> must not occur more than <m> times
<re>{<n>,<m>}   <re> must occur at least <n> times, but no more than <m>
Note that "." is a special character, and when used with a repetition
operator its meaning changes completely. For example, ".*" matches
anything up to the end of the input string (unless the pattern was compiled
with RE_NEWLINE, in which case it will match anything but a newline).
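The effect of RE_NEWLINE on a trailing ".*" can be pictured as a choice between two span computations. The sketch below only models that description; dot_star_span is a hypothetical helper, and the stop_at_newline argument stands in for the RE_NEWLINE compile flag rather than being read from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a trailing ".*": return how many bytes it
 * would consume starting at s.
 * stop_at_newline == 0: everything up to the end of the input.
 * stop_at_newline != 0: everything up to, but not including, the
 *                       first newline (or the end of the input). */
size_t dot_star_span(const char *s, int stop_at_newline)
{
    assert(s != NULL);
    return stop_at_newline ? strcspn(s, "\n") : strlen(s);
}
```

For the input "ab\ncd" this gives 5 without the flag and 2 with it.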
Limitations:
o Only minimal matches are supported. The engine has only one level of
  "backtracking", so it only does minimal matches to allow backreferences
  to work properly and to avoid failing to match depending on the input.
o Only one level of "grouping". For example, with the pattern:
        (a(b)c)
  if "abc" is anywhere in the input, it will be in "\1", but there will
  not be a "\2" for "b".
o Some "special repetitions" were not implemented. These are:
        .{<e>}
        .{<n>,}
        .{,<m>}
        .{<n>,<m>}
o Some patterns will never match, for example:
        \w*\d
  Since "\w*" already includes all possible matches of "\d", "\d" will
  only be tested when "\w*" has failed. There are no plans to make such
  patterns work.
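The \w*\d limitation can be reproduced with a few lines of C. This is a simplified model and not the engine's code: both functions are hypothetical, match only at the start of the string, and assume the C locale. The first leaves the \w loop only when \w fails, as the text describes; the second shows what a fully backtracking matcher would do instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Model of \w*\d without backtracking: \d is only tried at the first
 * byte that \w rejects, and such a byte can never be a digit. */
int match_without_backtracking(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (is_word((unsigned char)s[i]))
        i++;
    return isdigit((unsigned char)s[i]) != 0;
}

/* Model of \w*\d with backtracking: after the loop, give back one
 * byte at a time and retry \d at each earlier position. */
int match_with_backtracking(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (is_word((unsigned char)s[i]))
        i++;
    while (i > 0) {
        i--;
        if (isdigit((unsigned char)s[i]))
            return 1;
    }
    return 0;
}
```

The first function returns 0 for every input, which is the behaviour the limitation describes. A pattern such as [a-zA-Z_]*\d, where the repeated class excludes digits, does not have this overlap.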
Some of these limitations may be addressed in future versions of the library,
but this is not what the library is meant to do, and adding support for
correct handling of these cases would probably make the library slower, which
defeats the reason for it to exist in the first place.
If you need "true" regex then this library is not for you, but if all
you need is support for very quickly finding simple patterns, then this
library can be a very powerful tool. On some patterns it can run more
than 200 times faster than "true" regex implementations! And this is
the reason it was written.
Send comments and code to me (paulo@XFree86.Org) or to the XFree86
mailing/patch lists.
--
Paulo

2649
app/xedit/lisp/re/re.c Normal file

File diff suppressed because it is too large

123
app/xedit/lisp/re/re.h Normal file

@@ -0,0 +1,123 @@
/*
* Copyright (c) 2002 by The XFree86 Project, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the XFree86 Project shall
* not be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization from the
* XFree86 Project.
*
* Author: Paulo César Pereira de Andrade
*/
/* $XFree86: xc/programs/xedit/lisp/re/re.h,v 1.1 2002/09/08 02:29:50 paulo Exp $ */
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#ifndef _re_h
#define _re_h
/*
* Defines
*/
/* Compile flags options */
#define REG_BASIC 0000 /* Not used */
#define REG_EXTENDED 0001 /* Not used, only extended supported */
#define RE_ICASE 0002
#define RE_NOSUB 0004
#define RE_NEWLINE 0010
#define RE_NOSPEC 0020
#define RE_PEND 0040
#define RE_DUMP 0200
/* Execute flag options */
#define RE_NOTBOL 1
#define RE_NOTEOL 2
#define RE_STARTEND 4
#define RE_TRACE 00400 /* Not used/supported */
#define RE_LARGE 01000 /* Not used/supported */
#define RE_BACKR 02000 /* Not used/supported */
/* Value returned by reexec when match fails */
#define RE_NOMATCH 1
/* Compile error values */
#define RE_BADPAT 2
#define RE_ECOLLATE 3
#define RE_ECTYPE 4
#define RE_EESCAPE 5
#define RE_ESUBREG 6
#define RE_EBRACK 7
#define RE_EPAREN 8
#define RE_EBRACE 9
#define RE_EBADBR 10
#define RE_ERANGE 11
#define RE_ESPACE 12
#define RE_BADRPT 13
#define RE_EMPTY 14
#define RE_ASSERT 15
#define RE_INVARG 16
#define RE_ATOI 255 /* Not used/supported */
#define RE_ITOA 0400 /* Not used/supported */
/*
* Types
*/
/* (re)gular expression (mat)ch result */
typedef struct _re_mat {
long rm_so;
long rm_eo;
} re_mat;
/* (re)gular expression (cod)e */
typedef struct _re_cod {
unsigned char *cod;
int re_nsub; /* Public member */
const char *re_endp; /* Support for RE_PEND */
} re_cod;
/*
* Prototypes
*/
/* compile the given pattern string
* returns 0 on success, error code otherwise */
int recomp(re_cod *preg, const char *pattern, int flags);
/* execute the compiled pattern on the string.
* returns 0 if matched, RE_NOMATCH if failed, error code otherwise */
int reexec(const re_cod *preg, const char *string,
int nmat, re_mat pmat[], int flags);
/* formats an error message for the given code in ebuffer */
int reerror(int ecode, const re_cod *preg, char *ebuffer, int ebuffer_size);
/* frees the given parameter */
void refree(re_cod *preg);
#endif /* _re_h */
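
The four prototypes above have the same shape as the POSIX regcomp, regexec, regerror and regfree family: compile once, execute against a string, free the compiled code. So that it is self-contained, the sketch below uses POSIX <regex.h> as a stand-in for re.h. That substitution is an assumption made for illustration only, and matching semantics differ, since the README states that this library is not POSIX compliant. find_first is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper showing the compile / execute / free cycle.
 * Returns 0 and stores the match offsets in so and eo on success,
 * REG_NOMATCH when nothing matches, or a compile error code. */
int find_first(const char *pattern, const char *text, long *so, long *eo)
{
    regex_t preg;
    regmatch_t pmat[1];
    int rc;

    assert(pattern != NULL && text != NULL);
    assert(so != NULL && eo != NULL);

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;              /* regerror() could format this code */

    rc = regexec(&preg, text, 1, pmat, 0);
    if (rc == 0) {
        *so = (long)pmat[0].rm_so;
        *eo = (long)pmat[0].rm_eo;
    }
    regfree(&preg);
    return rc;
}
```

With re.h the same flow would presumably call recomp, reexec and refree, with re_cod and re_mat in place of regex_t and regmatch_t. That mapping is inferred from the prototypes and is not tested here.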

1015
app/xedit/lisp/re/rec.c Normal file

File diff suppressed because it is too large

685
app/xedit/lisp/re/reo.c Normal file

@@ -0,0 +1,685 @@
/*
* Copyright (c) 2002 by The XFree86 Project, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the XFree86 Project shall
* not be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization from the
* XFree86 Project.
*
* Author: Paulo César Pereira de Andrade
*/
/* $XFree86: xc/programs/xedit/lisp/re/reo.c,v 1.8 2002/09/29 02:55:01 paulo Exp $ */
#include "rep.h"
/*
* This file is a placeholder to add code to analyse and optimize the
* intermediate data structure generated in rec.c.
* Character ranges are optimized while being generated.
*/
/*
* Types
*/
typedef struct _orec_inf {
rec_alt *alt; /* Main alternatives list */
rec_grp *grp; /* Current group pointer */
int flags;
int ecode;
} orec_inf;
/*
* Prototypes
*/
static int orec_alt(orec_inf*, rec_alt*);
static int orec_pat(orec_inf*, rec_pat*);
static int orec_grp(orec_inf*, rec_grp*);
static int orec_pat_bad_rpt(orec_inf*, rec_pat*);
static int orec_pat_bad_forward_rpt(orec_inf*, rec_pat*);
static int orec_pat_rng(orec_inf*, rec_pat*);
static int orec_pat_cse(orec_inf*, rec_pat*);
static int orec_pat_cse_can(orec_inf*, rec_pat*);
static int orec_str_list(orec_inf*, rec_alt*, int, int);
/*
* Initialization
*/
extern unsigned char re__alnum[256];
extern unsigned char re__odigit[256];
extern unsigned char re__ddigit[256];
extern unsigned char re__xdigit[256];
extern unsigned char re__control[256];
/*
* Implementation
*/
int
orec_comp(rec_alt *alt, int flags)
{
orec_inf inf;
inf.alt = alt;
inf.grp = NULL;
inf.flags = flags;
inf.ecode = 0;
orec_alt(&inf, alt);
return (inf.ecode);
}
void
orec_free_stl(rec_stl *stl)
{
int i;
for (i = 0; i < stl->nstrs; i++) {
if (stl->lens[i] > 2)
free(stl->strs[i]);
}
free(stl->lens);
free(stl->strs);
free(stl);
}
static int
orec_alt(orec_inf *inf, rec_alt *alt)
{
if (alt) {
rec_alt *ptr = alt;
int ret, count = 0, str = 1, cstr = 1, lits = 0, clits = 0;
/* Check if can build a string list */
if (ptr->next) {
/* If more than one alternative */
while (ptr && (str || cstr)) {
if (ptr->pat == NULL || ptr->pat->rep != NULL) {
cstr = str = 0;
break;
}
if ((inf->flags & RE_ICASE)) {
if (!(ret = orec_pat_cse_can(inf, ptr->pat))) {
cstr = str = 0;
break;
}
if (ret == 1)
++lits;
else if (ret == 2)
++clits;
}
else if (ptr->pat->next == NULL) {
if (ptr->pat->type != Rep_String) {
if (ptr->pat->type != Rep_Literal) {
str = 0;
if (ptr->pat->type != Rep_CaseString) {
if (ptr->pat->type != Rep_CaseLiteral)
cstr = 0;
else
++clits;
}
else if (strlen((char*)ptr->pat->data.str) >= 255)
str = cstr = 0;
}
else
++lits;
}
else if (strlen((char*)ptr->pat->data.str) >= 255)
str = cstr = 0;
}
else {
str = cstr = 0;
break;
}
if (++count >= 255)
str = cstr = 0;
ptr = ptr->next;
}
if (str || cstr) {
if (inf->flags & RE_ICASE) {
for (ptr = alt; ptr; ptr = ptr->next) {
if (orec_pat_cse(inf, ptr->pat))
return (inf->ecode);
}
str = 0;
}
return (orec_str_list(inf, alt, str, count));
}
}
else if (alt == inf->alt && alt->pat && alt->pat->rep == NULL) {
/* If the toplevel single alternative */
switch (alt->pat->type) {
/* One of these will always be true for RE_NOSPEC,
* but can also be optimized for simple patterns */
case Rep_Literal:
alt->pat->type = Rep_SearchLiteral;
break;
case Rep_CaseLiteral:
alt->pat->type = Rep_SearchCaseLiteral;
break;
case Rep_String:
alt->pat->type = Rep_SearchString;
break;
case Rep_CaseString:
alt->pat->type = Rep_SearchCaseString;
break;
default:
break;
}
}
while (alt) {
orec_pat(inf, alt->pat);
alt = alt->next;
}
}
return (inf->ecode);
}
static int
orec_pat(orec_inf *inf, rec_pat *pat)
{
rec_pat *next;
while (pat) {
switch (pat->type) {
case Rep_AnyAnyTimes:
if (pat->next == NULL) {
rec_grp *grp = inf->grp;
next = NULL;
while (grp) {
next = grp->parent->next;
/* Cannot check if is .*$ as the input
* may be a substring */
if (next)
break;
grp = grp->pgrp;
}
if (next == NULL) {
/* <re>.* */
pat->type = Rep_AnyEatAnyTimes;
grp = inf->grp;
while (grp) {
--grp->comp;
next = grp->parent->next;
if (next)
break;
grp = grp->pgrp;
}
}
else if (orec_pat_bad_rpt(inf, next))
return (inf->ecode);
}
else if (orec_pat_bad_rpt(inf, pat->next))
return (inf->ecode);
break;
case Rep_AnyMaybe:
if (pat->next == NULL) {
rec_grp *grp = inf->grp;
next = NULL;
while (grp) {
next = grp->parent->next;
if (next)
break;
grp = grp->pgrp;
}
if (next == NULL) {
/* <re>.? */
pat->type = Rep_AnyEatMaybe;
grp = inf->grp;
while (grp) {
--grp->comp;
next = grp->parent->next;
if (next)
break;
grp = grp->pgrp;
}
}
else if (orec_pat_bad_rpt(inf, next))
return (inf->ecode);
}
else if (orec_pat_bad_rpt(inf, pat->next))
return (inf->ecode);
break;
case Rep_AnyAtLeast:
if (pat->next == NULL) {
rec_grp *grp = inf->grp;
next = NULL;
while (grp) {
next = grp->parent->next;
if (next)
break;
grp = grp->pgrp;
}
if (next == NULL) {
/* <re>.+ */
pat->type = Rep_AnyEatAtLeast;
grp = inf->grp;
while (grp) {
--grp->comp;
next = grp->parent->next;
if (next)
break;
grp = grp->pgrp;
}
}
else if (orec_pat_bad_rpt(inf, next))
return (inf->ecode);
}
else if (orec_pat_bad_rpt(inf, pat->next))
return (inf->ecode);
break;
case Rep_Range:
case Rep_RangeNot:
orec_pat_rng(inf, pat);
break;
case Rep_Group:
orec_grp(inf, pat->data.grp);
break;
default:
break;
}
pat = pat->next;
}
return (inf->ecode);
}
static int
orec_pat_bad_rpt(orec_inf *inf, rec_pat *pat)
{
switch (pat->type) {
/* Not really an error, but aren't supported by the library.
* Includes: .*.*, .+<re>? .*<re>*, (.*)(<re>*), etc.
*/
/* Not a repetition, but matches anything... */
case Rep_Any:
/* Zero length matches */
case Rep_Eol:
if (!(inf->flags & RE_NEWLINE))
break;
/*FALLTHROUGH*/
case Rep_Bol:
case Rep_Bow:
case Rep_Eow:
/* Repetitions */
case Rep_AnyAnyTimes:
case Rep_AnyMaybe:
case Rep_AnyAtLeast:
inf->ecode = RE_BADRPT;
break;
/* Check if the first group element is a complex pattern */
case Rep_Group:
if (pat->rep == NULL) {
if (pat->data.grp->alt) {
for (pat = pat->data.grp->alt->pat; pat; pat = pat->next) {
if (orec_pat_bad_rpt(inf, pat))
break;
}
}
break;
}
/*FALLTHROUGH*/
default:
if (pat->rep)
inf->ecode = RE_BADRPT;
break;
}
if (!inf->ecode && pat && pat->next)
orec_pat_bad_forward_rpt(inf, pat->next);
return (inf->ecode);
}
static int
orec_pat_bad_forward_rpt(orec_inf *inf, rec_pat *pat)
{
if (pat->rep) {
switch (pat->rep->type) {
case Rer_MinMax:
if (pat->rep->mine > 0)
break;
/*FALLTHROUGH*/
case Rer_AnyTimes:
case Rer_Maybe:
case Rer_Max:
inf->ecode = RE_BADRPT;
break;
default:
break;
}
}
else if (pat->type == Rep_Group &&
pat->data.grp->alt &&
pat->data.grp->alt->pat)
orec_pat_bad_forward_rpt(inf, pat->data.grp->alt->pat);
return (inf->ecode);
}
static int
orec_grp(orec_inf *inf, rec_grp *grp)
{
rec_grp *prev = inf->grp;
inf->grp = grp;
orec_alt(inf, grp->alt);
/* Could also just say: inf->grp = grp->pgrp */
inf->grp = prev;
return (inf->ecode);
}
static int
orec_pat_rng(orec_inf *inf, rec_pat *pat)
{
int i, j[2], count;
rec_pat_t type = pat->type;
unsigned char *range = pat->data.rng->range;
for (i = count = j[0] = j[1] = 0; i < 256; i++) {
if (range[i]) {
if (count == 2) {
++count;
break;
}
j[count++] = i;
}
}
if (count == 1 ||
(count == 2 &&
((islower(j[0]) && toupper(j[0]) == j[1]) ||
(isupper(j[0]) && tolower(j[0]) == j[1])))) {
free(pat->data.rng);
if (count == 1) {
pat->data.chr = j[0];
pat->type = type == Rep_Range ? Rep_Literal : Rep_LiteralNot;
}
else {
pat->data.cse.upper = j[0];
pat->data.cse.lower = j[1];
pat->type = type == Rep_Range ? Rep_CaseLiteral : Rep_CaseLiteralNot;
}
}
else {
if (memcmp(re__alnum, range, 256) == 0)
type = type == Rep_Range ? Rep_Alnum : Rep_AlnumNot;
else if (memcmp(re__odigit, range, 256) == 0)
type = type == Rep_Range ? Rep_Odigit : Rep_OdigitNot;
else if (memcmp(re__ddigit, range, 256) == 0)
type = type == Rep_Range ? Rep_Digit : Rep_DigitNot;
else if (memcmp(re__xdigit, range, 256) == 0)
type = type == Rep_Range ? Rep_Xdigit : Rep_XdigitNot;
else if (memcmp(re__control, range, 256) == 0)
type = type == Rep_Range ? Rep_Control : Rep_ControlNot;
if (type != pat->type) {
free(pat->data.rng);
pat->type = type;
}
}
return (inf->ecode);
}
/* Join patterns if required; will only fail on memory allocation failure. */
static int
orec_pat_cse(orec_inf *inf, rec_pat *pat)
{
rec_pat_t type;
int i, len, length;
rec_pat *ptr, *next;
unsigned char *str, *tofree;
if (pat->next == NULL && pat->type == Rep_CaseString)
return (inf->ecode);
type = Rep_CaseString;
/* First calculate how many bytes will be required */
for (ptr = pat, length = 1; ptr; ptr = ptr->next) {
switch (ptr->type) {
case Rep_Literal:
length += 2;
break;
case Rep_String:
length += strlen((char*)ptr->data.str) << 1;
break;
case Rep_CaseLiteral:
length += 2;
break;
case Rep_CaseString:
length += strlen((char*)ptr->data.str);
break;
default:
break;
}
}
if ((str = malloc(length)) == NULL)
return (inf->ecode = RE_ESPACE);
for (ptr = pat, length = 0; ptr; ptr = next) {
tofree = NULL;
next = ptr->next;
switch (ptr->type) {
case Rep_Literal:
str[length++] = ptr->data.chr;
str[length++] = ptr->data.chr;
break;
case Rep_String:
tofree = ptr->data.str;
len = strlen((char*)tofree);
for (i = 0; i < len; i++) {
str[length++] = tofree[i];
str[length++] = tofree[i];
}
break;
case Rep_CaseLiteral:
str[length++] = ptr->data.cse.lower;
str[length++] = ptr->data.cse.upper;
break;
case Rep_CaseString:
tofree = ptr->data.str;
len = strlen((char*)tofree);
memcpy(str + length, tofree, len);
length += len;
break;
default:
break;
}
if (tofree)
free(tofree);
if (ptr != pat)
free(ptr);
}
str[length] = '\0';
pat->type = type;
pat->data.str = str;
pat->next = NULL;
return (inf->ecode);
}
/* Return 0 if the patterns in the list cannot be merged, 1 if the result
* will be a simple string, 2 if a case string.
* This is useful when building an alternative list that is composed
* only of strings, but the regex is case insensitive, in which case
* the first pass may have split some patterns, but if it is a member
* of an alternatives list, the cost of using a string list is smaller */
static int
orec_pat_cse_can(orec_inf *inf, rec_pat *pat)
{
int ret;
if (pat == NULL)
return (0);
for (ret = 1; pat; pat = pat->next) {
if (pat->rep)
return (0);
switch (pat->type) {
case Rep_Literal:
case Rep_String:
break;
case Rep_CaseLiteral:
case Rep_CaseString:
ret = 2;
break;
default:
return (0);
}
}
return (ret);
}
/* XXX If everything is a (case) byte, the pattern should be
* [abcde] instead of a|b|c|d|e (or [aAbBcCdDeE] instead of aA|bB|cC|dD|eE)
* as a string list works fine, but as a character range
* should be faster, and maybe could be converted here. But not
* very important, if performance is required, it should have already
* been done in the pattern.
*/
static int
orec_str_list(orec_inf *inf, rec_alt *alt, int str, int count)
{
rec_stl *stl;
rec_pat *pat;
rec_alt *ptr, *next;
int i, j, tlen, len, is;
if ((stl = calloc(1, sizeof(rec_stl))) == NULL)
return (inf->ecode = RE_ESPACE);
if ((stl->lens = malloc(sizeof(unsigned char) * count)) == NULL) {
free(stl);
return (inf->ecode = RE_ESPACE);
}
if ((stl->strs = malloc(sizeof(char*) * count)) == NULL) {
free(stl->lens);
free(stl);
return (inf->ecode = RE_ESPACE);
}
if ((pat = calloc(1, sizeof(rec_pat))) == NULL) {
free(stl->strs);
free(stl->lens);
free(stl);
return (inf->ecode = RE_ESPACE);
}
pat->data.stl = stl;
pat->type = Rep_StringList;
stl->type = str ? Resl_StringList : Resl_CaseStringList;
for (i = tlen = 0, ptr = alt; i < count; i++) {
next = ptr->next;
switch (ptr->pat->type) {
case Rep_Literal:
is = len = 1;
break;
case Rep_CaseLiteral:
is = len = 2;
break;
default:
is = 0;
len = strlen((char*)ptr->pat->data.str);
break;
}
tlen += len;
stl->lens[i] = len;
if (!is) {
if (len > 2)
stl->strs[i] = ptr->pat->data.str;
else {
if (len == 1)
stl->strs[i] = (void*)(long)(ptr->pat->data.str[0]);
else
stl->strs[i] = (void*)(long)
(ptr->pat->data.str[0] |
((int)ptr->pat->data.str[1] << 8));
free(ptr->pat->data.str);
}
}
else {
if (is == 1)
stl->strs[i] = (void*)(long)ptr->pat->data.chr;
else
stl->strs[i] = (void*)(long)
(ptr->pat->data.cse.lower |
(ptr->pat->data.cse.upper << 8));
}
free(ptr->pat);
if (i)
free(ptr);
ptr = next;
}
stl->tlen = tlen;
stl->nstrs = count;
alt->pat = pat;
alt->next = NULL;
{
int li, lj;
unsigned char ci, cj, *str;
/* Don't need a stable sort, there shouldn't be duplicated strings,
* but don't check for it either. Only need to make sure that all
* strings that start with the same byte are together */
for (i = 0; i < count; i++) {
li = stl->lens[i];
ci = li > 2 ? stl->strs[i][0] : (long)stl->strs[i] & 0xff;
for (j = i + 1; j < count; j++) {
lj = stl->lens[j];
cj = lj > 2 ? stl->strs[j][0] : (long)stl->strs[j] & 0xff;
if ((count >= LARGE_STL_COUNT && cj < ci) ||
(cj == ci && lj > li)) {
/* If both strings start with the same byte,
* put the longer first */
str = stl->strs[j];
stl->strs[j] = stl->strs[i];
stl->strs[i] = str;
stl->lens[j] = li;
stl->lens[i] = lj;
li ^= lj; lj ^= li; li ^= lj;
ci ^= cj; cj ^= ci; ci ^= cj;
}
}
}
}
return (inf->ecode);
}
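
The range optimization in orec_pat_rng above scans the 256-byte map and collapses it when it holds a single byte or an upper/lower pair of the same letter. The standalone sketch below restates that scan in isolation. classify_range and the range_kind enum are hypothetical names, and the sketch only reports what it found instead of rewriting a rec_pat.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for the sketch. */
enum range_kind { RANGE_GENERAL, RANGE_SINGLE, RANGE_CASE_PAIR };

/* Scan a 256-entry character map. If it holds exactly one byte, or
 * exactly two bytes that are the two cases of one letter, store them
 * in first and second (in ascending byte order) and report the kind.
 * Any other map, including an empty one, is RANGE_GENERAL. */
enum range_kind classify_range(const unsigned char range[256],
                               int *first, int *second)
{
    int i, n = 0, found[2] = { 0, 0 };

    assert(range != NULL && first != NULL && second != NULL);
    for (i = 0; i < 256; i++) {
        if (range[i]) {
            if (n == 2)
                return RANGE_GENERAL;   /* a third member: stop early */
            found[n++] = i;
        }
    }
    *first = found[0];
    *second = found[1];
    if (n == 1)
        return RANGE_SINGLE;
    if (n == 2 &&
        ((islower(found[0]) && toupper(found[0]) == found[1]) ||
         (isupper(found[0]) && tolower(found[0]) == found[1])))
        return RANGE_CASE_PAIR;
    return RANGE_GENERAL;
}
```

In ASCII the upper case letter has the smaller byte value, so for a map holding 'x' and 'X' the sketch stores 'X' in first and 'x' in second.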

369
app/xedit/lisp/re/rep.h Normal file

@@ -0,0 +1,369 @@
/*
* Copyright (c) 2002 by The XFree86 Project, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the XFree86 Project shall
* not be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization from the
* XFree86 Project.
*
* Author: Paulo César Pereira de Andrade
*/
/* $XFree86: xc/programs/xedit/lisp/re/rep.h,v 1.2 2002/11/15 07:01:33 paulo Exp $ */
#include "re.h"
#ifndef _rep_h
#define _rep_h
/*
* Local defines
*/
#ifdef MIN
#undef MIN
#endif
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#ifdef MAX
#undef MAX
#endif
#define MAX(a, b) ((a) > (b) ? (a) : (b))
/* This value cannot be larger than 255. A depth value is the nesting of
* repetition operations and alternatives. The number of nested parentheses
* does not matter, but a repetition on the pattern inside the parentheses
* does. Note also that you cannot have more than 9 parenthesis pairs in
* an expression.
* Depth is always at least 1, so for MAX_DEPTH 8 only 7 complex
* repetitions are allowed. A complex repetition is a dot followed by a
* repetition operator. It is called a complex repetition because dot
* matches anything but the empty string, so the engine needs to test
* all possible combinations until the end of the string is found.
* Repetitions like .* use one depth until the end of the string is found,
* for example a.*b.*c.*d has depth 4, while a*b*c*d has depth 2.
*/
#define MAX_DEPTH 8
/* Minimum number of strings to generate a "large" string list, that is,
* sort the strings and allocate 512 extra bytes to map the first string
* with a given initial byte. */
#define LARGE_STL_COUNT 16
/*
* Local types
*/
/* Intermediate compilation types declaration */
/* (r)egular (e)xpression (c)ompile (c)a(se) */
typedef struct _rec_cse rec_cse;
/* (r)egular (e)xpression (c)ompile (r)a(ng)e */
typedef struct _rec_rng rec_rng;
/* (r)egular (e)xpression (c)ompile (pat)tern */
typedef struct _rec_pat rec_pat;
/* (r)egular (e)xpression (c)ompile (rep)etition */
typedef struct _rec_rep rec_rep;
/* (r)egular (e)xpression (c)ompile (gr)ou(p) */
typedef struct _rec_grp rec_grp;
/* (r)egular (e)xpression (c)ompile (alt)ernatives */
typedef struct _rec_alt rec_alt;
/* Optimization types */
/* (r)egular (e)xpression (c)ompile (st)ring (l)ist */
typedef struct _rec_stl rec_stl;
/* Final compilation and execution types */
/* (re)gular expression (inf)ormation */
typedef struct _re_inf re_inf;
/* (re)gular expression (eng)ine */
typedef struct _re_eng re_eng;
/* Codes used by the engine */
typedef enum {
/* Grouping */
Re_Open, /* ( */
Re_Close, /* ) */
Re_Update, /* Like Re_Close, but is inside a loop */
/* Alternatives */
Re_Alt, /* Start alternative list, + next offset */
Re_AltNext, /* Next alternative, + next offset */
Re_AltDone, /* Finish alternative list */
/* Repetition */
Re_AnyTimes, /* * */
Re_Maybe, /* ? */
Re_AtLeast, /* +, at least one */
/* Repetition like */
Re_AnyAnyTimes, /* .*<re> */
Re_AnyMaybe, /* .?<re> */
Re_AnyAtLeast, /* .+<re> */
Re_AnyEatAnyTimes, /* Expression ends with .* */
Re_AnyEatMaybe, /* Expression ends with .? */
Re_AnyEatAtLeast, /* Expression ends with .+ */
/* Repetition with arguments */
Re_Exact, /* {e} */
Re_Min, /* {n,} */
Re_Max, /* {,m} */
Re_MinMax, /* {n,m} */
/* Repetition helper instruction */
Re_RepJump, /* Special code, go back to repetition */
Re_RepLongJump, /* Jump needs two bytes */
/* After the repetition data, all repetitions have an offset
* to the code after the repetition */
/* Matching */
Re_Any, /* . */
Re_Odigit, /* \o */
Re_OdigitNot, /* \O */
Re_Digit, /* \d */
Re_DigitNot, /* \D */
Re_Xdigit, /* \x */
Re_XdigitNot, /* \X */
Re_Space, /* \s */
Re_SpaceNot, /* \S */
Re_Tab, /* \t */
Re_Newline, /* \n */
Re_Lower, /* \l */
Re_Upper, /* \u */
Re_Alnum, /* \w */
Re_AlnumNot, /* \W */
Re_Control, /* \c */
Re_ControlNot, /* \C */
Re_Bol, /* ^ */
Re_Eol, /* $ */
Re_Bow, /* \< */
Re_Eow, /* \> */
/* Range matching information */
Re_Range, /* + 256 bytes */
Re_RangeNot, /* + 256 bytes */
/* Matching with arguments */
Re_Literal, /* + character */
Re_CaseLiteral, /* + lower + upper */
Re_LiteralNot, /* + character */
Re_CaseLiteralNot, /* + lower + upper */
Re_String, /* + length + string */
Re_CaseString, /* + length + string in format lower-upper */
/* These are useful to start matching, or when RE_NOSPEC is used. */
Re_SearchLiteral,
Re_SearchCaseLiteral,
Re_SearchString,
Re_SearchCaseString,
Re_StringList, /* + total-length + lengths + strings */
Re_CaseStringList, /* + total-length + lengths + strings */
Re_LargeStringList, /* + total-length + lengths + map + strings */
Re_LargeCaseStringList, /* + total-length + lengths + map + strings */
/* Backreference */
Re_Backref, /* + reference number */
/* The last codes */
Re_DoneIf, /* Done if at end of input */
Re_MaybeDone, /* Done */
Re_Done /* If this code found, finished execution */
} ReCode;
/* (r)egular (e)xpression (pat)tern (t)ype */
typedef enum _rec_pat_t {
Rep_Literal = Re_Literal,
Rep_CaseLiteral = Re_CaseLiteral,
Rep_LiteralNot = Re_LiteralNot,
Rep_CaseLiteralNot = Re_CaseLiteralNot,
Rep_Range = Re_Range,
Rep_RangeNot = Re_RangeNot,
Rep_String = Re_String,
Rep_CaseString = Re_CaseString,
Rep_SearchLiteral = Re_SearchLiteral,
Rep_SearchCaseLiteral = Re_SearchCaseLiteral,
Rep_SearchString = Re_SearchString,
Rep_SearchCaseString = Re_SearchCaseString,
Rep_Any = Re_Any,
Rep_AnyAnyTimes = Re_AnyAnyTimes,
Rep_AnyEatAnyTimes = Re_AnyEatAnyTimes,
Rep_AnyMaybe = Re_AnyMaybe,
Rep_AnyEatMaybe = Re_AnyEatMaybe,
Rep_AnyAtLeast = Re_AnyAtLeast,
Rep_AnyEatAtLeast = Re_AnyEatAtLeast,
Rep_Odigit = Re_Odigit,
Rep_OdigitNot = Re_OdigitNot,
Rep_Digit = Re_Digit,
Rep_DigitNot = Re_DigitNot,
Rep_Xdigit = Re_Xdigit,
Rep_XdigitNot = Re_XdigitNot,
Rep_Space = Re_Space,
Rep_SpaceNot = Re_SpaceNot,
Rep_Tab = Re_Tab,
Rep_Newline = Re_Newline,
Rep_Lower = Re_Lower,
Rep_Upper = Re_Upper,
Rep_Alnum = Re_Alnum,
Rep_AlnumNot = Re_AlnumNot,
Rep_Control = Re_Control,
Rep_ControlNot = Re_ControlNot,
Rep_Bol = Re_Bol,
Rep_Eol = Re_Eol,
Rep_Bow = Re_Bow,
Rep_Eow = Re_Eow,
Rep_Backref = Re_Backref,
Rep_StringList = Re_StringList,
Rep_Group = Re_Open
} rec_pat_t;
/* (r)egular (e)xpression (rep)etition (t)ype */
typedef enum _rec_rep_t {
Rer_AnyTimes = Re_AnyTimes,
Rer_AtLeast = Re_AtLeast,
Rer_Maybe = Re_Maybe,
Rer_Exact = Re_Exact,
Rer_Min = Re_Min,
Rer_Max = Re_Max,
Rer_MinMax = Re_MinMax
} rec_rep_t;
/* Decide at re compilation time what is lowercase and what is uppercase */
struct _rec_cse {
unsigned char lower;
unsigned char upper;
};
/* A rec_rng is used only during compilation, just a character map */
struct _rec_rng {
unsigned char range[256];
};
/* A rec_pat is used only during compilation, and can be viewed as
* a regular expression element like a match to any character, a match
* to the beginning or end of the line, etc.
* It is implemented as a linked list, and does not have nesting.
* The data field can contain:
* chr: the value of a single character to match.
* cse: the upper and lower case value of a character to match.
* rng: a character map to match or not match.
* str: a simple string or a string where every two bytes
* represents the character to match, in lower/upper
* case sequence.
* The rep field is not used for strings, strings are broken in the
* last character in this case. That is, strings are just a concatenation
* of several character matches.
*/
struct _rec_pat {
rec_pat_t type;
rec_pat *next, *prev; /* Linked list information */
union {
unsigned char chr;
rec_cse cse;
rec_rng *rng;
rec_grp *grp;
unsigned char *str;
rec_stl *stl;
} data;
rec_rep *rep; /* Pattern repetition information */
};
/* A rec_rep is used only during compilation, and can be viewed as:
*
* ? or * or + or {<e>} or {<m>,} or {,<M>} or {<m>,<M>}
*
* where <e> is "exact", <m> is "minimum" and <M> is "maximum".
* In the compiled step it can also be just a NULL pointer, that
* is actually equivalent to {1}.
*/
struct _rec_rep {
rec_rep_t type;
short mine; /* minimum or exact number of matches */
short maxc; /* maximum number of matches */
};
/* A rec_alt is used only during compilation, and can be viewed as:
*
* <re>|<re>
*
* where <re> is any regular expression. The expressions are nested
* using the grp field of the rec_pat structure.
*/
struct _rec_alt {
rec_alt *next, *prev; /* Linked list information */
rec_pat *pat;
};
/* A rec_grp is a placeholder for expressions enclosed in parentheses
* and is linked to the compilation data by a rec_pat structure. */
struct _rec_grp {
rec_pat *parent; /* Reference to parent pattern */
rec_alt *alt; /* The pattern information */
rec_alt *palt; /* Parent alternative */
rec_grp *pgrp; /* Nested groups */
int comp; /* (comp)lex repetition pattern inside group */
};
/* Optimization compilation types definition */
/* (r)egular (e)xpression (c)ompile (st)ring (l)ist (t)ype */
typedef enum {
Resl_StringList = Re_StringList,
Resl_CaseStringList = Re_CaseStringList
} rec_stl_t;
struct _rec_stl {
rec_stl_t type;
int nstrs; /* Number of strings in list */
int tlen; /* Total length of all strings */
unsigned char *lens; /* Vector of string lengths */
unsigned char **strs; /* The strings */
};
/*
* Prototypes
*/
/* rep.c */
rec_alt *irec_comp(const char*, const char*, int, int*);
void irec_free_alt(rec_alt*);
/* reo.c */
int orec_comp(rec_alt*, int);
void orec_free_stl(rec_stl*);
#endif /* _rep_h */
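
The rec_rep comment above lists the repetition forms and notes that a NULL repetition is equivalent to {1}. A minimal sketch of how such a record could be reduced to a (minimum, maximum) pair follows. All `sk_` names are hypothetical and not part of the library; the real enumerators take their values from `Re_*` constants that are not shown here, and the mapping of AnyTimes, AtLeast and Maybe to `*`, `+` and `?` is an assumption based on their names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for rec_rep_t and rec_rep (not the library's types). */
typedef enum {
    Sk_AnyTimes,    /* *          (assumed from the name) */
    Sk_AtLeast,     /* +          (assumed from the name) */
    Sk_Maybe,       /* ?          (assumed from the name) */
    Sk_Exact,       /* {<e>}      */
    Sk_Min,         /* {<m>,}     */
    Sk_Max,         /* {,<M>}     */
    Sk_MinMax       /* {<m>,<M>}  */
} sk_rep_t;

typedef struct {
    sk_rep_t type;
    short mine;     /* minimum or exact number of matches */
    short maxc;     /* maximum number of matches */
} sk_rep;

typedef struct {
    int min;
    int max;        /* INT_MAX stands for "no upper limit" */
} sk_bounds;

/* Reduce a repetition record to explicit lower and upper bounds. */
static sk_bounds
sk_rep_bounds(const sk_rep *rep)
{
    sk_bounds b = { 1, 1 };     /* a NULL repetition behaves like {1} */

    if (rep == NULL)
        return b;
    switch (rep->type) {
    case Sk_AnyTimes: b.min = 0;         b.max = INT_MAX;   break;
    case Sk_AtLeast:  b.min = 1;         b.max = INT_MAX;   break;
    case Sk_Maybe:    b.min = 0;         b.max = 1;         break;
    case Sk_Exact:    b.min = rep->mine; b.max = rep->mine; break;
    case Sk_Min:      b.min = rep->mine; b.max = INT_MAX;   break;
    case Sk_Max:      b.min = 0;         b.max = rep->maxc; break;
    case Sk_MinMax:   b.min = rep->mine; b.max = rep->maxc; break;
    }
    assert(b.min <= b.max);     /* a well-formed repetition never inverts */
    return b;
}
```

Under these assumptions, the repetition in a pattern such as `a{2,4}b` would be described by `{ Sk_MinMax, 2, 4 }` and reduce to the bounds 2 and 4.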

199
app/xedit/lisp/re/tests.c Normal file

@ -0,0 +1,199 @@
/*
* Copyright (c) 2002 by The XFree86 Project, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the XFree86 Project shall
* not be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization from the
* XFree86 Project.
*
* Author: Paulo César Pereira de Andrade
*/
/* $XFree86$ */
/*
* Compile with: cc -o tests tests.c -L. -lre
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "re.h"
int
main(int argc, char *argv[])
{
re_cod cod;
re_mat mat[10];
int line, ecode, i, len, group, failed;
long eo, so;
char buf[8192];
char str[8192];
FILE *fp = fopen("tests.txt", "r");
if (fp == NULL) {
fprintf(stderr, "failed to open tests.txt\n");
exit(1);
}
ecode = line = group = failed = 0;
cod.cod = NULL;
while (fgets(buf, sizeof(buf), fp)) {
++line;
if (buf[0] == '#' || buf[0] == '\n')
continue;
else if (buf[0] == '/') {
char *ptr = strrchr(buf, '/');
if (ptr == buf) {
fprintf(stderr, "syntax error at line %d\n", line);
break;
}
else {
int flags = 0;
refree(&cod);
for (*ptr++ = '\0'; *ptr; ptr++) {
if (*ptr == 'i')
flags |= RE_ICASE;
else if (*ptr == 'n')
flags |= RE_NEWLINE;
}
ecode = recomp(&cod, buf + 1, flags);
failed = ecode;
}
}
else if (buf[0] == '>') {
if (cod.cod == NULL) {
fprintf(stderr, "no previous pattern at line %d\n", line);
break;
}
len = strlen(buf) - 1;
buf[len] = '\0';
strcpy(str, buf + 1);
for (i = 0, --len; i < len - 1; i++) {
if (str[i] == '\\') {
memmove(str + i, str + i + 1, len - i);
--len;
switch (str[i]) {
case 'a':
str[i] = '\a';
break;
case 'b':
str[i] = '\b';
break;
case 'f':
str[i] = '\f';
break;
case 'n':
str[i] = '\n';
break;
case 'r':
str[i] = '\r';
break;
case 't':
str[i] = '\t';
break;
case 'v':
str[i] = '\v';
break;
default:
break;
}
}
}
group = 0;
ecode = reexec(&cod, str, 10, &mat[0], 0);
if (ecode && ecode != RE_NOMATCH) {
reerror(ecode, &cod, buf, sizeof(buf));
fprintf(stderr, "%s, at line %d\n", buf, line);
break;
}
}
else if (buf[0] == ':') {
if (failed) {
len = strlen(buf) - 1;
buf[len] = '\0';
if (failed == RE_EESCAPE && strcmp(buf, ":EESCAPE") == 0)
continue;
if (failed == RE_ESUBREG && strcmp(buf, ":ESUBREG") == 0)
continue;
if (failed == RE_EBRACK && strcmp(buf, ":EBRACK") == 0)
continue;
if (failed == RE_EPAREN && strcmp(buf, ":EPAREN") == 0)
continue;
if (failed == RE_EBRACE && strcmp(buf, ":EBRACE") == 0)
continue;
if (failed == RE_EBADBR && strcmp(buf, ":EBADBR") == 0)
continue;
if (failed == RE_ERANGE && strcmp(buf, ":ERANGE") == 0)
continue;
if (failed == RE_ESPACE && strcmp(buf, ":ESPACE") == 0)
continue;
if (failed == RE_BADRPT && strcmp(buf, ":BADRPT") == 0)
continue;
if (failed == RE_EMPTY && strcmp(buf, ":EMPTY") == 0)
continue;
reerror(failed, &cod, buf, sizeof(buf));
fprintf(stderr, "Error value %d doesn't match: %s, at line %d\n",
failed, buf, line);
break;
}
else if (!ecode) {
fprintf(stderr, "found match when shouldn't, at line %d\n", line);
break;
}
}
else {
if (failed) {
reerror(failed, &cod, buf, sizeof(buf));
fprintf(stderr, "%s, at line %d\n", buf, line);
break;
}
if (sscanf(buf, "%ld,%ld:", &so, &eo) != 2) {
fprintf(stderr, "expecting match offsets at line %d\n", line);
break;
}
else if (ecode) {
fprintf(stderr, "didn't match, at line %d\n", line);
break;
}
else if (group >= 10) {
fprintf(stderr, "syntax error at line %d (too many groups)\n",
line);
break;
}
else if (so != mat[group].rm_so || eo != mat[group].rm_eo) {
fprintf(stderr, "match failed at line %d, got %ld,%ld: ",
line, mat[group].rm_so, mat[group].rm_eo);
if (mat[group].rm_so < mat[group].rm_eo)
fwrite(str + mat[group].rm_so,
mat[group].rm_eo - mat[group].rm_so, 1, stderr);
fputc('\n', stderr);
break;
}
++group;
}
}
fclose(fp);
return (ecode);
}
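
The driver above decodes backslash escapes in each `>` input line before calling reexec. A minimal sketch of that decoding step as a standalone in-place routine follows. The name `sk_unescape` is hypothetical, and the two-pointer loop is a substitute for the driver's memmove-based loop, not the author's code. It keeps the same visible behavior: known escapes are translated, any other escaped character is kept as-is, and a lone trailing backslash is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Decode C-style escapes in place and return the new string length.
 * Hypothetical helper, not part of the library or of tests.c. */
static size_t
sk_unescape(char *s)
{
    char *src = s, *dst = s;

    assert(s != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] != '\0') {
            ++src;                      /* drop the backslash */
            switch (*src) {
            case 'a': *dst = '\a'; break;
            case 'b': *dst = '\b'; break;
            case 'f': *dst = '\f'; break;
            case 'n': *dst = '\n'; break;
            case 'r': *dst = '\r'; break;
            case 't': *dst = '\t'; break;
            case 'v': *dst = '\v'; break;
            default:  *dst = *src; break;   /* e.g. \\ becomes \ */
            }
        }
        else
            *dst = *src;                /* ordinary char or trailing backslash */
        ++src;
        ++dst;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Because the output is never longer than the input, the write pointer cannot overtake the read pointer, so no temporary buffer or byte-count arithmetic is needed.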

470
app/xedit/lisp/re/tests.txt Normal file

@ -0,0 +1,470 @@
#
# Copyright (c) 2002 by The XFree86 Project, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE XFREE86 PROJECT BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
# OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
# Except as contained in this notice, the name of the XFree86 Project shall
# not be used in advertising or otherwise to promote the sale, use or other
# dealings in this Software without prior written authorization from the
# XFree86 Project.
#
# Author: Paulo César Pereira de Andrade
#
#
# $XFree86: xc/programs/xedit/lisp/re/tests.txt,v 1.1 2002/09/08 02:29:50 paulo Exp $
# Some tests for the library:
# lines starting with # are comments
# lines starting with / are a regular expression pattern
# The pattern must end with / and may be followed by:
# i -> ignore case
# n -> create newline sensitive regex
# lines starting with > are a string input to the last pattern
# To test newline sensitive matching, add \n to the string.
# lines starting with a number are the expected result
# If more than one line, every subsequent line is the
# value of a "subresult".
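# :NOMATCH means that the string input should not match
# :<ERROR> (for example :BADRPT) means that compiling the pattern should
# fail with the error code RE_<ERROR>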
# Simple string
/abc/
>abc
0,3: abc
>aaaaaaaaaaaaaaabc
14,17: abc
>xxxxxxxxxxxxxxaaaaaaaaaaaaaaaaabcxx
30,33: abc
# String list
/abc|bcd|cde/
>abc
0,3: abc
>aabc
1,4: abc
>xxxbcdef
3,6: bcd
>abdzzzcdabcde
8,11: abc
>xxxxabdecdabdcde
13,16: cde
# Complex string
/a?bc|ab?c|abc?/
>abc
0,3: abc
>xxxb
:NOMATCH
>xxxbc
3,5: bc
>sssssab
5,7: ab
# Another complex string
/a*bc|ab*c|abc*/
>aaaaaaabc
0,9: aaaaaaabc
>xaaaaaaabc
1,10: aaaaaaabc
>xyzaaaaaaabc
3,12: aaaaaaabc
>abbc
0,4: abbc
>xxabbbbbc
2,9: abbbbbc
>abcccccccccc
0,12: abcccccccccc
>abccccccccccd
0,12: abcccccccccc
>xxxxxxxaaaaaaaaaabbbbbbbbbbbccccccccccc
16,29: abbbbbbbbbbbc
>xxxbbbbbbbbbc
11,13: bc
# Another complex string
/a+bc|ab+c|abc+/
>xxxbc
:NOMATCH
>xaaabc
1,6: aaabc
>zzzzaaaaabbc
8,12: abbc
>zzzzaaaabbbbbbcccc
7,15: abbbbbbc
# Simple pattern
/a.c/
>abc
0,3: abc
>aaac
1,4: aac
>xac
:NOMATCH
>xaac
1,4: aac
>xxabc
2,5: abc
>xxxaxc
3,6: axc
# Another simple pattern
/a*c/
>c
0,1: c
>xxxxxxxxc
8,9: c
>xxxxxxxcc
7,8: c
>ac
0,2: ac
>aaaac
0,5: aaaac
>xac
1,3: ac
>xxxaac
3,6: aac
>xxac
2,4: ac
>xxxxac
4,6: ac
# Another simple pattern
/a+c/
>xxaac
2,5: aac
>xxxaaaac
3,8: aaaac
>xaaaabac
6,8: ac
>xxxc
:NOMATCH
>xxxxaaaaccc
4,9: aaaac
# Another simple pattern
/a{4}b/
>xabxxaabxxxaaabxxxxaaaab
19,24: aaaab
>aaabaaaab
4,9: aaaab
# Another simple pattern
/a{4,}b/
>xxxaaaab
3,8: aaaab
>zaaabzzzaaaaaaaaaaaaaaaab
8,25: aaaaaaaaaaaaaaaab
# Another simple pattern
/a{,4}b/
>b
0,1: b
>xxxxxxxxb
8,9: b
>xaaaaaaaaab
6,11: aaaab
>xxxab
3,5: ab
>aaaaaxaaab
6,10: aaab
# Another simple pattern
/a{2,4}b/
>xab
:NOMATCH
>xaab
1,4: aab
>xaaab
1,5: aaab
>xxaaaab
2,7: aaaab
>xxxaaaaab
4,9: aaaab
# Some simple grouping tests
/foo(bar|baz)fee/
>feebarbazfoobarfee
9,18: foobarfee
12,15: bar
>foofooobazfeefoobazfee
13,22: foobazfee
/f(oo|ee)ba[rz]/
>barfoebaz
:NOMATCH
>bazfoobar
3,9: foobar
4,6: oo
>barfeebaz
3,9: feebaz
4,6: ee
/\<(int|char)\>/
>aint character int foo
15,18: int
15,18: int
# Some complex repetitions
/foo.*bar/
>barfoblaboofoobarfoobarfoobar
11,17: foobar
/foo.+bar/
>foobar
:NOMATCH
>fobbarfooxbarfooybar
6,13: fooxbar
/foo.?bar/
>xfoobar
1,7: foobar
>xxfooxxbar
:NOMATCH
>yyyfootbar
3,10: footbar
# Some nested complex repetitions
/a.*b.*c/
>abc
0,3: abc
>xxxxxxxxxabbbbbbbccaaaaabbbc
9,18: abbbbbbbc
/a.+b.*c/
>xxxabc
:NOMATCH
>xxaxbbc
2,7: axbbc
/a.+b.?c/
>xaabc
1,5: aabc
>xxaabbc
2,7: aabbc
# Very complex repetitions
/(foo.*|bar)fee/
# XXX NOTE
# This pattern does not return the correct offset for the group.
# Support for this may or may not be added.
>barfoofee
3,9: foofee
>foobarfee
0,9: foobarfee
>xxfobarfee
4,10: barfee
>barfooooooobarfee
3,17: fooooooobarfee
>xxfobarfeefoobar
4,10: barfee
/(foo.+|bar)fee/
>barfoofee
:NOMATCH
>barfooxfee
3,10: fooxfee
/(foo.?|bar)fee/
>foobar
:NOMATCH
>bafoofee
2,8: foofee
>bafooofeebarfee
2,9: fooofee
>bafoofeebarfee
2,8: foofee
# Simple backreference
/(a|b|c)\1/
>aa
0,2: aa
0,1: a
/(a|b|c)(a|b|c)\1\2/
>acac
0,4: acac
0,1: a
1,2: c
>xxxxacac
4,8: acac
4,5: a
5,6: c
>xxacabacbcacbbacbcaaccabcaca
24,28: caca
24,25: c
25,26: a
>xyabcccc
4,8: cccc
4,5: c
5,6: c
# Complex backreference
/(a*b)\1/
>xxxaaaaabaaaaab
3,15: aaaaabaaaaab
3,9: aaaaab
/(ab+c)\1/
>xaaabbbcabbbc
3,13: abbbcabbbc
3,8: abbbc
/(ab?c)\1/
>abcac
:NOMATCH
>abcacabcabc
5,11: abcabc
5,8: abc
>abcacac
3,7: acac
3,5: ac
# Very complex backreference
/a(.*)b\1/
>xxxab
3,5: ab
4,4:
>xxxxazzzbzzz
4,12: azzzbzzz
5,8: zzz
# Case testing
/abc/i
>AbC
0,3: AbC
/[0-9][a-z]+/i
>xxx0aaZxYT9
3,10: 0aaZxYT
/a.b/i
>aaaaaaaaaaaxB
10,13: axB
/a.*z/i
>xxxAaaaaZ
3,9: AaaaaZ
>xxaaaZaaa
2,6: aaaZ
/\<(lambda|defun|defmacro)\>/i
> (lambda
5,11: lambda
5,11: lambda
/\<(nil|t)\>/i
>it Nil
3,6: Nil
3,6: Nil
/\<(begin|end)\>/i
>beginning the ending EnD
21,24: EnD
21,24: EnD
# Some newline tests
/a.*/n
>a\naaa
0,1: a
>xyza\naa
3,4: a
/a.+/n
>a\naaa
2,5: aaa
>xyza\naa
5,7: aa
/a.?/n
>a\naaa
0,1: a
>xyza\naa
3,4: a
# Newline tests involving complex patterns
/a.*b.*c/n
>xxaa\nzyacb\nabc
11,14: abc
>xxxab\nabc\nc
6,9: abc
/a.+b.*c/n
>ab\nbc\nabbc
6,10: abbc
/a.?b.*c/n
>ab\ncabbc\ncc
4,8: abbc
/^foo$/n
>bar\nfoobar\nfoo
11,14: foo
# Not so complex test involving a newline...
/^\s*#\s*(define|include)\s+.+/n
>#define\n#include x
8,18: #include x
9,16: include
# Check if large strings are working
/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>zzzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxzzz
3,259: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~/
>String here: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~/
13,333: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890~
# Some complex repetitions not supported
# Listed here only to make sure the library is not crashing on these
# Repetitions that can match an empty string cannot follow a complex
# repetition. A complex repetition is:
# .* or .+ or .?
# .{...} is not supported.
/(.*)(\d*)/
:BADRPT
/(.*).(\d*)/
:BADRPT
/(.*)\<(\d*)/
:BADRPT
/(.*)\s(\d*)/
:BADRPT
/(.*)\D(\d*)/
:BADRPT
# This is a clearer pattern and it partially works
/(.*)\D(\d+)/
>abcW12
0,6: abcW12
0,3: abc
4,6: 12
>abcW12abcW12
0,6: abcW12
0,3: abc
4,6: 12
# This wasn't working in the previous version, but now with only minimal
# matches supported, it works.
>abcW12abcW12a
0,6: abcW12
0,3: abc
4,6: 12
# Note the minimal match
/.*\d/
>a1a1a1aaaaaaa
0,2: a1
# Check match offsets
/(.*)\d/
>a1a1a1aaaaaaa
0,2: a1
0,1: a
/.*(\d)/
>a1a1a1aaaaaaa
0,2: a1
1,2: 1
/.*(\d+)/
:BADRPT
# Regression fix, was matching empty string
/\\\d{3}|\\./
>\\
:NOMATCH
/\\.|\\\d{3}/
>\\
:NOMATCH
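
The header of tests.txt says that a pattern line starts with `/`, must end with `/`, and may be followed by the flags `i` and `n`. A minimal sketch of splitting such a line follows. The function name and the `SK_` flag values are hypothetical; the library's own RE_ICASE and RE_NEWLINE values are not shown in this section. As in the test driver, characters after the closing `/` other than `i` and `n` (such as the trailing newline) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits standing in for RE_ICASE and RE_NEWLINE. */
enum { SK_ICASE = 1, SK_NEWLINE = 2 };

/* Split "/pattern/flags" in place.
 * Returns 0 on success and -1 if the line is not a well-formed pattern line.
 * On success *pattern points into line and *flags holds the SK_ bits. */
static int
sk_split_pattern_line(char *line, char **pattern, int *flags)
{
    char *end;

    assert(line != NULL && pattern != NULL && flags != NULL);
    if (line[0] != '/')
        return -1;
    end = strrchr(line, '/');           /* the last slash closes the pattern */
    if (end == line)
        return -1;                      /* only the opening slash was found */
    *end++ = '\0';                      /* terminate the pattern text */
    *flags = 0;
    for (; *end != '\0'; end++) {
        if (*end == 'i')
            *flags |= SK_ICASE;
        else if (*end == 'n')
            *flags |= SK_NEWLINE;
    }
    *pattern = line + 1;
    return 0;
}
```

Searching for the last `/` rather than the first lets a pattern contain slashes of its own, at the cost of not allowing a `/` among the flags.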