Mescal
regexp.h File Reference

Implementation of regular expressions.

#include "alloc.h"
#include "error.h"
#include "nfa.h"
#include "tools.h"
#include "type_basic.h"
#include "type_dequeue.h"
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

Classes

struct  syletter
 Type used to represent a symbolic letter.
 
struct  syvariable
 Type used to represent a symbolic variable.
 
struct  regexp
 Type used to represent a single node in a regular expression.
 

Typedefs

typedef struct syletter syletter
 Type used to represent a symbolic letter.
 
typedef struct syvariable syvariable
 Type used to represent a symbolic variable.
 
typedef struct regexp regexp
 Type used to represent a single node in a regular expression.
 

Enumerations

enum  regelem {
  EMPTY , EPSILON , CHAR , SYCHAR ,
  SYVAR , WORD , UNION , INTER ,
  COMPLEMENT , CONCAT , STAR , PLUS ,
  NONE
}
 The operators available in an extended regular expression.
 

Functions

short symbolic_index (char *)
 Computes the index of a symbolic variable name in the array symbolic_names.
 
uint display_syletter_utf8 (syletter, FILE *)
 Displays a symbolic letter on a given stream: UTF8 version for the indices.
 
uint display_syvar_utf8 (syvariable v, FILE *out)
 Displays a symbolic variable on a given stream: UTF8 version for the indices.
 
bool reg_has_symbolic (regexp *)
 Tests if a regular expression contains a symbolic node.
 
bool reg_issimple (regexp *)
 Tests if a regular expression is simple.
 
void reg_free (regexp *)
 Frees a regular expression.
 
regexp * reg_copy (regexp *)
 Copies a regular expression.
 
regexp * reg_empty (void)
 Computes a regular expression recognizing the empty language.
 
regexp * reg_epsilon (void)
 Computes a regular expression recognizing the language {ε}.
 
regexp * reg_letter (uchar)
 Computes a regular expression recognizing the language {a} for an input letter a.
 
regexp * reg_letter_ext (letter)
 Computes a regular expression recognizing the language {a} for an input letter a. Extended version.
 
regexp * reg_letter_numbered (uchar c, uchar index)
 Computes a regular expression recognizing the language {a_n} for an input letter a, subscripted by n.
 
regexp * reg_letter_symbolic (uchar c, uchar number)
 Computes a regular expression corresponding to the symbolic letter a_{i+k}.
 
regexp * reg_var_symbolic (char *s, uchar number)
 
regexp * reg_union (regexp *, regexp *)
 Combines two regular expressions with the union operator.
 
regexp * reg_inter (regexp *, regexp *)
 Combines two regular expressions with the intersection operator.
 
regexp * reg_concat (regexp *, regexp *)
 Combines two regular expressions with the concatenation operator.
 
regexp * reg_star (regexp *)
 Applies the Kleene star operator to a regular expression.
 
regexp * reg_plus (regexp *)
 Applies the Kleene plus operator to a regular expression.
 
regexp * reg_complement (regexp *)
 Applies the complement operator to a regular expression.
 
void reg_print (regexp *)
 Displays a regular expression on the standard output stream.
 
bool reg_symbolic_loops (regexp *, ushort, uchar, bool *)
 Computes information on the symbolic variables of a regular expression.
 

Variables

short symbolic_count
 Global variable corresponding to the number of symbolic variables which are currently allowed (set to zero if no symbolic variables are allowed).
 
char ** symbolic_names
 Global array assigning its name to each symbolic variable index (its size is symbolic_count).
 

Detailed Description

Implementation of regular expressions.

Enumeration Type Documentation

◆ regelem

enum regelem

The operators available in an extended regular expression.

Enumerator
EMPTY 

Empty language.

EPSILON 

Empty word.

CHAR 

Single letter.

SYCHAR 

Symbolic letter.

SYVAR 

Symbolic variable.

WORD 

Single word.

UNION 

Union of two expressions.

INTER 

Intersection of two expressions.

COMPLEMENT 

Complement of an expression.

CONCAT 

Concatenation of two expressions.

STAR 

Kleene star of an expression.

PLUS 

Kleene plus of an expression.

NONE 

Used for simplifying the display.
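
As a rough illustration of how these tags might drive a traversal, the sketch below redeclares the enumerators listed above and adds a hypothetical helper, regelem_arity, which is not part of regexp.h. It assumes that the tags EMPTY through WORD denote leaves without subexpressions; this reference does not state that explicitly.

```c
#include <assert.h>

/* Enumerators copied from the listing above. */
typedef enum {
    EMPTY, EPSILON, CHAR, SYCHAR,
    SYVAR, WORD, UNION, INTER,
    COMPLEMENT, CONCAT, STAR, PLUS,
    NONE
} regelem;

/* Hypothetical helper (not in regexp.h): number of subexpressions
 * an operator takes, or -1 for NONE, which is only a display marker. */
int regelem_arity(regelem op) {
    assert(op <= NONE); /* reject values past the last enumerator */
    switch (op) {
    case UNION: case INTER: case CONCAT:
        return 2;                       /* binary operators */
    case COMPLEMENT: case STAR: case PLUS:
        return 1;                       /* unary operators */
    case EMPTY: case EPSILON: case CHAR:
    case SYCHAR: case SYVAR: case WORD:
        return 0;                       /* assumed to be leaves */
    case NONE:
        return -1;
    }
    return -1; /* not reached for valid tags */
}
```

A recursive function over an expression tree could use such a helper to decide how many children to visit at each node.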

Function Documentation

◆ display_syletter_utf8()

uint display_syletter_utf8 ( syletter l, FILE * out )

Displays a symbolic letter on a given stream: UTF8 version for the indices.

Returns
The length of the displayed letter.
Parameters
l  The symbolic letter.
out  The stream.
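
To make the "UTF8 version for the indices" concrete, here is a minimal sketch of printing a number with Unicode subscript digits (U+2080 to U+2089, encoded in UTF-8 as the bytes E2 82 80 to E2 82 89). The function name sub_print_utf8 is hypothetical, and returning the number of digits as the displayed length is an assumption; the actual function may render indices differently.

```c
#include <stdio.h>

/* Hypothetical helper: writes n in decimal using subscript digits
 * and returns the number of digits written. */
unsigned sub_print_utf8(unsigned n, FILE *out) {
    char digits[16]; /* enough for any 32-bit or 64-bit... here: up to 16 decimal digits */
    int len = 0;

    /* Collect the decimal digits, least significant first. */
    do {
        digits[len++] = (char)(n % 10);
        n /= 10;
    } while (n > 0 && len < 16);

    /* Emit them most significant first, three UTF-8 bytes per digit. */
    for (int i = len - 1; i >= 0; i--) {
        fputc(0xE2, out);
        fputc(0x82, out);
        fputc(0x80 + digits[i], out);
    }
    return (unsigned)len;
}
```

For example, sub_print_utf8(12, stdout) writes the two characters ₁₂ and returns 2.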

◆ display_syvar_utf8()

uint display_syvar_utf8 ( syvariable v, FILE * out )

Displays a symbolic variable on a given stream: UTF8 version for the indices.

Attention
The variable name is taken from the global array symbolic_names.
Returns
The length of the displayed variable.

◆ reg_complement()

regexp * reg_complement ( regexp * left)

Applies the complement operator to a regular expression.

Attention
The input expression is not copied.
Returns
The resulting regular expression.
Parameters
left  The expression.

◆ reg_concat()

regexp * reg_concat ( regexp * left, regexp * right )

Combines two regular expressions with the concatenation operator.

Attention
The two input expressions are not copied.
Returns
The resulting regular expression.
Parameters
left  The left expression.
right  The right expression.
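
The "not copied" notes on the combining functions have a practical consequence that a small model can show. The sketch below does not use regexp.h: the type toy_regexp and every toy_ function are hypothetical stand-ins, since the fields of struct regexp are not shown in this reference. It assumes that a combined expression takes ownership of its operands and that freeing the root releases the whole tree.

```c
#include <stdlib.h>

typedef enum { TOY_CHAR, TOY_CONCAT } toy_op;

/* Stand-in node type (hypothetical layout). */
typedef struct toy_regexp {
    toy_op op;
    char letter;                /* used by TOY_CHAR only */
    struct toy_regexp *left;    /* first subexpression, or NULL */
    struct toy_regexp *right;   /* second subexpression, or NULL */
} toy_regexp;

int toy_live_nodes = 0; /* allocation counter, for illustration only */

toy_regexp *toy_node(toy_op op, char c, toy_regexp *l, toy_regexp *r) {
    toy_regexp *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->op = op; n->letter = c; n->left = l; n->right = r;
    toy_live_nodes++;
    return n;
}

/* The new node points at its operands; nothing is copied. */
toy_regexp *toy_concat(toy_regexp *l, toy_regexp *r) {
    return toy_node(TOY_CONCAT, 0, l, r);
}

/* Deep copy of a tree. */
toy_regexp *toy_copy(const toy_regexp *r) {
    if (r == NULL) return NULL;
    return toy_node(r->op, r->letter, toy_copy(r->left), toy_copy(r->right));
}

/* Recursive release of a tree. */
void toy_free(toy_regexp *r) {
    if (r == NULL) return;
    toy_free(r->left);
    toy_free(r->right);
    free(r);
    toy_live_nodes--;
}

/* Builds a.a and returns the number of nodes that were allocated. */
int toy_demo(void) {
    toy_regexp *a = toy_node(TOY_CHAR, 'a', NULL, NULL);
    /* Copy the second operand: passing `a` twice would make both
     * children alias one node, which toy_free would free twice. */
    toy_regexp *aa = toy_concat(a, toy_copy(a));
    int built = toy_live_nodes;
    toy_free(aa); /* also frees `a`, which must not be used afterwards */
    return built;
}
```

Under these assumptions, an operand that is needed again after being combined should be duplicated first, which is the role reg_copy would play in the real API.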

◆ reg_copy()

regexp * reg_copy ( regexp * r)

Copies a regular expression.

Returns
A copy of the input regular expression.
Parameters
r  The regular expression.

◆ reg_empty()

regexp * reg_empty ( void )

Computes a regular expression recognizing the empty language.

Returns
The regular expression.

◆ reg_epsilon()

regexp * reg_epsilon ( void )

Computes a regular expression recognizing the language {ε}.

Returns
The regular expression.

◆ reg_free()

void reg_free ( regexp * r)

Frees a regular expression.

Parameters
r  The regular expression.

◆ reg_has_symbolic()

bool reg_has_symbolic ( regexp * exp)

Tests if a regular expression contains a symbolic node.

Returns
A Boolean indicating whether the regular expression contains a symbolic node.
Parameters
exp  The regular expression.

◆ reg_inter()

regexp * reg_inter ( regexp * left, regexp * right )

Combines two regular expressions with the intersection operator.

Attention
The two input expressions are not copied.
Returns
The resulting regular expression.
Parameters
left  The left expression.
right  The right expression.

◆ reg_issimple()

bool reg_issimple ( regexp * exp)

Tests if a regular expression is simple.

Remarks
A regular expression is simple if and only if it does not contain the operators for intersection and complement.
Returns
A Boolean indicating whether the regular expression is simple.
Parameters
exp  The regular expression.
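
The definition in the remark above translates directly into a recursive check. The following is a sketch over a hypothetical stand-in type, t_node, because the fields of struct regexp are not shown in this reference; it is not the implementation of reg_issimple.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    T_EPSILON, T_CHAR, T_UNION, T_INTER,
    T_COMPLEMENT, T_CONCAT, T_STAR
} t_op;

/* Stand-in node type (hypothetical layout). */
typedef struct t_node {
    t_op op;
    const struct t_node *left;   /* first (or only) subexpression */
    const struct t_node *right;  /* second subexpression, or NULL */
} t_node;

/* True iff no node in the tree is an intersection or a complement. */
bool t_issimple(const t_node *r) {
    if (r == NULL) return true; /* absent child */
    if (r->op == T_INTER || r->op == T_COMPLEMENT) return false;
    return t_issimple(r->left) && t_issimple(r->right);
}

/* (a | a)* uses only union and star, so it is simple. */
bool t_demo_simple(void) {
    t_node a = { T_CHAR, NULL, NULL };
    t_node u = { T_UNION, &a, &a };
    t_node s = { T_STAR, &u, NULL };
    return t_issimple(&s);
}

/* (complement of a) . a contains a complement, so it is not simple. */
bool t_demo_not_simple(void) {
    t_node a = { T_CHAR, NULL, NULL };
    t_node c = { T_COMPLEMENT, &a, NULL };
    t_node k = { T_CONCAT, &c, &a };
    return t_issimple(&k);
}
```

The demo trees live on the stack and share the leaf `a`, which is harmless here because the check only reads the nodes.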

◆ reg_letter()

regexp * reg_letter ( uchar c)

Computes a regular expression recognizing the language {a} for an input letter a.

Returns
The regular expression.
Parameters
c  The letter.

◆ reg_letter_ext()

regexp * reg_letter_ext ( letter l)

Computes a regular expression recognizing the language {a} for an input letter a. Extended version.

Returns
The regular expression.
Parameters
l  The letter.

◆ reg_letter_numbered()

regexp * reg_letter_numbered ( uchar c, uchar index )

Computes a regular expression recognizing the language {a_n} for an input letter a, subscripted by n.

Returns
The regular expression.
Parameters
c  The letter.
index  The number.

◆ reg_letter_symbolic()

regexp * reg_letter_symbolic ( uchar c, uchar number )

Computes a regular expression corresponding to the symbolic letter a_{i+k}.

Returns
The regular expression.
Parameters
c  The letter.
number  The number k.

◆ reg_plus()

regexp * reg_plus ( regexp * left)

Applies the Kleene plus operator to a regular expression.

Attention
The input expression is not copied.
Returns
The resulting regular expression.
Parameters
left  The expression.

◆ reg_print()

void reg_print ( regexp * r)

Displays a regular expression on the standard output stream.

Parameters
r  The expression.

◆ reg_star()

regexp * reg_star ( regexp * left)

Applies the Kleene star operator to a regular expression.

Attention
The input expression is not copied.
Returns
The resulting regular expression.
Parameters
left  The expression.

◆ reg_symbolic_loops()

bool reg_symbolic_loops ( regexp * exp, ushort max, uchar num, bool * cycle )

Computes information on the symbolic variables of a regular expression.

A list of symbolic variable names is given as input. The function checks whether all names in the expression are in this set. Furthermore, for each name, it checks whether there exists an occurrence of this name with the index 0.

Returns
A Boolean indicating whether all names in the expression are in the input set.
Parameters
exp  The regular expression.
max  The maximum decrement.
num  The number of names in the input set.
cycle  Used to return the names which occur with the index 0.
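
The two checks described above can be sketched as a single traversal. Everything in this example is hypothetical: the node type s_node, its name and index fields, and the reading of "the input set" as the names numbered 0 to num - 1. The max parameter of the real function is not modelled, and the sketch assumes the caller has initialized every entry of cycle to false.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { S_CHAR, S_SYVAR, S_UNION, S_CONCAT, S_STAR } s_op;

/* Stand-in node type (hypothetical layout). */
typedef struct s_node {
    s_op op;
    unsigned char name;   /* S_SYVAR: position of the variable name */
    unsigned char index;  /* S_SYVAR: index attached to this occurrence */
    const struct s_node *left, *right;
} s_node;

/* Returns true iff every variable name in r is below num, and sets
 * cycle[i] for each in-range name i that occurs with index 0. */
bool s_symbolic_loops(const s_node *r, unsigned char num, bool *cycle) {
    if (r == NULL) return true;
    if (r->op == S_SYVAR) {
        if (r->name >= num) return false; /* name outside the set */
        if (r->index == 0) cycle[r->name] = true;
        return true;
    }
    /* Visit both sides unconditionally so cycle is filled completely. */
    bool ok_left = s_symbolic_loops(r->left, num, cycle);
    bool ok_right = s_symbolic_loops(r->right, num, cycle);
    return ok_left && ok_right;
}

/* Checks (X_0 . X_1) | Y_1, with X as name 0 and Y as name 1.
 * Encodes the outcome as 4*ok + 2*cycle[0] + cycle[1]; num must be <= 2. */
int s_demo(unsigned char num) {
    s_node x0 = { S_SYVAR, 0, 0, NULL, NULL };
    s_node x1 = { S_SYVAR, 0, 1, NULL, NULL };
    s_node y1 = { S_SYVAR, 1, 1, NULL, NULL };
    s_node cat = { S_CONCAT, 0, 0, &x0, &x1 };
    s_node all = { S_UNION, 0, 0, &cat, &y1 };
    bool cycle[2] = { false, false };
    bool ok = s_symbolic_loops(&all, num, cycle);
    return (ok ? 4 : 0) + (cycle[0] ? 2 : 0) + (cycle[1] ? 1 : 0);
}
```

With two names allowed, the demo reports that all names are in the set and that only X occurs with index 0; with one name allowed, Y falls outside the set and the overall result is false.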

◆ reg_union()

regexp * reg_union ( regexp * left, regexp * right )

Combines two regular expressions with the union operator.

Attention
The two input expressions are not copied.
Returns
The resulting regular expression.
Parameters
left  The left expression.
right  The right expression.

◆ symbolic_index()

short symbolic_index ( char * varname)

Computes the index of a symbolic variable name in the array symbolic_names.

Returns
The index of the symbolic variable name.
Parameters
varname  The symbolic variable name.
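
A lookup of this kind can be sketched as a linear search over the name array. The globals demo_names and demo_count below are stand-ins for symbolic_names and symbolic_count, and returning -1 for an unknown name is an assumption: this reference does not say what symbolic_index returns when the name is absent.

```c
#include <string.h>

/* Stand-ins for the globals symbolic_names and symbolic_count. */
const char *demo_names[] = { "X", "Y", "Z" };
short demo_count = 3;

/* Returns the position of varname in demo_names,
 * or -1 if it is not present (assumed convention). */
short demo_symbolic_index(const char *varname) {
    for (short i = 0; i < demo_count; i++) {
        if (strcmp(demo_names[i], varname) == 0) {
            return i;
        }
    }
    return -1;
}
```

With these stand-ins, demo_symbolic_index("Y") yields 1 and demo_symbolic_index("W") yields -1.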