Online Help

Wildcards

When you extract information with processors, you can use wildcards in the 'defined text' fields that define your conditions.

Supported wildcards are:

1. Simple
---------

? - exactly 1 character (mandatory)
_ - 0 or 1 character
* - 0 or more characters
% - 1 or more characters
# - 1 or more digits ('0'...'9')
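The five simple wildcards map closely onto regular-expression fragments. The Python sketch below translates a mask that uses only simple wildcards into a regex and tests a whole string against it. The names `SIMPLE_WILDCARDS`, `simple_mask_to_regex` and `simple_match` are hypothetical, whole-string matching is an assumption, and this only approximates the documented meanings; it is not the product's own matcher.

```python
import re

# Hypothetical mapping from each simple wildcard to a regex fragment.
SIMPLE_WILDCARDS = {
    "?": ".",       # exactly 1 character
    "_": ".?",      # 0 or 1 character
    "*": ".*",      # 0 or more characters
    "%": ".+",      # 1 or more characters
    "#": "[0-9]+",  # 1 or more digits
}


def simple_mask_to_regex(mask: str) -> str:
    """Translate a mask that uses only simple wildcards into a regex."""
    # Wildcards become regex fragments; every other character is literal.
    return "".join(SIMPLE_WILDCARDS.get(ch, re.escape(ch)) for ch in mask)


def simple_match(mask: str, text: str) -> bool:
    """True when the whole text matches the mask (assumed whole-string match)."""
    return re.fullmatch(simple_mask_to_regex(mask), text, re.DOTALL) is not None


print(simple_match("INV-#", "INV-2024"))  # True
print(simple_match("a%z", "az"))          # False: % needs at least 1 character
print(simple_match("a*z", "az"))          # True: * accepts 0 characters
```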

2. Sets []
----------

matches a limited set of characters.

syntax: [[I|E]<occurrences>:][?|!]<range>[;[?|!]<range>]...
(enclosed in square brackets)

occurrences can be:

a number - number of occurrences
an interval n-m - n to m occurrences
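empty - 0 or 1 occurrence, as in [:az] (if the '<occurrences>:' prefix is left out entirely, exactly 1 occurrence is required, as in [a;b])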
? - 1 occurrence
_ - 0 or 1 occurrence
* - 0 or more occurrences
% - 1 or more occurrences
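As a sketch of how the occurrences part could be read, the hypothetical helper below turns it into a (minimum, maximum) pair. The rule that an omitted prefix means exactly one occurrence is inferred from the [a;b] and [:az] examples further down; the function name and return shape are assumptions.

```python
from typing import Optional, Tuple


def parse_occurrences(spec: Optional[str]) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: convert the <occurrences> part of a set into
    (minimum, maximum) counts, where a maximum of None means unbounded.

    spec is None when the set has no '<occurrences>:' prefix at all, and
    is assumed to have had any leading 'I' or 'E' removed already."""
    if spec is None:
        return (1, 1)  # e.g. [a;b]: one mandatory occurrence
    named = {
        "": (0, 1),      # e.g. [:az]
        "?": (1, 1),
        "_": (0, 1),
        "*": (0, None),
        "%": (1, None),
    }
    if spec in named:
        return named[spec]
    if "-" in spec:
        # An interval n-m, e.g. the 2-5 in [2-5:az;09].
        low, high = (int(part) for part in spec.split("-", 1))
        if low > high:
            raise ValueError(f"invalid interval: {spec!r}")
        return (low, high)
    count = int(spec)  # a plain number, e.g. the 6 in [6:az;09]
    return (count, count)


print(parse_occurrences("2-5"))  # (2, 5)
print(parse_occurrences("%"))    # (1, None)
```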

range can be:

1 character - that character is part of the set
2 characters - every character between them, inclusive, according to the
ASCII table, is part of the set (only when the second is
greater than the first; otherwise just the two characters)
3 or more - each listed character is part of the set

parameters:

E - (exclusive) matches only until the next mask succeeds (default)
I - (inclusive) matches as many characters as possible
? - the character(s) must occur
! - the character(s) must not occur
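The range rules and the '!' parameter combine into a per-character membership test. The Python below is a minimal sketch under stated assumptions: `range_contains` and `set_contains` are hypothetical names, ';' is assumed never to be escaped inside the set body, and a '?' range is only counted as membership; its "must occur" requirement is not enforced.

```python
def range_contains(rng: str, ch: str) -> bool:
    """One <range>: a span if exactly 2 chars in ascending ASCII order,
    otherwise each listed character on its own."""
    if len(rng) == 2 and rng[1] > rng[0]:
        return rng[0] <= ch <= rng[1]
    return ch in rng


def set_contains(body: str, ch: str) -> bool:
    """Hypothetical test for whether ch belongs to a set body such as
    'az;!mo' (the part after any '<occurrences>:' prefix)."""
    included, excluded = [], []
    for rng in body.split(";"):
        if rng.startswith("!"):
            excluded.append(rng[1:])
        elif rng.startswith("?"):
            included.append(rng[1:])  # "must occur" is not checked here
        else:
            included.append(rng)
    # With no positive ranges (e.g. [!ac]) every character starts as a member.
    if included and not any(range_contains(r, ch) for r in included):
        return False
    return not any(range_contains(r, ch) for r in excluded)


print(set_contains("az;!mo", "n"))  # False: 'm'...'o' are excluded
print(set_contains("za", "m"))      # False: 'za' is just 'z' or 'a'
```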

examples:

[a;b] - one mandatory occurrence of 'a' or 'b'
[az] - characters 'a','b','c'...'z'
[za] - character 'z' or 'a'
[xyz] - character 'x' or 'y' or 'z'
[az;09] - characters 'a'...'z' or '0'...'9'
[az;!mo] - characters 'a'...'z' except 'm','n','o'
[!ac] - any character except 'a','b','c'
[6:az;09] - 6 occurrences (e.g. '9c3ax7')
[2-5:az;09] - 2 to 5 occurrences
[:az] - 0 or 1 occurrence
[*:az;09] - 0 or more occurrences
[%:az;09] - 1 or more occurrences
[%:az;?x] - any lowercase sequence containing an 'x'
[E*:az]abc - any lowercase sequence ending with 'abc', e.g. 'ashufnmnjuabc'
[I*:az]abc - never matches, because the set consumes all 'a'...'z' characters
and nothing is left to match 'abc'

warning: masks are not supported inside a set, so wildcard characters there
are taken literally and do not need to be escaped
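The difference between E and I is easiest to see in the last two examples. The sketch below models a set followed by plain literal text, assuming that E stops at the first position where the following text matches (with no further backtracking) while I takes the longest run of members and never gives characters back. `set_then_literal` and `lowercase` are hypothetical names, and in real masks a set can be followed by more than plain text.

```python
from typing import Callable, Optional


def set_then_literal(is_member: Callable[[str], bool], low: int,
                     high: Optional[int], mode: str, suffix: str,
                     text: str) -> bool:
    """Hypothetical model of a set followed by literal text, e.g. [E*:az]abc,
    matched against the whole text. high=None means unbounded."""
    limit = len(text) if high is None else min(high, len(text))
    if mode == "I":
        # Inclusive: consume as many members as possible, never give any back.
        count = 0
        while count < limit and is_member(text[count]):
            count += 1
        return count >= low and text[count:] == suffix
    # Exclusive (default): stop as soon as the next mask (the suffix) matches.
    for count in range(limit + 1):
        if count >= low and text.startswith(suffix, count):
            return count + len(suffix) == len(text)
        if count == limit or not is_member(text[count]):
            return False
    return False


def lowercase(c: str) -> bool:
    return "a" <= c <= "z"


print(set_then_literal(lowercase, 0, None, "E", "abc", "ashufnmnjuabc"))  # True
print(set_then_literal(lowercase, 0, None, "I", "abc", "ashufnmnjuabc"))  # False
```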

3. OR sequences {}
------------------

matches any of the specified submasks.

syntax: <mask_string>;<mask_string>[;<mask_string>]...
(enclosed in braces)

examples:

{xyz;abcd;123} - word 'xyz' or 'abcd' or '123'
{[3:09];-[2:09]} - 3 digits, or '-' followed by 2 digits
{a*;*0} - something starting with 'a' or ending with '0'
{a{bc;de};fgh} - 'abc' or 'ade' or 'fgh'
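OR sequences behave like regex alternation. Below is a rough Python sketch that handles simple wildcards and nested braces (but not sets, so the second example above is out of its scope), assuming whole-string matching; `or_mask_to_regex`, `or_match` and `_split_top_level` are hypothetical names.

```python
import re

# Simple wildcards as in section 1 (same hypothetical mapping).
SIMPLE_WILDCARDS = {"?": ".", "_": ".?", "*": ".*", "%": ".+", "#": "[0-9]+"}


def _split_top_level(body: str) -> list:
    """Split an OR body on each ';' that is not nested inside inner braces."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == ";" and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def or_mask_to_regex(mask: str) -> str:
    """Translate simple wildcards and (possibly nested) {a;b;...} groups."""
    out, i = [], 0
    while i < len(mask):
        ch = mask[i]
        if ch == "{":
            # Find the matching closing brace, allowing nesting.
            depth, j = 0, i
            while True:
                if j >= len(mask):
                    raise ValueError("unbalanced braces")
                if mask[j] == "{":
                    depth += 1
                elif mask[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            alternatives = _split_top_level(mask[i + 1:j])
            out.append("(?:" + "|".join(or_mask_to_regex(a) for a in alternatives) + ")")
            i = j + 1
        else:
            out.append(SIMPLE_WILDCARDS.get(ch, re.escape(ch)))
            i += 1
    return "".join(out)


def or_match(mask: str, text: str) -> bool:
    return re.fullmatch(or_mask_to_regex(mask), text, re.DOTALL) is not None


print(or_mask_to_regex("{a{bc;de};fgh}"))  # (?:a(?:bc|de)|fgh)
```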

4. Negation <>
--------------

matches when the specified submask does not match.

syntax: <mask_string>
(enclosed in angle brackets)

examples:

<{xyz;abcd;123}> - string is not 'xyz', 'abcd' or '123'
<#>* - string doesn't begin with digit(s)

warning: behavior is unpredictable when the negated mask is complex.
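One way to picture negation is as a regex negative lookahead. The patterns below are hand translations of the two examples above; they are an assumption about the intended meaning, not output of the product, and the names are hypothetical.

```python
import re

# <{xyz;abcd;123}>: the whole string must not be one of the three words.
NOT_ONE_OF_WORDS = re.compile(r"(?!(?:xyz|abcd|123)\Z).*", re.DOTALL)
# <#>*: the string must not start with a digit; the trailing * takes the rest.
NOT_DIGIT_START = re.compile(r"(?![0-9]).*", re.DOTALL)


def matches(pattern: re.Pattern, text: str) -> bool:
    return pattern.fullmatch(text) is not None


print(matches(NOT_ONE_OF_WORDS, "abc"))   # True: 'abc' is not in the list
print(matches(NOT_ONE_OF_WORDS, "abcd"))  # False
print(matches(NOT_DIGIT_START, "42nd"))   # False
```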

5. Numeric fields ()
--------------------

matches a variable-length numeric string whose value lies within a given range.

syntax: <minimum>;<maximum>[;<decimal-symbol>[;<other-punctuation-chars>]]
(enclosed in parentheses)

parameters:

- <minimum> and <maximum> are any Real-compatible values (negatives included)
specifying the range the numeric value must fall within.
- <decimal-symbol> is a single character used as the decimal separator
(default is the dot, $2E).
- <other-punctuation-chars> are characters that can be found within the
number and must be ignored (e.g. thousand-separators, currency symbols)

examples:

(0;100) - matches any number from 0 to 100 inclusive, e.g. '0',
'25.3', '000100'
(-15;15;,) - matches any number between -15 and 15, using a
comma as the decimal separator, e.g. '-14,99'
US$[:\s](0.01;100000;.;,) - matches a US dollar amount up to 100000,
with a comma as the thousands separator,
e.g. 'US$ 53,982.33' or 'US$150'

warning: masks are not supported in these parameters.
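The numeric check can be sketched in three steps: drop the ignorable punctuation, verify that the remaining text looks like a number with the given decimal symbol, then compare its value against the range. In this Python sketch `numeric_field_match` is hypothetical, `Decimal` is used to avoid float rounding at the bounds, and the accepted number shape (optional '-', digits, optional decimal part) is an assumption.

```python
import re
from decimal import Decimal


def numeric_field_match(text: str, minimum: str, maximum: str,
                        decimal: str = ".", other: str = "") -> bool:
    """Hypothetical model of (<minimum>;<maximum>;<decimal>;<other>).

    Assumes the value is an optional '-', digits, and an optional decimal
    part; exponents and a leading '+' are not accepted."""
    # Drop ignorable punctuation such as thousands separators.
    cleaned = "".join(ch for ch in text if ch not in other)
    pattern = r"-?[0-9]+(?:" + re.escape(decimal) + r"[0-9]+)?"
    if re.fullmatch(pattern, cleaned) is None:
        return False
    # Normalise the decimal symbol so Decimal can parse the value exactly.
    value = Decimal(cleaned.replace(decimal, "."))
    return Decimal(minimum) <= value <= Decimal(maximum)


print(numeric_field_match("000100", "0", "100"))                     # True
print(numeric_field_match("-14,99", "-15", "15", decimal=","))        # True
print(numeric_field_match("53,982.33", "0.01", "100000", ".", ","))  # True
```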

6. Mask repetition ||
---------------------

matches a mask repeatedly, to handle contiguous instances of a pattern
(zero or more instances).

syntax: <mask_string>
(enclosed between two pipes)

examples:

a.|[2-3:09].|b - matches 'a.123.b' or 'a.05.332.41.191.b' or 'a.b'

warning: nested repetitions are not supported.
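For the repetition example, an equivalent regex wraps the repeated submask in a group quantified with `*`. This is a hand translation under the assumption that [2-3:09] behaves like `[0-9]{2,3}`; `PATTERN` and `repetition_match` are hypothetical names.

```python
import re

# Hand translation of the mask  a.|[2-3:09].|b
# [2-3:09] -> [0-9]{2,3}; the |...| repetition -> (?:...)* (zero or more).
REPEATED_GROUP = r"[0-9]{2,3}\."
PATTERN = re.compile(r"a\.(?:" + REPEATED_GROUP + r")*b")


def repetition_match(text: str) -> bool:
    return PATTERN.fullmatch(text) is not None


# Prints True for the first three samples and False for 'a.1.b'.
for sample in ["a.123.b", "a.05.332.41.191.b", "a.b", "a.1.b"]:
    print(sample, repetition_match(sample))
```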

7. Escapes \
------------

escapes the next character (it will not be interpreted as a wildcard or as
any control or reserved character).

syntax: \<character|predefined_character|$two_digits_hex_code>

the predefined characters are:

- s (space)
- t (tab)
- c (CR)
- l (LF)

examples:

\* - character '*'
\\ - character '\'
\s - character space (#$20)
\t - character tab (#$9)
\c - character CR (#$0D)
\l - character LF (#$0A)
\$41 - character 'A' (#$41); the hex code must have exactly 2 digits
[I1-2:\c;\l] - matches any EOL (CR, LF, CR/LF or LF/CR)
[\[;\]] - matches char '[' or ']'

the following characters are reserved and must be escaped when they are
meant literally (some are reserved only inside structures):

? _ * % # [ ] { } < > ( ) \ | ; !
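Escape handling can be sketched as a step before translation in which each escape produces one literal character that is never treated as a wildcard. The Python below extends the simple-wildcard translation with the escapes listed above; `escaped_mask_to_regex` is a hypothetical name, and rejecting a '\$' that is not followed by two hex digits is an assumption based on the "hex code must have 2 digits" note.

```python
import re
import string

# Predefined escapes: \s space, \t tab, \c CR, \l LF.
PREDEFINED = {"s": " ", "t": "\t", "c": "\r", "l": "\n"}
# Simple wildcards as in section 1 (same hypothetical mapping).
SIMPLE_WILDCARDS = {"?": ".", "_": ".?", "*": ".*", "%": ".+", "#": "[0-9]+"}


def escaped_mask_to_regex(mask: str) -> str:
    """Translate simple wildcards plus backslash escapes into a regex."""
    out, i = [], 0
    while i < len(mask):
        ch = mask[i]
        if ch != "\\":
            out.append(SIMPLE_WILDCARDS.get(ch, re.escape(ch)))
            i += 1
            continue
        if i + 1 >= len(mask):
            raise ValueError("mask ends with a lone backslash")
        nxt = mask[i + 1]
        if nxt in PREDEFINED:
            literal, i = PREDEFINED[nxt], i + 2
        elif nxt == "$":
            code = mask[i + 2:i + 4]
            # Hex codes must have exactly two digits, e.g. \$41.
            if len(code) != 2 or any(c not in string.hexdigits for c in code):
                raise ValueError(f"bad hex escape at position {i}")
            literal, i = chr(int(code, 16)), i + 4
        else:
            literal, i = nxt, i + 2  # any other character is taken as itself
        out.append(re.escape(literal))
    return "".join(out)


print(re.fullmatch(escaped_mask_to_regex(r"\$41#"), "A12") is not None)  # True
```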