keyword | position | short | action |
Text | first, required | T=ORIGINAL | in-memory string to be analyzed. |
IDentification | middle, optional | ID=name | include a name (e.g. file name) in InvIdx (default is "noID") |
EXTRA | middle, optional | EXTRA=".()" | include non-alphanumeric one-character words in InvIdx, e.g. "." or "(" or ")" or the linefeed character $LF, for use in later queries (e.g. including $LF in EXTRA makes it possible to find line boundaries) |
ID1 | middle, optional | id1=left | start position of an ID string in ORIGINAL |
ID2 | middle, optional | id2=right | end position of that ID string |
Mark1 | middle, optional | M1 | start position of a clipping of ORIGINAL |
Mark2 | middle, optional | M2 | end position of that clipping |
OFfSet | middle, optional | ofs=p1 | added to InvIdx positions. Useful if ORIGINAL is just a separate clipping of the complete document generated by another statement. |
Option | middle, optional | opt=1 | 1 to respect case (default is 0 to ignore case). Use for special problems only: with opt=1, "The" and "the" are indexed separately, for example. |
SorTSequence | middle, optional | sts=32 | sort sequence, one digit per column (32 means: first column 3, then column 2) |
InvertedIndex | last, required (only DO may follow) | ii=invidx | receives the inverted index. If the name given by ID is already indexed, inv_idx updates the existing index. |
DO | last, optional | DO | DO or DO=count can be used to break up ORIGINAL by resetting Mark1 and Mark2 in the loop. |
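To make the effect of EXTRA, OFfSet and Option more concrete, here is a minimal Python sketch of an inverted index built along the same lines. It is not the statement's implementation: the function name `build_inverted_index`, its parameters, the word definition (runs of ASCII letters and digits) and the 1-based positions are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def build_inverted_index(text, extra="", offset=0, respect_case=False):
    """Hypothetical sketch: map each word to the 1-based positions where it starts.

    extra        -- characters indexed as one-character words (like EXTRA)
    offset       -- value added to every position (like OFfSet)
    respect_case -- False folds words to lower case (like Option 0)
    """
    index = defaultdict(list)
    # Assumption: a word is a run of ASCII letters or digits.
    pattern = r"[A-Za-z0-9]+"
    if extra:
        # Each character listed in `extra` becomes a word of its own.
        pattern += "|[" + re.escape(extra) + "]"
    for match in re.finditer(pattern, text):
        word = match.group()
        if not respect_case:
            word = word.lower()
        index[word].append(match.start() + 1 + offset)
    return dict(index)

# Usage: index the period as a word, ignore case.
idx = build_inverted_index("The cat. the dog.", extra=".")
print(idx["the"])  # start positions of "The" and "the"
print(idx["."])    # positions of the two periods
```

With `respect_case=True` the same call would keep "The" and "the" as two separate keys, which is the situation the Option row warns about.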
to be indexed | option | required | ID generated |
T="A. Default is noID" | | II=InvIdx | noID |
T="B. ID as string" | ID="Filename e.g." | II=InvIdx | Filename e.g. |
T="C. Byte 1 only" | ID1 | II=InvIdx | C |
T="D. Text(id1:id2)" | Id1, Right=6, Id2 | II=InvIdx | D. Text |
T="E: Text ranges:" | RaNge=":", ID="colons" | II=InvIdx | colons; 1 2 and colons; 3 15 |
T="F/Slash ranges/" | ID1, RaNge="/", ID="sl" | II=InvIdx | slF; 1 2 and slS; 3 15 |
T="G= Text ID LOOP=" | Mark1, Right="=", Mark2 | II=InvIdx, Right, DO | ; 1 2 and ; 3 16 |
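The range rows (E, F and G) can be reproduced with a short Python sketch. It is inferred from the "ID generated" column only, under these assumptions: a range ends at each occurrence of the delimiter (inclusive), positions are 1-based, and ID1 without further cursor movement appends the first byte of the range to the name. The function `range_ids` and its parameters are hypothetical.

```python
def range_ids(text, delimiter, name="noID", first_byte=False):
    """Hypothetical sketch: one 'ID; start end' entry per delimiter-terminated range."""
    entries = []
    start = 0  # 0-based start of the current range
    while True:
        end = text.find(delimiter, start)
        if end == -1:
            break  # no further delimiter: the loop ends, as DO does on a failed search
        # Optionally extend the name with the first byte of the range (rows C and F).
        label = name + (text[start] if first_byte else "")
        # Report 1-based positions, with the delimiter included in the range.
        entries.append(f"{label}; {start + 1} {end + 1}")
        start = end + 1
    return entries

print(range_ids("E: Text ranges:", ":", name="colons"))
print(range_ids("F/Slash ranges/", "/", name="sl", first_byte=True))
print(range_ids("G= Text ID LOOP=", "=", name=""))
```

Text after the last delimiter is not reported in this sketch, because none of the example rows shows how the statement treats a trailing remainder.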
keyword short | full keyword | action |
T | Text | defines string ALL as the ORIGINAL |
R | Right | find right TB ("by William Shakespeare") |
L | Left | find left carriage return ($CR) |
Opt | Option | 128 == Regex (regular expression) |
L | Left | find left \a (alphabetic character, Regex) |
Opt | Option | 0 == all standard again |
L | Left | find left line feed ($LF, CR+LF == new line in Windows) |
R | Right | no argument: Move right 1 character |
ID1 | ID1 | start position for IDentification string |
R | Right | find right carriage return |
L | Left | move left 1 character |
ID2 | ID2 | end position for IDentification string |
M1 | Mark1 | start position for clipping |
Opt | Option | 1 == respect case (upper case / lower case) |
R | Right | find right TE ("THE END", upper case!) |
L | Left | move left 1 character |
M2 | Mark2 | end position for clipping |
Opt | Option | 0 == all standard again |
II | InvertedIndex | defines string InvIdx to receive the result |
DO | DO | repeat until error (here: no further TB found) |
The first entries in the IDs section of InvIdx are:
THE SONNETS; 7920 113221 |
ALLS WELL THAT ENDS WELL; 113786 263466 |
THE TRAGEDY OF ANTONY AND CLEOPATRA; 264042 436228 |
AS YOU LIKE IT; 436783 576608 |
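The loop in the keyword table above (find the author line, step back to the title, mark the clipping up to "THE END", repeat) can be paraphrased in Python as follows. This is a simplified sketch, not the statement itself: `clip_works` is a hypothetical helper, the title is taken to end at the last alphabetic character before the author line, and the exact cursor conventions (where a search leaves the position, 1-based marks) are assumptions, so its numbers need not match the entries listed above.

```python
def clip_works(text, begin_marker="by William Shakespeare", end_marker="THE END"):
    """Hypothetical sketch: return (title, mark1, mark2) for every work in text."""
    works = []
    pos = 0
    while True:
        tb = text.find(begin_marker, pos)        # R: find right TB
        if tb == -1:
            break                                # DO: stop on the first failed search
        i = tb - 1
        while i >= 0 and not text[i].isalpha():  # L: find left alphabetic character
            i -= 1
        if i < 0:
            break                                # no title line above the marker
        title_end = i
        title_start = text.rfind("\n", 0, title_end) + 1  # L: find left LF, R: move right 1
        te = text.find(end_marker, title_end)    # R: find right TE (case-sensitive)
        if te == -1:
            break
        title = text[title_start:title_end + 1]  # ID1 .. ID2
        mark1 = title_end + 1                    # M1: 1-based position of the title's last letter
        mark2 = te                               # M2: 1-based position just before TE
        works.append((title, mark1, mark2))
        pos = te + len(end_marker)
    return works

sample = ("intro\r\nPLAY ONE\r\n\r\nby William Shakespeare\r\nbody one\r\nTHE END\r\n"
          "PLAY TWO\r\n\r\nby William Shakespeare\r\nbody two\r\nTHE END\r\n")
for title, m1, m2 in clip_works(sample):
    print(f"{title}; {m1} {m2}")
```

Each tuple corresponds to one line of the IDs section: the title, followed by the start and end positions of the clipping that would be indexed under it.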