This is part of the HicEst documentation

Lexicon: an EDIT Keyword to Build a Lexical Data Base


Like all HicEst functions, EDIT(...LeXicon...) is a single-line statement. It extracts lexical strings from arbitrary text, queries text against them, and marks the matches.


Optional keywords (see Syntax of optional keywords): LeXicon, $Marks

General form: [result = ] EDIT(Text=string_to_analyze, [Options,] LeXicon=lex)

Keyword and action:
LeXicon
  • EDIT(Text=txt, Option=case, LeXicon=lex)
  • ! case=0: ignore case, case=1: respect case
  • EDIT(T="It is, what it is.", LX=lex) ! lex is set to "is,it,what,"
  • EDIT(T="It is a banana", LX=lex) ! lex is now "a,banana,is,it,what,"
  • The default SePaRators may be overridden.
  • The first character of the separators is used to separate lex entries.
  • EDIT(T=" What is it? Is it a banana? It is.", SPR=".?", LX=lex)
  • ! lex is set to "is it a banana.it is.what is it."
  • txt = EDIT(Text=search_string, [Option=opts], LeXicon=lex) ! opts: 1=case, 2=word
    • With the word option (opts=2) the result lists the search_string words that match whole lex entries. Without the word option the result lists every search_string word that occurs as a substring anywhere in lex (for example, "hat" is found inside "what"). In-place results are allowed: txt = EDIT(T=txt, O=opts, LX=lex)
  • ! lex is "a,banana,is,it,what,"
  • txt = EDIT(T="It is not a hat, is IT not?", LX=lex) ! "It,is,a,hat,is,IT,"
  • txt = EDIT(T="It is not a hat, is IT not?", O=2, LX=lex) ! "It,is,a,is,IT,"
  • txt = EDIT(T="It is not a hat, is IT not?", O=1+2, LX=lex) ! "is,a,is,"
  • In contrast to the SortDelDbls keyword, the LeXicon keyword is cumulative: calls with different Text arguments add their words to the existing lex.
  • num = EDIT(Text=search_string, Option=word, LeXicon=lex)
    • The result num is a bit-sum: each search_string word found in lex contributes 2^(k-1), where k is the word's number within search_string
  • ! lex is "a,banana,is,it,what,"
  • n = EDIT(Txt="It is not a hat, is IT not?", Opt=2, LX=lex)
  • ! word numbers: 1=It 2=is 3=not 4=a 5=hat 6=is 7=IT 8=not
  • ! words 1, 2, 4, 6, 7 are in lex: 1 + 2 + 8 + 32 + 64 = 107
  • ! n is set to 107 (ignore case, whole words only)
  • vec = EDIT(Text=search_string, Option=1+2, LeXicon=lex)
    • result in vec is an array of byte positions of search_string-words contained in lex
  • ! lex is "a,banana,is,it,what,"
  • vec = EDIT(T="It is not a hat, is IT not?", Opt=2, LX=lex)
  • ! byte positions returned for each Option value:
  • ! opt=0 (default):   1 4 11 13 18 21
  • ! opt=1 (case):      4 11 13 18
  • ! opt=2 (word):      1 4 11 18 21
  • ! opt=3 (case+word): 4 11 18
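
The LeXicon examples above can be reproduced with a small model. The following is a Python sketch, not HicEst code and not HicEst's implementation: the names `build_lexicon`, `query`, and `WORD` are hypothetical, words are assumed to be runs of ASCII letters (the real default separators may differ), and lex entries are assumed to be comma-terminated as in "a,banana,is,it,what,".

```python
import re

WORD = re.compile(r"[A-Za-z]+")  # assumption: a word is a run of ASCII letters


def build_lexicon(text, lex=""):
    """Merge the lowercased words of text into lex (sorted, unique, cumulative)."""
    entries = {e for e in lex.split(",") if e}
    entries.update(w.lower() for w in WORD.findall(text))
    return "".join(e + "," for e in sorted(entries))


def query(text, lex, case=False, word=False):
    """Return (words, bit_sum, positions) for the words of text found in lex.

    case=True respects case; word=True requires a whole-entry match,
    otherwise a word counts if it is a substring of any entry.
    """
    entries = [e if case else e.lower() for e in lex.split(",") if e]
    words, bit_sum, positions = [], 0, []
    for number, m in enumerate(WORD.finditer(text), start=1):
        w = m.group() if case else m.group().lower()
        hit = (w in entries) if word else any(w in e for e in entries)
        if hit:
            words.append(m.group())
            bit_sum += 1 << (number - 1)     # word k contributes 2^(k-1)
            positions.append(m.start() + 1)  # 1-based position in text
    return words, bit_sum, positions


lex = build_lexicon("It is, what it is.")
lex = build_lexicon("It is a banana", lex)   # "a,banana,is,it,what,"

sentence = "It is not a hat, is IT not?"
words, n, vec = query(sentence, lex, word=True)
print(words)  # ['It', 'is', 'a', 'is', 'IT']
print(n)      # 107
print(vec)    # [1, 4, 11, 18, 21]
```

The three result kinds (text, number, array) are returned together here only for compactness; in HicEst the kind of result follows from the variable on the left-hand side of the EDIT call.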
$Marks
  • marked = EDIT(Text=search_string, $Marks="AZ", LeXicon=lex)
    • marks search_string-words contained in lex
    • the first character of $Marks ("A") is the prefix mark; the second character ("Z"), if present, is the postfix mark
  • ! lex is "a,banana,is,it,what,"
  • marked = EDIT(T="It is not a hat, is IT not?", Opt=1+2, $Marks="[]", LX=lex)
  • ! "It [is] not [a] hat, [is] IT not?"
  • marked = EDIT(T="It is not a hat, is IT not?", $Marks="*", LX=lex)
  • ! "*It *is not *a *hat, *is *IT not?"
  • ! The LeXicon keyword works with regular expressions (Option=128) as well:
  • marked = EDIT(T="x .. y", O=2+128, $M='[]', LX=lex) ! x [is][it] y
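
The marking behaviour can be modelled in the same spirit. This is a Python sketch under stated assumptions, not HicEst code: `mark_words` is a hypothetical name, words are assumed to be runs of ASCII letters, and the regular-expression option (Option=128) is not modelled.

```python
import re


def mark_words(text, lex, marks, case=False, word=False):
    """Wrap every word of text that is found in lex with the given marks.

    marks[0] is the prefix mark; marks[1], if present, is the postfix mark.
    """
    entries = [e if case else e.lower() for e in lex.split(",") if e]
    prefix, postfix = marks[0], marks[1:2]  # postfix is "" for a 1-char marks string

    def replace(m):
        w = m.group() if case else m.group().lower()
        hit = (w in entries) if word else any(w in e for e in entries)
        return prefix + m.group() + postfix if hit else m.group()

    # assumption: a word is a run of ASCII letters
    return re.sub(r"[A-Za-z]+", replace, text)


lex = "a,banana,is,it,what,"
sentence = "It is not a hat, is IT not?"
print(mark_words(sentence, lex, "[]", case=True, word=True))
# It [is] not [a] hat, [is] IT not?
print(mark_words(sentence, lex, "*"))
# *It *is not *a *hat, *is *IT not?
```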



©2000-2019 Georg Petrich, HicEst Instant Prototype Computing. All rights reserved.