 ANNOUNCE: text retrieval: lq-text 1.14 beta 3 available

If you've been using the lq-text Unix text retrieval package, you may be
interested to know that the third beta release of lq-text 1.14 is available
for anonymous ftp.  I won't include the location, since I want to make
sure that everyone using it is on the mailing list, so contact me if you
are interested.

This is beta software, so although it seems fairly robust now, the
documentation isn't finished.  Sorry.  This means that if you're not
good at porting, compiling and installing Unix C software, you should
stop reading now.

Otherwise, please mail to lq-text-beta-requ...@sq.com to be added to
the (low-volume) mailing list and to get the instructions for picking
up the software.

If you'd like to help me work on lq-text, please let me know.
I'm often too busy to do much on it these days :-(

Note that lq-text is free for non-commercial use, but that there is a
small charge for commercial use and resale.  It is therefore not distributed
under the GNU licence.

This release has been compiled on the following platforms:
    SPARC, SunOS 4.1.1, 4.1.3
    SPARC, Solaris 2.2
    HP/UX 9.5
    Linux (unknown version, not by me)

Please do not attempt a port to the DEC Alpha.  I'm waiting for the
DEC Beta :-), but in the meantime, lq-text only works on 32-bit systems.

I have been promised diffs for a Windows NT port, but it has somewhat reduced
functionality.

What follows are the changes since 1.13 (more or less), and a brief overview
of lq-text itself.   See also the Proceedings of the Summer Usenix conference
in 1994 for the paper I gave on lq-text.

Lee

Liam Quin, SoftQuad Inc    | lq-text freely available Unix text retrieval
l...@sq.com +1 416 239 4801 | FAQs: Metafont fonts, OPEN LOOK UI, OpenWindows...
SGML: http://www.**-**.com/ ; |`Consider yourself... one of the family...
The barefoot programmer    | consider yourself... At Home!' [the Artful Dodger]

*

New in this release:
    portability --
        many portability problems have been eased or fixed.
        There is a new "configure" script to help.

    bug fixes --
        many bugs have been fixed; so many that I will be asking
        ftp sites to remove lq-text 1.13 shortly...

    speed improvements --
        lq-text 1.14 should be noticeably faster at indexing, and slightly
        faster at retrieval.

    read-only access and improved block cache --
        The index can now be used from a CD-ROM with plausible efficiency.
        Writing an index onto an NFS-mounted partition is no longer
        absurdly slow.

    lqquery --
        like lqphrase, but uses wildcards, so
            lqquery "s?me wor*"
        might find a match for
            some word
            same worries
        and so forth.
        You must run
            lqtext/sortwids
        after creating the database (lqaddfile) and before using lqquery.
        This sorts the vocabulary -- run
            lqwordlist -u -g .
        before and after sortwids (on a small database!) to see the
        difference.

    tracing --
        try
            lqword -t list
        for a list of trace flags.

    lqkwic --
        has a more powerful expression language:
            lqkwic -x -L
        gives details.

    lqsort --
        a shell script that uses lqkwic and Unix sort

    lq.sh --
        this is a replacement for the `lqtext' front end
        for people who have terminals with scrollback (e.g. xterm).
        It's a shell script!
        It also gives examples of most or all of the lq-text clients.

    higher precision --
        punctuation and stoplist words at the end of a query are
        now significant.
        You can use
            lqphrase -t MatchPhrase "sample phrase"
        or
            lqphrase -t 'MakePhrase|MatchPhrase' "sample phrase"
        to see phrase matching in detail.

    C API --
        Printed documentation for the C API will soon be available.
        There will be a charge for this (sorry), probably between $50 and $100.
        Please do let me know if you might be interested in buying a printed
        manual.  I'll also try to make an HTML version.
        The API documentation is partly taken from comments that are
        embedded in the C code -- see any file in src/liblqtext for examples.

    new configuration file entries --
        The configuration file has been renamed from README to config.txt,
        although you can still use README or readme if you prefer;
        see the beta installation instructions for details.
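
As an illustration of why lqquery needs the vocabulary sorted first, here is
a minimal Python sketch.  It is not lq-text's C code.  It assumes that `?`
matches one character and `*` matches any run of characters (the notes above
do not spell out the wildcard syntax), and the names `literal_prefix` and
`match_vocabulary` are invented for the example.  With a sorted word list,
the literal prefix of a pattern narrows the search to one contiguous range
that a binary search can find.

```python
import bisect
import fnmatch

def literal_prefix(pattern):
    """Return the characters of pattern that precede the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "?*[":
            return pattern[:i]
    return pattern

def match_vocabulary(sorted_words, pattern):
    """Return the words in sorted_words that match a ?/* wildcard pattern."""
    prefix = literal_prefix(pattern)
    # Words that share the prefix are contiguous in a sorted list,
    # so only that slice has to be tested against the full pattern.
    lo = bisect.bisect_left(sorted_words, prefix)
    hi = bisect.bisect_left(sorted_words, prefix + "\uffff")
    return [w for w in sorted_words[lo:hi] if fnmatch.fnmatchcase(w, pattern)]

# Hypothetical vocabulary, sorted once up front.
vocabulary = sorted(["same", "some", "sum", "word", "worries", "world", "ward"])
print(match_vocabulary(vocabulary, "s?me"))
print(match_vocabulary(vocabulary, "wor*"))
```

A pattern that begins with a wildcard has an empty prefix, so the sketch
falls back to scanning the whole list.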

*

Here is the README for lq-text -- as you can see, I haven't updated it
for the beta release of 1.14 yet.

Liam Quin's text retrieval package (lq-text) Sat Nov 27 22:50:31 EST 1993
src/h/Revision.h defines this as Revision 1.13.

lq-text is copyright 1989, 1990-1995, 1996 Liam R. E. Quin;
see src/COPYRIGHT for details.  Parts of the source are copyrighted by
the University of California at Berkeley - see src/qsort.c and src/db*/....

Lqtext is a text retrieval package.

That means you can tell it about lots of files, and later you can ask
it questions about them.
The questions have to be of the form
        which files contain this word?
        which files contain this phrase?
but this information turns out to be rather useful.

Lqtext has been designed to be reasonably fast.  It uses an inverted
index, which is simply a kind of database.  The index tends to be smaller
than the data it describes, but more than half as large.  You still need to
keep the original data.
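
To make the idea of an inverted index concrete, here is a minimal Python
sketch with phrase lookup.  It is an illustration only, not lq-text's
on-disk format or C API: `tokenize`, `build_index` and `find_phrase` are
invented names, words are split on non-letters, and positions are plain
word offsets rather than block/word pairs.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(documents):
    """Map each word to a list of (document name, word offset) postings."""
    index = defaultdict(list)
    for name, text in documents.items():
        for offset, word in enumerate(tokenize(text)):
            index[word].append((name, offset))
    return index

def find_phrase(index, phrase):
    """Return (document, offset) for each place the whole phrase occurs."""
    words = tokenize(phrase)
    if not words:
        return []
    # Postings of the second and later words, as sets for quick lookup.
    later = [set(index.get(w, [])) for w in words[1:]]
    return [
        (name, offset)
        for name, offset in index.get(words[0], [])
        # Each later word must sit at the next consecutive offset.
        if all((name, offset + i + 1) in s for i, s in enumerate(later))
    ]

# Made-up sample documents.
documents = {
    "matt26": "And he went out, and wept bitterly.",
    "luke22": "And Peter went out, and wept bitterly.",
    "other": "He wept. She spoke bitterly.",
}
index = build_index(documents)
print(find_phrase(index, "wept bitterly"))
```

The third document contains both words but not next to each other, so it is
not reported as a phrase match.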

Commands include:
        lqaddfile -- add files to the database at any time
        lqfile -- information about files that have been indexed
        lqword -- information about words
        lqphrase -- look up phrases
        lqrank -- combine phrase searches, and sort the results
        lqkwic -- creates keyword-in-context indexes (this is fun!)
        lqshow -- show the matches on the screen (uses curses)
        lqtext -- curses-based front end.
        lq -- shell-script front end

There are about 11,000 lines of C in total, of which 8,000 are the
text database and 3,000 are the curses front end (lqtext).  Well, last time
I counted, anyway.
    [wow -- this is well out of date -- Lee]

Here are some examples, based mostly on the (King James) New Testament,
simply because that is what I have lying around.  The timings ran on a
16 MHz Sun 4/110 -- about 7 MIPS, with a disk drive giving around 1 MByte/sec.

$ time lqphrase 'wept bitterly'
2 35 10 955 KingJames/NT/Matthew/matt26.kjv
2 26 47 995 KingJames/NT/Luke/luke22.kjv
        0.6 real         0.0 user         0.2 sys  [lq-text 1.13]
        0.2 real         0.0 user         0.1 sys  [lq-text 1.14]

$ time lqword -l jesus > XXX
        1.0 real         0.4 user         0.4 sys  
$ wc XXX
     983    4915   68604 XXX
$ sed 12q XXX
1 0 8 930 KingJames/NT/Matthew/matt01.kjv
1 5 21 930 KingJames/NT/Matthew/matt01.kjv
1 6 24 930 KingJames/NT/Matthew/matt01.kjv
1 8 48 930 KingJames/NT/Matthew/matt01.kjv
1 10 49 930 KingJames/NT/Matthew/matt01.kjv
1 0 4 931 KingJames/NT/Matthew/matt02.kjv
1 6 4 932 KingJames/NT/Matthew/matt03.kjv
(and so on for 983 lines)
So there are nine hundred and eighty-three matches.  The line for each match
gives the number of words matched (1 here, 2 for the phrase above), the block
in the file, the word within the block, the file number, and the filename.
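
If you want to post-process match lines in a script, a small parser along
these lines may help.  This is a sketch, not part of lq-text: the `Match`
type and `parse_match` function are invented, and the meaning of the first
field (the number of words in the match) is an assumption based on the
examples above, where it is 1 for a single word and 2 for a two-word phrase.

```python
from typing import NamedTuple

class Match(NamedTuple):
    word_count: int      # assumed meaning of the first field
    block: int           # block within the file
    word_in_block: int   # word within the block
    file_number: int
    filename: str

def parse_match(line):
    """Parse one match line into a Match record."""
    # Split at most four times so a filename containing spaces stays whole.
    count, block, word, number, filename = line.split(None, 4)
    return Match(int(count), int(block), int(word), int(number),
                 filename.rstrip("\n"))

sample = "2 35 10 955 KingJames/NT/Matthew/matt26.kjv"
m = parse_match(sample)
print(m.block, m.word_in_block, m.file_number, m.filename)
```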

More useful things to do include:

// see some of the matching text:

$ lqphrase 'wept bitterly' | lqkwic
==== Document 1: /home/mieza/lee/text/bible/KingJames/NT/Matthew/matt26.kjv ====
  1: thrice. And he went out, and wept bitterly.                              
==== Document 2: /home/mieza/lee/text/bible/KingJames/NT/Luke/luke22.kjv ====
  2:22:62 And Peter went out, and wept bitterly. 22:63 And the men that held Je
$
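
A keyword-in-context line like the ones above can be approximated in a few
lines of Python.  This is only a sketch of the general idea: `kwic_line` is
an invented name, the 40-column width and the centring rule are assumptions,
and lqkwic's real layout is controlled by its own expression language.

```python
def kwic_line(text, phrase, width=40):
    """Return a fixed-width slice of text with phrase starting mid-line."""
    flat = " ".join(text.split())   # collapse newlines and runs of spaces
    pos = flat.find(phrase)
    if pos < 0:
        return None                 # phrase does not occur in the text
    middle = width // 2
    # Left context is right-aligned so the phrase always starts at `middle`.
    left = flat[max(0, pos - middle):pos].rjust(middle)
    right = flat[pos:pos + width - middle].ljust(width - middle)
    return left + right

verse = ("22:62 And Peter went out,\nand wept bitterly. "
         "22:63 And the men that held Jesus mocked him.")
print(kwic_line(verse, "wept bitterly"))
```

Text is cut at the window edges without regard to word boundaries, which is
why context lines can begin or end in the middle of a word.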

// which words contain "foot" or "feet"?
$ lqwordlist -g "f[oe][oe]t"
afoot
barefoot
brokenfooted
clovenfooted
feet
foot
footmen
footstep
footstool
fourfooted

// documents containing "shoe" and "barefoot"
$ lqrank "barefoot" "shoe" | lqkwic
==== Document 1: /home/mieza/lee/text/bible/KingJames/OT/Isaiah/isa20.kjv ====
  1:ff thy loins, and put off thy shoe from thy foot. And he did so, walking na
  2: he did so, walking naked and barefoot. 20:3 And the LORD said, Like as my
  3: Isaiah hath walked naked and barefoot three years [for] a sign and wonder
  4:ves, young and old, naked and barefoot, even with [their] buttocks uncovere

// save a query... docs containing any of the following:
$ lqrank -r or serpent witch snake stick rod > skinny-things    

// documents containing abraham said, or god of abraham:
$ lqrank -r or "abraham said" "God of Abraham" > abe    

// documents appearing in both sets of results (intersect), if any:
$ lqrank -r and -f skinny-things -f abe | lqkwic    
==== Document 1: /home/mieza/lee/text/bible/KingJames/OT/Exodus/exod04.kjv ====
  1:in thine hand? And he said, A rod. 4:3 And he said, Cast it on the ground.
  2:n the ground, and it became a serpent; and Moses fled from before it. 4:4 A
  3:nd caught it, and it became a rod in his hand: 4:5 That they may believe th
  4:ORD God of their fathers, the God of Abraham, the God of Isaac, and the God
  5:4:17 And thou shalt take this rod in thine hand, wherewith thou shalt do si
  6: of Egypt: and Moses took the rod of God in his hand. 4:21 And the LORD sai
$  

// Ah, it was Moses I was thinking of...
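
The "and"/"or" combination of saved result sets can be pictured as set
intersection and union over document names.  The Python sketch below is an
illustration under stated assumptions, not lqrank's implementation: it
assumes saved results use the five-field match format with the filename
last, it combines at the document level only, and it ignores ranking.  The
function names and the sample lines are made up.

```python
def documents(match_lines):
    """Collect the filename (fifth field) from each non-empty match line."""
    return {line.split(None, 4)[4].strip()
            for line in match_lines if line.strip()}

def combine(result_sets, mode):
    """Combine several lists of match lines with 'and' or 'or'."""
    sets = [documents(lines) for lines in result_sets]
    if not sets:
        return set()
    if mode == "and":
        return set.intersection(*sets)   # documents present in every set
    if mode == "or":
        return set.union(*sets)          # documents present in any set
    raise ValueError("mode must be 'and' or 'or'")

# Made-up sample lines standing in for two saved queries.
skinny_things = ["1 4 2 101 OT/Exodus/exod04.kjv",
                 "1 7 9 102 OT/Genesis/gen03.kjv"]
abe = ["3 12 5 101 OT/Exodus/exod04.kjv",
       "2 3 1 103 NT/Luke/luke16.kjv"]
print(sorted(combine([skinny_things, abe], "and")))
```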

The "lq" shell script is much more convenient for simple queries.
It's interactive -- give it a try.




 Thu, 12 Nov 1998 03:00:00 GMT   
 