Sgrep - Home page
What is sgrep?
sgrep (structured grep) is a tool for searching and indexing text, SGML, XML and HTML files and for filtering text streams using structural criteria. The data model of sgrep is based on regions, which are nonempty substrings of text. Regions are typically occurrences of constant strings, SGML tags, or meaningful text elements that can be recognized through some delimiting strings or by the built-in SGML, XML and HTML parser. Regions can be arbitrarily long, arbitrarily overlapping, and arbitrarily nested.
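The region model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not sgrep's implementation or its query syntax: the names `occurrences`, `delimited` and `contains` are hypothetical, and pairing each opening delimiter with the nearest following closing delimiter is a simplifying assumption that may differ from how sgrep pairs delimiters.

```python
# Illustrative model only: not sgrep's code and not sgrep query syntax.
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) offsets, end exclusive, start < end


def occurrences(text: str, phrase: str) -> List[Region]:
    """Every occurrence of a constant string as a region (overlaps allowed)."""
    if not phrase:
        return []  # regions must be nonempty
    found = []
    pos = text.find(phrase)
    while pos != -1:
        found.append((pos, pos + len(phrase)))
        pos = text.find(phrase, pos + 1)
    return found


def delimited(text: str, opener: str, closer: str) -> List[Region]:
    """Regions from each opener to the nearest following closer (assumption)."""
    regions = []
    for start, open_end in occurrences(text, opener):
        close = text.find(closer, open_end)
        if close != -1:
            regions.append((start, close + len(closer)))
    return regions


def contains(outer: Region, inner: Region) -> bool:
    """True if the inner region lies completely within the outer region."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


sample = "<b>Hello <i>World</i></b> and <i>more</i>"
bold = delimited(sample, "<b>", "</b>")
italic = delimited(sample, "<i>", "</i>")
# Keep only the italic regions that are nested inside some bold region.
nested = [r for r in italic if any(contains(b, r) for b in bold)]
print([sample[s:e] for s, e in nested])  # prints ['<i>World</i>']
```

Representing a region as a pair of offsets is what makes overlapping and nested regions unproblematic here: two regions are related only by comparing their end points.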
Sgrep is a convenient tool for querying almost any kind of text file with some well-known structure. These include programs, mail folders, news folders, HTML, SGML, etc. With relatively simple queries you can display mail messages by their subject or sender, extract titles, links, or any other regions from HTML files, extract function prototypes from C, or make complex queries to SGML files based on the DTD of the file.
NEW! Third prerelease of sgrep-2 is out!
Sgrep version 1.92a is out. This version contains the sources, Win32 binary and binaries for HP-UX, Linux, OSF1 and Solaris. See the download page. The Win32 binary also includes the m4 macro processor.
Version 1.92 also fixes a fatal bug in sgrep-1.91, which caused version 1.91 to core dump when searching without using the SGML-scanner.
Major new features since 1.90a are:
- Nearness operators for both ordered and unordered nearness.
- Support for 16-bit wide query terms (this really means that Sgrep now supports Unicode)
- Support for UTF-16 and UTF-8 encodings
- 'parenting' operator is now an order of magnitude faster (in the common case)
- Sgrep now emits and parses #line-directives, which allows for more accurate error reporting
- An option to query terms from index files
- Many bug fixes
- Introduces some new bugs (hopefully not as many as I fixed).
Major new features in 1.90a since version 1.70 are:
- Query operators supporting direct containment. In the SGML and XML world this means that you can query the children and parents of given elements.
- The sources are available under GPL-license for those interested in compiling sgrep themselves.
- Sgrep now uses GNU autoconf, so compiling sgrep under Unix-like systems should be easy.
- Many bug fixes
Major new features since version 0.99 are:
- Indexing of both structure and content.
- SGML/XML/HTML scanner.
- Official Win32 binary.
- sgtool has been dumped. It never really worked and even when it did, it wasn't very useful.
- Should be completely compatible with older versions of sgrep. See the README file for details.
How is sgrep used?
Sgrep queries are constructed with its own language. The details of the language are covered on the sgrep manual page. See also the report on using sgrep for querying structured text files. With the query language you can express queries like:
- Give me all lines with text "Hello World"
- Give me all "From" fields in my mail messages
- Give me senders of all news articles with a word "sgrep" or "linux" in the subject field
- Give me titles and names of all HTML documents that contain links to www.cs.helsinki.fi
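To make the first query above concrete, here is a minimal Python sketch of "all lines with text Hello World", phrased as a containment test between two sets of regions. It does not use sgrep or its query language (see the manual page for the real syntax); the helper names `line_regions`, `phrase_regions` and `containing` and the sample text are hypothetical.

```python
# Illustrative only: models one example query with plain Python regions.
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) offsets, end exclusive


def line_regions(text: str) -> List[Region]:
    """One region per line, including its line terminator."""
    regions = []
    start = 0
    for line in text.splitlines(keepends=True):
        end = start + len(line)
        regions.append((start, end))
        start = end
    return regions


def phrase_regions(text: str, phrase: str) -> List[Region]:
    """One region per occurrence of a nonempty constant string."""
    regions = []
    pos = text.find(phrase) if phrase else -1
    while pos != -1:
        regions.append((pos, pos + len(phrase)))
        pos = text.find(phrase, pos + 1)
    return regions


def containing(candidates: List[Region], needles: List[Region]) -> List[Region]:
    """Candidates that fully contain at least one needle region."""
    return [
        c for c in candidates
        if any(c[0] <= n[0] and n[1] <= c[1] for n in needles)
    ]


text = "Subject: test\nHello World, again\nnothing here\nsay Hello World\n"
hits = containing(line_regions(text), phrase_regions(text, "Hello World"))
print([text[s:e].rstrip("\n") for s, e in hits])
# prints ['Hello World, again', 'say Hello World']
```

The other queries in the list follow the same pattern: choose the regions to report (mail headers, subject fields, HTML titles) and filter them by what they contain or what contains them.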
The new features in sgrep-1.90a, including indexing, are currently documented only in the README file.
The power of the sgrep query language is at its best when making complex queries on SGML-like tagged documents. See a set of example queries, including the queries above.
The most recent stable sgrep version is 0.99. See the announcement of version 0.99.
The most recent alpha version is 1.91a. See the announcement of version 1.91a.
Sgrep requirements
Sgrep-1.91a works on Win32 systems (Win95, Win98 and Windows NT) as a console application, or on any decent Unix-like system supporting memory-mapped files.
Sgrep from the net
- HTML version of sgrep manual page
- Latest version of sgrep is always available from ftp://ftp.cs.helsinki.fi/pub/Software/Local/Sgrep/
Authors
Sgrep was made by
Jani Jaakkola, email: jjaakkol@cs.helsinki.fi
Pekka Kilpeläinen, email: Pekka.Kilpelainen@helsinki.fi
Last modified: Dec 22, 1998
This document is maintained by Jani Jaakkola
at email address jjaakkol@cs.helsinki.fi