GiST support in PostgreSQL
Software:
- GiST core - core GiST access methods, in the PostgreSQL source tree
Updates: now supports concurrency and recovery (8.1)!
- rtree_gist - R-Tree implementation using GiST, available in contrib/rtree_gist (implements the new linear node splitting algorithm by C.H.Ang and T.C.Tan)
Updates: moved into PostgreSQL core as of 8.1!
- btree_gist - B-Tree implementation using gist, available in contrib/btree_gist (README.btree_gist)
- intarray - index support for one-dimensional array of int4's, available in contrib/intarray (README.intarray)
- tsearch - new data type txtidx - a searchable data type (textual) with indexed access. (Deprecated in 7.4 and will be obsoleted in 7.5).
- tsearch V2 - Tsearch New Generation - full text search support in PostgreSQL.
- tree - support for hierarchical data types (a sort of lexicographical tree); should go to contrib/tree, pending because of the lack of proper documentation.
(Read README.tree in Russian and README.tree.english, thanks to George Essig).
Download tree.tar.gz and the dmoz-full.sql.gz sample DMOZ catalog (6 Mb compressed!) to play with.
- ltree - a PostgreSQL contrib module which contains an implementation of data types, indexed access methods and queries for data organized as tree-like structures. We have a separate page for ltree.
- OpenFTS - full text search engine, available from the official site. Please use the latest and greatest version from CVS. We consider it stable; the release is pending because we need to update the documentation. "The Crash-course to OpenFTS" is available - README, README.INSIDE
Standalone web crawler and search script are available for testing by request.
- Gevel - shows statistics about GiST indices.
- Download gevel.tar.gz for PostgreSQL 7.4
- Download gevel-8.0.x.tar.gz for PostgreSQL 8.0.X
- Download gevel-8.1.tar.gz for PostgreSQL 8.1
- Download gevel-8.2.tar.gz for PostgreSQL 8.2+
- Download gevel-8.3.tar.gz for PostgreSQL 8.3
- Read <gevel/README.gevel>
- Online version
- hstore - storage for semi-structured data (a la Perl hash) with GiST index access.
- Download hstore.tar.gz
- Read <hstore/README.hstore>
- Online version
- pg_trgm - (formerly trgm) Fuzzy (trigram) search with GiST index support. Read README.pg_trgm (contributed by Christopher Kings-Lynne) and download pg_trgm.tar.gz
(Development documentation)
- pg_sphere - not released yet. pgSphere provides spherical data types, functions and operators for PostgreSQL. This project is hosted at Gborg. Development team: Oleg Bartunov (project manager), Teodor Sigaev (GiST support), Janko Richter (principal developer), Igor Chilingarian (application programmer, testing). The ADASS poster is available: poster.pdf (adass-poster.jpg).
- pgxml - looking for a developer! The basic idea is to combine ltree and hstore to store XML data in PostgreSQL. Please contact Oleg Bartunov if you want to work on this project.
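As an illustration of the trigram matching behind pg_trgm (listed above), here is a minimal Python sketch. It assumes that each word is lowercased and padded with two leading spaces and one trailing space before trigrams are extracted, and that similarity is the share of common trigrams among all distinct trigrams. The function names trigrams and similarity are hypothetical; this is not the module's code.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of trigrams of all words in text.

    Assumption: words are runs of ASCII letters and digits, lowercased,
    and padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With these assumptions, similarity("word", "two words") shares 4 of 11 distinct trigrams, so a misspelled or partial word still scores well above zero.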
Development history
May 26, 2006
Fixed a small bug in the GIN code. A patch for the current stable 8.1.X introduces GIN (Generalized Inverted Index) support. The patch also contains full UTF-8 support, query rewriting and an improved rank_cd function.
May 3, 2006
Fixed small bug in contrib/hstore module. Get new version hstore.tar.gz. Thanks hubert depesz lubaczewski for report.
Apr 8, 2006
Tsearch2 with full utf-8 support, query rewriting and improved rank_cd function is available. Read Tsearch2WhatsNew for details.
Jul 1, 2005
contrib/rtree_gist is moved to PostgreSQL core !
Jun 27, 2005
GiST is now concurrent !
Full concurrency for insert/update/select/vacuum:
- select and vacuum never lock more than one page simultaneously
- select (gettuple) holds no lock across its calls
- insert never locks more than two pages simultaneously:
  - while searching for the leaf to insert into, it locks only one page at a time
  - while walking upward to the root, it locks only the parent (possibly a non-direct parent) and the child. One of them is X-locked, the other may be S- or X-locked
- 'vacuum full' locks index
- improve gistgetmulti
- simplify XLOG records
Jun 15, 2005
Online backup and crash recovery are now supported for GiST extensions!
Jun 07, 2005
We began working on GiST concurrency and recovery for 8.1. More information is here. Support our work! See the posting in the PostGIS mailing list about contributions.
Feb 21, 2005
Fixed memory leak in contrib/btree_gist (timestamp*). Check the REL8_0_STABLE branch in CVS.
Jan 27, 2005
Documented function intset(int4) in README.intarray
Jan 25, 2005
Several commits to CVS (8.1): tsearch2
- changed struct {} WordEntryPos to typedef uint16; for details see discussion. (Recreating tsvector fields and reindexing are required!)
- improved support for compound words (a compound is a word containing a stem that is made up of more than one root) - to_tsquery() now makes use of roots if the dictionary (it should support the 'compoundwords' flag, check the .aff file) returns them for a compound word. Example:
regression=# select to_tsquery( 'fotballklubber');
to_tsquery
'fotball' & 'klubb' | 'fot' & 'ball' & 'klubb'
(1 row)
The bad thing is that the API to tsearch2 dictionaries was changed! See the commit mail for details and Tsearch_V2_compound_words for an introduction to compound word support in tsearch2.
A patch for the 8.0 release which adds support for query expansion: expand_query_8.0.patch.gz. Please test it!
Dec 28, 2004
New documentation about tsearch2 is available!
May 28, 2004
A new patch to tsearch2, which adds support for word statistics per document part, is available from the tsearch2 page. An example is available from Tsearch_V2_Notes
May 14, 2004
- New version of the Tsearch2 introduction by Andrew Kopciuch is available. It contains notes about the dump/restore procedure for tsearch2-related databases. Really worth reading!
- New version of the pg_trgm (formerly trgm) module is available. Fixed a bug with searching in a very big index; no rebuilding of the index is required.
March 26, 2004
Experimental version of tsearch2 with an ordering function (needed for some operations) is available for 7.3 and 7.4. Download it from the tsearch2 page.
January 10, 2004
Janko Richter added support for timestamp with/without time zone, time with/without time zone, date, interval, oid, money and macaddr types to btree_gist.
Read README.btree_gist and download archive btree_gist-7.4.tar.gz for 7.4.
December 18, 2003
New version of tsearch2 for 7.3.X (and patch for 7.4) is available for downloading !
Changes:
- Fix signed char in comparison and check memory allocation Thanks to LEFEBVRE Herve for bug report.
December 10, 2003
New version of tsearch2 for 7.3.X (and patch for 7.4) is available for downloading !
Changes:
- Fix integer types to use definition from c.h. Per bug report by Patrick Boulay
December 5, 2003
New version of tsearch2 for 7.3.X is available for downloading !
Changes:
- Resolve internal function names conflict between tsearch v1 and v2 !
December 4, 2003
New version of tsearch2 for 7.3.X is available for downloading !
Changes:
- Resolve conflict with glibc function strndup
December 3, 2003
New version of tsearch2 for 7.3.X is available for downloading !
Changes:
- Changes in processing of compound words.
November 28, 2003
New version of tsearch2 for 7.3.X (and patch for 7.4) is available for downloading !
Changes:
- Fixed segmentation fault in the new ispell code. Thanks to Henning Spjelkavik for spotting the problem.
November 27, 2003
New version of tsearch2 for 7.3.X (and patch for 7.4) is available for downloading !
Changes:
- Implement support of compound words in ispell dictionary
- Skip too long words from indexing (throw NOTICE instead of ERROR).
- Added utility to convert myspell dictionaries to ispell. Documentation for new feature will be added later.
October 10, 2003
New version of tsearch2 is available for downloading !
Changes:
- Version for 7.3 is synchronized with 7.4.
September 22, 2003
New version of tsearch2 is available for downloading !
Changes:
- Fixed bug in headline function per Stephane Bidoul
August 28, 2003
New version of tsearch2 is available for downloading !
Changes:
- Changed the treatment of stop words in boolean expressions. Earlier they were considered as always 'true'; now we ignore them.
August 13, 2003
New version of tsearch2 is available for downloading !
Changes:
- headline function should work better with the MinWords parameter
August 6, 2003
New version of tsearch2 is available for downloading !
Changes:
- added function ts_debug to ease testing of a tsearch2 configuration. See the reference guide for details.
- Some documentation fixes.
July 21, 2003
Stable version of tsearch2 submitted to CVS !
July 18, 2003
New *development* version of tsearch V2 is available for download from the Tsearch V2 home page.
Changes:
Documentation fixes (user and reference guide).
July 9, 2003
New *development* version of tsearch V2 is available for download from the Tsearch V2 home page.
Changes:
Compatibility with 7.4-dev.
July 8, 2003
New *development* version of tsearch V2 is available for download from the Tsearch V2 home page.
Changes:
- Some documentation fixes.
- Gendict tutorial
- Dictionary for integers is available now (see Tsearch V2 home page)
July 7, 2003
New development version of tsearch V2 is available for download from the Tsearch V2 home page.
Changes:
- Changed the name of the module to tsearch2 to avoid clashing with the old tsearch (v1). Now tsearch v2 can co-exist with tsearch v1!
June 27, 2003
Buggy Day :)
tsearch-v2.1.6.tar.gz is available for download.
Changes:
- Fixed a bug submitted by Pinchart, Laurent (it shows up for very long documents with more than 65535 words).
June 27, 2003
tsearch-v2.1.5.tar.gz is available for download.
Changes:
- Small change in parser's handling of hyphenated words. ('PostgreSQL-7.3.3' is recognized now as 'PostgreSQL-7', 'PostgreSQL' and '7.3.3').
June 27, 2003
tsearch-v2.1.4.tar.gz is available for download.
Changes:
- Fixed hideous bug in error handling routine !
June 26, 2003
tsearch-v2.1.3.tar.gz is available for download.
Changes:
- untsearch.sql has been added to distribution - remove tsearch instance from db.
- We have now README.tsearch !
June 23, 2003
tsearch-v2.1.2.tar.gz is available for download.
Changes:
- Gendict - dictionary template generator is now included into tsearch distribution !
Read README.gendict.
June 21, 2003
Introduction to tsearch V2 by Andrew J. Kopciuch is available as tsearch-V2-intro.txt
June 19, 2003
New contrib module: Gendict - generate dictionary templates for contrib/tsearch v2 module. Download it from gendict.tar.gz, read README.gendict.
June 18, 2003
**Road to release!** All features are frozen now! Don't even try it.
New version of contrib/tsearch V2 is available for download (tsearch-v2.1.1.tar.gz).
- New naming convention, including last (Jun 18, 2003) changes.
- All id's (integers) are removed for clarity. Use oid instead: For example:
select * from pg_ts_cfg where oid = show_curcfg();
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default_russian | default  | ru_RU.KOI8-R
May 19, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- We now take stop words into account when incrementing lexeme positions, so 'dog food' and 'food the dog' will have different weights.
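The position counting described in this change can be sketched in Python as follows. This is only an illustration under stated assumptions: the stop word list, the whitespace tokenizer and the function name lexeme_positions are hypothetical and are not taken from tsearch V2.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "a", "of"}


def lexeme_positions(text: str) -> dict:
    """Map each non-stop word to the list of its 1-based positions.

    Stop words are not stored, but they still consume a position,
    so the distance between the surrounding words is preserved.
    """
    positions = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        if word in STOP_WORDS:
            continue  # position is used up, word is not indexed
        positions.setdefault(word, []).append(pos)
    return positions
```

Under these assumptions 'food the dog' stores dog at position 3, while 'food dog' stores it at position 2, so a ranking function that looks at distances between lexemes can tell the two apart.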
May 14, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- Changed function C-name (stat) to avoid conflict with stat(2).
May 8, 2003
Andrew J. Kopciuch wrote an introductory guide for beginners - tsearchV2-intro.txt. Your comments and additions are welcome!
May 5, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- Fixed rare bug.
Apr 21, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- Thanks to optimization of index creation, the new index size is about half of what it was.
- Added a new ranking function (rank_cd), which should be faster and better. It doesn't work with weight coefficients yet.
- Added generation of text fragments with highlighted query terms.
Mar 29, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- added synonym dictionary
Mar 28, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Changes:
- new function reset_tsearch() - resets all of tsearch's caches for dictionaries, parsers and configs. Useful for debugging.
- It's possible to get word statistics.
select * from stat('select txtidx_field from test_txtidx') order by ndoc desc;
For each unique word it returns:
the word, the number of documents containing it and the total number of its occurrences in the indexed collection. (Getting statistics could be very slow!)
Patch for ltree: ltree.732.patch.gz (a patch for 7.4 is submitted to CVS).
Changes:
- Added functions index(ltree,ltree,offset), text2ltree(text), ltree2text(ltree). Read the ltree documentation for details.
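What the stat() call in this entry reports per word can be illustrated with a small Python sketch. It assumes each document is already a list of words. The function name word_stats and the column name nentry are assumptions made for the example; only ndoc appears in the query above.

```python
from collections import Counter


def word_stats(documents):
    """Return (word, ndoc, nentry) rows, most widespread words first.

    ndoc   - number of documents that contain the word
    nentry - total number of occurrences across all documents
    """
    ndoc = Counter()
    nentry = Counter()
    for words in documents:
        nentry.update(words)     # every occurrence counts
        ndoc.update(set(words))  # each document counts once per word
    rows = [(word, ndoc[word], nentry[word]) for word in ndoc]
    # Same ordering idea as "order by ndoc desc"; ties broken by word.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

For example, word_stats([["cat", "dog", "cat"], ["dog"]]) gives dog in two documents with two occurrences, and cat in one document with two occurrences.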
Mar 20, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Fixed:
- Allow void value of txtidx
Mar 17, 2003
New version of contrib/tsearch V2 is available for download (tsearch.tar.gz).
Fixed:
- Fixed types of weights mismatch in rank arguments
- Add range check to weights
Mar 11, 2003
btree_gist now supports int2, int8, float4, float8! Thanks to Janko Richter for the contribution.
Download sources btree_gist_new.tar.gz for 7.3 and above.
Feb 5, 2003
btree_gist now supports int8, float4, float8! Thanks to Janko Richter for the contribution.
Download sources btree_gist.tar.gz for 7.3 and above.
Tsearch New Generation is coming:
read README-V2.txt for details, download the alpha version tsearch.tar.gz for testing.
Dec 4
Fix for contrib/tsearch (postgresql 7.3) release is available tsearch_patch.gz. It's important for non-C or ru_RU.KOI8-R locales.
Thanks Magnus Naeslund(f). Patch was submitted to CVS.
Aug 9
Fixed very stupid but important bug in ltree. Download new version ltree-7.2.tar.gz for PostgreSQL 7.2. Patch for 7.3 submitted to CVS.
Aug 6
Reworked patch from Andrey Oktyabrski (ano@spider.ru) with functions: icount, sort, sort_asc, uniq, idx, subarray; operations: #, +, -, |, &
Download patch.intarray and read README.intarray for additional info. Note: the patch should also work with 7.2.
Jul 31
- ltree module now works on 64-bit platforms.
- Added function lca to ltree - find lowest common ancestor
- ltree module now in current CVS (7.3)
Download ltree-7.2.tar.gz for PostgreSQL 7.2.
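The idea behind the lca function mentioned in this entry can be sketched in Python on dot-separated label paths. This is only an illustration, not the module's code. It assumes that a path does not count as its own ancestor, so the result is always shorter than the shortest input. The Python function below is a hypothetical stand-in.

```python
def lca(*paths: str) -> str:
    """Lowest common ancestor of dot-separated label paths.

    Assumption: the result must be a proper ancestor of every input,
    so it is cut to be shorter than the shortest path.
    """
    if not paths:
        raise ValueError("at least one path is required")
    split = [path.split(".") for path in paths]
    common = []
    for labels in zip(*split):
        if all(label == labels[0] for label in labels):
            common.append(labels[0])
        else:
            break
    limit = min(len(labels) for labels in split) - 1
    return ".".join(common[:limit])
```

Under this assumption lca("1.2.3", "1.2.3.4") is "1.2", and two paths that differ in their first label have the empty path as their common ancestor.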
Jul 13
Preliminary version of the contrib/ltree module. Read the documentation for details. Download ltree.tar.gz
Jun 14
All changes are committed to current CVS and 7.2 stable tree (7.2.2 release ?).
Jun 3
Patch for contrib/intarray (7.2.1)
- Apply patch_intarray.gz to PostgreSQL 7.2.1. Fixed a bug with the '=' operator for gist__int_ops and defined the '=' operator for the gist__intbig_ops opclass. Now the '=' operator is consistent with the standard 'array' type.
Thanks to Achilleus Mantzios for the bug report and suggestion.
May 27
Patch for contrib/rtree_gist (7.2.1)
- Apply rtree_patch.gz to PostgreSQL 7.2.1. Solves a problem with creating rtree indices in some cases.
Thanks to Chris Hodgson for the bug report and test suite.
May 26
Important patch for GiST (7.2.1)
- Apply patch_gistupdate.gz to PostgreSQL 7.2.1. It solves the 'strange update problem' (not all rows were updated) when GiST indices exist. Thanks to Ivan Panchenko for the bug report and Tom Lane for useful direction.
Apr 24
- New version of contrib/tree module ( Download tree.tar.gz )
added functions entree_next(entree), bitree_next(bitree), which return next node available.
Example (add node):
select entree_next(tid) from dmoz where tid <* '1.2.3.*.0' order by tid desc limit 1;
Sorry, no documentation in English is available :-(
Mar 19
- Upgrade (7.2 -> 7.2.1) notice to users of tsearch:
To upgrade from 7.2 to 7.2.1 one needs to run the following SQL (after compiling and installing contrib/tsearch):
update pg_amop set amopreqcheck = true where amopclaid = (select oid from pg_opclass where opcname = 'gist_txtidx_ops');
Feb 12
- New version of our contrib/tree module is ready for testing. Now works on 64-bit platforms (tested on Sun, DEC Alpha)
Download tree.tar.gz
- The known problem with 64-bit platforms that existed in the GiST core code and GiST-based contrib modules is fixed in current CVS. Waiting for the 7.2.1 release.
Feb 7
- New version of our contrib/tree module is ready for testing. Download tree.tar.gz. Documentation is sparse and written in Russian.
Features:
- Fast
- Unlimited depth of tree
- Handles up to 65535 children per node
- Provides methods for various operation on tree (compare nodes, relations between nodes, node matching, array operations)
Tested with dmoz catalog ( www.dmoz.org)
Feb 6
Urgent !
- Previous patch was buggy :-) Please apply patch_compress2.gz to contrib modules (intarray, tsearch) for 7.2 release !
Thanks Tom Lane for help.
Feb 5
Urgent !
- Please apply patch_compress.gz to contrib modules (intarray, tsearch) for 7.2 release !
Fixed bug in gist indices with long texts(tsearch) and arrays (intarray).
Thanks Poul L. Christiansen for spotting the bug
Dec 5
- Added a missing entry about the new picksplit algorithm to README.rtree_gist for our R-tree implementation using GiST. Archive for 7.2 - rtree_gist.tar.gz
Tue Oct 23
- Fixed bug with toasted arrays in contrib/intarray (PostgreSQL 7.1X)
Download new contrib-intarray.tar.gz
Fri Oct 12
- New contrib module tsearch for PostgreSQL 7.2dev is available for download. It contains implementation of new data type txtidx - a searchable data type (textual) with indexed access. Read README.tsearch for more information.
Wed Aug 15
- Read our proposal for changing the index AM tables. A patch to current CVS (7.2) has been submitted for approval.
Tue Aug 7 23:43:18 MSD 2001
- Read our proposal for a null-safe interface to GiST. A patch to current CVS (7.2) has been submitted for approval. (Applied!)
Tue Aug 4 23:43:18 MSD 2001
- Fixed bug in contrib/intarray for empty arrays - current CVS (7.2)
Thu Jun 28 15:40:52 MSD 2001
- Apply patch patch_mk_core.712 for multi-key GiST (for PostgreSQL 7.1.2), which fixes a problem when a query references a field with a GiST index more than once. (Multi-key GiST indexing for PostgreSQL 7.1.2 is available as a set of patches below.)
Tue Jun 5 19:06:05 MSD 2001
- We have prepared a patch which includes the previous patches for multi-key GiST and the memory leak - patch_multikeygist_woleak.7.1.2.gz
- There are HACKs in it: use it only if you know what you are doing!
- patch_multikeygist_woleak_b3wa.7.1.2.gz - the same as patch_multikeygist_woleak.7.1.2.gz with a workaround for the index_formtuple function (all keys are of type varlena and pass-by-reference). This patch affects only GiST indexes!
NOTICE: gist_box_ops from the R-tree implemented using GiST will not work with this patch!
- btree_gist_ops.tar.gz - B-Tree implementation for int4 and timestamp/datetime types using GiST. This contrib module requires patch_multikeygist_woleak_b3wa.7.1.2.gz.
Jun 1 20:22:42 MSD 2001
- New version of contrib-intarray for PostgreSQL version *7.1.X* only is available: contrib-intarray.tar.gz
Fixed a small bug in the handling of NULL values.
- More info: README.intarray
Thu May 31 17:22:42 MSD 2001
- New version of contrib-intarray for PostgreSQL version 7.1 and above is available: contrib-intarray.tar.gz
Changes:
- Support for the new function calling interface (7.1 and above)
- Optimization for gist__intbig_ops (special treatment of degenerate signatures)
- More info README.intarray
Wed May 30 17:04:16 MSD 2001
- A small fix for a memory leak in multi-key GiST is available: patch_memleak_in_multikey.7.1.2.gz
Apply it after original patch_multikeygist.7.1.2.gz
Both patches already applied into current CVS
Tue May 29 17:04:16 MSD 2001
- Small fixes in the polygon code. Thanks to Dave Blasby (dblasby@refractions.net). The new patch is available: rtree_gist.tar.gz
- More info: README.rtree_gist
Mon May 28 20:41:49 MSD 2001
- Full implementation of R-tree using GiST (compatible with multi-key GiST) is available rtree_gist.tar.gz
Notice: This version will work only with PostgreSQL version 7.1 and above because of changes in the function calling interface.
- More info: README.rtree_gist
Fri May 25 20:38:55 MSD 2001
- Patch for GiST (7.1.2) which adds multi-key index support is available patch_multikeygist.7.1.2.gz
- More info: README.multi-key
Tue May 15 14:11:16 MSD 2001
- A patch for GiST (7.1.1) which resolves the problem with massive insert/update of NULLs (inserting a NULL into an indexed field causes ERROR: MemoryContextAlloc: invalid request size) is available here
Workaround is a 'vacuum analyze'.
Already applied in 7.1.2
Mon Mar 19 17:31:22 MSK 2001
- Added support for toastable keys
- Improved split algorithm for intbig (selection speedup is about 30%)
Tue Jan 30 13:00:01 MSK 2001
- Improved regression test for contrib-intarray
- The current implementation provides index support for one-dimensional arrays of int4's - gist__int_ops, suitable for small and medium-sized arrays (used by default), and gist__intbig_ops for indexing large arrays (we use a superimposed signature with a length of 4096 bits to represent sets, see Sven Helmer, 1997).
- Introduction to GiST (not finished yet)
- Short description of GLI algorithm - work in progress
- Download gist-7.1.tar.gz, read README.gist
- Download gist.c for 7.0.3
- Download contrib-intarray.tar.gz for 7.1 (it should also work for 7.0.3, just use Makefile.703), read README.intarray
- Download contrib-rtree_box_gist.tar.gz for 7.1, read README.rtree_box_gist
(R-Tree implementation using GiST)
Papers for reading:
- "THE RD-TREE: AN INDEX STRUCTURE FOR SETS", Joseph M. Hellerstein, PS (70 Kb)
- "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer, PS (190 Kb), full paper (PS) (320 Kb)
- "R-TREES: A dynamic index structure for spatial searching", A. Guttman, PDF (850 Kb)
- "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles", Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger, PDF (1100 Kb)
- "Index Structures for Databases Containing Data Items with Set-valued Attributes", 1997, Sven Helmer, PS (1350 Kb)
- "On the Analysis of Indexing Schemes", 1997, Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou, PS (140 Kb)
- "Implementation of Extended Indexes in POSTGRES", 1991, Paul M. Aoki, PDF (35 Kb)
- "Generalizing Database Access Methods", 1999, Ming Zhou, PS (360 Kb)
- "High-Concurrency Locking in R-Trees", 1995, Marcel Kornacker, PS (115 Kb)
- "High-Performance Extensible Indexing", 1999, Marcel Kornacker, PS (430 Kb)
- "Generalizing ''Search'' in Generalized Search Trees", 1997, Paul M. Aoki, PDF (210 Kb), extended abstract (PDF) (120 Kb)
- "Efficient Concurrency Control in Multidimensional Access Methods", 1999, Kaushik Chakrabarti, Sharad Mehrotra, PS (215 Kb)
- "Indexing for String Queries using Generalized Search Trees", 1997, Jeff Foster, Megan Thomas, PS (170 Kb)
- "New Linear Node Splitting Algorithm for R-trees", C.H.Ang and T.C.Tan, PS (100 Kb)
- "A Framework for Supporting the Class of Space Partitioning Trees", Walid G. Aref and Ihab F. Ilyas, PDF (150 Kb)
- Selected papers about concurrency and recovery on which we based the GiST concurrency and recovery work
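The superimposed signatures used by gist__intbig_ops (see the Sven Helmer paper above) can be illustrated with a short Python sketch. It assumes a plain modulo hash into a 4096-bit signature, which is a simplification chosen for the example. The names signature and may_contain are hypothetical and this is not the intarray code.

```python
SIGNATURE_BITS = 4096  # signature length mentioned for gist__intbig_ops


def signature(values) -> int:
    """Superimpose one bit per element into a fixed-length bit signature.

    Assumption: the bit position is simply value modulo SIGNATURE_BITS.
    """
    sig = 0
    for value in values:
        sig |= 1 << (value % SIGNATURE_BITS)
    return sig


def may_contain(set_sig: int, query_sig: int) -> bool:
    """True if the indexed set might contain every queried element.

    False is definite; True can be a false positive, so a real index
    would have to recheck the actual array.
    """
    return (set_sig & query_sig) == query_sig
```

Because different integers can hash to the same bit (1 and 4097 here), a match on signatures only narrows the search; the stored array still has to be compared with the query.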
Links:
- GiST Indices - from PostgreSQL documentation
- The GiST Indexing Project
- Bitmap Index Implementation
- SP-GIST - GiST framework for Space Partitioned Trees
All work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov (oleg@sai.msu.su)
This work is partially supported by the Russian Foundation for Basic Research
Last changes: Fri Jun 14 17:50:07 MSD 2002