Datalog and Recursive Query Processing
Abstract
In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog. This paper surveys for a general audience the Datalog language, recursive query processing, and optimization techniques. This survey differs from prior surveys written in the eighties and nineties in its comprehensiveness of topics, its coverage of recent developments and applications, and its emphasis on features and techniques beyond "classical" Datalog which are vital for practical applications. Specifically, the topics covered include the core Datalog language and various extensions, semantics, query optimizations, magic-sets optimizations, incremental view maintenance, aggregates, negation, and types. We conclude the paper with a survey of recent systems and applications that use Datalog and recursive queries.
- conclude our exposition of Datalog with some example applications. In particular, we discuss the domains of program analysis, declarative networking, data integration and exchange, and enterprise software systems. For each domain, we highlight language extensions, runtime considerations, and use cases. We then briefly survey other applications.
- broad range of analyses: data-flow, control-flow, points-to, source code structure, etc. The results of these analyses are used to optimize programs for performance, to discover bugs, to enforce coding standards, etc. The domain of program analysis is particularly suitable for Datalog, as recursion, and non-linear recursion in particular, is pervasive in analysis logic. In this section, we first give readers a taste of program analysis in Datalog with an example from a Java points-to analysis; we then provide an overview of the major works in this area, and discuss two in particular in more detail.
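The program-analysis excerpt notes that recursion, and non-linear recursion in particular, is pervasive in analysis logic. As a rough illustration only, the sketch below evaluates four Datalog-style points-to rules to a fixpoint using naive bottom-up iteration. It is not the paper's example: the relation names (`alloc`, `assign`, `store`, `load`), the rule set, and the function `points_to` are all assumptions made for this sketch, and a real engine would use semi-naive evaluation and indexed joins rather than nested loops.

```python
# Hypothetical rules (field-sensitive, context-insensitive), written Datalog-style:
#   VarPointsTo(v, h)        :- Alloc(v, h).
#   VarPointsTo(v, h)        :- Assign(v, w), VarPointsTo(w, h).
#   FieldPointsTo(h1, f, h2) :- Store(v, f, w), VarPointsTo(v, h1), VarPointsTo(w, h2).
#   VarPointsTo(v, h2)       :- Load(v, w, f), VarPointsTo(w, h1), FieldPointsTo(h1, f, h2).
# The last two rules are non-linear: each body mentions two recursive atoms.

def points_to(alloc, assign, store, load):
    """Naive bottom-up evaluation: re-apply every rule until nothing new is derived."""
    var_pts = set(alloc)  # (variable, heap object)
    fld_pts = set()       # (base heap object, field, target heap object)
    while True:
        new_var, new_fld = set(), set()
        # v = w
        for v, w in assign:
            for w2, h in var_pts:
                if w2 == w:
                    new_var.add((v, h))
        # v.f = w
        for v, f, w in store:
            for v2, h1 in var_pts:
                if v2 != v:
                    continue
                for w2, h2 in var_pts:
                    if w2 == w:
                        new_fld.add((h1, f, h2))
        # v = w.f
        for v, w, f in load:
            for w2, h1 in var_pts:
                if w2 != w:
                    continue
                for base, f2, h2 in fld_pts:
                    if base == h1 and f2 == f:
                        new_var.add((v, h2))
        # Fixpoint reached when this round derived no new facts.
        if new_var <= var_pts and new_fld <= fld_pts:
            return var_pts, fld_pts
        var_pts |= new_var
        fld_pts |= new_fld


# Facts for: a = new A(); b = new B(); c = a; c.f = b; d = a.f;
var_pts, fld_pts = points_to(
    alloc={("a", "h1"), ("b", "h2")},
    assign={("c", "a")},
    store={("c", "f", "b")},
    load={("d", "a", "f")},
)
print(sorted(var_pts))
print(sorted(fld_pts))
```

In this toy program, `d` ends up pointing to the object allocated for `b` only because `c` aliases `a`; deriving that fact takes several rounds, since the store rule needs the result of the assignment rule and the load rule needs the result of the store rule.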