Ensuring the Correctness of Regular Expressions: A Review
References
A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman. Compilers: Principles, Techniques, & Tools, 2nd ed., Harlow, UK: Pearson Addison Wesley, 2007.
G. Wondracek, P. M. Comparetti, C. Krügel, E. Kirda. Automatic network protocol analysis. In Proceedings of the Network and Distributed System Security Symposium, San Diego, USA, pp. 1–14, 2008.
A. S. Yeole, B. B. Meshram. Analysis of different technique for detection of SQL injection. In Proceedings of International Conference & Workshop on Emerging Trends in Technology, ACM, Mumbai, India, pp. 963–966, 2011. DOI: https://doi.org/10.1145/1980022.1980229.
M. Murata, D. Lee, M. Mani, K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Transactions on Internet Technology, vol. 5, no. 4, pp. 660–704, 2005. DOI: https://doi.org/10.1145/1111627.1111631.
A. N. Arslan. Multiple sequence alignment containing a sequence of regular expressions. In Proceedings of IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, IEEE, La Jolla, USA, 2005. DOI: https://doi.org/10.1109/CIBCB.2005.1594922.
C. Chapman, K. T. Stolee. Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ACM, Saarbrücken, Germany, pp. 282–293, 2016. DOI: https://doi.org/10.1145/2931037.2931073.
J. C. Davis, C. A. Coghlan, F. Servant, D. Lee. The impact of regular expression denial of service (ReDoS) in practice: An empirical study at the ecosystem scale. In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, USA, pp. 246–256, 2018. DOI: https://doi.org/10.1145/3236024.3236027.
E. Spishak, W. Dietl, M. D. Ernst. A type system for regular expressions. In Proceedings of the 14th Workshop on Formal Techniques for Java-like Programs, ACM, Beijing, China, pp. 20–26, 2012. DOI: https://doi.org/10.1145/2318202.2318207.
E. Larson, A. Kirk. Generating evil test strings for regular expressions. In Proceedings of IEEE International Conference on Software Testing, Verification and Validation, IEEE, Chicago, USA, pp. 309–319, 2016. DOI: https://doi.org/10.1109/ICST.2016.29.
M. Erwig, R. Gopinath. Explanations for regular expressions. In Proceedings of the 15th International Conference on Fundamental Approaches to Software Engineering, Springer, Tallinn, Estonia, pp. 394–408, 2012. DOI: https://doi.org/10.1007/978-3-642-28872-2_27.
P. Klint, R. Lämmel, C. Verhoef. Toward an engineering discipline for grammarware. ACM Transactions on Software Engineering and Methodology, vol. 14, no. 3, pp. 331–380, 2005. DOI: https://doi.org/10.1145/1072997.1073000.
P. P. Wang, K. T. Stolee. How well are regular expressions tested in the wild? In Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, Lake Buena Vista, USA, pp. 668–678, 2018. DOI: https://doi.org/10.1145/3236024.3236072.
J. E. Hopcroft, R. Motwani, J. D. Ullman. Introduction to Automata Theory, Languages, and Computation, 2nd ed., Boston, USA: Addison-Wesley, 2001.
H. S. Thompson, D. Beech, M. Maloney, N. Mendelsohn. XML Schema Part 1: Structures, Second Edition. [Online], Available: https://www.w3.org/TR/xmlschema-1/, October 28, 2004.
P. Hazel. PCRE: Perl compatible regular expressions. [Online], Available: http://pcre.org/, 2005.
The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2017, 2018.
P. Ammann, J. Offutt. Introduction to Software Testing, Cambridge, USA: Cambridge University Press, 2016.
P. P. Wang, C. Brown, J. A. Jennings, K. T. Stolee. An empirical study on regular expression bugs. In Proceedings of the 17th International Conference on Mining Software Repositories, ACM, Seoul, Korea, pp. 103–113, 2020. DOI: https://doi.org/10.1145/3379597.3387464.
C. Chapman, P. P. Wang, K. T. Stolee. Exploring regular expression comprehension. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, IEEE, Urbana, USA, pp. 405–416, 2017. DOI: https://doi.org/10.1109/ASE.2017.8115653.
P. P. Wang, G. R. Bai, K. T. Stolee. Exploring regular expression evolution. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, IEEE, Hangzhou, China, pp. 502–513, 2019. DOI: https://doi.org/10.1109/SANER.2019.8667972.
G. R. Bai, B. Clee, N. Shrestha, C. Chapman, C. Wright, K. T. Stolee. Exploring tools and strategies used during regular expression composition tasks. In Proceedings of the 27th IEEE/ACM International Conference on Program Comprehension, IEEE, Montreal, Canada, pp. 197–208, 2019. DOI: https://doi.org/10.1109/ICPC.2019.00039.
R. Hodován, Z. Herczeg, Á. Kiss. Regular expressions on the web. In Proceedings of the 12th IEEE International Symposium on Web Systems Evolution, IEEE, Timisoara, Romania, pp. 29–32, 2010. DOI: https://doi.org/10.1109/WSE.2010.5623572.
J. C. Davis, L. G. Michael IV, C. A. Coghlan, F. Servant, D. Lee. Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, Tallinn, Estonia, pp. 443–454, 2019. DOI: https://doi.org/10.1145/3338906.3338909.
J. Kirrage, A. Rathnayake, H. Thielecke. Static analysis for regular expression denial-of-service attacks. In Proceedings of the 7th International Conference on Network and System Security, Springer, Madrid, Spain, pp. 135–148, 2013. DOI: https://doi.org/10.1007/978-3-642-38631-2_11.
L. G. Michael, J. Donohue, J. C. Davis, D. Lee, F. Servant. Regexes are hard: Decision-making, difficulties, and risks in programming regular expressions. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, IEEE, San Diego, USA, pp. 415–426, 2019. DOI: https://doi.org/10.1109/ASE.2019.00047.
Y. T. Li, X. Y. Chu, X. Y. Mou, C. M. Dong, H. M. Chen. Practical study of deterministic regular expressions from large-scale XML and schema data. In Proceedings of the 22nd International Database Engineering & Applications Symposium, ACM, Villa San Giovanni, Italy, pp. 45–53, 2018. DOI: https://doi.org/10.1145/3216122.3216126.
J. C. Davis, D. Moyer, A. M. Kazerouni, D. Lee. Testing regex generalizability and its implications: A large-scale many-language measurement study. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, IEEE, San Diego, USA, pp. 427–439, 2019. DOI: https://doi.org/10.1109/ASE.2019.00048.
P. Arcaini, A. Gargantini, E. Riccobene. MutRex: A mutation-based generator of fault detecting strings for regular expressions. In Proceedings of IEEE International Conference on Software Testing, Verification and Validation Workshops, IEEE, Tokyo, Japan, pp. 87–96, 2017. DOI: https://doi.org/10.1109/ICSTW.2017.23.
M. C. F. P. Emer, I. F. Nazar, S. R. Vergilio, M. Jino. Fault-based test of XML schemas. Computing and Informatics, vol. 30, no. 3, pp. 531–557, 2011.
J. B. Li, J. Miller. Testing the semantics of W3C XML schema. In Proceedings of the 29th Annual International Computer Software and Applications Conference, IEEE, Edinburgh, UK, pp. 443–448, 2005. DOI: https://doi.org/10.1109/COMPSAC.2005.151.
S. Kannan, Z. Sweedyk, S. R. Mahaney. Counting and random generation of strings in regular languages. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, San Francisco, USA, pp. 551–557, 1995.
M. Ackerman, J. Shallit. Efficient enumeration of regular languages. In Proceedings of the 12th International Conference on Implementation and Application of Automata, Springer, Prague, Czech Republic, pp. 226–242, 2007. DOI: https://doi.org/10.1007/978-3-540-76336-9_22.
G. Radanne, P. Thiemann. Regenerate: A language generator for extended regular expressions. In Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, ACM, Boston, USA, pp. 202–214, 2018. DOI: https://doi.org/10.1145/3278122.3278133.
P. Arcaini, A. Gargantini, E. Riccobene. Fault-based test generation for regular expressions by mutation. Software: Testing, Verification and Reliability, vol. 29, no. 1–2, Article number e1664, 2019. DOI: https://doi.org/10.1002/stvr.1664.
J. Oncina, P. García. Identifying regular languages in polynomial time. Advances in Structural and Syntactic Pattern Recognition, H. Bunke, Ed., World Scientific, pp. 99–108, 1993. DOI: https://doi.org/10.1142/9789812797919_0007.
F. Brauer, R. Rieger, A. Mocan, W. M. Barczynski. Enabling information extraction by inference of regular expressions from sample entities. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, Glasgow, UK, pp. 1285–1294, 2011. DOI: https://doi.org/10.1145/2063576.2063763.
A. Bartoli, A. De Lorenzo, E. Medvet, F. Tarlao. Can a machine replace humans in building regular expressions? A case study. IEEE Intelligent Systems, vol. 31, no. 6, pp. 15–21, 2016. DOI: https://doi.org/10.1109/MIS.2016.46.
A. Bartoli, A. De Lorenzo, E. Medvet, F. Tarlao. Active learning of regular expressions for entity extraction. IEEE Transactions on Cybernetics, vol. 48, no. 3, pp. 1067–1080, 2017. DOI: https://doi.org/10.1109/TCYB.2017.2680466.
A. Bartoli, A. De Lorenzo, E. Medvet, F. Tarlao. Inference of regular expressions for text extraction from examples. IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 5, pp. 1217–1230, 2016. DOI: https://doi.org/10.1109/TKDE.2016.2515587.
M. Lee, S. So, H. Oh. Synthesizing regular expressions from examples for introductory automata assignments. In Proceedings of ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, ACM, Amsterdam, The Netherlands, pp. 70–80, 2016. DOI: https://doi.org/10.1145/2993236.2993244.
G. J. Bex, F. Neven, T. Schwentick, S. Vansummeren. Inference of concise regular expressions and DTDs. ACM Transactions on Database Systems, vol. 35, no. 2, Article number 11, 2010. DOI: https://doi.org/10.1145/1735886.1735890.
N. Kushman, R. Barzilay. Using semantic unification to generate regular expressions from natural language. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, USA, pp. 826–836, 2013.
N. Locascio, K. Narasimhan, E. DeLeon, N. Kushman, R. Barzilay. Neural generation of regular expressions from natural language with minimal domain knowledge. In Proceedings of Conference on Empirical Methods in Natural Language Processing, Austin, Texas, USA, pp. 1918–1923, 2016. DOI: https://doi.org/10.18653/v1/D16-1197.
Z. X. Zhong, J. Q. Guo, W. Yang, J. Peng, T. Xie, J. G. Lou, T. Liu, D. M. Zhang. SemRegex: A semantics-based approach for generating regular expressions from natural language specifications. In Proceedings of Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, pp. 1608–1618, 2018. DOI: https://doi.org/10.18653/v1/D18-1189.
Z. X. Zhong, J. Q. Guo, W. Yang, T. Xie, J. G. Lou, T. Liu, D. M. Zhang. Generating regular expressions from natural language specifications: Are we there yet? In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, pp. 791–794, 2018.
Q. C. Chen, X. Y. Wang, X. Ye, G. Durrett, I. Dillig. Multi-modal synthesis of regular expressions. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, London, UK, pp. 487–502, 2020. DOI: https://doi.org/10.1145/3385412.3385988.
G. Castagna, D. Colazzo, A. Frisch. Error mining for regular expression patterns. In Proceedings of the 9th Italian Conference on Theoretical Computer Science, Springer, Siena, Italy, pp. 160–172, 2005. DOI: https://doi.org/10.1007/11560586_13.
E. Larson. Automatic checking of regular expressions. In Proceedings of the 18th IEEE International Working Conference on Source Code Analysis and Manipulation, IEEE, Madrid, Spain, pp. 225–234, 2018. DOI: https://doi.org/10.1109/SCAM.2018.00034.
X. Liu, Y. F. Jiang, D. H. Wu. A lightweight framework for regular expression verification. In Proceedings of the 19th IEEE International Symposium on High Assurance Systems Engineering, IEEE, Hangzhou, China, pp. 11–8, 2019. DOI: https://doi.org/10.1109/HASE.2019.00011.
I. Budiselic, S. Srbljic, M. Popovic. RegExpert: A tool for visualization of regular expressions. In Proceedings of International Conference on “Computer as a Tool”, IEEE, Warsaw, Poland, pp. 2387–2389, 2007. DOI: https://doi.org/10.1109/EURCON.2007.4400374.
B. Braune, S. Diehl, A. Kerren, R. Wilhelm. Animation of the generation and computation of finite automata for learning software. In Proceedings of the 4th International Workshop on Implementing Automata, Springer, Potsdam, Germany, pp. 39–47, 2001. DOI: https://doi.org/10.1007/3-540-45526-4_4.
T. Hung, S. H. Rodger. Increasing visualization and interaction in the automata theory course. ACM SIGCSE Bulletin, vol. 32, no. 1, pp. 6–10, 2000. DOI: https://doi.org/10.1145/331795.331800.
K. Oflazer, Y. Yılmaz. Vi-xfst: A visual regular expression development environment for Xerox finite state tool. In Proceedings of the Workshop on ACL Special Interest Group in Computational Phonology, ACM, Barcelona, Spain, pp. 86–93, 2004.
F. Beck, S. Gulan, B. Biegel, S. Baltes, D. Weiskopf. Regviz: Visual debugging of regular expressions. In Proceedings of the 36th International Conference on Software Engineering, ACM, Hyderabad, India, pp. 504–507, 2014. DOI: https://doi.org/10.1145/2591062.2591111.
Y. Y. Li, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, H. V. Jagadish. Regular expression learning for information extraction. In Proceedings of Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Honolulu, USA, pp. 21–30, 2008.
T. Rebele, K. Tzompanaki, F. M. Suchanek. Adding missing words to regular expressions. In Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Melbourne, Australia, pp. 67–79, 2018. DOI: https://doi.org/10.1007/978-3-319-93037-4_6.
R. Pan, Q. H. P. Hu, G. W. Xu, L. D’Antoni. Automatic repair of regular expressions. In Proceedings of the ACM on Programming Languages, vol. 3, Article number 139, 2019. DOI: https://doi.org/10.1145/3360565.
Y. T. Li, Z. W. Xu, J. L. Cao, H. M. Chen, T. J. Ge, S. C. Cheung, H. R. Zhao. FlashRegex: Deducing Anti-ReDoS regexes from examples. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, IEEE, Melbourne, Australia, pp. 659–671, 2020. DOI: https://doi.org/10.1145/3324884.3416556.
N. Chida, T. Terauchi. Automatic repair of vulnerable regular expressions. [Online], Available: https://arxiv.org/abs/2010.12450, 2020.
P. Arcaini, A. Gargantini, E. Riccobene. Regular expression learning with evolutionary testing and repair. In Proceedings of the 31st IFIP International Conference on Testing Software and Systems, Springer, Paris, France, pp. 22–40, 2019. DOI: https://doi.org/10.1007/978-3-030-31280-0_2.
R. A. Cochran, L. D’Antoni, B. Livshits, D. Molnar, M. Veanes. Program boosting: Program synthesis via crowd-sourcing. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, Mumbai, India, pp. 677–688, 2015. DOI: https://doi.org/10.1145/2676726.2676973.
P. Arcaini, A. Gargantini, E. Riccobene. Interactive testing and repairing of regular expressions. In Proceedings of the 30th IFIP International Conference on Testing Software and Systems, Springer, Cádiz, Spain, pp. 1–16, 2018. DOI: https://doi.org/10.1007/978-3-319-99927-2_1.
X. H. Qiu, Y. T. Hu, B. Li. Sequential fault diagnosis using an inertial velocity differential evolution algorithm. International Journal of Automation and Computing, vol. 16, no. 3, pp. 389–397, 2019. DOI: https://doi.org/10.1007/s11633-016-1008-0.