Ján Čegiň - Academia.edu
Papers by Ján Čegiň
arXiv (Cornell University), Jan 11, 2024
The latest generative large language models (LLMs) have found application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of the instructions given to LLMs augmenting text datasets, we measure their effects on the lexical diversity of the generated texts and on downstream model performance. We compare the effects across 5 different LLMs, 6 datasets and 2 downstream models. We show that diversity is increased most by taboo words, but downstream model performance is highest with hints.
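As an illustration of the taboo-words incentive described above, the following sketch (not the paper's code; the taboo-list construction and the prompt wording are assumptions) shows how overused words from earlier paraphrases might be banned in the next instruction given to an LLM:

```python
# Illustrative sketch of a "taboo words" diversity incentive for
# LLM-based paraphrase augmentation. The taboo list is assumed to be
# the most frequent content words in previously generated paraphrases.
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "to", "of", "and", "is", "i", "me", "for"})

def top_content_words(paraphrases, n=3):
    """Pick the n most frequent non-stopword tokens from prior outputs."""
    counts = Counter(
        tok for text in paraphrases
        for tok in text.lower().split()
        if tok.isalpha() and tok not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(n)]

def taboo_prompt(seed_sentence, taboo_words):
    """Compose an instruction that bans overused words to nudge lexical diversity."""
    return (
        f"Paraphrase the following sentence: \"{seed_sentence}\"\n"
        f"Do not use any of these words: {', '.join(taboo_words)}."
    )

previous = ["I want to book a flight", "Please book me a flight", "Book a flight for me"]
print(taboo_prompt("I want to book a flight to Paris", top_content_words(previous)))
```

Banning words that dominated earlier rounds forces each new generation away from already-covered phrasings, which is the same mechanism the crowdsourcing literature uses on human workers.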
Unit testing focused on the MC/DC criterion is essential in the development of safety-critical systems. However, designing test data that meet the MC/DC criterion requires detailed manual analysis of the branching in units under test by test engineers. To deal with this problem, we propose a new test data generation approach based on reinforcement learning, which uses an analogy with a game: a gamer, the test engineer, plays in an environment, a unit under test, and tries to achieve the highest possible reward, MC/DC coverage. We evaluated our approach at two different granularity levels, test suite and test case, and for two different action types allowed to the gamer, discrete and continuous action spaces. Preliminary results show that the proposed approach could mitigate the path explosion problem of symbolic approaches and that it achieves results at least comparable to current state-of-the-art search-based test data generation approaches.
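A minimal sketch of the game analogy follows, assuming a gym-style step interface. The toy unit, the instrumentation, and the coverage proxy (each condition and the decision observed as both true and false, rather than full MC/DC independence pairs) are simplifications for illustration, not the paper's implementation:

```python
# Sketch: test data generation as a game. The agent proposes inputs
# (actions), the environment executes the unit under test, and the
# reward is the coverage gained by that action.
import random

def unit_under_test(a, b, c):
    # Hypothetical toy unit: one decision over three conditions.
    return (a > 0 and b > 0) or c > 0

class CoverageEnv:
    """Gym-like environment whose reward is newly covered outcomes."""
    def __init__(self):
        self.covered = set()

    def step(self, action):
        a, b, c = action  # continuous action space: one value per input
        decision = unit_under_test(a, b, c)
        # Simplified proxy for MC/DC: observe every condition and the
        # decision as both True and False (8 outcomes total).
        outcomes = {("a>0", a > 0), ("b>0", b > 0), ("c>0", c > 0), ("dec", decision)}
        gained = outcomes - self.covered
        self.covered |= outcomes
        return len(gained), len(self.covered) == 8  # (reward, done)

env = CoverageEnv()
for _ in range(100):  # a random agent stands in for the trained RL policy
    reward, done = env.step([random.uniform(-1.0, 1.0) for _ in range(3)])
    if done:
        break
print("condition/decision outcomes covered:", len(env.covered), "of 8")
```

Because coverage feedback only requires executing the unit, this framing sidesteps the symbolic path enumeration that causes path explosion.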
Unit testing focused on the Modified Condition/Decision Coverage (MC/DC) criterion is essential in the development of safety-critical systems, as recommended by international standards. Designing unit tests for such software is a time-consuming task which can be partially automated by test data generation methods. Special attention is given to search-based methods, which are often used for problems where traditional methods like symbolic execution fall short. However, no publicly available dataset exists for the evaluation of such methods that takes into account the specifics of the MC/DC criterion, which is essential for safety-critical systems. In this paper we present an analysis of the software of safety-critical systems, and we aim to find a fitting open source project which could serve as a synthesized dataset for future evaluations of search-based test data generation methods for the MC/DC criterion.
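To illustrate why MC/DC-specific test design is demanding, here is a worked example for the hypothetical decision `a and (b or c)`: each condition must be shown to independently flip the decision while the other conditions are held fixed, which for n conditions typically needs around n+1 tests.

```python
# Worked example of what MC/DC demands for `a and (b or c)`.
def decision(a, b, c):
    return a and (b or c)

# A minimal MC/DC test set (4 tests for 3 conditions):
tests = [
    (True,  True,  False),  # decision True  (baseline)
    (False, True,  False),  # only `a` differs from row 1 -> decision False
    (True,  False, False),  # only `b` differs from row 1 -> decision False
    (True,  False, True),   # only `c` differs from row 3 -> decision True
]
for a, b, c in tests:
    print(a, b, c, "->", decision(a, b, c))
```

Rows 1/2 demonstrate the independent effect of `a`, rows 1/3 of `b`, and rows 3/4 of `c`; finding such pairs by hand is exactly the manual analysis that generation methods aim to automate.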
arXiv (Cornell University), May 22, 2023
The emergence of generative large language models (LLMs) raises the question: what will their impact be on crowdsourcing? Traditionally, crowdsourcing has been used for acquiring solutions to a wide variety of human-intelligence tasks, including ones involving text generation, modification or evaluation. For some of these tasks, models like ChatGPT can potentially substitute for human workers. In this study, we investigate whether this is the case for the task of paraphrase generation for intent classification. We apply the data collection methodology of an existing crowdsourcing study (similar scale, prompts and seed data) using ChatGPT and Falcon-40B. We show that ChatGPT-created paraphrases are more diverse and lead to models that are at least as robust.
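One way such a diversity comparison between human-written and LLM-written paraphrase sets can be quantified is sketched below; the distinct-n metric and the toy inputs are assumptions for illustration, not necessarily what the study used:

```python
# Sketch: comparing lexical diversity of two paraphrase sets with distinct-n
# (the share of unique n-grams among all n-grams across a set of texts).
def distinct_n(texts, n=2):
    ngrams = []
    for text in texts:
        toks = text.lower().split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

human_paraphrases = ["book me a flight", "i need a flight booked"]
llm_paraphrases = ["please reserve a flight for me", "could you arrange a plane ticket"]
print("human distinct-2:", round(distinct_n(human_paraphrases), 3))
print("llm distinct-2:  ", round(distinct_n(llm_paraphrases), 3))
```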
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering