Program Similarity Detection Tool
Related papers
A Literature Review on Plagiarism Detection in Computer Programming Assignments
IRJET, 2022
Our research aims to detect plagiarism in computer programming assignments. Plagiarism is a long-standing problem that has evolved over time. With the rise of the internet, theft of intellectual property has increased significantly and recognizing it has become difficult. In this survey we identify techniques used to detect plagiarism. As times change, the tools needed to detect plagiarism must evolve as well. There has always been demand for a tool that achieves high accuracy and greater accessibility of data. This paper presents a comparative study of plagiarism checking tools and the technology they use. The study will help us determine the algorithm and methodology to use when developing code to detect plagiarism.
Plagiarism detection using software tools: a study in a Computer Science degree
2009
In this paper we describe a study on plagiarism detection in programming projects from 8 courses of a BSc in Computer Science. 865 projects of different sizes (from 20 to 2000 source code lines) written in the C and Modula-2 programming languages were screened using two plagiarism detection software tools that produce an originality report for each project, including a global similarity index (SI). The instructor of each course analysed the reports individually and in detail, showing that even projects with very high SI values are not necessarily plagiarized. Quantitatively, of the 100 projects that the tools rated as having SI > 75%, only 26 exhibited some evidence of plagiarism (3% of the 865 total). The usual reasons for a high SI in non-plagiarized projects were legitimate reuse of code, the repetitive syntax of programming languages, and the use of common modules for basic tasks that are usually solved in the same way. It therefore became clear that a manual, in-depth, individualized post-analysis of the reports is needed to avoid false positives. High-quality, usable review facilities (such as highlighting similar fragments among documents, quick navigation between fragments, and easy access to external sources of potential plagiarism) are very valuable additions to these tools, because they reduce the time devoted to the necessary manual inspection of documents. Users welcome such features.
1.3. Conclusions
It became clear after the study that plagiarism detection tools need to incorporate additional knowledge when applied to programming projects. This knowledge relates to (i) a description of the resources in the courses and the minimum SI threshold that is acceptable (stating the reusable code, ...), (ii) the implicit information that instructors provide when a given document is labelled as plagiarized or not, and (iii) automated learning mechanisms for refining the detection of plagiarized fragments. Adding these features to a plagiarism detection software tool, together with good integration into the assessment workflow, is key to constructing a valuable support system for e-learning based continuous assessment in programming courses.
2013 Learning and Teaching in Computing and Engineering, 2013
Technology empowers students but can also entice them to plagiarise. To tackle this problem, plagiarism detection tools are especially useful, not only in popular thinking as a deterrent for students, but also as an educational tool to raise students' awareness of the offence and to improve their academic skills. Commercial text matching tools (e.g. Turnitin) are at a high level of maturity. These tools offer the ability to interact with students, making them suitable for an educational objective. Additionally, they can be readily integrated into learning environments enabling uniform application at an institutional level. On the other hand, computer source code matching tools, despite their successful detection performance, are mostly used as standalone tools that are difficult to adopt at an institutional level. The research presented in this paper describes the trial and evaluation of a tool that is seamlessly integrated into the Moodle virtual learning environment. The tool provides code similarity scanning capability within Moodle so that institutions using this learning environment could apply this tool easily at an enterprise level. Additionally, the educational aspects available in text matching tools have been added into the tool capability. The tool relies on two popular code matching services, MOSS and JPlag, as underlying engines to provide good code similarity scanning performance. The evaluation of the tool from both academics' and students' perspectives indicates a considerable level of interest in using the tool, and supports the suitability of this tool for wider institutional adoption in the computing education community.
Plagiarism Detection in Source Code
Source code plagiarism is a very serious problem in academia. A lot of assignment work in programming courses is submitted electronically by students, which makes it difficult for faculty to check every submission separately. A plagiarism detection tool makes it easier to check and analyze students' assignments. In programming courses, students submit their work in the form of Java source code files. Students may copy Java code files from another source without properly crediting the original writer or programmer, intentionally or unintentionally. This is also a form of plagiarism. The main purpose of this paper is to show a method to detect plagiarism in Java source code. The method first normalizes all submitted Java code into a similar pattern by removing comments. Tokenization is then performed. Finally, the tokens are compared to find the similar portions, which are displayed accordingly.
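The three steps named in this abstract (comment removal, tokenization, token comparison) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the regular expressions, the function names, and the use of `difflib.SequenceMatcher` for the comparison step are all assumptions made for illustration, and the comment pattern deliberately ignores the edge case of comment markers inside string literals.

```python
import re
from difflib import SequenceMatcher

# Assumed patterns: Java block comments and line comments.
# Caveat: a "//" inside a string literal would also be stripped.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def strip_comments(source: str) -> str:
    """Step 1: replace every comment with a space."""
    return COMMENT_RE.sub(" ", source)


def tokenize(source: str) -> list[str]:
    """Step 2: split comment-free source into a flat token list."""
    return TOKEN_RE.findall(strip_comments(source))


def similar_portions(a: str, b: str, min_tokens: int = 3) -> list[list[str]]:
    """Step 3: return token runs shared by both sources (hypothetical helper)."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        ta[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_tokens
    ]


first = "int sum = a + b; // add the operands\n"
second = "/* total */ int sum = a + b;"
print(similar_portions(first, second))
```

Because comments are discarded before tokenization, the two snippets above produce identical token lists even though their text differs.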
Automatic Source Code Plagiarism Detection
2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009
Plagiarism is one form of academic dishonesty, often committed by students in programming classes. In a large class, detecting plagiarism manually is both difficult and time-consuming, especially because of the numerous modifications made to the source code to conceal the cheating. We designed and developed Deimos, a prototype source code plagiarism detector that can be extended to handle other programming languages simply by implementing new scanners and parsers. Deimos works in two steps: (1) parsing source code and transforming it into tokens, and then (2) comparing each pair of token strings obtained in the first step using the Running Karp-Rabin Greedy String Tiling algorithm. Instructors can access Deimos via a web application interface that receives input parameters, triggers a background process, and displays the result. The web interface offers user friendliness, while the background process prevents timeouts and reduces bandwidth consumption. This approach was chosen because Deimos is intended to process more than 100 source code files. The web application was implemented in PHP, while Java was used for the backend application, which is responsible for the background process. Unit tests, functional tests, and nonfunctional tests have been conducted. Detection takes 1 hour for 100 samples of beginners' source code taken from a real assignment in our programming class, where the average length of the source code is 150 lines. This code similarity detector could also be used in other pedagogical tools, such as an autograder that checks the consistency of source code against a template or solution. Keywords: plagiarism detection, source code plagiarism.
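As a rough illustration of the comparison step, the following Python sketch implements plain Greedy String Tiling over two token sequences. It is a simplified stand-in, not Deimos itself: it omits the Karp-Rabin hashing that the "Running Karp-Rabin" variant uses to speed up the search, and the function names, the `min_match` parameter, and the similarity formula (twice the tiled length divided by the combined length) are assumptions for this example.

```python
def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length) of non-overlapping common runs.

    Plain O(n^3) Greedy String Tiling; no Karp-Rabin hashing is used here.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        # Find all maximal matches of unmarked tokens with the largest length.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_len:
                    matches.append((i, j, k))
                elif k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
        if not matches:
            break
        for i, j, k in matches:
            # Skip matches that overlap a tile created earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a, b, min_match=3):
    """Fraction of both sequences covered by tiles (assumed scoring formula)."""
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b)) if (a or b) else 0.0


tokens_a = list("abcdefg")
tokens_b = list("xxcdefyyab")
print(greedy_string_tiling(tokens_a, tokens_b, min_match=2))
```

Single characters stand in for tokens here. Because tiles may appear in a different order in each sequence, the algorithm still matches code blocks that have been moved around, which is one reason it is popular for this task.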
Parker and Hamblen define software plagiarism as "a program that has been produced from another program with trivial text edit operations and without detailed understanding of the program". In practice, if one program can be transformed into another simply through use of editor operations (such as global substitutions) or by exploiting synonymous expressions provided by the programming language, then a probable case of plagiarism has been found and should be examined further. We propose a metric to solve this problem based on the amount of shared information between two computer programs. We designed and implemented a system, TokenMatch, to enable plagiarism detection and we demonstrate experimental results to prove its effectiveness.
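The abstract does not specify the metric, so the sketch below should not be read as TokenMatch's method. It shows one generic way to estimate the information shared between two programs, the normalized compression distance, using `zlib` as the compressor. The function names and the sample snippets are assumptions for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude proxy for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: lower values suggest more shared content."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


original = """
int total = 0;
for (int i = 0; i < n; i++) {
    total = total + values[i];
}
return total;
"""
# The same program after global substitutions of identifier names.
renamed = original.replace("total", "sum").replace("values", "data")
unrelated = "The quick brown fox jumps over the lazy dog near the river bank at dawn."

print(ncd(original, renamed), ncd(original, unrelated))
```

On inputs this short the compressor's fixed overhead distorts the scores, so the distance is better suited to whole submissions, ideally after tokenization so that identifier renaming has less influence.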
A Unified Approach to Automate the Usage of Plagiarism Detection Tools in Programming Courses
International Conference on Computer Science and Education (ICCSE), 2017
Plagiarism in programming assignments is an extremely common problem in universities. While there are many tools that automate the detection of plagiarism in source code, users still need to inspect the results and decide whether there is plagiarism or not. Moreover, users often rely on a single tool (using it as a "gold standard" for all cases), which can be ineffective and risky. Hence, it is desirable to use several tools so that their results complement each other. However, these tools have various limitations that make using them very time-consuming, such as the need to manually analyze and correlate their multiple outputs. In this paper, we propose an automated system that addresses the common usage limitations of plagiarism detection tools. The system automatically manages the execution of different plagiarism tools and generates a consolidated comparative visualization of their results. Consequently, the user can make better-informed decisions about potential plagiarism. Our experimental results show that the effort and expertise required to use plagiarism detection tools are significantly reduced, while the probability of detecting plagiarism is increased. Results also show that our system is lightweight (in terms of computational resources), proving that it is practical for real-world usage.
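The idea of correlating the outputs of several tools can be sketched as follows. The abstract does not describe the system's data model, so everything here is hypothetical: the report format (a mapping from tool name to per-pair scores in the range 0 to 1), the tool names, the threshold, and the ranking rule.

```python
from collections import defaultdict


def consolidate(reports, threshold=0.5):
    """Merge per-tool pair scores into one ranked table (hypothetical format).

    reports: {tool_name: {(submission_a, submission_b): score in [0, 1]}}
    """
    merged = defaultdict(dict)
    for tool, scores in reports.items():
        for pair, score in scores.items():
            # Normalize pair order so (a, b) and (b, a) refer to the same row.
            merged[tuple(sorted(pair))][tool] = score

    rows = []
    for pair, by_tool in merged.items():
        flagged = sorted(t for t, s in by_tool.items() if s >= threshold)
        rows.append({
            "pair": pair,
            "scores": by_tool,
            "flagged_by": flagged,
            "max_score": max(by_tool.values()),
        })
    # Pairs flagged by more tools come first, then higher scores.
    rows.sort(key=lambda r: (-len(r["flagged_by"]), -r["max_score"], r["pair"]))
    return rows


reports = {
    "tool_a": {("s1", "s2"): 0.9, ("s1", "s3"): 0.2},
    "tool_b": {("s2", "s1"): 0.7, ("s2", "s3"): 0.6},
}
for row in consolidate(reports):
    print(row["pair"], row["flagged_by"], row["max_score"])
```

Ranking by the number of tools that agree, rather than by a single score, reflects the abstract's point that no one tool should be treated as the sole authority. A reviewer would still inspect the top-ranked pairs manually.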
Detecting computer code plagiarism in higher education
Proceedings of the 31st International Conference on Information Technology Interfaces
The rapid development of industry and the economy requires quick and efficient teaching of a large amount of theoretical knowledge and practical skills. In computer science, and especially in programming, this trend is very noticeable: real experts are needed and hard to train. However, a new problem has emerged in higher education, and its name is plagiarism. To prevent it, one would have to check all students' program code for similarities. To do this efficiently, a procedure was designed, and a prototype based on it was developed and tested. We discuss the efficiency of this solution and mention some other methods and algorithms. We also discuss other possible uses of this solution and outline further actions and steps in our research.
Detection of Plagiarism in Programming Assignments
IEEE Transactions on Education, 2008
Laboratory work assignments are very important for computer science learning. Over the last 12 years many students have been involved in solving such assignments in the authors' department, having reached a figure of more than 400 students doing the same assignment in the same year. This number of students has required teachers to pay special attention to conceivable plagiarism cases. A plagiarism detection tool has been developed as part of a full toolset for helping in the management of the laboratory work assignments. This tool defines and uses four similarity criteria to measure how similar two assignment implementations are. The paper describes the plagiarism detection tool and the experience of using it over the last 12 years in four different programming assignments, from microprogramming a CPU to system programming in C.
Detection of source code similitude in academic environments
2013
This article presents a proposal for the detection of programming source code similitude in academic environments. The objective of this proposal is to provide support to professors in detecting plagiarism in student homework assignments in introductory computer programming courses. The developed tool, CODESIGHT, is based on a modification of the Greedy String Tiling algorithm. The tool was tested in one theoretical and three real scenarios, obtaining similitude detections for assignments ranging from those that contained code without modifications to assignments containing insertions of procedural instructions inside the main code. The results verified the efficiency of the tool at the first five levels of the plagiarism spectrum for programming code, in addition to supporting suspicions of plagiarism in real scenarios.