Thursday, October 3, 2019
Clone Detection in Object Oriented Systems
Clone Detection in Object Oriented Systems Program Slicing based Clone Detection in Object Oriented Systems Ishu Singla Rajesh Bhatia AbstractÃ¢â¬â Program slicing is an efficient technique for understanding programs by simplifying them. It is a program analysis technique that extracts a particular set of statements relevant to any computation. For the last 25 years, the technique has found its application in a number of research areas like testing, debugging, maintenance etc. In this paper, we have proposed a method to use this technique for clone detection in object oriented programs. As program slicing concentrates only on the relevant portion of the programs based upon some criteria, this property can be utilized in clone detection process. For this we have used Program Dependency Graphs as an intermediate representation. These PDGÃ¢â¬â¢s are later used to extract isomorphic partial slices and finally these slices are matched to find out potential clones. KeywordsÃ¢â¬â Partial Slices;PDG; Isomorphism. I. Introduction A code clone represents a sequence of statements that are duplicated in multiple locations of a program. Clones often arise in source code as a result of multiple cut paste operations on the source. Thus, Code cloning can be considered as the act of copying code fragments and making minor, non-functional alterations in the implemented code. Code cloning increases the maintenance cost because if there is an error in the code fragment to be copied, then that error will be propagated at different places. Thus, the normal functioning of the system is not affected but further development may become prohibitively expensive . Pre-processing of the whole program is often not a good choice while searching for clones. The program contains a number of irrelevant statements, thus, pre-processing will be a time consuming process . Also the approach for finding clones in procedural oriented and object oriented programs is completely different. Clone detection in object oriented programs has a number of problems  and sometimes follows different approach. Selecting a particular set of statements from a large program that contains statements relevant to a particular computation is called program slicing. Thus, Program Slicing improves program understandability and find its importance in a number of applications such as software maintenance, software debugging, testing etc . A number of code clone detection techniques have been proposed based on text, token, graphs, trees and metrics . Some other techniques based on models and some hybrid techniques have also been proposed . The main advantage of using program slicing is that we can find the non-contiguous, intertwined code clones, where the coder changes some of the statements and the rest of the code remains unchanged in between. II. DEFINITIONS Program slicing was originally introduced by Weiser that defines program slicing as an analysis technique which extracts the elements of a program related to a particular computation. That set of statements collectively called as program slice. Program slices contains that parts of a program that affects the values computed at some point of interest. Program slicing automatically decomposes program by determining the data and control dependencies . A. SLICING CRITERION Slicing in program is always computed on the basis of some slicing criterion. We can represent slicing criterion as , where S is the statement from which the slice is to be computed and V is the variable for which the slice is to be computed and that variable must exist in the statement S . B. DATA DEPENDENCY Statement P is data dependent on statement Q of a program if there exists a variable m at P which is accessed also in statement Q . Consider the following example, 1.x=10; 2.y=x+c; In statement 1, we are assigning a value 10 to x and in statement 2, we are using the value of x. So, there is a data dependency between the two statements 1 and 2. C. CONTROL DEPENDENCY Statement P is control dependent on statement Q if and only if statement P controls the execution of statement Q . Consider the following example, 1.if(statement 1) 2. statement 2; In the above example, statement 2 will be executed if statement 1 results in true value. Thus, statement 2 is control dependent on statement 1. Figure. 1 flow chart for program slicing based clone detection. III. Clone Detection Using the Program Slicing in object oriented programs Figure 1 shows the flow chart for the clone detection approach. The technique starts by taking two sample java programs. Then, the pre-processing of these programs is to be done, in which we remove the comments and blank spaces. Thereafter, the .class files for the normalized sample programs are generated. After this, the Program Dependency Graphs (PDGs), on the basis of control and data dependencies, are determined for the two programs. The program dependency graph is represented in the form of adjacency matrix as shown in figure 2. It is an n*n matrix where n is the no of statements in the normalized program. Every entry Ã¢â¬Ë1Ã¢â¬â¢ represents the data dependency between the two statements determined from the row and column of the matrix. Similarly, every entry 2 represents the control dependency between two statements. Now, by having a close look at the adjacency matrix, it is quite clear that the matrix is sparse because the occurrence of zero is higher than the non-zero entries. So comparing the adjacency matrices of the two programs canÃ¢â¬â¢t be an efficient approach. Thus, an algorithm has been developed that determines the partial slices from the adjacency matrix in the form of lists. In earlier techniques for program slicing, the slicing criterion has to be defined manually to determine the slices. But, in our approach, the program slices are determined automatically on the basis of the mentioned algorithm. Because, the slices are extracted starting from the first statement, using control and data dependencies in the adjacency matrix. Figure 2. Example of Adjacency matrix obtained from programs. A. Algorithm for Program Slicing Input:- A control data dependency adjacency matrix mat[n][n] of size n*n where n is the no of statements. Every entry Ã¢â¬Ë1Ã¢â¬â¢ at index mat[i][j] shows that there is a data dependency between statement i and j and every entry Ã¢â¬Ë2Ã¢â¬â¢ represents the control dependency between statement i and j. Output:- Partial slices in form of lists The partial slices are extracted from the adjacency matrix, which are in the form of lists. Once, the partial slices for the two java programs are determined, we have to match them using an efficient matching algorithm. If there is cloning among the two source codes, then there must be a match between these partial slices. The matching algorithm will find out the extent of cloning between the two programs by comparing the partial slices and finally return percentage of cloning as result. IV. Related Work In last two decades, various algorithms have been proposed for program slicing. All have its own advantages and shortcomings. In next section, an overview of recent research in the area of program slicing is given. Z. Guangquan et. al proposed a method to slice the concurrent object oriented programs. In this approach the java concurrency model is used and dependencies between the statements are defined. The paper presents the method of extracting slicing criterion from linear temporal logic property and proposes the steps of computing slicing. Multithreaded dependency graph is used for intermediate representation. A Two-pass algorithm based on Variable Cache Table is adapted to compute slices by extracting out the irrelevant portions of the programs. Results show the satisfaction is guaranteed for source and sliced program and the method can be easily extended to handle other concurrency models. R. Komondoor et. al. proposed a tool to detect clones in C fragments. In their approach, they used program dependence graphs and program slicing to find isomorphic PDG subgraphs. These subgraphs can be represented as clones. This tool is capable of finding non-continuous clones, intertwined clones and clones in which different variable names are used and statements have been reordered. The approach has been applied for the procedural oriented programs and finds many variants of ideal clones. A number of test cases demonstrating the application of approach on large programs have been shown . A. Surendran et. al. proposed a partial slicing approach as an effective method of program testing. Partial slices are formed from the combination of static slices and program points. In some cases static slices contains large number of program statements which are of little use in many practical applications. Partial slicing removes the disadvantage of large size of static slices. In their approach they use only static slices for the algorithm as static slices give all possible execution paths. As compared to original program there is a significant reduction in the number of statements in static slices using partial slicing. Using the constraints of partial slicing program testing is also simplified. This approach can also be used in debugging, maintenance and finding clones . D. Liang et. al. presented system dependence graph for object-oriented softwareÃ¢â¬â¢s. They have shown that their approach is more precise than previous approaches and is more efficient to construct. It distinguishes data members that fit for different objects. It provides a way to represent data members that act as parameters and the effects of polymorphism on parameters and parameter bindings. It presents a concept of object slicing which helps in examine the statements in slice object by object. Object slicing is good technique for debugging and analysis of large scale programs. In their work an efficient mechanism is also provided to represent incomplete programs and to represent classes in class libraries . T. Ishio et. al. proposed a program debugging tool. In their approach they proposed dynamic slicing to efficiently localize faults in procedural oriented and object oriented programs. Aspect-oriented programming is used for collecting dynamic information in program slicing calculation. The dynamic data dependence analysis aspect can be woven into various object-oriented programs without changes as the point cuts of the aspect in the approach is made in a generic form. With the help of dynamic program analysis module, a DC slice calculation system is developed. It improves maintainability and reusability of the module. The approach has also a restriction that it does not allow to analyze the local variables and local control structures. The benefits, usability and cost effectiveness of module show that it is a good tool for debugging . B. Korel et. al. presents the concept of program slicing on the module level which helps in better understanding of program slices of large programs. In this paper on call graph level, execution level and module trace level several static and dynamic program slicing features are proposed. These features can also be used during software maintenance. The concept of static and dynamic program slicing is combined with different methods of visualization which helps in understanding the program. Experiment results show that it helps the process of understanding program . V. CONCLUSION AND FUTURE WORK This paper provides a technique for detecting code clones in object oriented programs. For this purpose, program slicing is used as the base methodology. The algorithm uses PDGs as the intermediate representations for the source program. The PDG is represented in the form of adjacency matrix. Partial slices are extracted from the adjacency matrix and those slices are matched for possible clones. Result shows that program slicing is an efficient way for understanding programs and finding non-contiguous clones and intertwined code clones. The approach uses the control and data dependencies to find out adjacency matrix representation for the PDG. The whole process is automated where the user has to interact only once to input the programs for finding clones. Future work involves taking into consideration all the object oriented paradigm. It includes the object oriented programming features such as abstraction, encapsulation, inheritance, and polymorphism. An efficient algorithm for matching partial slices is also to be developed. REFERENCES  Dhavleesh Rattan, Rajesh Bhatia, Maninder Singh, Ã¢â¬Å"Software clone detection: a systematic review,Ã¢â¬ Information and software technology, Vol. 55, No. 7, pp. 1165-1199, 2013.  C. K. Roy, J.R. Cordy and R. Koschke, Ã¢â¬Å"Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,Ã¢â¬ Science of computer programming, Vol. 74, No. 7, pp. 470-495, 2009.  F. Tip, Ã¢â¬Å"A Survey of Program Slicing TechniquesÃ¢â¬ , Journal of Programming Languages, 1995, vol. 3, no. 3,pp. 121-189.  R. Komondoor,S. Horwitz, Using Slicing to Identify Duplication in Source Code, Proceedings of the 8th International Symposium on Static Analysis, 2001.  Yingzhou Zhang, Baowen Xu, Jose Emilio, Labra Gayo, A Formal Method for Program Slicing, Proceedings of the 2005 Australian Software Engineering Conference (ASWECÃ¢â¬â¢05) 1530-0803/05.  Jens Krinke, Advanced Slicing of Sequential and Concurrent Programs, Proceedings of the 20th IEEE International Conference on Software Mai1ntenance (ICSMÃ¢â¬â¢04) 1063-6773/04,2004.  Z. Guangquan, R. Mei, An Approach of Concurrent Object-oriented Program Slicing Base on LTL Property, 2008 IEEE International Conference on Computer Science and Software Engineering,DOI 10.1109/CSSE.2008.1283.  M. Weiser, Program slicing, IEEE Transactions on Software Engineering, 10(4):352Ã¢â¬â357, 1984.  Dhavleesh Rattan, Rajesh Bhatia, Maninder Singh, Ã¢â¬Å"Model clone detection based on tree comparison,Ã¢â¬ India conference (INDICON), IEEE, pp. 1041 Ã¢â¬â 1046, 2012  A. Surendran, P. Samuel, Partial Slices in Program Testing,2012 IEEE 35th Software Engineering Workshop.  Yogita Sharma, Rajesh Bhatia, Raj Kumar Tekchandani, Ã¢â¬Å"Hybrid technique for object oriented software clone detection,Ã¢â¬ ME thesis submitted at Thapar University, Patiala, 2011  D. Liang, M. Harrold, Slicing Objects Using System Dependence Graph, IEEE International Conference on Software Maintenance,Washington, D.C., November 1998.  T. Ishio, S. Kusumoto,K. Inoue, Program Slicing Tool for Effective Software Evolution Using Aspect-Oriented Technique, Proceedings of the Sixth International Workshop on Principles of Software Evolution, 2002 IEEE.  B. Korel, J. Rilling, Program Slicing in Understanding of Large Programs, Program Comprehension, 1998. IWPC 98. Proceedings., 6th International Workshop.  S. Khalsa, R. Bhatia,J. Chhabra, M. Singh, A Review of Coupling and Cohesion Measurement in OO Systems Using Program Slicing, ICISTM 2012, CCIS 285, pp.199-210,Springer, 2012.