How to view a duplicate code pair

This screen shows duplicate code pairs side by side. Sider Scan detects not only exact duplicates, where the text is identical word for word, but also similar code in which variable names, function names, or variable types have been partially changed, or in which lines have been inserted or deleted, as long as the logic is almost identical.
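
To illustrate how code can count as similar even when names differ, here is a minimal Python sketch. It is not Sider Scan's actual detection algorithm, which this page does not describe. The `normalize` and `similarity` helpers, the keyword list, and the two sample lines are all assumptions made for illustration. The idea is to replace every identifier with a placeholder before comparing, so that renaming alone does not reduce the match.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword list; a real tool would use a full language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "new", "int", "void"}
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Split code into tokens and replace every non-keyword identifier with 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if IDENT_RE.fullmatch(tok) and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def similarity(left, right):
    """Return a 0.0 to 1.0 ratio comparing the normalized token sequences."""
    matcher = SequenceMatcher(None, normalize(left), normalize(right), autojunk=False)
    return matcher.ratio()


# Invented sample lines that differ only in a type name.
LEFT = "OrcStruct result = new OrcStruct(size);"
RIGHT = "Group result = new Group(size);"

print(similarity(LEFT, RIGHT))
```

Comparing the raw text of the two lines would report a difference, while the normalized token sequences are identical, which is the sense in which the logic is "almost identical".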

Screenshot of a duplicate code pair

In this screenshot:

  • The pink background (red font) indicates an exact match.
  • The gray background (black font) indicates lines where the duplicate pairs differ.
  • Within a line that has a gray background, the words highlighted in pink and green are the words that differ between the left and right sides of the duplicate code pair.

In the above example, the entire range of lines shown is a duplicate code block. The left and right code differ at lines 65 and 68 of the left code, where the words OrcStruct and Group differ. In addition, lines 76 to 79 exist only in the left code and have no counterpart in the right code.
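
The three kinds of lines described above (matching, differing, and present on one side only) can be reproduced with a line-level diff. The following is a minimal sketch using Python's standard `difflib` module. It is not how Sider Scan renders its view, and the `classify_lines` helper and the sample lines are hypothetical.

```python
from difflib import SequenceMatcher


def classify_lines(left, right):
    """Group the lines of two code blocks into equal, replaced, and one-sided runs."""
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is 'equal', 'replace', 'delete' (left only), or 'insert' (right only)
        result.append((tag, left[i1:i2], right[j1:j2]))
    return result


# Invented sample blocks: one differing line and one line that exists only on the left.
LEFT_LINES = [
    "int n = fields.size();",
    "OrcStruct row = create(n);",
    "fill(row);",
    "log(row);",
    "return row;",
]
RIGHT_LINES = [
    "int n = fields.size();",
    "Group row = create(n);",
    "fill(row);",
    "return row;",
]

for tag, left_part, right_part in classify_lines(LEFT_LINES, RIGHT_LINES):
    print(tag, left_part, right_part)
```

In this sketch, `equal` runs correspond to exact-match lines, `replace` runs to lines where the pair differs, and `delete` runs to lines that exist only in the left code.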

Sider Scan also evaluates each duplicate code pair and calculates an “importance” score. The score is a heuristic, derived from the analysis of our own open-source projects and from user interviews, and it is not absolute. It is also still under development, so its definition and the associated algorithm may change.

The current version takes the following factors into account to calculate the importance score:

  • Number of lines in the duplicate: The number of lines in the code block that was identified as a duplicate.
  • Similarity score: How much of the logic is the same while the text differs, for example through different names for variables and functions.
  • Same file factor: If the duplicated code exists in multiple files, it is deemed more important.
  • Complexity of logic: The more complex the duplicated portion, the more important it is considered. Complexity is assessed by analyzing the control structure of the code.
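
As a rough illustration of how such factors could be combined into one number, here is a hypothetical Python sketch. The actual formula is not published on this page and may change, so the `DuplicatePair` fields, the weights, the cross-file multiplier, and the assumption that higher similarity raises the score are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DuplicatePair:
    line_count: int          # number of lines in the duplicate block
    similarity: float        # assumed range 0.0 to 1.0
    same_file: bool          # True if both copies are in one file
    control_structures: int  # assumed stand-in for complexity (ifs, loops, ...)


def importance(pair, w_lines=1.0, w_complexity=2.0, cross_file_bonus=1.5):
    """Hypothetical score: longer, more complex, cross-file duplicates rank higher."""
    base = w_lines * pair.line_count + w_complexity * pair.control_structures
    file_factor = 1.0 if pair.same_file else cross_file_bonus
    return base * pair.similarity * file_factor


within_file = DuplicatePair(line_count=20, similarity=0.9, same_file=True, control_structures=3)
across_files = DuplicatePair(line_count=20, similarity=0.9, same_file=False, control_structures=3)

print(importance(within_file), importance(across_files))
```

The only properties this sketch is meant to show are the directions stated in the list: more lines, more complex control flow, and duplication across files each push the score up.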

The full path of each source file is displayed in the title section of the left and right code. You can also view the code before and after the duplicate code block by clicking the “View details on CI server” button in the analysis result e-mail.