FAQ for Sider Scan

What is duplicate code?

Duplicate code is a pair of matching or similar code fragments in the source code. Duplicates are typically created when existing code is copied and pasted. Not all duplicates are a problem, but software with a large amount of duplicate code is generally harder to change and extend safely. To fix a bug, it is often not enough to fix the code in which the bug appears. You also need to find every duplicate that was copied from that code and judge whether the same correction is needed in each copy. As the size of the software grows, the difficulty of this task increases.
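As an illustration of the problem described above, here is a minimal sketch of a duplicate pair where a fix reached only one copy. The function names and data shape are hypothetical and are not taken from Sider Scan.

```python
def average_order_value(orders):
    # Original fragment: the bug fix (guard against an empty list) was applied here.
    if not orders:
        return 0.0
    return sum(o["total"] for o in orders) / len(orders)


def average_refund_value(refunds):
    # Copy-pasted fragment with renamed identifiers. The same fix was never
    # applied, so an empty list still raises ZeroDivisionError.
    return sum(r["total"] for r in refunds) / len(refunds)
```

The two functions share the same logic and differ only in names, which is the kind of pair that is easy to miss when fixing a bug by hand.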

How is the input source code handled?

The analysis runs entirely in your local environment, and your source code is not sent to any external server. Your code will not be exposed.

Is there truly no communication with external servers?

Sider Scan is a local application, and the analysis is completed locally. The input source code is stored in a local database, and the duplicate code detection engine runs against it on your machine. As described above, customer information (your source code, etc.) will not be sent to any external server. However, to improve the usability of the service, we may collect usage data, such as which functions are frequently used on the search result screen.

What programming languages can Sider Scan analyze?

Sider Scan currently supports Java, JavaScript, TypeScript, PHP, C, C++, Swift, Ruby, and CUDA. Only files in the input directory with the following extensions are analyzed: c, h, cc, cpp, cxx, hpp, cu, cuh, php, swift, js, jsx, ts, tsx, vue, java, rb. We plan to support Python and C# in the future. If there are other languages you would like supported, please let us know.
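To check in advance which files in a project fall within the extension list above, a small script along these lines may help. This is only a sketch: the helper names are hypothetical, and the case-insensitive matching is an assumption, since the FAQ does not say how Sider Scan treats upper-case extensions.

```python
from pathlib import Path

# Extensions listed in the FAQ answer above.
SUPPORTED_EXTENSIONS = {
    "c", "h", "cc", "cpp", "cxx", "hpp", "cu", "cuh",
    "php", "swift", "js", "jsx", "ts", "tsx", "vue", "java", "rb",
}


def is_analyzable(path):
    """Return True if the file extension is in the supported list."""
    # Assumption: matching ignores case; Sider Scan's actual rule is not documented here.
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_EXTENSIONS


def analyzable_files(root):
    """Hypothetical helper: list files under root whose extension is supported."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_analyzable(p))
```

For example, `is_analyzable("src/app.tsx")` is true, while `is_analyzable("script.py")` and `is_analyzable("Makefile")` are false.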

It seems that not all files are displayed in the analysis results. Why is this?

The analysis result screen of Sider Scan does not show every file in the project. It shows only the files in which duplicate code was detected. Also, please note that duplicate code does not necessarily span multiple files and can exist within a single file.

How do you calculate the ‘importance’ of duplicate code?

Importance is a heuristic measure, not an absolute one. We derived it from analysis of our own open-source projects and from user interviews, and then turned it into an algorithm. It is still under development, so its definition and the associated algorithm may change.

The current version takes the following factors into account to calculate the importance index:

  • Number of lines in the duplicate: The number of lines in the code block that was considered a duplicate.
  • Similarity score: How much of the logic is the same even where the text differs, for example where variables and functions have different names.
  • Same file factor: If the code exists in multiple files, it is deemed more important.
  • Complexity of logic: The greater the complexity in the duplicate portion, the more important it is considered. We do this by analyzing its control structure.
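To make the idea of combining these four factors concrete, here is a purely illustrative sketch. It is not Sider Scan's algorithm: the class, the weights, the caps, and the 0 to 100 scale are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DuplicatePair:
    line_count: int             # lines in the duplicated block
    similarity: float           # 0.0 to 1.0
    spans_multiple_files: bool  # True if the copies live in different files
    control_structures: int     # if/for/while count, used as a complexity proxy


def importance(pair):
    """Hypothetical importance index on a 0 to 100 scale."""
    # Assumed caps: 50 lines and 10 control structures count as "maximum".
    size = min(pair.line_count / 50, 1.0)
    complexity = min(pair.control_structures / 10, 1.0)
    # Assumed weighting: duplicates spread across files count double.
    file_factor = 1.0 if pair.spans_multiple_files else 0.5
    weighted = 0.4 * size + 0.3 * pair.similarity + 0.3 * complexity
    return round(100 * file_factor * weighted, 1)
```

With these assumed weights, two otherwise identical duplicates receive different scores depending on whether they span multiple files, which mirrors the "same file factor" described above.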

Is there a way to share the analysis results?

Sider Scan allows users to export and import analysis results through our own file format, which uses the RADUMP file extension. Click “Save analysis results” to download the RADUMP file. You can send this file to other users, who can drag and drop it into the Sider Scan application to display the saved analysis results.