Adjust the number of notifications by “priority score”.

What is “Priority Score”?

Sider Scan detects inconsistencies between the two code fragments of each detected clone pair and reports them as problematic code. Problematic code is either code that may contain a bug because a change was missed, or code that is not a bug but should be fixed for readability.

Sider Scan also evaluates each piece of detected problematic code from several perspectives and calculates a “priority score”, which is ranked as High, Mid, or Low. Code with a high priority score should be checked first because it is more likely to have serious problems such as bugs. We keep improving the algorithm for evaluating the priority score. For more details on the algorithm, see the “Priority Score Evaluation Algorithm” section.

Example of a notification with a priority score of “MID”

How to adjust the number of notifications

Problematic code with an extremely low priority score is likely to be a false positive, so it is not reported to the user. However, the priority score can vary with the programming language, the amount of duplicate code in the repository, and the type of software (application, library, framework, etc.). The score evaluation algorithm is improved continuously, but it is difficult to reduce false positives and tune the number of notifications so that it suits every repository. As a result, some repositories may receive too many notifications.

In that case, you can reduce the number of notifications by using the priority score as a threshold. This setting is configured in the .siderscan.json file: set notificationThreshold as a child of the report object. The key and values are as follows:

  • Key: notificationThreshold
  • Value: “high”, “mid”, “low”
  • Default: “mid”

The values have the following meanings:

high: Notify only code with a High priority score
mid: Notify code with a Mid or High priority score
low: Notify all code, regardless of priority score
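
The threshold semantics above can be modeled in a few lines of Python. This is only an illustrative sketch of the documented behavior, not Sider Scan's implementation; the names PRIORITY_ORDER, should_notify, and filter_findings, as well as the shape of the findings list, are assumptions made for this example.

```python
# Illustrative only: models the documented meaning of notificationThreshold.
# This is not Sider Scan's actual code; all names here are hypothetical.
PRIORITY_ORDER = {"low": 0, "mid": 1, "high": 2}


def should_notify(priority: str, threshold: str = "mid") -> bool:
    """Return True if a finding with this priority meets the threshold."""
    return PRIORITY_ORDER[priority.lower()] >= PRIORITY_ORDER[threshold.lower()]


def filter_findings(findings, threshold="mid"):
    """Keep only the findings whose priority is at or above the threshold."""
    return [f for f in findings if should_notify(f["priority"], threshold)]


# Hypothetical findings, one per priority rank.
findings = [
    {"id": 1, "priority": "High"},
    {"id": 2, "priority": "Mid"},
    {"id": 3, "priority": "Low"},
]
```

With these sample findings, a threshold of "high" keeps only finding 1, the default "mid" keeps findings 1 and 2, and "low" keeps all three.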

The following JSON code is an example of setting up notification only for a high priority score.

{
  "report": {
    "notificationThreshold": "high",
    "mail": {
      "to": ["alice@siderlabs.com", "bob@siderlabs.com", "carol@siderlabs.com"],
      "useBuiltInProvider": true
    }
  }
}
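
If the repository already has a .siderscan.json file with other settings, the threshold can be changed without touching them. The following Python helper is a minimal sketch of one way to do that; the function name set_notification_threshold is hypothetical, and the script only assumes the file layout shown in the example above (a top-level report object containing notificationThreshold).

```python
import json
from pathlib import Path

# The three values documented for notificationThreshold.
VALID_THRESHOLDS = ("high", "mid", "low")


def set_notification_threshold(config_path, threshold):
    """Set report.notificationThreshold and keep all other settings intact.

    Hypothetical helper for illustration; it is not part of Sider Scan.
    """
    if threshold not in VALID_THRESHOLDS:
        raise ValueError(f"threshold must be one of {VALID_THRESHOLDS}")
    path = Path(config_path)
    # Start from the existing config if the file exists, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Create the "report" object if it is missing, then set the threshold.
    config.setdefault("report", {})["notificationThreshold"] = threshold
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, calling set_notification_threshold(".siderscan.json", "high") from the repository root would rewrite the file with the threshold set to "high" while leaving an existing mail object unchanged.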

Priority Score Evaluation Algorithm

The algorithm for evaluating the priority score is original to Sider Scan. It combines more than 15 heuristic rules, derived from analysis of numerous open source software projects and actual repositories of our customers, with a noise filter that was developed using machine learning techniques to reduce false positives.

The algorithm is updated with each release, but here are some examples of the metrics used in the current evaluation algorithm:

  • Whether the flagged word is a literal token or not
  • Frequency of occurrence of the flagged word (variable, function name, constant, argument, etc.) in the duplicate code
  • Number of flagged words in the same duplicate code
  • Ratio of the number of words used in the pattern dictionary to the number of words that violate the pattern
  • Scope of the flagged words
  • Whether the code is automatically generated or not

Priority scores may change in future releases as calculation rules are added or modified, or as the weights of existing factors are adjusted.
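
To make the idea of weighted heuristic rules more concrete, here is a toy Python sketch. It is not the Sider Scan algorithm: the rule names, the weights, their signs, and the cut-off values are all invented for this example, and the real evaluation also includes a machine-learning noise filter that is not modeled here. The sketch only shows how changing a weight can move the same finding to a different rank.

```python
# Toy illustration only. Rule names, weights, and cut-offs are invented for
# this sketch and do NOT reflect Sider Scan's actual rules or values.
HYPOTHETICAL_WEIGHTS = {
    "is_literal_token": -3,
    "is_generated_code": -5,
    "rare_in_duplicate": 4,
    "pattern_strongly_supported": 4,
}


def priority_rank(signals, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of the rules that fired and map the total to a rank."""
    score = sum(w for name, w in weights.items() if signals.get(name))
    if score >= 6:
        return "High"
    if score >= 2:
        return "Mid"
    return "Low"


# A hypothetical finding for which only one rule fired.
finding = {"rare_in_duplicate": True}

# The same finding evaluated with one weight raised from 4 to 6.
reweighted = dict(HYPOTHETICAL_WEIGHTS, rare_in_duplicate=6)
```

Under these made-up numbers, priority_rank(finding) returns "Mid", while priority_rank(finding, reweighted) returns "High", which is the kind of shift a new release could cause.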