Companies choosing to participate in RAMP commit to keep their Compensation Ratio (CR) stable over time. Fairness demands that all companies in the program live by the same standards in the same way and do not seek some competitive advantage by gaming the rules.

The key metric around which RAMP is built is the CR. Hence, it is important to ensure that if a company attempts to game the CR, such an attempt can be immediately detected, and significant penalties applied.

In theory, a company could attempt to game the CR in two basic ways: by trying to hide cuts in compensation to the bottom 90% of workers (B90 workers) while holding the CR stable, or by trying to hide increases in profits or in pay to the top 10% of workers (T10 workers) while holding the CR stable. In any attempt at gaming, whichever way is chosen, a change to the numerator of the CR must be offset by a change to the denominator, or vice versa. For example, for the CR to remain stable, a cut in B90 compensation would need to be offset by showing a decrease in either profits or T10 compensation, or a combination of the two. Hiding some of the profits, for instance, would be an obvious means to then lower compensation to the B90 and keep the CR stable.
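The offsetting arithmetic can be made concrete with a small Python sketch. It assumes, purely for illustration, that the CR takes the form of B90 compensation divided by the sum of T10 compensation and profits; this section does not state the formula, and the function names and dollar figures below are hypothetical.

```python
def compensation_ratio(b90_comp, t10_comp, profits):
    """Assumed form of the CR: B90 compensation over (T10 compensation + profits)."""
    return b90_comp / (t10_comp + profits)


def offset_needed(b90_comp, t10_comp, profits, b90_cut):
    """Drop in reported (T10 compensation + profits) that keeps the CR
    unchanged after B90 compensation is cut by b90_cut."""
    cr = compensation_ratio(b90_comp, t10_comp, profits)
    new_denominator = (b90_comp - b90_cut) / cr
    return (t10_comp + profits) - new_denominator


# Hypothetical company, figures in millions of dollars.
b90, t10, profits = 60.0, 20.0, 20.0
hidden = offset_needed(b90, t10, profits, b90_cut=3.0)
print(compensation_ratio(b90, t10, profits))                  # CR before the cut
print(hidden)                                                 # amount that must be hidden
print(compensation_ratio(b90 - 3.0, t10, profits - hidden))   # CR after cut and offset
```

Under this assumed formula, a 5% cut in B90 compensation must be matched by a 5% drop in the reported denominator, which is exactly the kind of paired movement a detector can look for.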

For RAMP to be effective, therefore, it must be possible to detect such gaming attempts with a high degree of certainty, sensitivity, and accuracy. In some ways the task is analogous to detecting so-called ‘exoplanets’ orbiting a distant star. The star emits a constant stream of information (in the form of light). When an orbiting planet passes in front of the star, it briefly disrupts this stream of information. One can use these sudden discontinuities in the stream to detect a phenomenon, in this case an exoplanet, with great accuracy.

Companies also emit streams of information in annual financial statements, tax returns, W-2s, etc. They employ auditing firms to confirm the accuracy of key numbers such as revenue, profits, cash compensation, and number of employees. By looking at the sequence of variables such as these (quarter-to-quarter, or year-to-year), it is possible to identify important changes in them from one period to the next. We believe that patterns in these four key variables, as well as ratios between them, can be used as ‘sensors’ to detect abnormal changes or discontinuities that occur even while the overarching CR remains stable. We recognize that abnormal changes in the sensors alongside a stable CR could sometimes result not from gaming but from normal business events: recessions, loss of a major client, increases in supply costs, etc. Such events can be accounted for by companies’ self-reports of extenuating circumstances relevant to RAMP, just as they would report them on their tax returns. This helps eliminate “false positives” from the proposed sensors and allows the focus to be placed on detecting actual gaming.

To explore whether gaming detection would be feasible, we built a gaming detection app using the following nine sensors based on the four variables outlined above: (1) observed versus expected number of employees, based on employment trends; (2) B90 compensation; (3) number of employees in the lowest and highest thirds of the B90, measured by national pay levels; (4) number of employees in the middle and highest thirds of the B90, measured by national pay levels; (5) the ratio of B90 to T10 compensation; (6) the ratio of B90 to T10 employees; (7) total compensation and profits of the company; (8) total compensation relative to profits in the company; and (9) profits of the company relative to revenue.
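As a rough illustration of how ratio-type sensors could be derived from reported figures, the Python sketch below computes three of them (the B90-to-T10 compensation ratio, compensation relative to profits, and profits relative to revenue) and their year-to-year relative change. The field names, function names, and figures are hypothetical, and this is not the authors' app.

```python
def ratio_sensors(year):
    """Three ratio sensors computed from one year of reported figures."""
    total_comp = year["b90_comp"] + year["t10_comp"]
    return {
        "b90_to_t10_comp": year["b90_comp"] / year["t10_comp"],
        "comp_to_profits": total_comp / year["profits"],
        "profit_margin": year["profits"] / year["revenue"],
    }


def relative_changes(previous, current):
    """Relative change of each sensor from one year to the next."""
    before, after = ratio_sensors(previous), ratio_sensors(current)
    return {name: (after[name] - before[name]) / before[name] for name in before}


# Hypothetical figures: reported profits fall while B90 compensation is cut.
year1 = {"revenue": 1000.0, "profits": 100.0, "b90_comp": 300.0, "t10_comp": 100.0}
year2 = {"revenue": 1000.0, "profits": 80.0, "b90_comp": 270.0, "t10_comp": 100.0}
changes = relative_changes(year1, year2)
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

In this made-up case all three sensors move by 10% or more in a single year, even though revenue is flat.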

The app runs in two parts. In the first part, the application generates every conceivable pair of changes, small and large, in the CR numerator and denominator, and assigns each gaming attempt to a particular period of a simulated sequence of 10 years. In the second part (which is blind to the first), the software attempts to detect a game being played and to identify in which year, if any, a gaming event took place. The detection of a game is based upon observing movement in the individual sensors. The combined game-generator and game-detector can create and process tens of thousands of sequences, involving any combination and size of shifts in the CR numerator and denominator in any year, including no gaming at all. The core questions are, first, how accurately can games be detected and, second, to what degree can false positives be avoided?
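The two-part design can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the authors' R application: it uses a single sensor (profits relative to revenue), a hypothetical company with 3% revenue growth and a 10% margin plus small noise, and a detector that sees only the reported margins.

```python
import random

YEARS = 10


def simulate_company(game_year=None, hidden_share=0.0, seed=0):
    """Part 1 (generator): reported profit margins for a 10-year sequence.

    From game_year onward (1..9, or None for no gaming), a share of
    profits is hidden from the reported figures.
    """
    rng = random.Random(seed)
    margins = []
    revenue = 100.0
    for year in range(YEARS):
        profits = revenue * (0.10 + rng.uniform(-0.002, 0.002))
        if game_year is not None and year >= game_year:
            profits *= 1.0 - hidden_share
        margins.append(profits / revenue)
        revenue *= 1.03
    return margins


def detect_game(margins, threshold):
    """Part 2 (detector): first year whose margin jumps by more than
    `threshold` (relative to the prior year), or None if no year does."""
    for year in range(1, len(margins)):
        change = abs(margins[year] - margins[year - 1]) / margins[year - 1]
        if change > threshold:
            return year
    return None


print(detect_game(simulate_company(game_year=4, hidden_share=0.2, seed=1), 0.06))
print(detect_game(simulate_company(seed=1), 0.06))
```

The detector never receives `game_year`; it has to recover it from the discontinuity alone, which mirrors the blind second stage described above.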

Once the software has flagged game-events, the next step answers these questions by measuring the accuracy of the detection. In effect, this is like asking, ‘what is the batting average’ of the software? Two standard statistical metrics are used to determine effectiveness: sensitivity (the share of all actual game-events that are detected) and specificity (the share of all actual non-game-events that are correctly left unflagged).
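Both metrics follow directly from counting the four possible outcomes. The Python sketch below shows the calculation on a small, made-up set of ten sequences; the function name and the data are illustrative only, and it assumes at least one game and one non-game are present.

```python
def sensitivity_specificity(truth, flagged):
    """truth[i] is True if sequence i contained a game;
    flagged[i] is True if the detector raised an alarm for it."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # games caught
    fn = sum(1 for t, f in pairs if t and not f)      # games missed
    tn = sum(1 for t, f in pairs if not t and not f)  # clean, not flagged
    fp = sum(1 for t, f in pairs if not t and f)      # clean, but flagged
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 4 gamed sequences and 6 clean ones.
truth = [True, True, True, True, False, False, False, False, False, False]
flagged = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(truth, flagged)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```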

These two metrics can be taken one step further by varying the threshold used to detect a game. A game is detected when a year-to-year discontinuity in the sensors exceeds a certain threshold amount. This threshold can itself be varied to determine which range maximizes true positives while minimizing false positives. If the threshold is set very low, it becomes like an alarm that goes off with the slightest disturbance, making false alarms more frequent. If the threshold is set very high, the alarm seldom goes off, making it possible to miss a lot. What we care about is simultaneously maximizing the true positive rate (detecting a game when one is actually present) and minimizing the false positive rate (flagging a game when none is present). We want the probability of detection (sensitivity) to be as high as possible, and the probability of a false alarm (1 - specificity) to be as low as possible.
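The threshold trade-off can be illustrated with a short Python sketch that sweeps several candidate thresholds and records the true positive and false positive rates at each. The scores and labels are made up, and the selection rule used here (maximizing the true positive rate minus the false positive rate, often called Youden's J) is one common choice, not necessarily the criterion the app uses.

```python
def sweep_thresholds(scores, truth, thresholds):
    """For each threshold, return (threshold, true positive rate, false positive rate).

    scores[i] is the largest year-to-year sensor discontinuity in sequence i;
    truth[i] is True if that sequence actually contained a game.
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    rows = []
    for th in thresholds:
        flagged = [s > th for s in scores]
        tp = sum(1 for f, t in zip(flagged, truth) if f and t)
        fp = sum(1 for f, t in zip(flagged, truth) if f and not t)
        rows.append((th, tp / positives, fp / negatives))
    return rows


def best_threshold(rows):
    """Pick the threshold with the largest gap between TPR and FPR."""
    return max(rows, key=lambda row: row[1] - row[2])[0]


# Made-up discontinuity scores: four clean sequences, then four gamed ones.
scores = [0.01, 0.03, 0.04, 0.02, 0.18, 0.25, 0.09, 0.30]
truth = [False, False, False, False, True, True, True, True]
rows = sweep_thresholds(scores, truth, [0.005, 0.05, 0.20])
for th, tpr, fpr in rows:
    print(f"threshold={th}: TPR={tpr:.2f}, FPR={fpr:.2f}")
print("chosen threshold:", best_threshold(rows))
```

With these figures the lowest threshold flags everything, the highest misses half the games, and the middle one separates the two groups cleanly.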

We applied this method to assess the adequacy of our detection system. This involved running thousands of randomly generated games, assigned to be played in randomly generated years, while varying the detection threshold and tracking the true positives and false positives. The results from ten thousand runs show that detection of gaming is indeed not only possible, but achievable with:

1.      High sensitivity and specificity, since at the optimal sensor thresholds the software identifies nearly all true positives while producing hardly any false positives (fewer than 1 in 1,000);

2.      Great precision, since the software is capable of identifying very small aberrations in B90 compensation (3% or less of the expected annual B90 raise);

3.      High flexibility, since the application can work equally well across all types of companies regardless of scale or sector type.

The application used here is written in R, a software environment that can handle very large data sets. R, and programs like it, are widely used by the federal government to crunch vast amounts of data. R could easily process data from every US employer if employers were required to submit audited data annually on the four indices: revenue, profits, cash compensation, and number of employees (the last two of which are available from W-2 forms).

The implication is that gaming strategies can be detected and prevented effectively, using software already employed by the government to analyze tax returns. Running detection should provide highly accurate results without being costly. Periodic renewal of a company’s RAMP registration could serve as the point at which a company found to have gamed the CR either pays significant penalties to remain in the program or is dropped from it, incurring even more onerous penalties.