Keeping Performance Reviews Honest: Lessons from the Marines

Performance evaluations are not known for objectivity and consistency. An analysis of U.S. Marine Corps FITREPs shows how people analytics can help.

“The completed fitness report is the most important information component in manpower management. It is the primary means of evaluating a Marine’s performance and is the Commandant’s primary tool for the selection of personnel…”

Those are the opening sentences on the first page of the U.S. Marine Corps’ Fitness Report (FITREP)—what the Marines call their performance review sheet. I’m sure we have all heard similar statements about the importance of performance evaluations in our own organizations.

However, subjectivity is an ever-present concern in these processes. Because they rely on submissions from individual leaders, these reviews are not always calibrated or verified, despite the weight they carry in personnel decisions.

A 2017 Naval Postgraduate School thesis by Philipp E. D. Rigaut, analyzing more than 70,000 Marine Corps FITREPs from 2007–2016, points to what I see as two opportunities for people analytics tools to fill this subjectivity gap.

 

1. Maintain grade distributions

Many of us are familiar with forced-curve rankings in one form or another. Coming from consulting, I had a high degree of confidence in the integrity of these rankings, because a board called the Career Development Council convened to re-evaluate submissions from individual Project Leaders and ensure grades followed the intended distribution.

However, this process requires a healthy amount of time for debate, and it was feasible only because of the relatively small number of people being evaluated. For a roughly 200,000-strong organization like the Marine Corps, such a process may not be practical.

A 2012 analysis by Clemens et al. (discussed in Maj. Rigaut’s thesis) showed the result: deviations from the intended distribution.

The Corps uses an 8-point grade system intended to follow a ‘Christmas Tree’ distribution pattern, but the reality was different. Second Lieutenants’ grades were closer to normally distributed, while distributions for Lieutenant Colonels were left-skewed.

This is where people analytics can step in. Imagine for a moment that the right-hand side picture above is not a one-time, static analysis but is continuously tracked. As leaders fill in their subordinates’ FITREPs, they could see their grade distributions in real time and receive nudges when they deviate.
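As a minimal sketch of what such a nudge might look like, the snippet below compares a leader’s submitted grades against a target distribution and flags large deviations. The target shares and the tolerance threshold are invented for illustration; a real tool would plug in the Corps’ actual intended distribution.

```python
from collections import Counter

# Hypothetical target shares for the 8-point scale (illustrative only;
# a real tool would use the Corps' intended 'Christmas Tree' distribution).
TARGET_SHARES = {1: 0.01, 2: 0.02, 3: 0.05, 4: 0.10,
                 5: 0.22, 6: 0.30, 7: 0.20, 8: 0.10}

def distribution_nudges(grades, tolerance=0.10):
    """Return nudge messages for grades whose observed share deviates
    from the target share by more than `tolerance`."""
    counts = Counter(grades)
    total = len(grades)
    nudges = []
    for grade, target in TARGET_SHARES.items():
        observed = counts.get(grade, 0) / total
        if abs(observed - target) > tolerance:
            nudges.append(f"Grade {grade}: {observed:.0%} of your reports "
                          f"vs. roughly {target:.0%} intended.")
    return nudges
```

A leader who grades everyone in the 6–8 range would receive several nudges from `distribution_nudges([7, 7, 7, 8, 8, 6])`, while a set of grades close to the target shares would return none.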

Will this completely solve grade inflation and distribution issues? I’d say it’s unlikely, but perhaps it would reduce the number of problem cases to a manageable level—maybe to a point where the board deliberation model discussed above would work even for organizations as large as the Marine Corps.

 

2. Increase consistency of text commentary

Performance review grades are often accompanied by text justifying them, but that text is rarely regarded as the most objective section of the document.

It’s no secret that certain fine-sounding phrases can be code for red flags, adding opacity to the process. Army Col. (Ret.) Steve Leonard, known as the Doctrine Man on social media, offers a few excellent, if colorful, examples:

Maj. Rigaut’s findings again show what looks to me like an opportunity for people analytics. Using NLP and an ensemble of predictive modeling techniques, he was able to correctly classify officers’ performance tiers—at a rate of 67%!—using only the text comment sections of the FITREPs analyzed.

Some ‘power words’ were consistently associated with distinct tiers, suggesting a shared understanding, perhaps rooted in the strong culture the Corps enjoys, such as the example below for Captains:

The signal, however, is far from perfect. The analysis also showed, among other examples, that comments like “#1 Officer” or “Best Capt” were shared by 61% of officers in the same grade.

Again, this is where I believe people analytics can help. A tool embedded in FITREP platforms could nudge and prompt leaders when they use words inconsistent with the grades and ratings they assign (something like “The word ‘meritorious’ is only used to describe the performance of officers receiving grades 7 or above in 80% of cases.”).
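A minimal sketch of such a consistency check might look like the following. The word list, grade thresholds, and usage rates are invented for illustration, standing in for statistics like those surfaced by Maj. Rigaut’s analysis.

```python
# Hypothetical usage statistics: for each 'power word', the grade at or
# above which it historically appears, and how often (invented numbers).
WORD_STATS = {
    "meritorious": (7, 0.80),
    "unlimited": (7, 0.75),
}

def comment_nudges(comment, grade):
    """Flag words in `comment` that historically accompany grades at or
    above a threshold when the grade being assigned falls below it."""
    text = comment.lower()
    nudges = []
    for word, (threshold, share) in WORD_STATS.items():
        if word in text and grade < threshold:
            nudges.append(f"The word '{word}' describes officers graded "
                          f"{threshold} or above in {share:.0%} of cases.")
    return nudges
```

So `comment_nudges("A truly meritorious performance.", 5)` would produce one nudge, while the same comment paired with a grade of 7 or 8 would pass silently.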

I believe live, analytics-enabled feedback like this would go a long way toward promoting consistency and usefulness in evaluations, even more than traditional training programs or doctrine materials could.

______

Reference:

Rigaut, P. E. D., 2017. A Text Analysis of the Marine Corps Fitness Report. Thesis. Monterey, California: Naval Postgraduate School. Available at: <https://apps.dtic.mil/dtic/tr/fulltext/u2/1046515.pdf> [Accessed 12 April 2021].


6 thoughts on “Keeping Performance Reviews Honest: Lessons from the Marines”

  1. Incredibly interesting that NLP can effectively identify the coded language that corresponds to different levels of ‘real’ performance. I guess the only risk here is whether sharing that data and nudging people in specific directions contaminates the raw data, making the analysis itself less helpful (and eventually even restrictive of which words should belong in which category of review).

    I don’t have an answer, but I think how you use the data and stop it being gamed is a consistently interesting challenge with this kind of analytics!

  2. The article made me realize that the military setting could actually be very suitable for developing new people analytics practices, as well as for establishing rules/frameworks around fair data usage and privacy.

    On the innovation side, there is potential to create and improve models that could eventually be adapted to the private setting. Performance inputs and outputs are relatively standardized, while operational processes are well structured. Furthermore, the inherent hierarchy and organizational structure is an ideal setting to study and understand interpersonal networks and group dynamics.

    The advantages from a governance perspective come from the inherent rigor and discipline of military organizations. Any kind of data is handled with care, outside vendors are strictly vetted, and changes in processes are codified and documented in great detail. Furthermore, as most military personnel are expected to (and accept that they will) serve the organization beyond typical private-sector working hours, there is less controversy around personal data collection, with the assumption that the data are used to improve overall outcomes rather than to generate profit.

  3. While I agree the Marine Corps and broader military are fertile ground for people analytics, I think one other consideration is how people “game” the evaluation process. For example, I observed commanders rank their soldiers lower immediately following a promotion because the promotion board does not weigh heavily evaluations from well before the next promotion cycle. If you look at FITREP scores, you will notice that officers’ rankings gradually improve the closer they get to the promotion window. Since promotions occur at predefined intervals, senior officers know how to manipulate the 8-point system to ensure their soldiers get promoted. The gaming causes mediocre officers and soldiers to continue getting promoted and high performers to exit the military because going above and beyond is not recognized consistently in performance rankings.

    People analytics could help identify some of the gaming and push commanders to give honest performance assessments. I look forward to seeing whether the military decides to change its evaluation and promotion process to limit “gaming” the system.

    1. I wonder if people analytics solves this problem? My hypothesis is that we need to incorporate human judgement to identify gaming (any new rule is subject to gaming too!)

  4. I think the 67% classification rate shows real promise for how people analytics can help, though I wonder how that number could be increased through machine learning on the power words commonly associated with particular behaviours. Of particular interest to me are the embedded words, which I think would rely on more technical computer skills. I also wonder why the Christmas tree distribution was chosen, as it is very interesting.

  5. Incredibly interesting topics as well as commentary, thanks for sharing!
    “ten cringe-worthy evaluation phrases” is super interesting to go through, and strongly reminded me of how difficult it is to objectively understand these comments without having biases…

    It is also interesting to see the “power words” list, especially that “eager”, “good”, “wingman”, and “top” are all bottom-category indicators (which kind of makes sense… but I’m pretty sure analytics can do a much better job than human beings at analyzing these objectively, in a far more efficient manner).

    Also wondering if there could be any way to incorporate this type of machine learning analysis into the consulting evaluation process. Tons of calibration work is needed, and yet many complain about the outcomes. Since more and more consulting firms have started focusing on DX in their client services, I hope these firms also apply people analytics to their HR processes.
