A Retrospective Analysis Comparing the New Standardized Letter of Recommendation in Dermatology with the Classic Narrative Letter of Recommendation

September 1, 2016

Jessica A. Kaffenberger, MD (a); Joy Mosser, MD (b); Grace Lee, MD (a); Llana Pootrakul, MD, PhD (a); Katya Harfmann, MD (b); Stephanie Fabbro, MD (a); Esteban Fernandez Faith, MD (b); David Carr, MD (a); Alisha Plotner, MD (a); Matthew Zirwas, MD (a); Benjamin H. Kaffenberger, MD (a)

(a) Department of Internal Medicine, Division of Dermatology, The Ohio State University Wexner Medical Center, Columbus, Ohio; (b) Department of Pediatrics, Division of Pediatric Dermatology, The Ohio State University and Nationwide Children’s Hospital, Columbus, Ohio

Disclosure: This article has no funding source. The authors report no relevant conflicts of interest. This study was approved by the Institutional Review Board (IRB) of The Ohio State University Wexner Medical Center.


Background: In an effort to avoid numerous problems associated with narrative letters of recommendation, a dermatology standardized letter of recommendation was utilized in the 2014–2015 resident application cycle. Objective: To compare the standardized letter of recommendation and narrative letters of recommendation from a single institution and application cycle to determine if the standardized letter of recommendation met its original goals of efficiency, applicant stratification, and validity. Methods: Eight dermatologists assessed all standardized letter of recommendation/narrative letter of recommendation pairs received during the 2014–2015 application cycle. Five readers repeated the analysis two months later. Each letter of recommendation was evaluated based on a seven-question survey. Letter analysis and survey completion were timed for each letter. Results: Compared to the narrative letters of recommendation, the standardized letter of recommendation is easier to interpret (p<0.0001), has less exaggeration of applicants’ positive traits (p<0.001), and has higher inter-rater and intrarater reliability for determining applicant traits including personality, reliability, work ethic, and global score. Standardized letters of recommendation are also faster to interpret (p<0.0001) and provide more information about the writer’s background or writer-applicant relationship than narrative letters of recommendation (p<0.001). Limitations: This study was completed at a single institution. Conclusions: The standardized letter of recommendation appears to be meeting its initial goals of 1) efficiency, 2) applicant stratification, and 3) validity. (J Clin Aesthet Dermatol. 2016;9(9):36–42.)

Letters of recommendation (LOR) are an important component of the dermatology residency application and can provide valuable information regarding personality, reliability, and work ethic, which are not explicitly addressed elsewhere in the application. Unfortunately, despite their importance in resident selection, LOR may be difficult to interpret,[1],[2] and several studies question their utility.[1],[3–7] Narrative letters of recommendation (NLOR) are often excessively flattering,[3–5] with a complicated hierarchy of laudatory phrases[4] or “code words”/expressions.[8],[9] Additionally, they can be redundant, simply reciting other parts of the application.[2] Finally, NLOR have been found to lack clarity[1],[5] and have demonstrated low reliability between interpreting faculty members.[6] In an effort to improve the LOR process, the authors created a standardized letter of recommendation (SLOR), which was accepted by the Association of Professors of Dermatology (APD) for use along with the traditional narrative letter of recommendation in the 2014–2015 application cycle. The dermatology SLOR is based on currently used models of SLOR from emergency medicine, otolaryngology, plastic surgery, and orthopedic surgery. The dermatology SLOR also incorporates qualities that are deemed important by members of the APD,[2] such as the letter writer’s background and specific applicant traits (e.g., personality, work ethic, and reliability). The SLOR strives to improve on the NLOR in the following three primary areas: 1) efficiency: decreased time and effort needed for interpretation; 2) stratification: improved ability to discern the quality of an applicant; and 3) validity: high consistency in inter-rater interpretation and intrarater reliability. Since the SLOR is a new addition to the dermatology application process, it is essential to evaluate whether it has met its initial goals.
The primary aim of this study is to compare the SLOR and NLOR to determine if the SLOR meets these three criteria. A secondary goal is to assess the amount of information provided by each type of LOR.


Study design. This was a retrospective examination of letters of recommendation from the 2014–2015 cycle. All letters of recommendation received by the Ohio State Division of Dermatology for the 2014–2015 cycle were included, provided there was at least one narrative and one standardized letter for the same applicant. All letters were de-identified by a physician not involved in the interpretation, and each SLOR/NLOR pair was assigned a number. There were 46 pairs of SLORs/NLORs; however, two pairs written by the same author were excluded due to excessively generic narrative letters (serving only as a supplement). Six dermatology faculty members and two dermatology chief residents completed the LOR interpretations. Each interpreter was assigned a number to ensure anonymity. Each interpreter evaluated all letters of one type consecutively before evaluating the corresponding letters in the other format. The order of evaluation was randomly assigned.

Each LOR was evaluated based on a seven-question survey, which was developed based on the goals of the SLOR and previous literature[10],[11] (see Supplement 1, Supplement 1 Part 2, Supplement 2, and Supplement 2 Part 2). Each reader evaluated the LOR by determining how strongly he/she would rank the applicant based on the LOR, the difficulty or ease in determining the strength of the LOR, the amount of time and effort required to evaluate the letter, and the perceived accuracy versus exaggeration of the letter. All survey questions were based on a seven-point Likert scale as utilized in previous literature.[10],[11] Letter analysis and survey completion were timed for each letter. To determine intrarater reliability, five readers repeated the complete analysis of all LORs two months after completing the original evaluation.

Statistical analysis. Time and ease of interpretation. Interpretation time was calculated by subtracting consecutive timestamps recorded at the completion of each letter. The interpreters were given strict instructions to complete letters consecutively and, if a stoppage was needed, to make a false entry to restart the timestamps. The initial letter was thus not evaluable. Distributions were then analyzed for each letter/interpreter combination. Outliers based on time of completion were removed from the data to prevent unavoidable distractions from affecting the overall results. The times to interpret and answer letters were then analyzed using a paired two-tailed t-test. A Spearman ρ was calculated to compare the subjective effort to interpret each letter with the actual time involved. A χ² analysis was performed to evaluate differences in ordinal values in determining strength of recommendation, overall ease of letter interpretation, and perception of letter inflation. Reliability analysis. Inter-rater reliability was analyzed using the Kendall coefficient of concordance calculated with ties (W). These were calculated separately for each question. Intrarater reliability was analyzed using the Spearman rank correlation coefficient (ρ) between each reader’s first and second interpretations, and the mean across the five readers was reported.
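The timing procedure described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `letter_durations` and `paired_t`, the input layout, and the use of `None` to mark a false entry are all assumptions made for illustration. Outlier removal is omitted because the study does not state the rule that was used, and only the paired t statistic and its degrees of freedom are computed (a p-value would additionally require the t distribution).

```python
import math
from statistics import mean, stdev


def letter_durations(entries):
    """Turn submission timestamps into per-letter interpretation times.

    entries: list of (timestamp_seconds, letter_id) in submission order.
    letter_id is None for a false entry made only to restart the clock
    (hypothetical encoding, assumed for this sketch).
    """
    durations = {}
    previous = None
    for stamp, letter_id in entries:
        # The first entry has no preceding timestamp, so it is not evaluable.
        # A false entry yields no duration but resets the reference time.
        if previous is not None and letter_id is not None:
            durations[letter_id] = stamp - previous
        previous = stamp
    return durations


def paired_t(nlor_seconds, slor_seconds):
    """Paired t statistic and degrees of freedom for matched letter times."""
    diffs = [a - b for a, b in zip(nlor_seconds, slor_seconds)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

In this sketch, a reader who pauses after one letter submits a false entry before resuming, so the break is never counted toward the next letter's time.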


Time and ease of interpretation. There was a statistically reliable difference in time to interpret the corresponding letters (Figure 1). On average, NLOR required more than two minutes to interpret compared to one minute for the corresponding SLOR (p<0.0001). Subjective effort to complete the interpretations had a linear correlation with the actual time spent in evaluation and was significant both by analysis of variance (ANOVA) and by a Spearman ρ=0.688, p<0.0001. A χ² analysis indicated the SLOR was more easily evaluated in comparison to the NLOR (p<0.0001). Similarly, the SLOR was more easily interpreted from a global standpoint (p<0.0001), and the applicants’ positive traits were felt to be less exaggerated (p<0.0001).

Reliability analysis. Inter-rater reliability was generally higher for standardized letters of recommendation than for narrative letters by the Kendall W, and similar findings were present when analyzing intrarater reliability using the averaged Spearman ρ (Table 1).
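For readers unfamiliar with the inter-rater statistic, the following Python sketch shows one standard way to compute Kendall's W with a correction for tied scores, which arise often on a seven-point scale. It is an illustration under stated assumptions rather than the authors' code: the function names `average_ranks` and `kendalls_w` and the input layout (one list of scores per rater, all covering the same letters in the same order) are hypothetical. The formula used is W = 12S / (m²(n³ − n) − mΣT), where m is the number of raters, n the number of letters, S the sum of squared deviations of the per-letter rank sums from their mean, and T = Σ(t³ − t) over each rater's groups of tied scores.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n, giving tied scores the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's coefficient of concordance with tie correction.

    ratings: one list per rater, each holding that rater's scores
    for the same n letters in the same order.
    """
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie term: sum of (t^3 - t) over every group of tied scores, per rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater ties every letter")
    return 12 * s / denom
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings), so a higher value for one letter format indicates that readers ordered applicants more consistently when using it.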

Information provided. There was a statistically reliable difference in amount of information regarding the writer’s background or the relationship of the writer to the applicant. On average, SLOR had 4.4 pieces of information while NLOR had 2.3 pieces of information regarding the writer’s background or writer-applicant relationship (p<0.0001).


Letters of recommendation are deemed an important part of the application process[1],[12–14]; however, research suggests numerous problems are associated with the NLORs. In light of these NLOR weaknesses, nearly 80 percent of a group of academic dermatologists were in support or possibly in support of developing a SLOR.[2]

The authors’ study indicates that the new dermatology SLOR is significantly more efficient than the NLOR. Each fall, dermatology programs are inundated with hundreds of applications. Saving over a minute per letter has the potential to save evaluators hours of time. This mirrors the efficiency demonstrated in evaluations of the emergency medicine (EM) SLOR[10] and the otolaryngology SLOR.[11] Additionally, although not addressed in the authors’ study, research has shown that SLOR also have a shorter composition time compared to NLOR,[11] which would further improve the efficiency of the application process.

The authors’ study suggests that reviewers deem applicants’ positive traits to be less exaggerated with the SLOR than with the NLOR. Academic dermatologists are often unwilling to address an applicant’s negative qualities,[2] and previous research has found that NLOR contain only positive feedback about an applicant.[8] Therefore, the authors postulate that NLOR may seem more laudatory because the writer dwells only on the applicant’s positive traits, while weaknesses are ignored. In contrast, the SLOR forces the writer to evaluate the entire applicant. However, the authors understand that the potential to exaggerate an applicant’s positive traits on a SLOR still exists, and research on the EM SLOR demonstrates that grade inflation is a problem.[15],[16] Interestingly, the EM task force suggests that grade inflation could be minimized if the word limit on the small narrative section of the EM SLOR were followed.[15] The goal of the written section on the EM SLOR is to concisely address any superlative or negative ratings in the application. Unfortunately, with nearly half of EM SLOR having written comments over the 200-word limit, this section currently falls prey to the same problems as NLOR.[15] Given this research, the dermatology SLOR was structured to include only a small written section; however, the impact of this section on letter interpretation is yet to be determined.

The authors also found that SLOR have higher inter-rater and intrarater reliability than NLOR, which is consistent with previous research.[10],[11] NLOR have worse reliability likely because of varying interpretations of select phrases and “code words” (e.g., “excellent” versus “outstanding” versus “superb”).[8],[9] Additionally, research has demonstrated significant differences between female and male letter writers. Female letter writers are more likely to focus on the applicant’s compassion and ability to work as part of a team.[8] The authors also postulate that readers get fatigued after reading large numbers of lengthy NLOR and this could affect letter interpretation.

Lastly, previous research indicates that the background of the letter writer as well as the length and type of relationship between the writer and applicant is extremely important in evaluating LOR.[2] Surprisingly, NLOR contain only an average of 2.3 pieces of information regarding the background of the letter writer or about the writer-applicant relationship. It is plausible that writers who do not know the applicant well forgo including information about the writer-applicant connection so attention is not drawn to the weak relationship. In contrast, the SLOR’s structured questions provide the reader more information about the writer-applicant relationship and writer’s background, which provides a better framework to understand the recommendation.

Limitations. The authors’ study has some limitations. First, this study was conducted at a single institution with a small number of faculty and therefore may not represent all dermatology programs. Secondly, this study incorporates data from highly experienced letter readers (more than 10 years of experience) to less experienced readers (first year of experience). Although this diversity of experience may mimic the reality in many programs, it may affect how much time was spent on the letters and how they were analyzed. Finally, a limitation of the SLOR itself is that many programs may be reluctant to implement this format. Because the SLOR requires evaluation of numerous qualities, it may hinder programs’ weaker applicants.


Based on this study, the SLOR appears to be meeting its initial goals of 1) efficiency, 2) ability to stratify applicants, and 3) validity. However, given that this is a small study, larger studies will be needed to further analyze the SLOR as well as the impact it has on the dermatology application process.


