How Exams Are Scored

The American Board of Internal Medicine (ABIM) provides clear, detailed score reports to help physicians interpret their assessment results, identify strengths and weaknesses, and support continued learning and improvement.

A Message Regarding Changes in Medicine and Test Questions

The American Board of Internal Medicine (ABIM) is aware that, on occasion, for a small number of questions, changes in medicine (e.g., new practice guidelines) occur late in the examination publishing process and may alter what was previously the correct answer.

Do your best to answer all questions according to your understanding of current clinical principles and practice.

If ABIM determines that new information has changed the intended correct answer, so that there is no longer a single best response, that question will not be counted in the overall score.
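As a rough illustration of what "not counted" means for a raw score, the sketch below drops a flagged question for every examinee before counting correct answers. The function name, question IDs, and answer key are hypothetical and are not taken from ABIM's scoring system.

```python
def count_correct(responses, answer_key, excluded=frozenset()):
    """Return (number correct, number of questions counted).

    Question IDs listed in `excluded` are skipped entirely, so they
    neither add to nor subtract from the examinee's result.
    """
    counted = [qid for qid in answer_key if qid not in excluded]
    correct = sum(1 for qid in counted if responses.get(qid) == answer_key[qid])
    return correct, len(counted)


# Hypothetical four-question form and one examinee's responses.
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
responses = {"q1": "A", "q2": "C", "q3": "D", "q4": "D"}

# Suppose q3 no longer has a single best response after a guideline change.
before_review = count_correct(responses, answer_key)
after_review = count_correct(responses, answer_key, excluded={"q3"})
```

In this example the examinee answered q3 "incorrectly" under the old key, so removing it changes the result from 3 of 4 to 3 of 3.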

ABIM distributes assessment results in an electronic format. The score report was redesigned in collaboration with ABIM Board Certified physicians across various specialties. The result is a user-friendly report that describes assessment performance in detail and in a timely manner.

ABIM uses an Automated Test Assembly (ATA) program to build its assessments. This program ensures a fair balance of content on each examination form, so that each form reflects the distribution of items specified by the blueprint and other content criteria. ATA also applies statistical criteria to ensure that examination forms are comparable in difficulty and other statistical properties. Each form is built from the items that best meet the content and statistical criteria.

Your performance on the entire examination determines your examination pass/fail decision. Overall performance is reported on a standardized score scale ranging from 200 to 800, with a mean of 500, for traditional assessments. Candidates with equal ability will achieve the same standardized score.

To pass the examination, your standardized score must equal or exceed the standardized passing score. The passing score, or standard, for each ABIM assessment is established using standard-setting techniques that follow best practices in assessment. The standard for an assessment is based on a specified level of mastery of content in the specialty area. Therefore, no predetermined percentage of examinees will pass or fail the assessment.
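The pass/fail rule can be sketched in a few lines. Only the 200 to 800 range, the mean of 500, and the "equal or exceed" rule come from the text above. The linear conversion, the scale standard deviation of 100, and the passing score of 366 are assumptions made purely for illustration and do not describe ABIM's actual scaling procedure.

```python
def to_standardized(z, mean=500.0, sd=100.0, lo=200, hi=800):
    """Map a z-like ability estimate onto a bounded reporting scale.

    The linear form and sd=100 are assumptions for illustration only.
    """
    scaled = round(mean + sd * z)
    return max(lo, min(hi, scaled))


def passes(standardized_score, passing_score):
    """An examinee passes when the score equals or exceeds the standard."""
    return standardized_score >= passing_score


HYPOTHETICAL_PASSING_SCORE = 366  # made-up value, not an ABIM standard

average_examinee = to_standardized(0.0)
strong_examinee = to_standardized(3.5)   # capped at the top of the scale
weak_examinee = to_standardized(-4.0)    # floored at the bottom of the scale
```

Because the comparison is against a fixed standard rather than against other examinees, every examinee in a cohort could pass, or none could.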

ABIM uses the Angoff method to gather insight from practicing physicians about what an assessment's standard should be. This evidence-based method asks the physician content experts to conceptualize and estimate what a specialist who is just barely qualified to merit or maintain certification would be able to do. For each test question, the content expert is asked, “What is the probability that this type of physician will correctly answer this question?” These judgments are systematically combined to derive a content-based recommended passing standard.
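One common way to combine Angoff judgments is to average the experts' probability estimates for each question and then sum those averages across questions. The sketch below assumes that simple averaging rule. The text does not say exactly how ABIM combines the judgments, and the ratings shown are invented.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Combine Angoff ratings into a recommended raw passing score.

    ratings[j][i] is expert j's estimated probability that a
    just-qualified physician answers question i correctly.
    Returns (expected number correct, per-question mean probabilities).
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every expert must rate every question")
    if any(not 0.0 <= p <= 1.0 for row in ratings for p in row):
        raise ValueError("ratings must be probabilities between 0 and 1")
    item_means = [mean(row[i] for row in ratings) for i in range(n_items)]
    return sum(item_means), item_means


# Invented ratings: three experts, four questions.
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 1.0],
]
cut_score, item_means = angoff_cut_score(ratings)
cut_fraction = cut_score / len(item_means)
```

With these invented ratings the recommended standard is about 2.8 of 4 questions, or roughly 70% correct. This recommendation then informs, but does not dictate, the final standard.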

Informed by the results of the Angoff method, the standard for each assessment is set by the designated ABIM Specialty Board or Advisory Committee. Members of the Specialty Boards and Advisory Committees are nationally recognized specialists whose combined expertise encompasses the breadth of clinical knowledge in the specialty area. Physician members include both clinical educators and practitioners, incorporating the perspectives of both the training and practice environments. While the majority of members are practicing physicians, Specialty Boards and Advisory Committees also include interprofessional healthcare team members as well as patient representatives. In setting the passing standard, the Specialty Board or Advisory Committee members consider several factors, including relevant changes to the knowledge base of the field as well as changes in the characteristics of minimally qualified candidates for certification.

Following best practices in assessment, standards are periodically reviewed and updated using the process described above. This allows ABIM to ensure that standards reflect appropriate and current expectations for examinee performance in a given discipline.

The ABIM score report includes reference group information to help physicians interpret their results. The reference group is defined as the group of first-time examinees who completed a similar assessment during the current or a previous administration. Typically, the reference group on the ABIM score reports includes first-time takers of the assessment from the current administration as well as recent prior administrations.

The rationale for including a reference group is to provide stability when making comparisons with the performance of other examinees. Since the number of first-time takers completing the assessment during a given administration may be small, the reference group comprising first-time takers from multiple administrations is used in the score report in order to compare your performance with a more representative cohort.

Content area scores provide feedback on your relative strengths and weaknesses in the content domains of your specialty. They are reported in standard deviation units and are on a different scale than your overall score. Therefore, these scores cannot be compared directly to your overall score.

Content area scores are calculated from fewer questions than the overall score, so they are less exact and less reproducible. As a result, the standard error of measurement for the medical content areas is much larger than that of the overall test score. This lower reproducibility limits the degree to which you can generalize from your performance on a content area to your specific strengths and weaknesses. For these reasons, you should be cautious in interpreting the content area scores that appear in your report.
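The link between the number of questions and measurement error can be illustrated with two classical test theory formulas: the Spearman-Brown projection of reliability to a different test length, and SEM = SD × sqrt(1 − reliability). Applying them here is an assumption made for illustration. The reliability of 0.90, the 240-question exam, and the 24-question content area are invented numbers, not ABIM figures.

```python
import math


def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


def standard_error(score_sd, reliability):
    """Classical standard error of measurement."""
    return score_sd * math.sqrt(1.0 - reliability)


# Invented values: a 240-question exam with reliability 0.90,
# and a content area made up of 24 of those questions.
full_reliability = 0.90
area_fraction = 24 / 240

area_reliability = spearman_brown(full_reliability, area_fraction)

# Express both errors in standard deviation units (score_sd = 1.0).
full_sem = standard_error(1.0, full_reliability)
area_sem = standard_error(1.0, area_reliability)
```

Under these invented numbers the content area's reliability falls to about 0.47, and its standard error is roughly 0.73 standard deviations compared with about 0.32 for the full exam.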

Because each content area has fewer questions, the classical percent correct method is not used for reporting content area performance. Instead, content area scores are calculated using the Empirical Bayes (EB) method, which is consistent with the procedures currently in place for estimating overall scores and yields more reliable scores than the classical percent correct method. The EB method incorporates ancillary information to enhance score precision, something that the percent correct method does not do. This has the effect of making each candidate's content area score profile more homogeneous and less susceptible to irregularities associated with small numbers of items.

In addition, the scaled scores resulting from the EB procedure are not as test- and sample-dependent as percent correct scores and deciles. Percent correct scores depend on the specific items that were administered, and deciles depend on the group that took the assessment. Although the EB procedures do rely on subscore reliability estimates and inter-subdomain correlation estimates, which depend in part on the exam and the administration group, the EB scores are not as test- or sample-dependent as the classical methods.
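The core idea of an Empirical Bayes subscore is shrinkage: an unreliable observed score is pulled toward the group mean in proportion to its unreliability. The sketch below uses the simplest univariate form (often called Kelley's regressed estimate) as a stand-in. It ignores the inter-subdomain correlations mentioned above, so it is a simplification and not ABIM's actual procedure. All values are invented.

```python
def shrink(observed, group_mean, reliability):
    """Pull an observed subscore toward the group mean.

    reliability = 1.0 keeps the observed score unchanged;
    reliability = 0.0 replaces it with the group mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


# Invented content area scores in standard deviation units.
observed_profile = [-1.5, 0.2, 1.8]
group_mean = 0.0
subscore_reliability = 0.5  # invented value

eb_profile = [
    shrink(score, group_mean, subscore_reliability)
    for score in observed_profile
]

observed_spread = max(observed_profile) - min(observed_profile)
eb_spread = max(eb_profile) - min(eb_profile)
```

The shrunken profile spans a narrower range than the observed one, which is the sense in which the EB profile is more homogeneous and less sensitive to chance results on a handful of questions.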

An assessment blueprint is a table of specifications that defines the content of each assessment. It is developed by the Approval Committee and reviewed annually. The assessment blueprint is based on analyses of current practices and an understanding of the relative importance of the clinical problems in the specialty area. The assessment blueprints at the primary level for internal medicine and each subspecialty are published on the ABIM website.