Sharing Results
Career and college development and counseling have a rich history of utilizing various career assessment tools. These include interest inventories, values inventories, skills inventories, self-efficacy assessments, aptitude tests, and job skill evaluations such as those required for a Food Handler’s License or Commercial Driver’s License.
High school counseling teams have long integrated career assessments into their comprehensive school counseling programs. They have administered tools like the General Aptitude Test Battery, Differential Aptitude Test, PSAT, and SAT or ACT to students who registered for them. Additionally, the Armed Services Vocational Aptitude Battery (ASVAB) is available to students who opt to take it. Many counselors and ASPIRE volunteers also utilize the Oregon Career Information System in various schools. Recently, new assessment tools like YouScience, along with career platforms that include assessments such as Xello, Naviance, MajorClarity, MaiaLearning, and SchoolLinks, have become available.
We have already addressed guidelines for the equitable and ethical use of assessment instruments and a guide to understanding interest inventories and aptitude assessment. This section will focus on sharing career assessment information with learners, parents, caregivers, and staff. Please note that this is a condensed version; fully grasping the rationale behind our instructions requires a more in-depth exploration of how assessment instruments are normed and standardized, along with a thorough understanding of the concepts of reliability and validity.
We need to begin this discussion with the norming of assessment instruments. A raw score on most psychological tests is meaningless. To say that an individual has correctly solved 15 problems on a mathematical reasoning test, identified 34 words on a vocabulary test, or assembled a mechanical object in 57 seconds conveys little or no information about the learner's standing in any of these functions. Nor do the familiar percentage scores provide a satisfactory solution to the problem of interpreting test scores. For example, a score of 65% correct on one vocabulary test might be equivalent to 30% correct on another and 80% correct on a third. The difficulty of the items making up each test determines what a given score means. Like all raw scores, percentage scores can be interpreted only in terms of a clearly defined and uniform frame of reference.
Scores on career assessments are typically interpreted by comparing them to norms derived from the performance of a representative norming sample. These norms are established through empirical data that reflect how individuals in the sample perform on the test. An individual's raw score is then compared to this distribution to determine their position within it. Does their score align with the average performance of the norming group? Is it slightly below average, or does it fall closer to the upper end of the distribution?
To ascertain more precisely the individual's position with reference to the norming sample, the raw score is converted into some relative measure. These derived scores are designed to serve a dual purpose:
They indicate the individual's relative standing in the normative sample and thus permit an evaluation of their performance in reference to other persons.
They provide comparable measures that permit a direct comparison of the individual's performance on different tests.
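To make this concrete, here is a minimal sketch of how a raw score might be converted into a derived score. The norm-group mean, standard deviation, and the learner's raw score below are hypothetical numbers chosen for illustration, not values from any real test manual.

```python
from statistics import NormalDist

# Hypothetical published norms for a vocabulary test (illustrative only)
norm_mean = 42.0   # mean raw score of the norming sample
norm_sd = 8.0      # standard deviation of the norming sample

raw_score = 50     # one learner's raw score (number of items correct)

# z-score: the learner's distance from the norm-group mean,
# expressed in standard deviation units.
z = (raw_score - norm_mean) / norm_sd

# Approximate percentile rank, assuming the norm-group scores are
# roughly normally distributed.
percentile = NormalDist().cdf(z) * 100

print(f"z-score: {z:.2f}")                         # 1.00
print(f"percentile rank: about {percentile:.0f}")  # about 84
```

The same raw score compared against a different norming sample would yield a different percentile, which is why the representativeness of the norming group matters so much.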
If your learners' performance is going to be compared to that of the learners in the norming group, then you will want to know whether the norming group is representative of the learners at your school. This is the age-old adage of comparing apples to apples and oranges to oranges. Here is an example:
The College Board, publisher of the SAT, was criticized for how long it took to re-norm the SAT. The SAT was first normed on a sample of 10,000 test takers in 1941-42. The sample consisted of “privileged, white males applying to prestigious New England colleges.” The SAT was not re-normed until 1990, using the 1,052,000 college-bound seniors in that cohort. Consider for a moment how many thousands of students, over a span of almost 50 years, had their performance on the SAT inappropriately compared to that of the select group of 10,000 in the original norming group.
The SAT was most recently re-normed in 2016. Given that the National Center for Education Statistics reported in August 2014 that the U.S. school enrollment became majority-minority, one could contend that the SAT and many other assessment instruments used with K-12 populations should be re-normed during the next few years.
Examine the norming sample to assess its representativeness of your learner population. If it isn’t representative, consider following the example of many schools by developing local norms. This allows for comparisons to a norming group that is more similar to your students.
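If you do develop local norms, the core computation is simply a percentile rank within your own learners' score distribution. Here is a minimal sketch; the scores are made up for illustration.

```python
# Hypothetical raw scores from learners at our own school (the local norming group)
local_scores = [28, 31, 35, 35, 38, 40, 41, 44, 47, 52]

def local_percentile_rank(score, norm_group):
    """Percent of the local norming group scoring at or below the given score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

# A learner with a raw score of 41 sits at roughly the 70th local percentile.
print(local_percentile_rank(41, local_scores))  # 70.0
```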
The concept of reliability is straightforward, so let’s start with a brief overview. Reliability refers to the extent to which assessment results are affected by measurement error. Consider the thousands of students who take the SAT and ACT each year at high schools and universities across the country. Does a single test administration truly reflect a student’s academic aptitude for success in their freshman year of college? Many factors can influence their performance, such as test anxiety, fatigue, health issues, relationship troubles, or even distractions like a homecoming game and dance the night before. Other influences might include the lighting or temperature in the testing room, or noise from a nearby marching band practicing while the SAT is being administered. Even something like forgetting tissues can cause distractions for students nearby.
All of these internal and external factors that may affect performance can make a single assessment less reflective of a student's true level of knowledge. These extraneous influences contribute to measurement error, which skews the obtained score and makes it less representative of the learner's actual knowledge. If we could obtain a score free of measurement error, one that perfectly captured the learner's understanding, it might differ noticeably from the score that was actually recorded.
The American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999) clearly state the impact that measurement error has on our ability to use assessment results:
Measurement error reduces the usefulness of measures. It limits the extent to which test results can be generalized beyond the particulars of a specific application of the measurement process. Therefore, it reduces the confidence that can be placed in any single measurement.
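One way to make measurement error concrete is the standard error of measurement from classical test theory, which turns a test's reported reliability coefficient into a band of plausible scores around the obtained score. The sketch below uses hypothetical numbers; in practice, the standard deviation and reliability coefficient would come from the test manual.

```python
import math

# Hypothetical values for illustration; a real test manual would report these.
test_sd = 100          # standard deviation of scores in the norming sample
reliability = 0.90     # reported reliability coefficient
obtained_score = 540   # one learner's obtained score

# Standard error of measurement: the typical amount an obtained score
# varies around the learner's "true" score due to measurement error alone.
sem = test_sd * math.sqrt(1 - reliability)

# A rough 68% band: the obtained score plus or minus one SEM.
low, high = obtained_score - sem, obtained_score + sem
print(f"SEM: {sem:.1f}")
print(f"The learner's true score plausibly falls between {low:.0f} and {high:.0f}")
```

Reporting a band of scores rather than a single number is one simple way to keep measurement error visible when sharing results.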
Validity
The basic concept of validity begins with recognizing that assessments are “behavior samples” captured at a specific moment in a learner’s life. The scores from these assessments are generally used to diagnose or predict behaviors of interest. Here are some examples:
Depression: To determine if a learner is depressed, they might complete a Depression Inventory, which is a 20-item paper-and-pencil assessment.
Success as a Freshman in College: To assess a learner’s likelihood of succeeding in their freshman year, they could take the SAT, designed to predict college readiness.
Career Choice: To help clients explore potential career paths, they might complete a career interest inventory, such as the Oregon CIS Interest Profiler.
The validity issue we’re exploring is how accurately these assessment instruments diagnose or predict what they are intended to measure. For example:
How accurately does the Depression Inventory determine if an individual is clinically depressed?
How effectively does the SAT predict success during a learner's freshman year in college?
How accurately does the Oregon CIS Interest Profiler suggest suitable career options for a client?
Validity is the psychometric term that refers to the accuracy of these assessments in predicting or diagnosing the behavior of interest.
The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) states,
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing and evaluating tests. The process of validation involves accumulating evidence to provide a sound scientific basis for the proposed score interpretations. It is the interpretations of test scores required by proposed uses that are evaluated, not the test itself.
Validation is the joint responsibility of the test developer and the test user. The test developer is responsible for furnishing relevant evidence and a rationale in support of the intended test use. The test user is ultimately responsible for evaluating the evidence in the particular setting in which the test is to be used. When the use of a test differs from that supported by the test developer, the test user bears special responsibility for validation.
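For predictive uses in particular, validity evidence often takes the form of a correlation between test scores and the later outcome the test is meant to predict, for example SAT scores and freshman grade point average. The sketch below uses small, fabricated numbers purely to illustrate the computation; it is not real SAT research.

```python
from statistics import correlation  # available in Python 3.10+

# Fabricated illustrative data: test scores and the later outcome of interest
sat_scores   = [980, 1050, 1120, 1180, 1230, 1310, 1380, 1450]
freshman_gpa = [2.4, 2.9, 2.7, 3.1, 3.0, 3.4, 3.3, 3.7]

# The validity coefficient here is the Pearson correlation between the
# predictor (the test score) and the criterion (the behavior of interest).
validity_coefficient = correlation(sat_scores, freshman_gpa)
print(f"validity coefficient: {validity_coefficient:.2f}")
```

In practice such coefficients are typically well below 1.0, which is part of why a single test score should never be the only evidence considered.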
Gathering evidence of the validity of interest inventories can be challenging, especially when they are used with learners. Learners complete these inventories to gain insights into their interests and explore potential career paths. The underlying idea is that if learners pursue careers suggested by the interest inventory, they are likely to enjoy those work environments more and achieve greater success in those fields. Nota bene: those who developed the interest inventory did not have a norming group of middle or high school learners complete the inventory and then track them into adulthood to see whether their results accurately predicted the careers they would choose, or whether they were successful in those careers. Rather, the interest inventory is administered to adults in a variety of careers; when a learner’s profile shows high scores in engineering, for example, it means they responded to the inventory in much the same way that adults working in engineering did.
So, given that career assessments have neither perfect reliability (no measurement error) nor perfect validity (the ability to predict with 100% accuracy the career the learner will enjoy most and in which they will experience the most success), how might we share the results?
We suggest that you start by saying something like this to a learner who took an interest inventory: “You took this interest inventory on one day in your life during your ninth-grade year. Had you taken it at another time, the results might be somewhat different. Was there anything going on in your life that day that you can remember that might have affected how you responded?” Perhaps give some examples of internal or external factors that might have influenced their results, then process their response.
“Also remember that this interest inventory is just one indicator of careers that might be good fits for you. Consider also what you know about yourself, such as classes you have taken that you really enjoyed and did well in, or what family members, neighbors, or teachers have shared about their work that interested you as a possibility.

So, let’s look at the results of your interest inventory, and then let’s talk about your thoughts on those results and about what else you know about yourself.”