THE CRUCIBLE STORY
The Crucible solution began in 2007 as a consultation-based service focusing on the convergence of business process and technology in the high-stakes, human-scored, clinical assessment space within the dental licensure industry. Several years of consultation fostered the creation of a turnkey hardware-and-software solution to digitally deliver legally defensible clinical exams.
As word spread of Crucible’s delivery capabilities, other industries turned to BrightLink for help with their human-scored performance assessments, including practical exams, structured oral interviews, essay marking, and clinicals. Many have discovered the administrative and cost-cutting benefits of Crucible’s five primary purposes.
1. Reduce cognitive fatigue and its subsequent human errors.
2. Enforce exam policy for a more objective, bias-free, legally-defensible exam.
3. Calibrate examiners to the exam through real-time psychometric analysis and adjustment.
4. Decrease operational overhead and the administrative burden of manual processes.
5. Get scoring results faster.
WHAT WE SOLVE
Credentialing organizations hire graders, examiners, evaluators, and raters for their expertise. Yet paper-based testing methods and processes tax them with unnecessary administrative burden. Fatigued minds and tired eyes naturally produce human errors that complicate exam delivery, marking, and results tabulation. Crucible frees examiners to be the experts they were hired to be during exam sessions.
The data quality of paper-based exams is problematic due to one inescapable glitch: human error.
- How often do you find that you are missing ratings and scores?
- Could you estimate how much it costs per exam to research errors and resolve them with accurate corrections?
- On a scale of 1 to 5 (1 = ambiguous & 5 = perfect), how would you rate those correction efforts and findings?
- Do you have errors during your checklist or mark sheet scans due to unclear or ambiguous information?
Crucible removes this glitch, improving exam delivery through reduction and elimination of:
- Common Mark Sheet Errors
  - Illegible handwriting
  - Missing information: examiner marks, scores, and signatures; student or candidate names and ID numbers
  - Lost assessment sheets
  - Incomplete scans and recordings
  - Rater effects (e.g., leniency/harshness)
- Common Scoring Calculation Errors
  - Transcription errors
- Common Database Entry Errors
  - Transposed numbers and form lines
  - Misspelled names
  - Transcription errors
Crucible reduces exam operations process threats that lead to increased legal risk, overhead expenditures, and operational burden.
Standardization and training of examiners are crucial to ensure the validity, reliability, and legal defensibility of an assessment. Crucible equips exam staff to perform examiner calibration, ensuring examiners are trained and aligned with the assessment’s scoring rubric.
Real-time psychometric analysis and adjustment in paper-based systems is never a reality. Any opportunity to discover errors comes only after the results are tabulated. Crucible offers real-time psychometric analysis by:
- Capturing all tablet marks live, in real time, and in the order they occurred, with precise timing of each mark.
- Enabling graders and examiners to see grading decisions and markings live, in real time.
- Identifying operational errors that are ever-present in paper-based exam delivery.
- Identifying and addressing existing biases, as well as those that would otherwise emerge.
- Grading and recalibrating the examiner, exam, and exam delivery process simultaneously.
With Crucible’s technology, the psychometric reliability, objectivity, and legal defensibility of each exam item, score, session, and training all improve with each subsequent exam delivery.
The long duration of exam planning, administration, facilitation, delivery and follow up is another inescapable challenge. For example:
- How long does it take to scan checklists or mark sheets?
- How many personnel are utilized for an exam session?
- How many pieces of equipment are required?
- Approximately how long does it take to process exam results?
In addition, there is often little flexibility for tests that are distributed across multiple locations.
- How many checklists or mark sheets must be mailed?
- How many pieces of paper are mailed to each location?
- How much does this shipping cost?
- What security concerns are present if the checklists or mark sheets arrive too early? How do you prevent this?
- What do you do if you need to make last minute changes to your checklists or mark sheets?
- What do you do if you need to make short-term, temporary changes?
Consider a single OSCE with 16 stations and 20 candidates.
- Such an event can produce 320 sheets of paper alone.
- All 320 sheets of paper must be handled multiple times to print, collate, package, ship, unpackage, distribute, collect, scan, record, research, etc.
- Manual scoring on the paper mark sheet further wastes time, which wastes labor funds.
- Labor costs climb further when it comes time to discover and resolve errors.
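The paper arithmetic above is straightforward to sketch; the eight handling steps per sheet are an illustrative assumption drawn from the list of manual tasks:

```python
# Back-of-the-envelope paper volume for a single OSCE event.
def paper_volume(stations: int, candidates: int, handling_steps: int = 8) -> tuple[int, int]:
    sheets = stations * candidates      # one mark sheet per station, per candidate
    touches = sheets * handling_steps   # print, collate, package, ship, ...
    return sheets, touches

sheets, touches = paper_volume(stations=16, candidates=20)
print(sheets, touches)  # 320 sheets, 2560 manual handling events
```

Every extra station or candidate multiplies the handling burden, which is why the overhead grows so quickly for larger events.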
In short, paper requires massive administrative overhead. What is more, most paper-based exams are created for grading by machines, not humans. Filling in bubbles takes a degree of precision that requires additional time. Scanning mark sheets introduces the risk of misreads, misfeeds, and destroyed exam forms. Crucible solves the operational and logistical hindrances presented by paper-based, manual scoring.
TURNKEY SETUP & DELIVERY
Crucible equips humans to grade what computers cannot in an all-in-one, turnkey, integrated system with a simplified, out-of-the-box experience.
HERE’S HOW IT WORKS
Turnkey SETUP: 6 Simple Steps
Turnkey DELIVERY: 6 Simple Steps
ASSESSMENT APPLICATIONS & TYPES
What is a human-scored item? Any assessment or exam item that cannot typically be graded by a computer but must be observed and scored by a human.
Most knowledge-based exams or assessments are delivered via computer-based testing (CBT), often in a multiple choice question (MCQ) format. However, many credentialing organizations require a practical exam or performance assessment. Such exams are also called observational assessments, since a grader or examiner observes the candidate performing a specific activity and assesses the performance.
Human-scored items often include a candidate’s:
- Knowledge of an activity.
- Application of their knowledge.
- Task, process, or procedural performance.
- Ability to demonstrate their skill.
Crucible delivers assessments in one-to-one (1:1) and one-to-many (1:N) environments.
Whether one examiner is paired with one candidate or with many, delivery includes the following observational assessment scenarios.
- OSCEs & Clinical Skills / Regulatory Licensure and Medical Education.
- Practical Exams, Performance Items
- Essay Marking, Written Case Studies, Written Case Presentations
- Structured Oral Interviews, Oral Presentations, Oral Practicals
- Equipment Operation
- Safety Procedures & Protocols
- Hands-On Tests / Job Skills
- Activities Observed & Scored During the Exam Application Process
- HR / Interviews
- Performance Change Management
- Testing Events: exams that are offered at scheduled times across one or more specific testing locations. This framework often includes logistics planning for candidates, examiners, proctors, standardized patients, hotel accommodations, meals, etc. Crucible Scheduler may be coupled with Crucible to manage the entire event from start to finish.
- Continuous Testing: exams that are offered on an ongoing basis, throughout the year, at either specific first-party testing sites or third-party exam centers.
Assessment schemas refer to the scoring formats within an exam rubric. Crucible is equipped with the following schemas:
- Single choice
- Multiple choice
- Dichotomous IRT
- Polytomous IRT
- Partial credit
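As a rough illustration of the difference between the dichotomous and polytomous schemas listed above, here is a minimal sketch of the Rasch model and the partial credit model; the function names and parameterization are assumptions for illustration, not Crucible’s API:

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability of a correct (1) response
    given candidate ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def partial_credit_probs(theta: float, thresholds: list[float]) -> list[float]:
    """Polytomous partial-credit model: probability of each score
    category 0..m given ability theta and step thresholds d_1..d_m."""
    numerators = [1.0]   # category 0 has an empty cumulative sum
    total = 0.0
    for d in thresholds:
        total += theta - d
        numerators.append(math.exp(total))
    denom = sum(numerators)
    return [n / denom for n in numerators]

# A candidate of average ability on an average-difficulty item:
print(round(rasch_prob(0.0, 0.0), 2))  # 0.5
print([round(p, 2) for p in partial_credit_probs(0.0, [-1.0, 1.0])])
```

In the dichotomous case the item is simply right or wrong; in the partial-credit case each rubric step has its own threshold, so intermediate performance earns intermediate scores.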
For a decade, Crucible has proven its international experience in exam delivery across regulated industries.
Our clients use Crucible to deliver a variety of human-scored exams including:
- OSCEs & Clinical Skills
- Practical & Performance Items
- Essay Marking, Written Case Studies & Presentations
- Structured Oral Interviews, Presentations, and Practicals
- Equipment Operation, Safety Procedures and Protocols
- HR Interviews & Performance Change Management
- Hands-On Assessments for Exam Application Process
Crucible is useful in a variety of for-profit and non-profit environments and contexts, including:
- Regulatory Licensure
- Medical Education
- K-12 / Teachers & Students
- Financial / Banking
- Environmental / Safety
- Construction / Design
- Hands-On Tests / Job Skills
- Human Resources / Application Eligibility
DELIVERY & DEPLOYMENT
Crucible-driven assessments are delivered digitally via tablets and desktop computers in one of three ways:
3 Ways Crucible is Deployed
To resolve security requirements and/or a lack of internet connectivity, Crucible may be deployed globally, wherever there is power, with a decentralized experience via the Crucible Local Kit.
- The Crucible Local Kit provides secure exam delivery without existing network or internet infrastructure.
- This option may also be utilized with existing network infrastructure.
- Each Crucible Local Kit is a standalone, all-in-one, turnkey kit, including examiner tablets preconfigured with the Crucible app, wireless router/access point, repeaters/range extenders, and server.
- Exam results export is required.
To resolve network infrastructure challenges when delivering concurrently across multiple locations, Crucible may be deployed with a centralized experience via your own Crucible online server.
- The Cloud+Network kit provides secure exam delivery via the Crucible app and preconfigured tablets, a cellular modem/access point (4G/LTE), a cellular antenna, and a wireless router antenna.
- No export of exam data is required as results are securely stored in your own cloud instance.
For organizations desiring to use their own existing network and internet infrastructure, Crucible may be deployed…
- For centralized, secure exam delivery via the Crucible app and preconfigured tablets.
- Connecting to your own Crucible online server instance.
- With no export of exam data required as results are securely stored in your own cloud instance.
2 Questions to Discover Which Deployment is Right For You
There are two questions to ask to discover which method of deployment is right for your organization and context.
1. Where will your exams be hosted?
- Local: host your exams on the preconfigured server included in the Crucible kit.
- Cloud-based: host your exams on our online infrastructure.
2. Which network will you use to access Crucible?
- Our network: Crucible will be accessible via the preconfigured network included in the Crucible kit.
- Your network: Crucible will be accessible via the Crucible app, which will preconfigure tablets to work with your network.
Human-scored assessments include practical exams, clinical exams, performance assessments, and typically any other type of exam that computers cannot grade. As such, they can be every bit as complex as the humans who are grading and being graded.
Crucible arose out of the exam complexities of psychometrics, subjectivity, licensure, legal defensibility, and analytical reliability. Born from the convergence of multiple streams of scientific research, technology expertise, and industry best practices, Crucible sits at the intersection of human-scored assessments, software development, DevOps, and business operations. With Crucible, BrightLink helps credentialing organizations direct the traffic flow of candidates and exams hundreds of thousands of times each year across a variety of industries. Here’s how it works.
Crucible manages all aspects of performance examination (OSCE, Clinical Skills, Observational, Oral) delivery in a single integrated system.
- Exam administrators use it to dispatch candidates and exam resources.
- Chief examiners use Crucible to calibrate examiner grading so that performance exams are more fair and consistent, by aligning and cross-checking examiners’ grading styles.
- Psychometricians use Crucible to analyze the performance of exams and examiners so that calibration can happen simultaneously with the exam event.
- Analysts use Crucible during and after exam sessions to improve processes for greater accuracy, efficiency and risk reduction.
- Operations teams use Crucible Scheduling to organize, administer, and communicate the most important details of the exam session to examiners, proctors, graders, and volunteers.
- Before delivering a live practical assessment, individual examiners (or a group of examiners) review training material to gain an understanding of the grading rubric nuance(s).
- Within 1 week of the assessment, the Crucible calibration module prompts the examiners to view the practical element(s) involved in the assessment using the defined rubric.
- When completed on an individual basis, examiners are given a report of their performance according to predetermined standardized responses. If participating in a group setting, the exam administrator discusses and rates their performance according to the rubric.
- Ratings are analyzed for competency and trends among examiners prior to commencing the live examination.
The use of this module may be toggled on or off. When toggled on, Crucible Examiner Calibration requires examiners to be calibrated before they are permitted to rate live assessments; toggled off, it may be used for informational purposes only.
Beyond training and aligning examiners to the assessment, value is added by verifying that examiners comprehend the scoring criteria well enough to be held accountable. Crucible simulates the scoring task in a real-world scenario without actually grading the candidate. This activity yields empirical evidence that the examiner has been trained and calibrated to an acceptable level before delivering the actual assessment.
Guidance for this method can be found in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 2014). Programs use these types of activities when facing the need to integrate and deliver human scored assessments along with the decisions candidates receive.
- Statistical calculations on large candidate pools. With enough data over time, Crucible identifies trends by computing means, standard deviations, and other statistical measures. This information assists in predicting the curve of an examiner’s performance and highlights which examiners require additional training or recalibration.
- Interface usage. Crucible identifies an examiner’s marking decisions, speed, changes, sequence, and other metrics, based on empirical correlation with existing reliability measures.
- Test-to-test changes in candidate performance. Crucible assists in defining a baseline for candidates’ retest performance improvements, and then measuring differences among examiners.
- Calibration test outcome for the examiner. See section on Examiner Calibration.
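A minimal sketch of the trend analysis in the first bullet above, assuming a simple z-score rule on each examiner’s mean awarded score; the 1.5-standard-deviation threshold is an illustrative choice, not a Crucible default:

```python
import statistics

def flag_examiners(scores_by_examiner: dict[str, list[float]],
                   z_limit: float = 1.5) -> list[str]:
    """Return examiners whose mean awarded score drifts far from the
    pool, flagging them for additional training or recalibration."""
    means = {ex: statistics.mean(s) for ex, s in scores_by_examiner.items()}
    pool_mean = statistics.mean(means.values())
    pool_sd = statistics.stdev(means.values())
    return [ex for ex, m in means.items()
            if abs(m - pool_mean) > z_limit * pool_sd]
```

With enough exam sessions, the same pattern extends to per-item and per-station means, and to other measures such as marking speed or change frequency.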
- Prediction Bias
- Construct Bias
- Methods Bias
- Selection Bias
- Halo Bias
- Framing Effect
- Recency Bias
- Primacy Bias
- Availability Bias
- Bandwagon Effect
- Omission Bias
- Congruence Bias
- Confirmation Bias
- Contrast Effect
- Von Restorff Effect
Crucible can deliver assessments with tablets that enforce a minimum set of processes for the examiner. The creation of constructs irrelevant to the actual assessment is thereby prevented. Consequently, the impact of these biases is significantly reduced, contributing to the highest standards of objectivity, accuracy, and defensibility.
- Miller’s Law of Short-Term Memory Load. The human brain can hold only a limited number of distinct “chunks” of information at once (roughly seven, plus or minus two): about seven digits, six letters, or five words. Crucible works with the human brain by displaying assessment details in optimized chunks for human recall and interaction.
- Fitts’ Law. The human effort associated with pointing and tapping speed, target distance based on screen size, as well as target (button) size and tapping accuracy are crucial when developing a user interface for repetitive use. Crucible implements these principles into assessment delivery offering higher indices of performance for a faster, better grading experience.
- Hick-Hyman Law. The more choices a user is presented with, the longer it will take them to make a decision. Crucible presents only the most common functionalities a grader uses when delivering an assessment, and guides them to those functionalities in consistent ways.
- Pareto and Zipf Laws. Zipf’s law describes phenomena in which certain types of events are very frequent while others are rare. Pareto’s principle indicates that 20% of the content does 80% of the work. Crucible’s features were designed to include only what is necessary and most familiar to deliver an assessment.
- Nielsen’s Response-Time Limits. Human response times to digital information, and a computer’s subsequent responsiveness, guide the design of an interface. Crucible aims to keep responsiveness within the standard 0.1-to-10-second limits in order to maintain the examiner’s attention and reduce conscious annoyance with system delays.
- Gestalt Laws and the Law of Prägnanz. As a fundamental principle of human perception, the human eye sees objects in their entirety before perceiving their individual parts. The Crucible interface is designed in keeping with the principles of closure, figure and ground, proximity, continuation, symmetry, and similarity.
- Weber’s Law and Fechner’s Law. Weber’s law relates the smallest noticeable change in a stimulus to the intensity of its background (e.g., speaking loudly in a noisy room; whispering in a quiet one). Fechner’s law, an application of Weber’s, informs the design of Crucible’s user interface features. Attention is given to brightness, line length and thickness, the visual weight of fonts in typography, color matching, etc., in order to maintain consistency and reduce examiner distraction.
Examiners who serve for multiple hours need a solution uniquely designed to fit the demands of the human brain, utilizing the science of visual ergonomics. Crucible introduces a balance of mental comfort and acuity.
- Mental comfort: Crucible delivers a mentally comfortable examiner interaction in order to reduce fatigue and resulting errors.
- Mental acuity: Equally important, Crucible is designed to draw visual focus to the most important details of an evaluation process to deepen and reinforce exam policy.
- Integration: bringing exam scheduling, dispatching and delivery workflows together within a single integrated system.
- Defragmentation: unifying accessibility, information, analysis and reporting from every test center, regardless of global location, into one data source.
- Standardization: enabling repeatability in exam experiences, subsequently increasing fairness.
- Risk Mitigation: creating an “assessment blueprint” to reduce potential legal risks and appeals involved in granting certification or licensure.
- Standalone: each Crucible kit operates on its own self-contained, isolated, encrypted network.
- Preconfigured: the Crucible app is configured to communicate ONLY with the Crucible server.
- Live Tracking: Crucible logs every human interaction with the device, application, mark sheet, and notations including precise timing/action log for analysis and reference.
- Authorization: Test security is integrated within Crucible. Only authorized examiners have access to Crucible assessments. Unauthorized entry attempts are monitored. Only authorized candidates may be assessed.
From the first interaction forward, forensic data is captured and reported to the secure cloud server, allowing assessment staff to monitor, track, and analyze the data through a variety of customizable reports. When combined with proctoring surveillance cameras, the forensic data also makes it possible to identify and manage unethical examiner behavior.
- Missing data: failing to record marks or scores.
- Transference errors: marking the right information in the wrong place, or vice versa.
- Transcription errors: rewriting examiner notes and commentary incorrectly.
- Lack of exam policy/process enforcement: failure to complete certain sections or details of an assessment.
- Lack of real-time analytics: unable to make advancements or changes to the assessment until after completion and assessment stacks are analyzed.
- Lack of instant results: waiting for two to four weeks or more before calculating and releasing assessment scores to candidates.
- Lack of real-time intervention and adjustment: inability to discover and seize opportunities to intervene, recalibrate examiners, adjust items, or remove high-risk candidates.
- Lack of bias insight: inability to identify and address examiner biases.
- Examiner fatigue: introduced through bubble-based assessment forms filled in with pencils.
- Scanning misreads and machine misfeeds: created by manual handling and/or machine malfunction, often resulting in damaged or destroyed mark sheets.
- Security breach risk threats: presented through lack of timely collection and proper storage of paper-based assessments.
With hundreds, if not thousands, of candidates being assessed per week, month, or year, paper-based assessments introduce the likelihood of a significant number of human errors that challenge the validity, reliability, and legal defensibility of the exam itself. Crucible eliminates paper-based assessments, increasing confidence in, and the scalability of, assessments as an organization continues to expand its operations.
- Crucible tablets can mark assessments offline. Once the tablet is within connectivity range again, it will sync with the cloud-based server.
- Crucible can operate with local mobile carriers. This allows tablets to stay connected to the cloud-based server wherever a cellular mobile-wifi access point is available.
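The offline-then-sync behavior above can be sketched as a local queue that is flushed whenever connectivity returns; class and method names here are illustrative assumptions, not Crucible’s actual implementation:

```python
import json
import time
from collections import deque

class OfflineMarkQueue:
    """Buffers examiner marks locally and flushes them to the server
    whenever connectivity returns."""
    def __init__(self, send):
        self.send = send            # callable that uploads one mark; may raise
        self.pending = deque()

    def record(self, examiner_id, candidate_id, item_id, score):
        # Every mark is timestamped at capture, not at upload.
        self.pending.append({
            "examiner": examiner_id, "candidate": candidate_id,
            "item": item_id, "score": score, "captured_at": time.time(),
        })
        self.flush()

    def flush(self):
        while self.pending:
            mark = self.pending[0]
            try:
                self.send(json.dumps(mark))
            except ConnectionError:
                return              # still offline; keep marks queued
            self.pending.popleft()  # uploaded; now safe to drop locally
```

Because a mark is only removed from the local queue after a successful upload, a dropped connection mid-session never loses data; marks simply accumulate until the next flush succeeds.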
There is no limit to the number of Crucible examinations that may be implemented across a region or country.
Crucible was built from consultation around industry best practices on performance assessment delivery and continues to be guided by this consultation today. A constant involvement with industry-leading psychometric, operational, and technological resources and experts continues to ensure that Crucible evolves with the processes and policies required by our high-stakes clients.
Some organizations desire to utilize Crucible’s expertise strictly to consult on human-scored assessment processes, and do so with zero obligation to utilize Crucible technology. Crucible Certitude is our consultation-based service offering a thorough three-step discovery, documentation, and deliverable process to assist organizations seeking to standardize, optimize, or outsource exam needs.