THE CRUCIBLE STORY
The Crucible solution began in 2007 as a consultation-based service focused on the convergence of business process and technology in the high-stakes, human-scored clinical assessment space within the dental licensure industry. Several years of consultation fostered the idea of creating a turnkey hardware-software solution to digitally deliver legally defensible clinical exams.
As word spread of Crucible’s delivery and capabilities, other industries turned to BrightLink for help with their human-scored performance assessments, including practical exams, structured oral interviews, essay marking, and clinicals. Many have discovered the administrative and cost-cutting benefits of Crucible’s five primary purposes.
1. Reduce cognitive fatigue and its subsequent human errors.
2. Enforce exam policy for a more objective, bias-free, legally-defensible exam.
3. Calibrate examiners to the exam through real-time psychometric analysis and adjustment.
4. Decrease the operational overhead and administrative burden of manual processes.
5. Get scoring results faster.
WHAT WE SOLVE
Credentialing organizations hire graders, examiners, evaluators, and raters for their expertise. Yet paper-based testing methods and processes tax them with an unnecessary administrative burden. Fatigued minds and tired eyes naturally produce human errors which complicate exam delivery, marking, and results tabulation. Crucible frees examiners to be the experts they were hired to be during exam sessions.
Paper-based exams produce an inescapable glitch in the exam delivery process: human error. Crucible removes this glitch, improving exam delivery through reduction and elimination of:
- Common Mark Sheet Errors:
  - Illegible handwriting
  - Missing information: examiner marks, scores, and signatures; student or candidate names and ID numbers
  - Lost assessment sheets
  - Incomplete scans and recordings
  - Rater effects (e.g. leniency or harshness)
- Common Scoring Calculation Errors
- Common Database Entry Errors:
  - Transposed numbers and form lines
  - Misspelled names
  - Transcription errors
Crucible reduces exam operations process threats that lead to increased legal risk, overhead expenditures, and operational burden.
Standardization and training of examiners are crucial to ensuring the validity, reliability, and legal defensibility of an assessment. Crucible equips exam staff to perform examiner calibration, ensuring examiners are trained and aligned with the assessment’s scoring rubric.
Real-time psychometric analysis and adjustment is never a reality in paper-based systems. Any opportunity to discover errors comes only after the results are tabulated. Crucible offers real-time psychometric analysis by:
- Capturing all tablet marks live, in real time, and in the order they occurred, with precise timing of each mark.
- Enabling graders and examiners to see grading decisions and markings live, in real time.
- Identifying operational errors that are always present in paper-based exam delivery.
- Identifying and addressing existing biases, as well as those that would otherwise emerge.
- Grading and recalibrating the examiner, exam, and exam delivery process simultaneously.
With Crucible’s technology, the psychometric reliability, objectivity, and legal defensibility of each exam item, score, session, and training improve with every subsequent exam delivery.
Paper wastes time. A single OSCE with 16 stations and 20 candidates can produce 320 mark sheets alone. Manual entry wastes time and therefore labor funds, a cost compounded whenever errors must be discovered and resolved. Paper also demands massive administrative overhead: printing, organizing, distributing, collecting, storing, scanning, and recording the exams.
What is more, most paper-based exams are designed for grading by machines, not humans. Filling in bubbles demands a degree of precision that requires additional time, and scanning mark sheets introduces misreads, misfeeds, and destroyed exam forms.
Crucible offers performance examination management and delivery in an all-in-one, turnkey, integrated system with a simplified, out-of-the-box experience.
1. Open the Crucible Local or Remote Kit.
2. Power up the gear.
3. Deliver the exams.
4. Export the results (required only with the Crucible Local Kit).
5. Return the gear to the container.
6. Ship the kit back to BrightLink.
Crucible is an occupationally neutral solution delivering objective, legally defensible exams and assessments for high-stakes credentialing, competency-based education and training, and accreditation standards.
Crucible also includes video training for asynchronous examiner education ahead of a testing event, as well as analysis and reporting capabilities.
ASSESSMENT APPLICATIONS & TYPES
Crucible grades assessments or exams that computers typically cannot. Most knowledge-based exams or assessments are delivered via computer-based testing (CBT), often in a multiple choice question (MCQ) format. However, many credentialing organizations require a practical exam or performance assessment. Such exams are also called observational assessments, since a grader or examiner observes the candidate performing a specific activity and assesses the performance.
Applications often include a candidate’s:
- Knowledge of an activity.
- Application of their knowledge.
- Task, process, or procedural performance.
- Ability to demonstrate their skill.
Crucible delivers assessments in one-to-one (1:1) and one-to-many (1:N) environments. Whether one examiner to one candidate or many candidates, delivery includes the following observational assessment scenarios.
- OSCEs & Clinical Skills / Regulatory Licensure and Medical Education
- Practical Exams, Performance Items
- Essay Marking, Written Case Studies, Written Case Presentations
- Structured Oral Interviews, Oral Presentations, Oral Practicals
- Equipment Operation
- Safety Procedures & Protocols
- Performance Change Management
Assessment schemas refer to the scoring formats within an exam rubric. Crucible is equipped with the following schemas:
- Single choice
- Multiple choice
- Dichotomous IRT
- Polytomous IRT
- Partial credit
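As an illustration of how a dichotomous IRT schema scores a response, the following is a minimal sketch, not Crucible’s internal implementation; the function name and the default discrimination value are assumptions for illustration only.

```python
import math

def item_probability(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under a dichotomous
    two-parameter logistic (2PL) IRT model. With discrimination=1.0
    this reduces to the one-parameter Rasch model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A candidate of average ability (theta = 0) attempting an item of
# average difficulty (b = 0) has a 50% probability of success.
p = item_probability(theta=0.0, difficulty=0.0)  # 0.5
```

A polytomous model extends this idea to items with more than two score categories, which is how partial-credit schemas are typically handled.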
For a decade, Crucible has proven its international experience delivering exams in regulated industries.
Crucible also serves the education space with competency training and assessments, including verification of competency-based accreditation standards.
DELIVERY & DEPLOYMENT
Crucible-driven assessments are delivered digitally via tablets and desktop computers in one of three ways:
3 Ways Crucible is Deployed
To resolve security requirements and/or a lack of internet connectivity, Crucible may be deployed globally, wherever there is power, with a decentralized experience via the Crucible Local Kit.
- The Crucible Local Kit provides secure exam delivery without requiring existing network or internet infrastructure.
- This option may also be utilized with existing network infrastructure.
- Each Crucible Local Kit is a standalone, all-in-one, turnkey kit, including examiner tablets preconfigured with the Crucible app, wireless router/access point, repeaters/range extenders, and server.
- Exam results export is required.
To resolve network infrastructure challenges when delivering concurrently across multiple locations, Crucible may be deployed with a centralized experience via your own Crucible online server.
- The Cloud+Network Kit includes secure exam delivery via the Crucible app and preconfigured tablets, a cellular mobile access point (4G/LTE), a cellular antenna, and a wireless router antenna.
- No export of exam data is required as results are securely stored in your own cloud instance.
For organizations desiring to use their own existing network and internet infrastructure, Crucible may be deployed…
- For centralized, secure exam delivery via the Crucible app and preconfigured tablets.
- Connecting to your own Crucible online server instance.
- With no export of exam data required as results are securely stored in your own cloud instance.
2 Questions to Discover Which Deployment is Right For You
There are two questions to ask to discover which method of deployment for Crucible is right for your organization and context.
First, where will your exams be hosted? There are two hosting solutions for Crucible:
- Local: host your exams on the preconfigured server included in the Crucible kit.
- Cloud-based: host your exams on our online infrastructure.
Second, how will Crucible be accessed? There are two options:
- Our network: Crucible will be accessible via the preconfigured network included in the Crucible kit.
- Your network: Crucible will be accessible via the Crucible app, which will preconfigure tablets to work with your network.
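The answers to these two questions map to a deployment choice. The sketch below is illustrative only; the option labels are assumptions rather than official product names.

```python
def choose_deployment(hosting, network):
    """Map answers to the two discovery questions to a deployment option.

    hosting: "local" (preconfigured kit server) or "cloud" (online infrastructure).
    network: "ours" (preconfigured kit network) or "yours" (your own network).
    """
    if hosting == "local":
        return "Crucible Local Kit"            # standalone kit; results export required
    if network == "ours":
        return "Cloud + Network Kit"           # cloud server over the preconfigured network
    return "Cloud on your own infrastructure"  # cloud server over your own network
```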
Human-scored assessments include practical exams, clinical exams, performance assessments, and virtually any other type of exam that computers cannot grade. As such, they can be every bit as complex as the humans who are grading and being graded.
Crucible arose out of the exam complexities of psychometrics, subjectivity, licensure, legal defensibility, and analytical reliability. A convergence of multiple streams of scientific research, technology expertise, and industry best practices, Crucible sits at the intersection of human-scored assessments, software development, DevOps, and business operations. With Crucible, BrightLink helps credentialing organizations direct the flow of candidates and exams hundreds of thousands of times each year across a variety of industries. Here’s how it works.
Through the examiner calibration process, examiners will review training material, evaluate standardized candidate evidence, and report results, which are in turn used to ensure the examiner is aligned to the purpose of the assessment. This process ensures the highest levels of accuracy, objectivity and consistency in grading. Calibration and training happen in four steps:
- Before delivering a live practical assessment, individual examiners (or a group of examiners) review training material to gain an understanding of the grading rubric nuance(s).
- Within 1 week of the assessment, the Crucible calibration module prompts the examiners to view the practical element(s) involved in the assessment using the defined rubric.
- When completed on an individual basis, examiners are given a report of their performance according to predetermined standardized responses. When examiners participate in a group setting, the exam administrator discusses and rates their performance according to the rubric.
- Ratings are analyzed for competency and trends among examiners prior to commencing the live examination.
The use of this module may be toggled on or off. When toggled on, Crucible Examiner Calibration requires examiners to be calibrated before they are permitted to rate live assessments; alternatively, it may be used for informational purposes only.
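A minimal sketch of this toggle is shown below; the function and field names are assumptions for illustration, not Crucible’s actual API.

```python
def can_rate_live(examiner, calibration_required=True):
    """Gate live rating on calibration status when the module is toggled on.

    examiner: dict with a boolean "calibrated" flag.
    With calibration_required=False the module is informational only,
    so every examiner may rate live assessments.
    """
    if not calibration_required:
        return True
    return bool(examiner.get("calibrated", False))
```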
Beyond training and aligning examiners to the assessment, value is added by verifying that examiners comprehend the scoring criteria well enough to be held accountable. Crucible simulates the scoring task in a real-world scenario without actually grading the candidate. This activity yields empirical evidence that the examiner has been trained and calibrated to an acceptable level before delivering the actual assessment.
Guidance for this method can be found in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 2014). Programs use these types of activities when facing the need to integrate and deliver human scored assessments along with the decisions candidates receive.
As each examiner grades a candidate, Crucible is grading the examiner. As more candidates are graded, examiners develop a performance record of their grading. This data allows exam administrators to measure examiner performance with inputs that include, but are not limited to, the following features:
- Statistical calculations on large candidate pools. With enough data over time, Crucible identifies trends through means, standard deviations, and other statistical measures. This information helps predict the curve of an examiner’s performance and highlights which examiners require additional training or recalibration.
- Interface usage. Crucible identifies an examiner’s marking decisions, speed, changes, sequence, and other metrics based on empirical correlation with existing reliability measures.
- Test-to-test changes in candidate performance. Crucible assists in defining a baseline for candidates’ retest performance improvements, and then measures differences among examiners.
- Calibration test outcome for the examiner. See section on Examiner Calibration.
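As a sketch of the statistical approach above, one simple method flags examiners whose average awarded score deviates sharply from that of their peers, a possible sign of leniency or harshness. This is an illustrative technique, not Crucible’s actual algorithm.

```python
from statistics import mean, stdev

def flag_outlier_examiners(scores_by_examiner, z_threshold=2.0):
    """Flag examiners whose mean awarded score is a statistical outlier.

    scores_by_examiner: dict mapping examiner ID -> list of scores awarded.
    Returns the IDs whose mean lies more than z_threshold standard
    deviations from the group mean of examiner means.
    """
    means = {ex: mean(s) for ex, s in scores_by_examiner.items()}
    grand = mean(means.values())
    spread = stdev(means.values())
    if spread == 0:
        return []
    return [ex for ex, m in means.items()
            if abs(m - grand) / spread > z_threshold]
```

In practice such flags would feed the recalibration workflow described above rather than trigger automatic action.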
Human examiners conduct assessments. As such, examiners carry inherent biases that affect the objectivity, and thereby the accuracy and outcome, of the assessment. Crucible empirically identifies and/or adjusts for biases, including but not limited to:
- Prediction bias
- Construct bias
- Method bias
- Selection bias
- Halo bias
- Framing effect
- Recency bias
- Primacy bias
- Availability bias
- Bandwagon effect
- Omission bias
- Congruence bias
- Contrast effect
- Unacceptability bias
- Von Restorff effect
Crucible can deliver assessments with tablets that enforce a minimum set of processes for the examiner, preventing the creation of constructs irrelevant to the actual assessment. Consequently, the impact of these biases is significantly reduced, contributing to the highest standards of objectivity, accuracy, and defensibility.
Visual ergonomics is a multidisciplinary science focused on understanding human visual cognition and its interactions with the elements of a particular system. By applying various laws, principles, theories, and methods, software user interfaces may be designed, assessed, and optimized for ease of use and overall system performance (see the International Ergonomics Association). The following visual ergonomic laws have guided the development of Crucible and the assessments it delivers:
- Miller’s Law of Short-Term Memory Load. The human brain can recall only about seven chunks of information at a time (plus or minus two): roughly seven digits, six letters, or five words. Crucible works with the human brain by displaying assessment details in optimized chunks for human recall and interaction.
- Fitts’ Law. The human effort associated with pointing and tapping speed, target distance based on screen size, as well as target (button) size and tapping accuracy are crucial when developing a user interface for repetitive use. Crucible implements these principles into assessment delivery offering higher indices of performance for a faster, better grading experience.
- Hick-Hyman Law. The more choices a user is presented with, the longer it will take them to make a decision. Crucible presents only the most common functionalities a grader uses when delivering an assessment, and guides them to those functionalities in consistent ways.
- Pareto and Zipf Laws. Zipf’s law describes phenomena in user behavior where particular types of events are quite frequent while others are rare. Pareto’s law indicates that 20% of the content does 80% of the work. Crucible features were designed to include only what is necessary and most familiar to deliver an assessment.
- Nielsen’s Response Time Limits Law. Human response times to digital information and a computer’s subsequent responsiveness guide the design of an interface. Crucible aims to maintain responsiveness between the standard 0.1 – 10-second timeframe in order to maintain the examiner’s attention and reduce conscious annoyance with system delays.
- Gestalt Law and the Law of Prägnanz. As a fundamental law of human perception, the human eye sees objects in their entirety before perceiving their individual parts. The Crucible interface is designed in keeping with the principles of closure, figure and ground, proximity, continuation, symmetry, and similarity.
- Weber’s Law and Fechner’s Law. Weber’s law measures the increase in stimulus required to be noticed against the intensity of a specific background (e.g. speaking loudly in a noisy room; whispering in a quiet room). Fechner’s law, an application of Weber’s, informs the design of Crucible’s user interface features. Attention is given to brightness, line length and thickness, the visual weight of fonts in typography, and color matching in order to maintain consistency and reduce examiner distraction.
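Two of the laws above have standard quantitative forms. The sketch below computes Fitts’ index of difficulty and a Hick-Hyman decision time; the formulas are the standard ones from the HCI literature, but the Hick-Hyman coefficients are illustrative placeholders, not values measured for Crucible.

```python
import math

def fitts_index_of_difficulty(distance, width):
    """Fitts' law (Shannon formulation): ID = log2(D / W + 1) bits,
    where D is the distance to the target and W is the target width."""
    return math.log2(distance / width + 1)

def hick_decision_time(n_choices, a=0.2, b=0.15):
    """Hick-Hyman law: T = a + b * log2(n + 1) seconds,
    where a and b are illustrative per-user constants."""
    return a + b * math.log2(n_choices + 1)

# A larger button at the same distance is easier (fewer bits) to hit,
# and fewer on-screen choices mean a faster grading decision.
id_small = fitts_index_of_difficulty(distance=300, width=50)   # ~2.81 bits
id_large = fitts_index_of_difficulty(distance=300, width=100)  # 2.0 bits
```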
Examiners who serve for multiple hours need a solution uniquely designed to fit the demands of the human brain, utilizing the science of visual ergonomics. Crucible introduces a balance of mental comfort and acuity.
- Mental comfort: Crucible delivers a mentally comfortable examiner interaction in order to reduce fatigue and resulting errors.
- Mental acuity: Equally important, Crucible is designed to draw visual focus to the most important details of an evaluation process to deepen and reinforce exam policy.
Use of the Crucible solution results in the optimization and/or reduction of operational costs and overhead through:
- Integration: bringing exam scheduling, dispatching and delivery workflows together within a single integrated system.
- Defragmentation: unifying accessibility, information, analysis and reporting from every test center, regardless of global location, into one data source.
- Standardization: enabling repeatability in exam experiences, subsequently increasing fairness.
- Risk Mitigation: creating an “assessment blueprint” to reduce potential legal risks and appeals involved in granting certification or licensure.
Crucible was built with integrated security, due to the high-stakes nature of our clients’ assessments and exams.
- Standalone: each Crucible kit operates on its own self-contained, isolated, encrypted network.
- Preconfigured: the Crucible app is configured to communicate ONLY with the Crucible server.
- Live Tracking: Crucible logs every human interaction with the device, application, mark sheet, and notations including precise timing/action log for analysis and reference.
- Authorization: Test security is integrated within Crucible. Only authorized examiners have access to Crucible assessments. Unauthorized entry attempts are monitored. Only authorized candidates may be assessed.
From the first interaction forward, forensic data is captured and reported to the secure cloud server, allowing assessment staff to monitor, track, and analyze the data, as well as offering a variety of customizable reports. When combined with proctoring surveillance cameras, unethical examiner behavior may also be identified and managed with the forensic data available.
Crucible can replace existing paper-based systems, which are prone to the following standard challenges:
- Missing data: failing to record marks or scores.
- Transference errors: marking the right information in the wrong place, or vice versa.
- Transcription errors: rewriting examiner notes and commentary incorrectly.
- Lack of exam policy/process enforcement: failure to complete certain sections or details of an assessment.
- Lack of real-time analytics: unable to make advancements or changes to the assessment until after completion and assessment stacks are analyzed.
- Lack of instant results: waiting for two to four weeks or more before calculating and releasing assessment scores to candidates.
- Lack of real-time intervention and adjustment: inability to discover and seize opportunities to intervene, recalibrate examiners, adjust items, or remove high-risk candidates.
- Lack of bias insight: inability to identify and address examiner biases.
- Examiner fatigue: introduced through bubble-based assessment forms filled in with pencils.
- Scanning misreads and machine misfeeds: created by manual handling and/or machine malfunction, often resulting in damaged or destroyed mark sheets.
- Security breach risk threats: presented through lack of timely collection and proper storage of paper-based assessments.
With hundreds, if not thousands, of candidates being assessed per week, month, or year, paper-based assessments introduce the likelihood of a significant number of human errors that challenge the validity, reliability, and legal defensibility of the exam itself. Crucible eliminates paper-based assessments, increasing confidence in and the scalability of assessments as an organization continues to expand its operations.
As credentialing organizations grow, they encounter opportunities to expand their operations into different districts, states, regions, or even countries. The Crucible solution enables such organizations to scale delivery to applicants regardless of long-term planning or location. It may be implemented with minimal lead time in any location, including those without wireless access. In remote locations, Crucible may be utilized in two flexible ways:
- Crucible tablets can mark assessments offline. Once the tablet is within connectivity range again, it will sync with the cloud-based server.
- Crucible can operate with local mobile carriers. This allows tablets to stay connected to the cloud-based server wherever a cellular mobile-wifi access point is available.
There is no limit to the number of Crucible examinations that may be implemented across a region or country.
The development of Crucible has occurred in large part in collaboration with assessment-industry and psychometric experts whose experience and research have helped shape the Crucible solution. The following is a list of helpful and influential articles, centered almost exclusively on the OSCE sector, all of which carry significant implications for other observational assessment sectors as well.
- “A pilot study of marking accuracy and mental workload as measures of OSCE examiner performance,” by Aidan Byrne. BMC Medical Education (2016) 16:191.
- “Money Makes the (Medical Assessment) World Go Round: The Cost of Components of a Summative Final Year Objective Structured Clinical Examination (OSCE),” by Craig Brown, Sarah Ross, Jennifer Cleland, and Kieran Walsh. MedTeach (2015) April, 29:1-7.
- “A Comparative Analysis of the Costs of Administration of an OSCE” by Michael D. Cusimano, MD, MHPE; Robert Cohen, PhD.; William Tucker, MD; John Murnaghan, MD; Ron Kodama, MD; and Richard Reznick, MD. Journal of the Association of American Medical Colleges (1994) 69(7): 571-576.
- “OSCE Feedback: A Randomized Trial of Effectiveness, Cost-Effectiveness and Student Satisfaction,” by Celia A. Taylor and Kathryn E. Green. Creative Education (2013) Vol. 4, No. 6A, 9-14.
- “A method for identifying extreme OSCE examiners,” by Ilona Bartman, Sydney Smee, and Marguerite Roy. The Clinical Teacher (2013) 10:27-31.
- “Contrasting Automated and Human Scoring of Essays,” by Mo Zhang. Educational Testing Service (2013) No. 21.
- “An Objective Structured Clinical Examination for Evaluating Psychiatric Clinical Clerks,” by Brian Hodges, MD, MEd; Glenn Regehr, PhD; Mark Hanson, MD; and Nancy McNaughton. Academic Medicine (1997) 72:715-721.
- “The objective structured clinical examination: can physician-examiners participate from a distance?” by James Chan, Susan Humphrey-Murto, Debra M. Pugh, Charles Su, and Timothy Wood. Medical Education (2014) 48: 441-450.
- “Evaluating the Impact of Releasing an Item Pool on a Test’s Empirical Characteristics,” by Chad W. Buckendahl, PhD and Jack D. Gerrow, DDS. Journal of Dental Education (2016) 80(10): 1253-1260.
- “A Review of Strategies for Validating Computer-Automated Scoring,” by Yongwei Yang, Chad W. Buckendahl, Piotr J. Juszkiewicz and Dennison S. Bhola. Applied Measurement in Education (2002) 15(4): 391-412.
Crucible was built from consultation around industry best practices on performance assessment delivery and continues to be guided by this consultation today. A constant involvement with industry-leading psychometric, operational, and technological resources and experts continues to ensure that Crucible evolves with the processes and policies required by our high-stakes clients.
Some organizations desire to utilize Crucible’s expertise strictly to consult on human-scored assessment processes, and do so with zero obligation to utilize Crucible technology. Crucible Certitude is our consultation-based service offering a thorough three-step discovery, documentation, and deliverable process to assist organizations seeking to standardize, optimize, or outsource exam needs.