The key statistical objective in IVD device evaluation is establishing performance criteria while minimizing bias and maximizing precision. While the statistical considerations and methodologies required for such evaluation contribute to this objective, they vary throughout the IVD product development process.
Although clearly separate, the assay development and validation phases are often erroneously folded into one. An assay’s development phase requires continuous evaluation that should be clearly distinct from the validation phase. The regulatory and statistical requirements for assay development and validation are different. Understanding the differences between them is crucial for discussing the statistical considerations in each phase.
Applying good statistical practices early in an assay’s development helps ensure not only that the regulatory requirements are met but also that a solid body of knowledge exists regarding the device’s performance. Furthermore, confidence in the performance profile of an assay can affect its clinical evaluation.
Assay Development versus Assay Validation
What Is Assay Development. During the assay development, or assay optimization, phase, an analytical process or idea is defined and optimized into a robust and reproducible device that delivers results as intended. The majority of assays can be categorized into one of the following three types: qualitative, semiquantitative (qualitative assays based on a quantitative determination in which a clinically meaningful gradation of results exists), and fully quantitative. One optimization strategy for fully quantitative and semiquantitative assays is using a calibration curve. Another common optimization technique for semiquantitative assays involves using receiver operating characteristic (ROC) curves.
An assay’s evolution begins with a clearly desired objective. Whether for basic research or clinical purposes, an assay’s intended use becomes the anchor to which all optimization and validation activities are set. Optimizing an assay involves choosing its optimal format. With the intended use in mind, a new assay’s appropriate performance characteristics are then defined. Although a number of performance characteristics, such as stability, accuracy, and precision, should be reviewed in all assays, other individual characteristics such as robustness and reproducibility may not be as important, depending on the device’s intended use.1,2
The maturation, or optimization, phase is a continuous cycle that begins with defining these initial performance characteristics, and continues until the performance metrics are established and there is confidence in the results that are obtained from the assay. As a device prototype is being finalized, the final stages of assay development focus on the initial feasibility of manufacturing and marketing the device. Once a final optimized and feasible prototype design is completed, it proceeds to assay validation. The culmination of any assay development involves drafting a development report.
What Is Assay Validation. After successfully completing the development phase and prior to implementation, an assay must undergo a validation period. Validation can begin only after the assay design is set and the test parameters have been established. Based on the data obtained during the development phase and with the help of sound judgment, a validation protocol is prepared. Such a protocol should include experiments that confirm those assay parameters deemed important during the design period and test whether the device meets the performance criteria for its intended use. Some test parameters that could be included in the validation protocol are accuracy, precision, linearity, and specificity. The protocol must also include predefined acceptance criteria for each of the assay parameters.
If an assay successfully passes all criteria in the validation protocol, a validation report should be prepared at the conclusion of this phase. Such a report should outline the experiments performed, any deviations (with justifications) from the protocol, and the results of the evaluations.
Delineating Assay Development and Validation. An assay cannot fail in the development or optimization phase. If an assay does not meet the criteria during development, it either gets reoptimized until it can achieve acceptable performance standards or is rejected for its intended use. However, an assay can fail in the validation phase. If an assay does not meet the predefined acceptance criteria during validation, further development is required. After determining and resolving the cause of the failure, an assay should be reoptimized. Once satisfactory performance is achieved, an assay will then be tested under a new validation protocol. The validation results certify that an assay is fit for use.
Statistical Considerations in Assay Development
Statistics play a crucial role in understanding assay results and developing experimental strategies for optimizing the device. The statistical methods employed during assay optimization are generally simple and understandable. However, assay developers should pay careful attention to ensure that these methods are implemented correctly and the results are appropriately interpreted.
Analytical Method Calibration. Many quantitative and semiquantitative assays require calibration; that is, their results must be compared with a standard curve. Although the same statistical principles apply to all assays, devices in which quantitation depends on a standard curve present unique challenges.
To quantify the amount and activity of an analyte in a sample, calibration to a standard may be required. A series of samples with known amounts of an analyte is run on the assay. The assay results are then plotted against the reference standards, and a curve is statistically fitted to the data. In some cases, a simple linear fit is adequate for the calibration curve (see Figure 1).
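The calibration step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ordinary least-squares straight-line fit; the function names and the standard and signal values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_concentration(signal, slope, intercept):
    """Read an unknown sample off the fitted calibration line."""
    return (signal - intercept) / slope


# Hypothetical reference standards (known concentrations) and assay signals.
standards = [0.0, 1.0, 2.0, 4.0, 8.0]
signals = [0.1, 2.1, 4.1, 8.1, 16.1]

slope, intercept = fit_line(standards, signals)
unknown = predict_concentration(10.1, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} unknown={unknown:.3f}")
```

The fit regresses signal on known concentration, and the unknown is then obtained by solving the fitted line for concentration. Real calibration data will scatter around the line, so the quality of the fit should be checked before any such prediction is trusted.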
Although the example in Figure 1 shows a linear fit, most biological assays do not exhibit linearity across their complete range. The strategy in such cases should be to assess an assay’s linear range and establish a standard curve within this range. The linear range can be determined graphically using a plot similar to Figure 1. A number of linear fit techniques are available, although the least-squares approach is often sufficient. A simple transformation of the data (such as a log, log-log, or square root transformation) may also be required to obtain a linear fit. However, if an assay does not perform in a linear fashion throughout its analytical range (i.e., typical transformation methods are not adequate) and graphical plots show a sigmoidal relationship, the standard curve can often be modeled with the four-parameter logistic regression equation.3
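For a sigmoidal standard curve, the four-parameter logistic model mentioned above can be written as a short function together with its inverse. The sketch below assumes one common parameterization, which is not necessarily the one used in reference 3; the parameter values are hypothetical. In practice the four parameters would be estimated from the standards by nonlinear least squares, a step this sketch does not show.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic response at concentration x >= 0.

    a: response at zero concentration
    b: slope factor (assumed positive here)
    c: concentration at the inflection point (midpoint between a and d)
    d: response at infinite concentration
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration that produces response y (y strictly between a and d)."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters, for illustration only.
params = dict(a=0.05, b=1.5, c=10.0, d=2.0)

response = four_pl(3.0, **params)
recovered = inverse_four_pl(response, **params)
print(f"response={response:.4f} recovered concentration={recovered:.4f}")
```

The inverse function is what turns a measured signal into a reported concentration. It is only defined for signals strictly between the two asymptotes, which is one reason results near the ends of the curve are unreliable.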
To verify the goodness of fit of a linear equation, common regression diagnostics should be applied. Although no standard exists, a regression line’s fit is commonly judged against a threshold value of the coefficient of determination (r²) associated with the regression fit. An assay that does not meet this criterion will be deemed invalid in the validation phase and should undergo further development. When relying on r² as a goodness-of-fit measure, assay developers should take care to ensure that the standard curve does exhibit a linear response, that no regression outliers exist, and that the data have no systematic bias or heterogeneous variability.4 Such features can be revealed using common graphical diagnostic techniques such as residual plots (see Figure 2). While quantitative statistical tests of curvature, outliers, and patterned residuals are also available, they should be understood prior to use.5 Once the validity of the regression line has been established, predicted concentrations can be determined using inverse regression techniques.
Receiver Operating Characteristic (ROC) Curves. When developing semiquantitative assays that can be compared with a gold standard in which true disease states are known, ROC curves should be used.6 An ROC curve is a plot of the true-positive rates against the false-positive rates for the different possible cut points of a test (see Figure 3). An ROC curve shows the trade-off between the sensitivity and specificity of an assay.6 The closer a curve follows the y-axis and the top border of the graph, the more accurate the test. Put another way, the area under an ROC curve is a measure of assay accuracy. The Clinical and Laboratory Standards Institute (CLSI; Wayne, PA) has published a guideline (document GP10) that provides further guidance on the use and utility of ROC curves.7
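The construction of an ROC curve can be illustrated with a small, self-contained example. This is a sketch only: the scores and disease states are made up, a sample is assumed to be called positive when its score is at or above the cut point, and the area is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per cut point.

    labels: 1 = disease present by the gold standard, 0 = disease absent.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep the cut point from the highest observed score to the lowest.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical assay scores and gold-standard disease states.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(f"AUC = {auc}")
```

Each point on the curve corresponds to one candidate cut point, so the list of points doubles as a table from which a cut point with an acceptable balance of sensitivity and specificity can be chosen. For these made-up data the area is 0.875; an area of 0.5 would indicate a test no better than chance.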
Assay Optimization. The goal of assay optimization is establishing a plan for evaluating the factors that may affect a device’s performance. Factors that may affect an assay include temperature, humidity, sample contaminants, the matrix tested, and reagent composition. Assay developers should study these factors alone and in combination to assess their effects on the device’s accuracy, precision, repeatability, and cross-reactivity.
The most commonly chosen experimental design for assay optimization is a factorial design (see Table I). Depending on the number of factors to be tested, such an experiment will employ either a full factorial or fractional factorial design. A full description of factorial experiments can be found in many statistical data analysis texts.8 Depending on the experimental scenario, other methods such as a completely randomized design or a randomized block design may be employed.8
A factorial design allows for simultaneous evaluation of multiple factors that might influence an assay’s performance. (As a rule of thumb, a full factorial design is used for fewer than five factors and a fractional factorial design for more than five.) When using a factorial design in an experiment, the runs should be performed in a random order. Fractional factorial designs are used when the number of runs in a full factorial design becomes impractically large or when resources are not available to complete a full factorial design. Since deciding which runs to choose and which to leave out can be complex, researchers should consult the relevant references to ensure that the runs are appropriately selected.7
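The difference between a full and a fractional factorial design, and the randomization of run order, can be sketched as follows. The example assumes two-level factors coded as -1 (low) and +1 (high); the factor names are hypothetical, and the half fraction shown uses one illustrative defining relation (the product of all coded levels equals +1). Choosing a fraction for a real study should follow the references cited above.

```python
import itertools
import random


def full_factorial(factors):
    """Every combination of coded levels (-1 = low, +1 = high)."""
    return [dict(zip(factors, levels))
            for levels in itertools.product((-1, 1), repeat=len(factors))]


def half_fraction(runs):
    """Keep the runs whose coded levels multiply to +1."""
    kept = []
    for run in runs:
        product = 1
        for level in run.values():
            product *= level
        if product == 1:
            kept.append(run)
    return kept


def randomized(runs, seed=None):
    """Return the runs in random order; a seed makes the order reproducible."""
    order = list(runs)
    random.Random(seed).shuffle(order)
    return order


# Hypothetical factor names, for illustration only.
factors = ["temperature", "humidity", "reagent_lot"]

full = full_factorial(factors)   # 2**3 = 8 runs
half = half_fraction(full)       # 4 runs
run_sheet = randomized(full, seed=1)

for number, run in enumerate(run_sheet, start=1):
    print(number, run)
```

With three factors the full design needs eight runs and the half fraction four. The saving grows quickly with the number of factors, at the cost of some effects becoming indistinguishable from one another.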
Once an appropriate factorial design is chosen, the experiment can be conducted. The results of the experiment can be graphically summarized using a Pareto plot (see Figure 4). A Pareto plot shows the effects of any given factor compared with all other factors, including possible factor interactions, in an ordered fashion.
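The ordering behind a Pareto plot can be reproduced numerically. The sketch below assumes a two-level design with coded levels of -1 and +1 and estimates each effect as the mean response at the high level minus the mean response at the low level; the factor names and response values are hypothetical.

```python
from itertools import combinations


def estimate_effects(runs, responses, factors):
    """Main effects and two-factor interactions from a two-level design."""
    terms = [(f,) for f in factors] + list(combinations(factors, 2))
    table = {}
    for term in terms:
        high, low = [], []
        for run, y in zip(runs, responses):
            sign = 1
            for f in term:
                sign *= run[f]  # an interaction's sign is the product of its factors' levels
            (high if sign == 1 else low).append(y)
        table["*".join(term)] = sum(high) / len(high) - sum(low) / len(low)
    return table


def pareto_order(table):
    """Sort effects from largest to smallest absolute magnitude."""
    return sorted(table.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical 2x2 full factorial design with one response per run.
runs = [
    {"A": -1, "B": -1},
    {"A": 1, "B": -1},
    {"A": -1, "B": 1},
    {"A": 1, "B": 1},
]
responses = [10.0, 14.0, 11.0, 19.0]

ranked = pareto_order(estimate_effects(runs, responses, ["A", "B"]))
for name, effect in ranked:
    print(f"{name:>4}: {effect:+.1f}")
```

In this made-up example factor A has the largest effect (6.0), followed by B (3.0) and the A*B interaction (2.0). A Pareto plot draws these magnitudes as bars in the same order.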