The Validity of Reliability Measure in Threshold Perimetry

Xiaolei Shao; Cecilia Fenerty; David B Henson

doi:10.4172/2155-9570.1000117

Research Article - (2011) Volume 2, Issue 1

Xiaolei Shao¹,²,*, Cecilia Fenerty³ and David B Henson²,³

¹Department of Ophthalmology, the 2nd Xiangya Hospital of Central South University, Changsha, China
²School of Biomedicine, University of Manchester, United Kingdom
³Manchester Royal Eye Hospital, Manchester, United Kingdom

*Corresponding Author: Xiaolei Shao, Department of Ophthalmology, the 2nd Xiangya Hospital of Central South University, No.139 Middle People Road, Changsha 410011, P.R. China. Email:

Abstract

Purpose: To evaluate the color Doppler imaging (CDI) and pattern visual evoked potential (P-VEP) examinations in primary open angle glaucoma (POAG) patients and investigate the relation between flow velocities measured by CDI and P-VEP examination in POAG patients.

Methods: Sixty-five POAG patients and 45 control subjects underwent CDI evaluation of the ophthalmic artery (OA), short posterior ciliary artery (SPCA) and central retinal artery (CRA). The peak systolic velocity (PSV), end-diastolic velocity (EDV) and resistive index (RI) of all retrobulbar vessels were measured. The latency and amplitude of P100 in P-VEP were recorded. The CDI and P-VEP parameters of the POAG and control groups were compared. The correlations between CDI parameters, visual field indices and P-VEP in POAG patients were evaluated by Pearson's correlation analysis.

Results: POAG patients had lower EDV and higher RI in the OA, CRA and SPCA than control subjects. POAG patients also had lower PSV in the OA and CRA than control subjects. The latency of P100 was delayed and the amplitude of P100 was decreased in the POAG patients compared with the control group. The RI of the OA and SPCA was negatively correlated with the mean deviation (MD) values in the POAG patients. The RI of the OA was positively correlated with the PSD value in POAG patients. The MD values in POAG patients were negatively correlated with the latency of P100. The RI of the OA was positively correlated with the latency of P100 and negatively correlated with the amplitude of P100 in the POAG patients.

Conclusions: The combination of the CDI and pattern VEP techniques provides further interpretation of ocular circulatory changes in POAG patients. Further studies are needed to assess the relationship between circulatory and neural changes.

Keywords: Color Doppler imaging, Pattern visual evoked potential, Primary open angle glaucoma

Introduction

At present, the visual field assessments commonly used in clinical practice are subjective in nature. The test results depend on the responses of the patient, who is expected to press the response button whenever they see the stimulus. However, patients occasionally press the button when they did not see the stimulus, or give no response to a supra-threshold stimulus. This can lead to the threshold being either over- or underestimated (Newkirk et al. 2006). Small changes in the visual field of a reliable patient are obviously more significant than similar changes in the visual field of an unreliable patient. It is, therefore, important that an estimate of reliability be provided with each visual field test. The validity of a reliability index is important as it tells us how much confidence we can place upon it being a good measure of reliability.

Modern perimetric programs apply a series of measures to indicate the reliability of patients. The first is the fixation loss rate, which is measured with catch trials. During the visual field test the perimeter randomly projects a stimulus onto the previously mapped blind spot area. If the patient responds to the presentation, a fixation loss (FL) is recorded by the system. A fixation loss rate of more than 20% is considered unreliable by the system.

The second measure of reliability is the false-positive rate. Perimeters record a false-positive when the patient responds to a non-existent stimulus. The Full Threshold program estimates the false-positive rate (FP) with the aid of catch trials. In each catch trial the perimeter goes through the motions of presenting a stimulus but does not present one. If the ratio of false-positive responses to trials exceeds 33% then the patient is again classified as unreliable. The SITA (Swedish Interactive Threshold Algorithm) test, however, derives an estimate of the false-positive rate without catch trials. The responses outside the patient's normal response window (calculated by the perimeter at the end of each test) are used as a surrogate false-positive rate, which is regarded as a more repeatable measure [1,2].

The third reliability index is the percentage of false-negatives. Both Full Threshold and SITA tests use catch trials to estimate the false-negative rate. During the visual field test, the perimeter presents some extra bright stimuli which should be visible to the patient. If the patient does not respond, the system records these as false-negative (FN) answers. The cut-off reliability criterion for false-negatives is also set at 33%. The false-negative rate has been shown to be related to the extent of visual field loss, with damaged locations presenting a higher frequency of false-negatives than normal locations [3,4]. The SITA test increases the supra-threshold increment for false-negative trials in locations with a visual defect [1]. The technique of maximum likelihood estimation is also used in SITA to extract a false-negative frequency from the up-down staircases of the threshold determinations, which is combined with the frequency of false-negative answers during post-test analysis [5]. These improvements dramatically reduce the relation between false-negative rate and sensitivity [1]. A comparison of the reliability measures used in the Full Threshold and SITA tests has been published by Wall et al. 2008.
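As an illustration of the cut-off criteria described above, the following is a minimal sketch, assuming each index is available as a percentage. The function and constant names are hypothetical and are not taken from the perimeter software; only the 20% and 33% limits come from the text.

```python
from typing import List

# Cut-offs quoted in the text (percentages). The names are hypothetical.
FL_LIMIT = 20.0  # fixation losses
FP_LIMIT = 33.0  # false-positives
FN_LIMIT = 33.0  # false-negatives


def catch_trial_rate(errors: int, trials: int) -> float:
    """Percentage of catch trials that were answered incorrectly."""
    if trials <= 0:
        raise ValueError("at least one catch trial is required")
    return 100.0 * errors / trials


def flagged_indices(fl: float, fp: float, fn: float) -> List[str]:
    """Return the names of the indices that exceed their cut-off."""
    flags = []
    if fl > FL_LIMIT:
        flags.append("FL")
    if fp > FP_LIMIT:
        flags.append("FP")
    if fn > FN_LIMIT:
        flags.append("FN")
    return flags
```

For example, `flagged_indices(catch_trial_rate(3, 12), 0.0, 10.0)` flags only the fixation losses, because 25% exceeds the 20% limit while the other two indices stay below 33%.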

Reproducibility is a measure of reliability in clinical practice. A reliable patient is likely to give similar results (high reproducibility) in repeat tests while an unreliable patient is likely to give very different results in repeat tests. Reproducibility is generally measured by test-retest variability: high test-retest variability represents low reproducibility. Test-retest variability is therefore a good criterion for the evaluation of reliability indices. However, many researchers have verified a strong relationship between threshold sensitivity and test-retest variance [6-9], so it is necessary to compensate for this covariate when evaluating reliability measures with test-retest variance.

This paper evaluates the reliability measures used in the Full Threshold and SITA Standard tests of the Humphrey Visual Field Analyzer (HFA; Zeiss Humphrey, Dublin, CA, USA) against a measure of test-retest variance that compensates for the increased variance seen at damaged test locations.

Materials and Methods

The visual field results from 2 samples of 74 glaucoma patients were analyzed. The first sample had undergone two visual field tests with the Full Threshold (24-2) test of the HFA while the second sample had undergone two tests with the SITA Standard (24-2) test of the HFA. The interval between test and retest was 1 week. Both samples were selected from a database of glaucoma patients attending the outpatient clinics of Manchester Royal Eye Hospital. Selection of both patient and eye (only one eye was used from each patient) was based upon: 1) sampling a wide range (early to advanced loss) of defects; 2) matching the distribution of ages between the 2 samples; and 3) matching the distribution of Mean Deviation between the 2 samples. Additional inclusion criteria were: visual field loss (Glaucoma Hemifield Test outside normal limits); age greater than 40 years; spherical refractive error within ±5.00 DS (cylinder component less than 3.00 D); and corrected visual acuity of 0.5 or better (logMAR). Patients with a narrow angle, secondary glaucoma or other pathologies (excluding early cataract) were excluded. All patients had prior experience of threshold perimetry.

The blind spot was re-mapped if fixation errors were found to be high during the early part of the visual field test [10] and patients exhibiting fixation losses were verbally encouraged to fixate more carefully.

Test-retest variance of a visual field location is inversely proportional to the threshold of that location [9] which means that eyes with more extensive loss will appear more variable when using raw threshold data. To derive a measure of test-retest variance for each patient that is independent of the extent of visual field defect we undertook the following:

1. Calculated the test-retest threshold difference and average threshold (S) of each location for every patient (excluding blind spot locations).

2. Classified each test location, on the basis of the average threshold, into one of 7 groups (0<S≤5; 5<S≤10; …; 30<S≤35).

3. Calculated the percentile ranking of each test point within each group on the basis of test-retest threshold difference.

4. Calculated the mean percentile ranking of each eye as a measure of test-retest variability.

This technique compares each test-retest threshold difference with those from a population of locations with a similar threshold value. This analysis was carried out independently for the Full Threshold and SITA tests.
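The four steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code, and it rests on several assumptions that the text does not settle: the test-retest difference is taken as an absolute value, the percentile ranking uses the mid-rank convention (values below plus half of the ties), locations whose average threshold falls outside the 7 groups are skipped, and blind spot locations have already been removed from the input. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[float, float]  # (test threshold, retest threshold) for one location


def sensitivity_group(s: float) -> Optional[int]:
    """Step 2: index of the group 0<S<=5, 5<S<=10, ..., 30<S<=35 (None if outside)."""
    for k in range(7):
        if 5 * k < s <= 5 * (k + 1):
            return k
    return None


def mean_percentile_rank(eyes: Dict[str, List[Pair]]) -> Dict[str, float]:
    """Steps 1 to 4: mean percentile ranking of each eye."""
    # Steps 1 and 2: difference and average threshold, grouped by average.
    groups = defaultdict(list)  # group index -> [(eye id, |difference|), ...]
    for eye, pairs in eyes.items():
        for test, retest in pairs:
            group = sensitivity_group((test + retest) / 2)
            if group is not None:  # assumption: skip locations outside all groups
                groups[group].append((eye, abs(test - retest)))

    # Step 3: percentile ranking of every location within its own group.
    ranks = defaultdict(list)  # eye id -> list of percentile ranks
    for members in groups.values():
        diffs = [d for _, d in members]
        n = len(diffs)
        for eye, d in members:
            below = sum(1 for x in diffs if x < d)
            equal = sum(1 for x in diffs if x == d)
            ranks[eye].append(100.0 * (below + 0.5 * equal) / n)

    # Step 4: mean percentile ranking per eye.
    return {eye: sum(r) / len(r) for eye, r in ranks.items()}
```

With two made-up eyes, `mean_percentile_rank({"A": [(30, 30), (8, 10)], "B": [(28, 32), (9, 3)]})` ranks eye A below eye B, because A has the smaller difference in both groups that the two eyes share.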

The pre-programmed measures of reliability (FL, FP and FN) were taken from the first visual field test to enhance the clinical applicability of the findings. In the case of the SITA test these were taken from the printed charts to ensure that all post test analysis had been completed.

Logistic regression and forward stepwise multiple regression analysis were used to analyze the relationship between test-retest variability and the three reliability indices.
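A forward stepwise multiple regression of the kind named above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the statistical package the authors used: it enters every predictor in order of the largest gain in R-square, applies no p-level entry criterion, reports no p-levels, and assumes the predictors are not collinear. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def r_squared(columns: List[List[float]], y: List[float]) -> float:
    """R-square of an ordinary least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors: Dict[str, List[float]],
                     y: List[float]) -> List[Tuple[str, float, float]]:
    """Return (name, cumulative R-square, R-square change) in order of entry."""
    chosen: List[str] = []
    steps = []
    previous = 0.0
    while len(chosen) < len(predictors):
        best_name, best_r2 = None, -1.0
        for name in predictors:
            if name in chosen:
                continue
            r2 = r_squared([predictors[k] for k in chosen + [name]], y)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        chosen.append(best_name)
        steps.append((best_name, best_r2, best_r2 - previous))
        previous = best_r2
    return steps
```

Called with the three indices as predictors and the mean percentile ranking as `y`, the function returns one row per index in order of entry: the index name, the cumulative R-square, and the R-square change.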

All patients gave written informed consent and the study conformed to the tenets of the Declaration of Helsinki.

Results

The two groups (Full Threshold and SITA) had matched age and Mean Deviation (first visit) distributions (Figure 1). The distributions of the 3 reliability indices for visit 1 are given in Figure 2 for both Full Threshold and SITA tests. Also included in Figure 2 are the distributions of test-retest variability. While test-retest variability follows a fairly normal distribution for both field tests, the distributions of the three reliability measures are strongly skewed, with most values clustered at the low end of the scale. This is especially true for FP, with 86% and 47% of patients having no false-positives in the Full Threshold and SITA tests respectively.

Figure 1: Age and mean defect distribution of two groups.

Figure 2: Distribution of the reliability parameters (fixation losses, false-positives and false-negatives) of the first test and of the test-retest variability of the two tests for both Full Threshold and SITA programs.

Test-retest comparisons of the three reliability indices for both the Full Threshold and SITA tests are given in Figure 3. The coefficient of repeatability (1.96 × standard deviation of the differences) is given in Table 1 along with the number of tests where the reliability indices fell beyond the accepted limits of 20% for fixation losses and 33% for false-positives and false-negatives. Fixation losses have the highest coefficient of repeatability, approaching 30% for both tests. The coefficient of repeatability for false-negatives is marginally better (smaller) for the Full Threshold test and much better for SITA. This may indicate that SITA false-negative responses are a better predictor of test-retest variability. The coefficients for false-positives are better still but this can, in part, be explained by the large number of zero values.

Figure 3: Relationship between the indices (fixation losses, false-positives and false-negatives) on the first test and the retest for both Full Threshold and SITA programs.

	Full Threshold			SITA Standard
	FL	FP	FN	FL	FP	FN
SD of the test-retest differences (%)	14.6	5.9	12.6	14.2	3.2	8.5
Coefficient of repeatability (%)	28.6	13.5	24.7	27.8	6.3	16.7
Number beyond accepted criteria (1st visit)	18	0	5	16	0	0
Number beyond accepted criteria (2nd visit)	20	0	7	19	0	0

Table 1: Test-retest comparisons of the percentage of fixation losses, false positive and false negatives for the Full Threshold and SITA test strategies. Accepted criteria are 20% fixation losses, 33% false positive and 33% false negatives.
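The coefficient of repeatability reported in Table 1 is defined in the text as 1.96 × the standard deviation of the test-retest differences. A minimal sketch of that calculation follows; the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used, and the function name is hypothetical.

```python
from statistics import stdev
from typing import Sequence


def coefficient_of_repeatability(first: Sequence[float],
                                 second: Sequence[float]) -> float:
    """1.96 x SD of the paired test-retest differences (sample SD assumed)."""
    if len(first) != len(second):
        raise ValueError("test and retest must have the same number of values")
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * stdev(diffs)


# Applying the 1.96 factor to the SDs listed in Table 1.
table_sd = {
    "Full Threshold FL": 14.6, "Full Threshold FP": 5.9, "Full Threshold FN": 12.6,
    "SITA FL": 14.2, "SITA FP": 3.2, "SITA FN": 8.5,
}
table_cor = {name: 1.96 * sd for name, sd in table_sd.items()}
```

Applied to the tabulated SDs, the factor reproduces the listed coefficients to rounding for five of the six columns. The exception is the Full Threshold false-positive column, where 1.96 × 5.9 is about 11.6 rather than the listed 13.5; the source does not indicate which of the two figures is intended.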

The relationship between each of the 3 reliability indices (fixation losses, false-positives, false-negatives) and test-retest variability is given in Figure 4 for both Full Threshold and SITA, while the results from a forward stepping multiple regression analysis are given in Table 2.

Figure 4: Relationship between the reliability indices (fixation losses, false-positives and false-negatives) obtained at the first test and test-retest variability (percentile ranking) for both Full Threshold and SITA programs.

Full Threshold	R-square	R-square change	p-level	Variables included
False-negatives	0.0315	0.0315	0.1327	1
Fixation losses	0.0431	0.0115	0.3612	2
False-positives	0.0516	0.0085	0.4332	3

SITA	R-square	R-square change	p-level	Variables included
False-negatives	0.1324	0.1324	0.0014	1
False-positives	0.1569	0.0206	0.1555	2
Fixation losses	0.1693	0.0123	0.3102	3

Table 2: Forward stepping multiple regression analysis of the 3 reliability indices (percentage of fixation losses, false-negatives and false-positives) and test-retest variability for the Full Threshold and SITA test strategies. Reliability measures taken from the first visual field test.

For the Full Threshold test none of the indices had a significant relationship with test-retest variability at the p<0.05 level and only the false-negative responses had a significant relationship in SITA. Overall the results show a very poor relationship between the reliability indices and test-retest variability. For the Full Threshold tests the 3 measures accounted for only 5.1% of the variance in test-retest variability with the percentage of false-negatives accounting for the majority (3.1%). With SITA the 3 measures accounted for a larger, but still small, proportion of the variability (16.9%) with the major component again coming from the percentage of false-negatives (13.2%).

To investigate whether increasing the sample size of the 3 indices would improve their relationship with test-retest variability we combined the test and retest data. This doubled the mean sample size of false-positives, false-negatives and fixation losses. In the SITA test we took the mean percentage of false-positive and false-negative responses, as the false-positive estimates are not based upon a series of trials and the false-negative values undergo some post-test processing that did not allow us to pool the responses. Table 3 gives the results of this analysis and shows that the R-square values increase marginally, the total amount of variability explained by the 3 indices increasing to 6.3% for the Full Threshold test and 19.6% for SITA. It is interesting to note that pooling improved the ranking of the fixation losses. Again, only the false-negative responses from SITA had a significant (p<0.05) relationship with test-retest variability.
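The effect of pooling catch trials from two sessions can be illustrated with a short sketch. The catch-trial counts used in the usage note are invented for illustration, and the Wilson score interval is simply one common way of attaching confidence limits to a proportion; the text does not state which method, if any, was applied. The function names are hypothetical.

```python
from math import sqrt
from typing import Tuple


def pooled_rate(errors1: int, trials1: int, errors2: int, trials2: int) -> float:
    """Percentage of errors over the catch trials of both sessions combined."""
    return 100.0 * (errors1 + errors2) / (trials1 + trials2)


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for an error rate, in percent."""
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return 100.0 * (centre - half), 100.0 * (centre + half)
```

With hypothetical counts of 3 errors in 10 trials for one session and 6 errors in 20 trials after pooling, the observed rate is 30% in both cases but `wilson_interval(6, 20)` is narrower than `wilson_interval(3, 10)`, which is the sense in which pooling increases the sample size behind each index.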

Full Threshold	R-square	R-square change	p-level	Variables included
Fixation losses	0.0455	0.0455	0.0680	1
False-positives	0.0554	0.0099	0.3899	2
False-negatives	0.0637	0.0082	0.4343	3

SITA	R-square	R-square change	p-level	Variables included
False-negatives	0.1754	0.1754	0.0003	1
Fixation losses	0.1935	0.0180	0.2117	2
False-positives	0.1965	0.0030	0.6092	3

Table 3: Forward stepping multiple regression analysis of the 3 reliability indices (percentage of fixation losses, false-negatives and false-positives) and test-retest variability for the Full Threshold and SITA test strategies. Reliability measures pooled from 2 visual field tests.

Discussion

The percentage of visual field results classed as unreliable (>20% fixation losses and >33% false-positives and false-negatives) in this study is in broad agreement with values already published within the literature [1,4,11]. The number of fixation losses was the main reason for patients to be classified as unreliable, with very few being classed as unreliable on the basis of false-positives. The distributions of the 3 reliability indices for the SITA test were also similar to those published by Bengtsson (2000) [1], as was the relatively normal distribution of test-retest variability. Our sample was designed to be as representative as possible of patients attending an outpatient clinic. However, recruiting patients for this type of study does introduce a degree of selection bias, as it is only those who are more concerned, and potentially more careful in their visual field examinations, that volunteer. The visual field results used in this study were obtained by highly motivated researchers in a research facility, where there was little distraction from other patients or staff. This could have led to a reduction in the number of errors from that routinely found within clinic populations. The lack of a wide range of reliability indices within our population could have contributed to the poor relationship between the indices and test-retest variability. However, our measure of test-retest variability did show a good range of values. In a non-selected population of patients attending for routine perimetry there may well be a larger range of reliability measures in which the extreme values give useful information on reliability.

The distributions of fixation losses for the Full Threshold and SITA tests were similar, and the tests showed a similar range of test-retest variability. Inaccuracy in locating the blind spot at the beginning of the visual field test is certainly one reason for abnormally high fixation losses, and Henson et al. (1996) have previously shown that the percentage of fixation losses derived from blind spot catch trials bears little relationship to fixation errors measured by an eye movement recorder. The number of blind spot fixation catch trials is also small and subject to large confidence limits [12]. Pooling the results from both sessions improved the ability of fixation losses to predict test-retest variability, a finding consistent with an undersampled measure. However, the increase in performance was small and would not justify increasing the number of catch trials within a visual field test, with its attendant negative effect upon test times. Currently, the perimetrist's estimate of fixation accuracy, derived from direct observation of the patient during the visual field test, gives a more valid estimate of fixation accuracy than that derived from blind spot catch trials (Henson et al. 1996). With this in mind it might be better to encourage perimetrists to grade fixation accuracy (e.g. excellent, good, average, poor, very poor) and to write this on the visual field chart.

The false-positive measures from the SITA test were better distributed than those from the Full Threshold test. The method of using response times in SITA gives a much larger sample size than that obtained from false-positive catch trials (all trials to which a response was made rather than approximately 20 catch trials with the Full Threshold test). The increased sample size is one of the reasons for the improvement in repeatability of this index in the SITA test (Olsson et al. 1997). It was, therefore, disappointing to find that the improvement in this measure did not lead to it making a significant (p<0.05) contribution to the prediction of test-retest variability (in the multiple regression analysis this index increased R² by just over 2% in the SITA test).

The best predictor of test-retest variability was the false-negative index of SITA. This index, on its own, accounted for 13% of the test-retest variability. The false-negative index in the Full Threshold test was much poorer, accounting for only 3% of the variability. The SITA estimate of false-negatives differs from that of the Full Threshold test in two ways. SITA increases the supra-threshold increment at damaged locations and uses a maximum likelihood technique, based upon the up-and-down staircases of the threshold determinations, to derive an estimate of the false-negative rate. The increased supra-threshold increment used at damaged locations was designed to compensate for the relationship between variability and threshold. The increased variability that occurs with reduced sensitivity means that, without such an increase, the probability of a false-negative increases as the threshold decreases. This modification to the false-negative routine could account for the reduction in the percentage of false-negative responses seen in SITA compared to Full Threshold tests (mean percentage of false-negative responses being 5.2 versus 7.7 for the first visual field test of the SITA and Full Threshold tests respectively), also noted by Wall et al. 2008. Combining data from catch trials with that from the up-and-down staircases could account for the improved repeatability of false-negative responses found in SITA (SD of the differences between the first and second estimates being 12.6 and 8.5 for the Full Threshold and SITA tests respectively). Overall these two modifications have improved the value of the false-negative estimate in SITA, although it still only accounts for a small proportion of the test-retest variability.

Two potential indices of reliability were not investigated in this study: short-term fluctuation (STF) and gaze tracking. STF, which is confined to the Full Threshold test, gives an estimate of the within-test variability derived from repeat measures at a series of test locations [14,15]. It does not take into account the relationship between threshold and variability and is, therefore, highly dependent upon the sensitivity of the selected re-test locations. Gaze tracking detects movement of the patient's eye during the perimetric examination. It does not differentiate between lateral displacements and rotations of the eye. Lateral displacements regularly occur during a perimetric examination when patients make small adjustments to their head position, and it is not unusual for the gaze tracking signal to 'max out' due to such movements in patients with excellent fixation.

Conclusions

When a series of visual field charts is available the clinician can get a good impression of how variable a patient is by simply looking at the consistency of the results. Unfortunately, this cannot be done when the series of results does not extend beyond a couple of sets of data. In this situation the clinician needs a valid measure of test-retest variability. Unfortunately, the reliability indices incorporated in the Full Threshold and SITA tests (fixation losses, false-positives and false-negatives) are poor predictors of test-retest variability. New, more valid, indices of reliability are needed to assist clinicians in making clinical decisions concerning change in the visual field.

References

Citation: Shao X, Fenerty C, Henson DB (2011) The Validity of Reliability Measure in Threshold Perimetry. J Clinic Experiment Ophthalmol 2:117.

Copyright: © 2011 Shao X, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Journal of Clinical and Experimental Ophthalmology (Open Access)