Compute benchmark scale membership probabilities
A more elaborate way to interpret a kappa value relies on the cumulative interval membership probability (CIMP). An interval's membership probability is the Normality-based probability that the “true” agreement coefficient falls within that interval, and the CIMP accumulates these probabilities from the highest interval downward. The rule is to retain the highest interval whose CIMP equals or exceeds the threshold of 0.95. For more details, see the blog post "Inter-rater reliability among multiple raters when subjects are rated by different pairs of subjects".
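To make the idea concrete, here is a minimal sketch of how such cumulative probabilities could be computed with only the standard library. It is not irrCAC's implementation: the function name `cumulative_membership_probs` is hypothetical, and the normalization over the range [-1, 1] is an assumption. With the coefficient, standard error, and Altman intervals used later on this page, the results should come out close to the CumProb column of the Altman table.

```python
from statistics import NormalDist


def cumulative_membership_probs(coeff, se, bounds):
    """Return cumulative interval membership probabilities.

    bounds: (lower, upper) tuples ordered from the highest interval down.
    Assumption: probabilities are rescaled by the Normal mass on [-1, 1].
    """
    phi = NormalDist().cdf  # standard Normal CDF
    # Mass of a Normal(coeff, se) distribution that lies inside [-1, 1]
    total = phi((coeff + 1.0) / se) - phi((coeff - 1.0) / se)
    cum_probs, running = [], 0.0
    for lower, upper in bounds:
        # P(lower < X < upper), rescaled, then accumulated
        running += (phi((coeff - lower) / se) - phi((coeff - upper) / se)) / total
        cum_probs.append(running)
    return cum_probs


# Altman intervals as listed in the table later on this page
altman_bounds = [(0.8, 1.0), (0.6, 0.8), (0.4, 0.6), (0.2, 0.4), (-1.0, 0.2)]
probs = cumulative_membership_probs(0.67, 0.15, altman_bounds)
for interval, p in zip(altman_bounds, probs):
    print(interval, round(p, 5))
```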
We can use one of the available scales or define a custom one. The available ones are:
Altman
Cicchetti-Sparrow
Fleiss
Landis-Koch
Regier
Initialize Benchmark
First, we have to initialize a Benchmark object.
[1]:
import pandas as pd
from irrCAC.benchmark import Benchmark
benchmark = Benchmark(coeff=0.67, se=0.15)
benchmark
[1]:
<Benchmark scales Coefficient value: 0.67, Standard Error: 0.15>
Altman
To interpret the coefficient using the Altman scale, use the altman() method.
[2]:
altman = benchmark.altman()
altman_interp = pd.DataFrame(altman)
altman_interp
[2]:
| | scale | Altman | CumProb |
|---|---|---|---|
| 0 | (0.8, 1.0) | Very Good | 0.18168 |
| 1 | (0.6, 0.8) | Good | 0.67511 |
| 2 | (0.4, 0.6) | Moderate | 0.96356 |
| 3 | (0.2, 0.4) | Fair | 0.99912 |
| 4 | (-1.0, 0.2) | Poor | 1.00000 |
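The 0.95 rule can be applied to a table like this one by scanning from the top and keeping the first level whose cumulative probability reaches the threshold. The sketch below uses plain lists copied from the Altman output; `retained_level` is a hypothetical helper, not part of irrCAC.

```python
def retained_level(labels, cum_probs, threshold=0.95):
    """Return the first (highest) label whose cumulative probability
    equals or exceeds the threshold, or None if none does."""
    for label, p in zip(labels, cum_probs):
        if p >= threshold:
            return label
    return None


# Values copied from the Altman table, ordered from highest level down
altman_labels = ["Very Good", "Good", "Moderate", "Fair", "Poor"]
altman_cumprob = [0.18168, 0.67511, 0.96356, 0.99912, 1.00000]

print(retained_level(altman_labels, altman_cumprob))  # Moderate
```

For a coefficient of 0.67 with a standard error of 0.15, the point estimate alone falls in the "Good" band, but only "Moderate" can be claimed at the 0.95 threshold.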
Cicchetti-Sparrow
To interpret the coefficient using the Cicchetti-Sparrow scale, use the cicchetti_sparrow() method.
[3]:
cs = benchmark.cicchetti_sparrow()
cs_interp = pd.DataFrame(cs)
cs_interp
[3]:
| | scale | Cicchetti | CumProb |
|---|---|---|---|
| 0 | (0.75, 1.0) | Excellent | 0.28699 |
| 1 | (0.6, 0.75) | Good | 0.67511 |
| 2 | (0.4, 0.6) | Fair | 0.96356 |
| 3 | (0.0, 0.4) | Poor | 1.00000 |
Fleiss
To interpret the coefficient using the Fleiss scale, use the fleiss() method.
[4]:
fleiss = benchmark.fleiss()
fleiss_interp = pd.DataFrame(fleiss)
fleiss_interp
[4]:
| | scale | Fleiss | CumProb |
|---|---|---|---|
| 0 | (0.75, 1.0) | Excellent | 0.28699 |
| 1 | (0.4, 0.75) | Intermediate to Good | 0.96356 |
| 2 | (-1.0, 0.4) | Poor | 1.00000 |
Landis-Koch
To interpret the coefficient using the Landis-Koch scale, use the landis_koch() method.
[5]:
lk = benchmark.landis_koch()
lk_interp = pd.DataFrame(lk)
lk_interp
[5]:
| | scale | Landis-Koch | CumProb |
|---|---|---|---|
| 0 | (0.8, 1.0) | Almost Perfect | 0.18168 |
| 1 | (0.6, 0.8) | Substantial | 0.67511 |
| 2 | (0.4, 0.6) | Moderate | 0.96356 |
| 3 | (0.2, 0.4) | Fair | 0.99912 |
| 4 | (0.0, 0.2) | Slight | 1.00000 |
| 5 | (-1.0, 0.0) | Poor | 1.00000 |
Regier et al.
To interpret the coefficient using the Regier et al. scale, use the regier() method.
[6]:
regier = benchmark.regier()
regier_interp = pd.DataFrame(regier)
regier_interp
[6]:
| | scale | Regier | CumProb |
|---|---|---|---|
| 0 | (0.8, 1.0) | Excellent | 0.18168 |
| 1 | (0.6, 0.8) | Very Good | 0.67511 |
| 2 | (0.4, 0.6) | Good | 0.96356 |
| 3 | (0.2, 0.4) | Questionable | 0.99912 |
| 4 | (0.0, 0.2) | Unacceptable | 1.00000 |
Custom Scale
If you need a custom scale, provide a dictionary like the one below. The lb key holds the lower bounds of the intervals, ub the upper bounds, interp the interpretation of each interval, and scale_name the name of your scale.
To use the scale, call the interpret() method, passing the custom scale as an argument.
[7]:
my_scale = dict(
    lb=[0.6, 0.3, 0.0],
    ub=[1.0, 0.6, 0.3],
    interp=['Excellent', 'Acceptable', 'Poor'],
    scale_name='My Scale',
)
my_bench = benchmark.interpret(my_scale)
my_bench_interp = pd.DataFrame(my_bench)
my_bench_interp
[7]:
| | scale | My Scale | CumProb |
|---|---|---|---|
| 0 | (0.6, 1.0) | Excellent | 0.67511 |
| 1 | (0.3, 0.6) | Acceptable | 0.99308 |
| 2 | (0.0, 0.3) | Poor | 1.00000 |
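Before passing a custom scale to interpret(), it may help to sanity-check the dictionary. The sketch below is one possible check, not something irrCAC provides or documents: `check_scale` is a hypothetical helper, and the conditions it enforces (equal-length lists, lower bound below upper bound, intervals listed from highest to lowest without gaps) are assumptions modeled on the example scale above.

```python
def check_scale(scale):
    """Raise ValueError if the custom scale dictionary looks malformed."""
    for key in ("lb", "ub", "interp", "scale_name"):
        if key not in scale:
            raise ValueError(f"missing key: {key}")
    lb, ub, interp = scale["lb"], scale["ub"], scale["interp"]
    if not (len(lb) == len(ub) == len(interp)):
        raise ValueError("lb, ub and interp must have the same length")
    for i, (lower, upper) in enumerate(zip(lb, ub)):
        if lower >= upper:
            raise ValueError(f"interval {i}: lower bound must be below upper bound")
        # Assumed convention: each interval ends where the previous one starts
        if i > 0 and upper != lb[i - 1]:
            raise ValueError(f"interval {i}: does not adjoin interval {i - 1}")
    return True


my_scale = dict(
    lb=[0.6, 0.3, 0.0],
    ub=[1.0, 0.6, 0.3],
    interp=['Excellent', 'Acceptable', 'Poor'],
    scale_name='My Scale',
)
print(check_scale(my_scale))  # True
```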