Compute benchmark scale membership probabilities

A more elaborate way to interpret a kappa value relies on the cumulative interval membership probability (CIMP). For each interval of a benchmark scale, the interval membership probability is the Normality-based probability that the “true” agreement coefficient kappa falls within that interval, and the CIMP is the running sum of these probabilities from the highest interval downward. The general rule is to retain the highest interval whose CIMP equals or exceeds the threshold of 0.95. For more details, see the blog post Inter-rater reliability among multiple raters when subjects are rated by different pairs of subjects.
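To make the definition concrete, here is a minimal sketch of how such cumulative probabilities could be computed with the standard library alone. It assumes a Normal distribution centred on the coefficient, with the standard error as its spread, truncated to [-1, 1]. That truncation is an assumption of this sketch rather than a documented detail of irrCAC, and the helper name cumulative_membership is hypothetical. With coeff=0.67 and se=0.15 on the Altman intervals, the results agree with the CumProb column shown in the Altman section to about four decimal places.

```python
from statistics import NormalDist


def cumulative_membership(coeff, se, bounds):
    """Return the cumulative interval membership probability per interval.

    bounds is a list of (lower, upper) pairs ordered from the highest
    interval to the lowest. Assumption: a Normal distribution centred on
    coeff with standard deviation se, truncated to [-1, 1].
    """
    phi = NormalDist().cdf
    # Mass of the untruncated Normal that lies inside [-1, 1].
    total = phi((coeff + 1.0) / se) - phi((coeff - 1.0) / se)
    cum, running = [], 0.0
    for lower, upper in bounds:
        # Mass between lower and upper, rescaled by the truncation constant.
        mass = phi((coeff - lower) / se) - phi((coeff - upper) / se)
        running += mass / total
        cum.append(running)
    return cum


altman_bounds = [(0.8, 1.0), (0.6, 0.8), (0.4, 0.6), (0.2, 0.4), (-1.0, 0.2)]
cimp = cumulative_membership(0.67, 0.15, altman_bounds)
print([round(p, 4) for p in cimp])
```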

We can use one of the available scales or define a custom one. The available ones are:

  • Altman

  • Cicchetti-Sparrow

  • Fleiss

  • Landis-Koch

  • Regier

Initialize Benchmark

First, we have to initialize a Benchmark object.

[1]:
import pandas as pd
from irrCAC.benchmark import Benchmark

benchmark = Benchmark(coeff=0.67, se=0.15)
benchmark
[1]:
<Benchmark scales Coefficient value: 0.67, Standard Error: 0.15>

Altman

To interpret the coefficient using the Altman scale, use the altman() method.

[2]:
altman = benchmark.altman()
altman_interp = pd.DataFrame(altman)
altman_interp
[2]:
   scale        Altman     CumProb
0  (0.8, 1.0)   Very Good  0.18168
1  (0.6, 0.8)   Good       0.67511
2  (0.4, 0.6)   Moderate   0.96356
3  (0.2, 0.4)   Fair       0.99912
4  (-1.0, 0.2)  Poor       1.00000
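The 0.95 rule can be applied to this output mechanically. The sketch below uses a hypothetical helper, pick_interpretation, that is not part of irrCAC. It walks the intervals from highest to lowest and returns the first label whose cumulative probability reaches the threshold. The labels and probabilities are copied from the Altman output above.

```python
def pick_interpretation(labels, cum_probs, threshold=0.95):
    """Return the first (highest) label whose cumulative probability
    reaches the threshold, or None if no interval does."""
    for label, prob in zip(labels, cum_probs):
        if prob >= threshold:
            return label
    return None


# Values copied from the Altman output, ordered from highest interval down.
labels = ["Very Good", "Good", "Moderate", "Fair", "Poor"]
cum_probs = [0.18168, 0.67511, 0.96356, 0.99912, 1.00000]
print(pick_interpretation(labels, cum_probs))  # prints "Moderate"
```

Under this rule, a coefficient of 0.67 with a standard error of 0.15 is read as "Moderate" on the Altman scale, even though 0.67 itself lies in the "Good" interval.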

Cicchetti-Sparrow

To interpret the coefficient using the Cicchetti-Sparrow scale, use the cicchetti_sparrow() method.

[3]:
cs = benchmark.cicchetti_sparrow()
cs_interp = pd.DataFrame(cs)
cs_interp
[3]:
   scale        Cicchetti  CumProb
0  (0.75, 1.0)  Excellent  0.28699
1  (0.6, 0.75)  Good       0.67511
2  (0.4, 0.6)   Fair       0.96356
3  (0.0, 0.4)   Poor       1.00000

Fleiss

To interpret the coefficient using the Fleiss scale, use the fleiss() method.

[4]:
fleiss = benchmark.fleiss()
fleiss_interp = pd.DataFrame(fleiss)
fleiss_interp
[4]:
   scale        Fleiss                CumProb
0  (0.75, 1.0)  Excellent             0.28699
1  (0.4, 0.75)  Intermediate to Good  0.96356
2  (-1.0, 0.4)  Poor                  1.00000

Landis-Koch

To interpret the coefficient using the Landis-Koch scale, use the landis_koch() method.

[5]:
lk = benchmark.landis_koch()
lk_interp = pd.DataFrame(lk)
lk_interp
[5]:
   scale        Landis-Koch     CumProb
0  (0.8, 1.0)   Almost Perfect  0.18168
1  (0.6, 0.8)   Substantial     0.67511
2  (0.4, 0.6)   Moderate        0.96356
3  (0.2, 0.4)   Fair            0.99912
4  (0.0, 0.2)   Slight          1.00000
5  (-1.0, 0.0)  Poor            1.00000

Regier et al.

To interpret the coefficient using the Regier et al. scale, use the regier() method.

[6]:
regier = benchmark.regier()
regier_interp = pd.DataFrame(regier)
regier_interp
[6]:
   scale       Regier        CumProb
0  (0.8, 1.0)  Excellent     0.18168
1  (0.6, 0.8)  Very Good     0.67511
2  (0.4, 0.6)  Good          0.96356
3  (0.2, 0.4)  Questionable  0.99912
4  (0.0, 0.2)  Unacceptable  1.00000

Custom Scale

If you need a custom scale, provide a dictionary like the one below. The lb key holds the lower bounds of the intervals, ub the upper bounds, interp the interpretation of each interval, and scale_name the name of your scale.

To use the scale, call the interpret() method, passing the custom scale as an argument.

[7]:
my_scale = dict(
    lb=[0.6, 0.3, 0.0],
    ub=[1.0, 0.6, 0.3],
    interp=['Excellent', 'Acceptable', 'Poor'],
    scale_name='My Scale')
my_bench = benchmark.interpret(my_scale)
my_bench_interp = pd.DataFrame(my_bench)
my_bench_interp
[7]:
   scale       My Scale    CumProb
0  (0.6, 1.0)  Excellent   0.67511
1  (0.3, 0.6)  Acceptable  0.99308
2  (0.0, 0.3)  Poor        1.00000
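Because the cumulative probabilities are accumulated from the first interval onward, it may help to sanity-check a custom scale before passing it in. The helper below, check_scale, is a hypothetical sketch and not part of irrCAC. It assumes, by analogy with the built-in scales shown above, that intervals are listed from highest to lowest and are contiguous. The library itself may not enforce either condition.

```python
def check_scale(scale):
    """Raise ValueError if a custom scale dictionary looks inconsistent."""
    lb, ub, interp = scale["lb"], scale["ub"], scale["interp"]
    if not (len(lb) == len(ub) == len(interp)):
        raise ValueError("lb, ub and interp must have the same length")
    for low, high in zip(lb, ub):
        if low >= high:
            raise ValueError(f"lower bound {low} is not below upper bound {high}")
    # Assumed convention: each interval ends where the next higher one starts.
    for prev_low, high in zip(lb, ub[1:]):
        if high != prev_low:
            raise ValueError("intervals must be contiguous and descending")
    return True


my_scale = dict(
    lb=[0.6, 0.3, 0.0],
    ub=[1.0, 0.6, 0.3],
    interp=['Excellent', 'Acceptable', 'Poor'],
    scale_name='My Scale')
print(check_scale(my_scale))  # prints True
```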