Compute benchmark scale membership probabilities

A more elaborate way to interpret a kappa value is based on the cumulative interval membership probability (CIMP). The interval membership probability is the Normality-based probability that the “true” agreement coefficient falls in a given interval of the benchmark scale. The CIMP of an interval is the sum of these probabilities over that interval and all intervals above it. The general rule is to retain the highest interval whose CIMP equals or exceeds the threshold of 0.95. For more details, see the blog post Inter-rater reliability among multiple raters when subjects are rated by different pairs of subjects.
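The probabilities described above can be reproduced by hand. The following is a minimal sketch, not the library's implementation: it assumes the coefficient follows a Normal distribution with mean coeff and standard deviation se, truncated to [-1, 1]. The function name cumulative_membership is hypothetical, and the interval bounds are those of the Altman scale shown later in this section.

```python
from statistics import NormalDist


def cumulative_membership(coeff, se, bounds):
    """Return cumulative interval membership probabilities.

    bounds: (lower, upper) pairs ordered from the highest interval down.
    Assumption: the coefficient is Normal(coeff, se) truncated to [-1, 1].
    """
    phi = NormalDist().cdf
    # Mass of the untruncated Normal that lies inside [-1, 1].
    norm = phi((coeff + 1) / se) - phi((coeff - 1) / se)
    cum, out = 0.0, []
    for lb, ub in bounds:
        # P(lb < kappa < ub), rescaled by the truncation mass.
        cum += (phi((coeff - lb) / se) - phi((coeff - ub) / se)) / norm
        out.append(cum)
    return out


altman_bounds = [(0.8, 1.0), (0.6, 0.8), (0.4, 0.6), (0.2, 0.4), (-1.0, 0.2)]
cimp = cumulative_membership(0.67, 0.15, altman_bounds)
for (lb, ub), p in zip(altman_bounds, cimp):
    print(f"({lb}, {ub})  {p:.5f}")
```

With coeff=0.67 and se=0.15, the printed values should agree closely with the CumProb column that the altman() method reports below.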

We can use one of the available scales or define a custom one. The available ones are:

Altman
Cicchetti-Sparrow
Fleiss
Landis-Koch
Regier

Initialize Benchmark

First, initialize a Benchmark object with the coefficient value (coeff) and its standard error (se).

[1]:

import pandas as pd
from irrCAC.benchmark import Benchmark

benchmark = Benchmark(coeff=0.67, se=0.15)
benchmark

[1]:

<Benchmark scales Coefficient value: 0.67, Standard Error: 0.15>

Altman

To interpret the coefficient using the Altman scale, use the altman() method.

[2]:

altman = benchmark.altman()
altman_interp = pd.DataFrame(altman)
altman_interp

[2]:

	scale	Altman	CumProb
0	(0.8, 1.0)	Very Good	0.18168
1	(0.6, 0.8)	Good	0.67511
2	(0.4, 0.6)	Moderate	0.96356
3	(0.2, 0.4)	Fair	0.99912
4	(-1.0, 0.2)	Poor	1.00000
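To apply the 0.95 rule to this output, walk the rows from the highest interval down and stop at the first one whose cumulative probability reaches the threshold. The sketch below works on values copied from the table above. The helper retained_interval is hypothetical and not part of irrCAC.

```python
def retained_interval(rows, threshold=0.95):
    """Return (interval, label) of the highest interval meeting the threshold.

    rows: (interval, label, cum_prob) tuples ordered from the highest
    interval down, as in the table above.
    """
    for interval, label, cum_prob in rows:
        if cum_prob >= threshold:
            return interval, label
    return None  # no interval reached the threshold


# Values copied from the Altman table above.
altman_rows = [
    ((0.8, 1.0), "Very Good", 0.18168),
    ((0.6, 0.8), "Good", 0.67511),
    ((0.4, 0.6), "Moderate", 0.96356),
    ((0.2, 0.4), "Fair", 0.99912),
    ((-1.0, 0.2), "Poor", 1.00000),
]

print(retained_interval(altman_rows))  # ((0.4, 0.6), 'Moderate')
```

Although the point estimate 0.67 lies in the "Good" interval, the rule retains "Moderate" because that is the first interval whose cumulative probability exceeds 0.95.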

Cicchetti-Sparrow

To interpret the coefficient using the Cicchetti-Sparrow scale, use the cicchetti_sparrow() method.

[3]:

cs = benchmark.cicchetti_sparrow()
cs_interp = pd.DataFrame(cs)
cs_interp

[3]:

	scale	Cicchetti	CumProb
0	(0.75, 1.0)	Excellent	0.28699
1	(0.6, 0.75)	Good	0.67511
2	(0.4, 0.6)	Fair	0.96356
3	(0.0, 0.4)	Poor	1.00000

Fleiss

To interpret the coefficient using the Fleiss scale, use the fleiss() method.

[4]:

fleiss = benchmark.fleiss()
fleiss_interp = pd.DataFrame(fleiss)
fleiss_interp

[4]:

	scale	Fleiss	CumProb
0	(0.75, 1.0)	Excellent	0.28699
1	(0.4, 0.75)	Intermediate to Good	0.96356
2	(-1.0, 0.4)	Poor	1.00000

Landis-Koch

To interpret the coefficient using the Landis-Koch scale, use the landis_koch() method.

[5]:

lk = benchmark.landis_koch()
lk_interp = pd.DataFrame(lk)
lk_interp

[5]:

	scale	Landis-Koch	CumProb
0	(0.8, 1.0)	Almost Perfect	0.18168
1	(0.6, 0.8)	Substantial	0.67511
2	(0.4, 0.6)	Moderate	0.96356
3	(0.2, 0.4)	Fair	0.99912
4	(0.0, 0.2)	Slight	1.00000
5	(-1.0, 0.0)	Poor	1.00000

Regier et al.

To interpret the coefficient using the Regier et al. scale, use the regier() method.

[6]:

regier = benchmark.regier()
regier_interp = pd.DataFrame(regier)
regier_interp

[6]:

	scale	Regier	CumProb
0	(0.8, 1.0)	Excellent	0.18168
1	(0.6, 0.8)	Very Good	0.67511
2	(0.4, 0.6)	Good	0.96356
3	(0.2, 0.4)	Questionable	0.99912
4	(0.0, 0.2)	Unacceptable	1.00000

Custom Scale

To use a custom scale, provide a dictionary like the one below. The lb key holds the lower bounds of the intervals, ub the upper bounds, interp the interpretation of each interval, and scale_name the name of your scale.

To apply the scale, call the interpret() method, passing the custom scale as an argument.

[7]:

my_scale = dict(
    lb=[0.6, 0.3, 0.0],
    ub=[1.0, 0.6, 0.3],
    interp=['Excellent', 'Acceptable', 'Poor'],
    scale_name='My Scale')
my_bench = benchmark.interpret(my_scale)
my_bench_interp = pd.DataFrame(my_bench)
my_bench_interp

[7]:

	scale	My Scale	CumProb
0	(0.6, 1.0)	Excellent	0.67511
1	(0.3, 0.6)	Acceptable	0.99308
2	(0.0, 0.3)	Poor	1.00000
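Writing the lb and ub lists by hand is error-prone, since each upper bound must match the lower bound of the interval above it. One possible way to avoid mismatches is to build the dictionary from a single list of cut points. The helper make_scale below is hypothetical and not part of irrCAC. It assumes, based only on the example above, that intervals are contiguous and listed from the highest down.

```python
def make_scale(cuts, labels, name):
    """Build a custom-scale dictionary from descending cut points.

    cuts:   boundaries from highest to lowest, e.g. [1.0, 0.6, 0.3, 0.0]
    labels: one interpretation per interval, highest interval first
    name:   value stored under the scale_name key
    """
    if len(labels) != len(cuts) - 1:
        raise ValueError("need exactly one label per interval")
    if any(hi <= lo for hi, lo in zip(cuts, cuts[1:])):
        raise ValueError("cut points must be strictly decreasing")
    return dict(
        lb=list(cuts[1:]),   # lower bound of each interval
        ub=list(cuts[:-1]),  # upper bound of each interval
        interp=list(labels),
        scale_name=name,
    )


my_scale = make_scale(
    [1.0, 0.6, 0.3, 0.0], ["Excellent", "Acceptable", "Poor"], "My Scale"
)
print(my_scale)
```

The resulting dictionary has the same keys and values as the hand-written my_scale above, so it can be passed to benchmark.interpret() in the same way.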