{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "# CAC from Raw Ratings\n", "\n", "The raw ratings, or wide data format, is a way to organize your data in a table\n", "format where each row represents a subject, each column a rater and each data\n", "point at the junction of the row and column represents the rating the rater\n", "assigned to the subject. Its main advantage is the completeness of the\n", "information it presents. With this format, there is no loss of information as\n", "it shows what rater rated what subject and the specific rating assigned to\n", "every subject. A secondary advantage of this format is its ability to use\n", "categorical ratings as well as quantitative measurements.\n", "\n", "Such datasets are the ones with the `raw_` prefix in the\n", "[datasets](../irrCAC.rst#module-irrCAC.datasets) module.\n", "One example dataset is [raw_4raters](../irrCAC.rst#irrCAC.datasets.raw_4raters)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Rater1Rater2Rater3Rater4
Units
11.01.0NaN1.0
22.02.03.02.0
33.03.03.03.0
43.03.03.03.0
52.02.02.02.0
61.02.03.04.0
74.04.04.04.0
81.01.02.01.0
92.02.02.02.0
10NaN5.05.05.0
11NaNNaN1.01.0
12NaNNaN3.0NaN
\n
", "text/plain": " Rater1 Rater2 Rater3 Rater4\nUnits \n1 1.0 1.0 NaN 1.0\n2 2.0 2.0 3.0 2.0\n3 3.0 3.0 3.0 3.0\n4 3.0 3.0 3.0 3.0\n5 2.0 2.0 2.0 2.0\n6 1.0 2.0 3.0 4.0\n7 4.0 4.0 4.0 4.0\n8 1.0 1.0 2.0 1.0\n9 2.0 2.0 2.0 2.0\n10 NaN 5.0 5.0 5.0\n11 NaN NaN 1.0 1.0\n12 NaN NaN 3.0 NaN" }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from irrCAC.datasets import raw_4raters\n", "\n", "data = raw_4raters()\n", "data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "As you can see, a dataset of raw ratings is merely a listing of ratings that\n", "the raters assigned to the subjects. Each row is associated with a single\n", "subject. Typically, the same subject would be rated by all or some of the\n", "raters. The dataset `raw_4raters` contains some missing ratings represented by\n", "the symbol `NaN`, suggesting that some raters did not rate all subjects. As a\n", "matter of fact, in this particular case, no rater rated all subjects.\n", "\n", ".. note:: The categories appears as floating numbers because of the `NaN` values.\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Initialize CAC\n", "\n", "To compute the various agreement coefficients using the raw ratings, first\n", "initialize a [CAC](../irrCAC.rst#module-irrCAC.raw) object.\n", "By initializing the object, it has information about the subjects, raters,\n", "categories, and weights." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from irrCAC.raw import CAC\n", "\n", "cac_4raters = CAC(data)\n", "print(cac_4raters)\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "To calculate the agreement coefficients, you call the appropriate methods." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Fleiss' Coefficient\n", "\n", "To calculate the Fleiss' coefficient, call the `fleiss()` method." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.76117,\n 'coefficient_name': \"Fleiss' kappa\",\n 'confidence_interval': (0.42438, 1),\n 'p_value': 0.00041917303853056254,\n 'z': 4.97434,\n 'se': 0.15302,\n 'pa': 0.81818,\n 'pe': 0.23872},\n 'weights': array([[1., 0., 0., 0., 0.],\n [0., 1., 0., 0., 0.],\n [0., 0., 1., 0., 0.],\n [0., 0., 0., 1., 0.],\n [0., 0., 0., 0., 1.]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fleiss = cac_4raters.fleiss()\n", "fleiss" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Gwet's Coefficient\n", "\n", "To calculate Gwet's coefficient, call the `gwet()` method." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.77544,\n 'coefficient_name': 'AC1',\n 'confidence_interval': (0.46081, 1),\n 'p_value': 0.0002087209840633264,\n 'z': 5.42458,\n 'se': 0.14295,\n 'pa': 0.81818,\n 'pe': 0.19032},\n 'weights': array([[1., 0., 0., 0., 0.],\n [0., 1., 0., 0., 0.],\n [0., 0., 1., 0., 0.],\n [0., 0., 0., 1., 0.],\n [0., 0., 0., 0., 1.]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gwet = cac_4raters.gwet()\n", "gwet" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Krippendorff's Alpha\n", "\n", "To calculate the Krippendorff's Alpha, call the `krippendorff()` method." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.74342,\n 'coefficient_name': \"Krippendorff's Alpha\",\n 'confidence_interval': (0.41906, 1),\n 'p_value': 0.000459425698154714,\n 'z': 5.10682,\n 'se': 0.14557,\n 'pa': 0.805,\n 'pe': 0.24},\n 'weights': array([[1., 0., 0., 0., 0.],\n [0., 1., 0., 0., 0.],\n [0., 0., 1., 0., 0.],\n [0., 0., 0., 1., 0.],\n [0., 0., 0., 0., 1.]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alpha = cac_4raters.krippendorff()\n", "alpha" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Conger\n", "\n", "To calculate the Conger's coefficient, call the `conger()` method." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.76282,\n 'coefficient_name': \"Conger's kappa\",\n 'confidence_interval': (0.4345, 1),\n 'p_value': 0.00033670657720907826,\n 'z': 5.11381,\n 'se': 0.14917,\n 'pa': 0.81818,\n 'pe': 0.23343},\n 'weights': array([[1., 0., 0., 0., 0.],\n [0., 1., 0., 0., 0.],\n [0., 0., 1., 0., 0.],\n [0., 0., 0., 1., 0.],\n [0., 0., 0., 0., 1.]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "conger = cac_4raters.conger()\n", "conger" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Brennar-Prediger\n", "\n", "To calculate the Brennar-Prediger coefficient, call the `bp()` method.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.77273,\n 'coefficient_name': 'Brennan-Prediger',\n 'confidence_interval': (0.45421, 1),\n 'p_value': 0.00011878043481217126,\n 'z': 5.33959,\n 'se': 0.14472,\n 'pa': 0.81818,\n 'pe': 0.2},\n 'weights': array([[1., 0., 0., 0., 0.],\n [0., 1., 0., 0., 0.],\n [0., 0., 1., 0., 0.],\n [0., 0., 0., 1., 0.],\n [0., 0., 0., 0., 1.]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bp = cac_4raters.bp()\n", "bp" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "## Weighted Analysis\n", "\n", "We can use custom weights or predefined weight types initializing the\n", "[CAC](../irrCAC.rst#module-irrCAC.raw) objects. For the available weight\n", "types see the [Weights](../irrCAC.rst#module-irrCAC.weights) module.\n", "\n", "In the following example, we initialize a new object on the same data using\n", "[quadratic](../irrCAC.rst#irrCAC.weights.Weights.quadratic) weights." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "" }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cac_4raters_quadratic = CAC(data, weights='quadratic')\n", "cac_4raters_quadratic" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "To see the weights' matrix we can print the `weights_mat` attribute of the\n", "object." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "array([[1. , 0.9375, 0.75 , 0.4375, 0. ],\n [0.9375, 1. , 0.9375, 0.75 , 0.4375],\n [0.75 , 0.9375, 1. , 0.9375, 0.75 ],\n [0.4375, 0.75 , 0.9375, 1. , 0.9375],\n [0. , 0.4375, 0.75 , 0.9375, 1. ]])" }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cac_4raters_quadratic.weights_mat" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Next, we simply call the method of the coefficient we want the calculation.\n", "Here for example we show the weighted Gwet coefficient." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.914,\n 'coefficient_name': 'AC2',\n 'confidence_interval': (0.68518, 1),\n 'p_value': 2.6344384658205655e-06,\n 'z': 8.79166,\n 'se': 0.10396,\n 'pa': 0.97538,\n 'pe': 0.7137},\n 'weights': array([[1. , 0.9375, 0.75 , 0.4375, 0. ],\n [0.9375, 1. , 0.9375, 0.75 , 0.4375],\n [0.75 , 0.9375, 1. , 0.9375, 0.75 ],\n [0.4375, 0.75 , 0.9375, 1. , 0.9375],\n [0. , 0.4375, 0.75 , 0.9375, 1. ]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gwet_quadratic = cac_4raters_quadratic.gwet()\n", "gwet_quadratic" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "To use custom weights pass a list of values or an array." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "" }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "\n", "weights = np.array(\n", " [\n", " [1.00, 0.75, 0.50, 0.25, 0.00],\n", " [0.00, 1.00, 0.75, 0.50, 0.25],\n", " [0.25, 0.00, 1.00, 0.75, 0.50],\n", " [0.50, 0.25, 0.00, 1.00, 0.75],\n", " [0.75, 0.50, 0.25, 0.00, 1.00],\n", " ]\n", ")\n", "cac_4raters_custom_weights = CAC(data, weights=weights)\n", "cac_4raters_custom_weights" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Verify the weights." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "array([[1. , 0.75, 0.5 , 0.25, 0. ],\n [0. , 1. , 0.75, 0.5 , 0.25],\n [0.25, 0. , 1. , 0.75, 0.5 ],\n [0.5 , 0.25, 0. , 1. , 0.75],\n [0.75, 0.5 , 0.25, 0. , 1. ]])" }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cac_4raters_custom_weights.weights_mat" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "Calculate a coefficient using the custom weights." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/plain": "{'est': {'coefficient_value': 0.78322,\n 'coefficient_name': 'AC2',\n 'confidence_interval': (0.47891, 1),\n 'p_value': 0.00014556862670245252,\n 'z': 5.66487,\n 'se': 0.13826,\n 'pa': 0.88636,\n 'pe': 0.4758},\n 'weights': array([[1. , 0.75, 0.5 , 0.25, 0. ],\n [0. , 1. , 0.75, 0.5 , 0.25],\n [0.25, 0. , 1. , 0.75, 0.5 ],\n [0.5 , 0.25, 0. , 1. , 0.75],\n [0.75, 0.5 , 0.25, 0. , 1. ]]),\n 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gwet_custom_weights = cac_4raters_custom_weights.gwet()\n", "gwet_custom_weights" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } }, "source": [ "To compare the results of the [identity](../irrCAC.rst#irrCAC.weights.Weights.identity) weights,\n", "the calculation with the [quadratic](../irrCAC.rst#irrCAC.weights.Weights.quadratic)\n", "weights, and the custom weights, we display the results side by side." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Identity WeightsQuadratic WeightsCustom Weights
0(coefficient_value, 0.77544)(coefficient_value, 0.914)(coefficient_value, 0.78322)
1(coefficient_name, AC1)(coefficient_name, AC2)(coefficient_name, AC2)
2(confidence_interval, (0.46081, 1))(confidence_interval, (0.68518, 1))(confidence_interval, (0.47891, 1))
3(p_value, 0.0002087209840633264)(p_value, 2.6344384658205655e-06)(p_value, 0.00014556862670245252)
4(z, 5.42458)(z, 8.79166)(z, 5.66487)
5(se, 0.14295)(se, 0.10396)(se, 0.13826)
6(pa, 0.81818)(pa, 0.97538)(pa, 0.88636)
7(pe, 0.19032)(pe, 0.7137)(pe, 0.4758)
\n
", "text/plain": " Identity Weights Quadratic Weights \\\n0 (coefficient_value, 0.77544) (coefficient_value, 0.914) \n1 (coefficient_name, AC1) (coefficient_name, AC2) \n2 (confidence_interval, (0.46081, 1)) (confidence_interval, (0.68518, 1)) \n3 (p_value, 0.0002087209840633264) (p_value, 2.6344384658205655e-06) \n4 (z, 5.42458) (z, 8.79166) \n5 (se, 0.14295) (se, 0.10396) \n6 (pa, 0.81818) (pa, 0.97538) \n7 (pe, 0.19032) (pe, 0.7137) \n\n Custom Weights \n0 (coefficient_value, 0.78322) \n1 (coefficient_name, AC2) \n2 (confidence_interval, (0.47891, 1)) \n3 (p_value, 0.00014556862670245252) \n4 (z, 5.66487) \n5 (se, 0.13826) \n6 (pa, 0.88636) \n7 (pe, 0.4758) " }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.DataFrame(\n", " zip(gwet['est'].items(),\n", " gwet_quadratic['est'].items(),\n", " gwet_custom_weights['est'].items()),\n", " columns=['Identity Weights', 'Quadratic Weights', 'Custom Weights'])\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }