When is inter-rater reliability used?

We cannot calculate reliability exactly; instead, we have to estimate it, and this is always an imperfect endeavor. Here, I want to introduce the major reliability estimators and talk about their strengths and weaknesses. There are four general classes of reliability estimates, each of which estimates reliability in a different way. They are: inter-rater (or inter-observer) reliability, test-retest reliability, parallel-forms reliability, and internal consistency reliability.

Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret. So how do we determine whether two observers are being consistent in their observations?

You should probably establish inter-rater reliability outside of the context of the measurement in your study. There are two major ways to estimate inter-rater reliability. If your measurement consists of categories, with the raters checking off which category each observation falls into, you can calculate the percent of agreement between the raters. For instance, suppose 100 observations are rated by two raters, and for each observation a rater can check one of three categories.

Imagine that on 86 of the 100 observations the raters checked the same category. The percent of agreement would then be 86%. How high the agreement needs to be depends on the context; some fields require higher inter-rater reliabilities than others.
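To make the percent-agreement arithmetic concrete, here is a minimal Python sketch; the ratings and the helper name `percent_agreement` are invented for illustration, not taken from the article.

```python
def percent_agreement(ratings_a, ratings_b):
    """Return the share of observations on which two raters chose the same category."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must rate the same set of observations.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical ratings: each rater assigns one of three categories per observation.
rater_1 = ["low", "high", "medium", "medium", "low"]
rater_2 = ["low", "high", "low", "medium", "low"]

print(f"Percent agreement: {percent_agreement(rater_1, rater_2):.0%}")  # 80%
```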

The other major way to estimate inter-rater reliability is appropriate when the measure is a continuous one. There, all you need to do is calculate the correlation between the ratings of the two observers. For instance, two observers might each rate the overall level of activity in a classroom on a 1-to-7 scale. You could have them give their ratings at regular time intervals, and the correlation between these ratings would give you an estimate of the reliability, or consistency, between the raters. For instance, I used to work in a psychiatric unit where every morning a nurse had to do a ten-item rating of each patient on the unit. Because the same nurse was not on duty every morning, the nurses met regularly to compare their ratings and discuss why they chose the values they did.
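Here is a matching sketch for the continuous case, assuming hypothetical 1-to-7 activity ratings recorded by two observers at regular intervals; the correlation is computed with NumPy's `corrcoef`.

```python
import numpy as np

# Hypothetical 1-to-7 activity ratings from two observers at regular intervals.
observer_1 = np.array([3, 5, 4, 6, 2, 7, 5, 4])
observer_2 = np.array([4, 5, 4, 6, 3, 6, 5, 3])

# For a continuous measure, the inter-rater reliability estimate is simply the
# Pearson correlation between the two observers' ratings.
r = np.corrcoef(observer_1, observer_2)[0, 1]
print(f"Inter-rater correlation: {r:.2f}")
```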

Although this was not an estimate of reliability, it probably went a long way toward improving the reliability between raters.

We estimate test-retest reliability when we administer the same test to the same sample on two different occasions. This approach assumes that there is no substantial change in the construct being measured between the two occasions.

The amount of time allowed between measures is critical. We know that if we measure the same thing twice, the correlation between the two observations will depend in part on how much time elapses between the two measurement occasions. The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time: the closer in time the two measurements are, the more similar the factors that contribute to error.
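As a small sketch of the test-retest calculation (the scores below are invented), the estimate is just the correlation between the two administrations; a different time gap would typically produce a different number.

```python
import numpy as np

# Hypothetical scores for the same seven people on the same test, given twice.
scores_time_1 = np.array([24, 31, 18, 27, 22, 35, 29])
scores_time_2 = np.array([26, 30, 20, 25, 23, 33, 30])

# Test-retest reliability estimate: the correlation across the two occasions.
r = np.corrcoef(scores_time_1, scores_time_2)[0, 1]
print(f"Test-retest reliability estimate: {r:.2f}")
```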

Since this correlation is the test-retest estimate of reliability, you can obtain considerably different estimates depending on the interval.

For parallel-forms reliability, you first have to create two parallel forms of the test. One way to accomplish this is to create a large set of questions that address the same construct and then randomly divide the questions into two sets. You then administer both forms to the same sample of people; the correlation between scores on the two forms is the parallel-forms estimate of reliability.
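The sketch below illustrates the random split and the final correlation; the item labels and scores are hypothetical.

```python
import random
import numpy as np

# Hypothetical pool of 20 questions that all address the same construct.
items = [f"Q{i}" for i in range(1, 21)]

# Randomly divide the pool into two parallel forms of 10 questions each.
random.shuffle(items)
form_a, form_b = items[:10], items[10:]
print("Form A:", form_a)
print("Form B:", form_b)

# After administering both forms to the same sample, the parallel-forms
# reliability estimate is the correlation between the two total scores.
# These scores are invented for illustration.
form_a_scores = np.array([34, 41, 28, 37, 32, 45])
form_b_scores = np.array([33, 43, 27, 36, 34, 44])
print(f"Parallel-forms estimate: {np.corrcoef(form_a_scores, form_b_scores)[0, 1]:.2f}")
```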


