Comparing multiple groups is easy when your data follows a normal distribution. The problem is that most real-world data doesn't. ANOVA assumes normally distributed data, so if it's your default test, it can lead you to the wrong conclusions. When that assumption doesn't hold - think skewed data or small samples - you need a different approach. The Kruskal-Wallis test is that approach.

What You'll Learn:

What the Kruskal-Wallis test is and when to use it
How it compares to ANOVA and Mann-Whitney U test
The formula and math behind the test statistic H
How to run it in Python and R with examples
How to interpret results and run Dunn's post hoc test
Assumptions and when NOT to use Kruskal-Wallis

What Is the Kruskal-Wallis Test?

The Kruskal-Wallis test is a nonparametric method for comparing three or more independent groups. It converts all observations into ranks and compares those ranks across groups instead of working with raw values.

You can think of it as an extension of the Mann-Whitney U test, which compares two groups. The Mann-Whitney U does the same rank-based comparison, but only for two groups. The Kruskal-Wallis test scales it to three or more.
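
One way to see the relationship is to run both tests on just two groups. The sketch below uses two of the exam-score lists from the examples later in this guide and assumes SciPy 1.7 or later, where mannwhitneyu accepts the method argument. With the normal approximation and no continuity correction, the two p-values should agree.

```python
from scipy import stats

# Two of the exam-score groups used in the examples later in this guide
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]

# Kruskal-Wallis restricted to two groups
h_stat, p_kw = stats.kruskal(class_a, class_b)

# Mann-Whitney U using the normal approximation without continuity correction
u_stat, p_mwu = stats.mannwhitneyu(
    class_a,
    class_b,
    alternative="two-sided",
    use_continuity=False,
    method="asymptotic",
)

print(f"Kruskal-Wallis p-value: {p_kw:.6f}")
print(f"Mann-Whitney U p-value: {p_mwu:.6f}")
```

With SciPy's default settings for small samples, Mann-Whitney U may use an exact method instead, so the p-values can differ slightly from the Kruskal-Wallis result.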

Because it works on ranks rather than raw values, it doesn't assume your data follows any particular distribution. That's what makes it useful with real-world data, which rarely follows any single distribution perfectly.
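
To make the idea of ranks concrete, here is a minimal sketch using scipy.stats.rankdata. The values are made up for illustration: they include one tie and one extreme outlier.

```python
from scipy.stats import rankdata

# Hypothetical values: one tie (15, 15) and one extreme outlier (400)
values = [12, 15, 15, 400]

# Tied values share the average of the ranks they would have occupied
ranks = rankdata(values)
print(ranks.tolist())

# Replacing the outlier with a modest value leaves the ranks unchanged
ranks_no_outlier = rankdata([12, 15, 15, 16])
print(ranks_no_outlier.tolist())
```

Both calls produce the same ranks, which is why a rank-based test is not thrown off by how extreme an outlier is.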

When to Use the Kruskal-Wallis Test

The Kruskal-Wallis test is a great fit when you're dealing with:

Three or More Independent Groups

You want to compare three or more groups that are independent of each other.

Ordinal or Continuous Data

Such as Likert scale ratings or measurement data that can be ranked meaningfully.

Non-Normal Distributions

Skewed data, heavy outliers, or other departures from normality that ANOVA doesn't handle well.

Small Sample Sizes

Where normality is hard to verify and parametric tests would be unreliable.

Imagine you want to compare exam scores across three different classes. The scores are skewed and the samples are small, so ANOVA isn't a good choice. The Kruskal-Wallis test doesn't need normality, so it works here. It'll tell you whether at least one class scored differently from the others without making assumptions your data can't support.

Kruskal-Wallis Test vs. ANOVA

Both tests compare groups, but they do it differently. ANOVA compares group means and assumes your data is normally distributed with roughly equal variances. When those assumptions hold, it's the better choice - it's more statistically powerful and the results are easier to interpret.

The Kruskal-Wallis test compares group distributions using ranks. It doesn't require normality, and equal spread only matters if you want to read the result as a difference in medians. That makes it more flexible, but you lose some statistical power in the process.

Feature	ANOVA	Kruskal-Wallis
Number of Groups	3+ (any number)	3+ (any number)
Data Type	Continuous, normal distribution	Ordinal or continuous, any distribution
What It Compares	Group means	Group distributions (via ranks)
Assumptions	Normality, equal variances	Independent samples, rankable data
Statistical Power	Higher when assumptions met	Lower, but works with non-normal data

If your data is normally distributed, use ANOVA. If it isn't - or you can't verify that it is - use Kruskal-Wallis.
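
If you're unsure which test fits, it can help to run both on the same data and compare. The sketch below does that with scipy.stats.f_oneway (one-way ANOVA) and scipy.stats.kruskal, using the exam scores from the examples later in this guide. It's only an illustration of the two calls, not a rule for choosing between them.

```python
from scipy import stats

# Exam scores from the examples later in this guide
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

# One-way ANOVA compares group means
f_stat, p_anova = stats.f_oneway(class_a, class_b, class_c)

# Kruskal-Wallis compares the groups through their ranks
h_stat, p_kw = stats.kruskal(class_a, class_b, class_c)

print(f"ANOVA:          F = {f_stat:.4f}, p = {p_anova:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.4f}, p = {p_kw:.4f}")
```

When both tests point the same way, as they do here, the choice matters less. When they disagree, look at the distributions before trusting either one.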

Kruskal-Wallis Test Formula

The Kruskal-Wallis test boils down to a single test statistic, H. Here's the formula:

Kruskal-Wallis Formula

H = (12 / (N * (N + 1))) * Σ(R_i² / n_i) - 3 * (N + 1), with the sum taken over groups i = 1, ..., k

Where:
N = total number of observations across all groups
k = number of groups
n_i = number of observations in group i
R_i = sum of ranks assigned to group i

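The formula measures how much the rank sums of each group deviate from what you'd expect if all groups were identical. A large H means the rank sums are far from that expectation, which is evidence that at least one group differs. A small H means the rank sums are close to it.

Once you have H, you compare it against a chi-square distribution with k - 1 degrees of freedom to get an approximate p-value. The approximation works best when each group has at least around five observations, and software typically applies a small correction to H when the data contains tied values.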
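
As a small sketch of that last step, the p-value is the upper tail of the chi-square distribution at H. The H value below is a made-up number for illustration, not the output of any example in this guide.

```python
import math
from scipy.stats import chi2

h = 10.26  # hypothetical H statistic
k = 3      # number of groups

# Upper-tail probability of a chi-square distribution with k - 1 degrees of freedom
p_value = chi2.sf(h, df=k - 1)
print(f"p-value: {p_value:.4f}")

# With exactly 2 degrees of freedom, the upper tail has the closed form exp(-H / 2)
print(f"closed form: {math.exp(-h / 2):.4f}")
```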

How the Kruskal-Wallis Test Works

There are four steps needed to perform the Kruskal-Wallis test:

Combine All Groups

Take all observations from every group and combine them into a single dataset.

Rank All Observations

Sort the combined data from smallest to largest and assign ranks. The smallest value gets rank 1, the next gets rank 2, and so on. If two values are equal, they share the average of the ranks they would have occupied.

Compute Rank Sums

Split the ranks into their original groups. Add up the ranks for each group. These are your rank sums - R_i in the formula.

Calculate the Test Statistic

Plug the rank sums into the H formula. If the groups are similar, their rank sums will be close to each other and H will be small. If one group consistently gets higher or lower ranks, H grows larger.

Notice that the test doesn't care about the actual values, only about where each value sits relative to everything else.
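
The four steps above can be sketched by hand and checked against SciPy. This uses the exam scores from the Python example in the next section. The tie correction at the end is not part of the formula shown earlier. It is included here because scipy.stats.kruskal applies it, and this data contains tied scores.

```python
from scipy import stats

groups = {
    "A": [78, 85, 90, 72, 88],
    "B": [65, 70, 68, 74, 60],
    "C": [88, 92, 95, 85, 91],
}

# Step 1: combine all observations into one list
combined = [x for scores in groups.values() for x in scores]
n_total = len(combined)

# Step 2: rank the combined data (ties get the average rank)
ranks = stats.rankdata(combined)

# Step 3: split the ranks back into their groups and sum them
rank_sums = {}
start = 0
for name, scores in groups.items():
    rank_sums[name] = float(ranks[start:start + len(scores)].sum())
    start += len(scores)

# Step 4: plug the rank sums into the H formula
h_raw = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(scores) for name, scores in groups.items()
) - 3 * (n_total + 1)

# Tie correction: divide H by SciPy's tie-correction factor
h_corrected = h_raw / stats.tiecorrect(ranks)

h_scipy, p_scipy = stats.kruskal(*groups.values())

print(rank_sums)
print(f"H by hand (uncorrected):   {h_raw:.4f}")
print(f"H by hand (tie-corrected): {h_corrected:.4f}")
print(f"H from scipy:              {h_scipy:.4f}")
```

The rank sums come out to 43, 16, and 61, and the tie-corrected value matches what SciPy reports.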

Kruskal-Wallis Test in Python

Python's scipy library has a built-in function for the Kruskal-Wallis test, meaning you don't have to implement the formula by hand. Let's go through an example.

Say you're comparing exam scores across three classes. Here's how you'd run the test:

Python Example: Kruskal-Wallis Test

from scipy import stats

# Exam scores
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

# Run the test
statistic, p_value = stats.kruskal(class_a, class_b, class_c)

print(f"H statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

For this data, H is about 10.30 and the p-value is about 0.006. That's below 0.05, which means at least one class scored differently from the others. Just keep in mind the test won't tell you which one - you'll need a post hoc test for that.

Kruskal-Wallis Test in R

Just like Python, R has a built-in function for this test. Let's use the same exam score scenario.

R Example: Kruskal-Wallis Test

# Exam scores
class_a <- c(78, 85, 90, 72, 88)
class_b <- c(65, 70, 68, 74, 60)
class_c <- c(88, 92, 95, 85, 91)

# Combine
scores <- c(class_a, class_b, class_c)
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Run the test
kruskal.test(scores ~ groups)

The output is the same as what you got in Python - same H statistic, same p-value. With p < 0.05, you'd reject the null hypothesis and conclude that at least one group differs.

How to Interpret Kruskal-Wallis Results

The null hypothesis of the Kruskal-Wallis test is that all groups have the same distribution. The p-value tells you whether to reject it. Here's how to interpret it:

• p < 0.05: At least one group differs from the others, so reject the null hypothesis.
• p >= 0.05: There is no strong evidence that the groups differ, so don't reject the null hypothesis.

The 0.05 threshold is a convention. Depending on your field or the stakes of your analysis, you might use a stricter threshold like 0.01 or a looser one like 0.10.

Keep in mind this test won't tell you which group is different. A significant result just means the groups aren't all the same. You know something is going on, but not where. To find out which pairs are driving the difference, you need a post hoc test.

Post Hoc Tests After Kruskal-Wallis

A significant Kruskal-Wallis result leaves the pairwise question open. If you have three groups and p < 0.05, the difference could be A versus B, A versus C, B versus C, or some combination. You need to perform a post hoc test to get these pairwise comparisons.

Dunn's test is the most common choice. It runs pairwise comparisons between all groups and adjusts the p-values to account for multiple comparisons - without that adjustment, you'd inflate the chance of a false positive. The more comparisons you run, the higher the risk of finding a "significant" result by chance alone.
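
To see why the adjustment matters, here is a rough sketch of how the false-positive risk grows with the number of comparisons, along with the Bonferroni adjustment used in the examples below. It treats the comparisons as independent, which is a simplification, and the raw p-values are made up for illustration.

```python
from math import comb

alpha = 0.05
k = 3                  # number of groups
n_pairs = comb(k, 2)   # number of pairwise comparisons

# Chance of at least one false positive across all unadjusted comparisons,
# assuming the comparisons are independent (a simplification)
fwer = 1 - (1 - alpha) ** n_pairs
print(f"{n_pairs} comparisons -> family-wise error rate of about {fwer:.3f}")


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.01, 0.04, 0.30]  # hypothetical unadjusted p-values
adjusted = bonferroni(raw)
print(adjusted)
```

With three groups the unadjusted risk is already close to 14%, and it climbs quickly as you add groups.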

Dunn's Test in Python

You'll need the scikit-posthocs library for this (imported as scikit_posthocs). If you don't have it, install it with pip install scikit-posthocs.

Python: Dunn's Test After Kruskal-Wallis

import scikit_posthocs as sp
import pandas as pd

# Same exam scores as before
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

# Combine
scores = class_a + class_b + class_c
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

df = pd.DataFrame({"score": scores, "group": groups})

# Run the test
result = sp.posthoc_dunn(df, val_col="score", group_col="group", p_adjust="bonferroni")
print(result)

The result is a table in which each cell shows the adjusted p-value for that pair. Here, only B versus C (p = 0.004) falls below the 0.05 threshold, so those two groups differ. A versus B (p = 0.167) and A versus C (p = 0.607) don't, which means class A isn't statistically different from either of the other two classes.
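
If you want to see where those numbers come from, the sketch below computes Dunn's z statistics by hand using only SciPy. It assumes the standard form of the test: mean-rank differences scaled by a tie-corrected variance, two-sided p-values, and a Bonferroni adjustment. Treat it as an illustration of the calculation rather than a replacement for the library.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

from scipy.stats import norm, rankdata

groups = {
    "A": [78, 85, 90, 72, 88],
    "B": [65, 70, 68, 74, 60],
    "C": [88, 92, 95, 85, 91],
}

combined = [x for scores in groups.values() for x in scores]
n_total = len(combined)
ranks = rankdata(combined)

# Mean rank and size of each group
mean_rank, size = {}, {}
start = 0
for name, scores in groups.items():
    size[name] = len(scores)
    mean_rank[name] = float(ranks[start:start + len(scores)].mean())
    start += len(scores)

# Tie term: sum of t^3 - t over each set of t tied values (untied values add 0)
tie_sum = sum(t ** 3 - t for t in Counter(combined).values())
base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))

pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    z = (mean_rank[a] - mean_rank[b]) / sqrt(base_var * (1 / size[a] + 1 / size[b]))
    p_raw = 2 * norm.sf(abs(z))                    # two-sided p-value
    adjusted[(a, b)] = min(1.0, p_raw * len(pairs))  # Bonferroni adjustment

for pair, p in adjusted.items():
    print(pair, round(p, 3))
```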

Dunn's Test in R

R: Dunn's Test After Kruskal-Wallis

# Install if needed: install.packages("dunn.test")
library(dunn.test)

# Same exam scores as before
class_a <- c(78, 85, 90, 72, 88)
class_b <- c(65, 70, 68, 74, 60)
class_c <- c(88, 92, 95, 85, 91)

scores <- c(class_a, class_b, class_c)
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Run the test
dunn.test(scores, groups, method = "bonferroni")

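The conclusions match Python, as you would expect. Only B versus C is significant, while A versus B and A versus C aren't. Class B and class C are the ones behind the difference detected by the Kruskal-Wallis test. Depending on the package version and settings, dunn.test may report one-sided p-values by default (see its altp argument), so the exact numbers can differ from scikit-posthocs even though the conclusions agree.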

Assumptions of the Kruskal-Wallis Test

The Kruskal-Wallis test is more flexible than ANOVA, but it still has three assumptions you need to check before running it:

Independent Samples

Observations in one group don't influence observations in another. If your data is paired or repeated measures, this test isn't the right fit.

Ordinal or Continuous Data

The test needs data you can rank. Nominal categories (like colors or labels) can't be ranked, so they won't work here.

Similar Distribution Shapes

If you want to interpret the results as a comparison of medians rather than just distributions, the groups need to have roughly the same shape. If the shapes differ a lot, you can still compare distributions, but the median interpretation won't hold.

Important Note

If you violate the first two assumptions, the test results won't be valid. The third assumption is somewhat softer, as it affects how you interpret the results, not whether you can run the test at all.

When You Should Not Use the Kruskal-Wallis Test

There are three cases where a different test would be a better fit:

Your Data is Paired or Repeated Measures

If the same subjects appear across groups, use the Friedman test instead. It's the nonparametric equivalent designed for dependent samples. Using Kruskal-Wallis on paired data ignores the relationship between observations and can lead to wrong conclusions.
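
For reference, here is a minimal sketch of the Friedman test with scipy.stats.friedmanchisquare. The data is hypothetical: the same six students measured on three exams, so position i in each list belongs to the same student.

```python
from scipy import stats

# Hypothetical repeated measures: six students, three exams each
exam_1 = [72, 65, 80, 58, 90, 77]
exam_2 = [75, 70, 78, 64, 92, 80]
exam_3 = [80, 74, 85, 70, 95, 83]

# Each argument is one condition; observations are matched by position
statistic, p_value = stats.friedmanchisquare(exam_1, exam_2, exam_3)

print(f"Friedman statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")
```

Unlike Kruskal-Wallis, the Friedman test ranks scores within each student, which is how it accounts for the pairing.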

Your Data Meets ANOVA's Assumptions

If your data is normally distributed with roughly equal variances, ANOVA is the better choice. It's more statistically powerful, which means it's better at detecting real differences when they exist.

Your Sample Sizes Are Large

With large samples, parametric methods tend to work well even when the data isn't perfectly normal. The central limit theorem makes the group means approximately normal, so ANOVA is often a reliable choice, and its results are easier to interpret as differences in means.

If you're working with hundreds or thousands of observations per group and no extreme outliers, ANOVA is usually the better starting point.

Frequently Asked Questions

What is the Kruskal-Wallis test used for?

The Kruskal-Wallis test is used to compare three or more independent groups when you can't assume your data follows a normal distribution. It's a nonparametric alternative to ANOVA that works on ranked data instead of raw values.

What does a significant Kruskal-Wallis result mean?

A significant result - typically p < 0.05 - means at least one group differs from the others. It doesn't tell you which groups are different, just that they're not all the same. To find out which pairs are behind the difference, you need to follow up with a post hoc test like Dunn's test.

What are the assumptions of the Kruskal-Wallis test?

The test requires independent samples, meaning observations in one group don't influence observations in another. Your data needs to be ordinal or continuous - something you can rank. If you want to interpret results as a comparison of medians, the groups should also have similar distribution shapes.

What is the difference between Kruskal-Wallis and Mann-Whitney U test?

The Mann-Whitney U test compares two independent groups, while the Kruskal-Wallis test extends that approach to three or more groups. Both work on ranked data and don't assume normality. If you only have two groups, Mann-Whitney U is the right choice - Kruskal-Wallis is its multi-group equivalent.

When should you use Dunn's test after Kruskal-Wallis?

Run Dunn's test when your Kruskal-Wallis result is significant and you need to know which specific pairs of groups differ. It performs pairwise comparisons between all groups and adjusts the p-values to reduce the chance of false positives.
