How to Use Generalized Linear Models in Python: Complete Guide
Generalized linear models (GLMs) extend ordinary linear regression so that your response variable can follow many different probability distributions, not just the normal distribution. This step-by-step tutorial shows beginners how to use generalized linear models in Python with statsmodels (and in R with the glm() function). You will learn the three components every GLM is built from, how link functions tie an unbounded linear predictor to a mean that must stay in a valid range, and how to fit and interpret logistic regression and Poisson regression on business data. By the end you will know how to choose the right distribution, select the correct link function, fit the model, read its summary, and interpret coefficients correctly.
What You'll Learn:
- What a generalized linear model is and why plain linear regression breaks down on binary and count data
- The three components of every GLM: the random component, the systematic component, and the link function
- How identity, logit, and log link functions map the linear predictor into the right range
- How to fit logistic regression and Poisson regression in Python with statsmodels
- How to fit the same GLMs in R using the built-in glm() function
- How maximum likelihood estimation and IRLS train a GLM
- How to interpret GLM coefficients correctly using odds ratios and multiplicative effects
- Common GLM mistakes — wrong distribution, link confusion, and overdispersion
What Is a Generalized Linear Model?
A generalized linear model is an extension of linear regression that allows the response variable to follow different probability distributions, not just the normal distribution. Rather than being a single model, a GLM is a unifying framework that encompasses linear regression, logistic regression, and Poisson regression under one consistent structure. Once you understand the framework, you stop memorizing separate models and instead reason about three interchangeable building blocks.
The power of the GLM framework is that switching from predicting a continuous outcome to predicting a yes/no outcome to predicting a count is mostly a matter of swapping the distribution and the link function. The underlying linear predictor — the weighted sum of your features — stays exactly the same. This is why statsmodels and R can fit all of these models with nearly identical code.
The Three GLM Components
Every GLM is built from a random component (the distribution of the response), a systematic component (the linear predictor Xβ), and a link function that connects them. Change the distribution and link, and the same machinery becomes linear, logistic, or Poisson regression. This modular structure is what makes the framework so flexible.
Link Functions
A link function g() connects the mean of the response, which must stay in the valid range of your chosen distribution, to the linear predictor's unbounded output (−∞ to +∞). Its inverse carries the linear predictor back into that range: the identity link leaves it unchanged, the inverse logit maps it to a probability between 0 and 1, and the inverse log (the exponential) keeps it positive. The single equation g(μ) = Xβ unifies all GLMs.
Logistic & Poisson Regression
Logistic regression (Binomial + logit) models binary outcomes like customer churn and outputs probabilities between 0 and 1. Poisson regression (Poisson + log) models count data like support tickets and guarantees non-negative predictions. Both are just GLMs with a different distribution and link.
Coefficient Interpretation
GLM coefficients live in the transformed space of the link function. In logistic regression they are log-odds — exponentiate them for odds ratios. In Poisson regression they are on the log scale — exponentiate them for multiplicative effects on counts. Only identity-link coefficients are read directly.
Why GLMs Are Necessary
Standard linear regression assumes that outcomes are normally distributed with constant variance and that the relationship between predictors and the response is linear and unbounded. Those assumptions are reasonable for continuous quantities like revenue or temperature, but they break down completely for the kinds of outcomes data scientists encounter every day.
Binary Outcomes
Consider predicting whether a customer will churn, where the outcome is 1 (left) or 0 (stayed). Linear regression places no bounds on its predictions, so it will happily output values like −0.3 or 1.7. These are nonsensical: a probability cannot be negative or greater than one. The model is fundamentally mismatched to the data, and any predictions near the extremes are invalid.
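To make the problem concrete, here is a minimal sketch that fits a one-predictor least-squares line to a tiny, invented churn dataset and evaluates it at two points. The variable names (`months_inactive`, `churned`) and the values are assumptions made up for illustration, not data from this article.

```python
# Hypothetical toy data: months of inactivity vs. churn flag (1 = left, 0 = stayed)
months_inactive = [0, 1, 2, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(months_inactive)
mean_x = sum(months_inactive) / n
mean_y = sum(churned) / n

# Closed-form simple linear regression (ordinary least squares)
covariance_xy = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(months_inactive, churned))
variance_x = sum((x - mean_x) ** 2 for x in months_inactive)
slope = covariance_xy / variance_x
intercept = mean_y - slope * mean_x


def linear_prediction(x):
    """Prediction from the fitted straight line; nothing keeps it inside [0, 1]."""
    return intercept + slope * x


low = linear_prediction(0)    # a customer with no inactivity
high = linear_prediction(12)  # a customer inactive for a year
print(f"prediction at 0 months:  {low:.3f}")
print(f"prediction at 12 months: {high:.3f}")
```

On this toy data the first prediction falls below 0 and the second rises above 1, so neither can be read as a probability.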
Count Data
Now consider predicting the number of support tickets a customer files in a month. Counts are non-negative integers — 0, 1, 2, and so on. Linear regression can predict negative counts, which are impossible, and it ignores the fact that the variance of count data typically grows with the mean. The result is poor fit and misleading inference.
Non-Normal Distributions in General
Whenever the response distribution is not normal (skewed durations, proportions, rates, counts, binary flags), the assumptions behind ordinary least squares inference fail and the results become unreliable. Standard errors can be wrong, p-values cannot be trusted, and predictions can fall outside valid ranges. GLMs address this by letting you declare the correct distribution for your response and then connecting it to the linear predictor through an appropriate link function.
The Three Essential Components of a GLM
Every generalized linear model is assembled from exactly three components. Understanding each one — and how they snap together — is the key to using GLMs confidently in Python or R.
1. Random Component (the Distribution)
The random component defines the probability distribution of the response variable. This is the part you change based on what kind of outcome you are modeling. A Normal distribution fits continuous, roughly symmetric outcomes and gives you linear regression. A Binomial distribution fits binary yes/no outcomes and gives you logistic regression. A Poisson distribution fits count data and gives you Poisson regression.
2. Systematic Component (the Linear Predictor)
The systematic component is the weighted sum of your features, written as Xβ, where X holds the features and β holds the coefficients. This is the linear part of the model, and crucially it is constant across every type of GLM. Whether you are doing linear, logistic, or Poisson regression, the linear predictor is computed the same way. Only how it is connected to the response changes.
3. Link Function
The link function connects the expected value of the response, μ, to the linear predictor, which ranges from −∞ to +∞. The complete GLM equation is g(μ) = Xβ, where g() is the link function. Predictions come from applying the inverse link, μ = g⁻¹(Xβ), so choosing g() correctly guarantees that they land in a valid range: between 0 and 1 for probabilities, or above zero for counts.
| Component | Role | Example |
|---|---|---|
| Random Component (Distribution) | Defines the probability distribution of the response variable | Normal for revenue, Binomial for churn, Poisson for ticket counts |
| Systematic Component (Linear Predictor) | The weighted sum Xβ of features and coefficients; same for every GLM | β0 + β1·salary + β2·experience + β3·overtime |
| Link Function | Connects the mean μ to the linear predictor; its inverse maps (−∞ to +∞) into the valid range of the distribution | Identity for Normal, logit for Binomial, log for Poisson |
Key Insight: One Equation Unifies Every GLM
The single equation g(μ) = Xβ is the heart of the entire framework. The right-hand side, the linear predictor Xβ, never changes — it is the same weighted sum of features in linear, logistic, and Poisson regression alike. What changes is the distribution of the response and the link function g() on the left. Choose Normal with the identity link and g(μ) = Xβ collapses to ordinary linear regression. Choose Binomial with the logit link and you get logistic regression. Choose Poisson with the log link and you get Poisson regression. Internalizing this one equation means you no longer learn three separate models — you learn one framework with three settings.
Common Link Functions Explained
The link function is where most of the conceptual difficulty in GLMs lives, so it is worth slowing down. Each link ties the unbounded linear predictor to a mean that must stay in the range the response distribution requires.
Identity Link
The identity link applies no transformation at all: μ = Xβ. The expected value of the response is simply the linear predictor. This is the link used in standard linear regression for continuous outcomes, where any real number is a valid prediction. It is the simplest possible link and the natural default for Normal-distributed data.
Logit Link
The logit link is defined as logit(μ) = log(μ / (1 − μ)) = Xβ. It maps probabilities, which live between 0 and 1, onto the entire real line so they can be modeled by an unbounded linear predictor. The quantity μ / (1 − μ) is the odds, and its logarithm is the log-odds. The logit link is what makes logistic regression produce valid probabilities no matter what the linear predictor outputs.
Log Link
The log link is defined as log(μ) = Xβ, which is equivalent to μ = e^(Xβ). Because the exponential function is always positive, this link guarantees that predictions stay positive, which is exactly what you need for counts. It is the link used in Poisson regression, and a key consequence is that the effect of each predictor is multiplicative rather than additive: a one-unit increase in a predictor multiplies the expected count by e^β, which is roughly a 100·β percent change when β is small.
| Link | Formula | Distribution | Use Case |
|---|---|---|---|
| Identity | μ = Xβ | Normal | Continuous outcomes; standard linear regression with no transformation |
| Logit | log(μ / (1 − μ)) = Xβ | Binomial | Binary outcomes; maps probabilities (0 to 1) to the full real line as log-odds |
| Log | log(μ) = Xβ | Poisson | Count data; keeps predictions positive; coefficients are multiplicative |
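The three links in the table can be written out in a few lines of plain Python. This is only an illustrative sketch: the function names are made up here, and statsmodels ships its own link classes that you would normally use instead.

```python
import math


def identity_inverse(eta):
    """Identity link: the mean equals the linear predictor."""
    return eta


def logit(mu):
    """Logit link: probability -> log-odds."""
    return math.log(mu / (1 - mu))


def logit_inverse(eta):
    """Inverse logit: any real number -> probability strictly between 0 and 1."""
    return 1 / (1 + math.exp(-eta))


def log_inverse(eta):
    """Inverse log link: any real number -> strictly positive mean."""
    return math.exp(eta)


# Push a few linear-predictor values through each inverse link
for eta in (-5.0, 0.0, 5.0):
    print(f"eta={eta:+.1f}  identity={identity_inverse(eta):+.3f}  "
          f"inverse logit={logit_inverse(eta):.4f}  inverse log={log_inverse(eta):.4f}")
```

Whatever value the linear predictor takes, the inverse logit stays between 0 and 1 and the inverse log stays above zero, which is the range guarantee described above.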
Common GLM Examples
Putting the distribution and link together produces the three GLMs you will use most often. Notice how the only differences are the distribution and the link — the linear predictor Xβ is shared by all three.
Linear Regression
Linear regression pairs the Normal distribution with the identity link, giving μ = Xβ. The prediction is direct: the linear predictor is the expected value of the outcome. Use it for continuous, roughly symmetric responses.
Logistic Regression
Logistic regression pairs the Binomial distribution with the logit link, giving log(μ / (1 − μ)) = Xβ. The model outputs probabilities between 0 and 1, making it ideal for binary classification problems like churn, conversion, or pass/fail prediction.
Poisson Regression
Poisson regression pairs the Poisson distribution with the log link, giving log(μ) = Xβ, or equivalently μ = e^(Xβ). The exponential guarantees non-negative counts, making it the right choice for modeling the number of events in a fixed window.
| Model | Distribution | Link | Output |
|---|---|---|---|
| Linear Regression | Normal | Identity | μ = Xβ; direct prediction of a continuous value |
| Logistic Regression | Binomial | Logit | log(μ / (1 − μ)) = Xβ; probability between 0 and 1 |
| Poisson Regression | Poisson | Log | log(μ) = Xβ or μ = e^(Xβ); non-negative count |
How a GLM Is Trained: Maximum Likelihood Estimation
Unlike ordinary linear regression, which can be solved in closed form with ordinary least squares (OLS), GLMs are fit using maximum likelihood estimation (MLE). MLE searches for the set of coefficients that make the observed data most probable under the chosen probability distribution. Because the distribution is part of the model, the same estimation principle works for Normal, Binomial, and Poisson responses alike; for the Normal family with the identity link, the maximum likelihood coefficients coincide with the OLS solution.
Most GLMs have no closed-form solution for the MLE. Instead, the coefficients are found numerically, typically with iteratively reweighted least squares (IRLS) or gradient-based optimization. The algorithm starts with an initial guess and repeatedly updates the coefficients until they converge — meaning successive updates barely change the estimates. Libraries like statsmodels and R's glm() handle all of this internally, so in practice you simply call fit() and read the results.
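To show what "iteratively reweighted least squares" means in practice, here is a minimal NumPy sketch of IRLS for logistic regression. It is a teaching illustration, not the statsmodels implementation: the function name `fit_logistic_irls`, the synthetic data, and the stopping rule are all assumptions chosen for this example.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS. X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])              # initial guess
    for _ in range(max_iter):
        eta = X @ beta                       # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))      # inverse logit gives the mean
        w = mu * (1.0 - mu)                  # weights: Binomial variance function
        z = eta + (y - mu) / w               # working response
        # Weighted least squares step: solve (X' W X) beta = X' W z
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new                  # updates barely change: converged
        beta = beta_new
    return beta


# Synthetic data with known coefficients (intercept -0.5, slope 1.2)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([-0.5, 1.2])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat = fit_logistic_irls(X, y)
print("estimated coefficients:", beta_hat)
```

Each pass is an ordinary weighted least-squares solve; only the weights and the working response are recomputed from the current estimate. A production fitter would also guard against weights that approach zero, which this sketch does not.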
How to Use Generalized Linear Models: 6 Steps
Here is the practical workflow for applying a GLM to any dataset. Following these six steps in order keeps you from the most common mistakes — choosing the wrong distribution or misreading transformed coefficients.
Identify Your Outcome Type
Start by looking at the response variable, not the predictors. Is it continuous (revenue, temperature, time)? Binary (churned vs stayed, pass vs fail)? Or a count (tickets per month, defects per batch)? This single decision drives every other choice in the workflow. Getting it wrong — for example, treating a count as if it were continuous — leads to impossible predictions and invalid inference, so spend real time here before touching any code.
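A rough first pass at this decision can even be scripted. The helper below is a hypothetical heuristic written for this illustration (the name `suggest_family` is not part of any library), and it deliberately errs on the simple side: integer-coded ratings or IDs would be misread as counts, so domain knowledge should always override it.

```python
def suggest_family(values):
    """Rough, hypothetical heuristic: map a response column to a GLM family name."""
    distinct = set(values)
    if distinct <= {0, 1}:
        return "Binomial"   # binary flag such as churned vs stayed
    if all(float(v).is_integer() and v >= 0 for v in values):
        return "Poisson"    # non-negative whole numbers look like counts
    return "Gaussian"       # anything else is treated as continuous


print(suggest_family([0, 1, 1, 0, 1]))         # binary outcome
print(suggest_family([0, 3, 2, 7, 1]))         # count outcome
print(suggest_family([120.5, 98.2, -14.0]))    # continuous outcome
```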
Choose the Distribution
Map the outcome type to the random component. Continuous and roughly symmetric outcomes use the Normal distribution. Binary outcomes use the Binomial distribution. Counts use the Poisson distribution. In statsmodels this is the family argument; in R it is the family parameter of glm(). The distribution you pick tells the model how the response varies around its mean and is the foundation of maximum likelihood estimation.
Select the Link Function
Pick the link that maps the linear predictor into the valid range of your distribution. Use the identity link with Normal, the logit link with Binomial, and the log link with Poisson. These are the canonical (default) links, so in both statsmodels and R you usually get them automatically just by specifying the family. You only need to set the link explicitly when you want a non-default option such as a probit link for binary data.
Fit the GLM in Python
Build the design matrix with sm.add_constant() so the model includes an intercept, then create the model with sm.GLM(y, X, family=...) and call .fit(). Behind the scenes statsmodels runs iteratively reweighted least squares until the coefficients converge. The same two lines fit logistic or Poisson regression — you only swap sm.families.Binomial() for sm.families.Poisson(). In R the equivalent is a single glm() call with a formula.
Read the Model Summary
Call print(results.summary()) in Python or summary(model) in R. The summary reports each coefficient, its standard error, z-value, and p-value, plus model-level fit statistics such as the log-likelihood and deviance. Confirm the model converged, check which predictors are statistically significant, and note the sign and magnitude of each coefficient. Remember that these coefficients are on the link scale, not the outcome scale — interpretation comes next.
Interpret the Coefficients
Translate the coefficients back to a human-readable scale. For an identity link, read them directly as the change in the outcome per unit change in the predictor. For a logit link, exponentiate (e^coef) to get an odds ratio. For a log link, exponentiate to get a multiplicative effect on the expected count. For example, a logistic coefficient of 0.12 on overtime hours gives e^0.12 ≈ 1.127, so each extra overtime hour multiplies the odds of leaving by about 1.13.
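The three back-transformation rules can be collected in one small helper. This is an illustrative sketch only; `describe_effect` is a name invented here, and the coefficient values are the ones used as examples in this article.

```python
import math


def describe_effect(coef, link):
    """Turn a link-scale coefficient into a plain-language effect per unit increase."""
    if link == "identity":
        return f"outcome changes by {coef:+.3f} per unit"
    if link == "logit":
        return f"odds multiplied by {math.exp(coef):.3f} per unit"
    if link == "log":
        return f"expected count multiplied by {math.exp(coef):.3f} per unit"
    raise ValueError(f"unsupported link: {link}")


print(describe_effect(0.12, "logit"))  # odds multiplied by 1.127 per unit
print(describe_effect(0.20, "log"))    # expected count multiplied by 1.221 per unit

# Effects compound multiplicatively: ten extra overtime hours scale the odds by e^(0.12 * 10)
ten_hour_odds_ratio = math.exp(0.12 * 10)
print(f"ten extra hours multiply the odds by {ten_hour_odds_ratio:.2f}")
```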
Python Implementation with statsmodels
The statsmodels library provides a clean GLM interface that mirrors the framework directly: you pass the response, the design matrix, and a family object that bundles both the distribution and its default link. The running example below uses an HR dataset where we model employee attrition (left) from salary, years of experience, and overtime hours.
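The HR dataset itself is not included with this article. If you want something to run the examples against, the following sketch builds a synthetic stand-in named df with the same column names; every number in it (sample size, scales, coefficients) is an assumption invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Predictors (scales are invented; salary is in thousands)
df = pd.DataFrame({
    "salary": rng.normal(60, 15, n).round(1),
    "experience_years": rng.integers(0, 30, n),
    "overtime_hours": rng.uniform(0, 20, n).round(1),
})

# Binary outcome: attrition generated from a logistic model
log_odds = (-1.0 - 0.02 * df["salary"]
            - 0.03 * df["experience_years"]
            + 0.12 * df["overtime_hours"])
df["left"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Count outcome: sick days generated from a Poisson model with a log link
expected_sick_days = np.exp(0.5 + 0.03 * df["overtime_hours"])
df["sick_days"] = rng.poisson(expected_sick_days)

print(df.head())
```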
First, logistic regression for the binary left outcome. The sm.add_constant() call adds the intercept column, and sm.families.Binomial() specifies both the Binomial distribution and the default logit link in one object.
# Install first: pip install statsmodels pandas
import statsmodels.api as sm
# df is assumed to be a pandas DataFrame that already holds the HR columns used below
# Build the design matrix (adds the intercept column)
X = sm.add_constant(df[["salary", "experience_years", "overtime_hours"]])
# Binomial() sets BOTH the distribution AND the default link (logit)
logit_model = sm.GLM(df["left"], X, family=sm.families.Binomial())
logit_results = logit_model.fit()
# The summary reports coefficients on the log-odds (logit) scale
print(logit_results.summary())
Switching to Poisson regression is a small change: keep the same design matrix, point the model at a count response, and swap the family. sm.families.Poisson() selects the Poisson distribution and the log link automatically, so predictions stay non-negative. Here we model the number of sick days an employee takes.
import statsmodels.api as sm
# Same design matrix as before
X = sm.add_constant(df[["salary", "experience_years", "overtime_hours"]])
# Swapping Binomial() for Poisson() switches the distribution
# and uses the log link automatically (predictions stay non-negative)
poisson_model = sm.GLM(df["sick_days"], X, family=sm.families.Poisson())
poisson_results = poisson_model.fit()
# Coefficients are on the log scale -> exponentiate for multiplicative effects
print(poisson_results.summary())
That is the entire pattern. The design matrix, the fit() call, and the summary() output are identical across model types; only the family object differs. This is the practical payoff of the GLM framework — one mental model, one API, three (and more) models.
R Implementation with glm()
R ships with GLMs built in through the glm() function, no extra package required. You write the model as a formula (outcome ~ predictor1 + predictor2) and pass a family with an explicit link. The logistic version uses binomial(link = "logit").
# glm() is part of base R - no package to install
logit_model <- glm(left ~ salary + experience_years + overtime_hours,
                   data = df,
                   family = binomial(link = "logit"))
# Coefficients are on the log-odds scale
summary(logit_model)
# Exponentiate to read coefficients as odds ratios
exp(coef(logit_model))
Poisson regression in R follows the identical structure — just change the formula's response and the family to poisson(link = "log").
poisson_model <- glm(sick_days ~ salary + experience_years + overtime_hours,
                     data = df,
                     family = poisson(link = "log"))
# Coefficients are on the log scale
summary(poisson_model)
# Exponentiate for the multiplicative effect on expected counts
exp(coef(poisson_model))
Interpreting GLM Coefficients Correctly
Coefficient interpretation is where GLMs trip up newcomers, because the coefficients are expressed on the link scale rather than the outcome scale. The correct interpretation depends entirely on which link function you used.
Linear regression (identity link): coefficients are read directly. A coefficient of 500 on salary means the outcome rises by 500 units for each one-unit increase in salary, holding everything else constant.
Logistic regression (logit link): coefficients are in log-odds space. To make them interpretable, exponentiate them (e^coef) to obtain odds ratios. An odds ratio above 1 means the predictor increases the odds of the event; below 1 means it decreases them.
Poisson regression (log link): coefficients are in log space. Exponentiate them to get the multiplicative effect on the expected count. A coefficient of 0.20 becomes e^0.20 ≈ 1.22, meaning each unit increase multiplies the expected count by about 1.22 (a 22% increase).
overtime_hours coefficient = 0.12 (log-odds)
odds ratio = e^0.12 ≈ 1.127
Interpretation: each extra overtime hour multiplies the
odds of an employee leaving by about 1.13 (a ~13% increase in odds).
When to Use Each GLM
Choosing the right GLM comes down to the nature of your outcome variable. Use this quick decision guide whenever you start a new modeling task.
Binary outcomes (yes/no, pass/fail, churn/retain): use logistic regression with the Binomial distribution and the logit link. The output is a probability, which you can threshold for classification.
Count data (events per time window, defects per unit, visits per user): use Poisson regression with the Poisson distribution and the log link. Predictions are guaranteed non-negative.
Continuous, normally distributed outcomes (revenue, measurements, durations that are roughly symmetric): use standard linear regression with the Normal distribution and the identity link.
Common GLM Mistakes and How to Avoid Them
Most GLM errors are conceptual rather than coding errors. Watch out for these four.
Mistake 1: Wrong Distribution Selection
Fitting linear regression to count data is the classic error. Because the identity link is unbounded, the model can predict negative counts — which are impossible. Always match the distribution to the outcome type: Poisson for counts, Binomial for binary, Normal for continuous.
Mistake 2: Misunderstanding Link Functions
Confusing the raw coefficients with their transformed interpretation leads to badly wrong conclusions. A logistic coefficient of 0.12 does not mean a 0.12 increase in probability — it is a log-odds value that must be exponentiated to an odds ratio. Always know which scale your coefficients are on before describing an effect.
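A short calculation shows why a log-odds coefficient cannot be read as a probability change. The sketch below applies the same 0.12 shift at three different baseline probabilities; the helper name `probability_after_shift` and the baselines are chosen only for illustration.

```python
import math


def probability_after_shift(baseline_probability, coef):
    """Add `coef` on the log-odds scale and convert back to a probability."""
    baseline_log_odds = math.log(baseline_probability / (1 - baseline_probability))
    return 1 / (1 + math.exp(-(baseline_log_odds + coef)))


coef = 0.12
changes = {}
for baseline in (0.05, 0.50, 0.95):
    shifted = probability_after_shift(baseline, coef)
    changes[baseline] = shifted - baseline
    print(f"baseline {baseline:.2f} -> {shifted:.4f} "
          f"(change of {changes[baseline]:+.4f})")
```

The change in probability is largest near a baseline of 0.5 and much smaller near 0 or 1, and in every case it is well below 0.12. The odds ratio is constant; the probability change is not.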
Mistake 3: Comparing Coefficients Across Models
Coefficients from different GLM types are not directly comparable because they live in different transformed spaces. A coefficient of 0.3 in a logistic model (log-odds) and 0.3 in a Poisson model (log count) describe entirely different things. Never compare them side by side as if they were the same quantity.
Mistake 4: Ignoring Distributional Assumptions
Poisson regression assumes the mean equals the variance. Real count data often violates this — a condition called overdispersion — which inflates significance and produces misleadingly small p-values and standard errors. Check for overdispersion and switch to a negative binomial model if it is present.
Poisson Regression Assumes Mean = Variance — Check for Overdispersion
The Poisson distribution forces the variance to equal the mean. When real count data is more spread out than that (overdispersion), your standard errors will be too small and your p-values too optimistic, making predictors look more significant than they truly are. Always check the ratio of the residual deviance (or the Pearson chi-square statistic) to the residual degrees of freedom; if it is well above 1, switch to a negative binomial model or use a quasi-Poisson family before trusting any of the inference.
Frequently Asked Questions
What is the difference between linear regression and a generalized linear model?
Linear regression is actually a special case of a generalized linear model — the one using the Normal distribution with the identity link. A GLM generalizes this by letting the response follow other distributions (Binomial, Poisson, and more) and by inserting a link function g() so that g(μ) = Xβ. This lets GLMs model binary outcomes, counts, and other non-normal responses that ordinary linear regression handles poorly.
Which family should I use in statsmodels for logistic and Poisson regression?
For logistic regression use sm.families.Binomial(), which applies the logit link by default and outputs probabilities. For Poisson regression use sm.families.Poisson(), which applies the log link and keeps predictions non-negative. The rest of your code — building the design matrix with sm.add_constant() and calling .fit() — stays exactly the same; only the family changes.
How do I interpret coefficients in logistic regression?
Logistic regression coefficients are on the log-odds scale, so you cannot read them directly as probabilities. Exponentiate each coefficient (e^coef) to get an odds ratio. A value above 1 means the predictor increases the odds of the event; below 1 means it decreases them. For example, a coefficient of 0.12 gives e^0.12 ≈ 1.13, meaning each unit increase multiplies the odds by about 1.13.
Why do GLMs use maximum likelihood instead of ordinary least squares?
Ordinary least squares is built around the assumption of normally distributed errors, which does not hold for binary or count outcomes. Maximum likelihood estimation (MLE) instead finds the coefficients that make the observed data most probable under whatever distribution you chose, so it works for any GLM family. Because most GLMs lack a closed-form solution, the MLE is computed numerically using iteratively reweighted least squares (IRLS) or gradient-based optimization.
What is overdispersion in Poisson regression and how do I fix it?
Overdispersion occurs when count data has more variance than the Poisson distribution allows, since Poisson assumes the mean equals the variance. It makes standard errors too small and p-values misleadingly significant. Detect it by checking whether the ratio of the residual deviance (or the Pearson chi-square statistic) to the residual degrees of freedom is well above 1. The fix is to switch to a negative binomial model or use a quasi-Poisson family, both of which allow the variance to exceed the mean.
About the author
Founder & CEO, Braincuber Technologies
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.
