Log Link vs Log Transformation in R — The Difference that Misleads Your Entire Data Analysis
Although normal distributions are the most commonly used, a lot of real-world data is, unfortunately, not normal. When faced with extremely skewed data, it's tempting to use a log transformation to normalize the distribution and stabilize the variance. I recently worked on a project analyzing the energy consumption of training AI models, using data […]

To address this skewness and heteroskedasticity, my first instinct was to apply a log transformation to the Energy variable. The distribution of log(Energy) looked much more normal (Fig. 2), and a Shapiro-Wilk test supported this: it failed to reject normality (p ≈ 0.5).
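That check is quick to reproduce in base R. Below is a minimal sketch, assuming the data frame df and the Energy_kWh column used in the models later on:
# Compare the raw and log-transformed distributions visually
hist(df$Energy_kWh, main = "Energy (kWh)", xlab = "kWh")
hist(log(df$Energy_kWh), main = "log(Energy)", xlab = "log(kWh)")
# Shapiro-Wilk test on the log scale; a large p-value means
# we fail to reject normality
shapiro.test(log(df$Energy_kWh))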
Modeling Dilemma: Log Transformation vs Log Link
The visualization looked good, but when I moved on to modeling, I faced a dilemma: should I model the log-transformed response variable (log(Y) ~ X), or should I model the original response variable using a log link function (Y ~ X, link = "log")? I also considered two distributions, Gaussian (normal) and Gamma, and combined each with both log approaches. This gave me four different models, shown below, all fitted using R's Generalized Linear Models (GLM):
all_gaussian_log_link <- glm(Energy_kWh ~ Parameters +
                               Training_compute_FLOP +
                               Training_dataset_size +
                               Training_time_hour +
                               Hardware_quantity +
                               Training_hardware,
                             family = gaussian(link = "log"), data = df)

all_gaussian_log_transform <- glm(log(Energy_kWh) ~ Parameters +
                                    Training_compute_FLOP +
                                    Training_dataset_size +
                                    Training_time_hour +
                                    Hardware_quantity +
                                    Training_hardware,
                                  data = df)

all_gamma_log_link <- glm(Energy_kWh ~ Parameters +
                            Training_compute_FLOP +
                            Training_dataset_size +
                            Training_time_hour +
                            Hardware_quantity +
                            Training_hardware + 0,
                          family = Gamma(link = "log"), data = df)

all_gamma_log_transform <- glm(log(Energy_kWh) ~ Parameters +
                                 Training_compute_FLOP +
                                 Training_dataset_size +
                                 Training_time_hour +
                                 Hardware_quantity +
                                 Training_hardware + 0,
                               family = Gamma(), data = df)
Model Comparison: AIC and Diagnostic Plots
I compared the four models using the Akaike Information Criterion (AIC), an estimator of relative prediction error. Generally, the lower the AIC, the better the model fits.
AIC(all_gaussian_log_link, all_gaussian_log_transform, all_gamma_log_link, all_gamma_log_transform)
                           df       AIC
all_gaussian_log_link      25 2005.8263
all_gaussian_log_transform 25  311.5963
all_gamma_log_link         25 1780.8524
all_gamma_log_transform    25  352.5450
Among the four models, the two with log-transformed outcomes had much lower AIC values than the two using log links. Since the difference was substantial (311 and 352 vs 1780 and 2005), I also examined the diagnostic plots to further validate that the log-transformed models fit better:
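The diagnostic plots themselves aren't reproduced here, but in base R they come straight from calling plot() on a fitted glm object. For example:
# Residuals vs Fitted, Q-Q, Scale-Location, and Residuals vs Leverage
# plots in a 2x2 grid, for each of the two log-transformed models
par(mfrow = c(2, 2))
plot(all_gaussian_log_transform)
plot(all_gamma_log_transform)
par(mfrow = c(1, 1))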
Based on the AIC values and diagnostic plots, I decided to move forward with the log-transformed Gamma model, as it had the second-lowest AIC value and its Residuals vs Fitted plot looked better than that of the log-transformed Gaussian model.
I proceeded to explore which explanatory variables were useful and which interactions may have been significant. The final model I selected was:
glm(formula = log(Energy_kWh) ~ Training_time_hour * Hardware_quantity +
      Training_hardware + 0, family = Gamma(), data = df)
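The screening step itself isn't shown above, but one way to sketch it, starting from the full Gamma log-transform fit, is with term-wise tests and nested-model comparisons. The object name with_interaction below is just illustrative, and the interaction tried is the one that ended up in the final model:
# Coefficient estimates and their significance
summary(all_gamma_log_transform)
# F-tests for dropping each term from the full model
drop1(all_gamma_log_transform, test = "F")
# Add the Training_time_hour x Hardware_quantity interaction
# and compare it against the model without it
with_interaction <- update(all_gamma_log_transform,
                           . ~ . + Training_time_hour:Hardware_quantity)
anova(all_gamma_log_transform, with_interaction, test = "F")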
Interpreting Coefficients
However, when I started interpreting the model’s coefficients, something felt off. Since only the response variable was log-transformed, the effects of the predictors are multiplicative, and we need to exponentiate the coefficients to convert them back to the original scale. A one-unit increase in