PARTIAL CORRELATION

Gavril D'Souza
6 min read · Nov 10, 2021

What exactly is partial correlation?

To understand partial correlation, we first need to know what correlation is. In simple terms, a correlation coefficient measures the strength and direction of the linear relationship between two variables, say X and Y. There are three cases:

  • Positive correlation: the correlation coefficient is greater than zero
  • No correlation: the correlation coefficient is zero, meaning there is no linear relation between X and Y
  • Negative correlation: the correlation coefficient is less than zero
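As a quick illustration of the three cases, here is a minimal sketch using NumPy's `np.corrcoef`. The arrays and the helper name `pearson` are made up for this example and are not part of any dataset discussed in this article.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

y_pos = 2.0 * x + 1.0                          # rises as x rises
y_neg = -3.0 * x + 10.0                        # falls as x rises
y_none = np.array([2.0, 1.0, 0.0, 1.0, 2.0])   # V-shaped around the mean of x


def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


r_pos = pearson(x, y_pos)     # close to +1
r_neg = pearson(x, y_neg)     # close to -1
r_none = pearson(x, y_none)   # close to 0

print(r_pos, r_neg, r_none)
```

Note that `y_none` clearly depends on `x`, just not linearly. A coefficient of zero only rules out a linear relation, not every kind of relation.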

Partial correlation describes the relationship between two variables after removing the effects of another variable, or of several other variables, on that relationship. It is most often thought of in terms of multiple regression.

If we want to measure how strongly two variables are numerically related, their correlation coefficient can give misleading results when a confounding variable is numerically related to both variables of interest. This can be avoided by controlling for the confounding variable, which is what computing the partial correlation coefficient does.
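To make the confounding problem concrete, here is a small simulated sketch. Everything in it is an assumption chosen for illustration: the data are synthetic, X and Y are built so that they are linked only through Z, and "controlling for Z" is approximated very crudely by looking only at observations where Z is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n = 5000

z = rng.normal(size=n)           # the confounder
x = z + rng.normal(size=n)       # X depends on Z plus its own noise
y = z + rng.normal(size=n)       # Y depends on Z plus its own noise, not on X

# Plain correlation over all data: substantial, although X and Y
# are connected only through Z.
r_xy = float(np.corrcoef(x, y)[0, 1])

# Crude way of holding Z fixed: keep only rows where Z is near 0.
near_zero = np.abs(z) < 0.1
r_xy_slice = float(np.corrcoef(x[near_zero], y[near_zero])[0, 1])

print(f"corr(X, Y) over all data:   {r_xy:.3f}")
print(f"corr(X, Y) where |Z| < 0.1: {r_xy_slice:.3f}")
```

Within the narrow slice the correlation should mostly disappear, which is the effect the partial correlation coefficient captures without throwing data away.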

Like the correlation coefficient, the partial correlation coefficient (PCC) takes a value in the range from –1 to 1. The value –1 indicates a perfect negative correlation after controlling for the other variables, that is, an exact linear relationship in which higher values of one variable are associated with lower values of the other. The value +1 indicates a perfect positive linear relationship, and the value 0 indicates that there is no linear relationship.

Definition:

The partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, …, Zn}, written ρXY·Z, is the correlation between the residuals eX and eY resulting from the linear regression of X on Z and of Y on Z, respectively.

Computation

We can compute the PCC of two variables in three ways:

  • Using linear regression

A simple and intuitive way to find the sample partial correlation for given data is to solve the two associated linear regression problems, find the residuals, and compute the correlation between the residuals. Let X and Y be real-valued random variables, and let Z be an n-dimensional vector-valued random variable. We write xi, yi and zi for the ith of N independent and identically distributed observations from a joint probability distribution over (X, Y, Z), with zi augmented with a 1 to allow for a constant term in the regression.

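Solving the linear regression problems amounts to finding the (n+1)-dimensional regression coefficient vectors w*X and w*Y such that

$$\mathbf{w}_X^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \left(x_i - \langle \mathbf{w}, \mathbf{z}_i \rangle\right)^2$$

$$\mathbf{w}_Y^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \left(y_i - \langle \mathbf{w}, \mathbf{z}_i \rangle\right)^2$$

where N is the number of observations and ⟨w, z⟩ is the scalar product of the vectors w and z. The residuals are then

$$e_{X,i} = x_i - \langle \mathbf{w}_X^*, \mathbf{z}_i \rangle$$

and

$$e_{Y,i} = y_i - \langle \mathbf{w}_Y^*, \mathbf{z}_i \rangle$$

and the sample partial correlation is given by the usual formula for sample correlation, applied to these derived values:

$$\hat{\rho}_{XY\cdot\mathbf{Z}} = \frac{N\sum_{i=1}^{N} e_{X,i}\,e_{Y,i} - \sum_{i=1}^{N} e_{X,i}\sum_{i=1}^{N} e_{Y,i}}{\sqrt{N\sum_{i=1}^{N} e_{X,i}^2 - \left(\sum_{i=1}^{N} e_{X,i}\right)^2}\;\sqrt{N\sum_{i=1}^{N} e_{Y,i}^2 - \left(\sum_{i=1}^{N} e_{Y,i}\right)^2}}$$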
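The regression approach can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and ordinary least squares via `np.linalg.lstsq`; the function name `partial_corr_regression` and the simulated data are made up for this example.

```python
import numpy as np


def partial_corr_regression(x, y, z):
    """Sample partial correlation of x and y given z, via regression residuals.

    x, y: 1-D arrays of length N.
    z:    array of shape (N,) or (N, n) holding the controlling variables.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(x), -1)

    # Augment z with a column of ones to allow a constant term.
    design = np.column_stack([z, np.ones(len(x))])

    # Solve the two least-squares problems for the coefficient vectors.
    w_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    w_y, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Residuals: the parts of x and y that z does not explain.
    e_x = x - design @ w_x
    e_y = y - design @ w_y

    # Ordinary sample correlation, applied to the residuals.
    return float(np.corrcoef(e_x, e_y)[0, 1])


# Illustrative data: X and Y are related only through Z.
rng = np.random.default_rng(1)
z = rng.normal(size=4000)
x = z + rng.normal(size=4000)
y = z + rng.normal(size=4000)

print("plain correlation:  ", float(np.corrcoef(x, y)[0, 1]))
print("partial correlation:", partial_corr_regression(x, y, z))
```

On data built this way, the plain correlation should be clearly positive while the partial correlation should be close to zero.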

  • Using recursive formula

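Solving the linear regression problems can be computationally expensive. In fact, the nth-order partial correlation (with |Z| = n) can be computed from three (n − 1)th-order partial correlations. The zeroth-order partial correlation ρXY·Ø is defined to be the regular correlation coefficient ρXY. For any Z₀ ∊ Z, it holds that

$$\rho_{XY\cdot\mathbf{Z}} = \frac{\rho_{XY\cdot\mathbf{Z}\setminus\{Z_0\}} - \rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}\,\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}}{\sqrt{1-\rho_{XZ_0\cdot\mathbf{Z}\setminus\{Z_0\}}^2}\;\sqrt{1-\rho_{Z_0Y\cdot\mathbf{Z}\setminus\{Z_0\}}^2}}$$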
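A direct, unoptimized translation of the recursion into Python might look like the sketch below. The function name `partial_corr_recursive` and the example correlation matrix are assumptions made for illustration; variables are referred to by their row/column index in the correlation matrix.

```python
import numpy as np


def partial_corr_recursive(corr, i, j, controls):
    """Partial correlation of variables i and j given the index list `controls`.

    corr is a full correlation matrix. With no controls left, this is the
    zeroth-order case, i.e. the ordinary correlation coefficient.
    """
    controls = tuple(controls)
    if not controls:
        return float(corr[i, j])

    # Pick one controlling variable Z0 and recurse on the remaining ones.
    z0, rest = controls[0], controls[1:]
    r_ij = partial_corr_recursive(corr, i, j, rest)
    r_iz = partial_corr_recursive(corr, i, z0, rest)
    r_zj = partial_corr_recursive(corr, z0, j, rest)

    return float((r_ij - r_iz * r_zj) / np.sqrt((1 - r_iz**2) * (1 - r_zj**2)))


# Hypothetical correlation matrix for (X, Y, Z) with corr(X, Y) = 0.5 and
# corr(X, Z) = corr(Y, Z) = sqrt(0.5).
s = np.sqrt(0.5)
corr = np.array([[1.0, 0.5, s],
                 [0.5, 1.0, s],
                 [s,   s,   1.0]])

print(partial_corr_recursive(corr, 0, 1, [2]))
```

In this example the numerator is 0.5 − √0.5 · √0.5, so the first-order partial correlation vanishes up to floating-point rounding. Written this way, each call makes three recursive calls, so the number of calls grows as 3ⁿ with the number of controlling variables; caching intermediate results would be a natural refinement for larger n.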

  • Using matrix inversion

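Another approach, which runs in O(n³) time, yields all partial correlations between any two variables Xi and Xj of a set V of cardinality n, given all the other variables V\{Xi, Xj}, provided the covariance matrix Ω is positive definite and therefore invertible (the correlation matrix (ρXiXj) can be used in the same way and gives the same partial correlations). Defining the precision matrix P = (pij) = Ω⁻¹, we get

$$\rho_{X_iX_j\cdot\mathbf{V}\setminus\{X_i,X_j\}} = -\frac{p_{ij}}{\sqrt{p_{ii}\,p_{jj}}}$$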
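The matrix inversion route is short to write down with NumPy. The following is a sketch under the assumption that the sample covariance matrix is invertible; the function name `partial_corr_matrix` and the simulated data are made up for this example.

```python
import numpy as np


def partial_corr_matrix(data):
    """All pairwise partial correlations, each given all remaining variables.

    data has shape (N, n): one row per observation, one column per variable.
    Entry (i, j) of the result is the partial correlation of variables i and j
    controlling for every other column.
    """
    cov = np.cov(data, rowvar=False)       # sample covariance matrix
    precision = np.linalg.inv(cov)         # precision matrix P
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)    # -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(pcorr, 1.0)           # by convention, 1 on the diagonal
    return pcorr


# Illustrative data: columns are X, Y, Z, with X and Y related only via Z.
rng = np.random.default_rng(3)
z = rng.normal(size=4000)
x = z + rng.normal(size=4000)
y = z + rng.normal(size=4000)

pcorr = partial_corr_matrix(np.column_stack([x, y, z]))
print(np.round(pcorr, 3))
```

A single inversion gives every pairwise partial correlation at once, which is convenient when the whole matrix is of interest rather than one pair.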

Figure: partial correlation coefficients plotted in R, using Spearman estimates.

Limitations of partial correlation

Partial correlation has its limitations. Some of them are:

  • The reliability of the result tends to decrease as the order of the partial correlation coefficient goes up.
  • Calculating partial correlations by hand is tedious, so in practice software is used.
  • The usual calculation of partial correlation coefficients is based on simple correlation coefficients. Since simple correlation coefficients assume linear relations, the result is not valid where the relations are not linear, which is often the case in the social sciences.

Use cases

Partial correlation has several use cases. This article lists two of them.

  • Time series analysis

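Partial correlation can be used in time series analysis. Here the partial autocorrelation function of a time series is defined, for lag h, as

$$\phi(h) = \rho_{X_0X_h\cdot\{X_1,\,\dots,\,X_{h-1}\}}$$

that is, the correlation between observations h steps apart after controlling for the observations in between.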

This function is mainly used to determine the appropriate lag length for an autoregression.
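To see how the partial autocorrelation function points to a lag length, here is a rough sketch that estimates it directly from its definition, by regressing out the intermediate observations. It assumes NumPy only; the function name `pacf_by_regression` and the simulated AR(1) series with coefficient 0.6 are choices made for this example, and dedicated time series libraries use more refined estimators.

```python
import numpy as np


def pacf_by_regression(series, max_lag):
    """Estimate the partial autocorrelation at lags 1..max_lag.

    For each lag h, correlates X_t with X_{t+h} after regressing both on the
    observations in between (X_{t+1}, ..., X_{t+h-1}) and a constant.
    """
    x = np.asarray(series, dtype=float)
    result = []
    for h in range(1, max_lag + 1):
        n = len(x) - h
        first = x[:n]                                   # X_t
        last = x[h:]                                    # X_{t+h}
        between = [x[k:k + n] for k in range(1, h)]     # intermediate lags
        design = np.column_stack(between + [np.ones(n)])
        e_first = first - design @ np.linalg.lstsq(design, first, rcond=None)[0]
        e_last = last - design @ np.linalg.lstsq(design, last, rcond=None)[0]
        result.append(float(np.corrcoef(e_first, e_last)[0, 1]))
    return result


# Simulated AR(1) series: each value depends only on the previous one.
rng = np.random.default_rng(4)
n_obs = 5000
series = np.zeros(n_obs)
for t in range(1, n_obs):
    series[t] = 0.6 * series[t - 1] + rng.normal()

print(np.round(pacf_by_regression(series, 4), 3))
```

For a series like this, the estimate at lag 1 should be near the autoregressive coefficient and the estimates at higher lags near zero, which suggests an autoregression of order 1.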

  • Semipartial correlation

The semipartial (or part) correlation statistic is similar to the partial correlation. Both compare variations of two variables after certain factors are controlled for. To calculate the semipartial correlation, one holds the third variable constant for either X or Y but not both, whereas for the partial correlation one holds the third variable constant for both. The semipartial correlation compares the unique variation of one variable with the unfiltered variation of the other, while the partial correlation compares the unique variation of one variable with the unique variation of the other.
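The difference between the two statistics is easy to see side by side in code. In this sketch, which assumes NumPy and least-squares residuals, the helper names `_residuals`, `semipartial_corr` and `partial_corr` are made up for illustration, and the semipartial version removes Z from Y only while leaving X untouched.

```python
import numpy as np


def _residuals(target, z):
    """Part of `target` not explained by a linear regression on z (with constant)."""
    target = np.asarray(target, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(target), -1)
    design = np.column_stack([z, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def semipartial_corr(x, y, z):
    """Correlation of raw x with the part of y that z does not explain."""
    return float(np.corrcoef(np.asarray(x, dtype=float), _residuals(y, z))[0, 1])


def partial_corr(x, y, z):
    """Correlation of the parts of x and y that z does not explain."""
    return float(np.corrcoef(_residuals(x, z), _residuals(y, z))[0, 1])


# Illustrative data: Y depends on both X and Z, and X depends on Z.
rng = np.random.default_rng(5)
z = rng.normal(size=3000)
x = z + rng.normal(size=3000)
y = 0.8 * x + z + rng.normal(size=3000)

sp = semipartial_corr(x, y, z)
pr = partial_corr(x, y, z)
print(f"semipartial: {sp:.3f}, partial: {pr:.3f}")
```

Because the semipartial correlation keeps the full, unfiltered variation of X in the denominator, its absolute value is never larger than that of the corresponding partial correlation.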

Conclusion

Partial correlation describes the relationship between two variables after removing the effects of another variable, or of several other variables, on that relationship. It is most often thought of in terms of multiple regression, and the partial correlation coefficient takes a value in the range from –1 to 1. There are three main ways to calculate it: linear regression, the recursive formula and matrix inversion. Partial correlation also has limitations: it is cumbersome to calculate by hand, the reliability of the result drops as the order increases, and it rests on the assumption that the relations in the data are linear, which often does not hold in the social sciences. Two use cases are covered here: time series analysis and semipartial correlation.
