Segmented regression

Segmented regression, also known as piecewise regression or broken-stick regression, is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval. Segmented regression analysis can also be performed on multivariate data by partitioning the various independent variables. Segmented regression is useful when the independent variables, clustered into different groups, show a different relationship with the dependent variable in each group. The boundaries between the segments are breakpoints.
Segmented linear regression is segmented regression whereby the relations in the intervals are obtained by linear regression.

Segmented linear regression, two segments

Segmented linear regression with two segments separated by a breakpoint can be useful to quantify an abrupt change of the response function of a varying influential factor. The breakpoint can be interpreted as a critical, safe, or threshold value beyond or below which desired effects occur. The breakpoint can be important in decision making.
The figures illustrate some of the results and regression types obtainable.
A segmented regression analysis requires a set of (x, y) data, in which y is the dependent variable and x the independent variable.
The least squares method is applied separately to each segment: the two regression lines are made to fit the data set as closely as possible by minimizing the sum of squares of the differences between the observed and calculated values of the dependent variable. This results in the following two equations:

Yr = A₁.x + K₁ for x < BP
Yr = A₂.x + K₂ for x > BP

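where Yr is the expected (predicted) value of y for a given value of x, A₁ and A₂ are the regression coefficients (the slopes of the two line segments), K₁ and K₂ are the regression constants (the intercepts), and BP is the breakpoint.
The data may show many types of trends; see the figures.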
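The separate least-squares fits can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names fit_line and fit_two_segments and the synthetic data are hypothetical, the breakpoint is assumed to be known in advance, and points lying exactly on the breakpoint are assigned to the second segment.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + k; returns (a, k)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = mean_y - a * mean_x
    return a, k


def fit_two_segments(xs, ys, bp):
    """Fit one line to the points with x < bp and another to the rest."""
    left = [(x, y) for x, y in zip(xs, ys) if x < bp]
    right = [(x, y) for x, y in zip(xs, ys) if x >= bp]  # assumption: x == bp goes right
    a1, k1 = fit_line([x for x, _ in left], [y for _, y in left])
    a2, k2 = fit_line([x for x, _ in right], [y for _, y in right])
    return (a1, k1), (a2, k2)


# Synthetic, noise-free data: constant up to x = 5, then decreasing.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.0, 2.0, 2.0, 2.0, 2.0, 1.5, 0.5, -0.5, -1.5]
(a1, k1), (a2, k2) = fit_two_segments(xs, ys, bp=5.5)
print("segment 1: A1 =", a1, "K1 =", k1)
print("segment 2: A2 =", a2, "K2 =", k2)
```

Each segment needs at least two distinct x values; otherwise the slope is undefined and fit_line divides by zero.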
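The method also yields two correlation coefficients (R), one for each segment:

R₁² = 1 − Σ(y − Yr)² / Σ(y − Ya₁)²   for x < BP

and

R₂² = 1 − Σ(y − Yr)² / Σ(y − Ya₂)²   for x > BP

where Σ(y − Yr)² is the minimized sum of squared differences within the segment, and Ya₁ and Ya₂ are the average values of y in the respective segments.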
In the determination of the most suitable trend, statistical tests must be performed to ensure that this trend is reliable.
When no significant breakpoint can be detected, one must fall back on a regression without breakpoint.

Example

For the blue figure at the right, which gives the relation between the yield of mustard (Ym, in t/ha) and soil salinity (Ss, in dS/m), it is found that:
BP = 4.93, A₁ = 0, K₁ = 1.74, A₂ = −0.129, K₂ = 2.38, R₁² = 0.0035, R₂² = 0.395 and:

Ym = 1.74 t/ha for Ss < 4.93
Ym = −0.129 Ss + 2.38 t/ha for Ss > 4.93

indicating that soil salinities below 4.93 dS/m are safe, whereas soil salinities above 4.93 dS/m reduce the yield by 0.129 t/ha per unit (dS/m) increase of soil salinity.
The figure also shows confidence intervals and uncertainty as elaborated hereunder.
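The fitted relation of this example can be written as a small Python function. This is only a sketch: the function name mustard_yield is hypothetical, and assigning a salinity exactly equal to the breakpoint to the second segment is an assumption, since the equations above leave that case open.

```python
BP = 4.93  # breakpoint of soil salinity, dS/m


def mustard_yield(ss):
    """Expected mustard yield (t/ha) at soil salinity ss (dS/m)."""
    if ss < BP:
        return 1.74               # first segment: A1 = 0, K1 = 1.74
    return -0.129 * ss + 2.38     # second segment: A2 = -0.129, K2 = 2.38


for ss in (2.0, 4.93, 6.0, 8.0):
    print(ss, round(mustard_yield(ss), 3))
```

Evaluating both segments at the breakpoint gives nearly the same yield (about 1.74 t/ha), so the two fitted lines practically meet at BP.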

Test procedures

The following statistical tests are used to determine the type of trend:

significance of the breakpoint by expressing BP as a function of regression coefficients A₁ and A₂ and the means Y₁ and Y₂ of the y-data and the means X₁ and X₂ of the x data, using the laws of propagation of errors in additions and multiplications to compute the standard error of BP, and applying Student's t-test
significance of A₁ and A₂ applying Student's t-distribution and the standard error SE of A₁ and A₂
significance of the difference of A₁ and A₂ applying Student's t-distribution using the SE of their difference.
significance of the difference of Y₁ and Y₂ applying Student's t-distribution using the SE of their difference.
A more formal statistical approach to test for the existence of a breakpoint is the pseudo score test, which does not require estimation of the segmented line.
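As an illustration of the third test in the list, the following Python sketch computes a t statistic for the difference of the two slopes. It rests on assumptions that are not stated in the text: the two slope estimates are treated as independent, so the standard error of their difference is taken as the square root of the sum of the squared standard errors, and the degrees of freedom are taken as n₁ + n₂ − 4. The function names and the data are hypothetical.

```python
import math


def slope_and_se(xs, ys):
    """Least-squares slope of y on x and the standard error of that slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = mean_y - a * mean_x
    ssr = sum((y - (a * x + k)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return a, se


def slope_difference_t(x1, y1, x2, y2):
    """t statistic and degrees of freedom for A1 - A2 (assumed independent)."""
    a1, se1 = slope_and_se(x1, y1)
    a2, se2 = slope_and_se(x2, y2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    df = len(x1) + len(x2) - 4  # two parameters estimated per segment
    return (a1 - a2) / se_diff, df


# Synthetic data: a nearly flat first segment and a falling second segment.
x1, y1 = [1, 2, 3, 4, 5, 6], [2.0, 2.1, 1.9, 2.0, 2.1, 1.9]
x2, y2 = [7, 8, 9, 10], [1.5, 1.0, 0.6, 0.0]
t_stat, df = slope_difference_t(x1, y1, x2, y2)
print("t =", round(t_stat, 2), "with", df, "degrees of freedom")
```

The absolute value of the statistic is then compared with the critical value of Student's t for the chosen significance level, for example 2.447 for 6 degrees of freedom at the two-sided 5% level.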

In addition, use is made of the correlation coefficient of all data (Ra), the coefficient of determination or coefficient of explanation, confidence intervals of the regression functions, and analysis of variance (ANOVA).
The coefficient of determination for all data (Cd), which is to be maximized under the conditions set by the significance tests, is found from:

Cd = 1 − Σ(y − Yr)² / Σ(y − Ya)²

where Yr is the expected value of y according to the former regression equations and Ya is the average of all y values.
The Cd coefficient ranges between 0 and 1.
In a pure, unsegmented linear regression, Cd is equal to Ra², the squared correlation coefficient of all data. In a segmented regression, Cd needs to be significantly larger than Ra² to justify the segmentation.
The optimal value of the breakpoint may be found as the value for which Cd is at its maximum.
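A search for the breakpoint that maximizes Cd can be sketched in Python as follows, taking Cd as 1 − Σ(y − Yr)² / Σ(y − Ya)². The candidate breakpoints (midpoints between consecutive x values), the minimum of three points per segment, and all function names are assumptions of this sketch. It also omits the significance tests that the text requires before a breakpoint is accepted.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + k; returns (a, k)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def cd_for_breakpoint(xs, ys, bp):
    """Cd of the two-segment fit obtained with breakpoint bp."""
    ya = sum(ys) / len(ys)
    ss_total = sum((y - ya) ** 2 for y in ys)
    ss_resid = 0.0
    for keep_left in (True, False):
        seg = [(x, y) for x, y in zip(xs, ys) if (x < bp) == keep_left]
        a, k = fit_line([x for x, _ in seg], [y for _, y in seg])
        ss_resid += sum((y - (a * x + k)) ** 2 for x, y in seg)
    return 1.0 - ss_resid / ss_total


def best_breakpoint(xs, ys, min_points=3):
    """Return (bp, cd) for the candidate breakpoint with the largest Cd."""
    distinct = sorted(set(xs))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        bp = (lo + hi) / 2  # candidate halfway between neighbouring x values
        n_left = sum(1 for x in xs if x < bp)
        if n_left < min_points or len(xs) - n_left < min_points:
            continue
        cd = cd_for_breakpoint(xs, ys, bp)
        if best is None or cd > best[1]:
            best = (bp, cd)
    return best


# Synthetic, noise-free data: constant up to x = 5, then decreasing.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.0, 2.0, 2.0, 2.0, 2.0, 1.5, 0.5, -0.5, -1.5]
bp, cd = best_breakpoint(xs, ys)
print("breakpoint:", bp, "Cd:", cd)
```

With real, noisy data the maximum Cd will be below 1, and a finer grid of candidate breakpoints may be preferable.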

No-effect range

Segmented regression is often used to detect over which range an explanatory variable has no effect on the dependent variable, while beyond the reach there is a clear response, be it positive or negative.
The reach of no effect may be found at the initial part of the X domain or, conversely, at its last part. For the "no effect" analysis, least-squares segmented regression may not be the most appropriate technique. The aim is rather to find the longest stretch over which the Y-X relation can be considered to have zero slope, while beyond that reach the slope is significantly different from zero; the best value of this slope is not material. The method to find the no-effect range is progressive partial regression over the range, extending the range in small steps until the regression coefficient becomes significantly different from zero.
In the next figure the breakpoint is found at X = 7.9, while for the same data the least squares method yields a breakpoint at only X = 4.9. The latter value is lower, but the fit of the data beyond the breakpoint is better. Hence, the purpose of the analysis determines which method should be employed.
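Progressive partial regression can be sketched in Python as follows, for a no-effect range at the initial part of the X domain. Several choices are assumptions of this sketch and not prescribed by the text: the range is extended one data point at a time, the slope is tested with a two-sided Student's t-test at the 5% level, and a rough critical value of 2.0 is used beyond the small built-in table. The function names and the data are hypothetical.

```python
import math

# Two-sided 5% critical values of Student's t for 1 to 10 degrees of freedom.
T_CRIT_5PCT = [12.706, 4.303, 3.182, 2.776, 2.571,
               2.447, 2.365, 2.306, 2.262, 2.228]


def t_crit(df):
    # Assumption: beyond the table, 2.0 serves as a rough stand-in.
    return T_CRIT_5PCT[df - 1] if df <= len(T_CRIT_5PCT) else 2.0


def slope_t_statistic(xs, ys):
    """t statistic (slope divided by its standard error) of a linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = mean_y - a * mean_x
    ssr = sum((y - (a * x + k)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: slope is either exactly zero or "infinitely" significant
        return 0.0 if a == 0.0 else math.copysign(math.inf, a)
    return a / se


def no_effect_range_end(xs, ys, start=3):
    """Largest x of the longest initial stretch whose slope is not significant."""
    pts = sorted(zip(xs, ys))
    for n in range(start, len(pts) + 1):
        sub_x = [x for x, _ in pts[:n]]
        sub_y = [y for _, y in pts[:n]]
        if abs(slope_t_statistic(sub_x, sub_y)) > t_crit(n - 2):
            return pts[n - 2][0]  # last x before the slope became significant
    return pts[-1][0]  # slope never became significant


# Synthetic data: roughly constant up to x = 6, then decreasing.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 1.5, 1.0, 0.5, 0.0]
print("no-effect range ends at x =", no_effect_range_end(xs, ys))
```

In this synthetic example the range extends one point past the visually flat part, which is in line with the remark above that this method tends to place the breakpoint further out than the least squares method does.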
