Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons. A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones; the automated process of finding duplications in source code is called clone detection.
Two code sequences may be duplicates of each other without being character-for-character identical, for example by being character-for-character identical only when white space characters and comments are ignored, or by being token-for-token identical, or token-for-token identical with occasional variation. Even code sequences that are only functionally identical may be considered duplicate code.
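As a rough illustration of the first of these looser notions of equality, the following C sketch compares two code fragments while ignoring white space. The function names are hypothetical, and the sketch is deliberately simplified: it does not strip comments, and because it ignores all white space it would also treat "int x" and "intx" as equal, which a token-based comparison would not.

```c
#include <ctype.h>
#include <stdbool.h>

/* Advance past any white space at the start of s. */
static const char *skip_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical helper: true if a and b are character-for-character
   identical once all white space is ignored. */
bool same_ignoring_whitespace(const char *a, const char *b)
{
    for (;;) {
        a = skip_space(a);
        b = skip_space(b);
        if (*a != *b)
            return false;   /* first non-space difference */
        if (*a == '\0')
            return true;    /* both fragments ended together */
        a++;
        b++;
    }
}
```

Under this comparison, two fragments that differ only in indentation or line breaks count as clones, while a renamed variable still makes them differ.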

Emergence

Some of the ways in which duplicate code may be created are:

copy and paste programming, which in academic settings may be done as part of plagiarism
scrounging, in which a section of code is copied "because it works". In most cases this operation involves slight modifications in the cloned code, such as renaming variables or inserting/deleting code. The language nearly always allows one to call one copy of the code from different places, so that it can serve multiple purposes, but instead the programmer creates another copy, perhaps because they
* do not understand the language properly
* do not have the time to do it properly, or
* do not care about the increased active software rot.

It may also happen that functionality is required that is very similar to that in another part of a program, and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest that such independently rewritten code is typically not syntactically similar.
Automatically generated code, where having duplicate code may be desired to increase speed or ease of development, is another reason for duplication. Note that the actual generator will not contain duplicates in its source code, only the output it produces.
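The point about generated code can be sketched with the C preprocessor standing in for a code generator. The macro and function names below are hypothetical. The source contains the logic once, in the macro, while the expanded output contains one near-identical function per instantiation.

```c
/* The single "template" kept in the source code. */
#define DEFINE_MAX(type, name) \
    static type name(type a, type b) { return a > b ? a : b; }

/* Each line expands to a separate, nearly identical function, so the
   duplication exists only in the generated output. */
DEFINE_MAX(int, max_int)
DEFINE_MAX(double, max_double)
```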

Fixing

Duplicate code is most commonly fixed by moving the code into its own unit and calling that unit from all of the places where it was originally used. Using a more open-source style of development, in which components are in centralized locations, may also help with duplication.

Costs and benefits

Code that includes duplicate functionality is more difficult to support:

because it is simply longer, and
because if it needs updating, there is a danger that one copy of the code will be updated without checking for other instances of the same code.
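The second risk can be illustrated with a small, invented C example. Both function names and the discount rule are hypothetical. The two functions started as copies of each other; a later fix for negative quantities was applied to only one of them, so they now disagree.

```c
/* Original copy, later fixed to reject negative quantities. */
int order_discount_percent(int quantity)
{
    if (quantity < 0)   /* fix applied to this copy only */
        return 0;
    return quantity >= 100 ? 10 : quantity / 10;
}

/* Pasted copy that never received the fix. */
int invoice_discount_percent(int quantity)
{
    return quantity >= 100 ? 10 : quantity / 10;
}
```

For valid input the two copies still agree, which is part of why such divergence can go unnoticed.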

On the other hand, if one copy of the code is being used for different purposes, and it is not properly documented, there is a danger that it will be updated for one purpose, but this update will not be required or appropriate to its other purposes.
These considerations are not relevant for automatically generated code, if there is just one copy of the functionality in the source code.
In the past, when memory space was more limited, duplicate code had the additional disadvantage of taking up more space, but nowadays this is unlikely to be an issue.
When code with a software vulnerability is copied, the vulnerability may continue to exist in the copied code if the developer is not aware of such copies.
Refactoring duplicate code can improve many software metrics, such as lines of code, cyclomatic complexity, and coupling. This may lead to shorter compilation times, lower cognitive load, less human error, and fewer forgotten or overlooked pieces of code. However, not all code duplication can be refactored.
Clones may be the most effective solution if the programming language provides inadequate or overly complex abstractions, particularly if supported with user interface techniques such as simultaneous editing. Furthermore, the risks of breaking code when refactoring may outweigh any maintenance benefits.
A study by Wagner, Abdulkhaleq, and Kaya concluded that while additional work must be done to keep duplicates in sync, duplicate code does not cause significantly more faults than unduplicated code if the programmers involved are aware of it.

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:

Baker's algorithm
Rabin–Karp string search algorithm
Using abstract syntax trees
Visual clone detection
Count Matrix Clone Detection
Locality-sensitive hashing
Anti-unification
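To give a flavour of how one of these approaches can be applied, the following C sketch uses a Rabin–Karp style rolling hash to look for a repeated window of k characters. It is an illustration under stated assumptions, not how any particular tool works: the function name, hash base, and modulus are arbitrary choices, the pairwise comparison of hashes is quadratic for brevity, and a practical detector would more likely hash normalized lines or tokens than raw characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RK_BASE 257u          /* arbitrary hash base (assumption) */
#define RK_MOD  1000000007u   /* arbitrary prime modulus (assumption) */

/* Hypothetical helper: finds two non-overlapping windows of k characters
   with identical content. Returns 1 and stores their offsets in *first and
   *second, or returns 0 if there is no such pair (or allocation fails). */
int find_repeated_window(const char *text, size_t k,
                         size_t *first, size_t *second)
{
    size_t n = strlen(text);
    if (k == 0 || n / 2 < k)
        return 0;

    size_t windows = n - k + 1;
    uint64_t *hashes = malloc(windows * sizeof *hashes);
    if (hashes == NULL)
        return 0;

    uint64_t high = 1;        /* RK_BASE^(k-1) mod RK_MOD */
    for (size_t i = 1; i < k; i++)
        high = high * RK_BASE % RK_MOD;

    uint64_t h = 0;           /* hash of the first window */
    for (size_t i = 0; i < k; i++)
        h = (h * RK_BASE + (unsigned char)text[i]) % RK_MOD;
    hashes[0] = h;

    /* Roll the hash: drop the leftmost character, append the next one. */
    for (size_t i = 1; i < windows; i++) {
        uint64_t out = (unsigned char)text[i - 1] * high % RK_MOD;
        h = (h + RK_MOD - out) % RK_MOD;
        h = (h * RK_BASE + (unsigned char)text[i + k - 1]) % RK_MOD;
        hashes[i] = h;
    }

    int found = 0;
    for (size_t i = 0; i < windows && !found; i++) {
        for (size_t j = i + k; j < windows; j++) {
            /* Equal hashes are only a hint; memcmp confirms the match. */
            if (hashes[i] == hashes[j] && memcmp(text + i, text + j, k) == 0) {
                *first = i;
                *second = j;
                found = 1;
                break;
            }
        }
    }
    free(hashes);
    return found;
}
```

The rolling update makes each window hash cost constant time, which is the property that makes this family of techniques attractive for scanning large code bases.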

Example of functionally duplicate code

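Consider the following code snippet for calculating the average of an array of integers:

extern int array_a[];
extern int array_b[];

/* Average of the first four elements of array_a */
int sum_a = 0;
for (int i = 0; i < 4; i++)
    sum_a += array_a[i];
int average_a = sum_a / 4;

/* The same logic repeated for array_b */
int sum_b = 0;
for (int i = 0; i < 4; i++)
    sum_b += array_b[i];
int average_b = sum_b / 4;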

The two loops can be rewritten as the single function:

int calc_average_of_four(const int *array)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += array[i];
    return sum / 4;
}

or, usually preferably, by parameterising the number of elements in the array.
Using the above function will give source code that has no loop duplication:

extern int array1[];
extern int array2[];
int average1 = calc_average_of_four(array1);
int average2 = calc_average_of_four(array2);

Note that in this trivial case, the compiler may choose to inline both calls to the function, such that the resulting machine code is identical for both the duplicated and non-duplicated examples above. If the function is not inlined, the additional overhead of the function calls will probably make the code slightly slower to run. In theory, this additional run time could matter.
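The alternative mentioned earlier, parameterising the number of elements, might look like the following sketch. The name calc_average, the wider accumulator, and the choice to return 0 for an empty array are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical generalisation of calc_average_of_four: the element
   count is passed in rather than fixed at four. */
int calc_average(const int *array, size_t count)
{
    if (count == 0)
        return 0;           /* assumption: an empty array averages to 0 */

    long long sum = 0;      /* wider type to reduce the risk of overflow */
    for (size_t i = 0; i < count; i++)
        sum += array[i];

    /* Integer division truncates, as in the fixed-size version. */
    return (int)(sum / (long long)count);
}
```

A call such as calc_average(array1, 4) would then replace calc_average_of_four(array1), and the same function serves arrays of any length.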
