Longest palindromic substring


In computer science, the longest palindromic substring or longest symmetric factor problem is the problem of finding a maximum-length contiguous substring of a given string that is also a palindrome. For example, the longest palindromic substring of "bananas" is "anana". The longest palindromic substring is not guaranteed to be unique; for example, in the string "abracadabra", there is no palindromic substring with length greater than three, but there are two palindromic substrings with length three, namely, "aca" and "ada". In some applications it may be necessary to return all maximal palindromic substrings rather than returning only one substring or returning the maximum length of a palindromic substring.
invented a linear time algorithm for listing all the palindromes that appear at the start of a given string. However, as observed e.g., by, the same algorithm can also be used to find all maximal palindromic substrings anywhere within the input string, again in linear time. Therefore, it provides a linear time solution to the longest palindromic substring problem. Alternative linear time solutions were provided by, and by, who described a solution based on suffix trees. Efficient parallel algorithms are also known for the problem.
The longest palindromic substring problem should not be confused with the different problem of finding the longest palindromic subsequence.

Manacher's algorithm

To find in linear time a longest palindrome in a string, an algorithm may take advantage of the following characteristics or observations about a palindrome and a sub-palindrome:
  1. The left side of a palindrome is a mirror image of its right side.
  2. A third palindrome whose center is within the right side of a first palindrome will have exactly the same length as a second palindrome anchored at the mirror center on the left side, if the second palindrome is within the bounds of the first palindrome by at least one character. Such as "dacabacad", the whole string is the first palindrome, "aca" in the left side as second palindrome, "aca" in the right side as third palindrome. In this case, the second and third palindrome have exactly the same length.
  3. If the second palindrome meets or extends beyond the left bound of the first palindrome, then the distance from the center of the second palindrome to the left bound of the first palindrome is exactly equal to the distance from the center of the third palindrome to the right bound of the first palindrome.
  4. To find the length of the third palindrome under Case 2, the next character after the right outermost character of the first palindrome would then be compared with its mirror character about the center of the third palindrome, until there is no match or no more characters to compare.
  5. Neither the first nor second palindrome provides information to help determine the palindromic length of a fourth palindrome whose center is outside the right side of the first palindrome.
  6. It is therefore desirable to have a palindrome as a reference that possesses characters farthest to the right in a string when determining from left to right the palindromic length of a substring in the string.
  7. Regarding the time complexity of palindromic length determination for each character in a string: there is no character comparison for Case 1, while for Cases 2 and 3 only the characters in the string beyond the right outermost character of the reference palindrome are candidates for comparison.
  8. For even-length palindromes, the center is at the boundary of the two characters in the middle.

    Implementation


import java.util.Arrays;
public class ManachersAlgorithm

Pseudocode with explanation

To help understand how this algorithm works, the following is commented pseudo-code that explains what's going on:
given string S
generate S' by inserting a bogus character between each character in S
Create array P to store the lengths of the palindrome for each center point in S

track the following pointers :
R -> the next element to be examined
C -> the largest/left-most palindrome whose right boundary is R-1
i -> the next palindrome to be calculated
L -> character candidate for comparing with R. Computed implicitly as:
L = i −
i' -> the palindrome mirroring i from C. Computed implicitly as:
i' = C −

while R < P.length:
If i is within the palindrome at C :
Set P = P

Expand the palindrome at i :
while S' S':
increment P
increment R

If the palindrome at i extends past the palindrome at C:
Update C = i

increment i

return max
This diverges a little from Manacher's original algorithm primarily by deliberately declaring and operating on in such a way to help show that the runtime is in fact linear. You can see in the pseudo-code that, and i are all monotonically increasing, each stepping through the elements in S' and P. .
The use of S' provides a couple of simplifications for the code: it provides a string aligned to allowing direct use of the pointers in both arrays and it implicitly enables the inner while-loop to double-increment P and R.