Growing self-organizing map


A growing self-organizing map (GSOM) is a growing variant of the self-organizing map (SOM). The GSOM was developed to address the problem of identifying a suitable map size in the SOM. It starts with a minimal number of nodes and grows new nodes on the boundary based on a heuristic. Through a parameter called the spread factor (SF), the data analyst can control the growth of the GSOM.
All the starting nodes of the GSOM are boundary nodes, i.e. each node has the freedom to grow in its own direction at the beginning. New nodes are grown from the boundary nodes. Once a node is selected for growth, new nodes are grown in all of its free neighboring positions. The figure shows the three possible node growth options for a rectangular GSOM.
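
The growth rule described above can be illustrated with a minimal Python sketch. It assumes a rectangular lattice in which each node has up to four neighbors (left, right, up, down) and a starting map of four nodes; the function names `free_neighbors`, `is_boundary` and `grow` are hypothetical and chosen only for this illustration.

```python
# Assumption: a 4-connected rectangular lattice; nodes are (x, y) grid positions.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def free_neighbors(nodes, pos):
    """Return the unoccupied lattice positions adjacent to pos."""
    x, y = pos
    return [(x + dx, y + dy) for dx, dy in NEIGHBOR_OFFSETS
            if (x + dx, y + dy) not in nodes]


def is_boundary(nodes, pos):
    """A node is a boundary node if at least one adjacent position is free."""
    return len(free_neighbors(nodes, pos)) > 0


def grow(nodes, pos):
    """Grow a new node in every free position around pos; return the new positions."""
    new_positions = free_neighbors(nodes, pos)
    nodes.update(new_positions)
    return new_positions


# Assumed starting map: a 2 x 2 square, so every starting node is a boundary node.
nodes = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(all(is_boundary(nodes, p) for p in nodes))  # True
print(grow(nodes, (0, 0)))                        # [(-1, 0), (0, -1)]
print(is_boundary(nodes, (0, 0)))                 # False
```

Once all four positions around a node are occupied, the node no longer counts as a boundary node and cannot grow further.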

The algorithm

The GSOM process is as follows:
  1. Initialization phase:
     1. Initialize the weight vectors of the starting nodes with random numbers between 0 and 1.
     2. Calculate the growth threshold $GT$ for the given data set of dimension $D$ according to the spread factor $SF$ using the formula $GT = -D \times \ln(SF)$.
  2. Growing phase:
     1. Present input to the network.
     2. Determine the weight vector that is closest to the input vector mapped to the current feature map (the winner), using Euclidean distance. This step can be summarized as: find $q'$ such that $\lVert v - w_{q'} \rVert \le \lVert v - w_q \rVert$ for all $q \in \mathbb{N}$, where $v$ and $w$ are the input and weight vectors respectively, $q$ is the position vector for nodes and $\mathbb{N}$ is the set of natural numbers.
     3. The weight vector adaptation is applied only to the winner and its neighborhood. The neighborhood is a set of neurons around the winner, but in the GSOM the starting neighborhood selected for weight adaptation is smaller than in the SOM. The amount of adaptation is also reduced exponentially over the iterations. Even within the neighborhood, weights that are closer to the winner are adapted more than those further away. The weight adaptation can be described by $w_j(k+1) = w_j(k)$ if $j \notin N_{k+1}$, and $w_j(k+1) = w_j(k) + LR(k) \times (v_k - w_j(k))$ if $j \in N_{k+1}$, where $v_k$ is the input vector presented at iteration $k$ and the learning rate $LR(k)$, $k \in \mathbb{N}$, is a sequence of positive parameters converging to zero as $k \to \infty$. $w_j(k)$ and $w_j(k+1)$ are the weight vectors of node $j$ before and after the adaptation, and $N_{k+1}$ is the neighbourhood of the winning neuron at the $(k+1)$th iteration. The decreasing value of $LR(k)$ in the GSOM depends on the number of nodes existing in the map at time $k$.
     4. Increase the error value of the winner.
     5. When $TE_i > GT$, where $TE_i$ is the total error of node $i$: grow nodes if $i$ is a boundary node; distribute weights to neighbors if $i$ is a non-boundary node.
     6. Initialize the new node weight vectors to match the neighboring node weights.
     7. Initialize the learning rate to its starting value.
     8. Repeat steps 2 to 7 until all inputs have been presented and node growth is reduced to a minimum level.
  3. Smoothing phase:
     1. Reduce the learning rate and fix a small starting neighborhood.
     2. Find the winner and adapt the weights of the winner and its neighbors in the same way as in the growing phase.
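
A single pass through the growing phase can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the growth threshold uses the formula GT = -D × ln(SF); the neighborhood is fixed to the four adjacent lattice positions; neighbors adapt at half the winner's rate; new nodes copy the winner's weights; and the non-boundary case is interpreted as spreading accumulated error to the neighbors. All function names are hypothetical.

```python
import math

OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed 4-connected lattice


def growth_threshold(dim, spread_factor):
    """GT = -D * ln(SF), with the spread factor SF in (0, 1)."""
    return -dim * math.log(spread_factor)


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def find_winner(weights, x):
    """Position of the node whose weight vector is closest to the input x."""
    return min(weights, key=lambda pos: distance(weights[pos], x))


def neighbors(pos):
    return [(pos[0] + dx, pos[1] + dy) for dx, dy in OFFSETS]


def move_toward(w, x, rate):
    """w + rate * (x - w), applied component-wise."""
    return [wi + rate * (xi - wi) for wi, xi in zip(w, x)]


def growing_step(weights, errors, x, learning_rate, gt):
    """Present one input; return the winner and any newly grown positions."""
    winner = find_winner(weights, x)
    err = distance(weights[winner], x)  # measured before adaptation
    weights[winner] = move_toward(weights[winner], x, learning_rate)
    for n in neighbors(winner):
        if n in weights:
            # Assumption: neighbors adapt at half the winner's rate.
            weights[n] = move_toward(weights[n], x, 0.5 * learning_rate)
    errors[winner] += err
    if errors[winner] <= gt:
        return winner, []
    free = [n for n in neighbors(winner) if n not in weights]
    if free:
        # Boundary node: grow into every free position.
        # Assumption: new nodes simply copy the winner's weights.
        for n in free:
            weights[n] = list(weights[winner])
            errors[n] = 0.0
        errors[winner] = 0.0
    else:
        # Non-boundary node. Assumption: half of the accumulated error
        # is spread equally over the four neighbors.
        share = errors[winner] / 2 / len(OFFSETS)
        for n in neighbors(winner):
            errors[n] += share
        errors[winner] /= 2
    return winner, free


# Assumed toy setup: four starting nodes with 2-dimensional weights.
weights = {(0, 0): [0.1, 0.1], (0, 1): [0.1, 0.9],
           (1, 0): [0.9, 0.1], (1, 1): [0.9, 0.9]}
errors = {pos: 0.0 for pos in weights}
gt = growth_threshold(dim=2, spread_factor=0.5)
print(growing_step(weights, errors, [1.0, 1.0], learning_rate=0.5, gt=gt))
```

In this sketch a smaller spread factor gives a larger growth threshold, so nodes accumulate more error before growing and the resulting map stays smaller. The learning-rate schedule and the shrinking neighborhood described in the steps above are omitted for brevity.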

Applications

The GSOM can be used for many preprocessing tasks in data mining, for nonlinear dimensionality reduction, for approximation of principal curves and manifolds, and for clustering and classification. It often gives a better representation of the data geometry than the SOM.