C-squares
C-squares is a system of spatially unique, location-based identifiers for areas on the surface of the earth, represented as cells from a Discrete Global Grid at a hierarchical set of resolution steps. The identifiers incorporate literal values of latitude and longitude in an interleaved notation, together with additional digits that support intermediate grid resolutions of 5, 0.5, 0.05 degrees, etc. The system was initially designed to represent data "footprints" or spatial extents in a more flexible manner than a standard minimum bounding rectangle, and to support "lightweight", text-based spatial querying; it can also provide a set of identifiers for grid cells used for assembly, storage and analysis of spatially organised data. Dataset extents expressed in c-squares notation can be visualised using a web-based utility, the c-squares mapper, an online instance of which is currently provided by CSIRO Oceans and Atmosphere in Australia. C-squares codes and associated published software are free to use and the software is released under version 2 of the GNU General Public License, a licence of the Free Software Foundation.
History
The c-squares method was developed by Tony Rees at CSIRO Oceans and Atmosphere in Australia in 2001-2, initially as a method for spatial indexing, rapid query, and compact storage and visualization of dataset spatial "footprints" in an agency-specific metadata directory. It was later published in the scientific literature as a freely available tool for use by other workers, together with a web-accessible mapping utility entitled the "c-squares mapper" for visualisation of data extents expressed in the c-squares notation. Since that time, a number of projects and international collaborations have employed c-squares to support spatial indexing and/or map production, including FishBase, the Ocean Biogeographic Information System, AquaMaps, data analysis to support the designation of marine biogeographic realms, multi-national fisheries data collation by the Scientific, Technical and Economic Committee for Fisheries of the European Commission, and data reporting by ICES. For its application in displaying and modelling global biodiversity data, c-squares was one of four components cited in the award of the Ebbe Nielsen Prize to Rees by the Global Biodiversity Information Facility in 2014. The concept of representing dataset "footprints" as cells of spatial data of this nature and alignment has been stated to have been inspired by two sources: the data addressing method in the U.S. National Oceanographic Data Center "World Ocean Database" product, which uses 10 degree World Meteorological Organization squares for organising its data content, and the set of 1:100,000 topographic maps issued by the national mapping agency for Australia; each map covers a 0.5 degree square and, with its associated mapsheet labels, can notionally be used as a unit of spatial identification.
Rationale
Indexing spatial data
Spatial data are inherently 2-dimensional; without additional indexing, a numeric range query in 2 dimensions is required to retrieve data items within a particular area. Such queries are computationally expensive, so it can be beneficial to pre-process the data in some manner that reduces the inherent dimensionality from two dimensions to one, for example as labelled cells of a grid; the grid labels can then be indexed by standard, one-dimensional methods for rapid search and retrieval, and/or searched by simple alphanumeric text searches. C-squares is an example of such a grid where the cell identifiers are designed to be human- as well as machine-readable, and to be concordant with recognizable and commonly used intervals of latitude and longitude.
Dataset footprints
Spatial datasets, that is, data associated with particular geographic locations on the earth, have spatial "footprints" that ideally are recorded in metadata systems or data catalogues, to support spatial searching of the resources in question. A "basic" generalization of any data footprint is the minimum bounding rectangle or MBR, that is, the smallest set of boundaries of latitude and longitude that completely contain the data. Such stored rectangles are relatively simple to query as a mathematical operation but, with real world data, may not be a good surrogate for the true data "footprint" if the latter contains disjoint or sparsely populated data items, items on a diagonal line, or items with significant "holes", such as a vessel track around a continent. Representation as a set of smaller "tiles", such as those denoted by c-square codes at an appropriate resolution, can more accurately reflect the shape of such non-rectangular and/or non-contiguous datasets.
Data binning
"Binning" describes the process of converting continuously variable data into a set of discrete "bins" in order to apply the indexing and subsequent search/retrieval processes, other processing and reporting, etc. The optimal size of the spatial data "bins" can depend on the user's requirements, the need to handle large vs. small datasets, and the density of available data. Large "bins" will have smaller data handling requirements and smooth out data deficiencies, but will result in a loss of resolution of fine scale data; small "bins" can produce quantities of data that are too large to handle easily. To date, 0.5 degree cells have been found to be a reasonable compromise between resolution and data storage requirements for global coverages, e.g. as used for Aquamaps, while for more local applications, either 0.5 or 0.1 degree cells may be useful.
Data reduction
One advantage of binning spatial data as described above is that it offers the potential for data reduction in some use cases: for example rather than storing hundreds of raw data points within the same bin, the data can be represented as an average value and/or number of data points held, or just as presence/absence. By this means, the quantity of information required to be stored in the spatial index and associated information can be substantially reduced in many cases, with a concomitant improvement in performance.
Hierarchical representation
As a property of a discrete global grid, hierarchical notation ensures that the geocodes for finer resolutions of the mesh incorporate those of all their parents, permitting rapid search and/or data aggregation at any desired equal or higher level of the hierarchy. In the c-squares case, the code is extended by additional alphanumeric characters as the spatial resolution increases, with the corollary that resolution can be decreased if desired, merely by truncating the code by the relevant amount.
Equal angle grids
Equal angle grids have the advantage that transformation of spatial data in and out of the grid notation can be simple, since the latitude–longitude grid is itself equal angle. On the actual surface of the globe, the cells are approximately "square" only adjacent to the equator, and become progressively narrower and tapered as they approach the poles; cells adjoining the poles are unique in possessing three sides rather than four. By contrast, equal area grids attempt to preserve a constant area for all cells at the same hierarchical level, at the expense of losing concordance with familiar lines of latitude and/or longitude.
Regional (local/national) vs. global grids
Local and/or national grids have been developed for use within a number of countries; for example, the UK National Grid has been in use since 1946, while a separate system is in use for Ireland. Discontinuities occur where such grids meet or overlap, and some areas are not covered at all. Global grids offer a solution to this problem and also offer a potential format for collation of cross-national data into a single repository for analysis and reporting; for an example, see Vanhee et al., 2018.
The c-squares global grid notation
Initial 10 degree squares
10-degree c-squares are specified as being identical to the equivalent World Meteorological Organization square codes (see the illustration at right). These squares are aligned with 10-degree subdivisions of the global latitude–longitude grid, which for c-squares use is specified as employing the WGS84 datum. WMO squares are encoded with four digits, in the series 1xxx, 3xxx, 5xxx and 7xxx. The leading digit indicates the "global quadrant", with 1 for north-east, 3 for south-east, 5 for south-west and 7 for north-west. The next digit, 0 through 8, corresponds to the tens of latitude degrees either north or south, while the remaining 2 digits, 00 through 17, correspond to the tens of longitude degrees either east or west. Thus the 10 degree cell with its lower left corner at 0,0 is encoded 1000, and acts as a bin to contain all spatial data between 0 and 9.999... degrees north and 0 and 9.999... degrees east; the 10 degree cell with its lower left corner at 80 N, 170 E is encoded 1817, and acts as a bin to contain all spatial data between 80 and 90 degrees north and 170 and 179.999... degrees east.
Subsequent recursive subdivision
C-squares extends the initial WMO 10×10 square notation via a recursive series of "cycles", each up to 3 digits long and separated by the colon character, with the number of characters indicating the resolution encoded, as per these examples:
- 1000... 10×10 degree square
- 1000:1... 5×5 degree square
- 1000:100... 1×1 degree square
- 1000:100:1... 0.5×0.5 degree square
- 1000:100:100... 0.1×0.1 degree square
- 1000:100:100:1... 0.05×0.05 degree square
To produce the 1 or 3 digits in any cycle following the initial 4-digit, 10-degree square identifier, first an "intermediate quadrant", 1 through 4, is designated, where 1 indicates low absolute values of both latitude and longitude, 2 indicates low latitude and high longitude, 3 indicates high latitude and low longitude, and 4 indicates high values for both; "low" and "high" are taken from the relevant portion of the data to be gridded, that is, the next digit of the absolute value, with 0 through 4 counted as "low" and 5 through 9 as "high". This leading digit in a cycle is then followed simply by the next applicable digit for first latitude and then longitude: thus an input value of latitude +11.0, longitude +12.0 degrees will be encoded as the 5 degree c-square code 1101:1 and the 1 degree code 1101:112. Inspection of this code will show that the input latitude value (11) can be recovered directly from the second digit of the leading group together with the second digit of the cycle, while the longitude (012) is given by the last two digits of the leading group together with the third digit of the cycle; the sign of both is positive, as indicated by the first digit of the leading 4.
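The encoding steps described in this section can be sketched in Python as follows. This is a minimal illustration written from the description above, not the published c-squares software; the function name `encode_csquare` and its parameters `cycles` and `half` are hypothetical, and the treatment of points lying exactly on 90 degrees latitude or 180 degrees longitude (placing them in the outermost cell) is an assumption.

```python
from decimal import Decimal


def encode_csquare(lat, lon, cycles=1, half=False):
    """Return a c-squares code for a point (hypothetical helper).

    cycles: number of full 3-digit cycles after the 10-degree square
            (0 -> 10 degrees, 1 -> 1 degree, 2 -> 0.1 degree, ...).
    half:   if True, append a final 1-digit cycle (the intermediate
            quadrant only), halving the cell size (5, 0.5, 0.05 ...).
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")

    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    quadrant = {(True, True): 1, (False, True): 3,
                (False, False): 5, (True, False): 7}[(lat >= 0, lon >= 0)]

    places = cycles + (1 if half else 0)   # digits needed after the tens
    dec = max(places - 1, 0)               # of which are decimal places

    # Work on exact decimal digits to avoid binary floating-point surprises.
    lat_n = int(Decimal(str(abs(lat))) * 10 ** dec)
    lon_n = int(Decimal(str(abs(lon))) * 10 ** dec)
    # Assumption: points exactly on the pole or on 180 degrees fall in the
    # outermost cell rather than in a non-existent one beyond it.
    lat_n = min(lat_n, 90 * 10 ** dec - 1)
    lon_n = min(lon_n, 180 * 10 ** dec - 1)
    lat_s = str(lat_n).zfill(dec + 2)      # tens, units, decimals...
    lon_s = str(lon_n).zfill(dec + 3)      # hundreds+tens, units, decimals...

    code = f"{quadrant}{lat_s[0]}{lon_s[:2]}"   # 10-degree (WMO) square
    for i in range(places):
        a = int(lat_s[1 + i])              # next latitude digit
        o = int(lon_s[2 + i])              # next longitude digit
        # Intermediate quadrant: 1 low/low, 2 low lat/high lon,
        # 3 high lat/low lon, 4 high/high ("high" means digit >= 5).
        q = 1 + 2 * (a >= 5) + (o >= 5)
        if half and i == places - 1:
            code += f":{q}"
        else:
            code += f":{q}{a}{o}"
    return code


print(encode_csquare(11.0, 12.0, cycles=0, half=True))  # 1101:1
print(encode_csquare(11.0, 12.0, cycles=1))             # 1101:112
```

Because the notation is hierarchical, the coarser code can also be obtained by truncating the finer one, so in practice only the finest required resolution needs to be computed.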
From 2002 onwards, an online converter has been available at the website of CSIRO Marine Research which converts input values of latitude and longitude to the equivalent c-square code at user-selectable resolutions from 10 to 0.1 degree cell size. Alternatively, the conversion is a comparatively simple task to program from first principles according to the c-squares specification; an example is available.
C-squares strings, and the c-squares mapper
A set of c-squares can be represented as a concatenated list of individual square codes, separated by the "pipe" character, thus: 7500:110:3|7500:110:1|1500:110:3|1500:110:1. This set of squares can then serve as an indication of a dataset extent, similar in function to a MultiPolygon in the Well-known text representation of geometry, the functional difference being that defined points forming the boundary of a polygon can be continuously variable, while those for the c-square boundaries are constrained to fixed intervals from the grid square resolution in use. If these strings are stored, for example as "long text" within a field of a conventional text storage system, they can be used for the operation of spatial searches.
C-squares strings can also be used directly as input to an instance of the "c-squares mapper", a web-based utility in operation since 2002 at CSIRO in Australia and also at other global locations. To visualize the position of any set of squares on a map, the current syntax to address an installation of the "c-squares mapper" is:
.
The above call to the c-squares mapper is a simple one, with only a single parameter, which produces a simple "default map"; the mapper is in fact highly customizable, capable of accepting up to seven c-squares strings concurrently and plotting them in user-specified colours, with a choice of empty or filled squares, a user-selectable base map, and so on. A full list of available input parameters is provided on the mapper "technical information" page. A more sophisticated map produced using a larger number of available parameters is the colour-coded example at right.
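To illustrate the comparison with a Well-known text MultiPolygon made earlier in this section, the following Python sketch decodes a pipe-separated c-squares string into cell bounds and assembles a WKT string from them. It is an illustration only, not part of the c-squares mapper; the function names `csquare_bounds` and `csquares_to_wkt` are hypothetical, and compact asterisk notation is not handled.

```python
from decimal import Decimal


def csquare_bounds(code):
    """Return (south, west, north, east) in degrees for one c-square code."""
    parts = code.split(":")
    head = parts[0]
    if len(head) != 4 or head[0] not in "1357":
        raise ValueError(f"bad 10-degree square: {head!r}")
    lat_sign = 1 if head[0] in "17" else -1    # 1, 7 = northern quadrants
    lon_sign = 1 if head[0] in "13" else -1    # 1, 3 = eastern quadrants

    size = Decimal(10)                         # current cell size in degrees
    lat = int(head[1]) * size                  # absolute latitude of corner
    lon = int(head[2:]) * size                 # absolute longitude of corner
    for i, part in enumerate(parts[1:]):
        if len(part) == 3:
            # Full cycle: quadrant digit (redundant here), lat digit, lon digit.
            size = size / 10
            lat += int(part[1]) * size
            lon += int(part[2]) * size
        elif len(part) == 1 and i == len(parts) - 2:
            # Trailing 1-digit cycle: intermediate quadrant only.
            size = size / 2
            q = int(part) - 1
            lat += (q // 2) * size             # quadrants 3, 4: high latitude
            lon += (q % 2) * size              # quadrants 2, 4: high longitude
        else:
            raise ValueError(f"bad cycle: {part!r}")

    lats = sorted((lat_sign * lat, lat_sign * (lat + size)))
    lons = sorted((lon_sign * lon, lon_sign * (lon + size)))
    # "+ 0.0" turns a negative zero into a plain zero.
    return tuple(float(v) + 0.0 for v in (lats[0], lons[0], lats[1], lons[1]))


def csquares_to_wkt(string):
    """Convert a pipe-separated c-squares string to a WKT MULTIPOLYGON."""
    polygons = []
    for code in string.split("|"):
        s, w, n, e = csquare_bounds(code)
        ring = [(w, s), (e, s), (e, n), (w, n), (w, s)]   # closed ring, x = lon
        coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
        polygons.append(f"(({coords}))")
    return "MULTIPOLYGON (" + ", ".join(polygons) + ")"


print(csquare_bounds("1101:112"))   # (11.0, 12.0, 12.0, 13.0)
print(csquares_to_wkt("7500:110:3|7500:110:1|1500:110:3|1500:110:1"))
```

Each cell is emitted as its own polygon; merging adjacent cells into larger outlines is left out of this sketch.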
Spatial searching
In a system that uses c-squares codes as units of spatial indexing, a text-based search on any of these square identifiers will retrieve data associated with the relevant square. If a wildcard search is supported, a search on "7500%" will retrieve all data items in that ten degree square, a search on "7500:1%" will retrieve all data items in that five degree square, etc.
The asterisk character "*" has a special meaning in c-squares notation, being a "compact" notation indicating that all finer cells within a higher level cell are included, to the level of resolution indicated by the number of asterisks. In the example above, "7500:*" would indicate that all 4 five-degree cells within parent ten-degree cell "7500" are filled, "7500:***" would indicate that all 100 one-degree cells within parent ten-degree cell "7500" are filled, etc. This approach enables the filling of contiguous blocks of cells with an economy of characters in many cases, which is useful for efficient storage and transfer of c-squares codes as required.
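The two ideas in this section, wildcard (prefix) search and expansion of the compact asterisk notation, can be sketched in Python as below. This is a minimal sketch under stated assumptions: the names `search_prefix` and `expand_asterisks` and the sample records are hypothetical, `str.startswith` stands in for a trailing "%" wildcard, and only a single trailing "*" or "***" cycle is expanded.

```python
def search_prefix(records, prefix):
    """Return the items whose c-square code starts with the given prefix.

    Equivalent in effect to a text search on prefix + "%".
    """
    return sorted(item for code, item in records.items()
                  if code.startswith(prefix))


def expand_asterisks(code):
    """Expand a trailing "*" or "***" cycle into explicit cell codes."""
    head, _, tail = code.rpartition(":")
    if tail == "*":
        # All four intermediate quadrants of the parent cell.
        return [f"{head}:{q}" for q in "1234"]
    if tail == "***":
        cells = []
        for a in range(10):            # next latitude digit
            for o in range(10):        # next longitude digit
                q = 1 + 2 * (a >= 5) + (o >= 5)   # intermediate quadrant
                cells.append(f"{head}:{q}{a}{o}")
        return cells
    return [code]                      # nothing to expand


# Hypothetical index: c-square code -> data item.
records = {
    "7500:110:3": "station A",
    "7500:110:1": "station B",
    "7500:205:1": "station C",
    "1500:110:3": "station D",
}

print(search_prefix(records, "7500"))     # ['station A', 'station B', 'station C']
print(search_prefix(records, "7500:1"))   # ['station A', 'station B']
print(expand_asterisks("7500:*"))         # ['7500:1', '7500:2', '7500:3', '7500:4']
print(len(expand_asterisks("7500:***")))  # 100
```

The prefix "7500:1" matches both the five-degree cell itself and every finer cell inside it, because each finer cycle begins with the same intermediate quadrant digit.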