Select (SQL)


The SQL SELECT statement returns a result set of records from one or more tables.
A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.
The SELECT statement has many optional clauses, including FROM, WHERE, GROUP BY, HAVING, and ORDER BY.
Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.
With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a projection in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a vertical partition in some database terms, restricting query output to view only specified fields or columns.
With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown. In relational algebra terms, a selection will be performed, because of the WHERE clause. This is also known as a horizontal partition, restricting rows output by a query according to specified conditions.
With more than one table, the result set will be every combination of rows. So if the two tables are T1 and T2, SELECT * FROM T1, T2 will result in every combination of T1 rows with T2 rows. E.g., if T1 has 3 rows and T2 has 5 rows, then 15 rows will result.
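As an illustration, the four kinds of query described above can be tried with Python's standard-library sqlite3 module and an in-memory database. This is only a sketch: the column C2, the column D1 and the sample values are assumptions made for the example.

```python
import sqlite3

# In-memory database; table contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (C1 INTEGER, C2 TEXT);
    CREATE TABLE T2 (D1 INTEGER);
    INSERT INTO T1 VALUES (1, 'a'), (1, 'b'), (2, 'c');
    INSERT INTO T2 VALUES (10), (20), (30), (40), (50);
""")

# All columns of all rows.
all_rows = conn.execute("SELECT * FROM T1").fetchall()

# Projection onto C1; duplicates are kept, unlike in relational algebra.
projection = conn.execute("SELECT C1 FROM T1").fetchall()

# Selection: only the rows where C1 is 1.
selection = conn.execute("SELECT * FROM T1 WHERE C1 = 1").fetchall()

# Two tables without a join condition: every combination of rows.
cross = conn.execute("SELECT * FROM T1, T2").fetchall()

print(len(all_rows), sorted(projection), len(selection), len(cross))
```

With 3 rows in T1 and 5 rows in T2, the last query returns 15 rows, matching the arithmetic in the text.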
Although not in the standard, most DBMSs allow a SELECT clause without a table by pretending that an imaginary table with one row is used. This is mainly used to perform calculations where a table is not needed.
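SQLite is one DBMS that accepts a SELECT without a FROM clause. The following minimal sketch uses Python's sqlite3 module; the column alias answer is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No table is referenced; the expression is evaluated once.
result = conn.execute("SELECT 1 + 1 AS answer").fetchone()[0]
print(result)
```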
The SELECT clause specifies a list of properties (columns) by name, or the wildcard character ("*") to mean “all properties”.

Limiting result rows

Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.
In ISO SQL:2003, result sets may be limited by using cursors, or by adding an SQL window function to the SELECT statement.
ISO SQL:2008 introduced the FETCH FIRST clause.
According to PostgreSQL v.9 documentation, an SQL Window function performs a calculation across a set of table rows that are somehow related to the current row, in a way similar to aggregate functions.
The name recalls signal processing window functions. A window function call always contains an OVER clause.

ROW_NUMBER() window function

ROW_NUMBER() OVER may be used to number the returned rows, e.g. to return no more than ten rows:

SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
    columns
  FROM tablename
) AS foo
WHERE row_number <= 10

ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
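The following sketch runs a query of this shape against SQLite through Python's sqlite3 module. It assumes an SQLite build with window function support (version 3.25 or later); the label column, the sample data and the alias row_num are assumptions made for the example. Because sort_key is unique here, the numbering is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (sort_key INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 26)],  # 25 rows with unique sort_key
)

query = """
SELECT * FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_num,
           sort_key,
           label
    FROM tablename
) AS foo
WHERE row_num <= 10
ORDER BY row_num
"""
first_ten = conn.execute(query).fetchall()
print(len(first_ten))
```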

RANK() window function

The RANK() OVER window function acts like ROW_NUMBER, but may return more or fewer than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10

The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.
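The tie behaviour can be observed with a small experiment. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) and invented sample data in which the tenth and eleventh youngest persons share the same age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, person_name TEXT, age INTEGER)"
)
# Illustrative data: two people aged 27 tie for tenth place.
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 27, 30]
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(i, f"person{i}", age) for i, age in enumerate(ages, start=1)],
)

query = """
SELECT * FROM (
    SELECT RANK() OVER (ORDER BY age ASC) AS ranking,
           person_id,
           person_name,
           age
    FROM person
) AS foo
WHERE ranking <= 10
ORDER BY ranking, person_id
"""
youngest = conn.execute(query).fetchall()
print(len(youngest))  # more than ten rows because of the tie
```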

FETCH FIRST clause

Since ISO SQL:2008, result limits can be specified as in the following example using the FETCH FIRST clause.
SELECT * FROM T
FETCH FIRST 10 ROWS ONLY

This clause currently is supported by CA DATACOM/DB 11, IBM DB2, SAP SQL Anywhere, PostgreSQL, EffiProz, H2, HSQLDB version 2.0, Oracle 12c and Mimer SQL.
Microsoft SQL Server 2012 and higher supports FETCH FIRST, but it is considered part of the ORDER BY clause. The ORDER BY, OFFSET, and FETCH FIRST clauses are all required for this usage.
SELECT * FROM T
ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY

Non-standard syntax

Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax. Below, variants of the simple limit query for different DBMSes are listed:
SET ROWCOUNT 10
SELECT * FROM T
DBMS: MS SQL Server

SELECT * FROM T
LIMIT 10 OFFSET 20
DBMS: Netezza, MySQL, MariaDB, SAP SQL Anywhere, PostgreSQL, SQLite, HSQLDB, H2, Vertica, Polyhedra, Couchbase Server, Snowflake Computing, OpenLink Virtuoso

SELECT * FROM T
WHERE ROWNUM <= 10
DBMS: Oracle

SELECT FIRST 10 * FROM T
DBMS: Ingres

SELECT FIRST 10 * FROM T ORDER BY a
DBMS: Informix

SELECT SKIP 20 FIRST 10 * FROM T ORDER BY c, d
DBMS: Informix

SELECT TOP 10 * FROM T
DBMS: MS SQL Server, SAP ASE, MS Access, SAP IQ, Teradata

SELECT * FROM T
SAMPLE 10
DBMS: Teradata

SELECT TOP 20, 10 * FROM T
DBMS: OpenLink Virtuoso

SELECT TOP 10 START AT 20 * FROM T
DBMS: SAP SQL Anywhere

SELECT FIRST 10 SKIP 20 * FROM T
DBMS: Firebird

SELECT * FROM T
ROWS 20 TO 30
DBMS: Firebird

SELECT * FROM T
WHERE ID_T > 10 FETCH FIRST 10 ROWS ONLY
DBMS: DB2

SELECT * FROM T
WHERE ID_T > 20 FETCH FIRST 10 ROWS ONLY
DBMS: DB2

Rows Pagination

Rows pagination is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the server is asked for only one page, and the user starts navigating by requesting the next page, and then the next one, and so on. It is very useful, especially in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows of the server.

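Data in Pagination approach

In the queries of this section, {table} is the table being read, {unique_key} is a column or expression that orders the rows uniquely, {rows} is the number of rows per page, and {begin_base_0} is the zero-based position of the first row of the page.

  1. Select all rows from the database
  2. Read all rows but send to display only when the row_number of the rows read is between {begin_base_0 + 1} and {begin_base_0 + rows}

Select *
from {table}
order by {unique_key}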
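The approach of reading every row and displaying only one page can be sketched in Python with the sqlite3 module. The table items, its columns, and the helper names make_db and page_reading_all are hypothetical; begin_base_0 is taken to be the zero-based position of the first row of the page and rows the page size.

```python
import sqlite3

def make_db(n=100):
    """Hypothetical helper: an in-memory table with ids 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, f"item{i}") for i in range(1, n + 1)],
    )
    return conn

def page_reading_all(conn, begin_base_0, rows):
    """Read every row; keep only those whose row number is in the page."""
    cursor = conn.execute("SELECT * FROM items ORDER BY id")
    page = []
    for row_number, row in enumerate(cursor, start=1):
        if begin_base_0 + 1 <= row_number <= begin_base_0 + rows:
            page.append(row)
    return page

conn = make_db()
page = page_reading_all(conn, begin_base_0=20, rows=10)
print([row[0] for row in page])
```

The whole table is still transferred from the database on every request, which is why this approach only suits small tables.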

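Other simple method (a little more efficient than reading all rows)

  1. Select all the rows from the beginning of the table to the last row to display ({begin_base_0 + rows})
  2. Read the {begin_base_0 + rows} rows but send to display only when the row_number of the rows read is greater than {begin_base_0}

select *
from {table}
order by {unique_key}
FETCH FIRST {begin_base_0 + rows} ROWS ONLY
Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12

Select *
from {table}
order by {unique_key}
LIMIT {begin_base_0 + rows}
Dialect: MySQL, SQLite

Select TOP {begin_base_0 + rows} *
from {table}
order by {unique_key}
Dialect: SQL Server 2005

SET ROWCOUNT {begin_base_0 + rows}
Select *
from {table}
order by {unique_key}
SET ROWCOUNT 0
Dialect: Sybase, SQL Server 2000

Select *
FROM (
    SELECT *
    FROM {table}
    ORDER BY {unique_key}
) a
where rownum <= {begin_base_0 + rows}
Dialect: Oracle 11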
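A minimal sketch of this method, using the LIMIT form accepted by SQLite through Python's sqlite3 module, is shown below. The table items and the function name page_limited_read are hypothetical; begin_base_0 is the zero-based position of the first row of the page and rows the page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 101)],
)

def page_limited_read(conn, begin_base_0, rows):
    """Ask the database only for rows up to the end of the page,
    then discard the leading rows on the client side."""
    cursor = conn.execute(
        "SELECT * FROM items ORDER BY id LIMIT ?",
        (begin_base_0 + rows,),
    )
    return [
        row
        for row_number, row in enumerate(cursor, start=1)
        if row_number > begin_base_0
    ]

page = page_limited_read(conn, begin_base_0=20, rows=10)
print([row[0] for row in page])
```

The database stops after the last row of the page, but the rows before the page are still read and thrown away.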

Method with positioning

  1. Select only rows starting from the next row to display
  2. Read and send to display all the rows read from the database
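Select *
from {table}
order by {unique_key}
OFFSET {begin_base_0} ROWS
FETCH NEXT {rows} ROWS ONLY
Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12

Select *
from {table}
order by {unique_key}
LIMIT {rows} OFFSET {begin_base_0}
Dialect: MySQL, MariaDB, PostgreSQL, SQLite

Select *
from {table}
order by {unique_key}
LIMIT {begin_base_0}, {rows}
Dialect: MySQL, MariaDB, SQLite

select TOP {begin_base_0 + rows}
       *, _offset=identity(10)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
Dialect: Sybase 12.5.3

SET ROWCOUNT {begin_base_0 + rows}
select *, _offset=identity(10)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
SET ROWCOUNT 0
Dialect: Sybase 12.5.2

select TOP {rows} *
from (
    select *, ROW_NUMBER() over (order by {unique_key}) as _offset
    from {table}
) xx
where _offset > {begin_base_0}
Dialect: SQL Server 2005

SET ROWCOUNT {begin_base_0 + rows}
select *, _offset=identity(10)
into #temp
from {table}
ORDER BY {unique_key}
select * from #temp where _offset > {begin_base_0}
DROP TABLE #temp
SET ROWCOUNT 0
Dialect: SQL Server 2000

SELECT * FROM (
    SELECT rownum-1 as _offset, a.*
    FROM (
        SELECT *
        FROM {table}
        ORDER BY {unique_key}
    ) a
    WHERE rownum <= {begin_base_0 + rows}
)
WHERE _offset >= {begin_base_0}
Dialect: Oracle 11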
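Positioning can be sketched with the LIMIT ... OFFSET form in SQLite, driven from Python's sqlite3 module. The table items and the function name page_with_offset are hypothetical; begin_base_0 is the zero-based position of the first row of the page and rows the page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 101)],
)

def page_with_offset(conn, begin_base_0, rows):
    """Let the database skip the leading rows and return only the page."""
    return conn.execute(
        "SELECT * FROM items ORDER BY id LIMIT ? OFFSET ?",
        (rows, begin_base_0),
    ).fetchall()

page = page_with_offset(conn, begin_base_0=20, rows=10)
print([row[0] for row in page])
```

Only the rows of the requested page reach the client, although the database may still have to step over the skipped rows internally.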

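Method with filter (more sophisticated, but necessary for very big datasets)

  1. Select only the {rows} rows with a filter:
     a. First page: select only the first {rows} rows, depending on the type of database
     b. Next page: select only the first {rows} rows, depending on the type of database, where the {unique_key} is greater than {last_val} (the value of the {unique_key} of the last row in the current page)
     c. Previous page: sort the data in the reverse order, select only the first {rows} rows, where the {unique_key} is less than {first_val} (the value of the {unique_key} of the first row in the current page), and sort the result in the correct order
  2. Read and send to display all the rows read from the database

Dialect: SQL ANSI 2008, PostgreSQL, SQL Server 2012, Derby, Oracle 12c, DB2 12

First page:
select *
from {table}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY

Next page:
select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
FETCH FIRST {rows} ROWS ONLY

Previous page:
select *
from (
    select *
    from {table}
    where {unique_key} < {first_val}
    order by {unique_key} DESC
    FETCH FIRST {rows} ROWS ONLY
) a
order by {unique_key}

Dialect: MySQL, SQLite

First page:
select *
from {table}
order by {unique_key}
LIMIT {rows}

Next page:
select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
LIMIT {rows}

Previous page:
select *
from (
    select *
    from {table}
    where {unique_key} < {first_val}
    order by {unique_key} DESC
    LIMIT {rows}
) a
order by {unique_key}

Dialect: SQL Server 2005

First page:
select TOP {rows} *
from {table}
order by {unique_key}

Next page:
select TOP {rows} *
from {table}
where {unique_key} > {last_val}
order by {unique_key}

Previous page:
select *
from (
    select TOP {rows} *
    from {table}
    where {unique_key} < {first_val}
    order by {unique_key} DESC
) a
order by {unique_key}

Dialect: Sybase, SQL Server 2000

First page:
SET ROWCOUNT {rows}
select *
from {table}
order by {unique_key}
SET ROWCOUNT 0

Next page:
SET ROWCOUNT {rows}
select *
from {table}
where {unique_key} > {last_val}
order by {unique_key}
SET ROWCOUNT 0

Previous page:
SET ROWCOUNT {rows}
select *
from (
    select *
    from {table}
    where {unique_key} < {first_val}
    order by {unique_key} DESC
) a
order by {unique_key}
SET ROWCOUNT 0

Dialect: Oracle 11

First page:
select *
from (
    select *
    from {table}
    order by {unique_key}
) a
where rownum <= {rows}

Next page:
select *
from (
    select *
    from {table}
    where {unique_key} > {last_val}
    order by {unique_key}
) a
where rownum <= {rows}

Previous page:
select *
from (
    select *
    from (
        select *
        from {table}
        where {unique_key} < {first_val}
        order by {unique_key} DESC
    ) a1
    where rownum <= {rows}
) a2
order by {unique_key}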
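The filter method can be sketched for SQLite with Python's sqlite3 module as follows. The table items, the key column id, and the function names first_page, next_page and previous_page are hypothetical; last_val and first_val stand for the key values of the last and first rows of the page currently displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 101)],
)

def first_page(conn, rows):
    return conn.execute(
        "SELECT * FROM items ORDER BY id LIMIT ?", (rows,)
    ).fetchall()

def next_page(conn, last_val, rows):
    # Filter on the key instead of counting skipped rows.
    return conn.execute(
        "SELECT * FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_val, rows),
    ).fetchall()

def previous_page(conn, first_val, rows):
    # Read backwards from the current page, then restore ascending order.
    return conn.execute(
        """
        SELECT * FROM (
            SELECT * FROM items WHERE id < ? ORDER BY id DESC LIMIT ?
        ) AS a
        ORDER BY id
        """,
        (first_val, rows),
    ).fetchall()

p1 = first_page(conn, 10)
p2 = next_page(conn, last_val=p1[-1][0], rows=10)
back = previous_page(conn, first_val=p2[0][0], rows=10)
print([r[0] for r in p2], back == p1)
```

When the key column is indexed, the database can seek directly to the start of each page, which is what makes this approach attractive for very large tables.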

Hierarchical query

Some databases provide specialised syntax for hierarchical data.
A window function in SQL:2003 is an aggregate function applied to a partition of the result set.
For example,
sum(population) OVER (PARTITION BY city)
calculates the sum of the populations of all rows having the same city value as the current row.
Partitions are specified using the OVER clause which modifies the aggregate. Syntax:
OVER_CLAUSE ::=
    OVER ( [ PARTITION BY expr, ... ]
           [ ORDER BY expression ] )
The OVER clause can partition and order the result set. Ordering is used for order-relative functions such as row_number.
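A partitioned sum of this kind can be tried in SQLite (version 3.25 or later is assumed for window function support) through Python's sqlite3 module. The table districts, its columns and the sample figures are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE districts (city TEXT, district TEXT, population INTEGER)"
)
conn.executemany(
    "INSERT INTO districts VALUES (?, ?, ?)",
    [
        ("Springfield", "North", 100),
        ("Springfield", "South", 150),
        ("Shelbyville", "East", 80),
    ],
)

# Each row keeps its own values and also receives the total for its city.
query = """
SELECT city,
       district,
       population,
       SUM(population) OVER (PARTITION BY city) AS city_population
FROM districts
ORDER BY city, district
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Unlike GROUP BY, the window function does not collapse the rows of a partition into one; every input row appears in the output.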

Query evaluation ANSI

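The logical processing of a SELECT statement according to ANSI SQL would be the following: the FROM clause is evaluated first (including any join conditions), then the WHERE clause, then GROUP BY, then HAVING, then the SELECT list and DISTINCT, and finally ORDER BY.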
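The effect of the logical evaluation order (WHERE before GROUP BY, HAVING after grouping, ORDER BY last) can be observed with a small experiment. The sketch below uses SQLite through Python's sqlite3 module; the table orders and its data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 50), ("bob", 20), ("bob", 40), ("cat", 100)],
)

query = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE amount >= 10          -- filters individual rows before grouping
GROUP BY customer
HAVING SUM(amount) > 50     -- filters whole groups after aggregation
ORDER BY total DESC         -- can refer to the alias from the SELECT list
"""
result = conn.execute(query).fetchall()
print(result)
```

Customer ann is absent from the result: her unfiltered total would be 55, but the WHERE clause removes the row with amount 5 before the groups are formed, so HAVING sees a total of 50.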

Window function support by RDBMS vendors

The implementation of window function features by vendors of relational databases and SQL engines differs widely. Most databases support at least some flavour of window functions. However, on closer inspection it becomes clear that most vendors implement only a subset of the standard. Take the powerful RANGE clause as an example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful in the context of running SQL against a distributed file system, where there are weaker data co-locality guarantees than on a distributed relational database. Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data and thus avoiding potentially expensive joins involving heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another extremely powerful feature.

Generating data in T-SQL

Method to generate data based on the union all

select 1 a, 1 b union all
select 1, 2 union all
select 1, 3 union all
select 2, 1 union all
select 5, 1

SQL Server 2008 supports the "row constructor" specified in the SQL3 standard

select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
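Both ways of generating rows can be compared outside SQL Server as well. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption of this example rather than T-SQL: SQLite accepts a VALUES list as a subquery but not the column list in the alias x(a, b), so the alias is written as plain x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rows generated by chaining SELECT statements with UNION ALL.
union_rows = conn.execute("""
    SELECT 1 AS a, 1 AS b UNION ALL
    SELECT 1, 2 UNION ALL
    SELECT 1, 3 UNION ALL
    SELECT 2, 1 UNION ALL
    SELECT 5, 1
""").fetchall()

# The same rows generated by a row constructor (VALUES list).
values_rows = conn.execute("""
    SELECT *
    FROM (VALUES (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) AS x
""").fetchall()

print(sorted(union_rows) == sorted(values_rows))
```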