SQL style guide

Overview

This style guide serves as a best-practices overview for writing good SQL in the DI-Team at Ricardo.ch

It has been forked from the one written by Simon Holywell and adjusted by the team to be used writing SQL statements on BigQuery.

SQL style guide by Simon Holywell is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

General

Do

SELECT fileHash  -- stored ssdeep hash
  FROM file_system
 WHERE fileName = '.vimrc';
/* Updating the file record after writing to the file */
UPDATE file_system
   SET fileModifiedDate = '1980-02-22 13:19:01.00000',
       fileSize = 209732
 WHERE fileName = '.vimrc';

Avoid

Naming conventions

General

SELECT firstName
  FROM staff;

Tables

Columns

Aliasing or correlations

SELECT firstName AS name
  FROM staff     AS sta
  JOIN students  AS stu
    ON stu.mentorId = sta.staffNum;
SELECT SUM(s.monitorTally) AS monitorTotal
  FROM staff AS s;

Uniform suffixes

The following suffixes have a universal meaning ensuring the columns can be read and understood easily from SQL code. Use the correct suffix where appropriate.

Query syntax

Reserved words

Always use uppercase for the reserved keywords like SELECT and WHERE.

It is best to avoid the abbreviated keywords and use the full length ones where available (prefer ABSOLUTE to ABS).

SELECT modelNum
  FROM phones
 WHERE phones.releaseDate > '2014-09-30';

White space

To make the code easier to read it is important that the correct complement of spacing is used. Do not crowd code or remove natural language spaces.

Spaces

Spaces, never tabs, should be used to line up the code so that the root keywords all end on the same character boundary. This forms a river down the middle making it easy for the readers eye to scan over the code and separate the keywords from the implementation detail. Rivers are bad in typography, but helpful here.

(SELECT speciesName,
        height     AS speciesHeight,
        diameter   AS speciesDiameter
   FROM flora
  WHERE speciesName = 'Banksia'
     OR speciesName = 'Sheoak'
     OR speciesName = 'Wattle'
  GROUP BY speciesName, observationDate)

  UNION ALL

(SELECT speciesName,
        height     AS speciesHeight,
        diameter   AS speciesDiameter
   FROM botanic_garden_flora
  WHERE speciesName = 'Banksia'
     OR speciesName = 'Sheoak'
     OR speciesName = 'Wattle'
  GROUP BY speciesName, observationDate)

Notice that SELECT, FROM, etc. are all right aligned while the actual column names and implementation specific details are left aligned.

Although not exhaustive always include spaces:

SELECT title,
       releaseDate,
       recordingDate
  FROM albums
 WHERE title = 'Charcoal Lane'
    OR title = 'The New Danger';

Line spacing

Always include newlines/vertical space:

Keeping all the keywords aligned to the righthand side and the values left aligned creates a uniform gap down the middle of query. It makes it much easier to scan the query definition over quickly too.

INSERT INTO albums (title, releaseDate, recordingDate)
VALUES ('Charcoal Lane', '1990-01-01 01:01:01.00000', '1990-01-01 01:01:01.00000'),
       ('The New Danger', '2008-01-01 01:01:01.00000', '1990-01-01 01:01:01.00000');
UPDATE albums
   SET releaseDate = '1990-01-01 01:01:01.00000'
 WHERE title = 'The New Danger';
SELECT title,
       releaseDate,
       recordingDate,
       productionDate
  FROM albums
 WHERE title = 'Charcoal Lane'
    OR title = 'The New Danger';

Indentation

To ensure that SQL is readable it is important that standards of indentation are followed.

Joins

INNER JOINs should are the standard and therefore simply expressed as JOIN. Joins should be aligned as follows. Joins should be indented to the other side of the river and grouped with a new line where necessary.

SELECT rid.lastName
  FROM riders AS rid
  JOIN bikes AS bik
    ON rid.bikeVinNum = bik.vinNum
   AND bik.engines > 2
  JOIN crew
    ON rid.crewChiefLastName = crew.lastName
   AND crew.chief = 'Y';

Subqueries

The use of subqueries should be limited to the absolute minimum necessary. The use of WITH queries is encouraged. Subqueries should be indented and aligned to the right side of the river and then laid out using the same style as any other query. Sometimes it will make sense to have the closing parenthesis on a new line at the same character position as its opening partner—this is especially true where you have nested subqueries.

SELECT rid.lastName,
       (SELECT MAX(YEAR(championshipDate))
          FROM champions champ
         WHERE champ.lastName = rid.lastName
           AND champ.confirmed = 'Y') AS lastChampionshipYear
  FROM riders AS rid
 WHERE rid.lastName IN
       (SELECT champ.lastName
          FROM champions AS champ
         WHERE YEAR(championshipDate) > '2008'
           AND champ.confirmed = 'Y');

Preferred formalisms

SELECT CASE postcode
       WHEN 'BN1' THEN 'Brighton'
       WHEN 'EH1' THEN 'Edinburgh'
       END AS city
  FROM officeLocations
 WHERE country = 'United Kingdom'
   AND openingTime BETWEEN 8 AND 9
   AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1')

Create syntax

When declaring schema information it is also important to maintain human readable code. To facilitate this ensure the column definitions are ordered and grouped where it makes sense to do so.

Indent column definitions by four (4) spaces within the CREATE definition.

Choosing data types

Specifying default values

Constraints and keys

Constraints and their subset, keys, are a very important component of any database definition. They can quickly become very difficult to read and reason about though so it is important that a standard set of guidelines are followed. As we are currently only consuming data from other systems the definition of primary keys and constraints is currently out of scope for this document.

Example
CREATE TABLE staff (
    staff_num      INTEGER(5)       NOT NULL,
    first_name     VARCHAR(100) NOT NULL,
    pens_in_drawer INTEGER(2)       NOT NULL
);

Designs to avoid