SQL basics (Part 1)

Let’s take a look at some SQL basics.

Note: Unless otherwise stated, the queries are tested in SQL Server only

Select

So to retrieve data from a database using SQL we write

select <columns> from <table>

and optionally we can add a where clause to reduce the result set based upon Boolean logic.

select <columns> from <table> where <boolean logic>

We can view all columns within a table using the wildcard * however in a production environment (at least where an application is retrieving data from the database) we would be better off specifying the columns we want to improve performance and ensure that if new columns are added or the likes, our query still produce the same “expected” result set, column-wise anyway.

So using the wild card we would have something like

select * from Plants

or specifying the required columns we use something like

select CommonName from Plants

or retrieving multiple columns

select CommonName from Plants
select CommonName, Genus from Plants

When specifying columns we might be querying from multiple tables so it’s best to alias the table name in case of column name duplication across different tables, hence we get something like

select p.CommonName from Plants p

Aliasing columns

Using the alias table query (from above) we will get a result set (in SQL Server Management Studio) with the column name CommonName. We can assign an alias to the column during a query and therefore change the name output for the column, for example

select p.CommonName as 'Common Name' from Plants p

this will output a column named Common Name now.

Aliasing can also be used on the output from functions etc. So, for example

select count(*) as Count from Plants p

will output a column named Count.

Count

Count returns a scalar value. We can get the number of rows in a table using the wildcard

select count(*) from Plants

This will return the total number of rows in the Plants table, however using a column name within count returns the number of non-NULL rows, hence

select count(p.CommonName) from Plants p

Min/Max

Both Min and Max returns a single value indicating the minimum (obviously using MIN) or maximum (obviously using MAX) non-NULL value from a column, both can be used on numeric or non-numeric columns.

Here’s an example of the usage

select max(p.Height) from Plants p

The above returns the maximum height found in the Plants table and

select min(p.Height) from Plants p

returns the minimum height.

AVG

AVG returns a single value indicating the average value within a selected column. AVG only works on numeric columns.

select avg(p.Height) from Plants p

SUM

SUM can be used on a numeric column and return the SUM of all values

select sum(p.Height) from Plants p

DISTINCT

The distinct keyword allows us to get only distinct (non-duplicating) values, i.e.

select distinct p.Genus from Plants p

The above will basically remove duplicates.

GROUP BY

We can create sub groups from our data using the GROUP BY clause. For example say we want to duplicate the DISTINCT functionality by only returning all plants with no duplicate common names we can do this with GROUP BY as

select p.CommonName from Plants p group by p.CommonName

Basically we create groups based upon the CommonName and then output each group’s CommonName. But we can think of the result set as groups (or arrays) of data so we can also do things like list a count for the number of duplicated names against each group

select p.CommonName, count(p.CommonName) from Plants p group by p.CommonName

This will now list each distinct group name and list a count alongside it to show how many items have that CommonName.

HAVING

The HAVING clause is used in a similar way to the WHERE clause but for GROUP BY result sets. An example might be where we’ve grouped by the CommonName of a plant in our database but are only interested in those names with more than a certain number of occurrences, thus

select p.CommonName, count(p.CommonName) 
from Plants p 
group by p.CommonName having count(p.CommonName) > 3

Now we’re basically get a result set with the CommonName and the Count for those CommonNames duplicated more than 3 times.

Note: In SQL Server we cannot alias the count(p.CommonName) as c, for example and then use c in the having clause, whereas MySQL does allow this syntax