Let’s take a look at some SQL basics.
Note: Unless otherwise stated, the queries are tested in SQL Server only
Select
So to retrieve data from a database using SQL we write
select <columns> from <table>
and optionally we can add a where clause to reduce the result set based upon Boolean logic.
select <columns> from <table> where <boolean logic>
We can view all columns within a table using the wildcard * however in a production environment (at least where an application is retrieving data from the database) we would be better off specifying the columns we want to improve performance and ensure that if new columns are added or the likes, our query still produce the same “expected” result set, column-wise anyway.
So using the wild card we would have something like
select * from Plants
or specifying the required columns we use something like
select CommonName from Plants
or retrieving multiple columns
select CommonName from Plants
select CommonName, Genus from Plants
When specifying columns we might be querying from multiple tables so it’s best to alias the table name in case of column name duplication across different tables, hence we get something like
select p.CommonName from Plants p
Aliasing columns
Using the alias table query (from above) we will get a result set (in SQL Server Management Studio) with the column name CommonName. We can assign an alias to the column during a query and therefore change the name output for the column, for example
select p.CommonName as 'Common Name' from Plants p
this will output a column named Common Name now.
Aliasing can also be used on the output from functions etc. So, for example
select count(*) as Count from Plants p
will output a column named Count.
Count
Count returns a scalar value. We can get the number of rows in a table using the wildcard
select count(*) from Plants
This will return the total number of rows in the Plants table, however using a column name within count returns the number of non-NULL rows, hence
select count(p.CommonName) from Plants p
Min/Max
Both Min and Max returns a single value indicating the minimum (obviously using MIN) or maximum (obviously using MAX) non-NULL value from a column, both can be used on numeric or non-numeric columns.
Here’s an example of the usage
select max(p.Height) from Plants p
The above returns the maximum height found in the Plants table and
select min(p.Height) from Plants p
returns the minimum height.
AVG
AVG returns a single value indicating the average value within a selected column. AVG only works on numeric columns.
select avg(p.Height) from Plants p
SUM
SUM can be used on a numeric column and return the SUM of all values
select sum(p.Height) from Plants p
DISTINCT
The distinct keyword allows us to get only distinct (non-duplicating) values, i.e.
select distinct p.Genus from Plants p
The above will basically remove duplicates.
GROUP BY
We can create sub groups from our data using the GROUP BY clause. For example say we want to duplicate the DISTINCT functionality by only returning all plants with no duplicate common names we can do this with GROUP BY as
select p.CommonName from Plants p group by p.CommonName
Basically we create groups based upon the CommonName and then output each group’s CommonName. But we can think of the result set as groups (or arrays) of data so we can also do things like list a count for the number of duplicated names against each group
select p.CommonName, count(p.CommonName) from Plants p group by p.CommonName
This will now list each distinct group name and list a count alongside it to show how many items have that CommonName.
HAVING
The HAVING clause is used in a similar way to the WHERE clause but for GROUP BY result sets. An example might be where we’ve grouped by the CommonName of a plant in our database but are only interested in those names with more than a certain number of occurrences, thus
select p.CommonName, count(p.CommonName)
from Plants p
group by p.CommonName having count(p.CommonName) > 3
Now we’re basically get a result set with the CommonName and the Count for those CommonNames duplicated more than 3 times.
Note: In SQL Server we cannot alias the count(p.CommonName) as c, for example and then use c in the having clause, whereas MySQL does allow this syntax