Grouping with ROW_NUMBER
Want to group a result set by only one field, but include others in the result set? You can use the ROW_NUMBER function with its PARTITION BY parameter.
To play with this, let’s create some sample data:
Now we have a table with sample data:
Using the data we just created, we want to show only the most recent transaction for EACH customer. We can do a simple GROUP BY, but if we include TransactionID in the result set, we are forced to get a record for each TransactionID, since it has to be contained in the GROUP BY’s list of non-aggregates:
We get the full result set – not what we want. To include TransactionID (or any other field that may be in the table) in the result set, but grouping only on CustID, we can include the ROW_NUMBER function in the select list, using the PARTITION BY clause:
We get this:
Using the ROW_NUMBER function in the select list (with out the PARTITION BY parameter) simply gives us the unique row number for each record. When we use the PARTITION BY clause, it groups on CustID. The ORDER BY clause is required, and necessary for our results – this allows us to make the most recent SaleDate = row 1.
I give credit to Dan Humphries for showing examples of ROW_NUMBER here.