Monthly Archives: February 2010

RANDOMIZING RESULTS WITH NEWID

Obtaining a random selection from SQL Server is easy using the NEWID system function. NEWID generates a globally unique identifier (GUID) that can be used for many purposes. Globally unique means that NEWID’s result is virtually guaranteed to be unique across the system (more about this later).

Let’s picture a hypothetical scenario in which you have a monthly lottery at your company to reward customers placing the largest number of orders. You want to select one customer – a random customer from the top 10 customers that have placed the largest volume of orders for the month. This random customer will be given a 5% discount on all orders for the entire month following.

First, let’s find our top 10 ordering customers for the month:

USE Northwind
GO
SELECT TOP 10 COUNT(orderid) AS order_count, customerid
FROM orders
WHERE orderdate BETWEEN ‘1998-02-01′ AND ‘1998-03-01′
GROUP BY customerid
ORDER BY 1 DESC

We have a result set that shows ties for some customers, emphasizing the need to use a random selection in this scenario. We can now use a nested SELECT statement that uses the NEWID function to randomly select 1 customer from our group of top 10 orderers, calculating the date range:

USE Northwind
GO
SELECT TOP 1 customerid, order_count
FROM
(
SELECT TOP 10 COUNT(orderid) AS order_count, customerid
FROM orders
WHERE orderdate BETWEEN ‘1998-02-01′ AND ‘1998-03-01′
GROUP BY customerid
ORDER BY 1 DESC
)x
ORDER BY NEWID()

Let’s run it again to be sure we are getting a random selection:

As expected, it returns a random customerid, along with that customer’s order count for the month. If we run the query again, a random selection is generated every time.
So what does NEWID return exactly? NEWID returns a GUID – a globally unique identifier. This is a value that is actually of SQL Server data type uniqueidentifier, and is a long hexadecimal string separated by dashes:

SELECT NEWID()

NEWID is not the only function that will generate a random number – we also have RAND. So why not use the RAND function to solve our previous scenario? At first glance, RAND may appear to offer the same randomizing operations as NEWID. Let’s try using RAND in place of NEWID:

USE Northwind
GO
SELECT TOP 1 customerid, order_count
FROM
(
SELECT TOP 10 COUNT(orderid) AS order_count, customerid
FROM orders
WHERE orderdate BETWEEN ‘1998-02-01′ AND ‘1998-03-01′
GROUP BY customerid
ORDER BY 1 DESC
)x
ORDER BY RAND()

Running the query again returns the exact same results:

Why is this? Because the RAND function does return a random number for the current session, but will always return the exact same number when run multiple times in the same query. Let’s test this by returning the all of the top customers alongside the value of RAND:

SELECT RAND(), customerid, order_count
FROM
(
SELECT TOP 10 COUNT(orderid) AS order_count, customerid
FROM orders
WHERE orderdate BETWEEN ‘1998-02-01′ AND ‘1998-03-01′
GROUP BY customerid
ORDER BY 1 DESC
)x
ORDER BY RAND()

We see that RAND returns the same value for each record. Contrast this to NEWID’s return values:

SELECT NEWID(), customerid, order_count
FROM
(
SELECT TOP 10 COUNT(orderid) AS order_count, customerid
FROM orders
WHERE orderdate BETWEEN ‘1998-02-01′ AND ‘1998-03-01′
GROUP BY customerid
ORDER BY 1 DESC
)x
ORDER BY NEWID()

NEWID’s value is different for each record.

The NEWID function can be also used to generate primary key values, or values for any numeric field requiring a unique id.
How unique is a GUID? Terms like ‘Globally Unique,’ or ‘Univerally Unique’ are invoked when GUIDs are described, implying that a generated GUID (NEWID, UUID, etc.) will be unique on a worldwide scale – and this is usually the case, but it’s not guaranteed. However, the number of GUIDs that are available is so large that the chance of encountering the same GUID is remotely small – the number of GUIDs is 2 to the power of 128 – that’s 340,282,366,920,938,463,463,374,607,431,768,211,456!

http://en.wikipedia.org/wiki/Globally_Unique_Identifier

http://msdn.microsoft.com/en-us/library/aa276822(SQL.80).aspx

http://msdn.microsoft.com/en-us/library/ms177610.aspx

KILL A LINUX PROCESS BY PROCESS NAME

If you have a situation where you need to find and kill a specific process by name or script name, you can use this command:

kill `ps -ef | grep YourProcessName | awk ‘{print $2}’`

This is invaluable in cases where the same process command (python, java, etc.) is being used for multiple scripts. For example, there are a few different python processes running on my webserver, but I only want to kill one of them. So, instead of using:

kill $(pidof python)

which kills the first python process, or:

killall -9 python

which kills ALL running python processes, I can just specify the process by the actual python script that is running.

Reference:
http://www.howtogeek.com/howto/ubuntu/kill-a-process-by-process-name-from-ubuntu-command-line/

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: