I came across an implementation of some SQL Server (T-SQL) code that looked at a sequence of events (ordered by datetime in this case) and compared data in the current record with data in the previous record in the sequence. The solution used cursors to iterate through the record set and variables to store the previous record's information so that the appropriate comparisons could be made. This got me thinking that there must be a set-based solution to this.
First, let's create a problem looking for a solution. We have a number of devices that measure the temperature of a process every hour. For a given period, we want a query that returns only the times at which the temperature changes for each device. The example table and data below give a clearer picture.
CREATE TABLE device_recording (
    device_id INT,
    recording_time SMALLDATETIME,
    temperature INT
)
GO
INSERT INTO device_recording VALUES (1, '2011-01-01 09:00', 10)
INSERT INTO device_recording VALUES (1, '2011-01-01 10:00', 10)
INSERT INTO device_recording VALUES (1, '2011-01-01 11:00', 11)
INSERT INTO device_recording VALUES (1, '2011-01-01 12:00', 10)
INSERT INTO device_recording VALUES (1, '2011-01-01 13:00', 10)
INSERT INTO device_recording VALUES (2, '2011-01-01 09:00', 14)
INSERT INTO device_recording VALUES (2, '2011-01-01 10:00', 15)
INSERT INTO device_recording VALUES (2, '2011-01-01 11:00', 15)
INSERT INTO device_recording VALUES (2, '2011-01-01 12:00', 15)
INSERT INTO device_recording VALUES (2, '2011-01-01 13:00', 15)
INSERT INTO device_recording VALUES (3, '2011-01-01 09:00', 8)
INSERT INTO device_recording VALUES (3, '2011-01-01 10:00', 8)
INSERT INTO device_recording VALUES (3, '2011-01-01 11:00', 8)
INSERT INTO device_recording VALUES (3, '2011-01-01 12:00', 8)
INSERT INTO device_recording VALUES (3, '2011-01-01 13:00', 9)
From this we would want a query to produce just the following data.
1, 2011-01-01 09:00, 10
1, 2011-01-01 11:00, 11
1, 2011-01-01 12:00, 10
2, 2011-01-01 09:00, 14
2, 2011-01-01 10:00, 15
3, 2011-01-01 09:00, 8
3, 2011-01-01 13:00, 9
That is, each device's first reading plus every reading whose temperature differs from the one before it.
The way I tackled this was to use a CTE (Common Table Expression) that numbers each row for each device, ordered by recording_time. In the example above you want row numbers 1, 2, ... 5 for each device, with 1 against the earliest time and 5 against the latest. I make use of the ROW_NUMBER() function and the PARTITION BY clause to achieve this.
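WITH device_recording_row_num AS
(
    SELECT device_id, recording_time, temperature,
        ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY recording_time) AS row_num
    FROM device_recording
)
-- A CTE must be followed by a statement that uses it; this SELECT just shows the numbered rows
SELECT device_id, recording_time, temperature, row_num
FROM device_recording_row_num
ORDER BY device_id, row_num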
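Now that I have an ordered set of row numbers per device, we can join the CTE to itself on both device_id and row_num, with one side of the join (previous) offset by one: each current row matches the row where previous.row_num + 1 equals its own row_num. The LEFT OUTER JOIN keeps each device's first reading, which has no previous row, and the IS NULL check in the WHERE clause lets it through. At this stage it's probably easier to see in SQL.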
-- Number each device's readings in time order
WITH device_recording_row_num AS
(
    SELECT device_id, recording_time, temperature,
        ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY recording_time) AS row_num
    FROM device_recording
)
SELECT
    [current].device_id, [current].recording_time, [current].temperature
FROM
    device_recording_row_num AS [current]
    -- Pair each reading with the one immediately before it for the same device
    LEFT OUTER JOIN
    device_recording_row_num AS previous
        ON [current].device_id = previous.device_id
        AND [current].row_num = previous.row_num + 1
WHERE
    -- Keep readings that differ from the previous one, plus each device's first reading
    [current].temperature <> previous.temperature
    OR previous.temperature IS NULL
ORDER BY
    [current].device_id, [current].recording_time
Note: I've tested this on SQL Server 2008, but I believe it should work on 2005 too, since CTEs and ROW_NUMBER() were both introduced in that version.
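If you are on SQL Server 2012 or later, the LAG() window function can fetch the previous row's value directly, which avoids the self-join. The sketch below is an alternative to the row-number approach, not something tested as part of this post; the CTE name device_recording_prev and the column alias prev_temperature are names I've made up for illustration.

```sql
-- Alternative sketch using LAG() (requires SQL Server 2012 or later).
-- device_recording_prev and prev_temperature are illustrative names.
WITH device_recording_prev AS
(
    SELECT device_id, recording_time, temperature,
        -- Previous reading for the same device; NULL for the first reading
        LAG(temperature) OVER (PARTITION BY device_id ORDER BY recording_time) AS prev_temperature
    FROM device_recording
)
SELECT device_id, recording_time, temperature
FROM device_recording_prev
WHERE temperature <> prev_temperature
    OR prev_temperature IS NULL
ORDER BY device_id, recording_time;
```

The window function has to live in a CTE (or derived table) because window functions are not allowed directly in a WHERE clause.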
This row-numbering self-join technique could be applied to other time-series queries.
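As one example, the same row-number self-join can report how much the temperature moved between consecutive readings. This is a minimal sketch against the device_recording table above; the temperature_change alias is my own, and I've used an INNER JOIN on the assumption that a device's first reading (which has nothing to compare against) should be left out.

```sql
-- Sketch: change in temperature between consecutive readings per device.
-- temperature_change is an illustrative alias, not part of the original schema.
WITH device_recording_row_num AS
(
    SELECT device_id, recording_time, temperature,
        ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY recording_time) AS row_num
    FROM device_recording
)
SELECT
    [current].device_id,
    [current].recording_time,
    [current].temperature,
    [current].temperature - previous.temperature AS temperature_change
FROM
    device_recording_row_num AS [current]
    -- INNER JOIN drops each device's first reading, which has no previous row
    INNER JOIN
    device_recording_row_num AS previous
        ON [current].device_id = previous.device_id
        AND [current].row_num = previous.row_num + 1
ORDER BY
    [current].device_id, [current].recording_time;
```

Switching back to a LEFT OUTER JOIN would keep the first reading for each device, with a NULL temperature_change.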