Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « T-SQL Tuesday #016 - TOP N Percent for the groupT-SQL Tuesday #016 - COUNT and DELETE duplicates »
    comments

    This month's T-SQL Tuesday is hosted by Jes Borland and it is all about grouping and aggregate functions, here is my attempt. I wrote most of this post on my way to the MVP summit in Seattle. This post is all about the stock market, charting data for Intraday chart and for end of day charts. This post contains a lot of code and I apologize for that, I just want you to be able to run the code, in order to do that we have to setup all the tables and data.

    Most people think of aggregate/grouping functions being Min, Max, Avg, Sum etc etc. To me Row_number and other ranking/windowing functions, datepart, converting to date, year() and month() can also be considered grouping


    When I say intraday, I mean the values that you would see for a stock if you would pull up a chart at 3 PM, in this case it would be a chart from 9:30 AM till 4PM if this was traded on for example the NYSE or NASDAQ.

    End of day values are close values, so for example the close price for Apple on March 4 was $360.

    We are going to chart intraday in either 1 day or 5 day charts. If it is one day we will chart in minutes, if it is 5 days we will chart in 5 minute increments.
    For end of day data we are going to chart daily if it is 1 month, 3 months or 6 months and weekly if it is greater than 6 months.

    So to start we are going to make up some tables with some fake data.



    Preparing the tables and data

    First we need a table of symbols (some people will call them tickers). This table is very simple, all it has is a SymbolID and a Symbol. In reality this table would look different because sometimes companies will change the symbol, when Sun Microsystems changed from SUNW to JAVA is one such example.
    Here is the table

    1. CREATE TABLE Symbols (
    2.        SymbolID INT NOT NULL PRIMARY KEY,
    3.        Symbol VARCHAR(20) NOT NULL)


    We will insert these 4 symbols

    1. INSERT Symbols VALUES(1,'ABC')
    2. INSERT Symbols VALUES(2,'DEF')
    3. INSERT Symbols VALUES(3,'MNO')
    4. INSERT Symbols VALUES(4,'XYZ')



    Next up is the creation of the table of numbers, this will facilitate the creation of the data later on.

    1. CREATE TABLE Numbers (number INT NOT NULL  PRIMARY KEY)
    2. GO


    This will populate the table with 90000 rows.

    1. INSERT Numbers
    2. SELECT TOP 90000 ROW_NUMBER() OVER(ORDER BY s1.id )
    3. FROM sysobjects s1,sysobjects s2,sysobjects s3



    Next up is a table that will hold some time information

    1. CREATE TABLE TempTickTime ( TickTime DATETIME NOT NULL)
    2. GO




    This will populate that table with 30 second intervals between 2011-02-28 09:30:30.000 and 2011-03-31 15:30:00.000 only when it is between 9:30 AM and 4 PM

    1. DECLARE @StartTime DATETIME = '20110228 09:30:00'
    2. INSERT TempTickTime
    3. SELECT DATEADD(s,number * 30,@StartTime)
    4. FROM Numbers
    5. WHERE  CONVERT(TIME, DATEADD(s,number * 30,@StartTime)) BETWEEN '09:30:00.0000000' AND '16:00:00.0000000'




    Now when you deal with global markets, some instruments trade Monday through Friday, some of them trade Sunday till Thursday and there are other variations.
    From the 4 tickers we have, two will trade Monday through Friday and two will trade Sunday till Thursday

    Create this table and populate it as follows

    1. CREATE TABLE TickData (
    2.        SymbolID INT NOT NULL,
    3.        TickTime DATETIME NOT NULL,
    4.        TickPrice DECIMAL (20,10) NOT NULL)
    5.  
    6.  
    7. SET DATEFIRST 1 --Default to Sunday as 1
    8.  
    9. INSERT TickData
    10. SELECT SymbolId,tickTime,1100 + 1 * RAND() * CONVERT(FLOAT,tickTime) * SymbolId/2.01
    11. FROM TempTickTime t
    12. CROSS JOIN Symbols s
    13. WHERE DATEPART(dw,tickTime) BETWEEN 2 AND 6 --Monday till Friday
    14. AND s.SymbolID IN (1,2)
    15. ORDER BY tickTime
    16.  
    17.  
    18. INSERT TickData
    19. SELECT SymbolId,tickTime,1100 + 1 * RAND() * CONVERT(FLOAT,tickTime) * SymbolId/2.01
    20. FROM TempTickTime t
    21. CROSS JOIN Symbols s
    22. WHERE DATEPART(dw,tickTime) BETWEEN 1 AND 5 --Sunday till Thursday
    23. AND s.SymbolID IN (3,4)
    24. ORDER BY tickTime


    What the query does is insert the SymbolID, the ticktime and then a pseudo random value that represents the price. The query also is grouping by day of week by using the DATEPART function.

    We are done with intraday data, next up is end of day

    First create this table

    1. CREATE TABLE EODData (
    2.        SymbolID INT NOT NULL,
    3.        SomeDate DATETIME NOT NULL,
    4.        TickPrice DECIMAL (20,10) NOT NULL,
    5.        IsEndOfWeek tinyint NOT NULL)


    In the query below we are grabbing the max time per day for a SymbolID and the associated price for that time. We are in essence grouping by SymbolId and Date, since we are ordering by TickTime descending and we are only grabbing where the ROW value is 1, we will get the latest value for a day.

    1. ;WITH CTE AS(SELECT *,
    2. ROW_NUMBER() OVER (PARTITION BY SymbolId,CONVERT(DATE,TickTime) ORDER BY TickTime DESC) AS ROW
    3. FROM TickData)
    4.  
    5. INSERT EODData
    6. SELECT SymbolId,CONVERT(DATE,TickTime),TickPrice,0 FROM CTE
    7. WHERE ROW = 1
    8. ORDER BY SymbolID,CONVERT(DATE,TickTime)


    Here is another way of doing the insert by grouping by SymbolId and converting the TickTime to a date and grabbing the max TickTime for that, with this derived table we join back to the TickData table and do our inserts.

    1. --INSERT EODData
    2. SELECT t.SymbolId,CONVERT(DATE,TickTime),TickPrice,0
    3. FROM TickData t
    4. JOIN(
    5.        SELECT SymbolId,MAX(TickTime) as MaxTime
    6.        FROM TickData
    7.        GROUP BY SymbolId,CONVERT(DATE,TickTime)) x on
    8. t.SymbolId = x.SymbolId
    9. and t.TickTime =  x.MaxTime



    Here is where we do some grouping, in order to grab the last possible value for a week, we need to group by SymbolID, year, month and the week number.

    1. SELECT SymbolId,MAX(SomeDate) as MaxDate
    2. FROM EODData
    3. GROUP BY SymbolId,YEAR(SomeDate), MONTH(SomeDate),DATEPART(wk,SomeDate)
    4. order by SymbolID, MaxDate


    That query produces the following output, as you can see it has the latest value for each week for each symbol.


    SymbolIdMaxDate
    12011-03-05 00:00:00.000
    12011-03-12 00:00:00.000
    12011-03-19 00:00:00.000
    12011-03-26 00:00:00.000
    12011-03-31 00:00:00.000
    22011-03-05 00:00:00.000
    22011-03-12 00:00:00.000
    22011-03-19 00:00:00.000
    22011-03-26 00:00:00.000
    22011-03-31 00:00:00.000
    32011-02-28 00:00:00.000
    32011-03-04 00:00:00.000
    32011-03-11 00:00:00.000
    32011-03-18 00:00:00.000
    32011-03-25 00:00:00.000
    32011-03-31 00:00:00.000
    42011-02-28 00:00:00.000
    42011-03-04 00:00:00.000
    42011-03-11 00:00:00.000
    42011-03-18 00:00:00.000
    42011-03-25 00:00:00.000
    42011-03-31 00:00:00.000



    Here is an example of how to join the grouping query to the table so that we can get all the details for the row, we will use this as the basis for our update later on

    1. SELECT e.* from EODData e
    2. join (
    3.        SELECT SymbolId,MAX(SomeDate) as MaxDate
    4.        FROM EODData
    5.        GROUP BY SymbolId,YEAR(SomeDate), MONTH(SomeDate),DATEPART(wk,SomeDate)) x
    6. on e.SomeDAte = x.MaxDate
    7. and e.SymbolId =x.SymbolId

    And here is how we update the IsEndOfWeek column with the value 1 for the rows that fall on the end of the week

    1. UPDATE  e
    2. SET e.IsEndOfWeek = 1
    3. FROM EODData e
    4. JOIN (SELECT SymbolId,MAX(SomeDate) AS MaxDate
    5.        FROM EODData
    6.        GROUP BY SymbolId,YEAR(SomeDate), MONTH(SomeDate),DATEPART(wk,SomeDate)
    7.        ) x
    8. ON e.SomeDate = x.MaxDate
    9. AND e.SymbolId =x.SymbolId



    Charting end of day values

    If we chart 1,3 or 6 months we will use daily values

    1. SELECT *
    2. FROM EODData
    3. WHERE SymbolId = 1
    4. ORDER BY SomeDate


    If we chart anything over 6 months we want to grab weekly values, the query for that is now really simple

    1. SELECT *
    2. FROM EODData
    3. WHERE SymbolId = 1
    4. AND IsEndOfWeek = 1
    5. ORDER BY SomeDate




    Charting intraday data

    If we want data for a 1 day chart then we are going to grab in 1 minute intervals, if we are going to chart 5 days we will grab in 5 minute chunks.


    There is going to be a lot going on in the following code snippet so I will try to explain it in the comments

    1. DECLARE @StartDate DATETIME = '2011-03-01 09:30:00.000'
    2. DECLARE @TimeSpan INT = 5
    3. -- 1  will return 2011-03-31 00:00:00.000
    4. -- 5  will return 2011-03-25 00:00:00.000
    5.  
    6. --Grab latest 1 or 5 days, we have to account for weekends and markets being closed, this is why we do @TimeSpan * 5
    7. -- and then we do top @TimeSpan..which can be 1 or 5
    8. -- we convert to date so that we get distinct dates
    9. SELECT  TOP (@TimeSpan) @StartDate = Today
    10.     FROM
    11.         (SELECT  DISTINCT TOP (@TimeSpan * 5) CONVERT(DATE,TickTime) AS Today
    12.         FROM dbo.TickData
    13.         WHERE SymbolID =  1
    14.         AND TickTime > @StartDate
    15.         ORDER BY CONVERT(DATE,TickTime) DESC) x
    16.     ORDER BY Today DESC
    17.    
    18.  
    19. --We have t and q for column names because this is being generated as JSON and we want the data to be as small as possible
    20. SELECT
    21.     t1.TickTime AS t,
    22.     t1.TickPrice AS q
    23.     FROM dbo.TickData t1
    24.     JOIN
    25.         (SELECT SymbolID, MAX(TickTime) AS Ticktime,DATEPART(mi,ticktime) AS TickMinute
    26.         FROM dbo.TickData
    27.         WHERE SymbolID =  1
    28.         AND ticktime >= @StartDate
    29.         GROUP BY SymbolID,DATEPART(hh,TickTime),DATEPART(mi,ticktime),DATEPART(dd,ticktime)) x
    30.     ON x.SymbolID = t1.SymbolID
    31.     AND x.Ticktime = t1.Ticktime
    32.     WHERE t1.SymbolID = 1
    33.     AND TickMinute %  @TimeSpan = 0  --0nly grab the minutes what the value of @TimeSpan holds
    34.     ORDER BY t1.TickTime

    Take a look at this line AND TickMinute % @TimeSpan = 0
    So basically we are aggregating in 1 or 5 minutes (really whatever @TimeSpan is, if it is 3 then it will be in 3 minute chunks). We are using the % (Modulo) operator to accomplish this.


    --If you run the code above with @TimeSpan = 5, you get data in 5 minute intervals
    DECLARE @TimeSpan INT = 5

    tq
    2011-03-31 09:30:30.000 17132.3582356171
    2011-03-31 09:35:30.000 17132.3596056885
    2011-03-31 09:40:30.000 17132.3609757598
    2011-03-31 09:45:30.000 17132.3623458311

    --If you run the code above with @TimeSpan = 1, you get data in 1 minute intervals
    DECLARE @TimeSpan INT = 1

    tq
    2011-03-31 09:30:30.000 17132.3582356171
    2011-03-31 09:31:30.000 17132.3585096314
    2011-03-31 09:32:30.000 17132.3587836457
    2011-03-31 09:33:30.000 17132.3590576599
    2011-03-31 09:34:30.000 17132.3593316742
    2011-03-31 09:35:30.000 17132.3596056885
    2011-03-31 09:36:30.000 17132.3598797027

    That is it for this post, there is a lot of code but hopefully you can get an idea of what it all does, if you have any questions leave me a comment.



    *** Remember, if you have a SQL related question, try our Microsoft SQL Server Programming forum or our Microsoft SQL Server Admin forum

    About the Author

    User bio imageDenis has been working with SQL Server since version 6.5. Although he worked as an ASP/JSP/ColdFusion developer before the dot com bust, he has been working exclusively as a database developer/architect since 2002. In addition to English, Denis is also fluent in Croatian and Dutch, but he can curse in many other languages and dialects (just ask the SQL optimizer) He lives in Princeton, NJ with his wife and three kids.
    Social SitingsTwitterFacebookLinkedInHomePageFlickrLTD RSS Feed
    791 views
    Instapaper

    4 comments

    Comment from: Jes Schultz Borland (grrlgeek) [Member]
    Jes Schultz Borland (grrlgeek) Denis, thanks for the great contribution!
    03/09/11 @ 19:05
    Comment from: ian [Visitor]
    ian the very bottom query is amazing and has helped me with somthing that has been causing me a headache for sometime. i have one small problem however.

    the @startdate = '2011-03-01 09:30:00.000', yet the results set increment by 5mins and 30 seconds i.e '2011-03-01 09:35:30.000' ,'2011-03-01 09:40:30.000'. i have a similar problem that seems to revolve around 5 mins and 59 seconds, and i cant for the life of me work out what is wrong.

    im tring to get

    '2011-03-01 09:30:00.000'
    '2011-03-01 09:35:00.000'
    '2011-03-01 09:40:00.000' etc.

    any thoughts??

    thanks again for such an amazing piece of work.
    09/21/11 @ 09:05
    Comment from: SQLDenis [Member] Email
    SQLDenis That is because I populate like this

    DECLARE @StartTime DATETIME = '20110228 09:30:00'
    INSERT TempTickTime
    SELECT DATEADD(s,number * 30,@StartTime)
    FROM Numbers
    WHERE CONVERT(TIME, DATEADD(s,number * 30,@StartTime)) BETWEEN '09:30:00.0000000' AND '16:00:00.0000000'

    so I start at 09:30:00, add 30 seconds

    However at the bottom query I am doing

    SELECT SymbolID, MAX(TickTime) AS Ticktime,DATEPART(mi,ticktime) AS TickMinute

    if you do MIN instead of MAX you should get the time closest to the minute

    09/21/11 @ 09:37
    Comment from: ian [Visitor]
    ian amazing, thank you so much for your help and this amazing blog. cant thank you enough :)

    br

    ian
    09/21/11 @ 10:20

    Leave a comment


    Your email address will not be revealed on this site.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)