A question was asked on StackOverflow about how NTILE() works: Want to learn more on NTILE() I answered that question and then started thinking about NTILE(), I realized that I have never used this function in production code. Everytime I used it was for demoware purposes. I did a quick check against a bunch of databases

T-SQL | |

1 2 | SELECT * FROM sys.sql_modules WHERE OBJECT_DEFINITION(object_id) LIKE '%ntile%' |

SELECT * FROM sys.sql_modules WHERE OBJECT_DEFINITION(object_id) LIKE '%ntile%'

——-

(0 row(s) affected)

Doesn’t exist. I also have a hard time figuring out where I would use it, maybe if I want to put an equal number or rows in a bunch of buckets, maybe then this will be used to call another process asynchronously?

For you that don’t know what NTILE() does it basically distributes the rows in a bunch of groups. If you specify 2 groups and you have 10 rows then each group will have 5 rows.

Let’s look at some code to understand how it works, first create this simple table

T-SQL | |

1 2 3 4 5 6 7 | CREATE TABLE #temp(StudentID CHAR(2), Score INT) INSERT #temp VALUES('S1',75 ) INSERT #temp VALUES('S2',83) INSERT #temp VALUES('S3',91) INSERT #temp VALUES('S4',83) INSERT #temp VALUES('S5',93 ) |

CREATE TABLE #temp(StudentID CHAR(2), Score INT) INSERT #temp VALUES('S1',75 ) INSERT #temp VALUES('S2',83) INSERT #temp VALUES('S3',91) INSERT #temp VALUES('S4',83) INSERT #temp VALUES('S5',93 )

Now if you use NTILE() to create 2 buckets, you will see 1 and 2 as NtileValue

T-SQL | |

1 2 3 | SELECT NTILE(2) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1 |

SELECT NTILE(2) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1

Here are the results:

NtileValue StudentID Score 1 S1 75 1 S2 83 1 S4 83 2 S3 91 2 S5 93

Since the number of rows are not even, the first bucket will have three rows and the second bucket will have two rows

Let’s add one more row to this table

T-SQL | |

1 | INSERT #temp VALUES('S6',92 ) |

INSERT #temp VALUES('S6',92 )

Now let’s run the same query as before

T-SQL | |

1 2 3 | SELECT NTILE(2) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1 |

SELECT NTILE(2) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1

Here are the results:

NtileValue StudentID Score 1 S1 75 1 S2 83 1 S4 83 2 S3 91 2 S6 92 2 S5 93

As you can see both buckets now have three rows

What if we use NTILE(3)?

T-SQL | |

1 2 3 | SELECT NTILE(3) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1 |

SELECT NTILE(3) OVER(ORDER BY Score) AS NtileValue,* FROM #temp ORDER BY 1

Here are the results:

NtileValue StudentID Score 1 S1 75 1 S2 83 2 S4 83 2 S3 91 3 S6 92 3 S5 93

As expected, you now get three buckets back.

So do you use NTILE() currently or do you have plans to use it in the future?

## 4 Comments

I have never used it before in production code. The only scenario I can think of is to divide a dataset in equal chunks so I can do parallel processing in SSIS. However, they have designed a component for that now (balanced data distributor) ðŸ™‚

In such a case, I would maybe rather use a modulo function, as rows switch between groups as they pass by, instead of first all the rows from one group and then another.

I’ve used it several times.

It is good for statistical calculations – e.g. finding the median of a dataset.

Ntile(2) is much faster than select top 50 percent. And in the WF it is much easier to define the sorting and partioning (if needed).

So yes – it is very usefull from a statistical point of view ;0)

BR

Brian

I output percentiles for a number of things in our application (ie, top 10% accounts) and use the NTILE function to generate the necessary values.

One gotcha with NTILE is that NTILE(100) on a table with 150 rows puts 100 of them in buckets 1-50 and 50 in buckets 50 – 100 which significantly restricts its utility unless assured that the number of rows to buckets will always be sufficiently great for this skew to be insignificant.

IMO it would be more useful if implemented as suggested here https://connect.microsoft.com/SQLServer/feedback/details/761015/tsql-ntile-over-less-rows-than-integer-expression-is-not-weighted