Loane sharp2006-07-27 02:17:14

Hi there

I am using the following statement to calculate percentile ranks of a set of

scores in quite a large table (400k rows). The tempdb file grows

dramatically (to 40GB+) and the query never reaches a conclusion. Surely I’m

doing something wrong?! (This is actually part of a broader calculation …

This query calculates the number of scores below a particular score, which I

will then divide by the total number of scores in the table to get the

percentile rank.)

Any comments would be gratefully appreciated.

Best regards

Loane

update table

set numbelow =

t.numbelow

from

(

select (select count(*) from table b where b.normscore > 0 and

b.normscore<=a.normscore) 'numbelow'
from table a
) as t

Lawrence garvi2006-07-27 02:24:54

[the_ad_placement id="add-en-post-1"]

select count(*) from table b where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow' What version of SQL Server are you using? With SQL 2000 this is a complex type scenario. We have a nightmare of a procedure we call to compute 20th and 80th percentiles on a collection of numbers. I'm still trying to understand the logic of the procedure so that I can evaluate whether it can be simplified, rewritten, or extracted out as a function call. With SQL 2005, using expressions in the TOP clause, this is trival, and given our pending migration to SQL 2005, I'm not likely to pursue the issues affecting our SQL 2000 implementation. Normally, the pth percentile is obtained by first calculating the rank l = p(n+1)/100, rounded to the nearest integer and then taking the value that corresponds to that rank. In case of lower and upper quartiles, the ranks are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to 125 and 170 resp. If these ten values were stored in a SQL Server 2005 table, we can now use a simple TOP clause with expression to select the desired percentiles. First calculate the rank for the 25th percentile: .25 * (count(*) + 1) and then round it to the nearest integer: round(.25 * (count(*) + 1) and store it in @rank25 Then calculate the rank for the 75th percentile: .75 * (count(*) + 1) and then round it to the nearest integer: round(.75 * (count(*) + 1) and store it in @rank75 Then, to take the value that correspondents to that rank, we use the TOP clause with the computed rank of the 25th and 75th percentile in a nested query: DECLARE @blood TABLE( systolic int ) INSERT INTO @blood SELECT 120 UNION ALL SELECT 125 UNION ALL SELECT 125 UNION ALL SELECT 145 UNION ALL SELECT 145 UNION ALL SELECT 150 UNION ALL SELECT 150 UNION ALL SELECT 160 UNION ALL SELECT 170 UNION ALL SELECT 175 DECLARE @rowcount int DECLARE @rank25 tinyint DECLARE @rank75 tinyint SET @rowcount = (SELECT COUNT(*) from #blood) SET @rank25 = ROUND(.25*(@rowcount+1),0) SET @rank75 = ROUND(.75*(@rowcount+1),0) SELECT TOP 1 systolic FROM ( SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC SELECT TOP 1 systolic FROM ( SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC The result from the first query is 125; the result from the second query is 170. -- Lawrence Garvin, M.S., MVP-Software Distribution Everything you need for WSUS is at http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx

And, eveything else is at

http://wsusinfo.onsitechsolutions.com

….

select count(*) from table b where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow' What version of SQL Server are you using? With SQL 2000 this is a complex type scenario. We have a nightmare of a procedure we call to compute 20th and 80th percentiles on a collection of numbers. I'm still trying to understand the logic of the procedure so that I can evaluate whether it can be simplified, rewritten, or extracted out as a function call. With SQL 2005, using expressions in the TOP clause, this is trival, and given our pending migration to SQL 2005, I'm not likely to pursue the issues affecting our SQL 2000 implementation. Normally, the pth percentile is obtained by first calculating the rank l = p(n+1)/100, rounded to the nearest integer and then taking the value that corresponds to that rank. In case of lower and upper quartiles, the ranks are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to 125 and 170 resp. If these ten values were stored in a SQL Server 2005 table, we can now use a simple TOP clause with expression to select the desired percentiles. First calculate the rank for the 25th percentile: .25 * (count(*) + 1) and then round it to the nearest integer: round(.25 * (count(*) + 1) and store it in @rank25 Then calculate the rank for the 75th percentile: .75 * (count(*) + 1) and then round it to the nearest integer: round(.75 * (count(*) + 1) and store it in @rank75 Then, to take the value that correspondents to that rank, we use the TOP clause with the computed rank of the 25th and 75th percentile in a nested query: DECLARE @blood TABLE( systolic int ) INSERT INTO @blood SELECT 120 UNION ALL SELECT 125 UNION ALL SELECT 125 UNION ALL SELECT 145 UNION ALL SELECT 145 UNION ALL SELECT 150 UNION ALL SELECT 150 UNION ALL SELECT 160 UNION ALL SELECT 170 UNION ALL SELECT 175 DECLARE @rowcount int DECLARE @rank25 tinyint DECLARE @rank75 tinyint SET @rowcount = (SELECT COUNT(*) from #blood) SET @rank25 = ROUND(.25*(@rowcount+1),0) SET @rank75 = ROUND(.75*(@rowcount+1),0) SELECT TOP 1 systolic FROM ( SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC SELECT TOP 1 systolic FROM ( SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC The result from the first query is 125; the result from the second query is 170. -- Lawrence Garvin, M.S., MVP-Software Distribution Everything you need for WSUS is at http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx

And, eveything else is at

http://wsusinfo.onsitechsolutions.com

….

Loane sharp2008-03-08 01:11:20

Hi there

I am using the following statement to calculate percentile ranks of a set of

scores in quite a large table (400k rows). The tempdb file grows

dramatically (to 40GB+) and the query never reaches a conclusion. Surely I’m

doing something wrong?! (This is actually part of a broader calculation …

This query calculates the number of scores below a particular score, which I

will then divide by the total number of scores in the table to get the

percentile rank.)

Any comments would be gratefully appreciated.

Best regards

Loane

update table

set numbelow =

t.numbelow

from

(

select (select count(*) from table b where b.normscore > 0 and

b.normscore<=a.normscore) 'numbelow'
from table a
) as t

Lawrence garvi2008-03-09 13:21:37

[the_ad_placement id="add-en-post-3"]

select count(*) from table b where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow' What version of SQL Server are you using? With SQL 2000 this is a complex type scenario. We have a nightmare of a procedure we call to compute 20th and 80th percentiles on a collection of numbers. I'm still trying to understand the logic of the procedure so that I can evaluate whether it can be simplified, rewritten, or extracted out as a function call. With SQL 2005, using expressions in the TOP clause, this is trival, and given our pending migration to SQL 2005, I'm not likely to pursue the issues affecting our SQL 2000 implementation. Normally, the pth percentile is obtained by first calculating the rank l = p(n+1)/100, rounded to the nearest integer and then taking the value that corresponds to that rank. In case of lower and upper quartiles, the ranks are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to 125 and 170 resp. If these ten values were stored in a SQL Server 2005 table, we can now use a simple TOP clause with expression to select the desired percentiles. First calculate the rank for the 25th percentile: .25 * (count(*) + 1) and then round it to the nearest integer: round(.25 * (count(*) + 1) and store it in @rank25 Then calculate the rank for the 75th percentile: .75 * (count(*) + 1) and then round it to the nearest integer: round(.75 * (count(*) + 1) and store it in @rank75 Then, to take the value that correspondents to that rank, we use the TOP clause with the computed rank of the 25th and 75th percentile in a nested query: DECLARE @blood TABLE( systolic int ) INSERT INTO @blood SELECT 120 UNION ALL SELECT 125 UNION ALL SELECT 125 UNION ALL SELECT 145 UNION ALL SELECT 145 UNION ALL SELECT 150 UNION ALL SELECT 150 UNION ALL SELECT 160 UNION ALL SELECT 170 UNION ALL SELECT 175 DECLARE @rowcount int DECLARE @rank25 tinyint DECLARE @rank75 tinyint SET @rowcount = (SELECT COUNT(*) from #blood) SET @rank25 = ROUND(.25*(@rowcount+1),0) SET @rank75 = ROUND(.75*(@rowcount+1),0) SELECT TOP 1 systolic FROM ( SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC SELECT TOP 1 systolic FROM ( SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC The result from the first query is 125; the result from the second query is 170. -- Lawrence Garvin, M.S., MVP-Software Distribution Everything you need for WSUS is at http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx

And, eveything else is at

http://wsusinfo.onsitechsolutions.com

….

select count(*) from table b where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow' What version of SQL Server are you using? With SQL 2000 this is a complex type scenario. We have a nightmare of a procedure we call to compute 20th and 80th percentiles on a collection of numbers. I'm still trying to understand the logic of the procedure so that I can evaluate whether it can be simplified, rewritten, or extracted out as a function call. With SQL 2005, using expressions in the TOP clause, this is trival, and given our pending migration to SQL 2005, I'm not likely to pursue the issues affecting our SQL 2000 implementation. Normally, the pth percentile is obtained by first calculating the rank l = p(n+1)/100, rounded to the nearest integer and then taking the value that corresponds to that rank. In case of lower and upper quartiles, the ranks are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to 125 and 170 resp. If these ten values were stored in a SQL Server 2005 table, we can now use a simple TOP clause with expression to select the desired percentiles. First calculate the rank for the 25th percentile: .25 * (count(*) + 1) and then round it to the nearest integer: round(.25 * (count(*) + 1) and store it in @rank25 Then calculate the rank for the 75th percentile: .75 * (count(*) + 1) and then round it to the nearest integer: round(.75 * (count(*) + 1) and store it in @rank75 Then, to take the value that correspondents to that rank, we use the TOP clause with the computed rank of the 25th and 75th percentile in a nested query: DECLARE @blood TABLE( systolic int ) INSERT INTO @blood SELECT 120 UNION ALL SELECT 125 UNION ALL SELECT 125 UNION ALL SELECT 145 UNION ALL SELECT 145 UNION ALL SELECT 150 UNION ALL SELECT 150 UNION ALL SELECT 160 UNION ALL SELECT 170 UNION ALL SELECT 175 DECLARE @rowcount int DECLARE @rank25 tinyint DECLARE @rank75 tinyint SET @rowcount = (SELECT COUNT(*) from #blood) SET @rank25 = ROUND(.25*(@rowcount+1),0) SET @rank75 = ROUND(.75*(@rowcount+1),0) SELECT TOP 1 systolic FROM ( SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC SELECT TOP 1 systolic FROM ( SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic ) b ORDER BY b.systolic DESC The result from the first query is 125; the result from the second query is 170. -- Lawrence Garvin, M.S., MVP-Software Distribution Everything you need for WSUS is at http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx

And, eveything else is at

http://wsusinfo.onsitechsolutions.com

….

## Leave a Reply

You must be logged in to post a comment.