Mombu the Microsoft Forum sponsored links

Go Back   Mombu the Microsoft Forum > Microsoft > SQL SERVER (TECHNET) > calculate percentile using t-sql
User Name
Password
REGISTER NOW! Mark Forums Read

sponsored links


Reply
 
1 27th July 02:17
loane sharp
External User
 
Posts: 1
Default calculate percentile using t-sql


Hi there
I am using the following statement to calculate percentile ranks of a set of
scores in quite a large table (400k rows). The tempdb file grows
dramatically (to 40GB+) and the query never reaches a conclusion. Surely I'm
doing something wrong?! (This is actually part of a broader calculation ...
This query calculates the number of scores below a particular score, which I
will then divide by the total number of scores in the table to get the
percentile rank.)
Any comments would be gratefully appreciated.
Best regards
Loane

update table
set numbelow =
t.numbelow
from
(
select (select count(*) from table b where b.normscore > 0 and
b.normscore<=a.normscore) 'numbelow'
from table a
) as t
  Reply With Quote


  sponsored links


2 27th July 02:24
lawrence garvin mvp
External User
 
Posts: 1
Default calculate percentile using t-sql


select count(*) from table b
where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow'

What version of SQL Server are you using?

With SQL 2000 this is a complex type scenario. We have a nightmare of a
procedure we call to compute 20th and 80th percentiles on a collection of
numbers. I'm still trying to understand the logic of the procedure so that I
can evaluate whether it can be simplified, rewritten, or extracted out as a
function call.


With SQL 2005, using expressions in the TOP clause, this is trival, and
given our pending migration to SQL 2005, I'm not likely to pursue the issues
affecting our SQL 2000 implementation.

Normally, the pth percentile is obtained by first calculating the rank l =
p(n+1)/100, rounded to the nearest integer and then taking the value that
corresponds to that rank. In case of lower and upper quartiles, the ranks
are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to
125 and 170 resp.

If these ten values were stored in a SQL Server 2005 table, we can now use a
simple TOP clause with expression to select the desired percentiles.

First calculate the rank for the 25th percentile: .25 * (count(*) + 1)

and then round it to the nearest integer: round(.25 * (count(*) + 1)

and store it in @rank25

Then calculate the rank for the 75th percentile: .75 * (count(*) + 1)

and then round it to the nearest integer: round(.75 * (count(*) + 1)

and store it in @rank75

Then, to take the value that correspondents to that rank, we use the TOP
clause with the computed rank of the 25th and 75th percentile in a nested
query:

DECLARE @blood TABLE(

systolic int

)

INSERT INTO @blood

SELECT 120

UNION ALL SELECT 125

UNION ALL SELECT 125

UNION ALL SELECT 145

UNION ALL SELECT 145

UNION ALL SELECT 150

UNION ALL SELECT 150

UNION ALL SELECT 160

UNION ALL SELECT 170

UNION ALL SELECT 175

DECLARE @rowcount int

DECLARE @rank25 tinyint

DECLARE @rank75 tinyint

SET @rowcount = (SELECT COUNT(*) from #blood)

SET @rank25 = ROUND(.25*(@rowcount+1),0)

SET @rank75 = ROUND(.75*(@rowcount+1),0)

SELECT TOP 1 systolic FROM (

SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC

SELECT TOP 1 systolic FROM (

SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC

The result from the first query is 125; the result from the second query is
170.


--
Lawrence Garvin, M.S., MVP-Software Distribution
Everything you need for WSUS is at
http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx
And, eveything else is at
http://wsusinfo.onsitechsolutions.com
....
  Reply With Quote
3 8th March 01:11
loane sharp
External User
 
Posts: 1
Default calculate percentile using t-sql


Hi there
I am using the following statement to calculate percentile ranks of a set of
scores in quite a large table (400k rows). The tempdb file grows
dramatically (to 40GB+) and the query never reaches a conclusion. Surely I'm
doing something wrong?! (This is actually part of a broader calculation ...
This query calculates the number of scores below a particular score, which I
will then divide by the total number of scores in the table to get the
percentile rank.)
Any comments would be gratefully appreciated.
Best regards
Loane

update table
set numbelow =
t.numbelow
from
(
select (select count(*) from table b where b.normscore > 0 and
b.normscore<=a.normscore) 'numbelow'
from table a
) as t
  Reply With Quote
4 9th March 13:21
lawrence garvin \(mvp\)
External User
 
Posts: 1
Default calculate percentile using t-sql


select count(*) from table b
where b.normscore > 0 and b.normscore<=a.normscore ) 'numbelow'

What version of SQL Server are you using?

With SQL 2000 this is a complex type scenario. We have a nightmare of a
procedure we call to compute 20th and 80th percentiles on a collection of
numbers. I'm still trying to understand the logic of the procedure so that I
can evaluate whether it can be simplified, rewritten, or extracted out as a
function call.


With SQL 2005, using expressions in the TOP clause, this is trival, and
given our pending migration to SQL 2005, I'm not likely to pursue the issues
affecting our SQL 2000 implementation.

Normally, the pth percentile is obtained by first calculating the rank l =
p(n+1)/100, rounded to the nearest integer and then taking the value that
corresponds to that rank. In case of lower and upper quartiles, the ranks
are 0.25*(10+1) = 2.75 3 and 0.75*(10+1) = 8.25 8 which corresponds to
125 and 170 resp.

If these ten values were stored in a SQL Server 2005 table, we can now use a
simple TOP clause with expression to select the desired percentiles.

First calculate the rank for the 25th percentile: .25 * (count(*) + 1)

and then round it to the nearest integer: round(.25 * (count(*) + 1)

and store it in @rank25

Then calculate the rank for the 75th percentile: .75 * (count(*) + 1)

and then round it to the nearest integer: round(.75 * (count(*) + 1)

and store it in @rank75

Then, to take the value that correspondents to that rank, we use the TOP
clause with the computed rank of the 25th and 75th percentile in a nested
query:

DECLARE @blood TABLE(

systolic int

)

INSERT INTO @blood

SELECT 120

UNION ALL SELECT 125

UNION ALL SELECT 125

UNION ALL SELECT 145

UNION ALL SELECT 145

UNION ALL SELECT 150

UNION ALL SELECT 150

UNION ALL SELECT 160

UNION ALL SELECT 170

UNION ALL SELECT 175

DECLARE @rowcount int

DECLARE @rank25 tinyint

DECLARE @rank75 tinyint

SET @rowcount = (SELECT COUNT(*) from #blood)

SET @rank25 = ROUND(.25*(@rowcount+1),0)

SET @rank75 = ROUND(.75*(@rowcount+1),0)

SELECT TOP 1 systolic FROM (

SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC

SELECT TOP 1 systolic FROM (

SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC

The result from the first query is 125; the result from the second query is
170.


--
Lawrence Garvin, M.S., MVP-Software Distribution
Everything you need for WSUS is at
http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx
And, eveything else is at
http://wsusinfo.onsitechsolutions.com
....
  Reply With Quote


  sponsored links


Reply


Thread Tools
Display Modes




Copyright 2006 SmartyDevil.com - Dies Mies Jeschet Boenedoesef Douvema Enitemaus -
666