Friday, October 22, 2010

Implementing SKIP and TAKE in SQL Server

I wanted to see how far I could push the envelope to implement custom paging using just SQL Server.
Suppose you have a stored procedure that does wonderful things but returns a lot of data. You want to page that data, and you want the paging to happen as far back in the data-retrieval cycle as possible. If your data is coming from a SQL Server database, you can apply the pattern described below:

To keep the paging minimally invasive to already-working stored procedures, create a new stored procedure that returns the paged record set and the total number of records. Here it is:

create proc usp_SkipTake(
 @columnDefs varchar(1000), --the column names and dataTypes in the exact same order as the stored procedure which will be paged. 
  --example: firstName varchar(50), lastname varchar(50)
 @sprocWithInputParams varchar(1000),  --this is the stored procedure which will be paged. 
 @skip int, --the number of records to skip 
 @take int, --the number of records to return
 @orderBy varchar(255)=null   --allows you to specify the sort order of the data. This is the only new feature your old stored procedure will need to implement to fit this design pattern.
 )
as Begin
 declare @p varchar(8000)
 declare @lastRecord varchar(9) 
 set @lastRecord = isnull(@skip,0) + case when isnull(@take,0)=0 then 1000000 else @take end
 set @p = 'create table #tmptbl (' + @columnDefs + ', _rowNo int identity)
  insert into #tmptbl exec ' + @sprocWithInputParams + '
  select * from #tmptbl
   where (_rowNo > ' + convert(varchar(9), @skip) + ' AND _rowNo <= ' + @lastRecord + ')
  select recordCount=count(_rowNo) from #tmptbl
  drop table #tmptbl'
 exec(@p)
End

Note that if you supply a @take value of 0, it defaults to returning the first million records. That cap is arbitrary; I assumed it would be equivalent to retrieving all records for the vast majority of use cases.
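As a quick illustration, here is how the wrapper might be called. The stored procedure name, parameter, and column list below are hypothetical; substitute your own:

```sql
--Hypothetical example: page the output of a sproc named usp_GetCustomers.
--The column definitions must match the sproc's result set, in order.
exec usp_SkipTake
 @columnDefs = 'firstName varchar(50), lastName varchar(50)',
 @sprocWithInputParams = 'usp_GetCustomers @region=''West''',
 @skip = 20,  --skip the first 20 rows
 @take = 10   --return rows 21 through 30

--Passing @take = 0 returns everything after @skip, up to the million-row cap:
exec usp_SkipTake
 @columnDefs = 'firstName varchar(50), lastName varchar(50)',
 @sprocWithInputParams = 'usp_GetCustomers @region=''West''',
 @skip = 0, @take = 0
```

Two result sets come back: the requested page of rows, then the total record count.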

A note about performance: I found that this approach will likely double the amount of time it takes to return data from the database. Coming up next, we will examine how caching can help improve performance.

With caching, the underlying stored procedure runs only when the data changes. Otherwise the data persists in a real table, and different segments of it are fetched based on the skip and take parameters.

Two new parameters are introduced to the stored procedure:

  • @cacheTableName: the table name used to cache the data. I made it so that the name must contain tblCache, as a safeguard against accidentally dropping a real table.
  • @dependencyTableName: the table which will be monitored for changes. It must contain a timestamp/record-version column named ts.
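A dependency table that qualifies might look like this sketch (the table and the columns other than ts are hypothetical; the important part is the ts column, which SQL Server bumps automatically on every insert or update):

```sql
create table tblCustomers (
 customerId int identity primary key,
 firstName varchar(50),
 lastName varchar(50),
 ts timestamp  --rowversion; maintained automatically by SQL Server
)
```

Because a deleted row leaves max(ts) unchanged, the procedure below also compares record counts to catch deletions.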

--#region create tblCacheMeta if it doesn't exist.  This table is used to track whether the cached data is still valid.
if object_id( 'tblCacheMeta' ) is null
Begin
 CREATE TABLE dbo.[tblCacheMeta](
  [cacheId] [int] IDENTITY(1,1) NOT NULL,
  [cacheTable] [nvarchar](100) NOT NULL,
  [dependencyTable] [nvarchar](1000) NOT NULL, --must have timestamp column named ts
  [columnDefs] [varchar](1000) NULL,
  [command] [varchar](1000) NULL,
  [maxTS] [int] NULL,
  [records] [int] NULL,
  lastCreateTime dateTime null,
  reasonForRebuild varchar(100)  
  CONSTRAINT [PK_tblCacheMeta] PRIMARY KEY CLUSTERED ([cacheId] ASC),
  CONSTRAINT [IX_Table_1_cacheTable] UNIQUE NONCLUSTERED  ([cacheTable] ASC)
 ) 
End
GO
create proc usp_SkipTake(
 @cacheTableName varchar(100),
 @dependencyTableName varchar(100),
 @columnDefs varchar(1000), --the column names and data types in the exact same order as they appear in the stored procedure which will be paged. 
  --example: firstName varchar(50), lastName varchar(50)
 @sprocWithInputParams varchar(1000),  --this is the stored procedure which will be paged. 
 @skip int, --the number of records to skip 
 @take int, --the number of records to return
 @orderBy varchar(255)=null   --allows you to specify the sort order of the data. This is the only new feature your old stored procedure will need to implement to fit this design pattern.
 )
as Begin
 declare @p varchar(8000)
 declare @lastRecord varchar(9) 
 set @lastRecord = isnull(@skip,0) + case when isnull(@take,0)=0 then 1000000 else @take end
 declare @errStr varchar(255) 
 if lower(@cacheTableName) not like '%tblcache%' 
 Begin
  set @errStr = 'The cache table name must contain tblCache;  
   You specified: [' + @cacheTableName + ']. 
   This is a rule of thumb to help you avoid deleting critical tables by accident!'
  raiserror(@errStr,16,1)
  return
 End
 declare @createTable bit, @cacheId int
 
 --get the cacheId from the cacheTable
 --set the temp table path to the right database.  
 set @cacheTableName = 'tempdb..' + @cacheTableName
 select @cacheId = (select cacheId from tblCacheMeta where cacheTable=@cacheTableName)
 if @cacheId is null 
 Begin 
  insert into tblCacheMeta (cacheTable, dependencyTable) values(@cacheTableName, @dependencyTableName )
  set @cacheId = SCOPE_IDENTITY()
 End
 --drop the cacheTable and recreate it if out of date (record added, modified, or deleted in the dependency table) or the column defs have changed   
  --store the current max timestamp and record count into a temp table;
  --count(*) catches deletes, which max(ts) alone would miss
  create table #tmp1 (maxTS int, records int)
  set @p = 'insert into #tmp1 (maxTS, records)
    select maxTS=max(ts+0), records=count(*) from ' + @dependencyTableName --ts+0 implicitly converts the rowversion to int
  exec(@p)        
 declare @reasonForRecalc varchar(255)
 set @reasonForRecalc = ''
 print 'object_id of [' + @cacheTableName + '] is ' + isnull(convert(varchar(99), object_id(@cacheTableName)), 'NULL')
 if object_id(@cacheTableName) is not null  Begin  
   --check if the column defs have changed
   declare @columnDefsChanged bit, @outOfDate bit
   set @columnDefsChanged = isnull((select case when [columnDefs]=@columnDefs then 0 else 1 end from tblCacheMeta where cacheId=@cacheId),0)   
    --set the out-of-date flag
    set @outOfDate = isnull((select case when cm.maxTS=m.maxTS and cm.records=m.records then 0 else 1 end from 
     #tmp1 m, tblCacheMeta cm where cm.cacheId=@cacheId), 0)  
   --print '@outOfDate: ' + convert(varchar, @outOfDate) + ', @columnDefsChanged: ' + convert(varchar, @columnDefsChanged)
   if @outOfDate=1 or @columnDefsChanged=1
   Begin
    if @columnDefsChanged=1 set @reasonForRecalc = @reasonForRecalc + ' columndefs have changed'
    if @outOfDate=1 set @reasonForRecalc = @reasonForRecalc + '; the data is out of date'
    set @p='drop table ' + @cacheTableName
    exec(@p)
    set @createTable = 1
   End
 End
 Else Begin
  set @reasonForRecalc = @reasonForRecalc + '; Cache table ' + @cacheTableName + ' does not exist so it will be created.'
  set @createTable=1 --turn on the createTable flag so that the table will be created
 End
 
 
 if @createTable=1 
 Begin
 
  --create and insert into the cacheTable
  set @p = 'create table ' + @cacheTableName  + '(' + @columnDefs  + ', _rowNo int identity)  
   insert into ' + @cacheTableName  + ' exec ' + @sprocWithInputParams   
  exec(@p) 
  
  --update the cacheMetadata table select * from tblCacheMeta 
  update tblCacheMeta set dependencyTable=@dependencyTableName, columnDefs=@columnDefs, command=@sprocWithInputParams, lastCreateTime=getDate(), maxTS=m.maxTS, records=m.records, 
  reasonForRebuild= @reasonForRecalc
  from #tmp1 m
  where cacheId=@cacheId
 End
 
 
 set @p = 'select * from ' + @cacheTableName + '
    where (_rowNo > ' + convert(varchar(9), @skip) + ' AND _rowNo <= ' + @lastRecord  + ')
   select recordCount=count(_rowNo), cacheTable=''' + @cacheTableName + ''' from ' + @cacheTableName
 exec(@p) 
 
End
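Calling the cached version looks much the same, plus the two new parameters (again, the sproc and table names are hypothetical):

```sql
exec usp_SkipTake
 @cacheTableName = 'tblCacheCustomers',  --must contain 'tblCache'
 @dependencyTableName = 'tblCustomers',  --must have a timestamp column named ts
 @columnDefs = 'firstName varchar(50), lastName varchar(50)',
 @sprocWithInputParams = 'usp_GetCustomers @region=''West''',
 @skip = 20,
 @take = 10
```

The first call builds tempdb..tblCacheCustomers; subsequent calls just page out of it until tblCustomers changes or the column definitions differ.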

Todo: Add a way of deleting stale tables from tempdb, possibly by moving the tblCacheMeta table into tempdb as well. That would let a clean-up script drop cache tables which haven't been referenced in a while.
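One possible sketch of that clean-up, assuming tblCacheMeta stays where it is and "stale" means not rebuilt in the last day (the threshold is arbitrary):

```sql
--Drop cache tables that haven't been rebuilt in over a day,
--then remove their rows from the metadata table.
declare @sql varchar(8000)
set @sql = ''
select @sql = @sql + 'drop table ' + cacheTable + ' '
 from tblCacheMeta
 where lastCreateTime < dateadd(day, -1, getdate())
  and object_id(cacheTable) is not null
exec(@sql)
delete from tblCacheMeta where lastCreateTime < dateadd(day, -1, getdate())
```

Caveat: lastCreateTime tracks rebuilds, not reads, so a heavily-read but rarely-changing cache would get dropped too; tracking a last-accessed time would be the better trigger for "hasn't been referenced in a while."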
