2 pages: [1] [2]
HostWeb Forums » Microsoft Databases » microsoft.public.sqlserver.relationalserver.performance » Performance impact of different approaches to querying for 20,000 rows of data

Topic: Performance impact of different approaches to querying for 20,000 rows of data

Reply | New Topic | This is SPAM | This is Offensive

Submitted: 1/10/2008 8:44:23 AM

By: Zoe Hart
I have two servers - a web server and a data server. I have a web service on
my web server that receives a request for information on anywhere from 1 to
60,000 products. I'm looking at a couple of different approaches for
querying the database for the product information on the 1 to 60,000
products.

1) Pass the product list into a SQL stored procedure as a single delimited
string and have that SQL stored procedure do a SELECT * FROM Products WHERE
Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not sure of all of the
details of how I'll get from my single delimited string to that SQL query,
but I assume I can do it.

2) Loop through the list of 1 to 60,000 products and issue 1 to 60,000 calls
to a SQL stored procedure that returns data for a single product. My client
that's issuing the requests is multi-threaded and I'm comfortable that I can
set this up so that I'm issuing the 60,000 calls 5-10 or more calls at a
time.

The client that's requesting the 1 to 60,000 products is not very time
sensitive. It's a batch product feed process that runs 1 to 4 times a day.
The database I'm querying also supports a web site that does have real-time
requirements, so I don't want to structure my query in such a way that my
web service is negatively impacting the performance of my web site.

I'm trying to understand the pros and cons of the two approaches and would
appreciate any inputs. Thoughts I have so far:

Option 1 may perform better for the product feed client because it has one
large network transaction instead of 1 to 60,000 small network transactions.
But Option 1 may put a more intense load on the SQL server for a period of
time, potentially negatively impacting the real-time performance of the web
site.

With either solution I probably want to look at ways to ensure that the
request for data for 1 to 60,000 products is done at a lower priority than
real-time requests from the web site.

Any thoughts or suggestions?

Thanks,
Zoe



Replies below ↓

Replies

Reply | New Topic | This is SPAM | This is Offensive

Submitted: 1/10/2008 11:12:25 AM

By: Andrew J. Kelly

Re: Performance impact of different approaches to querying for 20,000 rows of data

Zoe,

Have a look at this link to see how to pass in a delimited list and process
it appropriately. But I would probably create 2 or more stored procedures to
handle the different ranges of product requests so you can get a proper
query plan for each. For instance if you only had one product specified you
can easily do an index seek and get a good plan. But to retrieve 60K
products you may need to do a scan or Merge Join or even Hash Join. Due to
parameter sniffing if the first time the proc was run it had 50K products
you would get a plan for that many rows. But when you call it the next time
even with 1 product you will still get the same plan as before. So I would
have your app decide how may products there will be and call one of 2 or
more (depends on how many different query plans you may encounter) sps so
they each get their own plan. The case where they only lookup a single
product you can use a straight forward query with an =. More than 1 you need
to use dynamic sql or parse it into a table with a UDF.

http://www.sommarskog.se/arrays-in-sql.html

--
Andrew J. Kelly SQL MVP
Solid Quality Mentors


"Zoe Hart" <zoe.hart@nospam.competitive.com> wrote in message
news:eNn1t%235UIHA.1480@TK2MSFTNGP06.phx.gbl...
>I have two servers - a web server and a data server. I have a web service
>on my web server that receives a request for information on anywhere from 1
>to 60,000 products. I'm looking at a couple of different approaches for
>querying the database for the product information on the 1 to 60,000
>products.
>
> 1) Pass the product list into a SQL stored procedure as a single delimited
> string and have that SQL stored procedure do a SELECT * FROM Products
> WHERE Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not sure of all
> of the details of how I'll get from my single delimited string to that SQL
> query, but I assume I can do it.
>
> 2) Loop through the list of 1 to 60,000 products and issue 1 to 60,000
> calls to a SQL stored procedure that returns data for a single product. My
> client that's issuing the requests is multi-threaded and I'm comfortable
> that I can set this up so that I'm issuing the 60,000 calls 5-10 or more
> calls at a time.
>
> The client that's requesting the 1 to 60,000 products is not very time
> sensitive. It's a batch product feed process that runs 1 to 4 times a day.
> The database I'm querying also supports a web site that does have
> real-time requirements, so I don't want to structure my query in such a
> way that my web service is negatively impacting the performance of my web
> site.
>
> I'm trying to understand the pros and cons of the two approaches and would
> appreciate any inputs. Thoughts I have so far:
>
> Option 1 may perform better for the product feed client because it has one
> large network transaction instead of 1 to 60,000 small network
> transactions. But Option 1 may put a more intense load on the SQL server
> for a period of time, potentially negatively impacting the real-time
> performance of the web site.
>
> With either solution I probably want to look at ways to ensure that the
> request for data for 1 to 60,000 products is done at a lower priority than
> real-time requests from the web site.
>
> Any thoughts or suggestions?
>
> Thanks,
> Zoe
>


Reply | New Topic | This is SPAM | This is Offensive

Submitted: 1/10/2008 11:21:45 AM

By: Anonymous

Re: Performance impact of different approaches to querying for 20,000 rows of data

"Zoe Hart" <zoe.hart@nospam.competitive.com> wrote in message
news:eNn1t%235UIHA.1480@TK2MSFTNGP06.phx.gbl...
>I have two servers - a web server and a data server. I have a web service
>on my web server that receives a request for information on anywhere from 1
>to 60,000 products. I'm looking at a couple of different approaches for
>querying the database for the product information on the 1 to 60,000
>products.
>
> 1) Pass the product list into a SQL stored procedure as a single delimited
> string and have that SQL stored procedure do a SELECT * FROM Products
> WHERE Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not sure of all
> of the details of how I'll get from my single delimited string to that SQL
> query, but I assume I can do it.
>
> 2) Loop through the list of 1 to 60,000 products and issue 1 to 60,000
> calls to a SQL stored procedure that returns data for a single product. My
> client that's issuing the requests is multi-threaded and I'm comfortable
> that I can set this up so that I'm issuing the 60,000 calls 5-10 or more
> calls at a time.
>
> The client that's requesting the 1 to 60,000 products is not very time
> sensitive. It's a batch product feed process that runs 1 to 4 times a day.
> The database I'm querying also supports a web site that does have
> real-time requirements, so I don't want to structure my query in such a
> way that my web service is negatively impacting the performance of my web
> site.
>
> I'm trying to understand the pros and cons of the two approaches and would
> appreciate any inputs. Thoughts I have so far:
>
> Option 1 may perform better for the product feed client because it has one
> large network transaction instead of 1 to 60,000 small network
> transactions. But Option 1 may put a more intense load on the SQL server
> for a period of time, potentially negatively impacting the real-time
> performance of the web site.
>
> With either solution I probably want to look at ways to ensure that the
> request for data for 1 to 60,000 products is done at a lower priority than
> real-time requests from the web site.
>
> Any thoughts or suggestions?
>
> Thanks,
> Zoe
>


Zoe,

Maybe another alternative is to use an XML as a parameter so you can load
easily this XML in a table, add indexes or whatever you need, and do a join
with your product table. I don't recommend option 2 (60000 calls) due to
network overhead and latency. It will be A LOT slower than the one call
alternative.

--

Rubén Garrigós
Solid Quality Mentors


Reply | New Topic | This is SPAM | This is Offensive

Submitted: 1/10/2008 3:59:15 PM

By: Erland Sommarskog

Re: Performance impact of different approaches to querying for 20,000 rows of data

Zoe Hart (zoe.hart@nospam.competitive.com) writes:
> 1) Pass the product list into a SQL stored procedure as a single
> delimited string and have that SQL stored procedure do a SELECT * FROM
> Products WHERE Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not
> sure of all of the details of how I'll get from my single delimited
> string to that SQL query, but I assume I can do it.

Whatever, don't do exactly this. The time it would take to compile
that statement is amazing, particularly if you are on SQL 2000.

But there are other alternatives, as I discuss in my article
http://www.sommarskog.se/arrays-in-sql.html.

Andrew made an important point about the need for different plans due
to the number of elements in the list. I think the best approach is to
unpack the list into a temp table, as this will cause a recompile
and the actual join is likely to use the best plan.




--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Reply | New Topic | This is SPAM | This is Offensive

Submitted: 1/15/2008 7:52:25 AM

By: Zoe Hart

Re: Performance impact of different approaches to querying for 20,000 rows of data

Thanks for all the good input. Our rough plan at this point is to have a
stored procedure that accepts an XML input that contains the 1-n SKUs. That
stored procedure will create a temporary table that with columns for the SKU
and the other data we intend to look up. The stored procedure will write the
1-n SKUs to the temporary table and then use UPDATE FROM to join the
temporary table to one or more tables that contain the data we need and
update the columns in the temporary table. We'll then SELECT * from the
temporary table FOR XML to get our results. We can either go with that
result as is or map it to a format we like better in the code that calls the
stored proc.

Thanks again.
Zoe

"Zoe Hart" <zoe.hart@nospam.competitive.com> wrote in message
news:eNn1t%235UIHA.1480@TK2MSFTNGP06.phx.gbl...
>I have two servers - a web server and a data server. I have a web service
>on my web server that receives a request for information on anywhere from 1
>to 60,000 products. I'm looking at a couple of different approaches for
>querying the database for the product information on the 1 to 60,000
>products.
>
> 1) Pass the product list into a SQL stored procedure as a single delimited
> string and have that SQL stored procedure do a SELECT * FROM Products
> WHERE Sku IN ('sku1', 'sku2', 'sku3',..., 'sku60000'). I'm not sure of all
> of the details of how I'll get from my single delimited string to that SQL
> query, but I assume I can do it.
>
> 2) Loop through the list of 1 to 60,000 products and issue 1 to 60,000
> calls to a SQL stored procedure that returns data for a single product. My
> client that's issuing the requests is multi-threaded and I'm comfortable
> that I can set this up so that I'm issuing the 60,000 calls 5-10 or more
> calls at a time.
>
> The client that's requesting the 1 to 60,000 products is not very time
> sensitive. It's a batch product feed process that runs 1 to 4 times a day.
> The database I'm querying also supports a web site that does have
> real-time requirements, so I don't want to structure my query in such a
> way that my web service is negatively impacting the performance of my web
> site.
>
> I'm trying to understand the pros and cons of the two approaches and would
> appreciate any inputs. Thoughts I have so far:
>
> Option 1 may perform better for the product feed client because it has one
> large network transaction instead of 1 to 60,000 small network
> transactions. But Option 1 may put a more intense load on the SQL server
> for a period of time, potentially negatively impacting the real-time
> performance of the web site.
>
> With either solution I probably want to look at ways to ensure that the
> request for data for 1 to 60,000 products is done at a lower priority than
> real-time requests from the web site.
>
> Any thoughts or suggestions?
>
> Thanks,
> Zoe
>



Contents
Home
Forums
About Us
Contact Us
Web Hosting:
Hosting Providers
How to choose a name
What is a Hosting Provider
Hosting Types
Choosing the right plan
 
Search
 
Login to HostWeb.com
Email
Password
If you do not have an account with us yet, join now - it's FREE!