Hi all, I have been running into the same issue for some time now and can't
come up with a valid reason why SQL Server is doing what it is doing...
In a nutshell, I have three tables:
t1 is a fact table
t2 is a time dimension table
t3 is a parameter table that gets loaded prior to the query being executed.
Simple code:
Select count(t1.stuff)
From t1
Inner join t2
On t1.date_key = t2.date_key
Inner join t3
On t2.date_number between t3.start_date and t3.end_date
The optimizer performs a full table scan on the fact table and then applies
the date range filter... though if I replace the start_date and end_date
values with hardcoded values, the optimizer uses the index and pulls only
the records needed from the fact table.
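For illustration, the hardcoded variant that does use the index looks something like this (the literal date_number values are placeholders, not from my actual data):

```sql
-- Literal range: the optimizer can compare the constants against the
-- t2.date_number histogram, so it estimates few matching rows and chooses
-- an index seek into the fact table instead of a scan.
SELECT COUNT(t1.stuff)
FROM t1
INNER JOIN t2
    ON t1.date_key = t2.date_key
WHERE t2.date_number BETWEEN 20060101 AND 20060331  -- placeholder literals
```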
Really weird...
Any thoughts?
Thanks
Eric|||Statistics for the tables/indexes are probably out of date. Update them
using the UPDATE STATISTICS command and see if that helps. Or provide an index
hint in the query to specify which index you want it to use. Like:
Select count(t1.stuff)
From t1 WITH (INDEX(index_name))
Inner join t2
On t1.date_key = t2.date_key
Inner join t3
On t2.date_number between t3.start_date and t3.end_date
(Note that the table hint goes on the table reference in the FROM clause, not
at the end of the statement.)|||Hello Green, thank you for your quick reply.
That's the odd thing... everything looks good. Here are the stats. In
addition, supplying the index hint doesn't seem to change the plan result.
DBCC SHOW_STATISTICS

Statistics for FACT INDEX
Updated              Rows      Rows Sampled  Steps  Density    Average key length
-------------------  --------  ------------  -----  ---------  ------------------
Mar  6 2006  2:40PM  71680256  304424        197    160.55275  4.0

Statistics for TIME DIMENSION INDEX
Updated              Rows  Rows Sampled  Steps  Density      Average key length
-------------------  ----  ------------  -----  -----------  ------------------
Feb 28 2006  6:46PM  7306  7306          4      1.368738E-4  4.0
DBCC SHOWCONTIG
DBCC SHOWCONTIG scanning 'Pharmacy_Data_Fact_TABLE' table...
Table: 'Pharmacy_Data_Fact_TABLE' (421576540); index ID: 4, database ID: 29
LEAF level scan performed.
- Pages Scanned........................: 155003
- Extents Scanned.......................: 24942
- Extent Switches.......................: 90108
- Avg. Pages per Extent..................: 6.2
- Scan Density [Best Count:Actual Count]......: 21.50% [19376:90109]
- Logical Scan Fragmentation ..............: 13.55%
- Extent Scan Fragmentation ...............: 11.16%
- Avg. Bytes Free per Page................: 1143.1
- Avg. Page Density (full)................: 85.88%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.
DBCC SHOWCONTIG scanning 'Time_Dimension' table...
Table: 'Time_Dimension' (597577167); index ID: 27, database ID: 29
LEAF level scan performed.
- Pages Scanned........................: 14
- Extents Scanned.......................: 2
- Extent Switches.......................: 1
- Avg. Pages per Extent..................: 7.0
- Scan Density [Best Count:Actual Count]......: 100.00% [2:2]
- Logical Scan Fragmentation ..............: 0.00%
- Extent Scan Fragmentation ...............: 0.00%
- Avg. Bytes Free per Page................: 268.1
- Avg. Page Density (full)................: 96.69%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.
"Green" wrote:
> Statistics for the tables/index are probably out of date. Update them
> using UPDATE STATISTICS command. <snip>
|||Just a guess, but I don't think the optimizer knows what is in table t3, so
it cannot make a decision based on those dates. As a result, it is unable to
use the index on the dates, since they are unknown. Again, this is just a
guess and could be totally off.
Also, I don't think the statistics on table t3 reflect the current data, but
rather the (completely unrelated) data that was present when the statistics
were generated. I am assuming that the statistics on this table are either
nonexistent or were generated when the table had no data in it. Try inserting
a row into the table and generating statistics on it, then removing that row
and running your process again. It is possible that having statistics with a
single valid row in this table will cause the optimizer to choose the correct
path.
Just a couple of ideas.
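The suggestion above can be sketched like this (t3's column list is assumed here; adjust it to the real schema):

```sql
-- Seed t3 with one representative row so its statistics are non-empty
INSERT INTO t3 (start_date, end_date) VALUES (20060101, 20060331);

-- Rebuild the statistics while the row is present
UPDATE STATISTICS t3;

-- Remove the seed row; the statistics keep the one-row picture
DELETE FROM t3 WHERE start_date = 20060101 AND end_date = 20060331;

-- Now reload t3 and run the original query as usual
```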
"Eric" <Eric@.discussions.microsoft.com> wrote in message
news:C7CA8FC0-BD0A-4779-870E-9658E4C72084@.microsoft.com...
> <snip>
|||Hi Jim,
I confirmed that the stats on t3 were up to date, then truncated the table,
reloaded it, and once again updated the stats. Unfortunately the plan result
was the same.
I assumed the optimizer knows what is in a table based on the table stats.
Now, taking what you mentioned into account, I think you may be on to
something... but the wrench is: why is it that if I use a SARG value, or
create an additional year column in t3 and make the year column in t2 equal
to the year in t3, the index gets used?
Select count(t1.stuff)
From t1
Inner join t2
On t1.date_key = t2.date_key
Inner join t3
On t2.date_number between t3.start_date and t3.end_date and
t2.year_number = t3.year_number
Thanks|||Eric,
It's impossible for the optimizer to do anything but make generic rowcount
estimates for predicates that involve comparing columns from one table
with columns from another. Here, for example, the actual limiting values
of t2.date_number are unknown until run-time, so the histogram of
t2.date_number values is basically useless for plan costing.
By adding an extra superfluous condition to your join's ON clause,
you are reducing the generic rowcount estimate used by the optimizer.
With nothing else to go on, the optimizer will assume that fewer rows
match (<this condition> AND <that condition> ) than match (<this condition> ).
Assuming that fewer rows from t2 match the condition, the estimated cost of
using the index drops, while the cost of a table scan does not, and the
change is enough to make the plan that uses an index appear cheaper.
Steve Kass
Drew University
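Steve's argument can be sketched symbolically (sA and sB below stand for whatever fixed, generic selectivity guesses the optimizer applies; their exact values don't matter for the point):

```sql
-- Neither predicate can use the t2.date_number histogram, so each gets a
-- fixed, generic selectivity guess rather than a histogram-based estimate:
--
--   ON  t2.date_number BETWEEN t3.start_date AND t3.end_date   -- guess sA
--   AND t2.year_number = t3.year_number                        -- guess sB
--
--   estimate(A)       ~ 7306 * sA
--   estimate(A AND B) ~ 7306 * sA * sB   -- strictly smaller, since sB < 1
--
-- Fewer estimated t2 rows means fewer estimated lookups into the fact
-- index, so the index plan's estimated cost drops below the table scan's.
```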
Eric wrote:
>Hi Jim,
> <snip>
|||I would expect that equality is handled very differently than BETWEEN for
optimization purposes. This makes perfect sense, since a single value is more
explicit and easier to estimate ahead of time than a range of values.
However, you did not state whether this year_number column is indexed. If
there is an index on the field, then I would expect it to make a difference,
simply because it is easier for the optimizer to know that it is targeting a
certain percentage of rows with the join. If there is no index, then Steve
Kass's explanation seems the only way to explain it.
Regardless, I find that the behavior of the optimizer never ceases to amaze
me.
"Eric" <Eric@.discussions.microsoft.com> wrote in message
news:2DC2E151-6AB0-472B-B385-3B9918F34F73@.microsoft.com...
> <snip>
|||Try running the SQL through the SQL Optimizer for Visual Studio. It will
rewrite your statement in every possible fashion to get you the best-performing
SQL. It has a free 30-day trial. You can find it at
http://www.extensibles.com/modules...=Products&op=NS
examnotes <Eric@.discussions.microsoft.com> wrote in
news:C7CA8FC0-BD0A-4779-870E-9658E4C72084@.microsoft.com:
The Relentless One
Debugging is a state of mind
http://www.extensibles.com/