Tuning query without where clause (Oracle 11g, Windows Server 2008 using Toad 11)
Tuning query without where clause [message #606262] Tue, 21 January 2014 15:54
jxh461
Messages: 185
Registered: March 2005
Senior Member
I have a question regarding tuning queries without a where clause.

Example query:

select a.col1,a.col2,a.col3,a.col4
from  table1 a
join table2 b on a.col1 = b.col2


Normally, when there is a where clause, I use the filter column to create an appropriate index.
But when there is no such clause, does it really help to create an index on the
join column? Please advise.

The explain plan on such a query would indicate a full table scan on both joined tables.

[Updated on: Tue, 21 January 2014 15:55]


Re: Tuning query without where clause [message #606264 is a reply to message #606262] Tue, 21 January 2014 16:29
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Think about the various join methods: hash, sort-merge, nested loop. Which would benefit from an index?
Re: Tuning query without where clause [message #606267 is a reply to message #606262] Tue, 21 January 2014 17:36
flyboy
Messages: 1903
Registered: November 2006
Senior Member
In this case the question is: is A.COL1 or B.COL2 (or both for 1:1 relationship) unique?

If so, there should be a primary key or unique constraint on that column, which is enforced by a (unique) index, so there is no need to create another one.

If not, meditate over the result when there are 3 rows with A.COL1=1 and 5 rows with B.COL2=1 (with different values in the other columns so you can distinguish the 15 rows in the result set), and ask whether you really want it.
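
For example (a hypothetical sketch; the table names and data below are made up purely to illustrate the row multiplication):

create table t1 (col1 number, filler varchar2(10));
create table t2 (col2 number, filler varchar2(10));

-- 3 rows with col1 = 1 and 5 rows with col2 = 1
insert into t1 select 1, 'a' || level from dual connect by level <= 3;
insert into t2 select 1, 'b' || level from dual connect by level <= 5;
commit;

select count(*)
from   t1 a
join   t2 b on a.col1 = b.col2;   -- returns 15 (3 x 5)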
Re: Tuning query without where clause [message #606273 is a reply to message #606262] Tue, 21 January 2014 21:53
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
I can think of three points of interest here.

1. 2% RULE: Indexes will not initially affect the performance of this query. This query wants all rows from both tables. Assuming that these two tables do not have lots of additional columns in them, the query plan should most likely be two table scans and a hash join. This would be the fastest query plan given that all rows are wanted from both tables and you have a simple equi-join between the tables. A NESTED LOOPS join with a supporting index on the inner table of the join would almost certainly do more work than a simple hash join in this situation. Again, this is due to the fact that all rows are wanted from both tables and the join is a simple equi-join.
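
You can confirm this with a quick look at the plan (a sketch using the original table names; the exact plan output will vary with your data and statistics):

explain plan for
select a.col1, a.col2, a.col3, a.col4
from   table1 a
join   table2 b on a.col1 = b.col2;

select * from table(dbms_xplan.display);

-- expected shape of the plan:
--   HASH JOIN
--     TABLE ACCESS FULL  TABLE1
--     TABLE ACCESS FULL  TABLE2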

2. COVERING INDEXES: If these are wide tables (lots of columns) then you could create two indexes based on the join columns and select list. Something like these indexes:

create index table1_cover_ix on table1 (col1, col2, col3, col4);   -- index names are illustrative
create index table2_cover_ix on table2 (col2);

Consider the situation where TABLE1 has an average row length of 400 bytes and the combined length of columns col1, col2, col3, col4 is 20 bytes. An index on these four columns would be 10X to 20X smaller than the original table. The same could be true for TABLE2 as well. All this really does, though, is replace the tables with two smaller indexes.

The basic idea is simple: if two objects both contain all the rows and columns needed for a query, then scanning the smaller one will be faster than scanning the bigger one. That is what you do when you create two indexes with the proper columns and column ordering: you create two smaller objects that still have all the rows and columns needed by the query. The query plan shape will be the same, two index fast full scans followed by the same hash join, but the query will go faster if the indexes are significantly smaller than the tables and the original table scans consumed a significant amount of the execution time. If most of the time is spent in the join step, then replacing table scans with index fast full scans will not make much of a difference.

Again, for this to work, each index should be much smaller than its table; as a rule of thumb, three times smaller or more, otherwise there is no substantial benefit in performance. Additionally, this is something of a desperate trick. It is easy to go too far with this technique, so don't be too quick to employ it; always consider how such indexes mix with the other indexes on the table and the other queries in your system.
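
One way to sanity-check the size difference is to compare the segments (a sketch; it assumes you own the objects and that the index names match whatever you actually created above):

select segment_name, segment_type, round(bytes/1024/1024) as mb
from   user_segments
where  segment_name in ('TABLE1', 'TABLE2', 'TABLE1_COVER_IX', 'TABLE2_COVER_IX')
order  by bytes desc;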

3. DUNSEL JOINS: Notice that the query only selects from TABLE1, not from TABLE2. If there is a parent/child relationship, supported by a required foreign key with TABLE1 as the child and TABLE2 as the parent, then Oracle 10g and 11g will remove TABLE2 from the query. If you think about it you will understand why the database can do this.

The database would turn your query:

select a.col1,a.col2,a.col3,a.col4
from  table1 a
join table2 b on a.col1 = b.col2


into this query:

select a.col1,a.col2,a.col3,a.col4
from  table1 a


Which would best be done with a table scan, or an index fast full scan if you create the proper index. Consider the situation where table1.col1 is defined as NOT NULL and there is a constraint on table1 of the form FOREIGN KEY (COL1) REFERENCES TABLE2 (COL2). Given these two conditions, what is the use of joining to table2 in the query?
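
A sketch of those two conditions (constraint names here are made up, and TABLE2.COL2 must already be declared primary key or unique for the foreign key to be possible):

alter table table2 add constraint table2_col2_uk unique (col2);

alter table table1 modify (col1 not null);

alter table table1 add constraint table1_col1_fk
  foreign key (col1) references table2 (col2);

With these constraints enabled and validated, every table1.col1 value is guaranteed to match exactly one table2.col2 value, so the join can neither add nor remove rows and the optimizer is allowed to drop it.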

This of course would be the fastest way to do this query.

I am not sure what you are looking for in your question but these three points offer something to think about.

Kevin

[Updated on: Thu, 23 January 2014 03:57]


Re: Tuning query without where clause [message #607562 is a reply to message #606273] Sun, 09 February 2014 19:58
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
What Kevin said ... except I would add (1a) A Nested Loop join may be preferable in cases where TABLE1 contains a high proportion of rows (>90%) that have no match in TABLE2 - or vice-versa. In such a case, full scans and hash joins could process a large number of rows that have no possibility of matching in the other table. These cases are rare, but they can benefit from adding an index on the join column(s).
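
If you wanted to test that scenario, a sketch might look like this (the index name is made up, and the hints are only there to force the alternative plan for comparison; check the actual plan and timings before keeping either approach):

create index table2_col2_ix on table2 (col2);

select /*+ leading(a) use_nl(b) index(b table2_col2_ix) */
       a.col1, a.col2, a.col3, a.col4
from   table1 a
join   table2 b on a.col1 = b.col2;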

Ross Leishman
Re: Tuning query without where clause [message #608506 is a reply to message #607562] Thu, 20 February 2014 06:32
msol25
Messages: 396
Registered: June 2011
Senior Member
Hi,

You can use a hint like this:


select /*+ use_hash(a b) full(a) full(b) parallel(a 5) parallel(b 5) */
       a.col1
      ,a.col2
      ,a.col3
      ,a.col4 
from   table1 a
join   table2 b on a.col1 = b.col2



Please check the execution plan after using the hint and decide whether it is the most appropriate plan. I hope your query will be faster with the given hint.




[Updated on: Thu, 20 February 2014 06:33]


Re: Tuning query without where clause [message #608514 is a reply to message #608506] Thu, 20 February 2014 06:57
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
msol25 presents an example of how to force a query plan that contains two table scans and a hash join. It would also exploit parallel query to increase the number of processes working on the query, which would also affect the size of the join pieces each process handles, potentially making them smaller and thus a better fit in memory for the hash join. Hints are good as a learning tool or a debugging tool. However, I would never use those hints to get a query like this to do what we want. Oracle as a database already provides the necessary tool for that. In this case, statistics are sufficient. Just get your statistics up-to-date with the right parameters to account for SKEW etc. and you should have little problem achieving the proper query plan for such a simple query as this one. But to restate, msol25 has the right idea in terms of plan style (table scan/hash join) (and optionally PQ).
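
For example, a sketch of refreshing statistics with histograms where the column data is skewed (the METHOD_OPT shown is just one reasonable choice, not a recommendation for every system):

begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TABLE1',
    method_opt => 'for all columns size auto',
    cascade    => true);
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'TABLE2',
    method_opt => 'for all columns size auto',
    cascade    => true);
end;
/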

Ross brings up an interesting side case. This behaviour differs on different Oracle environments, particularly on the strategic EXADATA platform. There is an optimization on EXADATA that would handle this case. On EXADATA, a bloom filter would be passed to the disk drives and they would apply it to the driving table data stream (the non-hashed, or probing, table) and effectively throw away the 90% of rows that would not join before they make their way across the network to the database server for actual joining. This means the table scan will still occur, but it will happen on the massively parallel disks of EXADATA. Since these drives would have removed non-joining rows before sending them on, those non-joining rows would not consume network bandwidth, would not consume database cache memory, and would not be part of the actual hash join on the database server.

Kevin