End-to-end performance tuning is something you hear more and more about. I have seen many presentations about how the “modern” DBA has to be intimately concerned with all layers of the application stack and cannot just focus on the database. I’m on board with that, but the reality is, I have rarely seen it in practice. The only time DBAs, developers, and netops folks seem to venture out of their silos is during crisis meetings and in emergency-situation chatrooms.
A case in point was when I was asked to troubleshoot an invalid cursor error on a client’s database. Here’s the helpful documentation Oracle provides for this error:
oerr ora 1001
01001, 00000, "invalid cursor"
Hm. I guess that this is one of those things you’re just supposed to “know”. 🙂
Actually, invalid cursor is generally a problem with the application design. Perhaps the code is squandering resources and opening too many cursors. A common solution is to jack up the value of MAXOPENCURSORS. (Note that this is not an Oracle initialization parameter, as some people seem to think. It’s a precompiler option: you set it at precompile time, on the precompiler command line, in a configuration file, or inline in the source code, and then rebuild your application to change it. But don’t ask me about this stuff; I’m a DBA, not a developer, remember?)
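MAXOPENCURSORS lives on the client side. On the server side, the closest knob is the OPEN_CURSORS initialization parameter, which caps how many cursors a single session can hold open at once. As a rough sketch (assuming you have SELECT access on the V$ views), a query like this ranks sessions by how many cursors they currently have open, next to that limit. It won’t tell you anything about MAXOPENCURSORS itself, but it will show whether some sessions are piling up cursors:

```sql
-- Rank sessions by currently open cursors and show the instance-wide
-- OPEN_CURSORS limit alongside for comparison.
select s.sid,
       s.username,
       s.program,
       st.value           as open_cursors_now,
       to_number(p.value) as open_cursors_limit
  from v$sesstat  st
  join v$statname sn on sn.statistic# = st.statistic#
  join v$session  s  on s.sid = st.sid
 cross join (select value from v$parameter where name = 'open_cursors') p
 where sn.name = 'opened cursors current'
 order by st.value desc;
```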
Well, there was no chance of throwing this problem back on the developers and saying “it’s your problem, fix it.” The application in this system is a black box, the source code is unavailable, and the vendor who wrote the software is long gone. Must be a pretty sweet time for the developers in this shop; they get to spend their days with their feet up or playing foosball. Thus it was up to us DBA-types to come up with a solution. Where to begin?
The error arose when a user of the application tried to change her password through the web interface. Our first thought was of course to look at the full error message, which provided the line in the code where the error arose:
09/20/2009 12:24:13 => User edit failed:
ORA-01001: invalid cursor
ORA-06512: at "APP_OWNER.SECURITY_CHECK", line 634
ORA-06512: at "APP_OWNER.SECURITY_CHECK", line 105
ORA-06512: at "APP_OWNER.USER_UPDATE_OPTIONS", line 2
ORA-04088: error during execution of trigger
The top point of failure (updateUser) was in the compiled app code, and inaccessible to us. But the rest of the stack was in the server-side PL/SQL code. The first of these (SECURITY_CHECK line 634) is a SELECT using CAST and MULTISET functions. What does that tell me? Well, I know that BULK COLLECT is often preferred over MULTISET because it involves fewer context switches between the SQL and PL/SQL engines… but that would be indicative of a CPU issue, and we seem to be dealing with a cursor cache memory issue here.
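I can’t reproduce the vendor’s SECURITY_CHECK code here, but for anyone who hasn’t run into these constructs, here is a minimal SQL*Plus sketch of the two approaches side by side. The collection type USERNAME_LIST is hypothetical, created just for the demo; CAST(MULTISET(...)) needs a schema-level SQL type like this one so that the SQL engine can see it:

```sql
-- Hypothetical schema-level collection type, for illustration only.
create or replace type username_list as table of varchar2(128);
/

set serveroutput on

declare
  v_names username_list;
begin
  -- CAST/MULTISET: the subquery result is turned into a collection
  -- inside a single SQL statement.
  select cast(multiset(select username from all_users) as username_list)
    into v_names
    from dual;
  dbms_output.put_line('MULTISET rows:     ' || v_names.count);

  -- BULK COLLECT: PL/SQL fetches all rows into the collection in one call.
  select username
    bulk collect into v_names
    from all_users;
  dbms_output.put_line('BULK COLLECT rows: ' || v_names.count);
end;
/

-- Clean up the demo type.
drop type username_list;
```

Both fill the same collection; the difference the paragraph above is concerned with is where the work happens, not what ends up in the collection.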
I have the client’s DBA try to recreate the error in the QA environment, but he can’t. That’s an interesting tidbit, but it’s not definitive, because the QA and production environments are not (alas) completely identical. So there could be some database parameter setting (shared pool? PGA?) that would help with this.
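A cheap first step before touching anything is to pull the relevant settings from both environments and diff them. The parameter list below is my own pick of cursor- and memory-related settings, not something taken from the client’s system; adjust to taste:

```sql
-- Run in both QA and production, then diff the two outputs.
-- ISDEFAULT shows whether a value was explicitly set.
select name, value, isdefault
  from v$parameter
 where name in ('open_cursors',
                'session_cached_cursors',
                'shared_pool_size',
                'pga_aggregate_target',
                'sga_target',
                'memory_target')
 order by name;
```

Names that don’t exist on a given release (MEMORY_TARGET arrived in 11g) simply return no row, so the same script works on both sides.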
But rather than blindly start tweaking parameters, it’s time to look at what the user’s session is actually doing right up until the point of failure. I have my DBA buddy run the password update while I repeatedly query v$session, but it goes by too fast; I can’t pick out his session amid the regular user chatter (it’s not a terribly busy database, which is why I thought this might work). So I set up a logon trigger to capture user activity in a tracefile:
create or replace trigger enable_sql_trace after logon on database
begin
  if( dbms_session.is_role_enabled('SQL_TRACE') ) then
    execute immediate 'alter session set timed_statistics = TRUE';
    execute immediate 'alter session set max_dump_file_size = unlimited';
    execute immediate 'alter session set sql_trace = TRUE';  -- start tracing this session
  end if;
end;
Then I set up the SQL_TRACE role and assign it to the application user:
create role sql_trace;
grant sql_trace to app_user;
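One thing worth checking at this point: DBMS_SESSION.IS_ROLE_ENABLED only returns TRUE for roles that are actually enabled in the session, so the trigger will only fire its tracing branch if SQL_TRACE is a default role for the user. A quick sanity check, assuming access to the DBA views:

```sql
-- DEFAULT_ROLE should be YES for APP_USER, otherwise the role is
-- granted but not enabled at logon and the trigger does nothing.
select grantee, granted_role, default_role
  from dba_role_privs
 where granted_role = 'SQL_TRACE';
```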
The DBA runs his update, gets the error, but… no new tracefile appears in USER_DUMP_DEST. So I turn off my trigger…
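Before concluding that no tracefile was written, it’s worth confirming where the files are supposed to land. A small sketch covering both older releases and 11g onward:

```sql
-- Older releases: trace files go to USER_DUMP_DEST.
select value from v$parameter where name = 'user_dump_dest';

-- 11g and later: V$DIAG_INFO reports the trace directory and,
-- for the current session, its own trace file name.
select name, value
  from v$diag_info
 where name in ('Diag Trace', 'Default Trace File');
```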
revoke sql_trace from app_user;
alter trigger enable_sql_trace disable;
…and I go back to looking at v$session. Why was no new session created? I look at the INACTIVE sessions… say, there’s a whole bunch of APP_USER sessions here that have been logged in since… last March! That can only mean… Connection pool!
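In hindsight, this is the query I should have started with. It lists the idle APP_USER sessions, oldest logon first; a stack of sessions from the same machine and program with months-old logon times is the telltale sign of a connection pool (and explains why a logon trigger never fires for them):

```sql
-- Inactive APP_USER sessions, oldest logon first. For an INACTIVE
-- session, LAST_CALL_ET is the number of seconds since it went idle.
select sid,
       serial#,
       status,
       machine,
       program,
       to_char(logon_time, 'YYYY-MM-DD HH24:MI') as logged_on,
       round(last_call_et / 3600, 1)             as idle_hours
  from v$session
 where username = 'APP_USER'
   and status   = 'INACTIVE'
 order by logon_time;
```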
The DBA said yes, indeed, the app server runs a Resin connection pool in the middle tier. Well, case closed. We scheduled a restart of Resin, all the cursors that the app sessions had been hanging onto were freed, and the problem vanished (for the next 6 months or so, anyway).
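Before a restart wipes the evidence, a query like the following sketch would show which statements the pooled sessions are holding onto. Note that V$OPEN_CURSOR also lists cursors sitting in the session cursor cache, so treat the counts as an upper bound on what the application has explicitly left open:

```sql
-- Cursors held by APP_USER sessions, grouped by statement.
select sql_id,
       sql_text,
       count(*)            as open_count,
       count(distinct sid) as sessions
  from v$open_cursor
 where user_name = 'APP_USER'
 group by sql_id, sql_text
 order by open_count desc;
```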
My point is, I could have been a much more effective troubleshooter if I had known more about the architecture of the full application stack. Situations like this inspire me to get outside of my database box and ask more questions… but really, this kind of groping in the dark seems standard; I see it all the time.
See Note 1007395.6: Common Causes of ORA-01001 for more background on ORA-01001.