Tuesday, February 10, 2015

What Is That Light-Green Oracle Database CPU Wait Time?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.


What Really Is That Light-Green Oracle Database CPU Wait Time?



Have you ever wondered what that light-green "cpu wait time" really means in Oracle Enterprise Manager? It's what I call the "gap" time. The "gap" time is the "missing" or "leftover" time when DB Time does not equal DB CPU (foreground process CPU consumption) plus the non-idle wait time. And it happens more often than you might think.

If you have ever noticed that the database time seems too large, then you need to read this article. And, if you really want to know what the light-green "cpu wait time" in your OEM charts is, then you need to read this article. It's that good.

If you're serious about Oracle performance tuning and analysis, you'll want to know I just posted my complete 2015 public training schedule. It's on the main OraPub.com page HERE. Remember, alumni receive a 50% discount...crazy I know.

My Experiment Shows...


My experiment shows a strong relationship between the "gap" time and operating system CPU utilization. This means that a significant portion of the "gap" time is Oracle foreground processes sitting in the CPU run queue, ready to consume CPU. This CPU run queue time is not part of DB CPU, but it is part of DB Time. So, when the CPU run queue time increases, so does DB Time and so does the "gap" time. And I have the data to show it! And you can run the same experiment yourself.

Let me put this another way. Most of the DB Time "gap" is Oracle foreground processes waiting in the operating system CPU run queue so they can eventually and truly consume CPU.

This is really important: When an Oracle foreground process is not consuming CPU but is sitting in the CPU run queue, the Oracle Active Session History (ASH) facility records the session sample state as "CPU". And if the process is a foreground process (not a background process), Oracle's time model records this time as DB Time but not as DB CPU. So in both the ASH and time model cases, someone must do some math to calculate this "cpu wait time".
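Here is a rough sketch of that idea in SQL. This is just an illustrative query, not an OraPub script: ASH samples about once per second, so each foreground "ON CPU" sample approximates one second of on-CPU-or-run-queue time, and subtracting the DB CPU measured over the same interval leaves the light-green "cpu wait" portion. The 10-minute window is just an example.

-- Approximate foreground time by ASH session state over the last 10 minutes.
-- 'ON CPU' includes time truly on the CPU plus time waiting in the CPU run queue.
select session_state,
       count(*) approx_seconds
from   v$active_session_history
where  session_type = 'FOREGROUND'
and    sample_time  > systimestamp - interval '10' minute
group  by session_state;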

But that name... "cpu wait"!

CPU Wait Time Is A Lousy Name


"CPU wait time" is a lousy name. Why? Mainly because it has caused lots of confusion and speculation. The name would be more appropriately called something like, "cpu queue time." Three reasons come to mind.

First, wait time means something special to Oracle DBAs. To an Oracle DBA, anything associated with a "wait" should have a wait event name and a wait occurrence, and the time should be instrumented (i.e., measured) and recorded in the many wait interface related views, such as v$system_event or v$session.

Second, from an Oracle perspective the process is truly "on cpu" because the process is not "waiting." Remember, an Oracle session is in one of two states: CPU or WAIT. There is no third choice. So the words "CPU Wait" are really confusing.

Third, from an OS perspective or simply a non-Oracle perspective, the Oracle process is sitting in the CPU run queue.

I'm sure in some Oracle Corporation meeting the words "cpu wait" were considered a great idea, but the term has caused lots of confusion. And I'm sure it's here to stay.

What Does This "CPU WAIT" Look Like In OEM?


In OEM, the "cpu wait" is a light green color. I grabbed a publically available screenshot off the internet and posted it below. Look familiar? 


OK, so it's really easy to spot in OEM. And if you've seen it before you know EXACTLY what I'm referring to.

What Is CPU Wait Time?


First, let's review what we do know.

1. DB CPU is true Oracle foreground process CPU consumption as reported by the OS through a system call, such as getrusage.

2. CPU Wait time is derived, that is, somebody at Oracle wrote code to calculate the "cpu wait" time.

3. CPU Wait time is a lousy name because it causes lots of confusion.

4. CPU Wait time is shown in OEM as a light green color. DB CPU is shown as a dark/normal green color.

Second, I need to define what I'll call the DB Time "gap." This is not an error, and I am not implying something is wrong with database time, that it's not useful, or anything like that. All I am saying is that sometimes DB Time does not equal DB CPU plus the non-idle wait time. Let's put that in a formula:

DB Time = DB CPU + non Idle Wait Time + gap
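Here is a minimal sketch of that arithmetic in SQL, using the cumulative since-startup values (a real analysis, like the experiment below, differences two snapshots instead). The view and statistic names are standard; the query itself is only an illustration and is not part of the analysis pack.

-- DB Time, DB CPU and non-idle wait time are all reported in microseconds.
select round(db_time/1000000,1)                       db_time_secs,
       round(db_cpu/1000000,1)                        db_cpu_secs,
       round(ni_wait/1000000,1)                       non_idle_wait_secs,
       round((db_time - db_cpu - ni_wait)/1000000,1)  gap_secs
from  (select value db_time from v$sys_time_model where stat_name = 'DB time'),
      (select value db_cpu  from v$sys_time_model where stat_name = 'DB CPU'),
      (select sum(time_waited_micro) ni_wait
         from v$system_event where wait_class != 'Idle');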

Really, What Is CPU Wait Time?


Now I'm ready to answer the question, "What is CPU WAIT time?" Here is the answer stated multiple ways.

"CPU Wait" time is Oracle foreground process OS CPU run queue time.

I ran an experiment (detailed below) and as the OS CPU utilization increased, so did the DB Time "gap" implying that the gap is CPU run queue time or at least a significant part of it.

I ran an experiment and there was a strong correlation between OS CPU utilization and the DB Time "gap" implying that the gap is CPU run queue time.

I ran an experiment and using queuing theory I was able to predict the "gap" time implying that the gap is CPU run queue time. (Whoops... sorry. That's what I'll present in my next post!)

So I'm very comfortable stating that when DB Time is greater than Oracle process CPU consumption plus the non-idle wait time, it's probably the result of Oracle foreground process CPU run queue time.

Yes, there could be some math problems on Oracle's side, there could be uninstrumented time (for sure it's happened before), the operating system could be reporting bogus values or a host of other potential issues. But unless there is an obvious wrong value, I'm sticking with the experimental evidence.

Now I'm going to show the experimental "evidence," that is, that the DB Time "gap" correlates with the OS CPU utilization.

Let The Data Drive Our Understanding


You can download all the data collection scripts, raw experimental data, Mathematica notepad files, graphic files, etc HERE in a single zip file.

You should be able to run the experiment on any Linux Oracle test system. All you need is a logical IO load, and for that I used my free opload tool, which you can download HERE.

The experiment placed an increasing logical IO load on a Linux Oracle 12c system until the operating system CPU utilization exceeded 90%. The load was increased 18 times. During each of the 18 loads, I gathered 31 three-minute samples. Each sample contains busy time (v$osstat), idle time (v$osstat), logical IO (v$sysstat "session logical reads"), non-idle wait time (v$system_event where wait_class != 'Idle'), DB CPU (v$sys_time_model), background cpu time (v$sys_time_model), database time (v$sys_time_model DB time) and the sample time (dual table current_timestamp).
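The actual collection scripts are in the analysis pack, but a single sample along these lines would look roughly like the sketch below. Everything returned is a cumulative counter, so each statistic is differenced between consecutive samples before analysis.

-- One collection sample: OS busy/idle time (centiseconds), logical IO,
-- non-idle wait time, DB CPU, background cpu time and DB time (microseconds).
select (select value from v$osstat         where stat_name = 'BUSY_TIME')             busy_time_cs,
       (select value from v$osstat         where stat_name = 'IDLE_TIME')             idle_time_cs,
       (select value from v$sysstat        where name = 'session logical reads')      logical_reads,
       (select sum(time_waited_micro)
          from v$system_event              where wait_class != 'Idle')                non_idle_wait_us,
       (select value from v$sys_time_model where stat_name = 'DB CPU')                db_cpu_us,
       (select value from v$sys_time_model where stat_name = 'background cpu time')   bg_cpu_us,
       (select value from v$sys_time_model where stat_name = 'DB time')               db_time_us,
       current_timestamp                                                              sample_time
from   dual;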

The CPU utilization was calculated using the "busy idle" method that I blog about HERE. This method is detailed in my Utilization On Steroids online video seminar.

The workload is defined as the logical IOs per second, lio/s.

Below is a table summarizing the experimental data. The times shown are the averages. If you look at the actual raw experimental data contained in the analysis pack, you'll notice the data is very consistent. This is not surprising since the load I placed should produce a very consistent workload.

Do you see the gaps? Look closely at load 18. The DB Time is 8891.4 seconds. But the sum of DB CPU (996.8 seconds) and the non-idle wait time (2719.2 seconds) only equals 3716.0. Yet DB Time is 8891.4. So the "gap" is 5175.4 seconds, which is DB Time (8891.4) minus DB CPU (996.8) minus the non-idle wait time (2719.2).

Note: Loads 11 and 12 were excluded because of a problem with my data collection. Sorry.

While we can numerically see the DB Time "gap" increase as the CPU utilization increases, check out the graphic in the next section!

The Correlation Between CPU Utilization And DB Time Gap


We can numerically and visually see that as the CPU utilization increases, so does the DB Time "gap." But is there a strong mathematical correlation? To determine this, I used all the experimental samples (except loads 11 and 12). Because there were 17 different workloads and I gathered 31 samples for each, the correlation comprises something like 527 samples. Pretty good sample set I'd say.

The correlation coefficient is a strong 0.891. The strongest possible correlation is 1.0 and the weakest is 0.
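By the way, if you load the per-sample deltas into a table you can get the same number straight from SQL. This is only a sketch: the table and column names below are hypothetical, and the actual analysis was done in Mathematica.

-- Hypothetical table holding one row per three-minute sample (deltas, not raw counters).
select round(corr(cpu_util_pct, gap_secs), 3) corr_util_gap
from   gap_experiment_samples;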

Graphically, here is the scatterplot showing the relationship between the CPU utilization and the DB Time "gap."

Don't expect the DB Time "gap" and OS CPU utilization correlation to be perfect. Remember that DB Time does not include Oracle background process CPU consumption, yet it is obviously part of the OS CPU utilization.

Summary


My experiment indicated the light-green "CPU wait time" is primarily Oracle foreground process operating system CPU run queue time. This is DB Time "gap" time.

My experiment also showed the "gap" time is highly correlated with CPU utilization, which means that as the CPU utilization increases, so does the "gap" time.

If there are Oracle Database instrumentation bugs or a host of other potential problems, those will also affect the "gap" time.

If you want a more complete and detailed DB Time formula, it would be this:

DB Time = DB CPU + Non Idle Wait Time + gap time

In my next post, I'll show you how to calculate the gap time based on queuing theory!

Thanks for reading!

Craig.








Tuesday, February 3, 2015

How To Approach Different Oracle Database Performance Problems

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

How To Approach Different Oracle Database Performance Problems

Jump Start Your Oracle Database Tuning Effort


Every Oracle Database Administrator will tell you no two performance problems are the same. But a seasoned Oracle DBA recognizes there are similarities...patterns. Fast problem pattern recognition allows us to minimize diagnosis time, so we can focus on developing amazing solutions.

I tend to group Oracle performance problems into four patterns. Quickly exploring these four patterns is what this article is all about.


You Can Not Possibly List Every Problem And Solution


When I teach, some Oracle Database Administrators want me to outline every conceivable problem along with the solution. Not only is the thought of this exhausting, it's not possible. Even my Stori product uses pattern matching. One of the keys to becoming a fantastic performance analyst is the ability to quickly look at a problem and then decide which diagnosis approach is best. For example, if you don't know the problem SQL (assuming there is one), tracing is not likely to be your best approach.

The Four Oracle Database Performance Patterns


Here are the four performance patterns I tend to group problems into.

The SQL Is Known


Many times there is a well known SQL statement that is responsible for the poor performance. While I will always do a quick Oracle Time Based Analysis (see below) and verify the accused SQL, I will directly attack this problem by tuning with SQL specific diagnostic and tuning tools.
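For example, when the SQL_ID is already known, a reasonable first step (the SQL_ID below is just a placeholder) is to pull the cursor statistics and the execution plan directly:

-- Cursor-level statistics for the accused statement (times in microseconds).
select child_number, plan_hash_value, executions,
       round(elapsed_time/1000000) elapsed_secs,
       buffer_gets, disk_reads
from   v$sql
where  sql_id = 'placeholder0a';

-- And its execution plan.
select * from table(dbms_xplan.display_cursor('placeholder0a', null, 'TYPICAL'));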

But... I will also ask a senior application user if the users are using the application correctly. Sometimes new application users try to use a new application like their old application. It's like trying to drive a car by moving your feet as if you are riding a bicycle... not going to work, and it's dangerous!

Business Process Specific


I find that when the business is seriously affected by application performance issues, that's when the "limited budget" is suddenly not so limited. When managers and their businesses are affected, they want action.

When I'm approached to help solve a problem, I always ask how the business is being affected. If I keep hearing about a specific business process or application module I know two things.

First, there are many SQL statements involved. Second, the problem is bounded by a business process or application. This is when I start the diagnostic process with an Oracle Time Based Analysis approach, which will result in multiple solutions to the same problem.

As I teach in my online seminar How To Tune Oracle With An AWR Report, users feel performance through time. So, if our analysis is time based, we can create a quantitative link between our analysis and their experience. If our analysis creates solutions that reduce time, then we can expect the user experience to improve. This, combined with my "3 Circle" approach, yields spot-on solutions very quickly.

While an Oracle Time Based Analysis is amazing, because Oracle does not instrument what the CPU is actually being used for, we can't answer the question, "What's Oracle doing with all that CPU?" If you want to drill into this topic, check out my online seminar, Detailing Oracle CPU Consumption: The Missing Link.
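As a rough illustration of where a time-based analysis starts, a query like the one below (an illustration, not an OraPub script) shows where the instance's time has gone by wait class, plus DB CPU from the time model. These are since-startup totals; a real analysis works from an interval, such as two AWR snapshots.

-- Non-idle wait time by wait class, plus DB CPU, in seconds since startup.
select wait_class,
       round(sum(time_waited_micro)/1000000) seconds
from   v$system_event
where  wait_class != 'Idle'
group  by wait_class
union all
select 'CPU (DB CPU)',
       round(value/1000000)
from   v$sys_time_model
where  stat_name = 'DB CPU'
order  by 2 desc;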

It's Just Slow


How many times have I experienced this... It's Just Slow!


If what the user is attempting to explain is true, the performance issue is affecting a wide range of business processes. The problem is probably not a single issue (but could be), and clearly the key SQL is not known. Again, this is a perfect problem scenario to apply an Oracle Time Based Analysis.

The reason I say this is because an OTBA will look at the problem from multiple perspectives, categorize Oracle time and develop solutions to reduce those big categories of time. If you also do a Unit Of Work Time Based Analysis, you can even anticipate the impact of your solutions! Do an OraPub website search HERE or search my blog for UOWTBA.

Random Incident That Quickly Appears And Vanishes


This is the most difficult problem to fix. Mainly because the problem "randomly" appears and can't be duplicated. (Don't even bother calling Oracle Support to help in this situation.) Furthermore, it's too quick for an AWR report to show its activity, and you don't want to impact the production system by gathering tons of detailed performance statistics.

Even a solid Oracle Time Based Analysis will likely not help in this situation. Again, the problem is performance data collection and retention. The instrumented AWR or Statspack data does not provide enough detail. What we need is step-by-step activity... like a timeline.

Because this type of problem scares both DBAs and business managers, you will likely need to answer questions like this:

  • What is that blip all about?
  • Did this impact users?
  • Has it happened before?
  • Will it happen again?
  • What should we do about it?

The only way I know how to truly diagnose a problem like this is to do a session-level time-line analysis. Thankfully, this is possible using the Oracle Active Session History data. Both v$active_session_history and dba_hist_active_sess_history are absolutely key in solving problems like this.

ASH samples Oracle Database session activity once each second (by default). This is very different from measuring how long something takes, which is the data an AWR report is based upon. Because sampling is non-continuous, a lot of detail can be collected, stored and analyzed.
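To give you a feel for the raw material, here is a simplified sketch of pulling session-level samples out of ASH for a suspect time window. My OSM tools mentioned below do far more than this; the time window and column list are just examples.

-- One row per ASH sample in the window: who was active, on what, and blocked by whom.
select to_char(sample_time, 'HH24:MI:SS.FF3') sample_tm,
       session_id,
       session_serial#,
       program,
       sql_id,
       session_state,
       event,
       blocking_session
from   v$active_session_history
where  sample_time between timestamp '2015-01-27 15:15:00'
                       and timestamp '2015-01-27 15:35:00'
order  by sample_time, session_id;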

A time-line type of analysis is so important, I enhanced my ASH tools in my OraPub System Monitor (OSM) toolkit to provide this type of analysis. If you want to check them out, download the OSM toolkit HERE, install it and read the osm/interactive/ash-readme.txt file.

As an example, using these tools you can construct an incident time-line like this:

HH:MM:SS.FFF User/Process  Notes
------------ ------------- -----------------
15:18:28.796 suspect (837) started the massive update (see SQL below)

15:28:00.389 user (57)     application hung (row lock on TM_SHEET_LINE_EXPLOR)
15:28:30.486 user (74)     application hung (row lock on TM_SHEET_LINE_EXPLOR)
15:29:30.??? -             row locks becomes the top wait event (16 locked users)
15:29:50.749 user (83)     application hung (row lock on TM_SHEET_LINE_EXPLOR)

15:30:20.871 user (837)    suspect broke out of update (implied)
15:30:20.871 user (57)     application returned
15:30:20.871 user (74)     application returned
15:30:20.871 user (83)     application returned

15:30:30.905 smon (721)    first smon action since before 15:25:00 (os thread startup)
15:30:50.974 user (837)    first wait for undo - suspect broke out of update
15:30:50.974 -             225 active session, now top event (wait for a undo record)

15:33:41.636 smon (721)    last PQ event (PX Deq: Test for msg)
15:33:41.636 user (837)    application returned to suspect. Undo completed
15:33:51.670 smon (721)    last related event (DFS lock handle)

Without ASH, seemingly random problems would be a virtually impossible nightmare scenario for an Oracle DBA.

Summary


It's true. You need the right tool for the job. And the same is true when diagnosing Oracle Database performance. What I've done above is group probably 90% of the problems we face as Oracle DBAs into four categories. And each of these categories needs a special kind of tool and/or diagnosis method.

Once we recognize the problem pattern and get the best tool/method involved to diagnose the problem, then we will know the time spent developing amazing solutions is time well spent.

Enjoy your work!

Craig.


Monday, January 19, 2015

I Have Lots Of Oracle Database Server Power But Performance Is Slow/Bad

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

I Have Lots Of Oracle Database Server Power But Performance Is Slow/Bad


Oracle Database parallelism and serialization are what we as Oracle Database Administrators live and die for. You have a screaming fast Oracle Database system and there is lots of computing power available.

But performance is unacceptable; users are screaming, the phone is ringing, and those fancy dashboards are flashing like it's Christmastime.

What is going on?! What can I do about it?! That's what this post is about.

Learn By Doing


Back in December on the third day of my Oracle Performance Firefighting class, I had each student bring in an AWR report from one of their systems that was giving them problems. (I keep my classes small, giving each student time to do their analysis and time for us to talk about it as a class.)
Get this:
Half of the systems had a similar "problem." I think it's important every DBA understands this "problem" because it's more common than most people believe.

There was plenty of computing power and the key SQL statement they cared about was a batch job. What was the core problem? The quick answer is "serialization," that is, a lack of parallelism. Exploring this using a very large production system AWR report and coming up with solutions is what this posting is all about.

Serialization Is Death


In Oracle systems, serialization is death and parallelism is life. Follow this line: business, end user, application designer, DBA, Oracle Database Kernel Architects (or whatever their title is), OS Administrators, OS designers, CPU designers and IO subsystem designers all have something in common. They work hard to parallelize tasks. Just one example: Oracle is designed to have multiple background and foreground processes running in parallel.

But all this parallelization effort can be wasted and minimized if a process turns into a serial work stream (at any level: Oracle, OS, business, etc.). The result is "slowness" because the wall time increases.

Available Power And Slowness Equals Opportunity


When I tune Oracle Database systems, I look for opportunities. And each of my solutions will specifically target an opportunity. When I see unused power and complaints of slowness, I look for ways to increase parallelism. Why? Because having available power combined with slowness likely means a serialization limitation exists.

For sure, serialization may be necessary. Two examples come to mind: Oracle database memory serialization control (think: latch and mutex) and business rules.

But if I can find a way to increase performance by using up available power by increasing parallelism, I'll likely be able to turn a slow serialization situation into a screaming fast parallelization situation!

How To Recognize A Serially Constrained System


It's easy to recognize a serially constrained system. Ask yourself these two questions. First, is there available CPU or IO power? Second, are there complaints of application "slowness"? If the answer to both of these questions is "Yes," then there is likely a serialization issue. Furthermore, the general solution is to use the available resources to our advantage. That is, find areas to increase parallelization, which will use the available resources and improve performance.

If you have the power, use it! What are you saving it for?

(There may be a very good answer to the "saving" but I'll save that for another article.)

Can I Be Out Of CPU And Be Serially Constrained?


Yes. An Oracle Database system can be serially constrained and be out of OS resources. A great example of this is when there is a raging Oracle memory serialization issue. If you see significant Oracle latch or mutex wait time combined with a raging CPU bottleneck, you likely have a serialization issue... an Oracle Database memory structure access serialization issue.

So, while available power on a "slow" system likely means we have a serially constrained system, there are also situations in Oracle with a raging CPU bottleneck that likewise point to a serialization issue.

Find Out: Is There Available CPU Power?


Here Is A Real Life Situation. To simplify, I'm going to focus on only instance number one. Look at instance number one in the below picture.


The above AWR report snippet shows RAC node #1 OS CPU utilization at 15%. This means that over the AWR report snapshot interval, the average CPU utilization was 15%. I never initially trust an AWR report for calculated results. Plus it's good practice to do the math yourself. If you use the super fast busy-idle method I have outlined in THIS POST and detailed in my online seminar, Utilization On Steroids, the utilization calculates to 16% ( 0.5/(0.5+2.7)=0.16 ). So the AWR Report's 15% for CPU "% Busy" looks to be correct.
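If you want to run the same sanity check outside of AWR, a quick sketch against v$osstat is below. BUSY_TIME and IDLE_TIME are cumulative centiseconds, so a proper check differences two samples; this single query only gives the since-startup average.

-- Average CPU utilization since instance startup: busy / (busy + idle).
select round(busy / (busy + idle) * 100, 1) avg_cpu_util_pct
from  (select max(decode(stat_name, 'BUSY_TIME', value)) busy,
              max(decode(stat_name, 'IDLE_TIME', value)) idle
         from v$osstat
        where stat_name in ('BUSY_TIME','IDLE_TIME'));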

Clearly with an average CPU utilization of 15%, we have an opportunity to use the unused CPU power to our advantage.

Find Out: Is There Available IO Power?


I am looking for fast IO responsiveness. That is, a low response time. A great way to get a quick view of IO subsystem responsiveness is to look at the average wait time for the event, db file sequential read.

The wait event db file sequential read is the time it takes to read a single block synchronously. I like to call it a pure IO read call: a) what time is it? b) make the IO call and wait until you get the block, c) what time is it? d) calculate the delta and you have the wait time... and the IO read call response time! If you want more details, I wrote about this HERE, which includes a short video.
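A quick way to check this responsiveness outside of an AWR report is the wait interface itself. This sketch only gives the since-startup average, which is less precise than an AWR interval, but it is a useful sanity check:

-- Average single-block synchronous read time, in milliseconds, since startup.
select event,
       total_waits,
       round(time_waited_micro / total_waits / 1000, 2) avg_wait_ms
from   v$system_event
where  event = 'db file sequential read';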

For our system, let's figure out the single block IO subsystem read response time. Using the same AWR report, here is a screen shot of the Top Time Events.


Again, I'm just going to focus on the first instance. If you look closely (middle right area), you'll see that for instance number one, the average db file sequential read wait time is 2.22ms. That's fast!

There is no way a physical spinning disk is going to return a block in 2.22ms. This means that many of Oracle's single block read calls are being satisfied through some non-Oracle cache. Perhaps an OS cache or an IO subsystem cache. We can't tell, but we do know the block was NOT in Oracle's buffer cache, because a db file sequential read wait means the block was not found in Oracle's buffer cache.

A single block synchronous IO read call with an average of 2.22ms means there is available IO read capacity and probably available write capacity as well. Again, just like with the OS CPU subsystem, we have unused power that we will try and use to our advantage.

At this point, I will assume there is also plenty of memory and network capacity available. So, the bottom line is we have a "slow" system combined with available CPU and available IO power. Wow! That is a great situation to be in. I call this, "low hanging fruit."

Real Life: Looking For The "Slow" SQL


At the top of this post, I mentioned that in my Firefighting class, in each of the "serialization" cases there was a key SQL statement that was part of a larger batch process. Keep in mind that at this point in the analysis I did NOT know this. All I knew was that users were complaining and there was plenty of CPU and IO resources.

Usually, in this situation there is a relatively long running process. There could be lots of quick SQL statements involved, but usually this is not the case. And I'm hoping there is a key long running SQL statement that can be parallelized.

Long running can roughly be translated into "high elapsed time." I have written a number of articles about elapsed time (search my blog for: elapsed time) and even have a free tool you can use to gather more than simply the average elapsed time. And I have online seminars that touch on this subject: Tuning Oracle Using An AWR Report and also Using Skewed Performance Data To Your Advantage. So there are lots of useful resources on this topic.

In the AWR report, I'm going to look closely at the SQL Statistics, in particular the "SQL ordered by Elapsed Time (Global)." What I really want is the statistics only for instance one, that is, not global. But that's all I have available. Plus the DBAs will/should know if the key SQL statement(s) are run on instance one. Here's the report.


In the report above, look at the elapsed times (second column on the left). Now looking right, find the "Execs", that is, the executions column. The executions column is the number of completed executions within this AWR snapshot range. If the executions value is zero, the SQL did not complete during the snapshot interval, that is, before the ending snapshot.
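Outside of the AWR report, a rough sketch of the same hunt uses the cumulative statistics in v$sql (times in microseconds); an AWR-interval version would difference dba_hist_sqlstat between the two snapshots instead.

-- Top 10 cursors by total elapsed time since they were loaded into the shared pool.
select *
from  (select sql_id,
              executions,
              round(elapsed_time/1000000) elapsed_secs,
              round(cpu_time/1000000)     cpu_secs
         from v$sql
        order by elapsed_time desc)
where rownum <= 10;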

If you're wondering, these top elapsed time SQL statements are involved in batch processing. When I look at this, I see opportunity, fruit waiting to be harvested!

And I love this: Every DBA in the class in this situation said, "Oh! I know about this SQL. It's always causing problems." Now it's time to do something about it!

Real Life: Putting This All Together


We have identified available CPU and IO capacity. And we have identified THE high elapsed time SQL statement. While I'm a pretty laid back kind of guy, at this point I start to apply some pressure. Why? Because the users are complaining, and we have identified an opportunity, the cause of the problem and the general solutions.

There are two general solutions:

1. Do less work. You want to empty a candy dish faster? Then start with less candy in the dish! If you want a SQL statement to run faster, tune the SQL so it touches fewer blocks.

2. Do the same amount of work, but group the work and run each group at the same time. This is parallelization! This is why the total elapsed time will not decrease (it will probably increase a little) but the wall time will likely decrease... and dramatically! Here is a LINK to a posting that contains a short video demonstrating the difference between elapsed time and wall time.

How To Parallelize (in summary)


There are many different ways to parallelize. But the goal is the same: use the available resources to reduce wall time (not necessarily the elapsed time). Perhaps the application can be redesigned to run in parallel streams. That can take a very long time and be a real hassle, but in many cases it's the best long term solution.

If you are short on time, are licensed for Oracle Parallel Query and the SQL has been optimized (oh boy... how many times have we all heard that before), you likely can use Oracle PQ. And of course, even if the SQL is not optimized, you can still run PQ and performance may be fantastic.
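Just as an illustration (the table name below is made up, and the right degree depends on the CPU and IO headroom you actually have), a statement-level hint or a session-level setting are two common ways to bring PQ into play:

-- Statement-level: ask for a degree of 8 on this one query (hypothetical table).
select /*+ parallel(t, 8) */ count(*)
from   big_batch_table t;

-- Session-level: force parallel query (and allow parallel DML) for the batch session.
alter session enable parallel dml;
alter session force parallel query parallel 8;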

By the way, adding faster IO disks or more IO disks (whatever a "disk" is nowadays) will likely NOT work. Remember, the IO subsystem is performing wonderfully.

Thanks for reading and enjoy the mystery of your work!

Craig.

Monday, January 12, 2015

Do The LGWRs Always Sleep For The Full Three Seconds?

This page has been permanently moved. Please CLICK HERE to be redirected.

Thanks, Craig.

Do Oracle Database LGWRs (10g, 11g, 12c) Always Sleep For The Full Three Seconds?


Back in June I wrote about (and included a video on) the Oracle Database log writers' "3 second sleep rule." That's the rule we were all taught by our instructors when we started learning about Oracle, yet never really knew if it was true. In that post, I demonstrated that Oracle Database log writer background processes are normally put to sleep for three seconds.

In this post, I want to answer a related but different question.

Do Oracle Database log writer background processes ALWAYS sleep for the full three seconds? Our initial response would likely be, "Of course not! Because what if a foreground process commits during the three second sleep? The log writer(s) must wake up." That would make sense.

But, is this really true and what else could we learn by digging into this? I created an experiment to check this out, and that is what this post is all about.

The Experiment


In my June post I demonstrated the Three Second Rule. You will see this again below. But in this experiment we are looking for a situation where one of the 12c log writers wakes BEFORE its three second sleep completes.

You can download the experimental script I detail below HERE.

This is really tricky to demonstrate because of all the processes involved. There is the Oracle foreground process and, in 12c, there are multiple log writer background processes. Because this experiment follows a timeline, I needed to gather the process activity data and then somehow merge it all together in a way that we humans can understand.

What I did was operating system trace ( strace ) each process ( strace -p $lgwr ) with the timestamp option ( strace -p $lgwr -tt ), sending each process's output to a separate file ( strace -p $lgwr -tt -o lgwr.txt ). This was done for all four processes and, of course, I needed to start the scripts to run in the background. Shown directly below are the log writer strace details.

lgwr=`ps -eaf | grep $sid | grep lgwr | awk '{print $2}'`
lg00=`ps -eaf | grep $sid | grep lg00 | awk '{print $2}'`
lg01=`ps -eaf | grep $sid | grep lg01 | awk '{print $2}'`

echo "lgwr=$lgwr  lg00=$lg00  lg01=$lg01"

strace -p $lgwr -tt -o lgwr.str &
strace -p $lg00 -tt -o lg00.str &
strace -p $lg01 -tt -o lg01.str &

Once the log writers were being traced, I connected to sqlplus and launched the below script in the background as well.

drop table bogus;
create table bogus as select * from dba_objects where object_id in (83395,176271,176279,176280);
select * from bogus;
commit;
exec dbms_lock.sleep(2.1);

exec dbms_lock.sleep(2.2);
exec dbms_lock.sleep(2.3);
update bogus set object_name='83395' where object_id=83395;
exec dbms_lock.sleep(3.1);
update bogus set object_name='176271' where object_id=176271;
exec dbms_lock.sleep(3.2);
update bogus set object_name='176279' where object_id=176279;
exec dbms_lock.sleep(3.3);
update bogus set object_name='176280' where object_id=176280;
exec dbms_lock.sleep(3.4);
commit;
exec dbms_lock.sleep(3.5);
update bogus set object_name='89567' where object_id=89567;
exec dbms_lock.sleep(3.6);
commit;
exec dbms_lock.sleep(3.7);
exit;

Once the sqlplus session was connected,

sqlplus system/manager @/tmp/runit.bogus &
sleep 2

I grabbed its OS process id and started an OS trace on it as well:

svpr=`ps -eaf | grep -v grep | grep oracle$sid | awk '{print $2}' `
echo "svpr=$svpr"

strace -p $svpr -tt -o svpr.str &

Then I slept for 30 seconds, killed the tracing processes (not the log writers!):

sleep 30

for pid in `ps -eaf | grep -v grep | grep strace | awk '{print $2}'`
do
  echo "killing pid $pid"
  kill -2 $pid
done

Then I merged the trace files, sorted them by time, got rid of stuff in the trace files I didn't want to see and put the results into a final "clean" file.

merge=/tmp/strace.merge.bogus
rm -f $merge
for fn in lgwr lg00 lg01 svpr
do
  cat ${fn}.str | awk -v FN=$fn '{print $1 " " FN " " $2 " " $3 " " $4 " " $5 " " $6 " " $7 " " $8 " " $9}' >> $merge
done

ls -ltr $merge
date
cat $merge | sort > /tmp/final.bogus

cat /tmp/final.bogus | grep -v times | grep -v getrusage | grep -v "svpr lseek" | grep -v clock_gettime | grep -v gettimeofday | grep -v "svpr read" | grep -v "svpr write" > /tmp/final.bogus.clean

The amazing thing is... this actually worked! Here is the output below:

19:11:41.981934 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {2, 200000000}) =
19:11:42.859905 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:43.986421 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:44.186404 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {2, 300000000}) =
19:11:44.982768 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:45.860871 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:46.499014 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 100000000}) =
19:11:46.989885 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:47.983782 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:48.861837 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:49.608154 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 200000000}) =
19:11:49.993520 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:50.984737 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:51.862921 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:52.817751 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 300000000}) =
19:11:52.997116 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:53.985784 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:54.863809 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:55.998974 lgwr open("/proc/41955/stat", O_RDONLY) = 19
19:11:55.999029 lgwr read(19, "41955 (ora_pmon_prod35) S 1 4195"..., 999) =
19:11:55.999075 lgwr close(19) = 0
19:11:55.999746 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:56.127326 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 400000000}) =
19:11:56.986935 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:57.864930 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:59.003212 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:59.531161 svpr semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.531544 lgwr semctl(7503875, 18, SETVAL, 0x7fff00000001) = 0
19:11:59.532204 lg00 pwrite(256, "\1\"\0\0\311\21\0\0\354\277\0\0\20\200{\356\220\6\0\0\r\0\0\0\367^K\5\1\0\0\0"..., 2048, 2331136) = 2048
19:11:59.532317 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 480000000}) =
19:11:59.532680 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {0, 100000000}) =
19:11:59.537202 lg00 semctl(7503875, 34, SETVAL, 0x7fff00000001) = 0
19:11:59.537263 lg00 semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.537350 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:59.538483 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 470000000}) =
19:11:59.540574 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 500000000}) =
19:12:00.865928 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:02.011876 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:02.537887 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:03.050381 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 600000000}) =
19:12:03.866796 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:05.014819 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:05.538797 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:06.657075 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 700000000}) =
19:12:06.867922 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:12:08.017814 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:12:08.539750 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:12:09.868825 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0} 

There is a lot of detail in the above output. I'm only going to make a few comments that pertain to the objectives of this post.

Oracle is using the semaphore call semtimedop to sleep. The beauty of this call is it allows the process to be woken (that is, signaled) by another process! Keep that in mind as you follow the timeline.

Here we go:

19:11:41.981934. Notice the server process' "2, 2" and later the "2,3" and "3, 1" and "3, 2"? This is the result of the dbms_lock.sleep commands contained in the sqlplus script!

19:11:42.859905. Notice lg01 and the other log writer background processes always have a "3, 0" semtimedop call? That is their "3 second sleep."

Look at the first few lgwr entries. I've listed them here:

19:11:43.986421
19:11:46.989885
19:11:49.993520
19:11:52.997116

Notice anything strange about the above times? They are all just about 3 seconds apart from each other. That's the 3 second sleep in action. But that's not the focus of this post. So let's move on.

Read this slowly: I want to focus on just one part of the output, which is shown below. Notice the server process is sleeping for 3.4 seconds. If you look at the sqlplus script (near the top of this post), immediately after the 3.4 second sleep the server process issues a commit. Therefore, because the 3.4 second sleep starts at 19:11:56.1, I'm expecting to see some log writer activity 3.4 seconds later. This would be at 19:11:59.5. This could occur in the middle of a log writer's 3 second sleep, which means we will likely see a log writer kick into action before its 3 second sleep completes! Let's take a look.

19:11:56.127326 svpr semtimedop(7503875, {{34, -1, 0}}, 1, {3, 400000000}) =
19:11:56.986935 lg00 semtimedop(7503875, {{18, -1, 0}}, 1, {3, 0}) =
19:11:57.864930 lg01 semtimedop(7503875, {{19, -1, 0}}, 1, {3, 0}) =
19:11:59.003212 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {3, 0}) =
19:11:59.531161 svpr semctl(7503875, 16, SETVAL, 0x7fff00000001) = 0
19:11:59.531544 lgwr semctl(7503875, 18, SETVAL, 0x7fff00000001) = 0
19:11:59.532204 lg00 pwrite(256, "\1\"\0\0\311\21\0\0\354\277\0\0\20\200{\356\220\6\0\0\r\0\0\0\367^K\5\1\0\0\0"..., 2048, 2331136) = 2048
19:11:59.532317 lgwr semtimedop(7503875, {{16, -1, 0}}, 1, {2, 480000000}) 

We can see the server process' 3.4 second sleep starting at time 19:11:56.1, and we can see the sleep end and the server process' next command begin at the expected time of 19:11:59.5. Next in the trace file output is the result of the commit. The commit results in the waking of both the lgwr and lg00 background processes.

But notice the lgwr background process started one of its 3 second sleeps at 19:11:59.0, which means it doesn't want to wake until 19:12:02.0. But look at when the lgwr process woke up. It woke up at 19:11:59.5, which is clearly before the expected time of 19:12:02.0. What you just noticed was the lgwr background process being signaled to wake up before its three second sleep completed.

But why did the lgwr need to be woken up? Because the server process' redo must be immediately written.

But it gets even better, because the lgwr background process doesn't do the redo write! The lgwr process signals the lg00 process to do the write, which we can see occurs at time 19:11:59.5. Wow. Amazing!

What We Can Learn From This


Personally, I love these kinds of postings because we can see Oracle in action and demonstrating what we believe to be true. So what does all this actually demonstrate? Here's a list:

  1. We can see the 12c log writers involved. Not only lgwr.
  2. All log writer background processes initiate a sleep for the default of three seconds. I have seen situations where it is not three seconds, but it appears the default is three seconds.
  3. The server process signals the lgwr process to write immediately after a commit is issued.
  4. The server process signals the lgwr process to write using a semaphore.
  5. The log writers (starting in 12c) can signal each other using semaphores. We saw lgwr signal the lg00 background process to write.
  6. The server process was performing updates over a 10+ second period, yet its redo was not written to disk until it committed. This demonstrates that ALL redo is not flushed every three seconds. (This is probably not what you learned... unless you joined one of my Oracle Performance Firefighting classes.)
  7. The log writers, while normally put to sleep for three seconds, can be woken mid-sleep for an urgent task (like writing committed data to an online redo log).

I hope you enjoyed this post!

Thanks for reading,

Craig.