
Help! Trying to measure how long interrupts are shut off for.
Hi Everybody,
For a project I'm working on, I have written a kernel module that
hooks the RTC timer interrupt. The module sets up the RTC so it will
interrupt at a rate of 8Khz (or about 122 microseconds per tick).
To get a feel for how much jitter there is associated with this
interrupt, I wrote some code to measure elapsed time between
consecutive interrupts (which should obviously be around 122
microseconds). This measurement is taken using the rdtscl function.
The code keeps track of the maximum elapsed time and whenever a new
max is hit, it prints out a message reporting the new max.
When I ran my test, I was a suprised to see that my code measured a
maximum of over 2 milliseconds between consecutive interrupts.
Further testing showed that in most cases, my timer interrupt was
pretty much on time, but occasionally (like 10 times in 30 minutes) it
was getting whacked into the millisecond range. The lead me to
hypothesize that somewhere in the system, somebody is shutting off
interrupts for around 2 milliseconds, which caused my code to miss a
bunch of RTC timer interrupts.
So, to prove my hypothesis, I set out to measure how long interrupts
were being shut off for. I did this by modifying the kernel to
intercept the save_flags and restore_flags macros. The basic idea of
my modification goes like this:
When save_flags is called, I look to see if I'm going from an
Interrupts Enabled(IE) to an Interrupts Disabled (ID) state. If I am,
I grab the time using rdtscl and save it in a global variable.
When restore_flags is called, I look to see if I'm going from an ID
state to an IE state. If I am, I grab the time using rdtscl. Then I
calculate the difference between the two times. If the difference is
greater than the max difference so far, I save the new max and call
"show_stack(0)". I call show_stack so I can see where I am when the
new max takes place. The idea is that this will identify the section
of code that is shutting off interrupts for long periods of time.
Armed with that knowledge, I'm hoping I can figure out how to avoid
the long delay.
And now finally, I come to my problem. When I run my kernel module
with my hacked, interrupt-off-measuring kernel, I see results that
don't make sense. I see my kernel telling me that the maximum amount
of time that interrupts are shut off for is literally in the hundreds
of nanoseconds. However, during this time, my module is telling me
that it measured a time between consecutive interrupts of 2
milliseconds! So if nobody is shutting off interrupts for more than a
few hundred nanoseconds who the heck is holding off my timer
interrupt???
One guess is that there are other ways to shut off and turn on
interrupts. Perhaps there is code that calls cli and sti directly?
Or, could somebody be masking all interrupts but theirs? Or, does the
RTC hardware go out to lunch occasionally?
Thanks for taking the time to read this. Any thoughts or ideas would
be most welcome.
Jason