INTERNAL: Top command shows iowait soaring to 45% -- this is about 100x more than steady state
Anonym
FOR INTERNAL USE ONLY
The observed behavior indicates a problem with memory management by Kakadu 3.4 (and the Eastman Kodak derivative). Future Win/Unix releases of RAIS will need better load throttling mechanisms and improved memory management.
When the 200 qajpips finish displaying the first cycle of 3 images (and are starting on the 2nd cycle), RAIS load plummets from ~80% down to ~15% and iowait climbs to ~45% for many minutes. This is on the order of 100 times the ~0.3% iowait seen after RAIS reaches steady state.
Steps to reproduce the problem:
- Launch top command on Espresso
- Start X client (Exceed or Cygwin) on PC
- Launch load graph (perfmeter) on Espresso
- Use IE on PC to display admin-console for RAIS running on Espresso
- On Dell GX270 PC, launch 200 qajpips running against RAIS on Espresso -- with each qajpip configured to display three large images (ohare, sanfran, abu_4bands) 250 times.
- After the first cycle finishes (i.e. ~600 images displayed), note that the load graph plummets and that top shows the idle time and iowait time soaring.
- After the second cycle finishes (i.e. a total of ~1,200 images displayed), note that the load graph has a deep but fairly brief downward spike.
- After RAIS reaches steady state (3 or 4 display cycles, usually 30-45 minutes or so), note that ~95% of CPU is used (i.e. idle state ~5%) and iowait state is ~0.0%
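The image counts quoted in the steps above follow directly from the test configuration. A minimal sketch of that arithmetic, assuming every qajpip shows each of the three images exactly once per display cycle (the constant and function names are illustrative only and are not part of qajpip or RAIS):

```python
# Workload arithmetic for the reproduction steps (illustrative names only).
CLIENTS = 200          # qajpips launched on the PC
IMAGES_PER_CYCLE = 3   # ohare, sanfran, abu_4bands
CYCLES = 250           # display cycles configured per qajpip


def images_displayed(cycles_completed, clients=CLIENTS,
                     images_per_cycle=IMAGES_PER_CYCLE):
    """Total image displays across all clients after N completed cycles.

    Assumption: each client shows each image exactly once per cycle.
    """
    return clients * images_per_cycle * cycles_completed


if __name__ == "__main__":
    print(images_displayed(1))       # after the first cycle
    print(images_displayed(2))       # after the second cycle
    print(images_displayed(CYCLES))  # full 250-cycle run
```

Under that assumption the first cycle ends at 600 displays and the second at 1,200, matching the approximate figures in the steps; a full run would total 150,000 displays.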
The conjecture is that during the first display cycle the 200 qajpips are running in lockstep, and that for some reason this causes huge problems for RAIS at the end of the cycle. As time goes on, the load appears to smear out, meaning that each image is being accessed by ~1/3 of the qajpips at any point in time. Perhaps this causes RAIS to keep all three images cached, which in turn avoids the spike in iowait time when qajpips start a new display cycle.
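The difference between the lockstep and smeared-out cases can be illustrated with a toy model. This is only a sketch of the conjecture, not a measurement: the per-client phase offsets are hypothetical, and nothing here reflects how real qajpips drift apart or how RAIS actually caches images.

```python
IMAGES = ["ohare", "sanfran", "abu_4bands"]
NUM_CLIENTS = 200


def clients_per_image(offsets, step):
    """Count how many clients are on each image at a given time step.

    offsets[i] is a hypothetical phase offset for client i: at time `step`
    that client is viewing image number (step + offsets[i]) % len(IMAGES).
    """
    counts = {name: 0 for name in IMAGES}
    for offset in offsets:
        counts[IMAGES[(step + offset) % len(IMAGES)]] += 1
    return counts


# Lockstep: every client is in the same phase of the cycle.
lockstep = [0] * NUM_CLIENTS
# Smeared: phases are spread evenly across the three images (assumption).
smeared = [i % len(IMAGES) for i in range(NUM_CLIENTS)]

if __name__ == "__main__":
    print("lockstep:", clients_per_image(lockstep, 0))
    print("smeared: ", clients_per_image(smeared, 0))
```

In the lockstep case all 200 clients sit on one image and move to the next together, so each image falls idle for two thirds of the cycle. In the smeared case roughly a third of the clients (67, 67, and 66) are on each image at every step, which is the situation the conjecture associates with all three images staying cached.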
Note that the three images (ohare, sanfran, and abu_4bands) are relatively small (~500 Mb raw, compressed at a 10:1 ratio to yield ~50 Mb *.jp2 files). The concern is that if we work with bigger images (say 1 Gb - 8 Gb), just a handful of clients accessing the same images could trigger this same problem with runaway iowait times and a plunging load profile.
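To put the size concern in numbers, the sketch below applies the same 10:1 ratio to larger images. It assumes the 1 Gb - 8 Gb figures are raw sizes, that the ratio stays at 10:1, and that 1 Gb is treated as 1000 Mb; none of these assumptions is confirmed by the test setup, and the function name is illustrative.

```python
def compressed_size_mb(raw_mb, ratio=10.0):
    """Compressed *.jp2 size for a raw image, in the same units as raw_mb.

    Assumption: a fixed compression ratio (10:1 by default).
    """
    return raw_mb / ratio


if __name__ == "__main__":
    # Raw sizes: the current test images, then the 1 Gb and 8 Gb cases
    # (1 Gb taken as 1000 Mb for this estimate).
    for raw_mb in (500, 1000, 8000):
        print(raw_mb, "raw ->", compressed_size_mb(raw_mb), "compressed")
```

On these assumptions a single 8 Gb image compresses to about 800 Mb, more than five times the ~150 Mb combined size of the three current test images.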
Build: Unix 1.0 release candidate 4
(was also seen with "iteration 1" build)
System: Espresso running RAIS, Dell GX270 PC running 200 remote qajpips
NOTE: This techtip references CR ID 34645