Monitoring Portal Server
Most applications suggest using a larger percentage of the total heap for the new generation, but in the case of Portal Server, using only one eighth the space for the young generation is appropriate, because most memory used by Portal Server is
Even with a large heap size, after a portal instance has been running under moderate load for a few days, most of the heap appears to be used because of the lazy nature of the GC. The GC performs full garbage collections until the resident set size (RSS) reaches approximately 85 percent of the total heap space; at that point the garbage collections can have a measurable impact on performance.
For example, on a 900 MHz UltraSPARCIII™, a full GC on a 2 GB heap can take over ten seconds. During that period of time, the system is unavailable to respond to web requests. During a reliability test, full GCs are clearly visible as spikes in the response time. You must understand the impact on performance and the frequency of full GCs. In production, full GCs go unnoticed most of the time, but any monitoring scripts that measure the performance of the system need to account for the possibility that a full GC might occur.
Measuring the frequency of full GCs is sometimes the only way to determine if the system has a memory leak. Conduct an analysis that shows the expected frequency (of a baseline system) and compare that to the observed rate of full GCs. To record the frequency of GCs, use the vebose:gc JVM™ parameter.
CPU Utilization
When deployed using the building module concept (as described in Chapter 5, “Creating Your Portal Design”), Portal Server has a capable, scalable CPU architecture that also degrades gracefully under high loads.
However, when monitoring a production site, track CPU utilization over time. Load usually comes in spikes and keeping ahead of spikes involves a careful assessment of availability capabilities.
Most organizations find that portal sites are “sticky” in nature. This means that site usage grows over time, even when the size of the user community is fixed, as users become more comfortable with the site. When the size of the user community also grows over time a successful portal site can see a substantial growth in the CPU requirements over a short period of time.
When monitoring a portal server’s CPU utilization, determine the average page latency during peak load and how that differs from the average latency.