OpenVMS Internals: OpenVMS Nonpaged Pool Management Changes

Originally published in Digital Systems Journal, March/April 1994.  Thought I’d dust it off and make it available again.

OpenVMS AXP V1.0 and OpenVMS VAX V6.0 introduced Auto-tuning Dynamic Memory Allocation for allocation and deallocation of nonpaged pool.  Although the internals have changed, the same calls are used to allocate and deallocate pool.  This is also one of the few times where Digital has actually reduced the number of SYSGEN parameters.

These changes not only provide for more effective use of nonpaged pool, but also make the code easier to follow.

In the pre-V6.0 nonpaged pool management, there were four sections of the nonpaged pool: Small Request Packets (SRPs), Intermediate Request Packets (IRPs), Large Request Packets (LRPs), and the variable nonpaged pool.  The SRP, IRP, and LRP lists were also known as look-aside lists, and were set up to allow for fast allocation and deallocation of discrete amounts of nonpaged pool.  If a memory allocation could not be satisfied by the look-aside lists, the memory was allocated from variable pool.

Each section of the nonpaged pool had pages reserved for it in the system page table, with the amount depending on several SYSGEN parameters:

  • SRPCOUNT, SRPCOUNTV, and SRPSIZE for the SRP look-aside list.
  • IRPCOUNT and IRPCOUNTV for the IRP look-aside list (IRPSIZE=176).
  • LRPCOUNT, LRPCOUNTV, and LRPSIZE for the LRP look-aside list.
  • NPAGEDYN, NPAGEVIR for the variable nonpaged pool region.

The new nonpaged pool consists entirely of the variable nonpaged pool. All parameters for the old look-aside lists are gone. All nonpaged pool is sized using only the SYSGEN parameters NPAGEDYN – the initial number of bytes of nonpaged pool – and NPAGEVIR – the maximum number of bytes of nonpaged pool.  The new look-aside lists are populated as needed from the variable pool and periodically returned to the variable pool from the look-aside lists.

The new methodology is more responsive to nonpaged pool usage than the old one. Rather than depending on a System Manager to size the three look-aside lists, the operating system can expand look-aside lists as needed.  Previously, when a look-aside list expanded, pages were permanently added to the list, even for a momentary peak.  Under the autotuning pool management, the expanded lists are periodically trimmed to return memory to pool for other users.

The New Autotuning Look-aside Lists

Previously, the data cells IOC$GQ_SRPIQ, IOC$GQ_IRPIQ, and IOC$GQ_LRPIQ served as the listheads for relative queues of packets.  The interlocked relative queue instructions INSQTI and REMQHI were used to insert and remove packets from these queues.

Boot-time-sized SRP, IRP, and LRP look-aside lists have been replaced by an array of 80 look-aside lists for memory allocations from 64 bytes to 5120 bytes in 64-byte increments. These lists start empty and are filled as memory allocated from variable pool is deallocated.
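The relationship between a request size and one of the 80 lists can be sketched in a few lines of C.  This is only an illustration of the arithmetic: the names lal_round_up, lal_index, and lal_bytes_for_index are hypothetical, and the zero-based index is an assumption of the sketch rather than something taken from the OpenVMS sources.

```c
#include <assert.h>
#include <stddef.h>

#define LAL_GRANULE   64u                         /* lists step in 64-byte increments */
#define LAL_COUNT     80u                         /* 80 lists: 64, 128, ..., 5120 bytes */
#define LAL_MAX_BYTES (LAL_GRANULE * LAL_COUNT)   /* 5120 */

/* Round a request up to the next multiple of 64 bytes. */
size_t lal_round_up(size_t bytes)
{
    return (bytes + LAL_GRANULE - 1) / LAL_GRANULE * LAL_GRANULE;
}

/* Zero-based list index for a request (an assumption of this sketch),
 * or -1 if the request is zero or too large for any look-aside list. */
int lal_index(size_t bytes)
{
    size_t rounded = lal_round_up(bytes);
    if (bytes == 0 || rounded > LAL_MAX_BYTES)
        return -1;
    return (int)(rounded / LAL_GRANULE) - 1;
}

/* Packet size served by a given list. */
size_t lal_bytes_for_index(int index)
{
    assert(index >= 0 && (unsigned)index < LAL_COUNT);
    return ((size_t)index + 1) * LAL_GRANULE;
}
```

Under these assumptions a 100-byte request rounds up to 128 bytes and lands on the second list, while anything over 5120 bytes has no list and must come from variable pool.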

In OpenVMS AXP, the array of listheads is pointed to by the data cell IOC$GQ_LISTHEADS.  In OpenVMS VAX V6.0, this and other data cells are local to the SYSTEM_PRIMITIVES executive image.  However, the nonpaged pool initialization places the address of these local data cells into EXE$AR_NPOOL_DATA.  The look-aside listheads can still be accessed in the System Dump Analyzer (SDA) by defining the offset in the nonpaged pool data area with:

On an AXP, use these SDA commands to examine the look-aside lists:

On a VAX, these lists are stored as relative queues and manipulated using the interlocked queue instructions REMQHI and INSQTI (see Figure 1).

 

Figure 1: VAX IOC$GQ_LISTHEADS

On the AXP, these lists are stored as NULL-terminated linked lists.  The second longword of the listhead is a sequence number.  The routines EXE$LAL_REMOVE_FIRST and EXE$LAL_INSERT_FIRST, in module LOOK_ASIDE_LIST, are used to manipulate the linked list.  Load-Locked and Store-Conditional instructions provide atomic operations on the list, avoiding the higher-overhead Privileged Architecture Library (PAL) code calls that implement REMQHI and INSQTI (see Figure 2).

Figure 2: Alpha IOC$GQ_LISTHEADS

Using REMQHI or EXE$LAL_REMOVE_FIRST allows memory to be allocated quickly from a populated look-aside list at any Interrupt Priority Level (IPL), without acquiring the POOL spinlock.
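For readers who want a feel for the AXP-style list, here is a rough sketch in portable C.  It is not the OpenVMS code.  C has no Load-Locked/Store-Conditional, so the sketch substitutes a C11 compare-and-swap on a 64-bit listhead, and it uses the sequence number to detect a listhead that changed between the read and the update.  The article does not say what the real sequence number is used for, so that role is an assumption, as are the names lal_insert_first and lal_remove_first and the use of small packet indexes in place of addresses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PACKETS 8u   /* hypothetical pool of packets, numbered 1..PACKETS */
#define NIL     0u   /* index 0 stands in for the NULL link */

/* next_link[i] holds the index of the packet that follows packet i. */
_Atomic uint32_t next_link[PACKETS + 1];

/* Listhead: low longword = first packet, high longword = sequence number. */
typedef _Atomic uint64_t listhead_t;

static uint64_t pack(uint32_t first, uint32_t seq)
{
    return ((uint64_t)seq << 32) | first;
}

/* Put a packet at the front of the list. */
void lal_insert_first(listhead_t *head, uint32_t pkt)
{
    assert(pkt != NIL && pkt <= PACKETS);
    uint64_t old = atomic_load(head);
    uint64_t desired;
    do {
        /* Link the new packet to the current first packet. */
        atomic_store(&next_link[pkt], (uint32_t)old);
        desired = pack(pkt, (uint32_t)(old >> 32) + 1);
        /* On failure, old is refreshed with the current listhead. */
    } while (!atomic_compare_exchange_weak(head, &old, desired));
}

/* Take the first packet off the list; returns NIL if the list is empty. */
uint32_t lal_remove_first(listhead_t *head)
{
    uint64_t old = atomic_load(head);
    uint64_t desired;
    do {
        uint32_t first = (uint32_t)old;
        if (first == NIL)
            return NIL;
        desired = pack(atomic_load(&next_link[first]),
                       (uint32_t)(old >> 32) + 1);
    } while (!atomic_compare_exchange_weak(head, &old, desired));
    return (uint32_t)old;
}
```

Note that a list built this way behaves last-in, first-out, whereas the VAX queues insert at the tail with INSQTI and remove from the head with REMQHI.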

A Typical Nonpaged Pool Allocation

Nonpaged pool allocation is performed by the routine EXE$ALONONPAGED.  This routine is a JSB entry point that expects the requested allocation size in Register 1 (R1).  The entry point for EXE$ALONPAGVAR still exists but also points to the routine EXE$ALONONPAGED.

EXE$ALONONPAGED takes the requested allocation size and rounds it up to the next multiple of 64.  If the resulting allocation size is less than or equal to 5120 bytes, the memory is allocated from a look-aside list.  The allocation size is divided by 64 to determine which look-aside queue will be used to allocate the memory.  An attempt is made to remove a packet from the queue, using REMQHI on VAX or EXE$LAL_REMOVE_FIRST on AXP.

If the removal is successful, the packet is returned to the caller. Otherwise, the memory is allocated from the variable pool.  Because this pool is an ordered linked list that cannot be updated with a single atomic instruction, access to it is protected by acquiring the POOL spinlock and raising IPL to IPL$_POOL.

As in previous versions of OpenVMS, if the caller is running at an IPL higher than IPL$_POOL and the look-aside list is empty, the allocation request fails immediately with an insufficient memory error, SS$_INSFMEM.  Otherwise the pool is allocated from the variable pool using the EXE$ALLOCATE routine.  The routine EXE$ALLOCATE functions as in previous versions of OpenVMS.
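The allocation path just described can be modeled in a small, single-threaded C sketch.  Everything here is a stand-in: alloc_nonpaged is a hypothetical name, the lists are plain linked lists rather than interlocked queues, aligned_alloc plays the part of EXE$ALLOCATE, and the numeric value given to IPL_POOL is a placeholder chosen only so that the comparison can be shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { GRANULE = 64, LISTS = 80, MAX_LAL = GRANULE * LISTS };
enum { IPL_POOL = 11 };   /* placeholder value for illustration only */

typedef struct packet { struct packet *next; } packet_t;
typedef enum { FROM_LOOKASIDE, FROM_VARIABLE, INSFMEM } source_t;

packet_t *lists[LISTS];   /* all look-aside lists start empty */

/* Model of the allocation decision; reports where the block came from. */
source_t alloc_nonpaged(size_t request, int ipl, void **block)
{
    assert(request > 0);
    size_t size = (request + GRANULE - 1) / GRANULE * GRANULE;
    *block = NULL;

    if (size <= MAX_LAL) {
        /* Fast path: take the first packet from the matching list. */
        packet_t **head = &lists[size / GRANULE - 1];
        if (*head != NULL) {
            *block = *head;
            *head = (*head)->next;
            return FROM_LOOKASIDE;
        }
    }

    /* Variable pool needs the POOL spinlock, so callers above
     * IPL$_POOL cannot use it and fail immediately. */
    if (ipl > IPL_POOL)
        return INSFMEM;

    *block = aligned_alloc(GRANULE, size);   /* stand-in for EXE$ALLOCATE */
    return *block != NULL ? FROM_VARIABLE : INSFMEM;
}
```

The point of the model is the ordering: the look-aside list is tried first at any IPL, and the IPL check only matters once that list turns out to be empty or the request is too large for it.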

A Typical Nonpaged Pool Deallocation

There are two JSB entry points for nonpaged pool deallocation: EXE$DEANONPAGED and EXE$DEANONPGDSIZ.  The routine EXE$DEANONPAGED uses the value stored in the IRP$W_SIZE field of the deallocated memory to determine how much memory is to be deallocated.  It extracts this field and transfers control to EXE$DEANONPGDSIZ.

In OpenVMS versions preceding V6.0, the address of the returned memory was used to determine if the memory should be returned to the SRP, IRP, LRP, or variable pool list.  Because all pool now comes from the variable pool, these checks are no longer required.

EXE$DEANONPGDSIZ works similarly to EXE$ALONONPAGED.  It rounds the deallocation size up to a multiple of 64 bytes.  It also verifies that the deallocated memory starts on a 64-byte boundary; if not, the system will BUG_CHECK.

For deallocations of 5120 bytes or less, the length of the packet is divided by 64, and the result is used as an index into the array of look-aside lists.  On the VAX, the INSQTI instruction returns the packet to the queue; on the AXP, the routine EXE$LAL_INSERT_FIRST returns it to the linked list.

If the returned memory is larger than 5120 bytes, the POOL spinlock is acquired, and the memory is returned to the variable nonpaged pool by calling EXE$DEALLOCATE.  The routine EXE$DEALLOCATE functions as in previous versions of OpenVMS.
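The deallocation side can be modeled the same way.  Again this is a single-threaded sketch under stated assumptions, not the real routines: dealloc_nonpaged and dealloc_nonpaged_sized are hypothetical names, the size member stands in for IRP$W_SIZE, an assert stands in for the bugcheck, free stands in for EXE$DEALLOCATE, and packets are inserted at the front of the list in the AXP style.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { GRANULE = 64, LISTS = 80, MAX_LAL = GRANULE * LISTS };

typedef struct packet {
    struct packet *next;
    uint16_t size;           /* stand-in for the IRP$W_SIZE field */
} packet_t;

typedef enum { TO_LOOKASIDE, TO_VARIABLE } dest_t;

packet_t *lists[LISTS];      /* all look-aside lists start empty */

/* Model of the sized entry point; reports where the block went. */
dest_t dealloc_nonpaged_sized(void *block, size_t bytes)
{
    /* The real system bugchecks on a misaligned block. */
    assert((uintptr_t)block % GRANULE == 0);
    assert(bytes > 0);
    size_t size = (bytes + GRANULE - 1) / GRANULE * GRANULE;

    if (size <= MAX_LAL) {
        packet_t *p = block;
        p->next = lists[size / GRANULE - 1];
        lists[size / GRANULE - 1] = p;
        return TO_LOOKASIDE;
    }
    free(block);   /* stand-in for EXE$DEALLOCATE under the POOL spinlock */
    return TO_VARIABLE;
}

/* Model of the unsized entry point: read the size field, then hand off. */
dest_t dealloc_nonpaged(void *block)
{
    return dealloc_nonpaged_sized(block, ((packet_t *)block)->size);
}
```

Because the list is chosen from the rounded size alone, no address-range checks are needed to decide where a packet belongs.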

The new nonpaged pool management routines provide a routine called EXE$RECLAIMLISTS to dequeue memory from the look-aside lists and return it to the variable pool.  This routine can make either an aggressive or a nonaggressive pass to return look-aside list packets to the variable pool.  This routine is called under two conditions.

During the initialization of the nonpaged pool, a Timer Queue Entry (TQE) is created to call the routine EXE$RECLAIMLISTS for a nonaggressive pass every 30 seconds.  On a nonaggressive pass, EXE$RECLAIMLISTS first acquires the POOL spinlock.  Then the listheads are scanned to find look-aside lists with more than two packets.  If a list has more than two packets, one is removed, with either REMQHI or EXE$LAL_REMOVE_FIRST, and returned to the variable pool by calling EXE$DEALLOCATE.  On the VAX, the first three look-aside lists are skipped because they account for 99 percent of the activity, especially MSCP traffic.  On the AXP, all look-aside lists are scanned for reclamation.  It is unclear why the AXP code differs on this point.

An aggressive pass scans the lists as before, but no check is made to determine how many packets are on the list.  If the list has packets, one is returned to the variable pool. An aggressive pass is made if an attempt to allocate memory from the variable pool fails in the routine EXE$ALONPAGVAR.
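The difference between the two kinds of pass comes down to one threshold, which the following C sketch tries to make concrete.  It is a model only: reclaim_lists, lal_t, and variable_pool_return are hypothetical names, each list carries an explicit packet count for simplicity, and the first_list parameter is an invention of the sketch to express the VAX behavior (start at 3) and the AXP behavior (start at 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { LISTS = 80, KEEP = 2 };   /* a nonaggressive pass leaves two packets */

typedef struct packet { struct packet *next; } packet_t;
typedef struct { packet_t *first; size_t count; } lal_t;

/* Stand-in for EXE$DEALLOCATE; a real system would merge the packet
 * back into the variable pool here. */
void variable_pool_return(packet_t *p) { (void)p; }

/* Take at most one packet from each eligible list.
 * Returns the number of packets handed back to variable pool. */
size_t reclaim_lists(lal_t *lists, bool aggressive, size_t first_list)
{
    assert(first_list <= LISTS);
    size_t threshold = aggressive ? 0 : KEEP;
    size_t reclaimed = 0;

    for (size_t i = first_list; i < LISTS; i++) {
        if (lists[i].count > threshold) {
            packet_t *p = lists[i].first;
            lists[i].first = p->next;
            lists[i].count--;
            variable_pool_return(p);
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Since each pass removes at most one packet per list, a list that ballooned during a peak shrinks gradually over successive 30-second passes rather than all at once.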

Allocation Failures

During an allocation request, an attempt is first made to satisfy it from the look-aside lists.  If the request is too large or the look-aside list is empty, an attempt is made to allocate from the variable pool by calling EXE$ALONPAGVAR.  Sometimes there is not enough nonpaged pool, or not a large enough contiguous piece, to satisfy the request.  Should this occur, EXE$ALONPAGVAR calls EXE$RECLAIMLISTS once to reclaim some of the look-aside list packets.  With luck, some of the freed entries will create enough contiguous pool to satisfy the request.

If the request is still unsatisfied, an attempt is made to expand pool by calling EXE$EXTENDPOOL.  If the pool cannot be expanded because it has already grown to NPAGEVIR, EXE$ALONPAGVAR makes an attempt at freeing sufficient pool by calling the routine EXE$FLUSHLISTS.

EXE$FLUSHLISTS is a “last gasp effort” routine used by the nonpaged pool allocation to satisfy an allocation request.  EXE$FLUSHLISTS starts at the first look-aside listhead, removes each packet, and returns it to the variable pool through EXE$DEALLOCATE until a large enough block is found to satisfy the request.

If there still is not a large enough section of pool to satisfy the allocation request, EXE$ALONPAGVAR returns an insufficient memory error – SS$_INSFMEM.
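The order in which these fallbacks are tried can be summarized with a deliberately crude numeric model in C.  It tracks nothing but the size of the largest contiguous free block and assumes each step simply enlarges that block by some amount, which is a big simplification of real pool fragmentation.  The names pool_model_t and alloc_variable are hypothetical, and falling through to the flush step when an extension turns out to be too small is an assumption of the sketch; the text above only describes the flush for the case where the pool cannot be extended at all.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t largest_free;   /* largest contiguous free block right now */
    size_t reclaim_gain;   /* growth from one aggressive reclaim pass */
    size_t extend_gain;    /* growth from extending pool; 0 once at NPAGEVIR */
    size_t flush_gain;     /* growth from flushing every look-aside list */
} pool_model_t;

typedef enum {
    OK_DIRECT, OK_AFTER_RECLAIM, OK_AFTER_EXTEND, OK_AFTER_FLUSH, FAIL_INSFMEM
} outcome_t;

/* Walk the fallback steps in order and report which one succeeded. */
outcome_t alloc_variable(pool_model_t *m, size_t size)
{
    assert(size > 0);
    if (m->largest_free >= size)
        return OK_DIRECT;

    m->largest_free += m->reclaim_gain;    /* models EXE$RECLAIMLISTS, once */
    m->reclaim_gain = 0;
    if (m->largest_free >= size)
        return OK_AFTER_RECLAIM;

    if (m->extend_gain > 0) {              /* models EXE$EXTENDPOOL */
        m->largest_free += m->extend_gain;
        m->extend_gain = 0;
        if (m->largest_free >= size)
            return OK_AFTER_EXTEND;
    }

    m->largest_free += m->flush_gain;      /* models EXE$FLUSHLISTS */
    m->flush_gain = 0;
    return m->largest_free >= size ? OK_AFTER_FLUSH : FAIL_INSFMEM;
}
```

The model is useful mainly as a checklist: cheap steps come first, and SS$_INSFMEM is returned only after every source of pool has been tried.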

A Better Nonpaged Pool

The new strategy provides some direct and indirect benefits to pool management.  Fragmentation is reduced by allocating in 64-byte increments instead of 16-byte increments.  In addition, once pool of a selected size is allocated, it can be reused more frequently because it is easily found again in the look-aside list after deallocation.

The new strategy is more efficient at dealing with a changing workload.  Statically sized look-aside lists are unable to migrate pool between look-aside lists as needed.  The system “tuned” itself for dealing with the worst case.  By occasionally trimming the look-aside lists and returning memory to the variable pool, the system better handles workload changes that affect usage of pool.

More nonpaged pool will be allocated from look-aside lists than before.  Look-aside lists can be allocated from and deallocated to without added synchronization, requiring less use of the POOL spinlock. Less scanning of variable pool is done, further improving performance.

Last, there are only two SYSGEN parameters left that size nonpaged pool: NPAGEDYN and NPAGEVIR.  Making eight SYSGEN parameters obsolete is certainly a welcome change to this System Manager.