Kernel: Virtual Paging
Latest revision as of 23:55, 9 January 2026
Overview
The IRIX kernel virtual memory subsystem manages physical page allocation, deallocation, and mapping for kernel use. It features a sophisticated free page list organized by cache color buckets with separate queues for clean/stale and associated/unassociated pages, support for large pages (contiguous allocation and coalescing), NUMA-aware node-specific freelists, cache coloring and VCE avoidance, and optimizations for direct-mapped (K0/K1) vs K2 addresses. Key characteristics:
* pfdat_t structures track per-page metadata (flags, use count, hash chains, etc.).
* phead_t bucket arrays per node and page size manage free lists.
* Aggressive cache-line reuse and coloring to minimize conflicts.
* Reservation system via memory pools to prevent deadlock.
* Special handling for the R10000 speculation bug (lowmem separation).
* Poisoned-page support for ECC/uncorrectable errors.
* Kernel stack pool for fast allocation.
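The two core structures above can be sketched as follows. Field names follow the article (pfdat.h); the types, sizes, and exact layout are illustrative assumptions, not the real IRIX headers, and pfd_reusable is a hypothetical helper showing how the flag bits combine.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative subset of the P_* state bits named in the article. */
#define P_QUEUE  0x0001   /* page sits on a free list */
#define P_HASH   0x0002   /* page is on the vnode/anon hash */
#define P_ANON   0x0004   /* backed by anonymous memory */
#define P_DIRTY  0x0008   /* must be written back before reuse */
#define P_HWBAD  0x0010   /* poisoned: uncorrectable ECC error */

typedef struct pfdat {
    uint32_t      pf_flags;    /* P_* state bits */
    uint32_t      pf_use;      /* reference count */
    struct pfdat *pf_next;     /* free-list forward link */
    struct pfdat *pf_prev;     /* free-list back link */
    struct pfdat *pf_hchain;   /* hash-chain link */
    void         *pf_tag;      /* owning vnode or anon handle */
    uint64_t      pf_pageno;   /* offset within the object */
} pfdat_t;

#define PH_NLISTS 4            /* CLEAN/STALE x ASSOC/NOASSOC */

typedef struct phead {
    uint32_t ph_count;             /* pages in this color bucket */
    pfdat_t *ph_list[PH_NLISTS];   /* one queue per list kind */
} phead_t;

/* A page is eligible for immediate reuse only if it sits on a free
 * queue, holds no references, and is not hardware-poisoned. */
int pfd_reusable(const pfdat_t *pfd)
{
    return (pfd->pf_flags & P_QUEUE) != 0
        && pfd->pf_use == 0
        && (pfd->pf_flags & P_HWBAD) == 0;
}
```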
The implementation prioritizes low-latency kernel allocation with careful TLB and cache management.
Key Functions
Core Allocation
* kvpalloc / kvpalloc_node: Allocate virtual + physical pages (K2 + physical).
* kvalloc: Allocate only K2 virtual space.
* kpalloc / kpalloc_node: Map physical pages into existing K2 space.
* pagealloc / pagealloc_node / pagealloc_size: Core physical page allocator (single or large pages).
* contig_memalloc / kmem_contig_alloc: Physically contiguous allocation.
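A minimal sketch of the pagealloc() fast path implied above: index into the color bucket matching the requested cache color and pop the first page from its clean, unassociated queue, falling back round-robin across buckets. The types and the pop_page helper are hypothetical illustrations, not IRIX symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef struct page { struct page *next; unsigned pfn; } page_t;

typedef struct bucket {
    unsigned count;
    page_t  *clean;   /* CLEAN/NOASSOC queue: cheapest to hand out */
} bucket_t;

static page_t *pop_page(bucket_t *buckets, unsigned nbuckets,
                        unsigned want_color)
{
    /* Start at the preferred color; scan forward round-robin so an
     * empty bucket does not fail the allocation outright. */
    for (unsigned i = 0; i < nbuckets; i++) {
        bucket_t *b = &buckets[(want_color + i) % nbuckets];
        if (b->clean != NULL) {
            page_t *pg = b->clean;
            b->clean = pg->next;
            b->count--;
            return pg;
        }
    }
    return NULL;  /* caller must sleep or steal from another node */
}
```

The real allocator layers node selection, the CLEAN/STALE split, and reservation accounting on top of this basic pop.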
Deallocation
* kvpfree / kvpffree: Free virtual + physical (handles K0/K1/K2).
* kvfree: Free only K2 virtual space.
* pagefree / pagefree_size: Return physical page to freelist (with coalescing).
* kmem_contig_free: Free contiguous block.
Large Page Support
* lpage_alloc_contig_physmem: Allocate a large contiguous block.
* lpage_free_contig_physmem: Free a large block.
* lpage_coalesce: Background merging of adjacent free base pages.
* lpage_split: Break a large page into smaller ones.
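The alignment rule behind coalescing can be sketched as a buddy computation: two adjacent free runs of base pages may merge into the next larger page size only when they are buddies inside the same doubled-size block. buddy_pfn and merged_start are hypothetical helpers; the real routine also checks pfdat state (on a free queue, same node, same cleanliness) for every page in both runs.

```c
#include <assert.h>

/* For a free run of `npages` base pages at `pfn` (npages a power of
 * two, pfn aligned to npages), the buddy run differs in exactly one
 * PFN bit. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned long npages)
{
    return pfn ^ npages;
}

/* Start of the merged, doubled-size run covering both buddies. */
static unsigned long merged_start(unsigned long pfn, unsigned long npages)
{
    return pfn & ~(2 * npages - 1);
}
```

lpage_split is the inverse: a large page's PFN range is handed back as the appropriate number of smaller, correctly aligned runs.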
Special Cases
* page_mapin / page_mapout: Temporary mapping for copy/zero.
* page_copy / page_zero: COW and fault-time zeroing (with BTE on SN0).
* page_discard / page_error_clean: Handle ECC/poisoned pages.
* kstack_alloc / kstack_free: Kernel stack page pool.
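The mapin/copy/mapout pattern can be sketched as below. Physical memory is simulated here with a plain array; the real kernel installs temporary K2 PTEs (or uses the K0 direct mapping when the frame allows it) and must flush the TLB entry on mapout.

```c
#include <assert.h>
#include <string.h>

#define PAGESZ 4096

static unsigned char physmem[4][PAGESZ];  /* toy "physical" memory */

/* Stand-ins for the temporary-mapping pair described above. */
static void *page_mapin(unsigned pfn)  { return physmem[pfn]; }
static void  page_mapout(void *kva)    { (void)kva; /* drop TLB entry */ }

/* COW-style copy: map both frames, copy a page, tear the maps down. */
static void page_copy(unsigned src_pfn, unsigned dst_pfn)
{
    void *src = page_mapin(src_pfn);
    void *dst = page_mapin(dst_pfn);
    memcpy(dst, src, PAGESZ);
    page_mapout(src);
    page_mapout(dst);
}
```

On SN0 hardware the memcpy step is offloaded to the BTE (block transfer engine) rather than done by the CPU.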
Undocumented or IRIX-Specific Interfaces and Behaviors
Critical Structures (from pfdat.h, page.h, etc.)
pfd_t (page frame data):
* pf_flags: P_QUEUE, P_HASH, P_ANON, P_DONE, P_WAIT, P_DIRTY, P_DUMP, P_BULKDATA, P_ERROR, P_HWBAD, etc.
* pf_use: Reference count.
* pf_next / pf_prev: Free-list links.
* pf_hchain: Hash chain.
* pf_tag: Vnode or anon handle.
* pf_pageno: File offset.
phead_t (per-color bucket):
* ph_count: Number of pages.
* ph_list[PH_NLISTS]: CLEAN/STALE, ASSOC/NOASSOC (and POISONOUS on NUMA).
Node-specific: one pg_free_t per node holds the freelists, phead arrays, rotors, and counters.
Free List Organization
* Per-node, per-page-size phead arrays.
* Cache-color bucketed (pheadmask).
* Separate lists: CLEAN/STALE × ASSOC/NOASSOC (plus POISONOUS).
* Rotor for round-robin uncached allocation.
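The bucket and list selection above reduces to two small index computations, sketched here. The bit encoding of the list index (stale bit low, assoc bit high) is an assumption for illustration; only the set of four queues comes from the article.

```c
#include <assert.h>

enum { PL_CLEAN = 0, PL_STALE = 1 };    /* cache state of the page */
enum { PL_NOASSOC = 0, PL_ASSOC = 2 };  /* still tied to a vnode/anon? */

/* Which of the four ph_list queues a page belongs on. */
static int ph_list_index(int stale, int assoc)
{
    return (stale ? PL_STALE : PL_CLEAN) | (assoc ? PL_ASSOC : PL_NOASSOC);
}

/* Which color bucket a frame falls in: the low PFN bits selected by
 * pheadmask name the page's cache color. */
static unsigned color_bucket(unsigned long pfn, unsigned pheadmask)
{
    return (unsigned)(pfn & pheadmask);
}
```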
NUMA and Migration
* Node-local freelists.
* Round-robin or radial search fallback.
* Poisoned-page handling (directory clearing, discard queues).
R10000 Speculation Workaround
* Low memory (<256MB) separated.
* Special sxbrk variants (low/high memory).
* Kernel reference tracking (krpf) for DMA safety.
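The low/high split reduces to a frame-number boundary test, sketched below: frames under the 256MB line go on separate lists so allocations can be steered away from DMA-sensitive memory that a speculative R10000 load might touch. The 12-bit page shift (4KB base pages) is an assumed example geometry.

```c
#include <assert.h>

#define PAGE_SHIFT   12UL                     /* assumed 4 KB base pages */
#define LOWMEM_BYTES (256UL << 20)            /* 256 MB boundary */
#define LOWMEM_PFN   (LOWMEM_BYTES >> PAGE_SHIFT)

/* Does this frame live in the speculation-safe low region? */
static int pfn_is_lowmem(unsigned long pfn)
{
    return pfn < LOWMEM_PFN;
}
```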
Cache and TLB Optimizations
* Direct K0/K1 preferred when possible.
* VCE avoidance via color validation.
* Stale → clean promotion with selective cache flush.
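The color-validation step can be sketched as follows: on a virtually indexed cache, a physical page already resident under one virtual color must not be mapped at a conflicting color, or a virtual coherency exception (VCE) results. The 14-bit cache index (giving four page colors) is an assumed example geometry, not a statement about any particular MIPS part.

```c
#include <assert.h>

#define PAGE_SHIFT  12UL
#define CACHE_SHIFT 14UL   /* assumed virtually indexed cache span */
#define COLOR_MASK  ((1UL << (CACHE_SHIFT - PAGE_SHIFT)) - 1)

/* The virtual color: the page-number bits that index the cache. */
static unsigned vcolor(unsigned long vaddr)
{
    return (unsigned)((vaddr >> PAGE_SHIFT) & COLOR_MASK);
}

/* A new mapping is VCE-safe when its color matches the color the page
 * is already cached under (or the page is not cached at all). */
static int color_ok(unsigned long new_vaddr, unsigned cached_color)
{
    return vcolor(new_vaddr) == cached_color;
}
```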
Similarities to illumos and BSD Kernel Implementations
illumos (Solaris-derived): strong similarity.
* Page freelist with hash buckets and color awareness.
* kmem allocator for kernel objects (zones similar to IRIX zones).
* Contiguous allocation via vmem.
* Page daemon (vhand equivalent).
* NUMA support evolved differently (resource pools).
Porting: the illumos kmem/page subsystem is the closest match, but it lacks IRIX's exact phead coloring and large-page coalescing.
BSD (FreeBSD): more divergent.
* Simple page queues (free, cache, etc.).
* UMA/kmem for slabs, vm_page for physical pages.
* Contiguous allocation via contigmalloc.
* No per-color buckets or large-page splitting.
Porting: BSD is simpler; it lacks IRIX's sophisticated coloring, NUMA freelists, and coalescing.
Overall, IRIX VM is classic SVR4 with heavy MIPS/NUMA/R10000 optimizations. illumos provides the nearest modern analog; BSD is too simplified for direct mapping. For replication: preserve the pfdat flags, the phead structure, the node freelists, and the coalescing logic.