Virtualization is the big buzzword these days. I would like to describe one of the older virtualization techniques around: Virtual Memory (VM) on Solaris 10. Every 32-bit application, command, or utility (e.g. ls, vi, fmd, acroread, an Oracle process, etc.) is given 2 to the 32nd power = 4 GB of virtual memory to potentially use. 64-bit applications are given 16 exabytes (EB) of virtual memory, which is about 4 billion times more than 32-bit programs get. How can my system with only 8 GB of DRAM be running 120 processes that each have 4 GB of memory and maybe 8 more that have 16 EB? The magic is that each process is seeing virtual memory, not real or physical memory.
The kernel is the guts or core of the Solaris 10 operating system. It manages a system's resources like CPUs, memory and I/O devices. The kernel's VM subsystem manages DRAM in units called pages, which are currently 4 KB on x86 and x64 systems and 8 KB on SPARC V9 (UltraSPARC) based systems.
Each processor (core) stores the contents of memory, or the addresses of (virtual) memory (called pointers in C), in what are called registers (the fastest memory on our systems). These registers doubled in size from 32 bits to 64 bits when SPARC V9 came out in the mid to late 90's. The size of an instruction like ld or add did not change; SPARC instructions remained 32 bits. (The x86 and x64 processors have variable-length instructions.) So, because a register can hold either a 32-bit virtual address (just half of the register is used) or a 64-bit virtual address, we get the numbers 4 GB and 16 EB.
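To make the arithmetic concrete, here is a small Python sketch of the numbers above. It only restates the powers of two from the text; the variable names are mine.

```python
# Sizes of the virtual address spaces described above (binary units).
KB = 1024
GB = 1024 ** 3
EB = 1024 ** 6

space_32 = 2 ** 32   # bytes addressable with a 32-bit virtual address
space_64 = 2 ** 64   # bytes addressable with a 64-bit virtual address

print(space_32 // GB)         # 4  (GB per 32-bit process)
print(space_64 // EB)         # 16 (EB per 64-bit process)
print(space_64 // space_32)   # 4294967296, the "4 billion times more"

# Number of pages needed to cover a full 32-bit address space.
pages_sparc = space_32 // (8 * KB)   # 8 KB pages on SPARC V9
pages_x86 = space_32 // (4 * KB)     # 4 KB pages on x86/x64
print(pages_sparc, pages_x86)        # 524288 1048576
```

No process ever needs all of those pages in DRAM at once, which is why 120 such processes fit on an 8 GB machine.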
Here is a common description I have heard over the years of what virtual memory is: "it is the sum of swap space and memory". I hate this definition, which totally misses the point. A simpler, more accurate description is that it is the memory given to the process, a range of addresses from 0 to 4 GB (for 32-bit programs). These are not real DRAM addresses, hence the term virtual memory. This range of addresses is referred to as the process's address space. Some of the benefits of virtual memory are:
1. We can run (many) programs that are larger than the DRAM we have.
2. Managing DRAM is easier for the OS (it just keeps track of the pages).
3. It's cheaper, because most of the program can be out on disk.
Only the actively referenced pages are kept in DRAM. These pages are referred to as the resident set or working set. RSS in the output of prstat is the size of the process's resident set.
The virtual memory address space is broken into segments like text (code segment), data (variables), heap (dynamically allocated memory), shared libraries (shared functions), and stack (frames containing function call details like input arguments). The VM subsystem does not map any segment to address zero. On SPARC the text is the segment at the lowest address, but it does not start at address zero. This is done in order to catch a common bug in C/C++ programs called a null pointer dereference. Any reference to address zero while a thread runs causes a segmentation violation, a core dump, and the death of the process. For 32-bit x86 programs, the stack segment is the one at the lowest virtual address.
The segments are further broken into fixed-size units called pages. On a page-by-page basis we map their virtual addresses to actual physical DRAM page addresses. This is done through what are called page tables or mapping tables. Page tables, which are loaded by the kernel, are used by a piece of the processor called the Memory Management Unit (MMU) to do the virtual-to-physical address translations. MMUs in turn use a fully associative or set-associative cache called a Translation Lookaside Buffer (TLB) to speed up these translations. I plan on describing caches in a future blog. Programs get loaded into virtually contiguous pages of memory, but these pages are not necessarily physically contiguous.
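The translation step can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Solaris kernel or a real MMU stores its tables: the page_table dictionary and the physical page numbers in it are made up, and the 8 KB page size is the SPARC value mentioned earlier.

```python
PAGE_SIZE = 8 * 1024  # 8 KB pages, as on SPARC V9

# Hypothetical page table: virtual page number -> physical page number.
# Two virtually adjacent pages land on unrelated physical pages.
page_table = {
    0x10000 // PAGE_SIZE: 0x1A2,   # virtual page 8
    0x12000 // PAGE_SIZE: 0x03F,   # virtual page 9
}

def translate(vaddr):
    """Translate a virtual address to a physical one via the table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # Nothing is mapped here (for example address zero); a real
        # system would deliver a segmentation violation to the process.
        raise LookupError("no mapping for address 0x%X" % vaddr)
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x10004)))   # 0x344004
print(hex(translate(0x12010)))   # 0x7e010
```

The offset within the page passes through unchanged; only the page number is looked up. Calling translate(0) raises LookupError because page zero is deliberately left unmapped.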
Wow, I guess there is more to this than I thought! If you want more details, look at page 27 and Chapters 8-13 of Solaris Internals, 2nd Edition, by McDougall and Mauro. Anyway, here are some commands that show some of the things I have been talking about:
See if the binary executable file is 32 or 64-bit:
Show the virtual memory layout of cron's segments. Notice below that the text starts at address 0x10000 (64 KB). Notice the big gap in addresses after the heap segment, which is currently unmapped but available virtual memory. The files under the Mapped File label are each segment's backing store. The backing store for anon, heap, and stack segments is swap space. Backing store is where pages come from or go to during page-ins and page-outs. Read-only segments like text do not need to go through a page-out; the memory is simply freed if the page daemon decides to steal a text page. Notice that all of the sizes are multiples of 8 because this is a SPARC V9 machine with 8 KB pages. Notice that not quite all of the C library text is resident in DRAM and that its memory is protected read/execute. The second libc.so.1 segment is the data segment of the C library, and its memory is protected read/write/execute.
# pmap -x 239
239:    /usr/sbin/cron
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000      40      40       -       - r-x--  cron
0002A000      16      16      16       - rwx--  cron
0002E000      56      56      56       - rwx--    [ heap ]
FEF76000       8       8       -       - rwxs-    [ anon ]
...
FF180000     888     816       -       - r-x--  libc.so.1
FF26E000      32      32      32       - rwx--  libc.so.1
...
FFBFE000       8       8       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    2848    2616     256       -
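The observations about this listing can be checked with a little arithmetic. The Python sketch below uses the addresses and Kbytes values transcribed from the rows shown above; rows hidden behind the ... lines are not included, and the labels are mine.

```python
# (start address, size in Kbytes, label) for the rows shown by pmap.
mappings = [
    (0x00010000, 40, "cron text"),
    (0x0002A000, 16, "cron data"),
    (0x0002E000, 56, "heap"),
    (0xFEF76000, 8, "anon"),
    (0xFF180000, 888, "libc.so.1 text"),
    (0xFF26E000, 32, "libc.so.1 data"),
    (0xFFBFE000, 8, "stack"),
]

# Every size is a whole number of 8 KB pages.
all_page_multiples = all(kb % 8 == 0 for _, kb, _ in mappings)

# Where each mapping ends (one byte past its last address).
ends = {label: start + kb * 1024 for start, kb, label in mappings}

# The unmapped hole between the end of the heap and the next row shown.
heap_end = ends["heap"]
gap_bytes = 0xFEF76000 - heap_end

print(all_page_multiples)              # True
print(hex(heap_end))                   # 0x3c000
print(hex(ends["stack"]))              # 0xffc00000
print(gap_bytes // (1024 * 1024), "MB of unmapped addresses after the heap")
```

The heap can grow upward into that hole without the kernel having to move anything else in the address space.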
Show the sizes of text, data, and bss for a binary executable. Data contains the initialized variables, while bss contains all the uninitialized variables. Bss and heap are combined into one segment. Text, again, contains the compiled code or instructions of the program:
# /usr/ccs/bin/size /usr/sbin/cron
35523 + 15014 + 5394 = 55931
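As a rough cross-check, the byte counts from size can be rounded up to whole 8 KB pages and compared with the Kbytes column that pmap reported for cron. This Python sketch is a simplification: it ignores ELF headers and alignment, which also influence the real mapping sizes, so treat the match as illustrative rather than guaranteed.

```python
PAGE = 8192  # bytes per page on this SPARC V9 machine

text, data, bss = 35523, 15014, 5394   # from the size output above

def round_up_kb(nbytes):
    """Round a byte count up to whole pages and return it in Kbytes."""
    pages = -(-nbytes // PAGE)          # ceiling division
    return pages * PAGE // 1024

print(text + data + bss)     # 55931, the total printed by size
print(round_up_kb(text))     # 40, same as the r-x cron mapping in pmap
print(round_up_kb(data))     # 16, same as the rwx cron mapping in pmap
```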
The pstack command shows a user stack back trace (more commonly called a stack trace) of each thread of a process (in this case my shell). I will describe all the numbers in a future blog, but for now the output shows that bash started executing in the function named _start, which called main, which called reader_loop, and so on until it gave up the CPU in waitid to wait for the pstack command to exit.
What I plan on showing in my next blog is how you can use mdb to dump the physical contents of any page of a running process. I will display the contents of a shell script while it is running. This is an example that used to go over well in my Solaris 10 Internals classes.