| | 64 | substitute the slower (but much larger) hard disk for RAM when things get tight. A common principle here is to eject the LRU (Least Recently Used) block of RAM, writing it out to a special place on disk (the "swap file"), and loading in new data requested by an active process. |
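To make the LRU idea concrete, here is a toy sketch in Python. It is not how any real kernel is implemented (real kernels typically use cheaper approximations of LRU); the class name `LRUMemory` and its `ram`/`swap` attributes are made up for illustration. It just shows the principle: when RAM is full, the page that was touched longest ago gets pushed out to "swap".

```python
from collections import OrderedDict


class LRUMemory:
    """Toy model: a fixed number of RAM slots with LRU eviction to 'swap'."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.ram = OrderedDict()  # page id -> data, least recently used first
        self.swap = {}            # stands in for the swap file on disk

    def access(self, page, data=None):
        if page in self.ram:
            # Already in RAM: just mark it as most recently used.
            self.ram.move_to_end(page)
            return self.ram[page]
        # Not in RAM: bring it back from swap if it was evicted earlier,
        # otherwise treat `data` as brand-new content for this page.
        value = self.swap.pop(page, data)
        if len(self.ram) >= self.capacity:
            # RAM is full: evict the least recently used page to swap.
            victim, victim_data = self.ram.popitem(last=False)
            self.swap[victim] = victim_data
        self.ram[page] = value
        return value


mem = LRUMemory(capacity=2)
mem.access("a", 1)
mem.access("b", 2)
mem.access("a")      # "a" is now more recently used than "b"
mem.access("c", 3)   # RAM is full, so "b" is swapped out
```

After the last call, `mem.ram` holds pages `a` and `c`, and `mem.swap` holds `b`. Touching `b` again would swap it back in and push `a` out.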
| | 65 | |
| | 66 | Symptoms that might mean you've got a RAM bottleneck: |
| | 67 | |
| | 68 | * Frequent, heavy disk activity when you're not trying to write out or copy large files usually means that you're swapping. Since you only swap when you've run out of RAM, that's a bad sign. If you're lucky enough to have a machine with a functional disk activity light, keep an eye on it. If you don't have one, listen: unless you've got a solid-state disk (as of 2008, if you aren't sure whether you have a solid-state disk, you ''probably don't have one''), disk activity like this is actually audible as whirring and clicking. |
| | 69 | * Applications shutting down without warning could mean hitting a hard wall on RAM. If the computer has ''X'' amount of RAM, and you've instructed the operating system to set aside ''Y'' amount of swap space, and then you ask the computer to do tasks that consume more than ''X + Y'' memory in aggregate, your computer has to decide what to do: it's probably going to start by killing off some of the more offensive memory hogs to get the system back into a normal state. On Linux-based systems, this job is performed by a kernel subsystem called the [http://linux-mm.org/OOM_Killer oom-killer] (out of memory killer). It's kind of a black art and you really don't want to get to the point where you need it. |
| | 70 | |
| | 71 | Here's `vmstat` of a computer that has exhausted its RAM and is chewing into swap: |
| | 72 | {{{ |
| | 73 | [0 dkg@monkey ~]$ vmstat 1 5 |
| | 74 | procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- |
| | 75 | r b swpd free buff cache si so bi bo in cs us sy id wa |
| | 76 | 2 0 131548 5880 4068 121408 1 1 19 48 171 290 18 2 79 1 |
| | 77 | 1 1 131628 4920 4448 84212 0 80 12 80 116 313 2 16 75 7 |
| | 78 | 6 4 143132 5124 256 50252 0 11496 0 11648 615 1296 1 69 0 30 |
| | 79 | 2 1 158556 4856 172 43008 96 15404 100 15404 745 721 3 70 0 27 |
| | 80 | 7 7 181712 4732 192 38280 528 22192 556 22276 983 954 4 84 0 12 |
| | 81 | [0 dkg@monkey ~]$ |
| | 82 | }}} |
| | 83 | |
| | 84 | Things to note: |
| | 85 | |
| | 86 | * the `swap` columnset is active in both `si` (swap in, meaning bringing a page of RAM back in from disk) and `so` (swap out, meaning writing a page of RAM out to disk) |
| | 87 | * the `cpu` spends a good chunk of its time in the `wa` (I/O wait) state, waiting for the swap to take effect |
| | 88 | * the amount of memory swapped out to disk (`swpd`) is increasing |
| | 89 | * the amount of `memory` allocated to buffers (`buff`) and `cache` drops precipitously -- buffering and caching are two performance-optimizing ways that the kernel makes use of memory that is otherwise unallocated. They speed up your use of the machine without you explicitly asking for anything, but they are not strictly required to make the computer work. So when memory gets tight, the kernel reclaims the RAM it was using for buffers and caches to try to accommodate the new requirements coming from the user. |
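If you'd rather not eyeball the columns, the same checks can be sketched in a few lines of Python. This is only an illustration: the function names `parse_vmstat` and `swap_report` are made up, the sample is the `vmstat` output shown above, and the `so_threshold` of 1000 is an arbitrary cutoff I picked for this example, not a recommended value.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0 131548   5880   4068 121408    1    1    19    48  171  290 18  2 79  1
 1  1 131628   4920   4448  84212    0   80    12    80  116  313  2 16 75  7
 6  4 143132   5124    256  50252    0 11496     0 11648  615 1296  1 69  0 30
 2  1 158556   4856    172  43008   96 15404   100 15404  745  721  3 70  0 27
 7  7 181712   4732    192  38280  528 22192   556 22276  983  954  4 84  0 12
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column name: int} dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[1]  # second line holds the column names
    return [dict(zip(header, map(int, row))) for row in lines[2:]]


def swap_report(rows, so_threshold=1000):
    """Summarize the swap-related signs discussed above (threshold is arbitrary)."""
    swpd = [row["swpd"] for row in rows]
    return {
        "swpd_growing": all(a < b for a, b in zip(swpd, swpd[1:])),
        "heavy_swap_out_samples": sum(1 for row in rows if row["so"] > so_threshold),
        "max_io_wait": max(row["wa"] for row in rows),
    }


print(swap_report(parse_vmstat(SAMPLE)))
# {'swpd_growing': True, 'heavy_swap_out_samples': 3, 'max_io_wait': 30}
```

Because the parser keys everything off the header line, it should cope with `vmstat` versions that print extra columns, but treat that as an assumption to check against your own output.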
| | 92 | |
| | 93 | There are really two kinds of resources you can exhaust related to disks, but only one of them typically results in the sluggishness this page attempts to diagnose. I'll get the other one out of the way first: |
| | 94 | |
| | 95 | === Disk Space (capacity) === |
| | 96 | |
| | 97 | This is the form of disk resource people are most used to seeing exhausted. You get messages like "cannot save file, disk is full" from your programs, or you get weird misbehaviors or system failures -- services being unable to log, mail transfer agents bouncing mail, etc. |
| | 98 | |
| | 99 | The quickest way on a reasonable system to get a sense of how full your disks are is just `df`. Using the `-h` flag shows you the numbers in "human-readable" format: |
| | 100 | |
| | 101 | {{{ |
| | 102 | 0 ape:~# df -h |
| | 103 | Filesystem Size Used Avail Use% Mounted on |
| | 104 | /dev/mapper/vg_ape0-root |
| | 105 | 1008M 802M 156M 84% / |
| | 106 | tmpfs 64M 0 64M 0% /lib/init/rw |
| | 107 | udev 10M 44K 10M 1% /dev |
| | 108 | tmpfs 64M 0 64M 0% /dev/shm |
| | 109 | /dev/sda1 228M 139M 78M 65% /boot |
| | 110 | 0 ape:~# |
| | 111 | }}} |
| | 112 | You see here that `ape` only has 156MB available on its root filesystem. This is pretty tight: any time you get close to 90% full on a filesystem, the kernel has to do a lot more work to decide how to place the files you want to store. With a near-full filesystem, storing a larger file can take more time because it often needs to be broken up into smaller chunks and distributed across the disk, which means more moving parts. |
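If you want to check for that "getting close to 90%" condition from a script instead of reading `df` output, a minimal Python sketch might look like this. The function names `percent_used` and `nearly_full` are hypothetical, and the 90% default simply mirrors the rule of thumb above. The percentage can differ slightly from the `Use%` column that `df` prints, since `df` accounts for blocks reserved for root.

```python
import shutil


def percent_used(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def nearly_full(path="/", threshold=90.0):
    """True if the filesystem is at or above `threshold` percent used."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    print(f"/ is {percent_used('/'):.0f}% used; nearly full: {nearly_full('/')}")
```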
| | 113 | |
| | 114 | But the modern operating system kernel hides almost all of these shenanigans from the end user pretty well, so you usually won't notice performance degradation from full filesystems until they're actually full, and you get the nasty hard errors mentioned above. This brings me to the flavor of disk-related resource that often does cause performance problems... |
| | 115 | |
| | 116 | === Disk Throughput === |
| | 117 | |
| | 118 | Where disks are most likely to cause user-visible sluggishness is in their ''throughput'', not their ''capacity''. Accessing data off of a disk (or writing data to a disk) is extremely slow when compared to other parts of a modern computer. If the user has to experience this behavior directly, they'll likely feel like the computer is sluggish. |
| | 119 | |
| | 120 | A good kernel can deal with this for disk ''writes'' transparently, as long as there is enough RAM: when the user says "save this file to disk", the kernel just says "ok, fine", caches the data first in (fast) RAM, returns control to the user, and writes the data out to (slow) disk in little chunks while the user is otherwise idle. (If you've ever tried to save a file to a floppy disk in Windows 98, you know the annoyance that comes from an OS ''not'' doing this: writes to floppy disks under that OS were synchronous -- so the whole machine would lock up while the file was actually being saved.) |
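You can get a rough feel for this write caching yourself. The sketch below (the helper name `timed_write` is made up for illustration) writes the same data twice: once letting the kernel cache it, and once calling `os.fsync` to insist that the data reaches the disk before continuing. On most machines the second run takes noticeably longer, but the actual numbers depend heavily on your hardware and filesystem, so none are quoted here.

```python
import os
import tempfile
import time


def timed_write(path, data, sync):
    """Write `data` to `path` and return the elapsed time in seconds.

    With sync=True, wait until the kernel reports the data is on disk.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()             # push Python's own buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to write it to disk now
    return time.perf_counter() - start


if __name__ == "__main__":
    data = os.urandom(1024 * 1024)  # 1 MiB of throwaway data
    with tempfile.TemporaryDirectory() as tmp:
        cached = timed_write(os.path.join(tmp, "cached.bin"), data, sync=False)
        synced = timed_write(os.path.join(tmp, "synced.bin"), data, sync=True)
    print(f"cached write: {cached:.4f}s, synced write: {synced:.4f}s")
```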
| | 121 | |
| | 122 | Modern kernels also do similar sleights-of-hand on disk ''reads'', though it is harder to do. |
| | 123 | |
| | 124 | It starts getting tricky here: when you [#RAMakamemory run out of RAM], your computer often starts thrashing your disks. But there are other things that can cause you to thrash your disks too. |