
debirf on xen

i just tried to make an initramfs that would let me launch a xen instance. These are my notes about it. All of the commands used to create it are run from the dom0. This is probably not the best way to do this, but it's what i've done so far.

Create the filesystem without launching the xen instance, and mount it within the dom0:

xen-create-image --dhcp --hostname=debirf
mount /dev/vg0/debirf-disk /mnt/

add the needed things:

chroot /mnt aptitude update
chroot /mnt aptitude dist-upgrade
chroot /mnt aptitude install locales
chroot /mnt dpkg-reconfigure locales

(choose your favorite locale in that last step)

set up the new fake init:

cat >/mnt/init <<EOF
#!/bin/sh
exec /sbin/init
EOF
chmod a+x /mnt/init

clean up what you don't need:

find /mnt/usr/share/locale/ -maxdepth 1 -mindepth 1 -type d ! -iname 'en*' -exec rm -rf '{}' ';'
find /mnt/var/lib/apt/lists/ /mnt/var/cache/apt -type f -delete

fix /etc/fstab to avoid trying to fsck a non-existent root filesystem:

cat > /mnt/etc/fstab <<EOF
proc /proc proc rw,nodev,nosuid,noexec 0 0
EOF

Update /mnt/root/.bashrc, /mnt/etc/skel, and other files to make the system more comfortable. You might want to install a key in /mnt/root/.ssh/authorized_keys and set PasswordAuthentication no in /mnt/etc/ssh/sshd_config.

clean up the hosts file and interfaces:

cat >/mnt/etc/hosts <<EOF
127.0.0.1       localhost
127.0.1.1 debirf
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
EOF
cat >/mnt/etc/network/interfaces <<EOF
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
EOF

build the initramfs:

(cd /mnt && find * | cpio -H newc --create | gzip) > /boot/debirf-$(uname -r)
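
Before booting, it can be worth a quick sanity check that the file really is a gzipped newc-format cpio archive, since that's the format the kernel's initramfs unpacker expects. A minimal sketch (the function name is made up for this example; newc archives begin with the ASCII magic 070701):

```shell
# is_newc_cpio_gz FILE: succeed if FILE is gzip-compressed and its
# payload begins with the newc cpio magic "070701".
# (hypothetical helper name, not part of any package)
is_newc_cpio_gz() {
    magic=$(gunzip -c "$1" 2>/dev/null | head -c 6)
    [ "$magic" = "070701" ]
}

# usage on the dom0:
#   is_newc_cpio_gz "/boot/debirf-$(uname -r)" && echo "looks like an initramfs"
```

This only looks at the first header, so it catches a wrong -H format or a missing gzip step, not a truncated archive.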

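then repair the xen config so that it doesn't refer to any block devices (the heredoc is unquoted, so $(uname -r) is expanded by the dom0's shell and the domU boots the dom0's running kernel):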

cat >/etc/xen/debirf.cfg <<EOF
kernel  = '/boot/vmlinuz-$(uname -r)'
ramdisk = '/boot/debirf-$(uname -r)'
memory  = '350'
name    = 'debirf'
dhcp = 'dhcp'
vif  = [ '' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
EOF

starting the debirf domU

xm create -c debirf.cfg

limitations

This seems to require at least 330MB of RAM for the domU. With less, i get an endless stream of these on the domU's console:

Out of Memory: Kill process 2 (migration/0) score 0 and children.

The only way to deal with it is ctrl+] (to detach from the domU's console), followed by:

xm destroy debirf

Even with 330MB of RAM, the OOM-killer kicks in really rapidly.

other notes

RAM is tight

from within the domU, free reports that a lot of the RAM is "cached":

0 debirf:~# free
             total       used       free     shared    buffers     cached
Mem:        337920     330368       7552          0          0     157160
-/+ buffers/cache:     173208     164712
Swap:            0          0          0
0 debirf:~# 

But that "cached" RAM is presumably holding the root filesystem itself, and there's no disk for it to be written back to, so it can't be freed and the OOM-killer is likely to cause trouble immediately.

swap

It's possible to pass the domU a block device to use as swap, and that helps out a lot with memory issues, without requiring that block device to be otherwise structured.
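
For example, with a spare logical volume on the dom0, the swap setup might look roughly like this. The LV name debirf-swap, its size, the guest device name sda1, and the helper function are all assumptions for illustration; adjust them to your setup.

```shell
# swap_disk_line DEVICE GUESTDEV: print an xm-style disk line exporting
# dom0 block device DEVICE to the domU as GUESTDEV, read-write.
# (hypothetical helper; the names below are examples only)
swap_disk_line() {
    printf "disk = [ 'phy:%s,%s,w' ]\n" "$1" "$2"
}

# on the dom0, before "xm create":
#   lvcreate -L 1G -n debirf-swap vg0
#   swap_disk_line /dev/vg0/debirf-swap sda1 >> /etc/xen/debirf.cfg
# inside the domU, once it's booted:
#   mkswap /dev/sda1
#   swapon /dev/sda1
```

Since the root filesystem still comes entirely from the ramdisk, this is the only block device the domU sees, and it needs no partition table or filesystem.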

unpacking into a tmpfs

I've tried one other experiment, which is to take the functional initramfs, and stuff it inside another initramfs. That other initramfs is minimal, and its only goal is to unpack the real initramfs (here named rootfs.cgz) into a tmpfs, switch over into that tmpfs, nuke the remaining stuff on the rootfs, and exec /sbin/init within the new tmpfs. Fortunately, debian's stock run-init from the klibc-utils package does the bulk of that work already. So all i needed to do was to include /bin/busybox and its dependencies (discovered via ldd /bin/busybox), and give it hardlinked names for the basic steps needed.
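
The hardlinking step can be sketched as a small helper. This assumes /bin/busybox has already been copied into the wrapper tree; the function name is invented for this example.

```shell
# link_busybox_applets ROOT NAME...: hardlink ROOT/bin/busybox under
# each NAME in ROOT/bin, so busybox picks the applet from argv[0].
# (hypothetical helper name)
link_busybox_applets() {
    root=$1
    shift
    for applet in "$@"; do
        ln "$root/bin/busybox" "$root/bin/$applet" || return 1
    done
}

# usage, assuming /bin/busybox was already copied into /tmp/unpack/bin:
#   link_busybox_applets /tmp/unpack sh mkdir mount gunzip
```

Hardlinks cost nothing extra in the cpio archive beyond the header for each name, which keeps the wrapper small.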

I found that busybox's cpio wasn't capable of unpacking rootfs.cgz (it gave an error message of "need to fix this"), so i switched to the actual utility from the cpio package. This meant that i needed to include the libraries needed from cpio as well.

One final thing to note is that the run-init executable appears to need /lib/klibc-whatever.so in order to work (otherwise, busybox's /bin/sh claims "file not found" at boot time). This tiny library can be found in the libklibc package.

so here's what the system looks like (this was built on a native amd64 machine):

0 builder:/tmp/unpack# ls -lR
.:
total 53004
drwxr-xr-x 2 root root     4096 2007-11-07 00:19 bin
-rwxr-xr-x 1 root root      238 2007-11-07 00:45 init
drwxr-xr-x 2 root root     4096 2007-11-07 00:22 lib
lrwxrwxrwx 1 root root        3 2007-11-06 23:44 lib64 -> lib
-rw-r--r-- 1 root root 54203480 2007-11-06 23:29 rootfs.cgz

./bin:
total 2668
-rwxr-xr-x 5 root root 519432 2007-11-07 00:11 busybox
-rwxr-xr-x 1 root root 101584 2006-08-04 12:06 cpio
-rwxr-xr-x 5 root root 519432 2007-11-07 00:11 gunzip
-rwxr-xr-x 5 root root 519432 2007-11-07 00:11 mkdir
-rwxr-xr-x 5 root root 519432 2007-11-07 00:11 mount
-rwxr-xr-x 1 root root   2240 2007-02-08 13:30 run-init
-rwxr-xr-x 5 root root 519432 2007-11-07 00:11 sh

./lib:
total 2072
-rwxr-xr-x 1 root root   73000 2007-02-08 13:30 klibc-HTvrSvZXEQwxnmV8HLm2r15Q8yI.so
-rwxr-xr-x 1 root root   97928 2007-05-15 09:19 ld-2.3.6.so
lrwxrwxrwx 1 root root      11 2007-11-06 23:45 ld-linux-x86-64.so.2 -> ld-2.3.6.so
-rwxr-xr-x 1 root root 1286104 2007-05-15 09:19 libc-2.3.6.so
-rw-r--r-- 1 root root   22656 2007-05-15 09:19 libcrypt-2.3.6.so
lrwxrwxrwx 1 root root      17 2007-11-06 23:46 libcrypt.so.1 -> libcrypt-2.3.6.so
lrwxrwxrwx 1 root root      13 2007-11-06 23:45 libc.so.6 -> libc-2.3.6.so
-rw-r--r-- 1 root root  531600 2007-05-15 09:19 libm-2.3.6.so
lrwxrwxrwx 1 root root      13 2007-11-06 23:45 libm.so.6 -> libm-2.3.6.so
-rw-r--r-- 1 root root   85880 2007-05-15 09:19 libnsl-2.3.6.so
lrwxrwxrwx 1 root root      15 2007-11-07 00:07 libnsl.so.1 -> libnsl-2.3.6.so
0 builder:/tmp/unpack# cat ./init 
#!/bin/sh
mkdir /newroot
mount -t tmpfs tmpfs /newroot
cd /newroot
# specify /bin/cpio so that it gets used instead of the busybox builtin
gunzip - </rootfs.cgz | /bin/cpio -i
exec /bin/run-init . /sbin/init <./dev/console >./dev/console
0 builder:/tmp/unpack# 

the packages that i needed to pull these files from were:

  • busybox
  • klibc-utils
  • libklibc
  • cpio
  • libc6

finding the files needed

here's how i found which library files were needed:

ldd /tmp/unpack/bin/* | egrep '\.so\.[[:digit:]]+ \(0x[[:xdigit:]]+\)$' | \
  sed -r 's|.*[[:space:]](/[^[:space:]]*)[[:space:]]\(0x[[:xdigit:]]+\)$|\1|' | \
  sort -u
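
To get those libraries into the wrapper tree, the resulting list of paths can be fed to a small copy loop. This sketch uses cp -L, which copies each file under the name the loader looks up rather than recreating the symlink layout shown in the ls -lR listing; the function name is hypothetical.

```shell
# copy_libs DEST: read library paths (one per line) on stdin and copy
# each into DEST, following symlinks so that e.g. libc.so.6 ends up as
# a regular file.  (hypothetical helper name)
copy_libs() {
    dest=$1
    mkdir -p "$dest" || return 1
    while IFS= read -r lib; do
        cp -L "$lib" "$dest/" || return 1
    done
}

# usage: pipe the sorted list from the ldd pipeline into it, e.g.
#   ... | sort -u | copy_libs /tmp/unpack/lib
```

Note that this flattens everything into one directory, which is fine here only because the wrapper has a single lib directory (with lib64 pointing at it).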

The wrapper initramfs was created as usual with:

(cd /tmp/unpack && find * | cpio -H newc --create | gzip -9) > /boot/debirf-nested-$(uname -r)

The rootfs.cgz was 54203480 bytes, and the final package turned out to be 55134814 bytes: an extra 931334 bytes, or a bit under 1MB.

tmpfs advantages

The main advantage of all of this hoop-jumping and gzipped nesting is that the tmpfs filesystem appears to be much more robust than the rootfs. In particular, it's capable of making use of swap, cleanly backing the in-RAM filesystem with a block device, in case RAM gets tight. It also appears to free RAM more cleanly than the rootfs, though i haven't figured out how to document specific misbehavior from the rootfs directly.

tmpfs concerns

A couple of concerns about this approach: a tmpfs is by default limited to half of physical RAM. If this isn't enough room to unpack the filesystem, you might be out of luck. It looks like busybox's mount will allow you to pass an -o size=1G option, though. Maybe that should be done in the /init by default, to "make sure" that there's enough room?

This approach also requires an additional chunk of RAM just to get things set up: the rootfs.cgz has to live in RAM while its unpacked version also lives in RAM. This seems like a waste.
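
On the size question, one way to pick a value rather than guessing is to measure the unpacked archive on the build machine and bake the number into the wrapper's /init. The uncompressed cpio stream is only a rough lower bound on what the tmpfs will use (tmpfs allocates whole pages per file), so some headroom is needed; the 25% below is an arbitrary guess, and the function name is made up.

```shell
# tmpfs_size_kb FILE: print a suggested tmpfs size in KiB for unpacking
# the gzipped cpio archive FILE: its uncompressed size rounded up to
# whole KiB, plus 25% headroom (the headroom figure is a guess, not a
# measured requirement).
tmpfs_size_kb() {
    bytes=$(gunzip -c "$1" | wc -c) || return 1
    echo $(( ($bytes + 1023) / 1024 * 5 / 4 ))
}

# usage on the build machine:
#   tmpfs_size_kb /tmp/unpack/rootfs.cgz
# then in the wrapper /init, with the printed number in place of N:
#   mount -t tmpfs -o size=Nk tmpfs /newroot
```

A size larger than half of RAM only makes sense if there's swap behind it, since the tmpfs pages have to live somewhere.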

I created one of these overmounted tmpfs situations with 480MB RAM and a 1GB swap partition available. I put it through its paces (including upgrading from etch to lenny), and then shut down every running service except the login that was granting me the shell, and used swapoff to remove the backing disk. I ended up with the following state:

0 debirf:~# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
tmpfs                   248320         0    248320   0% /lib/init/rw
tmpfs                   248320         0    248320   0% /dev/shm
rootfs                 3072000    225888   2846112   8% /
0 debirf:~# free
             total       used       free     shared    buffers     cached
Mem:        496640     321560     175080          0          0     221400
-/+ buffers/cache:     100160     396480
Swap:            0          0          0
0 debirf:~# 

Interesting to note a couple things:

  • df still thinks the root filesystem is a rootfs -- could this be because of a broken /etc/mtab that isn't updated?
  • the space used on the root filesystem (225888KB) is almost exactly equal to the amount of RAM that free thinks is "cached" (221400KB)
  • the kernel plus its attendant processes plus the getty plus my shell (i.e. everything except the RAM used for the filesystem) appears to be about 100MB. This seems steep to me, and might indicate that the RAM from the original rootfs.cgz hasn't actually been cleaned up. We could test this conjecture by simply adding another large file (say, a second copy of rootfs.cgz) into the wrapper initramfs, and see if the amount of RAM used increases by that amount.
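
To make that comparison repeatable, the "everything except the filesystem" figure can be computed the same way free does for its "-/+ buffers/cache" line. A minimal sketch, assuming the usual MemTotal, MemFree, Buffers and Cached fields in /proc/meminfo; the function name is invented here.

```shell
# used_minus_cache_kb [FILE]: print MemTotal - MemFree - Buffers - Cached
# in kB, read from FILE (default /proc/meminfo).  This corresponds to the
# "used" column of free's "-/+ buffers/cache" line.
used_minus_cache_kb() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END { print total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# usage inside the domU: run it once per wrapper variant and compare
#   used_minus_cache_kb
```

Fed the numbers from the free output above (496640 total, 175080 free, 0 buffers, 221400 cached), it gives the same 100160 that free prints. If the figure grows by roughly the size of the extra file when a second copy of rootfs.cgz is added to the wrapper, that would support the conjecture.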
Last modified on Nov 9, 2007, 4:23:25 PM