Wednesday, December 13, 2006

USB DSL-N part 2: Virtual Consoles and Read-Write

In an earlier post I described the process I went through to install DSL-N on a bootable USB flash drive. After using the drive for a while, I noticed a couple of things: There were no additional virtual consoles with command prompts (as you'd see under KNOPPIX), and the flash drive's filesystem was mounted read-only. These are both relatively easy to fix.

By default, the inittab file in DSL-N looks like this:
# /etc/inittab: init(8) configuration.

id:5:initdefault:

si::sysinit:/etc/init.d/rcS

~~:S:respawn:/bin/bash -login >/dev/tty1 2>&1 </dev/tty1


l0:0:wait:/etc/init.d/knoppix-halt
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/knoppix-reboot

ca::ctrlaltdel:/etc/init 0

kb::kbrequest:/bin/echo "Keyboard Request -- edit /etc/inittab to let this work."

pf::powerwait:/etc/init.d/powerfail start
pn::powerfailnow:/etc/init.d/powerfail now
po::powerokwait:/etc/init.d/powerfail stop

1:12345:respawn:/bin/bash -login >/dev/tty1 2>&1 </dev/tty1
2:234:respawn:/bin/bash -login >/dev/tty2 2>&1 </dev/tty2
3:234:respawn:/bin/bash -login >/dev/tty3 2>&1 </dev/tty3
4:234:respawn:/bin/bash -login >/dev/tty4 2>&1 </dev/tty4
As you can see, the default runlevel is 5, and at this runlevel only a single bash shell is started, on tty1. Also note that there's no entry in the inittab to specifically start X (e.g., through a display manager). So how does X get started? This question is answered in the file /home/dsl/.bash_profile (which is read because of the "-login" switch on bash in the inittab). Here's what the file looks like:
#!/bin/bash
export IRCNICK=DSL
SSH=`env | grep SSH_CONNECTION`
RUNLEVEL=`runlevel|cut -f2 -d' '`
if [ -z "$SSH" ]; then
if [ $RUNLEVEL -eq 5 ]; then
startx
fi
fi
So, when "bash -login" is run under runlevel 5 (and if we're not logging in through an SSH connection), the startx command is executed. This command doesn't return until X is terminated.
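The decision made in .bash_profile can be pulled out into a small function, which makes it easier to see which shells would launch X. This is only a sketch: the function name should_start_x is made up for illustration, and it takes the runlevel and the value of SSH_CONNECTION as arguments instead of reading them itself.

```shell
# Hypothetical helper mirroring the test in DSL-N's .bash_profile:
# succeed only for a local (non-SSH) shell at runlevel 5.
# $1 = current runlevel, $2 = value of SSH_CONNECTION (may be empty)
should_start_x() {
    runlevel="$1"
    ssh_connection="$2"
    [ -z "$ssh_connection" ] && [ "$runlevel" = "5" ]
}

# Example use inside a profile (commented out so sourcing this is harmless):
# if should_start_x "$(runlevel | cut -f2 -d' ')" "$SSH_CONNECTION"; then
#     startx
# fi
```

A shell started without "-login" never reads .bash_profile, so it never reaches this test at all.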

I wanted to modify the inittab so that more virtual consoles were started (without running X on them!), but I didn't want to go to the trouble of re-mastering the DSL-N knoppix image. So, instead I modified the minirt image that is used to set up things for the rest of the DSL-N boot process. In particular, I modified one file in minirt: "linuxrc".

Modifying the minirt image is pretty easy. Just copy the image somewhere and do the following:
gunzip dsl-minirt26.gz
mount -o loop dsl-minirt26 /mnt/tmp
emacs /mnt/tmp/linuxrc
umount /mnt/tmp
gzip dsl-minirt26
Then copy the resulting dsl-minirt26.gz back onto the USB flash drive.
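If you expect to repeat this edit cycle, the same steps can be wrapped in a function with a little error handling. This is a sketch under some assumptions: the name edit_minirt is hypothetical, it must be run as root (for the loop mount), and it falls back to vi when EDITOR isn't set.

```shell
# Hypothetical wrapper around the gunzip / mount / edit / umount / gzip steps.
# $1 = path to the compressed image (e.g. dsl-minirt26.gz)
# $2 = existing mount point (defaults to /mnt/tmp)
edit_minirt() {
    image_gz="$1"
    mnt="${2:-/mnt/tmp}"
    image="${image_gz%.gz}"

    [ -f "$image_gz" ] || { echo "no such image: $image_gz" >&2; return 1; }
    [ -d "$mnt" ] || { echo "no such mount point: $mnt" >&2; return 1; }

    gunzip "$image_gz" || return 1
    if mount -o loop "$image" "$mnt"; then
        "${EDITOR:-vi}" "$mnt/linuxrc"
        umount "$mnt" || return 1
    else
        gzip "$image"    # put the compressed image back if the mount failed
        return 1
    fi
    gzip "$image"
}

# Usage (as root):
# edit_minirt ./dsl-minirt26.gz /mnt/tmp
```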

I changed linuxrc by adding a few lines near the bottom of the script. If you look at the default linuxrc file, you'll see lines like this near the end:
# Give control to the init process.
echo "${CRE}${BLUE}Starting init process.${NORMAL}"
rm -f /linuxrc
exit 0
Right above this section, I inserted the lines:
# Installing substitute config files:
echo "${CRE}${BLUE}Copying custom config files.${NORMAL}"
cp /cdrom/dsl/config/etc/inittab /UNIONFS/etc/


The directory "/cdrom" is actually the root of the filesystem on the USB flash drive. Under that, I've created a "dsl" directory to store extra bits and pieces for DSL-N. The new linuxrc copies a modified inittab file from there and drops it on top of the default file in /UNIONFS/etc.
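A slightly more defensive variant would copy the file only when it's actually present, so that a flash drive without the "dsl/config" tree still boots with the stock inittab. The sketch below assumes the shell that runs linuxrc supports functions; the name install_custom_config is hypothetical.

```shell
# Hypothetical helper: copy a replacement config file only if both the
# source file and the destination directory exist.
# $1 = source file, $2 = destination directory
install_custom_config() {
    src="$1"
    destdir="$2"
    if [ -f "$src" ] && [ -d "$destdir" ]; then
        cp "$src" "$destdir/"
    else
        echo "Skipping $src (file or $destdir not found)" >&2
        return 1
    fi
}

# In linuxrc this would be called as:
# install_custom_config /cdrom/dsl/config/etc/inittab /UNIONFS/etc
```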

The modified inittab file looks like this:
# /etc/inittab: init(8) configuration.

id:5:initdefault:

si::sysinit:/etc/init.d/rcS

~~:S:respawn:/bin/bash -login >/dev/tty1 2>&1 </dev/tty1


l0:0:wait:/etc/init.d/knoppix-halt
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/knoppix-reboot

ca::ctrlaltdel:/etc/init 0

kb::kbrequest:/bin/echo "Keyboard Request -- edit /etc/inittab to let this work."

pf::powerwait:/etc/init.d/powerfail start
pn::powerfailnow:/etc/init.d/powerfail now
po::powerokwait:/etc/init.d/powerfail stop

1:12345:respawn:/bin/bash >/dev/tty1 2>&1 </dev/tty1
2:12345:respawn:/bin/bash -login >/dev/tty2 2>&1 </dev/tty2
3:12345:respawn:/bin/bash >/dev/tty3 2>&1 </dev/tty3
4:12345:respawn:/bin/bash >/dev/tty4 2>&1 </dev/tty4
I've moved the shell that starts X onto tty2 (since under KNOPPIX my fingers know to press ctrl-alt-f1 to get a text console) and I've added shells running on tty1, tty3 and tty4. To avoid starting extra instances of X, the additional shells are started without "-login" (so that the .bash_profile file is ignored). This works fine, although I'm sure there are more elegant ways to do it.
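The console entries follow a regular pattern, so they can also be generated rather than typed. Here is a minimal sketch; the function name console_entries is made up, and it assumes every console should respawn in runlevels 1 through 5, as in the file above.

```shell
# Print inittab respawn entries for tty1..ttyN.
# $1 = number of consoles
# $2 = the tty that gets the login shell (and therefore X)
console_entries() {
    count="$1"
    x_tty="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "$i" -eq "$x_tty" ]; then
            shell="/bin/bash -login"
        else
            shell="/bin/bash"
        fi
        echo "$i:12345:respawn:$shell >/dev/tty$i 2>&1 </dev/tty$i"
        i=$((i + 1))
    done
}
```

Running "console_entries 4 2" prints the same four lines that end the modified inittab.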

Regarding the "read-only" mount of the USB flash drive's filesystem, I could also fix that by modifying linuxrc, since this is where the filesystem is explicitly mounted with the "-ro" flag. Instead, I decided to leave it the way it was. When I need to write something, I type
mount -o remount,rw /cdrom
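To check which state the drive is in before or after remounting, the mount options can be read from the kernel's mount table. This is a sketch, assuming /proc/mounts is available; the function name is_read_only is hypothetical.

```shell
# Hypothetical helper: succeed if the given mount point is mounted read-only.
# $1 = mount point, $2 = mounts table to read (defaults to /proc/mounts)
is_read_only() {
    awk -v mp="$1" '
        $2 == mp {
            found = 1
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1
        }
        END { if (found && ro) exit 0; exit 1 }
    ' "${2:-/proc/mounts}"
}

# Typical session (as root):
# mount -o remount,rw /cdrom
# ... write files ...
# sync
# mount -o remount,ro /cdrom
# is_read_only /cdrom && echo "back to read-only"
```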

Friday, December 08, 2006

Migrating an NT4 domain to a Samba BDC

For a long time now, since we broke away from the VMS collective, we've had a mixture of server operating systems in our department. Early on, we ran a Novell Netware server for our computing labs. Windows servers later replaced this, and served both computer labs and administrative desktops. Later, we deployed an IIS-based web server. These Windows machines lived alongside AIX compute servers and later Linux compute, mail, print, file, web, etc., servers. This heterogeneity peaked sometime in the 90s, and since then we've been trying to consolidate. Our server infrastructure is now split between Linux and Windows, and we're at the point of eliminating the last of the Windows servers. The only Windows servers we still have are an IIS web server and several Windows NT4 domain controllers.

We've been planning to retire the domain controllers for a while, but have been debating how to go about it. We could just do away with domain authentication altogether, or we could piggyback on an Active Directory infrastructure maintained by our parent organization. We've also begun planning for a migration to Linux-based domain controllers, but since the existing Windows domain controllers are lightly loaded and functional, the migration was never a priority. Ultimately, we think we'll just migrate to our parent organization's AD.

We recently ran into a short-term need for a replacement, though. We were down to two Windows domain controllers (a PDC and a BDC), and our PDC failed. We promoted the BDC to a PDC, but that left us with no failover. Since the remaining domain controller is getting old, we were worried that it might fail too, so we did a couple of things in parallel to make sure we had at least one additional domain controller up and running within a day or two: we set up a new NT4 domain controller from scratch, and at the same time we set up a Linux-based backup domain controller.

For the Linux BDC, I started by installing the latest Samba version (3.0.23d). Then I looked around the web for instructions for creating a BDC. For simplicity (to save time in getting the new server up) I decided to use the tdbsam backend rather than configuring an LDAP server. Following the clear instructions in the NT Migration section of the Samba Guide got me a long way.

After I'd configured the smb.conf file (which I'll talk about in detail a little later), I followed these steps (beginning with the smb service turned off):
net rpc getsid -S oldpdc -W OURDOMAIN
net setlocalsid blah-blah-blah-blah (whatever is returned by the previous command)

service smb start
net rpc join -S oldpdc -W OURDOMAIN -U Administrator

net rpc vampire -S oldpdc -U Administrator
where "oldpdc" is our current Windows PDC and "OURDOMAIN" is our domain name. The final "net rpc vampire" command should suck the user and group information out of the old PDC and populate the samba server's tdbsam databases.

(Note that if you repeat this process you should clean out the tdb files in /etc/samba and /var/lib/samba and clean out newly-created entries in /etc/passwd, shadow and group before trying again.)

The first time I ran this, I found that the process began running quickly, then slowed to a near-stop. It turned out that there was a problem with the currently installed version of the groupadd command (from the shadow-utils package on the Fedora-based computer I was using). After adding a few groups, it began eating up memory until the OOM-killer killed it. I fixed this by upgrading shadow-utils to version 4.0.7-9.

This was only the first of the problems I ran into with groups. The "net rpc vampire" command uses the following entries from smb.conf to create local unix users and groups:
  • add user script
  • add group script
  • add user to group script
  • add machine script
Adding groups and adding users to groups turned out to be very tricky.

The first problem with adding groups was the fact that many of our Windows groups have spaces in their names. (E.g., "Domain Admins".) The Linux groupadd command doesn't allow spaces in group names, although (to my surprise) the operating system will honor group names containing spaces.
The Samba documentation and advice I found online suggest two ways of dealing with this:
  1. Begin by using groupadd to create a dummy group, with a name acceptable to groupadd, and then rename that group once it's in the /etc/group file, or
  2. Create unix groups with mangled names, and then map them (using "net groupmap") in the tdbsam database to Windows group names.
These both sound like reasonable workarounds, but it turns out that only solution number 1 will actually work. I found this out by trial and error, because I initially tried solution number 2.

I did it as follows: The smb.conf documentation says that the "add group script" can be any executable script that prints the gid of the newly-created group on its standard output. So, I created this script:
#!/usr/bin/perl
# DON'T USE THIS SCRIPT, IT WON'T WORK!
use strict;
my $newgroup = shift;
$newgroup =~ s/ /_/g;
print STDERR `/usr/sbin/groupadd -f $newgroup`;
my $gid = getgrnam($newgroup);
print $gid;
This looks pretty straightforward, and "net rpc vampire" used it without complaint. All of the unix groups were created, and they were all properly mapped automatically to their Windows equivalents (some with spaces in their names). The problem was that users were only added to groups that had no spaces in their Windows names. It turns out that "net rpc vampire" does a brute-force lookup in /etc/group to see if a group exists before it even tries to add members to the group. If the group's literal, Windows, space-containing name isn't there, the group will be silently ignored.
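The effect is easy to reproduce with an exact-match lookup against a group file. The sketch below is only an illustration of that kind of lookup, not Samba's actual code; the function name group_in_file is hypothetical.

```shell
# Hypothetical illustration: succeed only if a group with exactly this
# name appears in the first field of the group file.
# $1 = group name, $2 = group file (defaults to /etc/group)
group_in_file() {
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/etc/group}"
}

# With a mangled entry such as "Domain_Admins:x:1001:" in the file:
# group_in_file "Domain_Admins"   succeeds
# group_in_file "Domain Admins"   fails, so the group's members are skipped
```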

So, I fell back on solution number 1, for which a script is actually provided in one of the samba HOWTOs. This worked like a charm.... except that users were only added to, at most, a single group.
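For readers who don't have the HOWTO to hand, the idea behind solution number 1 can be sketched roughly as follows. This is not the HOWTO's script verbatim: the function name rename_tmp_group and the placeholder group name smbtmpgrp00 are illustrative, and the sketch edits only the group file, leaving /etc/gshadow alone.

```shell
# Sketch of the "create a dummy group, then rename it" approach.
# $1 = Windows group name (may contain spaces)
# $2 = group file to edit (defaults to /etc/group)
rename_tmp_group() {
    newname="$1"
    groupfile="${2:-/etc/group}"
    tmpname="smbtmpgrp00"    # arbitrary placeholder acceptable to groupadd

    # Look up the gid that groupadd assigned to the placeholder group.
    gid=$(awk -F: -v n="$tmpname" '$1 == n { print $3 }' "$groupfile")
    [ -n "$gid" ] || return 1

    # Keep a backup, then rewrite the file with the real name in place.
    cp "$groupfile" "$groupfile.bak" || return 1
    awk -F: -v OFS=: -v old="$tmpname" -v new="$newname" \
        '$1 == old { $1 = new } { print }' "$groupfile.bak" > "$groupfile" || return 1

    # Samba expects the new gid on standard output.
    echo "$gid"
}

# As an "add group script" (run as root):
# /usr/sbin/groupadd smbtmpgrp00 && rename_tmp_group "$1"
```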

The "net rpc vampire" command uses the "add user to group script" from smb.conf. In my case, this looked like
add user to group script = /usr/sbin/usermod -G '%g' '%u'
At first glance, this seems like it would work. (Note that usermod, unlike groupadd, is perfectly happy with group names containing spaces.) Usermod accepts a list of additional groups to which the user should be added. Looking at the man page, however, I saw that it wants a complete list of additional groups. If the user is currently a member of any groups not in the list, usermod removes the user from those groups. So, using usermod as I was, each user would only end up belonging to the last group "net rpc vampire" processed for that user.

The solution was to find a command that would just add a single user to a single group. This turns out to be "gpasswd -a $user $group", from the shadow-utils package. I didn't find anyone else on the web who had used this. In fact, many sources recommend using usermod, as I had done originally. This would work if all users belonged to only a single group, so maybe that's why the problem hadn't been noticed.
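If you'd rather point smb.conf at a script than at gpasswd directly, a thin wrapper is enough. This is a sketch: the function name add_user_to_group and the GPASSWD override (handy for a dry run) are assumptions made for illustration.

```shell
# Hypothetical wrapper for the "add user to group script" setting.
# Adds one user to one group without touching the user's other memberships.
# Set GPASSWD=echo to print the arguments instead of changing anything.
add_user_to_group() {
    user="$1"
    group="$2"
    [ -n "$user" ] && [ -n "$group" ] || {
        echo "usage: add_user_to_group USER GROUP" >&2
        return 2
    }
    "${GPASSWD:-/usr/bin/gpasswd}" -a "$user" "$group"
}

# Dry run:
# GPASSWD=echo; add_user_to_group alice "Domain Admins"
```

Because each call only appends, running it once per group during "net rpc vampire" builds up the full membership list rather than overwriting it.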

With these changes, the final smb.conf file looks like this:
[global]
workgroup = OURDOMAIN
netbios name = PDC

interfaces = eth0, lo
bind interfaces only = Yes
hosts allow = 192.168.0.0/16, 127.0.0.1
hosts deny = 0.0.0.0/0

passdb backend = tdbsam

add user script = /usr/sbin/useradd -g users '%u'
add group script = /etc/samba/smbgrpadd.sh '%g'
add user to group script = /usr/bin/gpasswd -a '%u' '%g'
add machine script = /usr/sbin/useradd -s /bin/false -d /dev/null '%u'

preferred master = Yes
wins support = Yes
shutdown script = /sbin/shutdown
abort shutdown script = /sbin/shutdown -c
logon script = Login.bat

domain logons = Yes

domain master = no

preferred master = Yes
username map = /etc/samba/smbusers

log level = 1
syslog = 0
log file = /var/log/samba.log
max log size = 5000

smb ports = 139 445
name resolve order = wins bcast hosts
time server = Yes
map acl inherit = Yes
printing = cups

[IPC$]
path = /tmp

[netlogon]
comment = Network Logon Service
path = /home/netlogon
guest ok = Yes
locking = No
Using this, the "net rpc vampire" process completed successfully, and we have a Linux/Samba-based BDC up and running.

Friday, December 01, 2006

Installing DSL-N on a USB flash drive

Back in the eighties, our department was a VAX/VMS shop. We had a departmental VAX 11/780 in a server room downstairs, and serial lines ran through the building to terminals in computer labs and on desktops. In the late eighties, we started to see a few Unix machines. First, a Sun workstation, and then a bunch of DECstation 3100s, running Ultrix. I had the first of the latter on my desk. It had two external 500-megabyte SCSI disks in shoebox-sized enclosures. Each of these disks cost several thousand dollars.

Like everybody else, I've watched with amazement as the price per gigabyte of storage has dropped since then. I don't have any compelling reason to buy a thumb drive, but I decided I'd buy one when a gigabyte of storage got below $25. So, I recently ordered a cute little OCZ Roadster 1GB flash drive ($14 after the rebate) and I've been playing with it.

The things I wanted to have on the drive were:
  • A bootable Linux operating system
  • Some useful Windows utilities, like putty
  • Installation kits for things like Firefox
  • Reference documents
  • Plenty of space for whatever files I want to transport around by sneakernet
Ideally, I'd also like to have an encrypted zip archive for holding confidential files, and standalone browsers for this archive that work under Windows and Linux (at least).

Since the drive is only (!) 1GB, I didn't want to put KNOPPIX on it, although I use KNOPPIX from CD constantly. Since KNOPPIX takes up 700MB, that would only leave 300MB for other stuff, and I was worried that that might be too little sometimes. Another bootable CD distribution I'm familiar with is the much smaller (~50MB) Damn Small Linux (DSL). DSL is derived from KNOPPIX, and shares KNOPPIX's excellent hardware detection abilities. Unfortunately, though, DSL is based on a Linux 2.4 kernel, and I really wanted to use something based on 2.6, for better support for newer hardware. Because of this, I decided to go with DSL-N, which is a derivative of DSL that uses a 2.6 kernel and has some additional software.

Both the DSL and DSL-N bootable CDs include a graphical tool for installing the operating system onto a USB flash drive. So, I created a DSL-N 1.0-RC4 CD, booted it and gave it a try. DSL-N offers two ways of creating a bootable USB flash drive: USB zip or USB hdd. USB hdd is preferable, since it allows you to format the flash drive as a single large vfat partition. I tried this first, but found that my test machine wouldn't boot the drive, although it did detect it and list it as one of the bootable devices. So, I tried the USB zip configuration. In this configuration, the drive is divided into two partitions: a smaller boot partition at the front, and a larger partition on the rest of the drive for storing data. This configuration booted fine in the test machine.

But there's a problem with having more than one partition on the flash drive. I want to be able to use the drive under either Windows or Linux, but Windows only seems to be able to see the first partition on a flash device. Using the USB zip configuration, this would mean that the drive would be relatively useless on Windows computers.

So, I started looking at other options. Searching around the web, I saw some comments that made me suspect that the problem with booting the USB hdd configuration might be due to DSL-N's use of syslinux for booting. Since I'm more familiar with grub anyway, I thought I'd try to create a single-partition, grub-bootable configuration. Fortunately, other people have already done similar things. In particular, I found Jeremy Turner's "Grub Tips and Tricks" article from Free Software Magazine Issue 10, January/February 2006, extremely helpful. In it, he describes in detail how to install grub and DSL (but not DSL-N) on a USB flash drive.

First, I used fdisk to create a single partition on the flash drive, of type "W95 FAT32". Then I formatted the partition as a vfat filesystem:
/sbin/mkfs.vfat /dev/sdb1
(obviously adjust the device name to match whatever's appropriate for the system on which you're doing this.)

Then, following Jeremy Turner's DSL example, I created a "boot" directory tree on the flash drive, then mounted the DSL-N iso image and copied several files from it onto the flash drive:
mount /dev/sdb1 /mnt/flash
cd /mnt/flash
mkdir boot
mkdir boot/kernels
mkdir boot/images
mkdir boot/grub
mount -o loop /usr/src/dsl-n-01RC4.iso /mnt/iso
cp /mnt/iso/KNOPPIX/KNOPPIX /mnt/flash/boot/images/dsl
cp /mnt/iso/isolinux/linux /mnt/flash/boot/kernels/dsl-linux26
cp /mnt/iso/isolinux/minirt.gz /mnt/flash/boot/images/dsl-minirt26.gz
umount /mnt/iso
sync
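Before moving on, it's worth confirming that the three files ended up where the grub configuration later in this post expects them, under /boot on the drive. A quick check might look like this; the function name check_flash_layout is hypothetical.

```shell
# Hypothetical check: confirm that the image, kernel and initial ramdisk
# exist and are non-empty under the flash drive's boot tree.
# $1 = mount point of the flash drive
check_flash_layout() {
    root="$1"
    missing=0
    for f in boot/images/dsl boot/kernels/dsl-linux26 boot/images/dsl-minirt26.gz; do
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $root/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_flash_layout /mnt/flash && echo "layout looks right"
```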
To make the drive bootable, the only other thing we need to do is install grub and configure it. Again, following Jeremy Turner's example I ran grub and used grub's filename completion feature as a way of determining which grub device corresponded to my flash drive:
grub> find /boot/grub/stage1
(hd0,1)
(hd1,1)
grub> find (hd1,0)/boot/im
grub> find (hd1,0)/boot/images/
Showing that (hd1,0) really is my flash drive, as I'd guessed, since there's no "images" directory under the boot directory on my hard disk. That being settled, I needed to copy grub's stage1, stage2, etc., files onto the flash drive, and create a grub configuration file.

To get the stage files copied over without missing anything, I just did a wholesale copy of my hard disk's grub directory:
cp -r -a -p /boot/grub/* /mnt/flash/boot/grub/
Since the flash drive partition is formatted vfat, I got a complaint at this point about the symbolic link "menu.lst -> ./grub.conf" in the original grub directory. The grub I'm using is configured to look for a configuration file called "grub.conf", so I called the file that in the flash drive's /boot/grub directory. I then edited grub.conf to make it initially look something like this:
default=0
timeout=10
root=(hd0,0)
title DSL-N 1.0RC4
kernel /boot/kernels/dsl-linux26 ramdisk_size=100000 init=/etc/init lang=us apm=power-off vga=791 toram nomce noapic quiet knoppix_dir=boot/images knoppix_name=dsl initrd=/boot/images/dsl-minirt26.gz BOOT_IMAGE=/boot/images/dsl
initrd /boot/images/dsl-minirt26.gz
I've broken the long kernel line here, but it's actually one long line. I'm not sure if some of the kernel parameters (like BOOT_IMAGE= and initrd=) are actually used, but they don't seem to hurt.

Then I told grub to install the boot loader:
grub> root (hd1,0)
grub> setup (hd1)
grub> quit
Next, I tried booting it in the test machine, and voila! It worked. Experimenting with it, I did notice something that might be a peculiarity of the test machine: If I boot the machine cold (after the power has been turned off), the USB drive boots properly. If, instead, I reboot the machine without cycling the power, the results are unpredictable. In some cases, the USB drive is ignored, and in others the DSL-N boot begins, but minirt fails to find the KNOPPIX image on the drive. These seem like timing issues. With that caveat, I'm pleased with the device so far.

(Update: Based on William Waddington's advice, I've also set the partition active, using fdisk, and installed grub on the active partition's boot sector. With these changes, I can now boot the drive on at least one machine that wouldn't boot it before.)

Since I'm using grub now, it's easy to add other bootable images. Here's what my current grub.conf file looks like:
default=0
timeout=10
root=(hd0,0)
title DSL-N 1.0RC4
kernel /boot/kernels/dsl-linux26 ramdisk_size=100000 init=/etc/init lang=us apm=power-off vga=791 toram nomce noapic quiet knoppix_dir=boot/images knoppix_name=dsl initrd=/boot/images/dsl-minirt26.gz BOOT_IMAGE=/boot/images/dsl
initrd /boot/images/dsl-minirt26.gz
title memtest86+-1.65
kernel /boot/kernels/memtest86+-1.65.bin
title NT/XP Password and Registry Editor
kernel /boot/kernels/memdisk
initrd /boot/images/bd050303.bin
title Super Grub Disk
kernel /boot/kernels/memdisk
initrd /boot/images/sgd_0.9528_english_floppy.img
As you can see, I've added memtest86+, the Offline NT Password and Registry Editor and Super Grub Disk.

In the rest of the disk, I store standalone Windows executables for putty, pscp, psftp and gzip, and installers for Firefox, Wireshark and 7-Zip. And I have enough room left over for the 1916 edition of Webster's Unabridged dictionary, Shakespeare's First Folio and the King James Bible! Total disk usage so far is about 200MB. Still plenty of sneakernet bandwidth left.

Monday, July 10, 2006

Introduction

I'm the IT manager for a medium-sized university department. A colleague and I support about 450 computers. These run the gamut from staff desktops to research clusters, from Windows to Linux to Mac. You name it, we've fixed it (and probably broken it, too.) We have particular expertise in Linux, which we've been supporting since 1993, and we're just dipping our toes into OS X.

Everyone who does IT knows that it's really easy to get 90% of the way toward what you're trying to accomplish. It's the remaining 10% that keeps you up late banging your head against the keyboard. The Devil is In the Details.

In this blog I'll be recording useful tips and tricks for dealing with the fiddly bits that are left over after you've done all of the easy stuff.