AIX Alternate Disk Installation
Jeff Marsh
In this article, I will describe some tools within AIX (some new, some old) that can help you reduce the off-hours time spent by your administration staff during maintenance upgrades. I will also show you some uses for these same toolsets that can help you reduce recovery times due to rootvg corruption.
Alternate Disk Installation
What is it? According to the IBM AIX Installation Guide:
"Alternate disk installation, available in AIX Version 4.3, allows installing the system while it is up and running, allowing installation or upgrade down time to be decreased considerably."
Thus, with another set of bootable drives within a server, you can install maintenance (e.g., upgrade your system from AIX 4.3.3.04 to AIX 4.3.3.06) during the day without interrupting or otherwise affecting the running applications. However, you will still need a reboot to make the new level active.
The support model prior to Alternate Disk Installation required all work to be done off-hours during an application maintenance window that generally took two to four hours. Now you can reduce that off-hour time from two to four hours per server to just the time to reboot. I'll also show you how you can complete multiple upgrades in that same reboot window using Network Installation Manager (NIM).
Requirements
To enable Alternate Disk Installation, you need to install the following base-level filesets and upgrade to at least these corresponding fileset levels. These filesets do not require a reboot to install:
    Base level filesets:                  Fileset levels:
    bos.alt_disk_install.rte              26
    bos.alt_disk_install.boot_images      27

You will also need another free, bootable drive within your server. In this case, we are configuring new servers with four internal drives for systems administration purposes: two mirrored drives for the primary rootvg, and two for alt_disk_install implementations. You could get by with just one additional drive, but we prefer to have two.
How It Works
Alternate Disk Installation works by cloning your primary rootvg running on hdisk0 and hdisk1, for example, to a second set of drives, hdisk2 and hdisk3. After the system completes those copies using basic find, backup, and restfile utilities, it will install the latest maintenance level you designate.
This process is shown in Figure 1. First, you clone hdisk0/1 to hdisk2/3, and then you apply maintenance to the newly cloned hdisk2/3 while the applications continue to run against hdisk0/1.
To complete this task from SMIT, issue the following fast path. You should expect to see the following panels:

    smitty alt_clone

                  Clone the rootvg to an Alternate Disk

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

    * Target Disk(s) to install                       [hdisk2 hdisk3]
      Phase to execute                                 all              +
      image.data file                                 []                /
      Exclude list                                    []                /

      Bundle to install                               [update_all]      +
        -OR-
      Fileset(s) to install                           []

      Fix bundle to install                           []
        -OR-
      Fixes to install                                []

      Directory or Device with images                 [/mnt]
      (required if filesets, bundles or fixes used)

      installp Flags
      COMMIT software updates?                         yes              +
      SAVE replaced files?                             no               +
      AUTOMATICALLY install requisite software?        yes              +
      EXTEND file systems if space needed?             yes              +
      OVERWRITE same or newer versions?                no               +
      VERIFY install and check file sizes?             no               +

      Customization script                            []                /

      Set bootlist to boot from this disk on next reboot?  yes
      Reboot when complete?                            no               +
      Verbose output?                                  no               +
      Debug output?                                    no               +
    [BOTTOM]

    F1=Help       F2=Refresh    F3=Cancel     F4=List
    F5=Reset      F6=Command    F7=Edit       F8=Image
    F9=Shell      F10=Exit      Enter=Do

From the example above, note the following:
- You are cloning to hdisk2 and hdisk3.
- You are running an update_all operation of maintenance mounted in the /mnt mount point. (In this case, this is a CD-ROM with the AIX 4.3.3.06 maintenance filesets.)
- You are specifying that this operation should change the bootlist to hdisk2 and hdisk3 after completion.
- You are not asking the process to complete an immediate reboot upon completion of the upgrade because this is something you want to schedule in an appropriate maintenance window.
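The same choices can also be expressed on the command line. The following is only a rough sketch: build_clone_cmd is a hypothetical helper name, and the flag letters (-C to clone rootvg, -b for the bundle, -l for the location of the install images) are assumptions that you should check against the alt_disk_install man page on your level of AIX. The function prints the command and executes nothing.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) a clone-and-update command line.
# build_clone_cmd is a hypothetical helper, not part of AIX.
# Assumed flags: -C clone rootvg, -b bundle, -l location of install images.
build_clone_cmd() {
    bundle=$1      # e.g. update_all
    images=$2      # e.g. /mnt (the mounted maintenance CD-ROM)
    shift 2        # the remaining arguments are the target disks
    echo "alt_disk_install -C -b $bundle -l $images $*"
}

# Print the command for the example used in this article.
build_clone_cmd update_all /mnt hdisk2 hdisk3
```

Printing the command first lets you review it, or paste it into a change record, before running it by hand during the day.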
When the operation completes, you can confirm with the bootlist -m normal -o command that the bootlist is set to hdisk2 hdisk3, and issuing an lspv command will show the following:

    root@aknimp1:/> lspv
    hdisk0    000f261d90bf6ea0    rootvg
    hdisk1    000f261dae86d104    rootvg
    hdisk2    000f261db52d4d95    altinst_rootvg
    hdisk3    000f261db52d4ca6    altinst_rootvg
    hdisk4    000f018d07d4f412    None
    hdisk5    000f261dbde71c66    None
    hdisk6    000f261dbd8eea89    nimresvg

At this point, you have cloned and installed the latest AIX maintenance level during the day. You are now ready to activate that latest maintenance with a reboot operation at whatever time is appropriate for the outage to your application users. You can save significant off-hours time for maintenance upgrades; our off-hours time has been reduced to the time needed for a simple reboot.

Alternate Disk Installation -- After the Reboot

After the reboot, issue the oslevel command or complete the appropriate verifications to ensure your maintenance upgrade occurred as expected. If you issue the lspv command, you will notice the following:
    root@aknimp1:/> lspv
    hdisk0    000f261d90bf6ea0    old_rootvg
    hdisk1    000f261dae86d104    old_rootvg
    hdisk2    000f261db52d4d95    rootvg
    hdisk3    000f261db52d4ca6    rootvg
    hdisk4    000f018d07d4f412    None
    hdisk5    000f261dbde71c66    None
    hdisk6    000f261dbd8eea89    nimresvg

Both hdisk2 and hdisk3, from which you have booted, now show a volume group identifier of rootvg. Hdisks 0 and 1 now show a volume group of old_rootvg and are varied off.
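If you want to script this check rather than read the listing by eye, a small filter over the lspv output is enough. This is a minimal sketch that assumes the three-column layout shown above (disk, PVID, volume group); disks_in_vg is a hypothetical helper name, and the sample data is copied from the post-reboot listing.

```shell
#!/bin/sh
# Sketch: list the disks that belong to a given volume group,
# reading lspv-style output (disk, PVID, volume group) on stdin.
# disks_in_vg is a hypothetical helper name.
disks_in_vg() {
    awk -v vg="$1" '$3 == vg { printf "%s%s", sep, $1; sep = " " } END { print "" }'
}

# Sample data copied from the post-reboot lspv listing.
sample='hdisk0 000f261d90bf6ea0 old_rootvg
hdisk1 000f261dae86d104 old_rootvg
hdisk2 000f261db52d4d95 rootvg
hdisk3 000f261db52d4ca6 rootvg
hdisk4 000f018d07d4f412 None'

echo "$sample" | disks_in_vg rootvg        # prints: hdisk2 hdisk3
echo "$sample" | disks_in_vg old_rootvg    # prints: hdisk0 hdisk1
```

On a live server you would pipe the real command in instead of the sample, for example lspv | disks_in_vg rootvg.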
Now, you have several options. My preference is to leave hdisk0 and hdisk1 alone with the old maintenance levels in case you need to fall back on them.
Let's assume that after the reboot your applications aren't working well with the latest maintenance. The previous support model suggests that you need to get the mksysb backup taken prior to your upgrade and begin a restore process. This could take two hours or more, with the hope that the tape image was good. The new support model with Alternate Disk Installation says to change your bootlist back to hdisk0 and hdisk1 and to reboot the server. At some future point, when you decide the maintenance is good and you don't need to fall back, you can clone the latest maintenance residing on hdisk2/3 back to hdisk0/1.
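The fallback itself is only a couple of commands. As a hedged sketch, the helper below (fallback_plan, a hypothetical name) prints the sequence for review instead of running it; the use of shutdown -Fr for the restart is my assumption, since the text only says to reboot.

```shell
#!/bin/sh
# Dry-run sketch of the fallback: point the bootlist at the old disks, reboot.
# fallback_plan is a hypothetical helper; it only prints the commands.
fallback_plan() {
    echo "bootlist -m normal $*"
    echo "bootlist -m normal -o"   # display the list to confirm the change
    echo "shutdown -Fr"            # assumed restart command: fast shutdown, then reboot
}

fallback_plan hdisk0 hdisk1
```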
Cloning Back to hdisk0/1
To complete the cloning of hdisk2/3 back to hdisk0/1, you must issue the following commands:
- alt_disk_install -W hdisk0 hdisk1 -- Wakes up the old_rootvg
- alt_disk_install -S -- Puts the old_rootvg back to sleep
- alt_disk_install -X old_rootvg -- Removes the old_rootvg volume group name associated with hdisk0/1 from the ODM and assigns them a value of "none", which will allow the cloning to recur cleanly.
- smitty alt_clone -- Reclone back to hdisk0/1 using the previous example.
old_rootvg
volume group name from the ODM.
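Because these steps must run in order, a small wrapper that stops at the first failure can be handy. This is a sketch only: run_step and clone_back are hypothetical helper names, and DRY_RUN defaults to 1 here so that the script prints the commands from the list above without touching the system.

```shell
#!/bin/sh
# Sketch: run the clone-back steps in order, stopping at the first failure.
# DRY_RUN=1 (the default here) only prints each command.
# run_step and clone_back are hypothetical helper names.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "1" ]; then
        return 0
    fi
    "$@"
}

clone_back() {
    run_step alt_disk_install -W "$@"        || return 1   # wake up old_rootvg
    run_step alt_disk_install -S             || return 1   # put it back to sleep
    run_step alt_disk_install -X old_rootvg  || return 1   # drop the name from the ODM
    echo "Next: smitty alt_clone with target disks: $*"
}

clone_back hdisk0 hdisk1
```

The final re-clone is left as a reminder rather than automated, since the SMIT panel is where you choose the update options.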
Other Uses for alt_disk_install
Some other items that alt_disk_install may be helpful with are:
- Nightly backup of your system -- Using alt_disk_install, you can back up your system nightly (or at whatever frequency is appropriate) without having to manage mksysb tapes. If you suffer some type of rootvg corruption, either major or minor, you can restore using the data on the cloned drives.
- mksysb Images -- The alt_disk_install command can be used to install images (AIX 4.3 or later) onto AIX 4.1 and later versions.
- You can also use alt_disk_install for recovery of corrupted files in rootvg and to reduce the size of logical volumes in rootvg, as described in the following sections.
Recovering Corrupted Files in rootvg
If you have a problem in rootvg where a file or a few files are corrupted or inadvertently deleted, you can wake up the cloned copy of the rootvg and copy those deleted or corrupted files back to the primary rootvg while the server is up and running.
In this example, you are booted against hdisk0/1 and have recently cloned the system to hdisk2/3. To access the cloned copy of the rootvg while the server is up and running, complete the following:
- alt_disk_install -W hdisk2 hdisk3 -- Wakes up the cloned copy:

      root@aknimp1:/> alt_disk_install -W hdisk2 hdisk3
      Waking up altinst_rootvg volume group ...
      Replaying log for /dev/alt_hd4.

- From a df -k command, you will notice that the wake-up command has mounted the alternate rootvg logical volumes under mount points that carry the /alt_inst prefix:

      root@aknimp1:/> df -k
      Filesystem        1024-blocks     Free  %Used   Iused  %Iused  Mounted on
      /dev/hd4                49152     5608    89%    1226      5%  /
      /dev/hd2               753664     5056   100%   19966     11%  /usr
      /dev/hd9var             16384    14340    13%     222      6%  /var
      /dev/hd3                32768    30376     8%      98      2%  /tmp
      /dev/lvexport          131072   126772     4%      41      1%  /export
      /dev/lv01             4980736    94468    99%    4546      1%  /export/lpp_source
      /dev/lv02              917504   448868    52%   29468     13%  /export/spot
      /dev/lvmksysb        15204352  3381328    78%      31      1%  /export/mksysb
      /dev/lvadmin           131072   126868     4%      25      1%  /admin
      /dev/hd1                16384    15820     4%      20      1%  /home
      /dev/lvadsm             16384       56   100%      21      1%  /var/adsm
      /dev/alt_hd4            49152     5704    89%    1192      5%  /alt_inst
      /dev/alt_lvadmin       131072   126868     4%      25      1%  /alt_inst/admin
      /dev/alt_hd1            16384    15820     4%      20      1%  /alt_inst/home
      /dev/alt_hd3            32768    30376     8%      98      2%  /alt_inst/tmp
      /dev/alt_hd2           753664     5056   100%   19966     11%  /alt_inst/usr
      /dev/alt_hd9var         16384    14380    13%     219      6%  /alt_inst/var
      /dev/alt_lvadsm         16384     1848    89%      20      1%  /alt_inst/var/adsm

- Copy the corrupted files from the appropriate alt_inst logical volume/filesystem. In this case, I corrupted my /etc/hosts file, so I will issue the following command to restore it from my latest cloned backup:

      cp /alt_inst/etc/hosts /etc/hosts

- When you have restored the required files, put the altinst_rootvg back to sleep, which will unmount the /alt_inst logical volumes/filesystems, by issuing:

      alt_disk_install -S
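When more than one file needs to come back, it helps to copy only what actually differs. The following is a minimal sketch, not a finished tool: restore_from_clone is a hypothetical helper, ALT_ROOT defaults to the /alt_inst prefix shown in the df -k output, and DEST_ROOT exists only so that you can try the function against scratch directories before pointing it at the live system.

```shell
#!/bin/sh
# Sketch: copy a file back from the woken-up clone only if it differs.
# restore_from_clone is a hypothetical helper. ALT_ROOT defaults to /alt_inst,
# the mount prefix of the cloned filesystems; DEST_ROOT is empty for the live
# system and can be set to a scratch directory for a trial run.
ALT_ROOT=${ALT_ROOT:-/alt_inst}
DEST_ROOT=${DEST_ROOT:-}

restore_from_clone() {
    src="$ALT_ROOT$1"
    dst="$DEST_ROOT$1"
    if [ ! -f "$src" ]; then
        echo "missing in clone: $1"
        return 1
    fi
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        echo "unchanged: $1"
        return 0
    fi
    cp -p "$src" "$dst" && echo "restored: $1"
}

# On the server, between the wake-up and sleep steps:
#   restore_from_clone /etc/hosts
```

The cmp check keeps the function from rewriting files that are already identical, which makes it safe to run over a list of suspect files.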
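Reducing the Size of Logical Volumes in rootvg
Have you ever needed to reduce the size of a logical volume in rootvg? It took a tape restore of the system to complete. Now, you can complete that reduction within a simple cloning process. The steps to complete that process are as follows: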
- Issue a mkszfile command to create the /image.data file.
- Edit the /image.data file and specify SHRINK=yes in the logical_volume_policy stanza:

      image_data:
          IMAGE_TYPE= bff
          DATE_TIME= Tue Oct 3 10:29:55 CDT 2000
          UNAME_INFO= AIX aknimp1 3 4 000F261D4C00
          PRODUCT_TAPE= no
          USERVG_LIST= nimresvg
          OSLEVEL= 4.3.3.10

      logical_volume_policy:
          SHRINK= yes
          EXACT_FIT= no

      ils_data:
          LANG= en_US

- Clone the rootvg to hdisk2 and hdisk3, specifying your customized /image.data file by issuing one of the following commands: smitty alt_clone (remember to specify the location of your image.data file on the image.data file prompt) or alt_disk_install -i /image.data -B -C hdisk2 hdisk3 (from the command line).
- After the completion of the cloning operation, wake up the altinst_rootvg by issuing: alt_disk_install -W hdisk2 hdisk3
- Review your df -k output and compare the primary logical volume sizing to their /alt_inst counterparts.
- If you are satisfied with the sizing reduction, change your bootlist (bootlist -m normal hdisk2 hdisk3) and reboot.
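The edit to image.data in the second step can be scripted so that it is repeatable. This is a sketch under the assumption that the file looks like the excerpt above, with stanza names at the start of a line and ending in a colon; set_shrink is a hypothetical helper name. It changes the SHRINK field only inside the logical_volume_policy stanza and leaves every other line alone.

```shell
#!/bin/sh
# Sketch: set SHRINK= yes, but only inside the logical_volume_policy stanza.
# Reads an image.data-style file on stdin and writes the edited copy to stdout.
# set_shrink is a hypothetical helper name.
set_shrink() {
    awk '
        # Remember the current stanza, e.g. "image_data:".
        /^[A-Za-z0-9_]+:[ \t]*$/ { stanza = $1 }
        stanza == "logical_volume_policy:" && $1 ~ /^SHRINK=/ {
            sub(/SHRINK=.*/, "SHRINK= yes")
        }
        { print }
    '
}

# Small sample in the same shape as the excerpt.
sample='image_data:
    IMAGE_TYPE= bff
logical_volume_policy:
    SHRINK= no
    EXACT_FIT= no
ils_data:
    LANG= en_US'

echo "$sample" | set_shrink
```

On a server you would keep a copy of the original first, for example cp /image.data /image.data.orig, and then run set_shrink < /image.data.orig > /image.data.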
Network Installation Manager (NIM)
NIM can use mksysb images for backup and recovery, installation of new servers (cloning), and the re-installation of existing servers in case of a disaster.
There's a great deal of functionality provided by NIM. I recommend reviewing the usage guide to see what NIM features could benefit your environment. I also recommend a good Redbook from IBM, NIM: From A to Z in AIX 4.3 (SG24-5524-00), which was published in February 2000.
I won't cover the specifics of setting up the NIM master and the corresponding NIM client configurations; it is not an overly complicated process. However, it will require someone with NIM-specific knowledge to lay out the functional NIM environment. If you support SP complexes, you have already had a fair amount of exposure to NIM even though it is buried one layer below PSSP.
One key feature of NIM that will help manage a group of servers concurrently is the Machine Group definition. Within NIM, you can operate as easily on a single machine as you can a group of machines. For instance, we have defined several machine groups within our NIM master environment. These definitions allow us to operate on a group of like servers concurrently.
How Does It Integrate with Alternate Disk Installation?
NIM knows how to fully exploit Alternate Disk Installation. For example, look at the initial clone and update_all operation. Let's say you want to use NIM to extend the model (instead of upgrading the maintenance level on a single server) and you want to complete this operation on ten Lotus Notes servers that are similarly configured and are defined in a Notes machine group within NIM. From SMIT on the NIM master, issue the following fast path and you will see this panel:
    smitty nim_alt_clone

                  Clone the rootvg to an Alternate Disk

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
    * Target Machine / Group to Install               [NOTES]              +
    * Target Disk(s) to install                       [hdisk2 hdisk3]
      Phase to execute                                 all                 +
      IMAGE_DATA resource                             []                   +/
      EXCLUDE_FILES resource                          []                   +/
      (leave blank to include all files in backup)

      BUNDLE to install                               []                   +
        -OR-
      Fileset(s) to install                           []

      FIX_BUNDLE to install                           []                   +
        -OR-
      FIXES to install                                [update_all]

      LPP_SOURCE                                      [aix433_lppsource]   +
      (required if filesets, bundles or fixes used)

      installp Flags
      COMMIT software updates?                         yes                 +
      SAVE replaced files?                             no                  +
      AUTOMATICALLY install requisite software?        yes                 +
      EXTEND filesystems if space needed?              yes                 +
      OVERWRITE same or newer versions?                no                  +
      VERIFY install and check file sizes?             no                  +

      Customization SCRIPT resource                   []                   +/

      Set bootlist to boot from this disk on next reboot  yes              +
      Reboot when complete?                            no                  +
      Verbose output?                                  no                  +
      Debug output?                                    no                  +

      Group controls (only valid for group targets):
      Number of concurrent operations                 []                   #
      Time limit (hours)                              []                   #

    F1=Help       F2=Refresh    F3=Cancel     F4=List
    F5=Reset      F6=Command    F7=Edit       F8=Image
    F9=Shell      F10=Exit      Enter=Do
In this example, you would cause every server defined in the Notes machine group to begin a process to clone itself from hdisk0/1 to hdisk2/3. At the completion of the cloning operation, NIM would then NFS-mount the aix433_lppsource resource (in this case, it's the AIX 4.3.3 lppsource filesystem, which includes the 4.3.3.06 maintenance) and apply it to the newly cloned hdisk2/3 on each of these servers. This also instructs NIM to change the bootlist on each of these servers as a part of the operation but does not cause an immediate reboot. I recommend, however, using NIM to schedule a reboot of all these servers during the maintenance window.
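For those who prefer the command line on the NIM master, the panel corresponds to a single nim operation. Treat the following as a sketch rather than a reference: nim_clone_cmd is a hypothetical helper, and the attribute names (source, disk, fixes, lpp_source) are assumptions that you should verify in the NIM documentation for your release. The function only prints the command.

```shell
#!/bin/sh
# Dry-run sketch: print the NIM operation that matches the SMIT panel.
# nim_clone_cmd is a hypothetical helper. The attribute names (source, disk,
# fixes, lpp_source) are assumptions to verify in the NIM documentation.
nim_clone_cmd() {
    target=$1       # machine or machine group, e.g. NOTES
    lpp=$2          # lpp_source resource name, e.g. aix433_lppsource
    shift 2         # the remaining arguments are the target disks
    echo "nim -o alt_disk_install -a source=rootvg -a disk='$*' -a fixes=update_all -a lpp_source=$lpp $target"
}

nim_clone_cmd NOTES aix433_lppsource hdisk2 hdisk3
```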
All of this work, including the cloning and upgrading of the maintenance level, can be completed during the day without affecting the running application (e.g., Notes). For the previous support model, this same upgrade would have taken about 2 hours per server plus reboot time to complete during an application maintenance window, generally in the middle of the night. If a single person worked to complete this process, this could have taken about 25 hours spread across multiple weekends to complete. With NIM and Alternate Disk Installation, this upgrade outage can be reduced to the time to reboot these 10 servers concurrently (or about 30 minutes, in our case). Note that your time may vary depending on speed of network, number of filesets being updated, time to reboot, and problems encountered.
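The arithmetic behind that estimate is easy to reproduce for your own environment. In this sketch the per-server reboot time of 30 minutes is my assumption, chosen because it makes the serial total match the "about 25 hours" figure; substitute your own numbers.

```shell
#!/bin/sh
# Sketch of the arithmetic behind the estimate. The 30-minute reboot figure
# per server is an assumption chosen to match the "about 25 hours" total.
servers=10
upgrade_min=120     # about 2 hours of upgrade work per server
reboot_min=30       # assumed reboot time

old_total=$(( servers * (upgrade_min + reboot_min) ))   # one server at a time, off-hours
new_total=$reboot_min                                   # all servers reboot concurrently

echo "old model: $(( old_total / 60 )) hours off-hours"
echo "new model: $new_total minutes off-hours"
```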
Figure 2 shows the process using NIM/Machine Groups and Alternate Disk Installation. First, you instruct the NIM master to have each of the servers in the defined machine group clone hdisk0/1 to hdisk2/3 (depicted in red). Then, NIM will NFS-mount the appropriate LPPSOURCE filesystem containing the AIX 4.3.3.06 maintenance level and apply that maintenance to the newly cloned drives (operation in green). Again, this process happens concurrently on all servers in the defined NIM machine group without affecting the running applications.
Conclusion
My team is in the process of rolling out this methodology change. I think we can significantly reduce the amount of time spent in support of our current AIX standalone infrastructure. I also think Alternate Disk Installation and NIM can help you better manage your infrastructure and provide some consistency to your installation, upgrade, maintenance, and build procedures. In conclusion, I hope the above discussion will help you significantly reduce the amount of off-hours time associated with maintenance or fileset upgrades within AIX.
Jeff Marsh is the Systems Advisor to the UNIX Server Team working at American Century Investments, a premier investment manager serving nearly two million individual and institutional investors.