Thursday, July 24, 2008

Extract troubleshooting info from Windows XP BSOD error messages

Microsoft Windows XP systems are notorious for crashing for any number of reasons and in a number of ways. Some of these crashes are mild and can easily be overcome simply by closing a non-responding application or by rebooting the system. However, others are more serious and can bring the entire system to its knees. Microsoft calls these types of crashes “Stop errors” because the operating system stops responding. When a Stop error occurs, the GUI is replaced by a DOS-like blue screen with a cryptic error message followed by a code number. This screen is affectionately referred to as the Blue Screen Of Death, or BSOD for short.

Common BSODs in Windows XP

Now that you have a good idea of how to dissect a BSOD and pull out the relevant pieces of information from all the gibberish on the screen, let’s look at some of the more common BSODs in Windows XP. I’ll cover only a few of the BSOD conditions here; there are many other possible Stop errors. For each BSOD I discuss, I’ll provide a link to an article on the Microsoft Knowledge Base that covers that particular Stop error. (Since more than one article might address a Stop error, you may want to search the Knowledge Base if you discover that you need more information.)

STOP: 0x0000000A
IRQL_NOT_LESS_OR_EQUAL

This Stop error, which can be caused by either software or hardware, indicates that a kernel-mode process or driver attempted to access a memory location it did not have permission to access or a memory location that exists at a kernel interrupt request level (IRQL) that was too high. A kernel-mode process can access only other processes that have an IRQL that’s equal to or lower than its own.

Troubleshooting a Stop 0x0000000A error in Windows XP

STOP: 0x0000001E
KMODE_EXCEPTION_NOT_HANDLED

This Stop error indicates that the Windows XP kernel detected an illegal or unknown processor instruction. The problems that cause this Stop error can be either software or hardware related and result from invalid memory and access violations, which are intercepted by Windows’ default error handler if error-handling routines are not present in the code itself.

Possible Resolutions to STOP 0x0A, 0x01E, and 0x50 Errors

STOP: 0x00000050
PAGE_FAULT_IN_NONPAGED_AREA

This Stop error indicates that requested data was not in memory. The system generates an exception error when using a reference to an invalid system memory address. Defective memory (including main memory, L2 RAM cache, video RAM) or incompatible software (including remote control and antivirus software) might cause this Stop error.

Possible Resolutions to STOP 0x0A, 0x01E, and 0x50 Errors

STOP: 0x0000007B
INACCESSIBLE_BOOT_DEVICE

This Stop error indicates that Windows XP has lost access to the system partition or boot volume during the startup process. Installing incorrect device drivers when installing or upgrading storage adapter hardware typically causes this Stop error. This error could also indicate a possible virus infection.

Troubleshooting Stop 0x0000007B or “0x4,0,0,0” Error

STOP: 0x0000007F
UNEXPECTED_KERNEL_MODE_TRAP

This Stop error indicates a hardware problem resulting from mismatched memory, defective memory, a malfunctioning CPU, or a fan failure that’s causing overheating.

General causes of “STOP 0x0000007F” errors

STOP: 0x0000009F
DRIVER_POWER_STATE_FAILURE

This Stop error indicates that a driver is in an inconsistent or invalid power state. This Stop error typically occurs during events that involve power state transitions, such as shutting down, or moving in or out of standby or hibernate mode.

Troubleshooting a Stop 0x9F Error in Windows XP

STOP: 0x000000D1
DRIVER_IRQL_NOT_LESS_OR_EQUAL

This Stop error indicates that the system attempted to access pageable memory using a kernel process IRQL that was too high. The most typical cause is a bad device driver (one that uses improper addresses). It can also be caused by faulty or mismatched RAM or a damaged pagefile.

Error Message with RAM Problems or Damaged Virtual Memory Manager

STOP: 0x000000EA
THREAD_STUCK_IN_DEVICE_DRIVER

This Stop error indicates that a device driver problem is causing the system to pause indefinitely. Typically, this problem is caused by a display driver waiting for the video hardware to enter an idle state. This might indicate a hardware problem with the video adapter or a faulty video driver.

Error message: STOP 0x000000EA THREAD_STUCK_IN_DEVICE_DRIVER

STOP: 0x00000024
NTFS_FILE_SYSTEM

This Stop error indicates that a problem occurred within Ntfs.sys, the driver file that allows the system to read and write to drives formatted with the NTFS file system. (A similar Stop message, 0x00000023, exists for the file allocation table [FAT16 or FAT32] file systems.)

Troubleshooting Stop 0x24 or NTFS_FILE_SYSTEM Error Messages

STOP: 0xC0000218
UNKNOWN_HARD_ERROR

This Stop error indicates that a necessary registry hive file could not be loaded. The file may be corrupt or missing. The registry file may have been corrupted due to hard disk corruption or some other hardware problem. A driver may have corrupted the registry data while loading into memory or the memory where the registry is loading may have a parity error.

How to Troubleshoot a Stop 0xC0000218 Error Message

STOP: 0xC0000221
STATUS_IMAGE_CHECKSUM_MISMATCH

This Stop message indicates driver, system file, or disk corruption problems (such as a damaged paging file). Faulty memory hardware can also cause this Stop message to appear.

“STOP: C0000221 unknown hard error” or “STOP: C0000221 STATUS_IMAGE_CHECKSUM_MISMATCH” error message occurs
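The Knowledge Base titles above write the same Stop code in several ways (0x0000000A, 0x0A, or a bare C0000221), which makes searching awkward. As a rough sketch, the Python below normalizes a code found in a line of text to the eight-digit form and looks up its symbolic name. The table contains only the codes listed in this post, and the names `STOP_CODES` and `describe_stop` are made up for this illustration.

```python
import re

# Stop codes and symbolic names exactly as listed in this post.
STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x0000009F: "DRIVER_POWER_STATE_FAILURE",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000EA: "THREAD_STUCK_IN_DEVICE_DRIVER",
    0x00000024: "NTFS_FILE_SYSTEM",
    0xC0000218: "UNKNOWN_HARD_ERROR",
    0xC0000221: "STATUS_IMAGE_CHECKSUM_MISMATCH",
}

# Accepts "STOP: 0x0000000A", "Stop 0x9F", or a bare eight-digit "STOP: C0000221".
# A bare value must have all eight digits so ordinary words are not mistaken for hex.
_STOP_RE = re.compile(
    r"stop:?\s*(?:0x([0-9a-f]{1,8})|([0-9a-f]{8}))\b", re.IGNORECASE
)


def describe_stop(text):
    """Return (eight_digit_code, name) for the first Stop code in text, or None."""
    match = _STOP_RE.search(text)
    if match is None:
        return None
    value = int(match.group(1) or match.group(2), 16)
    name = STOP_CODES.get(value, "not covered in this post")
    return f"0x{value:08X}", name
```

For example, `describe_stop("Troubleshooting a Stop 0x9F Error")` gives the padded code together with DRIVER_POWER_STATE_FAILURE, and a code that is not in the table comes back with the placeholder text rather than a guess.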

Note: This post is kept on this blog for personal reference and was taken from the TechRepublic website.

-------------- End of Document -----------------

Tags: Windows XP, Windows Server 2003

Published Date: 20080724

Wednesday, July 9, 2008

How to rebuild the SYSVOL tree when none exists in Active Directory

A Windows admin had trouble promoting the second DC in a domain. AD replication was working and DNS was healthy, but FRS was not: there was no SYSVOL or Netlogon share and no SYSVOL tree on the second domain controller. The FRS event log was logging Event ID 13508 events but no 13509 events.

When trying to force SYSVOL replication using KB 290762 -- setting the BURFLAGS value on the PDC to D4 and on the other DC to D2 -- something went wrong and the SYSVOL tree on the primary domain controller was wiped out. It was as if the empty SYSVOL had been replicated to the PDC instead of the other way around. So there was no SYSVOL tree on either DC.

You could start from scratch, but that is not a good political decision, and you would not have a root cause to justify it.

The solution is to create the SYSVOL tree, including junction points and proper ACLs. Of course, you will also need to create the default domain policy and the default domain controller policy.

There is a decent article on the Microsoft Help and Support site, KB 315457 How to rebuild the SYSVOL tree and its content in a domain, but like many articles of this nature, it tries to cover all the bases.

In addition, the Microsoft KB assumes you have a SYSVOL tree somewhere in the domain -- which we did not have -- so we needed to generate a new default domain policy and default domain controller policy. You might also run into a problem with other policies that have objects in AD but do not exist in SYSVOL.

I would recommend referring to the KB for details, but this is how you solve the problem of no SYSVOL on any DCs.

Step 1: Stop the FRS service on both DCs and create the SYSVOL tree on the PDC. This is pretty basic. Use Windows Explorer or a command prompt. I used a good DC I had in a lab as a guide. The tree looked like this:

    SYSVOL
    • Domain
      • DO_NOT_REMOVE_NtFrs_PreInstall_Directory
      • Policies
      • Scripts
    • Staging
      • Domain
    • Staging Areas
      • Corp.net
    • SYSVOL
      • Corp.net
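If you would rather script Step 1 than click through Windows Explorer, something like the following Python sketch could create the directories. It is an illustration, not what we ran. The function name `build_sysvol_tree` and its `root` parameter are made up, and the Staging\Domain directory and the "Staging Areas" spelling are taken from the paths used in the junction-point step. The two Corp.net entries are junction points, so the sketch only reports their paths and leaves them for linkd. Whether linkd wants those paths to exist beforehand is not something this sketch settles.

```python
from pathlib import Path


def build_sysvol_tree(root, domain_name):
    """Create the real SYSVOL directories under root.

    Returns (real_dirs, junction_paths). The junction paths are NOT created
    here; they are only listed so they can be handed to the linkd step.
    """
    sysvol = Path(root) / "SYSVOL"
    real_dirs = [
        sysvol / "Domain" / "DO_NOT_REMOVE_NtFrs_PreInstall_Directory",
        sysvol / "Domain" / "Policies",
        sysvol / "Domain" / "Scripts",
        sysvol / "Staging" / "Domain",
        sysvol / "Staging Areas",
        sysvol / "SYSVOL",
    ]
    for directory in real_dirs:
        # parents=True also creates SYSVOL, Domain and Staging along the way.
        directory.mkdir(parents=True, exist_ok=True)

    junction_paths = [
        sysvol / "SYSVOL" / domain_name,
        sysvol / "Staging Areas" / domain_name,
    ]
    return real_dirs, junction_paths
```

On a DC the `root` would be whatever %systemroot% expands to. Trying it against a scratch directory first is an easy way to compare the result with a known-good DC.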

Step 2: Set the ACLs. Just leave the default ACLs on all directories except the DO_NOT_REMOVE_NtFrs_PreInstall_Directory. Again, looking at my lab domain, we removed all users and groups except domain administrators and System, and defined both of them to have "Special Permissions" only. I also set the "DO_NOT_REMOVE" directory attributes to Hidden and Read-only.

Step 3: Create the junction points. Remember that junction points connect a "real" directory to a "mirrored" directory. \SYSVOL\Domain is the real (Source) directory connected to \SYSVOL\SYSVOL\Corp.net, a junction point. \SYSVOL\Staging\Domain is the real (Source) directory connected to \SYSVOL\Staging Areas\Corp.net, the second junction point.

KB 315457 shows how to determine the actual source directory if you need that information, but here is what we did:

Using the linkd command,

linkd "%systemroot%\SYSVOL\SYSVOL\Corp.net" %SYSTEMROOT%\SYSVOL\DOMAIN

linkd "%systemroot%\Sysvol\staging Areas\Corp.net" %systemroot%\sysvol\Staging\Domain
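For a domain with a different name, the same two commands can be generated rather than retyped. This is only a sketch: `linkd_commands` is a made-up helper that builds the command strings in the same argument order as above (junction first, real directory second) and does not run anything. Quoting the second path as well is my own precaution for paths with spaces.

```python
def linkd_commands(domain_name, system_root="%systemroot%"):
    """Build the two linkd command lines for the SYSVOL junction points.

    Nothing is executed; the strings are meant to be reviewed and then
    pasted into a command prompt on the DC.
    """
    sysvol = f"{system_root}\\SYSVOL"
    pairs = [
        # (junction point, real source directory)
        (f"{sysvol}\\SYSVOL\\{domain_name}", f"{sysvol}\\Domain"),
        (f"{sysvol}\\Staging Areas\\{domain_name}", f"{sysvol}\\Staging\\Domain"),
    ]
    return [f'linkd "{junction}" "{source}"' for junction, source in pairs]


if __name__ == "__main__":
    for command in linkd_commands("Corp.net"):
        print(command)
```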

Step 4: Rebuild default domain policies. Using the DCGPOFix tool, available from Microsoft's download site, this was pretty easy. Just run the tool and it asks if you want to create a new default domain policy (answer yes) and if you want to create a new default domain controllers policy (answer yes). At this point, we double-checked to make sure the SYSVOL tree and the policies were all correct.

Step 5: Replicate SYSVOL. We had already found that using KB 290762 wiped out SYSVOL on the PDC, so we didn't want to do that again. Because we only had two DCs and because the file replication service had been stopped, it seemed logical that starting the FRS -- first on the PDC and then the other DC -- would jump-start FRS. SYSVOL was replicated, and we had the SYSVOL share.

This next part isn't really a step. It's something we ran into that you should be aware of. After Step 5, SYSVOL was shared but not NETLOGON. When SYSVOL was deleted from the PDC, it also deleted two custom Group Policies. When SYSVOL was replicated after the rebuild, errors were logged in the event log complaining about these two policies. Using ADSIEdit, we went to Corp.net\system\Policies and deleted the objects for the two deleted policies. Soon, the Netlogon share appeared, and the 1704 event in the application log validated replication of policy.

After doing an operation like this, it's a good idea to check the event logs for related errors and create a sample GPO and see if it replicates.
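One check worth automating is the symptom that started all this: 13508 events in the FRS log with no 13509 after them. The sketch below assumes you have already exported the FRS event IDs in chronological order by some means; how you export them is up to you and is not shown. The function name is hypothetical.

```python
def frs_replication_unresolved(event_ids):
    """Return True if the latest 13508 has no 13509 logged after it.

    event_ids: FRS event IDs in chronological order (oldest first).
    """
    unresolved = False
    for event_id in event_ids:
        if event_id == 13508:
            unresolved = True    # FRS reported trouble with a partner
        elif event_id == 13509:
            unresolved = False   # a later 13509 clears the earlier warning
    return unresolved
```

A result of True after the rebuild would suggest FRS is still stuck, which is a reason to look at the log by hand before trusting the new SYSVOL.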

------------------- End of Document -----------------

Tags: Windows Server 2003

Published Date: 20080709