Tuesday, May 18, 2010

Violate Best Practices at your own risk

I learned something new in one of those crazy, all-night "the sky is falling" scenarios that turn out to be a stubborn admin shooting himself in the foot. He chose to ignore the best practice that states: put users into global groups, and then global groups into...well...whatever is necessary.

He decided to put users into universal groups in another domain, even though all the users were from one domain and should have simply been put into a global group in their own domain.  The reason? Politics and stubbornness.

You see, we were forced to create some additional domains against our best advice. And then, again, despite our advice to put only centralized computer resources in those domains (a la the old NT resource domain model), they decided to put security groups in those domains as well. Because they wanted to.

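That made the groups Universal, since a global group can only contain members from its own domain, and these groups in one domain had to hold users from another.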

On the surface, that seems fine...except for the following:
Scenario Setup
1) All the users were in domain X
2) The computers were in domain Y
3) The security groups were in domain Y (for no reason except politics)
4) The OUs containing the users in domain X were being renamed and reorganized
5) All the DCs in domain X and domain Y were GCs, except two per domain: one for backups (the DIT is smaller) and one for the Infrastructure Master
6) All the DCs that are GCs got the message about the OU changes
7) But...the non-GC DCs hold phantom records for the members of the Universal groups, and those only get updated about every two days...so...the groups broke because, for some inexplicable reason, the servers consuming the groups fell in love with the Infrastructure Master role holder

Ouch.

Phantom Records
Most of us tend to forget about those lovely little phantom records. See Microsoft KB 248047 for details. Basically, non-GC DCs have no direct knowledge of objects in other trusted domains, because they don't hold a copy of the global catalog. So when a group, such as a universal group, contains accounts from another trusted domain, Active Directory inserts a phantom object on those DCs for each of these cross-domain group-to-user references.

Phantom records only contain:
  • The distinguished name of the object
  • The GUID of the object
  • The SID of the object
When a user is added to a universal group in another domain, for example, any non-GC DCs add one of these phantom objects representing the remote user. That way, they can "intelligently" display and identify the user when someone peeks into the group, without requiring access to a GC.
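To make the three fields listed above concrete, here is a minimal Python sketch of a phantom record. It is only an illustrative model, not a real Active Directory API: the class name, the helper, and the sample user (DN, GUID, SID) are all made up for the example.

```python
from dataclasses import dataclass
import uuid


@dataclass
class PhantomRecord:
    """Illustrative model of a phantom: just a DN, a GUID, and a SID."""
    distinguished_name: str  # changes when the user or a parent OU is renamed
    guid: uuid.UUID          # stays the same for the life of the object
    sid: str                 # stays the same as long as the user stays in its domain


def make_phantom(user: dict) -> PhantomRecord:
    """Copy the three identifying fields from a (hypothetical) remote user."""
    return PhantomRecord(user["dn"], user["guid"], user["sid"])


# Hypothetical user in domain X; every value here is invented.
user = {
    "dn": "CN=Jane Doe,OU=Sales,DC=x,DC=example,DC=com",
    "guid": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "sid": "S-1-5-21-1111111111-2222222222-3333333333-1105",
}
phantom = make_phantom(user)
print(phantom.distinguished_name)
```

The point of the sketch is that the DN is a copy taken at one moment in time. Nothing in the record itself notices when the real user's DN changes later.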

So when these folks renamed the OUs containing the users, the distinguished name for the user object changed. And access to the systems using those security groups containing the bad user DNs stopped.

And we had an all-night call. People tried to force replication. They tried creating a new group and adding the users--but lo and behold--the users added to the new group showed the same bad DNs on the non-GC DCs!

It had nothing to do with replication. Replication was working fine. Those phantom records were to blame.  Well, no. Not true. The real cause was the decision to put groups, which would normally be global groups in domain X (per best practice), in domain Y. This made them universal and yada, yada, yada.

You see, non-GCs only rescan their phantom objects and look up and correct the DNs about every two days, because it is a "labor"-intensive operation. If they did it more frequently, a scan might not even finish before the next one had to start. And then there's all that nasty WAN traffic and DC non-responsiveness while it chugs through this process.
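The rescan described above can be pictured with a small Python simulation. This is a rough sketch under my own assumptions, not how the directory service is implemented: the function names, the dict-based "GC lookup", and the sample DNs are hypothetical, and the DN handling is naive (it splits on commas and ignores escaped commas).

```python
def rename_ou(dn: str, old_ou: str, new_ou: str) -> str:
    """Return the DN with one OU component renamed (naive comma split)."""
    return ",".join(new_ou if part == old_ou else part for part in dn.split(","))


def phantom_scan(phantoms: list, gc_lookup: dict) -> int:
    """Compare each phantom's stored DN with the current DN found by GUID.

    phantoms:  list of dicts with "guid" and "dn" keys (the stale local copies)
    gc_lookup: dict mapping GUID -> current DN (stands in for asking a GC)
    Returns the number of phantoms that were corrected.
    """
    fixed = 0
    for phantom in phantoms:
        current_dn = gc_lookup.get(phantom["guid"])
        if current_dn is not None and current_dn != phantom["dn"]:
            phantom["dn"] = current_dn  # the GUID matched, so only the DN is refreshed
            fixed += 1
    return fixed


# Hypothetical data: one phantom taken before the OU rename.
stale = [{"guid": "guid-1", "dn": "CN=Jane Doe,OU=Sales,DC=x,DC=example,DC=com"}]
current = {"guid-1": rename_ou(stale[0]["dn"], "OU=Sales", "OU=Field Sales")}

print(phantom_scan(stale, current))  # number of stale DNs corrected by this pass
```

Between scans the phantom keeps the old DN, which is the window in which our groups were broken. Every extra rename reopens that window until the next pass, and a real scan has to do this lookup for every phantom it holds, which is why it is not something to run casually.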

Now, you can kick off a phantom scan manually (which we did), but you don't want to do so lightly. And an hour after I kicked off the phantom scan, someone actually renamed the OUs again, anyway, thereby causing the problem all over again. Nice. Thanks.

The interesting thing to me is that the folks managing this did not want to correct it by simply making global groups in the appropriate domain. Instead, they wanted directions on kicking off phantom scans themselves, whenever they felt it necessary. Like every five minutes or so.

Let's just say that's why they are not Enterprise Admins.

So, sorry. If you shoot yourself in the foot, you have to expect to bleed. And don't expect me to load your gun for you.