Friday, September 30, 2011

Outlook and Global Catalogs

We had an interesting issue recently where Windows clients at a branch office site were getting global catalog services from a domain controller in a remote site. In our environment, we're running Active Directory and at the moment are still at the Windows Server 2003 forest functional level, although over 90% of our DCs are Windows Server 2008 R2. We have close to 300 sites arranged in your traditional hub-and-spoke replication topology, and we've put two domain controllers at 80% of the sites to ensure that if one DC goes down there is still a DC/GC available locally.

Anyway, at this one site, the users sporadically went for GC services to a hub DC instead of one of their two local ones.

We checked all the usual suspects, including:
  • Was the client in a subnet that was inadvertently mapped to the wrong site, overlapped other mappings or not mapped at all in Active Directory Sites and Services? (That's the most common answer.)
  • Had an over-zealous admin hacked the registry and hardcoded the closest GC setting? (Microsoft KB 319206)
  • Were over-zealous network admins blocking ports again? (GC services answer on port 3268 and, for secure LDAP, on port 3269.)
All the normal answers were "no," and it was working most of the time, just not all the time.
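
For anyone who wants to script the first check, here is a minimal sketch in Python. It assumes you have already exported your subnet-to-site mappings from Active Directory Sites and Services into a plain dictionary; the subnets, site names and function names below are made up for illustration. It also assumes the most specific (longest prefix) subnet wins when a client address matches more than one mapping.

```python
import ipaddress

def site_for_client(client_ip, subnet_to_site):
    """Return the site of the most specific matching subnet, or None if unmapped."""
    ip = ipaddress.ip_address(client_ip)
    matches = []
    for cidr, site in subnet_to_site.items():
        net = ipaddress.ip_network(cidr)
        if ip in net:
            matches.append((net.prefixlen, site))
    if not matches:
        return None  # client subnet is not mapped to any site
    # Assumption: the longest prefix (most specific subnet) wins.
    return max(matches)[1]

def overlapping_subnets(subnet_to_site):
    """List pairs of overlapping subnets that are mapped to different sites."""
    nets = [(ipaddress.ip_network(c), s) for c, s in subnet_to_site.items()]
    overlaps = []
    for i, (a, site_a) in enumerate(nets):
        for b, site_b in nets[i + 1:]:
            if site_a != site_b and a.overlaps(b):
                overlaps.append((str(a), site_a, str(b), site_b))
    return overlaps

# Hypothetical mappings, not from our environment.
subnets = {
    "10.0.0.0/8": "Hub",
    "10.20.30.0/24": "Branch",
    "10.40.0.0/16": "Other",
}

print(site_for_client("10.20.30.15", subnets))
print(site_for_client("192.168.1.1", subnets))
for pair in overlapping_subnets(subnets):
    print(pair)
```

An overlap is not automatically a mistake (a catch-all supernet at the hub is a common design), so treat the overlap list as something to review rather than a list of errors.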

So I asked the site to do a network trace from a client the next time they had an issue when they opened Outlook. Bingo. You gotta love traces.

Looking at the trace, it took all of one minute to find the problem. Another minute to fix it.

I filtered the trace for DNS queries, and the first one I saw was a DNS query for LDAP. The query was sent to the local DC.
Good so far.
Except that it returned information about the two local DC/GCs plus the remote DC/GC. Hmmm.

I opened DNS and drilled down to _tcp.Sitename._sites.gc._msdcs.parent.domain for the site having the issues. Sure enough, there was an _ldap record registered for the remote DC there. That's why, when clients asked for LDAP servers in the site, they had a 1 in 3 chance of getting that remote DC.
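
If you want to audit this for more than one site, the check is easy to sketch in Python. This is only an illustration: it assumes you have already collected the target hostnames of the _ldap records for a site (from nslookup, a DNS export, or whatever you prefer), and the DC names and function names are hypothetical. The "1 in 3" figure assumes the records all carry the same priority and weight.

```python
def gc_srv_name(site, forest_root):
    """Build the site-specific GC locator record name for a site."""
    return f"_ldap._tcp.{site}._sites.gc._msdcs.{forest_root}"

def _normalize(host):
    # DNS names are case-insensitive and may carry a trailing dot.
    return host.rstrip(".").lower()

def find_stray_targets(srv_targets, local_dcs):
    """Return record targets that are not among the site's own DC/GCs."""
    local = {_normalize(h) for h in local_dcs}
    return [t for t in srv_targets if _normalize(t) not in local]

# Hypothetical data: two local DC/GCs plus one hub DC registered in the site.
targets = ["dc1.parent.domain.", "dc2.parent.domain.", "hubdc1.parent.domain."]
local_dcs = ["DC1.parent.domain", "DC2.parent.domain"]

strays = find_stray_targets(targets, local_dcs)
print("Record to check:", gc_srv_name("Sitename", "parent.domain"))
print("Unexpected targets:", strays)
print(f"Chance of a remote GC per lookup: {len(strays)} in {len(targets)}")
```

Anything in the unexpected list is a candidate for a closer look; a DC covering a site that has no DC of its own is legitimate, so don't delete records on the strength of this alone.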

So I deleted the _ldap._tcp.Sitename._sites.gc._msdcs.parent.domain record from DNS so the remote DC/GC would no longer be offered as a good alternative.

Why did the remote DC/GC get in there in the first place?

We have speculated that it's due to automatic site coverage and that on 6/6/11 (the date when the remote DC created its _ldap record in that zone) the two DC/GCs in that specific site were having network issues, were down, or were otherwise unable to cover their site. So the remote DC/GC stepped up to the plate and registered to cover the site since there were no other DC/GCs to cover it.

Please note that this is probably not a common issue, but it does happen. It's happened maybe 3 or 4 times to us in about 10 years.

Hope this helps someone....