Friday, September 30, 2011

Outlook and Global Catalogs

We had an interesting issue recently where Windows clients at a branch office site were getting global catalog services from a domain controller in a remote site. In our environment, we're running Active Directory and at the moment are still at forest level 2003 R2, although over 90% of our DCs are Windows 2008 R2. We have close to 300 sites arranged in your traditional hub-and-spoke replication topology, and we've put two domain controllers at 80% of the sites to ensure that if one DC goes down there is still a DC/GC available locally.

Anyway, at this one site, the users sporadically went for GC services to a hub DC instead of one of their two local ones.

We checked all the usual suspects, including:
  • Was the client in a subnet that was inadvertently mapped to the wrong site, overlapped other mappings or not mapped at all in Active Directory Sites and Services? (That's the most common answer.)
  • Had an over-zealous admin hacked the registry and hardcoded the closest GC setting? Microsoftt KB 319206
  • Were over-zealous network admins blocking ports again? (GC services answer on ports 3268 and for secure LDAP, 3269)
All the normal answers were "no," and it was working most of the time, just not all the time.

So I asked the site to do a network trace from a client the next time they had an issue when they opened Outlook. Bingo. You gotta love traces.

Looking at the trace, it took all of one minute to find the problem. Another minute to fix it.

I filtered the trace looking for DNS queries and the first query I saw a DNS query for LDAP. The query was sent to the local DC.
Good so far.
Except that it returned information about the two local DC/GCs plus the remote DC/GC. hmmm

I opened DNS and drilled down to _tcp.Sitename._sites.gc._msdcs.parent.domain for the site having the issues. Sure enough, there was an _ldap record registered for the remote DC there. That's why, when clients asked for LDAP servers in the site, they had a 1 in 3 change of getting that remote DC.

So I deleted the _ldap._tcp.Sitename._sites.gc._msdcs.parent.domain record from DNS so the remote DC/GC would no longer be offered as a good alternative.

Why did the remote DC/GC get in there in the first place?

We have speculated that it's due to auto-site-coverage and that on 6/6/11 (the date when the remote DC created its _ldap record in that zone) the two DC/GCs in that specific site were having network issues or down or otherwise unable to cover their site. So the remote DC/GC stepped up to the plate and registered to cover the site since there were no other DC/GCs to cover it.

Please note that this is probably not a common issue, but it does happen. It's happened maybe 3 or 4 times to us in about 10 years.

Hope this helps someone....


Jeff said...

One best practice MS suggested is to "Configure Domain Controllers or global catalogs to Not Register Generic Records" outside of their sites, so remote site can't answer desperate call from other remote sites. We used DnsAvoidRegisterRecords. Not sure if this works the same way il all version of Windows server.

Amy said...

Right :).
And we do that for some key locations. But we can't do it for all because we rely on that to cover outages and other conditions where we want a fall back position.

Nothing is perfect, I'm afraid.

Courtney Winter said...

Thanks for taking the time to share this, I feel strongly concerning it and love reading additional on this subject. If doable, as you gain information, would you mind updating your blog with additional information? it's extremely helpful on behalf of me.
pst file limit