Thursday, July 17, 2008

Domain Controller Replacements

We're in the middle of replacing all 550+ DCs in our Active Directory environment with new hardware.  Because some developers and applications are hardcoded to use certain DCs, and since the DCs are also our DNS servers, we did not want their IP addresses or names to change.  If we changed their IPs, for example, we'd have to change the DNS entries in the TCP/IP NIC configuration of every server, as well as the scopes in DHCP.

This isn't too bad, because we worked out a step-by-step process to demote, rename, and re-IP the old systems before we tear them down completely.  Then we can bring up the new DCs with the original name and IP.  It's a lot cleaner than doing DC renames later and much less fraught with difficulties.

Except... I've run into fun replication issues with stubborn metadata and KCCs.

THE SETUP

·       Normally, if we make a change, convergence for our entire AD infrastructure takes about 1 hour.  A recent AD health check by Microsoft confirmed this.

·       We have two DCs at every site for redundancy.

·       All DCs are GCs except 2 per domain:  the infrastructure master role holder, and a special DC at a central site we use for backups (the ntds.dit is smaller and easier to back up if it is not a GC).

·       The infrastructure master is always located at a domain hub site.

·       The second DC at the domain hub site is a GC, PDC-emulator role holder, and RID master role holder.

·       We have a hub-and-spoke replication topology for each domain, centered around a site with excellent WAN connectivity.  That hub then replicates with our site as the national hub.

·       Typically, there are anywhere from 5 to 10 sites within a domain.  Some have more, though none have fewer than 5.

·       All DCs are DNS servers, carrying their domain's AD-integrated DNS zone as well as some other legacy zones and the standard root zone.

THE SCENARIO

I demoted and took out the old DC/GC at a domain hub site. 

When I tried to promote the new hardware, I got the message "Can't join the domain, user already exists."  (Of course, the "user" is the computer, in this case.)

I've had those errors before, and the cause is invariably one of three things:

·       Debris left over in Sites & Services.  If you look at the site, you might see the old DC you demoted still there as an object, but it won't have any connector objects.

o       You can just delete the DC object *IF* you expand it and there is *no* NTDS Settings and no connectors listed in Sites & Services.

o       If the NTDS Settings/connectors still exist under the DC object in Sites & Services, you'll need to perform a forcible removal via NTDSUTIL, which I'll discuss a little later in this blog.

·       The old DC may still be listed as a name server in DNS on the domain DNS zone's Name Servers tab.

o       Open DNS and select the domain's DNS zone.  Right-click on the zone and pick Properties to look at the Name Servers tab.

o       If the old DC's name is still listed as a name server, remove it.

·       The old DC may have left an old computer account in Active Directory Users and Computers (ADUC), and you need to delete the old account.  (That's why we usually rename the computer after the demotion, but before we take it down hard for the last time.  If you rename it, there should be no old account left in ADUC with the same name.)

WHAT I DID

But this time, I checked all the above things, and it looked clean.

So I opened the DCPROMO log, %windir%\debug\dcpromoui.log, and went to the bottom.  There I found which DC it was talking to as the sponsor for its addition to the domain.  (Note:  to find the sponsor, search for:  Enter MyNetJoinDomain)
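If you'd rather script that search than scroll, here is a minimal Python sketch. It only looks for the marker text mentioned above and prints a few lines that follow each hit; the function names and the number of context lines are illustrative assumptions, and it assumes nothing else about the layout of the log.

```python
import os
from pathlib import Path

# Marker text to search for, as given in the note above.
MARKER = "Enter MyNetJoinDomain"


def find_marker_lines(log_text, marker=MARKER, context=3):
    """Return (line_number, lines) for every line containing the marker.

    `lines` holds the matching line plus up to `context` lines after it,
    since the sponsoring DC's name should appear near the marker.
    """
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            hits.append((index + 1, lines[index:index + 1 + context]))
    return hits


if __name__ == "__main__":
    # Default log location from the text above; adjust if yours differs.
    windir = os.environ.get("windir", r"C:\Windows")
    log_path = Path(windir) / "debug" / "dcpromoui.log"
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        # The most recent attempt is at the bottom, so show the last hit only.
        for number, snippet in find_marker_lines(text)[-1:]:
            print(f"line {number}:")
            for entry in snippet:
                print("   ", entry)
    else:
        print(f"{log_path} not found")
```

Run it on the server where the DCPROMO failed, then read the printed lines for the name of the sponsoring DC.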

I checked the sponsoring DC and found that it still listed the old DC in Sites & Services *and* it preferred that old DC as its replication partner, even though it no longer existed.  And there was nothing I could do in replmon, repadmin or Sites & Services to force the KCC to give up replication with the dead DC and establish a connection with the remaining DC in the site.
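For reference, the repadmin side of those checks can be scripted. The sketch below only builds and prints two standard repadmin commands for a given DC: `/showrepl` to list its inbound replication partners and `/kcc` to make its KCC recalculate the topology. The helper name and the example server name are hypothetical, and as described above, these commands did not fix the problem in this case.

```python
import subprocess


def repadmin_checks(dc_fqdn):
    """Build repadmin command lines for inspecting one DC.

    /showrepl lists the DC's inbound replication partners and their status.
    /kcc asks the KCC on that DC to recalculate its replication topology.
    """
    return [
        ["repadmin", "/showrepl", dc_fqdn],
        ["repadmin", "/kcc", dc_fqdn],
    ]


if __name__ == "__main__":
    # Hypothetical server name; substitute the sponsoring DC from the log.
    for args in repadmin_checks("sponsor-dc.subdom.dom"):
        # Print only; run the commands by hand once they look right.
        print(subprocess.list2cmdline(args))
```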

So I did a forcible removal of the old DC's metadata using NTDSUTIL, with the focus set on the stubborn sponsoring DC.  (I'll list that process further down.)

Tried to DCPROMO the new DC again: no go.  Checked the log again and found it had selected a different sponsoring DC from another site.  This new sponsoring DC also refused to give up its replication connector to the old, removed DC.  So I had to do NTDSUTIL again to remove metadata on that system.

HOW I DID IT

Here, in a nutshell, is how to remove a demoted DC's metadata so that the KCC will stop trying to create connectors to DCs that no longer exist, and so that you can reuse a domain controller name (if you wish to).

Oh, you have to be at least a domain admin.

And you have to do this on a DC in your domain.

1.      Open Sites & Services / Expand the target site / Expand the target DC you want to remove

2.      Check for the NTDS Settings object beneath the DC server object, and for connections within it

3.      If the NTDS Settings object does *not* exist, just delete the DC server object and you're done.  Skip to the end.

4.      If the NTDS object exists, *continue*

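5.      At the command prompt, enter:  ntdsutil
Then, at the ntdsutil prompt, enter:  metadata cleanup
(the remove selected server command in step 16 is only available from the metadata cleanup menu)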

6.      Enter:  connections

7.      Enter:  connect to server servername
where
servername is the FQDN (myserver.subdom.dom) of a DC in the domain you're working with

8.      Enter: quit

9.      Enter: select operation target

10.     Enter: List sites

a.      Scroll through the sites to find the site containing the stubborn DC server object

b.      Enter:  select site sitenumber  
where
sitenumber is the number of the site containing the stubborn DC server object

11.     Enter:  list domains

a.      Scroll through the domains to find the domain containing the stubborn DC server object

b.      Enter:  select domain domainnumber  
where
domainnumber is the number of the domain containing the stubborn DC server object

12.     Enter:  list servers in site

13.     Enter: select server servernumber
where
servernumber is the stubborn DC you want to remove

14.     Enter:  list current selections
VERIFY that you have selected the DC you want to remove

15.     Enter: Quit

16.     Enter:  remove selected server

17.     Read the popup window and VERIFY what you are going to delete

18.     Click on [Yes]

19.     Enter: quit
Keep entering quit until you exit ntdsutil

20.     Go back to Sites & Services

21.     Make sure the stubborn DC now has no NTDS Settings object under it.

22.     If the stubborn DC is now clean, delete the stubborn DC in Sites & Services
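The ntdsutil part of the steps above can also be passed as quoted arguments on a single command line, since ntdsutil accepts its menu commands that way. Below is a rough Python sketch that only builds and prints such a command line. It includes the metadata cleanup menu, which is where remove selected server lives. The function name, the example server name, and the index values are hypothetical: look up the real site, domain, and server numbers interactively with the list commands first, and expect the confirmation popup to appear when the command is actually run.

```python
import subprocess


def build_metadata_cleanup_args(connect_to, site_index, domain_index, server_index):
    """Build the ntdsutil argument list for removing one dead DC's metadata.

    connect_to   -- FQDN of a live DC to run the cleanup against
    *_index      -- numbers shown by 'list sites', 'list domains' and
                    'list servers in site' for the DC being removed
    """
    return [
        "ntdsutil",
        "metadata cleanup",
        "connections",
        f"connect to server {connect_to}",
        "quit",
        "select operation target",
        "list sites",
        f"select site {site_index}",
        "list domains",
        f"select domain {domain_index}",
        "list servers in site",
        f"select server {server_index}",
        "list current selections",
        "quit",
        "remove selected server",
        "quit",
        "quit",
    ]


if __name__ == "__main__":
    # All four values below are placeholders for illustration only.
    args = build_metadata_cleanup_args("myserver.subdom.dom", 3, 0, 1)
    # Print the command instead of running it, so it can be reviewed first.
    print(subprocess.list2cmdline(args))
```

Printing rather than executing is deliberate: the index numbers can differ depending on which DC you connect to, so the command deserves a second look before it touches the directory.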

FINAL EXPLANATION

Because I was working with a DC at the main domain hub site, and that DC was the only GC at that site, the KCC on all the other domain DCs preferred that DC/GC.  When the DC/GC was demoted, there was nothing at the other end of that connector, yet the other sites kept the connector anyway (even though I deleted connectors and forced the KCC to rerun).

Because the DCs in other sites were only trying to communicate with the demoted DC/GC, they did not replicate the metadata that indicated that the old DC/GC was gone.  So the KCC would always regenerate connectors to the dead DC, which in turn meant DCs in other sites never got the news about the demotion.
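To make that vicious circle concrete, here is a toy Python model of pull replication. It is not how the KCC actually works; it only shows that a DC whose sole inbound partner is dead can never learn a change, while one live partner is enough for the news to spread. All DC names and the function name are made up for the illustration.

```python
def spread_news(inbound, alive, knows, rounds=10):
    """Simulate pull replication of a single change.

    inbound -- dict mapping each DC to the DCs it pulls changes from
    alive   -- set of DCs that still exist
    knows   -- DCs that already hold the change (the demotion metadata)
    """
    knows = set(knows)
    for _ in range(rounds):
        # A live DC learns the change if any live partner already has it.
        learned = {
            dc for dc, sources in inbound.items()
            if dc in alive and dc not in knows
            and any(src in alive and src in knows for src in sources)
        }
        if not learned:
            break
        knows |= learned
    return knows


if __name__ == "__main__":
    alive = {"hub-im", "spoke-a", "spoke-b"}
    # Every spoke pulls only from the demoted hub DC/GC.
    stuck = {"spoke-a": ["old-hub-gc"], "spoke-b": ["old-hub-gc"]}
    print(sorted(spread_news(stuck, alive, {"hub-im"})))  # ['hub-im']

    # Give one spoke a connection to a live DC and the news gets through.
    fixed = {"spoke-a": ["old-hub-gc", "hub-im"],
             "spoke-b": ["old-hub-gc", "spoke-a"]}
    print(sorted(spread_news(fixed, alive, {"hub-im"})))
    # ['hub-im', 'spoke-a', 'spoke-b']
```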

I finally had to use the ntdsutil method on every promotion-sponsoring DC, which was basically one DC at each site in the domain (the DC elected as the replication bridgehead), before dcpromo would agree that the DC's name was free to use to join the new DC to the domain and promote it.

Whew.  What a pain in the neck.

Sincerely,
Amy G. Padgett
