1 18th July 08:57
john collins
External User
 
Posts: 1
Time related cluster fail-over?


This morning one of my Windows Server 2003 (Enterprise) clusters failed over
to the other node and the DFS Root did not come back online. I've posted to
the DFS group about that...

After examining the system event logs I noticed a number of time-related
messages indicating that the two nodes' clocks were off by as much as 23
seconds. I recall, but can't find now, that clusters are extremely time
sensitive. Is that true? If so, what is the tolerance?

Below is a synopsis of the system log. I would appreciate anyone's insight
so I can go about preventing this in the future.

Regards,

John

0321:41 nodes are 9.4 sec different ID 1202
0337:48 nodes are 14.6 sec different
0354:53 nodes are 1.9 sec different
0409:43 W32Time reports warning ID# 50
0411:58 nodes are 24.8 sec different
0424:02 nodes are 19.5 sec different
0425:51 W32Time reports warning ID# 50
0446:01 nodes are 14.5 sec different
0448:57 nodes are 8.1 sec different
0456:55 nodes are 2.9 sec different
0538:03 nodes are 8.1 sec different
0558:11 nodes are 13.2 sec different
0619:04 nodes are 18.3 sec different
0631:12 nodes are 12.8 sec different
0704:53 nodes are 7.3 sec different
0713:21 nodes are 2.3 sec different
0721:09 nodes are 7.9 sec different
0724:55 nodes are 13.2 sec different
0737:25 nodes are 7.1 sec different
0738:33 nodes are 1.5 sec different
0739:46 nodes are -4.4 sec different
0741:42 nodes are -9.8 sec different
0744:30 nodes are -15.2 sec different
0749:29 nodes are -20.2 sec different
0753:35 nodes are -14.3 sec different
0754:42 nodes are -7.4 sec different
0755:49 nodes are -5.4 sec different
0757:07 nodes are 7.4 sec different
0759:07 nodes are 12.9 sec different
0801:37 nodes are 18.5 sec different
0806:16 nodes are 23.5 sec different
0811:32 nodes are 16.2 sec different
0812:27 nodes are 9.9 sec different
0812:43 node lost commo w/ EE on public NIC ID 1123
0812:43 node lost commo w/ EE on private NIC ID 1123
0813:05 Cluster node EE removed ID 1135
0813:05 Attempting to bring online Cluster Group ID 1200
0813:06 Attempting to bring online ESI ID 1200
0813:06 Attempting to bring online Disk Resources ID 1200
0813:06 Brought ESI online ID 1201
0813:09 The time provider NtpClient is currently receiving valid time data from DC ID 37
0813:10 Cluster file share resource DFS Root failed to start with error 21 ID 1068 (appears 8 times)
0813:10 Cluster File Share DFS Root cannot be brought online because the share could not be created. ID 1053 (appears 8 times)
0813:10 Cluster resource DFS Root failed ID 1069 (appears 8 times)
0813:10 Failed to bring the Resource Group Cluster Group completely online or offline. ID 1205
0813:16 Brought the Resource Group Disk Resources online ID 1201
0813:45 Node (re)established commo w/ cluster node EE on Public NIC ID 1122
0813:45 Node (re)established commo w/ cluster node EE on Private NIC ID 1122
0813:46 Interface for node EE on private net is operational ID 1125
0813:46 Interface for node EE on public net is operational ID 1125
0813:48 nodes are 1.3 sec different
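
For anyone who wants to summarize a synopsis like the one above, here is a minimal Python sketch. It assumes the hand-written "HHMM:SS nodes are N sec different" format used in this post (not the raw event log format), and the names parse_offsets and worst_offset are hypothetical helpers invented for illustration. The sample lines are copied from the synopsis.

```python
import re

# A few lines copied from the synopsis in this post (format: HHMM:SS message).
SYNOPSIS = """\
0321:41 nodes are 9.4 sec different ID 1202
0411:58 nodes are 24.8 sec different
0749:29 nodes are -20.2 sec different
0806:16 nodes are 23.5 sec different
0813:48 nodes are 1.3 sec different
"""

# Matches the hand-written synopsis format only; this is an assumption,
# not the layout of real Event ID 1202 records.
OFFSET_RE = re.compile(
    r"^(\d{4}:\d{2}) nodes are (-?\d+(?:\.\d+)?) sec different",
    re.IGNORECASE,
)


def parse_offsets(text):
    """Return (timestamp, offset_seconds) pairs for each offset entry."""
    pairs = []
    for line in text.splitlines():
        match = OFFSET_RE.match(line.strip())
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs


def worst_offset(pairs):
    """Return the entry with the largest absolute offset."""
    return max(pairs, key=lambda pair: abs(pair[1]))


offsets = parse_offsets(SYNOPSIS)
stamp, seconds = worst_offset(offsets)
print(f"{len(offsets)} entries, worst offset {seconds:+.1f} s at {stamp}")
```

Run against the full synopsis, the same helpers would show how often and how far the offset swung in the hours before the failover.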
2 26th July 09:46
mike rosado [msft]
External User
 
Posts: 1
Time related cluster fail-over?


Hi John,

I'm by no means an expert on the Windows Time Service, but I'll do my best
to explain why it matters to a Cluster Server.

The link below best explains the importance of the Microsoft® Windows Server™
2003 Windows Time service, also known as W32Time, to the Microsoft Cluster
Service:

http://www.microsoft.com/Resources/Documentation/windowsserv/2003/all/techref/en-us/W2K3TR_times_intro.asp?frame=true

The excerpts below explain why the Time Service is important for keeping
time synchronized between the two Cluster nodes.

"The Windows Time service is essential to the successful operation of
Kerberos authentication and, therefore, to Active Directory-based
authentication. Any Kerberos-aware application, including most security
services, relies on time synchronization between the computers that are
participating in the authentication request. Active Directory domain
controllers must also have synchronized clocks to help ensure accurate data
replication."

Importance of Time Protocols:

Time protocols communicate between two computers to exchange time
information and then use that information to synchronize their clocks. With
the Windows Time service time protocol, a client requests time information
from a server and synchronizes its clock based on the information that is
received.

The Windows Time service uses the Network Time Protocol (NTP) to help
synchronize time across a network. NTP is an Internet time protocol that
includes the discipline algorithms necessary for synchronizing clocks. NTP
is a more accurate time protocol than the Simple Network Time Protocol
(SNTP) that is used in some versions of Windows; however W32Time continues
to support SNTP to enable backward compatibility with computers running
SNTP-based time services, such as Windows 2000.
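
To make the request/response exchange described above concrete, here is a minimal Python sketch of the standard NTP on-wire calculation, which estimates the client's clock offset and the round-trip delay from four timestamps. It is only an illustration: the function name ntp_offset_and_delay and the sample timestamps are hypothetical, and it does not reproduce W32Time's own filtering or clock discipline.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received
    All values are in seconds.
    """
    # Offset: how far the server clock is ahead of the client clock,
    # assuming the network delay is the same in both directions.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Delay: total time on the wire, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the client clock runs 10 s behind the server,
# each network leg takes 0.05 s, and the server takes 0.01 s to reply.
offset, delay = ntp_offset_and_delay(100.00, 110.05, 110.06, 100.11)
print(f"offset {offset:+.2f} s, delay {delay:.2f} s")
```

A positive offset means the client should move its clock forward by roughly that amount; a negative one means it is ahead of the server.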

Along the same lines, the Cluster service uses RPC heavily for its
communication, so if time is not synchronized you may experience problems
such as those noted in the article below:

RPC Error Messages Returned for Active Directory Replication When Time Is
Out of Synchronization
http://support.microsoft.com/?id=257187

Windows 2000 SP3 added support for Kerberos on a Cluster, which is yet
another reason why the Time Service needs to be synchronized:

235529 Kerberos support on Windows 2000-based server clusters
http://support.microsoft.com/?id=235529

Kerberos Administration in Windows 2000
http://support.microsoft.com/?id=232179

Maximum Tolerance for Synchronization of Computer Clocks:

The KDC server's clock and the Kerberos client's clock have to be
synchronized to within a specified number of minutes. If the clocks are not
synchronized within the specified number of minutes, tickets are not issued
to the client. This is a deterrent against replay attacks. The setting is in
minutes. Default value: 5 minutes.
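
As a rough illustration of that tolerance, the Python sketch below compares a clock skew against the 5-minute default quoted above. The function name within_kerberos_tolerance is hypothetical, and the sample skews of 24.8 and -20.2 seconds are taken from the node-to-node offsets in the original post, which are not necessarily the same as the skew between a node and the KDC.

```python
from datetime import timedelta

# Default from the excerpt above; the real value is a configurable policy.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_kerberos_tolerance(skew_seconds, tolerance=MAX_CLOCK_SKEW):
    """Return True if the absolute skew does not exceed the tolerance."""
    return abs(timedelta(seconds=skew_seconds)) <= tolerance


# 24.8 and -20.2 come from the synopsis; 301.0 is a made-up value
# just past the default limit.
for skew in (24.8, -20.2, 301.0):
    verdict = "within" if within_kerberos_tolerance(skew) else "outside"
    print(f"{skew:+.1f} s is {verdict} the {MAX_CLOCK_SKEW} tolerance")
```

By this measure alone, offsets of 20 to 25 seconds sit well inside the Kerberos default, so this check by itself would not explain a failover.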

These are the reasons why the Cluster service needs time to be synchronized.
In Windows NT 4.0, the Cluster service used a different version of the
Time Service resource to ensure good time between the nodes. In Windows
2000 and later the Cluster service relies on existing time sources...but it
still needs to be relatively close.

--
Hope this helps,
Mike Rosado
Windows 2000 MCSE + MCDBA
Microsoft Enterprise Platform Support
Windows NT/2000/2003 Cluster Technologies

================================================== ==
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
================================================== ==

This posting is provided "AS IS" with no warranties, and confers no rights.
<http://www.microsoft.com/info/cpyright.htm>

3 26th July 09:46
john collins
External User
 
Posts: 1
Time related cluster fail-over?


Mike,

Thanks for the pointers. Unfortunately none of those articles can tell me
the tolerance for time discrepancy between two nodes. Can you tell me what
that is?

I understand Windows Time quite well. Our AD domain provides time from the
DCs and I have even turned on verbose logging of time.

I need to understand what was happening for those several hours before the
cluster failed over. I don't usually see those time error messages in such
quantity. Sometimes they occur, but only two or three instances at a
time...not the 30 or so that I listed in my original post.

Can you give me a further explanation as to what happened to cause the
failover? Is this a problem seen elsewhere or is my case isolated?

Please advise.

Regards,

John
4 26th July 09:47
mike rosado [msft]
External User
 
Posts: 1
Time related cluster fail-over?


John,

I went back and re-read your original posting. As stated previously, the Time
Service is crucial to Cluster because of the following:

"The Windows Time service is essential to the successful operation of
Kerberos authentication and, therefore, to Active Directory-based
authentication. Any Kerberos-aware application, including most security
services, relies on time synchronization between the computers that are
participating in the authentication request. Active Directory domain
controllers must also have synchronized clocks to help ensure accurate data
replication."

Since you understand this very well, I think you should call Microsoft
Product Support Services to obtain the following hotfix, because this sounds
like the issue you're experiencing.

830092 W32Time frequently logs Event ID 50 and poor time synchronization
occurs
http://support.microsoft.com/?id=830092

--
Hope this helps,
Mike Rosado
Windows 2000 MCSE + MCDBA
Microsoft Enterprise Platform Support
Windows NT/2000/2003 Cluster Technologies
