View Issue Details

ID: 0000108
Project / Category: 10000-005: Information Model
View Status: public
Last Update: 2009-01-13 18:13
Reporter: randyarmstrong
Assigned To: Wolfgang Mahnke
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Summary: 0000108: Review Diagnostics Information for Implementation Concerns
Description

The diagnostics information needs to be reviewed to ensure there are no implementation concerns.

Tags: No tags attached.
Commit Version
Fix Due Date

Activities

user2

2008-05-27 16:43

  ~0000715

Randy will complete implementation before the spec is released.

randyarmstrong

2008-12-05 09:13

administrator   ~0000978

After reviewing the diagnostics I am having problems understanding the use case for the various arrays. It seems to me that clients should rely on audit events to determine when sessions/subscriptions come and go and subscribe to the individual objects for the values that they are interested in. I realize that the arrays theoretically allow clients to get coherent snapshots, but I don't think a coherent snapshot of this kind of information is useful, since servers will likely delay updating their internal structures for performance reasons.

I think it is worth changing this because:

1) The arrays duplicating large structures make it much more difficult to implement a scalable server.
2) Events are a more natural 'UA way' to provide the information required.

I have attached a Visio diagram that replaces the array variables with folder objects and adds a notifier hierarchy.

In the proposed model the actual diagnostic data is not replicated in many variables, which makes it much easier to implement efficiently.

I have more comments on the individual structures:

3) SessionDiagnosticsDataType - don't see the point of:

CurrentPublishTimerExpirations (available on subscription)
KeepAliveCount (available on subscription)
CurrentRepublishRequestsInQueue (Republish requests are not queued)
RepublishCount (there is a RepublishCounter as well)
PublishingCount (there is a PublishCounter as well)

Perhaps better names:

NotificationMessageCount
KeepAliveMessageCount
RepublishedMessageCount
DataChangeNotificationCount
EventNotificationCount

4) ServiceCounterDataType

The unauthorized counter is not really useful because it can only capture a subset of authorization failures.
Audit events are a better way to track this data.

5) SubscriptionDiagnosticsDataType

Missing:

PublishTimerExpiredCount
MaxLifeTimeCount
MaxKeepAliveCount
KeepAliveCount - the current count for keep alive messages.
NotificationMessageCount
KeepAliveMessageCount
RepublishedMessageCount
UnpublishedMessageCount - the number of messages that are ready to send but have no publish request available.
UnacknowledgedMessageCount - the number of unacknowledged messages saved in the queue.
DiscardedMessageCount - the number of messages that were discarded before they were acknowledged
MonitoredItemCount
NextSequenceNumber

6) SamplingRateDiagnostics

I don't see the point of collecting this information. For the sample server many items are exception based which means there is no sampling going on. For the wrapped DA3 servers the groups support multiple sampling rates so the UA server does not know how the items are grouped for sampling.

This object would make more sense if this information was folded into the ServerDiagnosticSummary object:

CurrentMonitoredItemCount
DisabledMonitoredItemCount
CumulatedMonitoredItemCount
MaxMonitoredItemCount

This information could be replicated in the Subscription diagnostics.


randyarmstrong

2008-12-16 17:30

administrator   ~0001003

Agreed on the Dec 16th, 2008 UA TelCon

We need the arrays because:

1) It makes it easy for clients to receive all diagnostics information as it is available without constantly adding/removing monitored items.
2) Events are an optional feature for servers and requiring events would reduce the number of servers that can support diagnostics.

That said we need to put some caveats on the behaviors of the array values such as:

1) There is no relationship between the order of references returned during a browse and the order in the array.
2) Different clients with different credentials will get arrays with different lengths.
3) Item 2) implies that impersonating users can cause the length of the array to change.
4) Item 2) also implies that the index of a session/subscription in the array is not fixed.
5) Item 4) implies that the IndexRange parameter is potentially misleading and should not be allowed.
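
The caveats above can be illustrated with a minimal sketch. It assumes a hypothetical SessionEntry record and a hypothetical per-client filter (visible_sessions); neither is taken from the specification. It only shows why a client should match array entries by identifier rather than by position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionEntry:
    session_id: str   # hypothetical identifier standing in for the session's NodeId
    client_name: str  # hypothetical owner used for the access check below


def visible_sessions(all_sessions, allowed_clients):
    """Return the array one client would see (assumed credential-based filter)."""
    return [s for s in all_sessions if s.client_name in allowed_clients]


def find_session(array, session_id):
    """Locate an entry by identifier instead of relying on its array index."""
    for entry in array:
        if entry.session_id == session_id:
            return entry
    return None


ALL_SESSIONS = [
    SessionEntry("s1", "hmi"),
    SessionEntry("s2", "historian"),
    SessionEntry("s3", "hmi"),
]

# Two clients with different credentials receive arrays of different lengths,
# so the same session sits at a different index in each array.
admin_view = visible_sessions(ALL_SESSIONS, {"hmi", "historian"})
operator_view = visible_sessions(ALL_SESSIONS, {"hmi"})
```

In this sketch session "s3" is at index 2 for the first client and index 1 for the second, which is the reason an index-based read such as IndexRange would be misleading.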

On the individual diagnostics structures:
Sampling Rate Diagnostics:

1) Should be changed to SamplingInterval since that is the term used in the MonitoredItem services.
2) Monitored items are grouped by their revised SamplingInterval which may or may not be the same as the internal sampling rates used by the server.
3) The SamplingErrorCount is not meaningful information and should be removed.
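
As a rough illustration of item 2), the grouping could look like the sketch below. The function name and the use of milliseconds are assumptions made for the example; the point is only that items are counted per revised SamplingInterval, regardless of how the server samples internally.

```python
from collections import Counter


def count_by_sampling_interval(revised_intervals_ms):
    """Count monitored items per revised SamplingInterval (milliseconds assumed)."""
    return dict(Counter(revised_intervals_ms))


# Hypothetical revised intervals, one entry per monitored item.
example_counts = count_by_sampling_interval([100, 100, 500, 1000, 500, 100])
```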

Session Diagnostics:

1) The unauthorized error count on each request is not that meaningful. Replace with a single UnauthorizedRequestCount for all services.
2) TotalRequestCount should be added.
3) The following values are subscription related and don't belong in the session diagnostics:

keepAliveCount,
currentRepublishRequestsInQueue,
maxRepublishRequestsInQueue,
republishCounter,
publishingCount,
publishingQueueOverflowCount
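
A minimal sketch of the two session-level counters from items 1) and 2) follows. The class name, the record method, and the choice to include unauthorized requests in the total are assumptions for illustration, not the agreed data type.

```python
from dataclasses import dataclass


@dataclass
class SessionCounters:
    total_request_count: int = 0          # all requests on the session, any service
    unauthorized_request_count: int = 0   # single counter shared by all services

    def record(self, authorized: bool) -> None:
        """Count one request; assumes unauthorized requests also count toward the total."""
        self.total_request_count += 1
        if not authorized:
            self.unauthorized_request_count += 1


counters = SessionCounters()
for was_authorized in (True, True, False, True):
    counters.record(was_authorized)
```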

Subscription Diagnostics:

1) Need maxLifetimeCount (same as the subscription parameter)
2) Change lifetimeCount to currentLifetimeCount (should always be 0 if everything is ok).
3) Change keepAliveStateCount to currentKeepAliveCount (should always be 0 if a continuous stream of data is being published).
4) Change lateStateCount to latePublishRequestCount (i.e. the number of times the publish timer expires and there are unsent notifications).
5) Need to add the following:

UnacknowledgedMessageCount - the number of unacknowledged messages saved in the queue.
DiscardedMessageCount - the number of messages that were discarded before they were acknowledged.
MonitoredItemCount - the total number of monitored items.
DisabledMonitoredItemCount - the number of disabled monitored items.
MonitoringQueueOverflowCount - the number of times a monitored item dropped notifications because of a queue overflow.
NextSequenceNumber - sequence number for the next notification message.
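
The renamed counters in items 2) to 4) might behave roughly as in the sketch below. It is a deliberately simplified model of the publish cycle, not the subscription state machine from the specification; the method name and its two boolean inputs are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SubscriptionDiagnostics:
    max_lifetime_count: int
    current_lifetime_count: int = 0      # stays 0 while publish requests keep arriving
    current_keep_alive_count: int = 0    # stays 0 while data is published every cycle
    late_publish_request_count: int = 0  # timer expired with unsent notifications

    def on_publish_timer_expired(self, has_notifications: bool,
                                 publish_request_available: bool) -> None:
        """Update the counters for one publishing cycle (simplified assumption)."""
        if publish_request_available:
            self.current_lifetime_count = 0
            if has_notifications:
                self.current_keep_alive_count = 0
            else:
                self.current_keep_alive_count += 1
        else:
            self.current_lifetime_count += 1
            if has_notifications:
                self.late_publish_request_count += 1


diag = SubscriptionDiagnostics(max_lifetime_count=10)
diag.on_publish_timer_expired(has_notifications=True, publish_request_available=True)
diag.on_publish_timer_expired(has_notifications=False, publish_request_available=True)
diag.on_publish_timer_expired(has_notifications=True, publish_request_available=False)
```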

user2

2009-01-07 17:47

  ~0001031

Static diagnostic nodes that always appear in the address space will return BAD_NOTREADABLE when they are read or subscribed to and diagnostics are turned off. Dynamic diagnostic nodes (like the session nodes) will not appear in the address space when diagnostics are turned off.
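
The behavior described in this note could be modeled as in the following sketch. The node names, the string status values, and the use of BAD_NODEIDUNKNOWN for an absent dynamic node are assumptions for illustration only.

```python
GOOD = "GOOD"
BAD_NOTREADABLE = "BAD_NOTREADABLE"
BAD_NODEIDUNKNOWN = "BAD_NODEIDUNKNOWN"  # assumed result for a node that is not present


class DiagnosticsNodes:
    def __init__(self, enabled: bool):
        self.enabled = enabled
        self.static_values = {"ServerDiagnosticsSummary": 0}  # hypothetical static node
        self.dynamic_values = {"Session_1": 0}                # hypothetical session node

    def browse(self):
        """Static nodes always appear; dynamic nodes only while diagnostics are on."""
        names = sorted(self.static_values)
        if self.enabled:
            names += sorted(self.dynamic_values)
        return names

    def read(self, name):
        """Return a (status, value) pair for the named node."""
        if name in self.static_values:
            if self.enabled:
                return GOOD, self.static_values[name]
            return BAD_NOTREADABLE, None
        if self.enabled and name in self.dynamic_values:
            return GOOD, self.dynamic_values[name]
        return BAD_NODEIDUNKNOWN, None


diagnostics_off = DiagnosticsNodes(enabled=False)
diagnostics_on = DiagnosticsNodes(enabled=True)
```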

user2

2009-01-07 18:54

  ~0001033

Another discussion about diagnostics on sample rates. We decided that these stats only make sense if a server implements a small number of fixed scan rates and therefore should be made optional. If the sample rate diagnostics are provided then the node ids are expected to be persistent between server runs.

Wolfgang Mahnke

2009-01-09 14:03

developer   ~0001037

Fixed in version 1.01.21

user2

2009-01-13 18:13

  ~0001043

Reviewed and accepted changes in telecon today.

Issue History

Date Modified Username Field Change
2007-02-19 17:25 randyarmstrong New Issue
2007-02-19 17:25 randyarmstrong Status new => assigned
2007-02-19 17:25 randyarmstrong Assigned To => randyarmstrong
2008-05-27 16:43 user2 Note Added: 0000715
2008-12-05 09:13 randyarmstrong Note Added: 0000978
2008-12-05 09:13 randyarmstrong Status assigned => acknowledged
2008-12-05 09:13 randyarmstrong File Added: DiagnosticsProposal 2008-12-04.vsd
2008-12-16 17:30 randyarmstrong Note Added: 0001003
2008-12-16 17:30 randyarmstrong Assigned To randyarmstrong => Wolfgang Mahnke
2008-12-16 17:30 randyarmstrong Status acknowledged => assigned
2009-01-07 17:47 user2 Note Added: 0001031
2009-01-07 18:54 user2 Note Added: 0001033
2009-01-09 14:03 Wolfgang Mahnke Status assigned => resolved
2009-01-09 14:03 Wolfgang Mahnke Resolution open => fixed
2009-01-09 14:03 Wolfgang Mahnke Note Added: 0001037
2009-01-13 18:13 user2 Status resolved => closed
2009-01-13 18:13 user2 Note Added: 0001043