Will the event/subscription utilize the ThingWorx cluster?

seanccc
17-Peridot

Hi, 

We're using ThingWorx 9.1.0 on RHEL 8.2 and have a ThingWorx cluster installed (2 ThingWorx Foundation servers).

 

If the event processing subsystem has a maximum of 500 threads, does this mean that the 500 threads are (almost) equally distributed across the 2 ThingWorx servers? If I raise 1000 events in a service, will the 2 ThingWorx servers share the event processing load?

 

Regards,

Sean

1 ACCEPTED SOLUTION

tcoufal
12-Amethyst
(To:seanccc)

Hi there,

No, it won't balance it out the way you might think.

But it also depends on what the trigger action was, i.e. the type of thread that led to the event being triggered and the subscription being handled.

All Timers/Schedulers, their events, and the subscriptions to those events are handled on the "Singleton" instance, which can be any node in the cluster. Usually it is the node that started first, and its status normally stays unchanged until a new "Singleton" is elected (for example, when the current singleton goes offline).

You can see in the Clustering Subsystem info whether a server is the Singleton server or not.

There is also a "toggleSingleton" service under ClusteringSubsystem which, as you might guess, changes the singleton. There is no real reason to use it, unless you spot that one server is having issues with event subsystem overload and you want to restart it, for example, and therefore take it out of the cluster beforehand (eventual consistency).

So one server instance will always be under a lot more stress, because all Timers and all Schedulers are handled on one server. How much more depends on your business logic.

But if the event is caused by either a client request (HTTP) or a connected asset (wsexecution), it is handled on the platform instance that received that request from your load balancer. And that could land on any instance, assuming you are using the correct settings and an appropriate load-balancing algorithm.
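
The routing described above can be pictured with a small toy model. This is only a sketch in Python, not ThingWorx code: the node names `twx-a` and `twx-b`, the choice of `twx-a` as singleton, and the helper names `handling_node` and `simulate` are all assumptions made for illustration, and the load balancer is assumed to be plain round robin.

```python
from collections import Counter
from itertools import cycle

NODES = ["twx-a", "twx-b"]  # hypothetical node names
SINGLETON = "twx-a"         # assumption: twx-a currently holds the singleton role


def handling_node(source, receiving_node=None):
    """Return the node whose event threads would run the subscription.

    "timer" / "scheduler" events always go to the singleton node.
    "http" / "ws" events stay on the node that received the request.
    """
    if source in ("timer", "scheduler"):
        return SINGLETON
    if source in ("http", "ws"):
        if receiving_node is None:
            raise ValueError("request-driven events need the receiving node")
        return receiving_node
    raise ValueError(f"unknown event source: {source}")


def simulate(n_timer_events, n_request_events):
    """Count handled events per node, spreading requests round robin."""
    load = Counter({node: 0 for node in NODES})
    for _ in range(n_timer_events):
        load[handling_node("timer")] += 1
    balancer = cycle(NODES)  # stand-in for a round-robin load balancer
    for _ in range(n_request_events):
        load[handling_node("http", next(balancer))] += 1
    return dict(load)


print(simulate(n_timer_events=600, n_request_events=400))
# {'twx-a': 800, 'twx-b': 200}
```

In this model, 1000 events raised by a service that was invoked through a single HTTP request all end up on the one node that received that request, which matches the answer above: nothing is handed over to the other node.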


15 REPLIES
yhan
17-Peridot
(To:seanccc)

Hi Sean,

 

The cluster is designed to reduce the duration of outages for TWX rather than to share load. In a cluster environment, the cache master tends to process more events and queue jobs first. The load cannot be evened out completely between the two servers. Just for your reference.

 

Thanks,

/Yoyo

seanccc
17-Peridot
(To:yhan)

@yhan ,

 

What does "cache master" mean? Is it the ThingWorx Foundation server that the events are raised from, or the Ignite server, or something else?

 

It's fine if the load cannot be averaged between the 2 servers, as long as both servers' computing resources are utilized, I think. For example, if the events are handled by ThingWorx Server A, but Server A's CPU usage is over 90%, or some events' waiting time exceeds a certain threshold, will some of the events be handled by Server B at the same time in those situations?

 

Regards,

Sean

 

tcoufal
12-Amethyst
(To:seanccc)

Cache master = Singleton server.

PTC should stick to the nomenclature they created.

 

And no. It won't delegate any unhandled events from Server A's queue to Server B's queue. It does not work like that.

 

But remember, you don't have 500 working threads for the Event subsystem. The default setting is 16, with the possibility to spin up to 500 if the queue size reaches some threshold, but in my experience the system crashes when it actually creates those 500 threads.

But ultimately the queue size is what limits you. If you create 1000 events, they go to the queue first and then get assigned to event worker threads, and the queue should go down to zero in a timely manner.
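
To get a feel for the queue behaviour, here is a rough Python sketch of a pool draining 1000 queued events. It is a toy model under loud assumptions, not how ThingWorx actually schedules work: every thread handles exactly one event per tick, and the growth rule (double the pool whenever the queue length is at or above a hypothetical `grow_at` threshold) is invented for illustration. Only the 16 and 500 figures come from the post above.

```python
def simulate_drain(events, min_threads=16, max_threads=500, grow_at=None):
    """Return (ticks, peak_threads) needed to empty a queue of `events`.

    Each tick, every thread handles one queued event. If `grow_at` is set
    and the queue length is >= grow_at at the start of a tick, the pool
    doubles, capped at `max_threads` (an assumed growth policy).
    """
    queue = events
    threads = min_threads
    peak = threads
    ticks = 0
    while queue > 0:
        if grow_at is not None and queue >= grow_at:
            threads = min(threads * 2, max_threads)
            peak = max(peak, threads)
        queue -= min(queue, threads)
        ticks += 1
    return ticks, peak


# Fixed pool of 16 threads: the queue shrinks by 16 events per tick.
print(simulate_drain(1000))               # (63, 16)

# Pool allowed to grow once the queue holds 100 or more events.
print(simulate_drain(1000, grow_at=100))  # (6, 500)
```

The second run drains faster only because the model lets the pool reach 500 threads, which is exactly the situation the post above warns about.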

Hello @seanccc 

 

I don't think it's going to be a perfect thread allocation.

 

Refer to the figure below. If a user logs in to the system, the load balancer will determine which system is currently better suited to respond to the request, but only one system will handle it; the other systems do not share the tasks that need to be handled.

MaxWang_0-1614823948683.png

https://supportdev.ptc.com/help/thingworx/platform/r9/en/#page/ThingWorx%2FHelp%2FThingWorxHighAvail...

BR,

Max

seanccc
17-Peridot
(To:MaxWang)

@MaxWang ,

 

I agree with you about the HTTP requests, but here I'm asking about the mechanism of events/subscriptions, i.e. the EventProcessingSubsystem. If an event is raised by a service running on Server A, is it possible that the event is consumed on Server B?

 

Regards,

Sean 

Hello @seanccc 

 

I found that Apache Ignite has been included since 9.0. It enables active-active load balancing, so HA systems can handle events together.

https://support.ptc.com/help/thingworx/platform/r9/en/index.html#page/ThingWorx%2FHelp%2FThingWorxHi... 

MaxWang_0-1614906938335.png

 

Hope this information is useful to you.

 

BR,

Max

seanccc
17-Peridot
(To:MaxWang)

@MaxWang ,

 

The page only mentions that Ignite is used in ThingWorx 9. It doesn't explain how and when the 2nd server will process an event that was not raised on that server, and it doesn't describe how an active-active cluster affects the configuration of the EventProcessingSubsystem. For example, if the maximum thread count in the EventProcessingSubsystem is 500, how are the 500 threads allocated between the 2 servers, or can each server have 500 threads?

Could you provide more documentation about it?

 

Regards,

Sean

 

Hello @seanccc 

 

I can't find more information to explain the mechanism.
I hope the ThingWorx product manager sees your post and replies.

 

BR,

Max

slangley
23-Emerald I
(To:seanccc)

Hi @seanccc 

 

I have reached out to R&D to get answers to your questions.

 

Regards.

 

--Sharon

seanccc
17-Peridot
(To:slangley)

@slangley ,

 

Thank you for following it up. 

 

Regards,

Sean

tcoufal
12-Amethyst
(To:seanccc)

Because it does not. 

Ignite has nothing to do with the application. It is a "simple", or rather simply a, distributed data store.

Event handling is part of the application.

ThingWorx uses a context stack, so it knows which process invoked another.

The event handler is executed in EventProcessingSubsystem thread(s) on the server where the originating request landed.

 

tcoufal
12-Amethyst
(To:MaxWang)

"the load balancer will determine which system is currently better suited to respond to the request"

Be careful when you write things like that, because it depends, and by default it is not true. 

 

In your HAProxy config example you are using "roundrobin" as the load-balancing algorithm, so it does not consider any metric other than whether the backend is simply "up" (the health check on /Thingworx/health or ready only returns a status code).

No weighted roundrobin, no leastconn. No server is better suited than the other in this scenario, unless you consider a dead backend as not suited.
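
The difference between the two strategies can be shown with a few lines of Python. This is only an illustrative sketch, not HAProxy's implementation: the backend names and the connection counts in `active` are made up, and the helpers `round_robin_picks` and `least_conn_pick` are hypothetical.

```python
from itertools import cycle, islice


def round_robin_picks(backends_up, n_requests):
    """Hand out requests in turn to every backend that passes the health check."""
    return list(islice(cycle(backends_up), n_requests))


def least_conn_pick(active_connections):
    """Pick the backend that currently has the fewest active connections."""
    return min(active_connections, key=active_connections.get)


# Hypothetical snapshot: twx-a is busy, twx-b is nearly idle, both are "up".
active = {"twx-a": 40, "twx-b": 3}

print(round_robin_picks(list(active), 4))  # ['twx-a', 'twx-b', 'twx-a', 'twx-b']
print(least_conn_pick(active))             # twx-b
```

Round robin keeps sending every second request to the busy node because load never enters the decision; only a strategy that looks at a metric, such as the connection count here, can be said to pick the better-suited server.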

 

In NGINX, ip_hash is used for session stickiness. That would be a big problem if you put some reverse-proxy system in front of it, since all sessions would then be stuck to the same backend. Again, the basic algorithm is round robin.
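
A quick way to see the reverse-proxy problem is to hash the client address yourself. The Python sketch below is a stand-in, not NGINX's code: it keys on the first three octets of the IPv4 address, as ip_hash is documented to do, but `zlib.crc32` is just a substitute hash, and the backend names and the proxy address `10.0.0.5` are hypothetical.

```python
import zlib

BACKENDS = ["twx-a", "twx-b"]  # hypothetical backend names


def ip_hash_pick(client_ip):
    """Map a client IPv4 address to a backend, keyed on its first three octets."""
    key = ".".join(client_ip.split(".")[:3])
    # crc32 is only a stand-in for the hash function NGINX really uses.
    return BACKENDS[zlib.crc32(key.encode()) % len(BACKENDS)]


# Every request arrives from the same upstream proxy address (hypothetical),
# so the set of chosen backends contains exactly one entry.
proxy_ip = "10.0.0.5"
print({ip_hash_pick(proxy_ip) for _ in range(1000)})
```

However many users sit behind that proxy, the balancer only ever sees one source address, so one ThingWorx node gets all of the sessions.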

 

HA from TWX 9.0 onward uses Ignite, and all nodes can share the tasks freely. It is like taking tasks from the same pool rather than simply dividing the tasks into 2 pools.

 

HA before 9.0 uses active/standby, and all tasks are done by the primary server.

seanccc
17-Peridot
(To:tcoufal)

@tcoufal ,

 

Thank you for the replies, they're very helpful for understanding it.

 

Regards,

Sean
