Why TWX Platform could not receive/queue all value streams updated by edges?

mren
1-Visitor

We made each edge update a temperature property, incremented by 1, every 5 secs. If 300k ~ 400k edges update logged temperature properties (value streams) every 5 secs concurrently, the ThingWorx Platform (TWX) can only receive/queue 85% ~ 90% of the property updates. Once the edges' connections were stably established, we sampled the 'totalWritesQueued' value of the Value Stream Subsystem every 5 secs, computed the difference between consecutive samples, and divided that difference by the number of edges.

My question is: why can't TWX handle 100% of the property updates when more than 300k edges update properties concurrently?
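For readers who want to reproduce the measurement, here is a minimal Java sketch of the arithmetic described above. It assumes the two 'totalWritesQueued' samples have already been read from the subsystem metrics by some other means; the class name, method name, and sample numbers are hypothetical and are not taken from the test run.

```java
// Sketch of the ratio described in the post: the difference between two
// consecutive 'totalWritesQueued' samples, divided by the number of edges.
class QueueRatio {

    /** Fraction of the expected writes that were queued during one 5-sec interval. */
    static double receivedRatio(long previousTotal, long currentTotal, long edgeCount) {
        if (edgeCount <= 0) {
            throw new IllegalArgumentException("edgeCount must be positive");
        }
        long delta = currentTotal - previousTotal; // writes queued in this interval
        return (double) delta / edgeCount;         // each edge is expected to send one write
    }

    public static void main(String[] args) {
        // Hypothetical samples taken 5 secs apart with 300,000 connected edges.
        double ratio = receivedRatio(1_000_000L, 1_270_000L, 300_000L);
        System.out.printf("received/queued ratio: %.2f%n", ratio);
    }
}
```

A ratio below 1.0 only shows that fewer writes were queued than expected in that interval; it does not say whether the updates were lost at the edge, at the Connection Server, or on the Platform.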


+ TWX System Resource:

  - CPU E5-2650 v3 @ 2.30GHz with 10 cores * HT * 2-way

  - CPU usage: less than 50%

  - Total Memory: 128GB

  - JVM Memory: 96GB


+ TWX Subsystem Configuration:

  - Default except:

    > Value Stream Subsystem:

      ' maximumBlockSize: 50,000

      ' maximumQueueSize: 10,000,000

      ' maximumWaitTime: 1000

      ' numberOfProcessingThreads: 80

+ Logs:

  - Tomcat:

    com.thingworx.exceptions.CouldNotConnectException: 172.21.128.30-temp13270 is not connected

    java.util.concurrent.TimeoutException: Timed out APIRequestMessage [requestId: 1398634, endpointId: -1, sessionId: -1, method: POST, entityName: 172.21.128.30-temp11300, characteristic: Services, target: NotifyPropertyUpdate]

  - TWX:

    [L: ERROR] [O: c.t.p.p.BatchStreamEntryProcessor] [I: ] [U: SuperUser] [S: ] [T: pool-8-thread-10] Error executing batch

    [L: WARN] [O: c.t.s.s.w.WSCommunicationsSubsystem] [I: ] [U: ] [S: ] [T: WsCommunication-151] Could not dispatch async service request [NotifyPropertyUpdate] : Timed out APIRequestMessage [requestId: 1405707, endpointId: -1, sessionId: -1, method: POST, entityName: 172.18.142.102-temp540, characteristic: Services, target: NotifyPropertyUpdate]

    [L: WARN] [O: c.t.c.s.e.DispatchingServerEndpoint] [I: ] [U: ] [S: ] [T: http-nio-8080-exec-68] NULL MessageSynchronizationContext! Request either timed-out waiting for this response, or it was received by mistake: ResponseMessage [requestId: 1401202, endpointId: 175200, sessionId: 1406274, code: STATUS_SUCCESS, multipart: false, packet #: 0, total packets: 0]

+ QueryPropertyHistory for One Thing: NOT CONTINUOUS

  temperature  timestamp
  98           2017-01-04 03:49:27.962
  93           2017-01-04 03:49:01.006
  91           2017-01-04 03:48:49.861
  87           2017-01-04 03:48:26.911
  86           2017-01-04 03:48:21.909
  85           2017-01-04 03:48:16.907
  79           2017-01-04 03:47:28.103
  77           2017-01-04 03:47:17.009
  74           2017-01-04 03:47:00.646
  72           2017-01-04 03:46:46.981
  70           2017-01-04 03:46:34.941
  69           2017-01-04 03:46:29.372
  66           2017-01-04 03:46:11.139
  65           2017-01-04 03:46:05.810
  63           2017-01-04 03:45:54.443
  62           2017-01-04 03:45:48.696
  61           2017-01-04 03:45:42.753
  60           2017-01-04 03:45:37.495
  58           2017-01-04 03:45:26.171
  53           2017-01-04 03:44:57.037
  50           2017-01-04 03:44:37.082
  48           2017-01-04 03:44:26.089
  46           2017-01-04 03:44:13.205
  44           2017-01-04 03:44:03.188
  42           2017-01-04 03:43:52.557
  38           2017-01-04 03:43:29.961
  33           2017-01-04 03:42:55.120
  31           2017-01-04 03:42:44.548
  27           2017-01-04 03:42:21.796
  26           2017-01-04 03:42:12.657
  25           2017-01-04 03:42:07.615
  24           2017-01-04 03:42:02.288
  21           2017-01-04 03:41:42.045
  19           2017-01-04 03:41:30.359
  18           2017-01-04 03:41:24.820
  16           2017-01-04 03:41:13.063
  15           2017-01-04 03:41:04.834
  14           2017-01-04 03:40:59.453
  13           2017-01-04 03:40:54.440
  12           2017-01-04 03:40:49.361
  11           2017-01-04 03:40:40.500
  10           2017-01-04 03:40:28.502
  9            2017-01-04 03:40:23.246
  8            2017-01-04 03:40:15.742
  7            2017-01-04 03:40:10.398
  6            2017-01-04 03:40:04.640
  5            2017-01-04 03:39:49.397
  4            2017-01-04 03:39:38.949
  3            2017-01-04 03:39:33.756
  2            2017-01-04 03:39:28.749
  1            2017-01-04 03:39:17.345
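
Because the simulated temperature increases by 1 on every update, any value missing from the history above corresponds to an update that was never stored. As a rough sketch (plain Java, no ThingWorx API; the class and method names are hypothetical), the gaps can be listed like this, using the oldest rows of the history as sample input:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Lists the counter values that are absent from a QueryPropertyHistory result,
// assuming the property is incremented by exactly 1 per update.
class GapFinder {

    static List<Integer> missingValues(int[] observed) {
        List<Integer> missing = new ArrayList<>();
        int[] sorted = observed.clone();
        Arrays.sort(sorted); // history is returned newest first, so sort ascending
        for (int i = 1; i < sorted.length; i++) {
            // every integer strictly between two neighbours was never stored
            for (int v = sorted[i - 1] + 1; v < sorted[i]; v++) {
                missing.add(v);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Temperature values 1..27 as they appear in the history above.
        int[] observed = {27, 26, 25, 24, 21, 19, 18, 16, 15, 14, 13, 12, 11, 10,
                          9, 8, 7, 6, 5, 4, 3, 2, 1};
        System.out.println(missingValues(observed)); // prints [17, 20, 22, 23]
    }
}
```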

8 REPLIES
ttielebein
14-Alexandrite
(To:mren)

Are you saying that you have 300,000 - 400,000 edge devices updating the Platform, with no Connection Server, and you are wondering why the Platform cannot handle this? The Platform is not designed to handle this many edge devices on its own. You need the Connection Server for this. Here is a KCS Article on the subject. Hope this helps!

mren
1-Visitor
(To:ttielebein)

Thanks so much for your reply!

To simplify the problem, I didn't mention the Connection Server. In fact, I made 300k ~ 400k edge devices update properties via 2 ~ 3 optimized high-performance Connection Servers. I've checked that the TWX platform can handle 200k value streams sent by remote devices every 5 secs via only one Connection Server.

I really appreciate your help:-)

ttielebein
14-Alexandrite
(To:mren)

Ah, so the issue can't be the fault of too many connections. You're actually not the first person to post about this, and I am pretty sure there was no resolution on the other thread. You may want to consider opening a support case for this issue. How are your properties updated? Are you sure the updates are propagating to the Platform? Are these remote properties? What are their remote bind settings?

mren
1-Visitor
(To:ttielebein)

Thanks for your advice. I've opened a support case and I'm waiting for feedback.

>> How are your properties updated?

The code below updates a temperature property every 5 secs, 12 times in total:

ConnectedThingClient client = new ConnectedThingClient(getClientConfigurator(), null);
client.start();

// Wait up to 10 secs for the connection before binding the thing.
if (client.waitForConnection(10000)) {
     TemperatureThing thing = new TemperatureThing(thingName, sersorDescription, client, simulated, 0, tempConfig, testData);
     client.bindThing(thing);
     thing.loadPropertySubscriptions();

     // Push an update every 5 secs for 60 secs (12 updates).
     long start = System.currentTimeMillis();
     while ((System.currentTimeMillis() - start) < 60000) {
          for (VirtualThing vt : client.getThings().values()) {
               vt.processScanRequest(); // definition shown below
          }
          Thread.sleep(5000);
     }
}

/*
 * public void processScanRequest() {
 *      temperature_vtq.setValue(new NumberPrimitive(lastTemp)); // 'lastTemp' is increased by 1 each time
 *      temperature_vtq.setQuality(QualityStatus.GOOD);
 *      setPropertyVTQ("temperature", temperature_vtq, true);
 *      updateSubscribedProperties(10000);
 * }
 */

>> Are you sure the updates are propagating to the Platform?

I'm not sure whether the Platform has received all of the updates. I observed 'submitedTaskCount/completedTaskCount' (in WS Execution Processing Subsystem Metrics) and 'totalWritesQueued' (in Value Stream Processing Subsystem Metrics), but neither of them reaches the number of properties the edge devices should update when 30k ~ 40k devices update concurrently.

>> Are these remote properties?

Yes.

>> What are their remote bind settings?

I'm sorry, I don't know what you mean :-( What info should I supply?

supandey
19-Tanzanite
(To:mren)

Hello Ming, I suppose Tori is looking for the Data Change Type. I may be wrong, but I would be interested to know how the property's Data Change Type is set up. Would it be correct to assume you have set it to Always? Are these simulated edge devices or real hardware? I see some timeouts in the log, which could also indicate connectivity issues between the edge devices and ThingWorx.

Streams/Value Streams are logged asynchronously. Since you are pushing data at such a high rate over so many connections, would it help to add a small pause while pushing the values? That could give the Platform just enough time to persist each value so that it shows up on the chart. Otherwise, in my opinion, values may get pushed but, because they aren't persisted, won't appear when you select them for display.
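
One way to read the "small pause" idea is to stagger the simulated devices so that they do not all push in the same instant of each 5-sec period. The following is only a sketch under that assumption: it uses the standard java.util.concurrent scheduler, the class and method names are hypothetical, and the println stands in for whatever call actually pushes the property. Whether staggering removes the gaps in this setup has not been verified.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Spreads device pushes evenly across the update period instead of
// letting every device fire at the same moment.
class StaggeredPush {

    /** Offset of a device's first push within the period, in milliseconds. */
    static long initialDelayMillis(int deviceIndex, int deviceCount, long periodMillis) {
        return (deviceIndex * periodMillis) / deviceCount;
    }

    public static void main(String[] args) throws InterruptedException {
        int deviceCount = 4;      // small number for demonstration only
        long periodMillis = 5000; // same 5-sec period as in the test
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        for (int i = 0; i < deviceCount; i++) {
            final int id = i;
            // Placeholder for the real property push of device 'id'.
            Runnable push = () -> System.out.println("device " + id + " pushes");
            scheduler.scheduleAtFixedRate(push,
                    initialDelayMillis(i, deviceCount, periodMillis),
                    periodMillis, TimeUnit.MILLISECONDS);
        }

        Thread.sleep(6000);      // let each device push at least once
        scheduler.shutdownNow(); // stop the demo
    }
}
```

With 4 devices and a 5000 ms period the first pushes start at 0, 1250, 2500 and 3750 ms, so the per-device rate is unchanged but the load on the server is spread out.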

mren
1-Visitor
(To:supandey)

The property's Data Change Type is not Always; it is Value, with the default Change Threshold of '0'. They are simulated edge devices. Yes, there are timeout logs on the edge device side too:

ERROR - com.thingworx.communications.client.things.VirtualThing: Unable To Update Subscribed Properties For 172.21.128.92-temp20738 : Timed out APIRequestMessage [requestId: 454508, endpointId: 284402, sessionId: 204915, method: POST, entityName: 172.21.128.92-temp20738, characteristic: Services, target: UpdateSubscribedPropertyValues]

ERROR - com.thingworx.communications.client.endpoints.ClientCommunicationEndpoint: Exception refilling connections was: java.util.concurrent.TimeoutException: timeout waiting for websocket handshake to complete

ERROR - com.thingworx.communications.client.connection.netty.NettyClientConnectionFactory: Exception occurred while connecting to server java.util.concurrent.TimeoutException: timeout waiting for websocket handshake to complete

ERROR - com.thingworx.communications.client.endpoints.ClientCommunicationEndpoint: Refilling connections on [endpoint [627276, ws://172.21.128.46:9003/Thingworx/WS, timeout waiting for websocket handshake to complete], uri: {}] failed : {}

ERROR - com.thingworx.communications.common.endpoints.CommunicationEndpoint: [Endpoint id: [721349, 721349-0, EP_721349], connection id: {}, connection name: {}] FAILED to remove Connection - was not found

ERROR - com.thingworx.communications.client.endpoints.ClientCommunicationEndpoint: Refilling connections on [endpoint [721349, ws://172.21.128.47:9003/Thingworx/WS, Timed out AuthRequestMessage [requestId: 11311356, endpointId: -1, sessionId: -1, method: AUTHREQUEST]], uri: {}] failed : {}

ERROR - com.thingworx.communications.client.endpoints.ClientCommunicationEndpoint: Exception refilling connections was: java.util.concurrent.TimeoutException: Timed out AuthRequestMessage [requestId: 11311356, endpointId: -1, sessionId: -1, method: AUTHREQUEST]

And there are many repeated lines on the Connection Server side, as below:

ERROR i.vertx.core.net.impl.ConnectionBase - java.io.IOException: Connection reset by peer

[vert.x-eventloop-thread-3] WARN  c.t.c.a.AbstractClientEndpoint - Error encountered, closing WebSocket

java.io.IOException: Connection reset by peer

        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_101]

        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_101]

        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_101]

        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_101]

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_101]

        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) ~[netty-all-4.0.33.Final.jar:4.0.33.Final]

        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]

I'm afraid that we have to consider seriously whether we can pause while pushing in this situation....

Can the metrics 'submitedTaskCount' or 'totalWritesQueued' show how many values actually get pushed?

My question is: how are you connecting 300K-400K devices? We are trying to connect 20,000 devices through a Connection Server, and connections keep dropping; only 8,000 devices stay connected, and the others are disconnected or unbound.

 

We are using HAProxy and PGPool. I hope you are not using this.

 

Please let us know.

sbt
15-Moonstone
(To:mren)

Hello mren,

I am in a similar situation, where we are pooling data from more than 2000 assets onto the ThingWorx platform.

I am getting a similar error in the .NET edge, with error code 1114.

We are using Postgres as the default persistence provider. What should my persistence provider configuration (thread pool, etc.) be to meet this requirement? If there is any other configuration to consider, please let me know.

 

Thanks.

 

 
