cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

ThingWorx Navigate is now Windchill Navigate Learn More

Translate the entire conversation x

How to find which stream consumed more space in Database

Velkumar
19-Tanzanite

How to find which stream consumed more space in Database

Hi all,

 

We have a ThingWorx instance that has been running for the last 4 years. In the database, streams have consumed 1 TB of data. Is there a way to find out exactly which stream has consumed the most space in the database?

 

Velkumar_0-1754984515924.png

 

 

Thanks,

VR

 

ACCEPTED SOLUTION

Accepted Solutions

How about naive approaches? They might take a while to run though...

 

select entity_id,count(*) from stream group by entity_id;

select entity_id,sum(pg_column_size(field_values)) from stream group by entity_id;

 

View solution in original post

7 REPLIES 7

How about naive approaches? They might take a while to run though...

 

select entity_id,count(*) from stream group by entity_id;

select entity_id,sum(pg_column_size(field_values)) from stream group by entity_id;

 

Rocko
18-Opal
(To:Rocko)

The caveat being it may not directly reflect disk space due to how the DB organizes its tables, blocks, dead tuples and so on, but it should give an indication on the worst offenders.

Velkumar
19-Tanzanite
(To:Rocko)

Hi @Rocko 

 

I tried this but it takes lot of time. Is there any other quick approach to find it

 

/VR

Not that I am aware of. 

   select entity_id,count(*) from stream group by entity_id;

is not using an index this is why it takes so long.

You could create an index, but that will take time AND disk space. But it would help for the future.

 

Most obvious ideas (most likely already checked):

- I assume you do have a lot of streams in the platform? If you only have like 5 you could figure which one it has to be?

- Other idea if you use the GetStreamEntryCount service on the Stream-Things, maybe this is faster than the SQL? But maybe internally it does the same (not sure). At least for a count - not for actual size..

Hello @Velkumar,

 

You can also apply some heuristics to it. Assuming that all stream records for each given stream have similar structure, and you didn't evolve that structure too much, you can do some Monte-Carlo sampling:

 

  • For each Stream ID:
    • Generate N random record IDs
    • Run @Rocko's second query with "WHERE id IN (<IDs>)" clause -- this won't take long
      • Compute the average
      • Multiply it by the count from @Rocko's first query

 

Sorry, I'm too lazy to write an SQL for that, but you should be able to do it all in a single SELECT.

 

This should give you a good approximation, and you can even estimate the quality of that approximation by checking how the result converges as you increase N, e.g. from 10 to 100 to 1000.

 

/ Constantine

Hello @Velkumar,

 

It looks like you have some responses from some community members. If any of these replies helped you solve your question please mark the appropriate reply as the Accepted Solution. 
Of course, if you have more to share on your issue, please let the Community know so other community members can continue to help you.

Thanks,
Vivek N.
Community Moderation Team.

Announcements


Top Tags