Hi everyone,
We're currently facing a performance challenge in our ThingWorx implementation related to saving a large volume of image files (potentially millions over time).
We’ve observed that writing files directly to a file share (network storage) from ThingWorx is quite slow, whereas saving them to the local repository (on the same server where the ThingWorx ApplicationServer/Experience Service runs) is significantly faster.
To address this, our development team has proposed the following approach:
Save incoming files temporarily to the local ThingWorx repository (for fast write operations).
Then, in the background, run a scheduled service (e.g., every 15 or 30 minutes) that:
Moves these files to the file share for long-term storage.
Deletes them from the local repository after successful transfer.
We’d like to ask the community:
Is this a recommended or sustainable strategy for handling large-scale file storage in ThingWorx?
Are there risks or performance concerns (e.g., file locking, memory issues, scaling problems) we should be aware of with this kind of background file migration?
Any alternatives or best practices others have used for similar use cases?
Thanks in advance for your insights!
Hello,
I don't have much experience with that kind of volume, but I'll try to give you some input while we wait for others to help you!
Regards,
Guillaume
Hello Guillaume,
Thank you very much for your input and for taking the time to share these ideas!
Thanks again for your suggestions!
Hey @Rocko, if you have any insights on this topic, it would be great to have your input here. Thank you!
Much of that depends on the use case, if there are concurrent read/write operations etc.
Most likely the slower access is due to the network storage being slow relative to local storage.
I can't find anything wrong with your approach of storing locally first to take the load off ThingWorx and moving the files in the background.
Depending on your use case you could manage current storage location in a DB table.
I would also try to structure the files a bit - if you have millions, probably create some reasonable subdirectories to make handling easier.
The files are already well structured. The path looks like, for example, name/serialnumber/timestamp/foldername, with the pictures inside.
Since you consider this approach of storing locally first (to take the load off ThingWorx) and moving the files in the background a good one, could you also give me some technical guidance on how I should pursue this task? That would be really helpful for implementing the solution. Thanks
You need to decide if you want to do it inside of TWX using schedulers, or outside, using e.g. Windows Task Mgmt or a cron job.
I would recommend the latter to keep the load outside of TWX (it's the same compute/CPU, but it doesn't occupy resources/threads in your TWX instance). Since both repos are accessible in TWX, you use the local one for fast writing, and when you want to access a file, you read from the repo assigned to the file share. If an expected file is missing, you know it has not been copied yet and can still be found in the local repo.
The background job just moves all files from one (local) directory to a fileshare directory.
On the directory structure: I would use a flat structure in the local repo, keep the taxonomy in the file name, and then decompose the filename into directories in the target repo.
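To make that concrete, here is a minimal sketch of such a mover in Python that you could run from Windows Task Scheduler every 15-30 minutes. The paths and the underscore naming convention are just assumptions to adapt to your setup; the point is to copy first and delete from the local repo only after a successful transfer.
# move_to_share.py -- minimal sketch of the background mover (paths are assumptions)
import shutil
from pathlib import Path

LOCAL_REPO = Path(r"C:\ThingworxStorage\repository\LocalImageRepo")  # assumed local repo folder
SHARE_ROOT = Path(r"Z:\images")                                      # assumed mounted NetApp share

def move_pending_files():
    for src in LOCAL_REPO.glob("*.jpg"):
        # Flat file name carries the taxonomy, e.g. "name_serial_timestamp_folder_1.jpg" (assumed convention)
        parts = src.stem.split("_")
        target_dir = SHARE_ROOT.joinpath(*parts[:-1])   # rebuild the directory tree on the share
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / (parts[-1] + src.suffix)
        tmp = target.with_name(target.name + ".part")
        shutil.copy2(src, tmp)   # copy to a temporary name on the share first
        tmp.rename(target)       # publish under the final name
        src.unlink()             # delete locally only after the transfer succeeded

if __name__ == "__main__":
    move_pending_files()
If you want to track state, the same job could also update a DB table with the current storage location of each file, as mentioned above.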
Hi @MA8731174,
I wanted to follow up with you on your post to see if your question has been answered.
If you have more to share on your issue, please let the Community know so that we can continue to support you.
Thanks,
Abhi
Hello,
Tl;dr -- raise this concern with your IT and see what they can propose.
Just a few semi-random thoughts on this subject:
Sorry, I just dumped what I had in my head, might be able to provide better guidance once you come back with some specifics.
/ Constantine
Hi Constantine,
thanks a lot for the detailed response and your valuable input — much appreciated.
Just to give you a bit more background on our setup and the reason we moved away from our original approach:
Initially, we were storing all images directly on the server. Over time, as the number of files grew, we repeatedly ran into disk space limitations. Every few weeks we had to manually extend the storage, and occasionally — especially over weekends — the disk would fill up completely, making the system temporarily unusable for end users. This was obviously not sustainable.
Because of that, we decided to go with a NetApp storage system, which acts as a dedicated file server connected to the application server. It's a classical setup — purely for disk space, nothing fancy. The main advantage for us is: if we allocate 100 GB, we get exactly 100 GB without having to worry about local disk management, VM constraints, or weekend outages. It's scalable and separate from the VM layer.
We don't deal with physical servers; everything is virtualized, both in the past and today. The storage is handled separately and dedicated to the file needs of our application.
1. How frequently is data written, in what chunks, and what is the overall volume per minute/day?
In the past, data was written to the system every minute due to high activity in the project. While this has decreased slightly over time, image data is still being saved regularly, approximately every minute, as customers continue to submit entries with attached photos. So basically it is just storing images, and each image is small, around 190 KB.
2. How much data is overwritten?
We do not overwrite any existing data (by data I mean the photos). Each new entry from the mashup results in a new set of files being stored permanently. The system is designed to be write-once, read-many.
3. How frequently are files listed or accessed?
File access typically happens via the user interface (mashup), when a customer clicks on an entry to view the associated photos. On average, we estimate this occurs about once per hour, or sometimes only once every 4 hours, as customers generally access the system only when a claim or follow-up is required.
4. Do you ever delete data?
No. Once data is written, it is retained permanently. We currently have no deletion policy or cleanup routine in place.
5. Once written, how frequently is the data accessed?
This is similar to question 3. Access frequency is not time-dependent. Whether the data is new or several months old, it may be accessed when a claim is raised — meaning both recent and older data are treated equally in terms of relevance and retrieval.
6. For files of the same age, are read patterns uniform or spiky?
Access patterns are not uniform or repetitive. Each photo is typically accessed once — or at most, twice — in connection with a specific case or claim. After that, it is rarely (if ever) accessed again. There are no files in our system that are regularly revisited or accessed repeatedly.
Well, 200KB x 10 images / 1 minute is only about 0.03MB/s throughput, which is a synonym of "zero" nowadays. So I guess the throughput is not what makes it slow, is it? Must be listing files then, or some operations that you run in sequence, while they might be able to happen in parallel. A few more questions:
1. How do you know which files to present to the user when they open a claim? Is there a file listing operation at any point?
2. How do you display those images to the user -- do you fetch them sequentially, or display all at once?
3. You can easily measure your file share performance -- just write a large file and time it. Do the same for 1000 small files, too. Then run the same test for reading data (sequential and random) -- this will give you a good idea about your bottleneck.
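If it helps with point 3, here is a rough Python sketch for the read side that times sequential vs. random reads of a set of small test files on the share (the Z:\tmp path and the small*.bin names are just assumptions):
# read_bench.py -- rough sketch: time sequential vs. random reads of small files on the share
import random
import time
from pathlib import Path

FILES = sorted(Path(r"Z:\tmp").glob("small*.bin"))   # assumed test files on the mounted share

def timed_read(paths, label):
    start = time.perf_counter()
    total = sum(len(p.read_bytes()) for p in paths)   # read every file fully
    elapsed = time.perf_counter() - start
    print(f"{label}: {total / 1e6:.1f} MB in {elapsed:.2f} s ({total / 1e6 / elapsed:.1f} MB/s)")

timed_read(FILES, "sequential")
shuffled = list(FILES)
random.shuffle(shuffled)        # same files, randomized order
timed_read(shuffled, "random")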
NetApp is great stuff, by the way! I guess for you it is just an SMB or NFS share? ONTAP supports intelligent storage tiering and has several options to speed up file access at the edge. You should really check with your IT -- there's a chance they might be able to deploy something on your ThingWorx VM that does local caching automatically and transparently.
Finally, does your ThingWorx run on Windows or Linux?
/ Constantine
So, let me clarify the issue I am having with the file share. The problem we are facing is not about listing images; that works very efficiently and fast. The issue is that when we send data without a picture from the View app on the phone, it goes very fast, but when we capture a picture and send it, the data is saved with a short delay. It is slow, and the service can sometimes take 2-3 seconds to complete. The ThingWorx service that saves the image is synchronous, not asynchronous.
We have checked this: if we use the local repository of ThingWorx and take a picture and send it from the front end (View app, a Vuforia Studio experience), it also goes very fast.
So the issue is saving the file from the front end to the file share: even though we compress the image on the mobile device to around 180 KB, which should be fast to transfer, it is very slow.
Answering your questions:
1) There is an entry in the mashup with its stored image path, and we display that image when it is clicked, which works very fast.
2) We normally save 1 picture with each entry and access it, as I said, by clicking on it. The path looks like, for example, 123123/12312312312312/2222/1,
and in this folder named 1 there is an image.
3) The test I have done is with the NetApp share itself: the pictures are saved slowly there and the service takes time to complete, but on the local repo it saves fast, as expected.
Our ThingWorx is running on Windows.
Please let me know if you have further questions. I really appreciate your time and valuable answers in this regard. Thank you
Got it. Well, since you're on Windows, this should be easy to test -- just try to upload a large file to that file share using Explorer (not ThingWorx), and see what the throughput looks like. Try the same with a few small files, too. To make it a bit more scientific, you can use a simple bat file, something like this (assuming you mount your NetApp share as the Z: drive):
@echo Large started: %date% %time%
copy large.bin z:\tmp\
@echo Large completed: %date% %time%
@echo Small started: %date% %time%
copy small1.bin z:\tmp\
copy small2.bin z:\tmp\
copy small3.bin z:\tmp\
copy small4.bin z:\tmp\
copy small5.bin z:\tmp\
@echo Small completed: %date% %time%
Make a large file something like 100MB, and the small ones say 200KB, like your photos.
Like I said, I'd be surprised if the throughput is below let's say 100MB/s. If it is really slow though, you should really speak with your IT -- the right way to solve this problem would be by making the file share fast, not building a custom layered storage for ThingWorx.
Edit: Just to make it clear -- ThingWorx doesn't do anything out of the ordinary with those files in file repositories. When you write a file, it literally just writes a file, when you list files, it just calls the underlying OS routine... In other words, there's no wizardry happening behind the scenes, it is all pretty transparent.
So, after the quick test and the details: the issue clearly seems to be the performance when handling many small files, which significantly slows down the transfer process despite the overall small data volume.
Just read your other post about the Python script... If that's what is slow, you might be able to copy all those files in parallel with asyncio.
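A minimal sketch of that idea, assuming the slow part is copying individual files from a local folder to the mounted share (the paths and the concurrency limit are assumptions to tune):
# parallel_copy.py -- sketch: copy many small files in parallel using asyncio worker threads
import asyncio
import shutil
from pathlib import Path

SRC = Path(r"C:\ThingworxStorage\repository\LocalImageRepo")  # assumed source folder
DST = Path(r"Z:\images")                                      # assumed target on the file share

async def copy_one(src: Path, sem: asyncio.Semaphore):
    async with sem:
        # shutil.copy2 is blocking, so run it in a worker thread
        await asyncio.to_thread(shutil.copy2, src, DST / src.name)

async def main():
    sem = asyncio.Semaphore(16)   # cap concurrent copies; tune for your share
    await asyncio.gather(*(copy_one(p, sem) for p in SRC.glob("*.jpg")))

asyncio.run(main())
The win usually comes from overlapping the per-file latency on the share rather than from raw throughput.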