Dear All,
I want to start a discussion about how Creo Simulate uses multiple cores and which settings work best.
At the moment I am running a lot of simulations, but the calculation time is far too long even after optimizing the mesh etc.
Therefore we asked our PTC dealer how we can reduce the calculation time.
The recommendation was to use as many cores as possible, a lot of RAM, and an SSD.
So we have now ordered a new machine with the following hardware:
--> 2x CPU Intel Xeon E5-2699v4
--> 128 GB RAM
--> 512 GB SSD
--> Nvidia Quadro K5200
--> Windows 10 64bit
--> Creo 3.0 M120
We now have 2x 22 cores, but Creo Simulate only uses about 6 cores at most.
This is something we cannot understand. The PTC dealer's support doesn't know how to proceed either.
We checked several settings, but it is no faster than my mobile workstation.
So is there maybe someone here who has experience with using multiple cores in Creo Simulate?
Thank you in advance.
Hi,
in https://support.ptc.com/appserver/cs/view/solution.jsp?n=CS115541 you can find the following sentence:
The solver in PTC Creo Simulate 2.0 is highly multi-threaded and scales up to 64 threads.
Also see:
hardware optimization - Creo Simulate 2
MH
Hello
Does he need to set something in Windows, in config.pro, or elsewhere so that Creo takes multiple processors into account?
Kind regards.
Denis
Hi,
you only need this when you want to limit the number of CPUs.
According to PTC's information, Creo normally uses all CPUs.
What CPU is in your mobile workstation? There are some pretty fast mobile CPUs out there nowadays...
Also, what kind of analysis are you doing (linear, contact, nonlinear material etc)?
Hi,
thank you for your comments.
I have already seen the linked article, but in my eyes it is nonsense.
It says "The solver in PTC Creo Simulate 2.0 is highly multi-threaded and scales up to 64 threads.". What does this mean?
I don't even have that many cores, but at most 20% of them are used.
The CPU in the mobile workstation is an Intel Core i7-4810MQ.
It doesn't matter what kind of simulation I am running.
According to Passmark, the 4810MQ has a single-thread score of 2055, versus 1768 for the 2699 v4. Thus I would expect the mobile system to take about 14% less time to run an analysis, assuming everything else is equal.
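The 14% figure can be checked with a quick calculation. This is only a sketch, assuming run time is inversely proportional to the single-thread score (a simplification); the function name is made up for illustration:

```python
def relative_runtime(score_fast: float, score_slow: float) -> float:
    """Run time on the faster CPU as a fraction of the run time on the
    slower CPU, ASSUMING run time is inversely proportional to the
    single-thread benchmark score."""
    return score_slow / score_fast


# Passmark single-thread scores quoted above
ratio = relative_runtime(score_fast=2055, score_slow=1768)
saving_percent = (1 - ratio) * 100
print(f"Mobile CPU needs about {saving_percent:.0f}% less time")
```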
As Giulio says, only a small portion of the analysis uses multiple cores; I would say that if the CPU time at the end of the run is twice the elapsed time, the system is running fairly well without any bottlenecks. I've just run a medium-sized analysis in 1290 seconds of elapsed time with 2370 seconds of CPU time; my colleague's machine previously took 5000/6400, suggesting maybe some disk limitations (he has less RAM, needing more temp file access; and I put the results on a RAM disk).
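To compare runs by this rule of thumb, the CPU-time to elapsed-time ratio can be computed directly from the two numbers in the run summary. A small sketch using the figures quoted above (the dictionary keys are just labels):

```python
def cpu_to_elapsed_ratio(elapsed_s: float, cpu_s: float) -> float:
    """CPU time divided by elapsed time. Roughly 2 means 'running fairly
    well' by the rule of thumb above; close to 1 means the run was mostly
    single-threaded or waiting on the disk."""
    return cpu_s / elapsed_s


# (elapsed seconds, CPU seconds) as quoted in this post
runs = {
    "my machine": (1290, 2370),
    "colleague's machine": (5000, 6400),
}
ratios = {name: cpu_to_elapsed_ratio(e, c) for name, (e, c) in runs.items()}
for name, value in ratios.items():
    print(f"{name}: CPU/elapsed = {value:.2f}")
```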
Note that disk writing time can have a significant effect for large analyses, hence the recommendation for an SSD - this is mitigated by lots of RAM because it reduces the amount of disk read/write for temporary files, but this never reduces to zero.
Have you increased the SOLRAM ("memory allocation for solver") setting? This is important to achieve the reduction in temp file access - if you have lots of RAM then set it to the maximum of 16384 MB (IIRC), provided this is no more than half your installed RAM size. The default of 128 MB is rather outdated now!
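As a sketch of that rule of thumb (the 16384 MB maximum is quoted from memory above, so treat it as an assumption; the function name is hypothetical):

```python
SOLRAM_MAX_MB = 16384  # maximum quoted in this thread ("IIRC"), not verified here


def suggested_solram_mb(installed_ram_gb: float) -> int:
    """Largest SOLRAM value that stays within half the installed RAM,
    capped at the maximum quoted above."""
    half_installed_mb = int(installed_ram_gb * 1024 / 2)
    return min(SOLRAM_MAX_MB, half_installed_mb)


print(suggested_solram_mb(128))  # workstation with 128 GB installed
print(suggested_solram_mb(16))   # a smaller machine with 16 GB installed
```

For the 128 GB workstation the cap is the limiting factor; on a 16 GB machine the half-of-RAM rule is.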
What you really want is an overclocked i7-7700k! (although it seems they may not overclock too well...)
If I remember correctly, Simulate uses all the threads of each core only while solving the equations.
For example, if you run a simple static analysis, Simulate does the following steps:
Bye
But if so, why does PTC write "The solver in PTC Creo Simulate 2.0 is highly multi-threaded and scales up to 64 threads."?
In my opinion, multi-threaded means that more cores are much better.
I don't understand the mistake.
For example, if you run the simulation on a cluster PC, which may have a lot of CPUs, each with a different number of threads, you already know that Simulate will use at most 64 threads.
Ok maybe there is a misunderstanding.
PTC is talking about a maximum of 64 threads.
In our understanding this means that at most 64 cores (not CPUs) can work on a calculation.
Therefore we decided to buy this workstation with 2x 22 cores, which means 44 threads in total.
But the solver is using maybe 20% of the cores, not all 44.
So why is PTC talking about 64 threads?
I cannot see this.
Here is a screenshot of the workstation running a simulation.
Try to look at this option.
Thank you for your answer. I have already used these settings, but they have no impact.
We thought we could speed up the calculation by using a lot of cores, but this is not true.
What is your opinion about this issue?
Did those 'lumps' in the utilisation coincide with the first and second passes in the analysis? It looks like a pretty short analysis if so...
Not really useful, but I've noticed that on very short analyses (like ten seconds!) it never really seems to load up multiple cores, though it does on longer ones. No idea whether there's some sort of 'intelligent' scaling on the number of threads it divides the solve into.
The simulation runs for 30 minutes.
It seems that more cores will not speed up the calculation, will they?
Is it possible to run several simulations at the same time? That would help me decrease the total calculation time.
One use case would be to use different values for one convection condition and calculate them all at the same time.
How can I set this up?
A brief moment of clarity:
from the pic you can see
2 sockets (physical CPUs)
44 cores
88 logical processors (threads)
You can see here, for example, regarding logical processors.
So, in theory, you have 24 logical processors more than the 64-thread maximum (88 - 64 = 24).
Given the above, which kind of analysis are you doing?
Linear or non linear?
In my experience, and I'm sure I read it somewhere, nonlinear analyses are single-threaded.
I've just run a small nonlinear static analysis as a test, and the program uses only one thread for the iterative solution.
A piece of advice: with that amount of RAM, you could consider using a RAM disk for the temporary folder of the analysis.
Especially with nonlinear analyses, after each iteration the program writes the result of the just-finished iteration to that folder. If the assembly is very big, the amount of data to write to disk per iteration can cause a considerable delay.
I agree with the RAM disk approach. I've been using IMDisk for some years now; for most analyses, I can put both temporary and results files on the RAM disk, but there's still a useful advantage (over a spinning HDD at least) even just putting the results on the RAM disk.
The run status / .rpt file will tell you how much disk space is being used, so then you know how large to make the RAM disk.
I agree with everything that's been said so far. In my experience, Creo Simulate never scales perfectly. My CPU time is always greater than my elapsed time (if it's not, then I know Simulate is paging to my hard disk), but never by a factor equal to the number of cores I have. For example, I have an 8-core machine, and for linear static analyses the CPU time is about 1-2 times the elapsed time. For contact and modal analyses it can be about 2-4 times the elapsed time, but I never see a CPU time 8 times greater than the elapsed time.
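One way to read these ratios is through Amdahl's law. Nobody in this thread has cited it, and treating the CPU/elapsed ratio as the achieved speedup is an assumption, so take this only as a rough sketch of why adding cores helps so little (function names are made up for illustration):

```python
def parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / N), to estimate the
    parallel fraction p from an observed speedup S on N cores."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


def amdahl_speedup(p: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for parallel fraction p."""
    return 1 / ((1 - p) + p / cores)


# ASSUMPTION: a CPU/elapsed ratio of about 2 on the 8-core machine is the speedup.
p = parallel_fraction(speedup=2.0, cores=8)
s44 = amdahl_speedup(p, cores=44)
print(f"parallel fraction ~ {p:.2f}")
print(f"predicted speedup on 44 cores ~ {s44:.2f}")
```

Under these assumptions, going from 8 to 44 cores barely moves the predicted speedup, which would be consistent with what the original poster is seeing.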
I would suggest:
setting the config.pro option sim_solver_memory_allocation to 16384. This controls solram, and that value is the maximum amount of RAM that you can assign to Creo Simulate currently.
Writing results to your SSD, and maybe even consider a RAMDisk if you think it helps.
Yes it is possible to run multiple analyses together, but each analysis needs a solver license, so you will have to purchase additional solver licenses. I have about 4 solver licenses on my computer and I can run 4 different analyses concurrently as long as they can all share system resources well and not crowd out each other for RAM and processing power.
There is also a Distributed Computing extension where a simulation job can be shared across multiple computers in a network, but I have not tested it personally, so I can't give you numbers on that.
Thank you for this reply. I totally agree with all of your comments, but I still do not understand why PTC writes that Simulate can use multithreaded tasks! When I run a simulation I always watch the CPU load, and it is almost never high. That means that only a few cores are used, and this does not match the note from PTC. I have tried several different types of simulation and it is always the same. If so, you can never accelerate your calculation, can you? What would be the best hardware setup in your opinion?
hi,
the block solver of Simulate can use multithreaded tasks:
and standard settings:
regards
paul
I would suggest you don't always set solram to the maximum value.
As you can see here, the best scenario is when solram is a bit bigger than the K matrix.
If you have solram = 16 GB and K = 2 GB, there will be 14 GB of RAM that you can't use for anything else, because Pro/E / Creo reserves the entire solram space for the stiffness matrix only.
If you look at the first, and most efficient, scenario, there is 0.5 GB of free space in solram (solram = 2 GB and K = 1.5 GB), but the other RAM that Creo uses for the solution, called DB, does not fill the remaining free part of solram.
Giulio Fraulini wrote:
...If you have solram = 16 GB and K = 2 GB, there will be 14 GB of RAM that you can't use for anything else, because Pro/E / Creo reserves the entire solram space for the stiffness matrix only.
Apart from the 'wasted' ram, is there actually a performance penalty? The OP has 128 GB installed, so he probably won't miss 14 GB! I did suggest that one should keep to the recommendation of making SOLRAM no larger than half the installed RAM.
I notice when watching msengine.exe in Task Manager (Win 7) that the Commit Size is slightly larger than the SOLRAM value (as per your explanation), but the Private Working Set is generally much smaller (indeed indicating that not all of SOLRAM is being used). I don't know enough to understand how this will influence overall system performance.
With 128 GB of RAM I also think that you can always set solram to 16 GB. My point was a "general rule" based on what PTC says in its Help Center.
I'm running an LDA analysis with contacts (pic. #1), and what I said previously, that nonlinear analyses are not multithreaded, is not true. I remembered it wrongly.
What I'm seeing is that Creo uses multiple threads only to solve each iteration (pic. #2), not over the entire process, which seems to remain single-threaded.
I also see that it doesn't use all of the CPU during the multithreaded phases, only about 70%. I don't know why... (pic. #3)
A big part of the time is spent on I/O to and from the disk (SSD), so a RAM disk may bring a considerable time improvement.
Interesting. Just as an aside, in Task Manager I prefer the "one graph, all CPUs" option as it more clearly shows the total CPU utilisation over time (you just have to remember how many virtual CPUs you have!).
This is how it looks on my workstation (nonlinear).
How much time passes between iterations (see the Checkpoints tab)?
The time difference between the "Time Steps" does not directly indicate the speed of the computation: if one time step needs 20 iterations to converge and the next needs 10, comparing time step n with time step n+1 is misleading.
In the attached pic, within the same time step, only a few seconds pass between iterations (the information is written to disk only once this time step has converged, so that needs more time for I/O).
Maybe the transition between iterations is so fast (a few seconds in my case with 8 threads, probably less with your Xeon with 88 threads...) that not all of the processor power is used.
Here is a short portion of the log file:
Begin Global Matrix Assembly, Pass 62
Fri Mar 31, 2017 14:15:35
Begin Equation Solve, Pass 62
Fri Mar 31, 2017 14:15:35
Begin Load Calculations
Fri Mar 31, 2017 14:15:48
Begin Convergence Check Pass 62
Fri Mar 31, 2017 14:15:55
Begin Temperature and Flux Calculation
Fri Mar 31, 2017 14:16:00
Begin P-Loop Pass 63
Fri Mar 31, 2017 14:16:13
Begin Time Step 63
Fri Mar 31, 2017 14:16:13
Begin Global Matrix Assembly, Pass 63
Fri Mar 31, 2017 14:16:13
Begin Equation Solve, Pass 63
Fri Mar 31, 2017 14:16:13
Begin Load Calculations
Fri Mar 31, 2017 14:16:26
Begin Convergence Check Pass 63
Fri Mar 31, 2017 14:16:34
Begin Temperature and Flux Calculation
Fri Mar 31, 2017 14:16:38
Completed Analysis: M290_lowerarm_500A_trans500
Fri Mar 31, 2017 14:16:52
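To see where the time goes in a log like this, the step durations can be extracted with a few lines of Python. This is only a sketch: it assumes the log strictly alternates between a label line and a timestamp line, as in the excerpt above, and it embeds part of pass 62 as sample input.

```python
from datetime import datetime

# Part of pass 62 from the excerpt above, used as sample input.
LOG = """\
Begin Equation Solve, Pass 62
Fri Mar 31, 2017 14:15:35
Begin Load Calculations
Fri Mar 31, 2017 14:15:48
Begin Convergence Check Pass 62
Fri Mar 31, 2017 14:15:55
Begin Temperature and Flux Calculation
Fri Mar 31, 2017 14:16:00
Begin P-Loop Pass 63
Fri Mar 31, 2017 14:16:13
"""

STAMP_FORMAT = "%a %b %d, %Y %H:%M:%S"  # needs an English/C locale for day and month names


def step_durations(log_text: str) -> list[tuple[str, float]]:
    """Return (label, seconds) pairs, ASSUMING label and timestamp lines
    alternate. A step is taken to last until the next step begins."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    steps = [
        (label, datetime.strptime(stamp, STAMP_FORMAT))
        for label, stamp in zip(lines[0::2], lines[1::2])
    ]
    return [
        (label, (next_start - start).total_seconds())
        for (label, start), (_, next_start) in zip(steps, steps[1:])
    ]


durations = step_durations(LOG)
for label, seconds in durations:
    print(f"{seconds:5.0f} s  {label}")
```

Comparing these per-step times with the CPU graph in Task Manager would show which steps actually load multiple cores.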
This is the beginning of the Checkpoints tab report.
I meant the part where you can see the iterations.
Could you post the entire *.pas file?
I meant to attach the file, not to paste its content.
Anyway I don't understand...you have said that the analysis was nonlinear.
But in your report I don't see the iterations...
In the report I see terms like "Temperature" and "Flux", so it is a thermal analysis.
I don't have experience with this kind of analysis, but I think that, as in a nonlinear mechanical analysis, the program solves the problem iteratively; otherwise I wonder what the sense of the lines "Begin Convergence Check Pass #" in your report would be.