13-Aquamarine
December 22, 2010
Solved

Mechanica: limiting (critical) hardware

  • December 22, 2010
  • 7 replies
  • 39834 views

I'm wondering what the bottleneck is when running Mechanica analysis, hardware-wise.

We're currently running a large analysis on a 6 GB, quad-core machine. All the temporary files and results are going to the local hard disk, so there shouldn't be any significant network access (and the Task Manager graph shows none).

Although Mechanica has recognised four cores (and occasionally uses them, getting more than 25% CPU usage), most of the time it's only at 5-10% CPU.

What's it waiting for? What hardware would increase the speed for these analyses? Do we need a fast RAID hard disk set-up, or faster memory, or more memory, or what?


7 replies

12-Amethyst
December 23, 2010

Hi Jonathan,

Not really doing this recently but in the past one hassle was needing more memory than we had and then getting paging to HDD swapspace when we reached the limit. Probably not an issue with your setup but good to tick off to be sure. The next thing was setting the RAM that the solver is allowed to use. In older versions this defaulted to a really small amount of RAM (I guess to make sure it would not fail) so our rule of thumb for XP32 was to set solver RAM to half the available RAM e.g. 1GB for a 2GB machine (told you it was a while back). Early on we had the temporary files over the network problem but like you we sorted that out. Last but maybe relevant is the possibility for using solid state HDD for the temp files. Will be way quicker than any RAID setup with even high speed SCSI HDD and should not have to be that large for this type of work.

Hope some of this helps.

Regards,

Brent Drysdale

13-Aquamarine
December 23, 2010

Well, for anyone who's interested I've found out some more today.

I discovered Windows perfmon.exe, and I've been watching disk and CPU usage while running the analysis.

I know that memory/hard disk swapping is an issue, as the run used all 6 GB of memory plus 15 GB of disk space.

Although Mechanica does use multiple cores, in practice it spends a long time meshing with just one core, and only uses more than two (>50% CPU) for brief moments. I've also read elsewhere that Mechanica only uses two cores, although that may be out of date.

Elapsed time for the run was 1h20m, with CPU time being 47m - not much, when you consider there was 5h20m available in total from the four cores!
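As a quick check of that arithmetic, here is a minimal Python sketch (not part of the original post; the function name is made up for illustration). It works out how much of the available core-time the run actually used, given the figures quoted above.

```python
def cpu_utilisation(elapsed_min: float, cpu_min: float, cores: int) -> float:
    """Fraction of the available core-time that was actually used."""
    available_min = elapsed_min * cores  # total core-minutes on offer
    return cpu_min / available_min


elapsed = 80  # 1h20m elapsed, in minutes
cpu = 47      # 47m of CPU time
cores = 4

print(f"available core-time: {elapsed * cores} min")  # 320 min = 5h20m
print(f"utilisation: {cpu_utilisation(elapsed, cpu, cores):.1%}")
```

On those figures the four cores were busy for only about 15% of the time they were available, which fits the long spells of low CPU usage seen in perfmon.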

The long spells of less than one core (<25% CPU) are mostly while disk writes or reads are happening (up to 70 GB/s on this machine, which sounds impressive to me although I don't know what drive(s) is(are) in it). There are also some periods of low CPU and much slower disk access, around 800 kB/s, which might suggest that both random-access and contiguous-transfer speeds are important at different times.

Therefore, I'd conclude that the requirements are:

1) As much memory as possible (not really a surprise).

2) The fastest single or dual core processor possible - there will be little advantage in a quad core (at the moment).

3) The fastest hard drive (SCSI, RAID, SSD or whatever) possible.

On the point about the memory allocation size (the value in Mechanica's Run->Settings dialogue box): when I did Mechanica training, the instructor told us that this was not a maximum, but a block size (the increment in which memory is allocated) and that increasing it (the default is 128 kB) would not help. Equally, I could believe that a larger block size might be more efficient on a computer with >4 GB - can anyone from PTC comment?

1-Visitor
February 14, 2011

a couple of answers ....

If it's using all the memory then this is your speed problem - once it starts paging to disk you're in for a long wait. Win7x64 and full RAM is the answer - just keep putting more RAM into the machine until the job runs with some memory headroom in Windows. If you can't get enough RAM into the machine then you need to reduce the job size, e.g. de-fillet the model in non-critical areas.

Mechanica does use all cores allocated - we have a hex core i7 machine and it does use them HOWEVER only in a couple of phases - Equation Solve and Post Processing Calcs. All the rest is single threaded. Quad core is probably the best tradeoff. There needs to be a major rewrite to get more phases using more cores - maybe CREO.

Hyperthreading does NOTHING ! - it takes the same amount of time to run a job with H/T on (12 cores) as H/T off (6 cores) - there's just a heap of CPU thrash happening (2X the CPU time) go figure. Once again a rewrite is probably required to leverage the H/T power of these new processors - maybe it's not possible.

Memory allocation (SOLRAM) size doesn't seem to matter - you are actually better off leaving it low as it puts aside RAM which can cause a shortage elsewhere. Have a look in the *.pas files - it tells you what it should be set to - usually quite small.

More on this over at MCadCentral

http://www.mcadcentral.com/proe/forum/forum_posts.asp?TID=40699&PN=1&TPN=1

Frankly Mechanica's documentation is so out-of-date it's laughable - they talk about 128M RAM. It really needs a major re-write - the results window needs to be properly integrated into ProE (FloEFD has superior integration). Unfortunately all our maintenance $'s were used to port it to CoCreate. Maybe CREO.

Can anyone comment from PTC where Mechanica is heading under CREO - are we going to get a proper re-write or just another port of the old girl ?

She's still good but starting to show her age.


TadDoxsee (best answer)
1-Visitor
March 7, 2011

Hi All,

I've been reading this discussion and thought I'd try to clarify a few points.

Hyper-threading

First, concerning hyper-threading, Burt's graphs clearly show that there is no benefit to using hyper-threading. We found similar results through our own testing and therefore recommend that users do not use hyper-threading.

Parallel Processing

For very large models, the most time consuming part of a Mechanica analysis is solving the global stiffness matrix equations. For this part of the analysis, Mechanica uses, by default, all of the available CPU cores for multiprocessing, up to a limit of 64 cores. Today, there are a few other parts of the analysis where Mechanica uses multiple cores and we plan to expand multiprocessing to other parts of the analysis in the future.

RAM and solram

The biggest influences on performance are the amount of RAM in your machine and how that RAM is used by Mechanica.

The amount of memory that is used during an analysis depends on several factors, including the complexity of the model, the desired accuracy of the solution, and the type of analysis or design study you are running. You can see how much total memory an analysis takes by looking at the bottom of the Summary tab of the Run Status dialog. The line you're looking for looks like this:


Maximum Memory Usage (kilobytes): XXXX

If the maximum memory usage of Mechanica plus the memory used by the OS and the other applications exceeds the amount of RAM in your machine, then the operating system (OS) will swap data between RAM and the hard disk, which seriously degrades the performance of your applications. Thus, to achieve maximum performance, you want to make sure that the maximum memory usage is less than the amount of RAM in your machine.
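For anyone who wants to automate that check, the following is a minimal Python sketch, not a PTC tool. It assumes the summary text has been saved or copied somewhere as plain text and contains the line in exactly the form quoted above. The helper names and the sample value are made up for illustration.

```python
import re
from typing import Optional


def max_memory_kb(summary_text: str) -> Optional[int]:
    """Pull the 'Maximum Memory Usage (kilobytes)' value out of summary text.

    Assumes the line looks like the one quoted in the post; returns None
    if no such line is found.
    """
    match = re.search(r"Maximum Memory Usage \(kilobytes\):\s*([\d,]+)", summary_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def fits_in_ram(mechanica_kb: int, other_kb: int, ram_kb: int) -> bool:
    """True if Mechanica plus everything else stays below physical RAM."""
    return mechanica_kb + other_kb < ram_kb


# Hypothetical sample: 3 GB used by Mechanica on a 6 GB machine,
# with 1 GB assumed for the OS and other applications.
sample = "Maximum Memory Usage (kilobytes): 3145728\n"
used = max_memory_kb(sample)
print(used, fits_in_ram(used, other_kb=1024 * 1024, ram_kb=6 * 1024 * 1024))
```

The figure for the OS and other applications has to come from somewhere else, for example Task Manager with Mechanica closed.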

For very large models, the thing that requires the most memory during an analysis is the global stiffness matrix. You can see how big the global stiffness matrix is by looking on the Checkpoints tab of the Run Status dialog box (also in the .pas file in the study directory). The line you're looking for is

Size of global matrix profile (mb):

Mechanica allows you to limit the amount of memory that the global stiffness matrix will consume by setting the Memory Allocation field in the Solver Settings area of the Run Settings dialog.

We often call this Memory Allocation setting "solram". With this setting, you allocate a fixed amount of memory in which to hold slices of the global stiffness matrix that the linear equation solver works with at any one time. If the global stiffness matrix is too big to fit in solram, then Mechanica will swap part of the matrix back and forth between disk and RAM using a specialized swapping algorithm that is more efficient than the general swapping algorithm used by the OS.

To explain these concepts in more detail, I describe three different scenarios of how Mechanica uses memory during an analysis.

Scenario I

Mechanica runs most efficiently when the entire global stiffness matrix fits in solram and when the total memory used by Mechanica fits in RAM.

For example, suppose you have a machine with 4 GB of RAM and 4 GB of disk allocated to swap space. You run an analysis which needs 1 GB for the global stiffness matrix, K, and 2 GB for everything else, which I'll call DB. If you set solram to 1.5 GB, then, ignoring the RAM used by the operating system and other applications, the memory usage looks like this.

Available:
               RAM                              swap
|--------------------------------|--------------------------------|

Used by Mechanica:
       DB           K
****************(########----)      Ideal
                    solram

DB + solram < RAM    good (no OS swapping)
K < solram           good (no matrix equation swapping)

In the above, the memory used by DB is shown as ****, the memory used by K is shown as ###, and the memory allocated to solram is inside parentheses (###--). Because K is smaller than solram, there is some memory that is allocated to solram that is unused, shown as ----. This is an ideal situation because K < solram and DB + solram < RAM, and hence no swapping will occur.

Scenario II

The next most efficient scenario is when the entire amount of memory used by Mechanica still fits in RAM, but the global stiffness matrix does not fit in solram.

Available:
               RAM                              swap
|--------------------------------|--------------------------------|

Used by Mechanica:
       DB             K
****************(#########)##      OK
                  solram

DB + solram < RAM    good (no OS swapping)
K > solram           not so good (matrix equations will be swapped)

In this case, the part of K which does not fit in solram, shown above as the ## outside the parentheses, will be swapped to disk with specialized, efficient Mechanica code.

In this scenario, the size of solram has some, but not a large, effect on the performance of the analysis. In general, the larger solram is, the faster the global stiffness matrix equations will be solved, as long as the total memory used fits in RAM.

Scenario III

The worst case scenario is when the total memory used by Mechanica does not fit in RAM. If the total memory allocated by Mechanica (and all of the other processes running on your machine) exceeds the total RAM of your machine, then the operating system will swap data.

Available:
               RAM                              swap
|--------------------------------|--------------------------------|

Used by Mechanica:
          DB                   K
***********************(################----)      Bad
                               solram

DB + solram > RAM    bad (OS will swap data)
K < solram           doesn't really matter

In this scenario, the analysis will run slowly because the operating system will swap data. If this occurs, it's better to decrease solram so that the memory Mechanica uses remains in RAM, as shown below:

Available:
               RAM                              swap
|--------------------------------|--------------------------------|

Used by Mechanica:
          DB                    K
***********************(######)##########      OK
                        solram

DB + solram < RAM    good (no OS swapping)
K > solram           not so good (matrix equations will be swapped)

This is the same as scenario II above.
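The three scenarios boil down to two comparisons, which can be written as a small Python sketch. This is only an illustration of the rules above, not Mechanica's own logic. Like the diagrams, it ignores the RAM used by the OS and other applications, and it treats an exact fit as fitting, which is an assumption. Apart from the Scenario I example, the numbers are made up.

```python
def scenario(db_gb: float, k_gb: float, solram_gb: float, ram_gb: float) -> str:
    """Classify a run as scenario I, II or III from the post.

    db_gb     : memory for everything except the global stiffness matrix
    k_gb      : size of the global stiffness matrix
    solram_gb : Memory Allocation (solram) setting
    ram_gb    : physical RAM, ignoring OS and other applications
    """
    os_swaps = db_gb + solram_gb > ram_gb  # total no longer fits in RAM
    matrix_swaps = k_gb > solram_gb        # K does not fit in solram
    if os_swaps:
        return "III"  # bad: the OS swaps data
    if matrix_swaps:
        return "II"   # OK: Mechanica swaps matrix slices itself
    return "I"        # ideal: no swapping at all


print(scenario(db_gb=2, k_gb=1, solram_gb=1.5, ram_gb=4))   # the Scenario I example
print(scenario(db_gb=3, k_gb=2, solram_gb=2.5, ram_gb=4))   # solram too large
print(scenario(db_gb=3, k_gb=2, solram_gb=0.75, ram_gb=4))  # solram reduced
```

The last two calls show the fix described above: with the same model and machine, reducing solram moves the run from scenario III back to scenario II.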

There are a few other things to keep in mind.

  • If you use a 32-bit Windows OS, the maximum amount of memory that any one application can use is 3.2 GB.
  • Solram is currently limited to a maximum of 8 GB. This maximum will be increased in a future release of Mechanica.

Here are some guidelines that you can follow to improve performance.

  1. Run on a machine with a 64-bit OS and lots of RAM.
  2. Exit other applications, so that Mechanica can use as much RAM as possible.
  3. Set solram low enough so that the total memory used by Mechanica is less than your total amount of RAM.
  4. If possible, set solram high enough so that the global stiffness matrix fits in solram (but don't violate guideline #3).
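Guidelines 3 and 4, together with the 8 GB limit mentioned above, can be combined into one rough calculation. The Python sketch below is a hedged illustration only: `suggest_solram` is a made-up helper, and it assumes you already know K and DB from an earlier run of the same model (for example from the Checkpoints and Summary tabs).

```python
SOLRAM_CAP_GB = 8.0  # maximum solram quoted in the post for the then-current release


def suggest_solram(k_gb: float, db_gb: float, ram_gb: float, other_gb: float = 0.0) -> float:
    """Suggest a solram value in GB following guidelines 3 and 4.

    k_gb     : size of the global stiffness matrix
    db_gb    : memory Mechanica needs for everything else
    ram_gb   : physical RAM in the machine
    other_gb : assumed allowance for the OS and other applications
    """
    headroom = ram_gb - db_gb - other_gb  # guideline 3: stay inside RAM
    if headroom <= 0:
        raise ValueError("DB alone does not fit in RAM; add RAM or reduce the model")
    # Guideline 4: hold all of K if possible, but never exceed the
    # headroom or the product's solram limit.
    return min(k_gb, headroom, SOLRAM_CAP_GB)


print(suggest_solram(k_gb=1, db_gb=2, ram_gb=4))              # K fits, so 1 GB is enough
print(suggest_solram(k_gb=6, db_gb=3, ram_gb=8, other_gb=1))  # limited by RAM headroom
```

Anything between the suggested value and the headroom satisfies both guidelines when K fits. The Scenario I example uses 1.5 GB, which simply leaves part of solram unused.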


Disk Usage

The other major factor that influences performance is disk usage. During an analysis, Mechanica writes all of its results to disk. Also, Mechanica temporarily stores on disk intermediate data that is required during the analysis. Although we haven't done detailed studies to determine their actual impact, the following guidelines should help improve performance.

  • Make sure you are not using any drives that are mounted across the network.
  • Use drives that have a generous amount of empty space on them.
  • Occasionally defragment your disks so that data can be written and read in large contiguous blocks.
  • Use fast hard drives.
  • Use disk striping with a redundant array of independent disks (RAID) to increase IO performance.
  • Use a RAM disk instead of a hard disk.
  • Use a solid-state drive instead of a hard disk drive.

Sorry for the length of this note. It's more than I had originally intended to write, but I wanted to explain in detail how to get the maximum performance from your hardware for Mechanica. I would be curious to know if there are users out there who are already following these guidelines and what their real-world experiences are.

Tad Doxsee

PTC

13-Aquamarine
March 8, 2011

Tad,

We run big models.

Can you expand upon the hardwired solram=8192 limit?

Is this removed in Creo?

Thanks

1-Visitor
March 8, 2011

Hi Charles,

I'm wondering, how big is big. For your big jobs, how big is the size of your global matrix profile (shown in the Checkpoints tab and the .pas file)? How much RAM is in the machine you run big jobs on?

For Creo 1.0, we plan to double the size of the maximum allowable solram from 8 GB to 16 GB.

Tad

13-Aquamarine
March 16, 2011

Apologies if this is a stupid question, and I'm just revealing my ignorance of Windoze memory management, but: why doesn't Mechanica use all the available memory before it starts spilling onto disk?

My PC reports 3.25 GB of installed memory, and generally runs around 1 GB in use (including Pro/E) according to Task Manager. My current Mechanica run typically pushes the "in use" up to about 1.7 GB (so ~700 MB of msengine.exe), and then starts using an additional 1+ GB on disk - why isn't it going up to ~3 GB of RAM first?

13-Aquamarine
June 20, 2011

A pragmatic update for reference, for anyone else who's lurking on this discussion:

Regardless of the theory above about the matrix size, and optimising the memory allocation size, it appears that Mechanica essentially uses just over twice the value entered, in total.

So, on my old 32 bit machine a value of 896 MB seemed to be the maximum, and gave the fastest runs at around 2 GB used. 1024 MB caused it to crash.

On my new, 6 GB 64 bit machine, a setting of 2048 MB seems to allow Mechanica to use about 4.5-5.0 GB (leaving 1 GB for Windoze and Pro/E); 2560 MB (2.5 GB) can just start it using the Windows swapfile and is therefore too much.

So, the old guidance of "set it to about half your available RAM" is still more-or-less correct!
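Those observations can be turned into a rough calculation. The Python sketch below is only an illustration: the factor of 2.3 is an assumed stand-in for "just over twice", and `max_setting_mb` is a made-up helper, not anything in Mechanica.

```python
ASSUMED_FACTOR = 2.3  # "just over twice" the setting; an assumption, not a measured constant


def max_setting_mb(available_mb: float, factor: float = ASSUMED_FACTOR) -> int:
    """Largest memory allocation setting whose estimated total use stays within available RAM."""
    return int(available_mb / factor)


# 6 GB machine, leaving 1 GB for Windows and Pro/E as in the post.
available = 5 * 1024
limit = max_setting_mb(available)
print(f"estimated ceiling: {limit} MB")
for setting in (2048, 2560):
    estimate = setting * ASSUMED_FACTOR
    verdict = "fits" if estimate <= available else "too much"
    print(f"{setting} MB -> about {estimate:.0f} MB in total ({verdict})")
```

With that assumed factor the estimate lands between the 2048 MB setting that worked and the 2560 MB setting that started swapping, so it agrees with the two data points above, but it is no more than a rule of thumb.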

1-Visitor
March 17, 2011

.... and what about mechanica lite?

How do you set up the parameters, like solram there?

13-Aquamarine
March 24, 2011

Hello Vassilis,

Look in the results folder 'myanalysis' in your set working directory and read the 'myanalysis.rpt' file. At the bottom it tells you about the ram allocated to the block solver.

Just had a bit of a play in WF5 (creo). The config option sim_solver_memory_allocation appears to be disabled for Mechanica lite; the 128 MB solram allocation is hardwired.

Also just noticed that with the arrival of 'hard points' for mesh seeding (functionality not in lite), simple datum point seeding no longer helps improve the mesh in lite. Simple datum point seeding did work when lite first appeared.

Mechanica lite is very limited.

12-Amethyst
March 30, 2011

Great conversation here, very pertinent to what I'm dealing with right now. We're looking at getting a new system for our main Mechanica user to move him from a dual core XP 32 bit system with 4 GB of RAM to a quad core 2.8 GHz Core i7 with 12 GB RAM running Win7 64 bit.

A question that has come up several times in the past for us is the idea of having a central computer that does the number crunching for all of our FEA applications (Mechanica, CFDesign, Maxwell). Build it with multiple quad core CPUs at 3 GHz, throw 48 GB of RAM into it and run Win7 64 bit with the latest version of each application. Both CFDesign and Maxwell have the ability to remote solve built in. Simply a couple of configuration steps and the user selects that computer as the "solver", it uses that computer to do the work while providing a progress / status feedback on their desktop (CFDesign will even send you a text message or email to let you know it's done). This allows them to start work on building their next analysis, do work in Word etc without having a crippled system due to the solution hogging their local compute resources.

As far as I know, Mechanica does not have a simple, built-in remote solve capability. Is this correct? I can build the analysis locally, but I'd have to copy the setup to the remote computer, open Mechanica on that remote computer and run it there via a remote desktop (or just do the whole thing remotely). If Mechanica does have the ability to remote solve, how do I set it up? If it does not, are there plans to implement that capability?

Thanks.

Erik C. Gifford

1-Visitor
March 30, 2011

Hi Erik,

You've accurately summarized the current Mechanica functionality. Mechanica does not yet have a simple, built-in remote solve capability. The closest existing functionality to what you want is the Run > Batch command. With it, you can create a study directory and a mecbatch.bat file that you can then copy to a compute server, and then use Microsoft Remote Desktop and a Command Prompt to run the analysis via the mecbatch.bat command. (See the online help for more details.)

Starting in Creo 1.0, you will be able to send any Mechanica analysis or design study to a remote compute server using the existing Distributed Pro/BATCH functionality.

Regards,

Tad Doxsee

13-Aquamarine
March 31, 2011

Tad,

A list of up and coming functionality now would be really really helpful. This sort of information drives a company's investment decisions.

I have added this 'remote solve facility' to my incomplete list of what's in CREO1.0 for simulation users. (We currently copy and remote desktop.)

Bye for now

7-Bedrock
November 23, 2012

Erik brought up Ansys so I'll share what I run into with Ansys albeit way late past Erik's post.

Ansys' licensing will give you only 2 CPU cores. That's right, Ansys counts a core as a CPU. So if you have a 4-core computer on a single socket, Ansys will only allow you to use 2 cores. To leverage more CPU cores you have to buy their HPC licenses. This is where it gets even more ridiculous: for solve session independence you can buy per CPU core and it's expensive, like $3500 per core. Their HPC packs come in 8-CPU core packs which are tied to an individual solve so cannot be shared to other machines while in use, and that pack is about $22,000.

Why companies count cores now when multi-core CPUs are the norm is beyond me, and to be so outrageous about it is ridiculous.

I haven't seen if Creo Simulate has the same model, but I certainly hope it does not. Distributed computing is one thing to use CPU cores over the network but for solves based on a local machine the software should leverage all available cores, and not extort me with insane prices to do so.

13-Aquamarine
November 23, 2012

Jason,

I agree that sounds pretty stupid!

With Mechanica (and Pro/E more generally) the issue is not licensing, but coding. Mechanica will happily use up to 64 cores (I believe), but it only does so for certain parts of certain types of analysis run. Therefore, those parts may run quickly, but the elapsed time is dominated by work which is still only carried out in a single thread. Pro/E (sorry, Creo) for the most part is purely single-threaded.

I recognise that there are significant programming challenges in using multiple threads efficiently; I'm just saying how it is currently...
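The effect described above is usually quantified with Amdahl's law. The Python sketch below is a generic illustration of that law, not a measurement of Mechanica: the 30% parallel fraction is a made-up number chosen only to show the shape of the curve.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only part of the work can use multiple cores.

    parallel_fraction : share of single-core run time that is parallelised (0..1)
    cores             : number of cores available to that share
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical run in which 30% of the single-core time is multi-threaded.
for cores in (1, 2, 4, 64):
    print(f"{cores:2d} cores: speedup x{amdahl_speedup(0.3, cores):.2f}")
```

With a 30% parallel share the speedup can never exceed 1/0.7, about 1.43, however many cores are added, which is why the single-threaded phases end up dominating the elapsed time.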

12-Amethyst
November 26, 2012

Crazy the timing of Jason resurrecting this discussion. We're again looking at Ansys (Ansys Professional NLT to be specific) because our primary FEA guy (the one I mentioned before that came from an Ansys house) says he needs it to efficiently do a simulation of a thermal cycle test on one of our assemblies. Basically time based hot - cold cycling to determine the mechanical effect on the parts. Claims Mechanica can't do it, or at least as easily / well as Ansys. So again, you're in the $25k range to get started and as Jason described, if you want to make use of the cores available to you on your PC or remote number cruncher, well, that costs extra...not talking an HPC cluster or anything, just let me use the cores available on the one PC. Cha-ching.