Disaster Recovery - training and practice?

pcnelson · Jun 17, 2013

ddemay · Jun 17, 2013

Another way of asking this question is: outside of Windchill, have you ever
worked on or with disaster recovery? If so, then it is just a matter of adapting
that experience to Windchill. A lot of the answers here depend on the technical
architecture of the system you have set up.

pcnelson · Jun 18, 2013

To answer your question: no, I haven't. So I was hoping for more complete documentation, especially as it relates to SQL Server. PTC made a big deal out of PDMLink supporting SQL Server when we were evaluating it, but the documentation is still very Oracle-biased.

HugoHermans · Jun 18, 2013

What about server virtualisation with disaster failover? If you don't have a disaster recovery procedure, why do you have backups?

I'm not an expert, and I have never taken part in a drill, but at the very least you have to test, on a regular basis, whether the backups are usable.

My 2 cents, Hugo.

<< ProE WF5 M170 - PDMLink 9.1 M060>>

pcnelson · Jun 18, 2013

Hello Hugo, and thanks for taking the time to reply. I am not sure I understand what you are asking. We do have backups, but no written procedure or work instructions on how we would recover from a failure. I can't believe it is as simple as copying the files from the backup location back to the original location and restarting WC. Or maybe it is? Just as there are certain steps, in a certain order, for shutting down WC, I had imagined there were certain steps, in a certain order, for moving the backup files and database back to Production.

SteveVinyard · Jun 18, 2013

Hi Pete, DR is a tricky subject because it seems like every IS department has a different architecture, skill set and backup/recovery software. All of these things are part of the puzzle, but few, if any, of them are a de facto standard.

If you can describe some of these components in detail, it should be easier to help get you started (forgive me if you have already done this).


Steve Vinyard
Senior Solution Architect

HugoHermans · Jun 18, 2013

Hi Pete,

I was only expressing my astonishment. My company spends a considerable amount of effort on taking backups (daily, weekly, monthly), but when I ask for a recovery procedure or a recovery exercise, the answer is: no time, no resources, blah blah blah. My point is: if you don't test your backups (regularly), you don't have backups.

Regards, Hugo.
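To make "test your backups" a little more concrete, here is a minimal Python sketch, assuming the vault content is plain files on disk. It records a SHA-256 manifest of a directory tree at backup time and compares it with the same tree after a test restore. The function names are hypothetical, and this only checks file content; it says nothing about whether the database or LDAP restore worked.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large vault files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """List files missing after the restore, unexpected extras, and changed content."""
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = sorted(k for k in set(expected) & set(actual) if expected[k] != actual[k])
    return {"missing": missing, "extra": extra, "changed": changed}
```

In practice you would build the manifest from the production vault when the backup is taken, store it with the backup, then build a second one on the test server after the restore; empty "missing" and "changed" lists are a necessary (not sufficient) sign that the restore is decent.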

sdrzewiczewski · Jun 18, 2013

Agreed. We spent almost a full year on a Technology Recovery Program. We made sure our first-tier applications (ERP, PLM, email, etc.) could either fail over or be restored to our remote datacenter.

Prior to this project we had scheduled semi-annual application refreshes. We would take our backups, restore them to one of our non-production environments and run a suite of tests to validate that our process was correct. We would do one test each year with data from off-site storage and one with a local backup, each simulating a different type of issue.

The database technology is only one portion of your backup, and whether it's SQL Server or Oracle isn't much of an issue. You also need your LDAP exported at the same time, you need your file vaults (unless you're storing the data directly in the database), and you need the codebase and configuration files for the application tier as well.
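A minimal Python sketch of that point, assuming each backup job can report when it completed: check that all four pieces (database, LDAP export, file vaults, codebase/configuration) exist for a given backup set and were taken close enough together to be restored as a unit. The component names and the allowed time spread are hypothetical placeholders; choose a spread that matches how much drift between database and vaults your own restore procedure can tolerate.

```python
from datetime import datetime, timedelta

# Hypothetical component names; adjust to whatever your own backup jobs produce.
REQUIRED_COMPONENTS = ("database", "ldap", "vaults", "codebase")


def check_backup_set(completed_at: dict[str, datetime], max_spread: timedelta) -> list[str]:
    """Return a list of problems; an empty list means the set looks restorable as a unit."""
    problems = [
        f"missing component: {name}"
        for name in REQUIRED_COMPONENTS
        if name not in completed_at
    ]
    present = [completed_at[name] for name in REQUIRED_COMPONENTS if name in completed_at]
    if len(present) >= 2:
        # A large gap means, e.g., the database may reference vault content
        # that is not in the vault copy (or the other way around).
        spread = max(present) - min(present)
        if spread > max_spread:
            problems.append(f"components taken {spread} apart (limit {max_spread})")
    return problems
```

Running a check like this after every nightly job, rather than only during a drill, catches a silently failed LDAP export or vault copy before you need it.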

How do you plan to restore in case of disaster? Are you going to rebuild Windchill (install fresh and configure) then restore LDAP and the database? Are you leveraging Virtualization technology? The list of options goes on.

In a disaster what's the acceptable amount of time you can be without your system? Is it minutes, hours, days, weeks? That should affect how you plan to restore, or maybe even how you architect Windchill in the first place.
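As a rough, back-of-the-envelope check of a restore approach against that target, the time to copy the data back can be estimated from its size and the throughput you have actually measured, plus a fixed allowance for everything else (reinstalling, reconfiguring, testing). A minimal Python sketch; the sizes, throughput and overhead are hypothetical inputs, not PTC figures, and decimal units (1 GB = 1000 MB) are assumed.

```python
def estimated_restore_hours(
    data_gb: float,
    throughput_mb_per_s: float,
    fixed_overhead_hours: float = 0.0,
) -> float:
    """Copy time for data_gb at the given throughput, plus a fixed allowance."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    copy_seconds = data_gb * 1000 / throughput_mb_per_s  # decimal units assumed
    return fixed_overhead_hours + copy_seconds / 3600


def meets_rto(estimated_hours: float, target_hours: float) -> bool:
    """True if the estimated restore fits inside the acceptable downtime."""
    return estimated_hours <= target_hours


if __name__ == "__main__":
    # Hypothetical: 360 GB of vaults at 100 MB/s, plus 2 hours of other work.
    hours = estimated_restore_hours(360, 100, fixed_overhead_hours=2.0)
    print(hours, meets_rto(hours, target_hours=4.0))
```

If the estimate comes out well above the acceptable downtime, that is the signal to look at options such as a standby server or continuous vault replication rather than restoring from tape.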

There was a presentation last year at PTC/User by BAE that walked through their story of what happens when a disaster strikes. If you have access to that it's a good read.

Steve D

cc-2 · Jun 19, 2013

Hi

I agree with Stephen.

DR is a balance between how much you invest (appropriate software, architecture, resources, etc.) in order to restore within an agreed time (seconds, minutes, days...) and how much data you are willing to lose (i.e. everything new or changed since the last backup).

I would say this is the first thing the company needs to define; I call it the DR strategy. There is no perfect world, so even if the technology could allow you to restore in seconds with no loss, the solution could be financially out of reach for your organisation.

Once the requirement has been defined, you can work on the technical solution.

We take a backup of everything (vaults, LDAP, database) once a day, so we know that in a DR situation we will lose at most a day of work. We have tested our backups in a situation that was not "under pressure": every time we refresh our test server, we use the latest backup we have. It takes quite a long time, though, because we need to copy the vaults from production to the test server (quicker than restoring them from the backup tapes!).
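The "at most a day of work" figure follows directly from the backup interval: whatever was created or changed after the last completed backup is lost, so the largest gap between backups bounds the data loss (what is often called the recovery point objective). A small Python sketch of that arithmetic, assuming backup completion times are known; the function names are illustrative only.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime], failure_at: datetime) -> timedelta:
    """Work done since the last backup completed before the failure is lost."""
    earlier = [t for t in backup_times if t <= failure_at]
    if not earlier:
        raise ValueError("no backup exists before the failure time")
    return failure_at - max(earlier)


def worst_case_loss(backup_times: list[datetime]) -> timedelta:
    """The largest gap between consecutive backups bounds the possible loss."""
    if len(backup_times) < 2:
        raise ValueError("need at least two backups to measure an interval")
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))
```

With nightly backups at 02:00, a failure at 14:30 loses 12 hours 30 minutes of work, and a failure just before the next run approaches the full day.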

For the production server itself, we have a phantom server in a different location, 15 km away: a clone of our production server to which the vaults are copied (I'm not sure about the interval, but I guess at least once a day, to match the database backup). The theory says that if production goes down, IT redirects the traffic to this clone/phantom server and makes it the new production server. That is the theory, because we have never tested it.

We know that in the worst-case scenario we can lose as much time as it takes us to refresh our test server. It is not ideal, but the company would survive in such conditions and no one would be fired.

That is where we stand at the moment.