
Windchill Parallel Environment.

klakshminarayan
4-Participant

Windchill Parallel Environment.

Hello All,


I just wanted to know if anyone has set up a parallel Windchill environment for disaster recovery. Is there a way to set up the Windchill architecture so that if one Windchill system goes down or needs updates installed, we can switch over to the other Windchill system?


Any ideas would be helpful.


Thanks,


Kiran Lakshminarayanan


18 REPLIES

Hello Kiran
If your infrastructure is virtualized, you have a wide variety of options.
The technology to choose depends on your RPO and RTO targets, and solutions vary from storage replication to crash-consistent failover.
vSphere Replication, SRM, and Zerto are some of the technologies I have worked with.

Thanks,


Binesh Kumar

8020 Forsyth Boulevard | St. Louis, MO 63105-1707

+1 (314) 583-1193

Binesh.Kumar@b-wi.com
jessh
5-Regular Member
(To:klakshminarayan)

If you are on Windchill 10.2 and have a cluster, then:

1. Various cluster nodes can fail while the others continue (as in
older releases)
2. You can allow multiple nodes to process each background queue so as
to have hot failover of queue processing (since 10.0)
3. The cache master node can fail and another node will automatically
take up this role (new in 10.2)

The only single point of failure that I'm aware of is Solr, where currently only one instance can be run in the cluster. If Solr is down, I believe searches fall back to being database-based. If higher availability of Solr is critical, then one should be able to apply various hot node replacement technologies.

There are other technical options for file storage, database,
WindchillDS, etc.

--
Jess Holle

Jess,


We have implemented the various techniques you mentioned in our Windchill environment to support DR and HA requirements.



Do you know of any companies that have successfully been able to deploy Windchill builds without incurring long system outages? Our customers are looking to eliminate or minimize the downtime associated with deploying new capabilities and bug fixes to the system.



Thank you in advance for any insight you may be able to provide.



Danny N. Poisson


Raytheon Company


Dan,

I have seen the same issues with many of our clients and their Test and Dev environments as typically they have them configured as different rehosted Windchill systems.

I have addressed these issues by configuring the Test and Dev environments differently.

The simplification of rolling out new builds can be accomplished by configuring your test environment as a clone of your production environment (not a rehost). This requires some local configuration of the test servers' hosts files (to isolate naming conflicts with the production servers) and only requires changing a few wt.properties settings to allow the Test and Dev servers to have their own URLs. We have been successfully using such a configuration on both monolithic and clustered environments.
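
For illustration only, a minimal sketch of what that clone-specific configuration might look like. The hostnames, addresses, and install path are assumptions, and the exact properties to change can vary by release (java.rmi.server.hostname and wt.server.codebase are the usual suspects); treat it as a starting point, not a recipe:

    # /etc/hosts on the cloned test server: pin the production hostname to a
    # test-only address so the clone never talks to the real production servers
    10.1.2.50   wcprod.example.com wcprod

    # Give the test clone its own URL/hostname and propagate into wt.properties
    # (run from a Windchill shell on the test server)
    xconfmanager -s "wt.server.codebase=http://wctest.example.com/Windchill" -t codebase/wt.properties -p
    xconfmanager -s "java.rmi.server.hostname=wctest.example.com" -t codebase/wt.properties -p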

When set up properly, your database, LDAP, and codebase will be identical to production, so rolling out an update can be accomplished by updating your test environment and, after validation, simply copying over the codebase and importing the updated database and LDAP from the test system exports. This configuration also removes any risks associated with changing the production environment when implementing new changes, and can be accomplished with much less time and effort.

An additional benefit of this configuration is that it makes it fairly trivial to refresh the Test and Dev systems with updated data from Production.


Best Regards,

Carsten Lawrenz
KALYPSO <http://kalypso.com/?utm_source=internal&utm_medium=email&utm_campaign=kalypsosig>
MOBILE
415.378.6374
kalypsonian.com/Carsten.Lawrenz <http://kalypsonian.com/Carsten.Lawrenz>


Carsten,

Thank you for sending the information on how you are deploying builds. My next question is: what happens to any work done on the live system while the build is being deployed on the clone system? I am assuming it will be overwritten during the database import.

Danny N. Poisson
PDM IT Solution Architect
Common PDM Technical Director

Corporate Information Technology
Raytheon Company
(Cell) +1.978.888.3696
-

880 Technology Park Drive
Billerica, MA 01821 USA



Everyone does different levels of changes. Why the requirement to deploy to production hot? You do not see this often; as with all information systems, there are typically windows for maintenance.

How long does your build take to deploy on average?




Applying the update to one of the cluster nodes and rsyncing it to the other nodes (modified and new files) is what we follow. We omit node-specific files (xconfs, etc.). These steps can all be scripted, and over a twinax 10G network only limited downtime is required to copy over the updates. A rough sketch follows below.
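
A minimal sketch of that rsync step, assuming a hypothetical install path and exclude list (which node-specific files to skip depends on your release and configuration):

    # Run from the node where the update was applied; copy modified and new
    # files to a peer node while skipping node-specific configuration
    rsync -av \
        --exclude='*.xconf' \
        --exclude='wt.properties' \
        /opt/ptc/Windchill_10.2/Windchill/ \
        wcnode2:/opt/ptc/Windchill_10.2/Windchill/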

Thanks,


Binesh Kumar

8020 Forsyth Boulevard | St. Louis, MO 63105-1707

+1 (314) 583-1193

Binesh.Kumar@b-wi.com

Dan,

Ideally, in that scenario you want to leverage your Dev environment.

The process will vary depending on what exactly you are rolling out, so this is just a general overview and workflow.

1. Do all of the development work and get your configuration and changes documented on Dev.
2. Refresh your Test environment with Production data exports and make a backup of the Test environment.
3. Install the new functionality developed on Dev onto the Test system and validate functionality.
4. If you run into validation issues, revert to your Test backup and go back to step 3.

After successful validation, and when ready for go-live:

5. Back up your Production system (copy or rename PTC/Windchill_10.2 to PTC/Windchill_10.2_bak).
6. Redo steps 2-3.
7. Sync the Test environment out to Production (new PTC/Windchill_10.2, implement the production site.xconf changes, import the updated DB/LDAP from Test); see the sketch after this list.
8. Validate Production functionality.
9. If issues are found, simply revert to your Production backup.
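
For illustration, a heavily simplified sketch of steps 5 and 7; the paths, hostnames, and tooling are assumptions and would need to be adapted to your install, database, and LDAP:

    # Step 5: back up the current production install before touching it
    mv /opt/ptc/Windchill_10.2 /opt/ptc/Windchill_10.2_bak

    # Step 7: bring over the validated codebase from the Test system,
    # then re-apply the production-specific site.xconf changes
    rsync -av wctest:/opt/ptc/Windchill_10.2/ /opt/ptc/Windchill_10.2/
    # ...import the updated DB and LDAP exports from Test here, using your
    # database vendor's import tooling and an LDIF import for WindchillDS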


Also note, if you are running a Clustered Production environment, your Test environment should be clustered as well.


Best Regards,

Carsten Lawrenz
KALYPSO <http://kalypso.com/?utm_source=internal&utm_medium=email&utm_campaign=kalypsosig>
MOBILE
415.378.6374
kalypsonian.com/Carsten.Lawrenz <http://kalypsonian.com/Carsten.Lawrenz>




Exporting and importing large databases (>= 1 TB), even using parallel Data Pump on a well-tuned database, would take significant time unless you are using advanced fast-clone technologies.
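
For reference, the parallel Data Pump approach being referred to looks roughly like this; the connect strings, directory object, and schema name are placeholders, and the degree of parallelism should be sized to your hardware:

    # Export the Windchill schema with parallel workers and one dump file per worker
    expdp system@WCPROD schemas=WCADMIN directory=DP_DIR dumpfile=wc_%U.dmp parallel=8 logfile=wc_exp.log

    # Import into the target (test / read-only) database with the same parallelism
    impdp system@WCTEST schemas=WCADMIN directory=DP_DIR dumpfile=wc_%U.dmp parallel=8 logfile=wc_imp.log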

Thanks
Binesh
Barry Wehmiller

David,

To support a global application, it is hard to find downtime windows that work for everyone.

The amount of time depends on what we are deploying (new capabilities, bug
fixes, data corrections, PTC critical patch sets), but on average it takes
5 - 8 hours from end to end.


Danny N. Poisson
PDM IT Solution Architect
Common PDM Technical Director

Corporate Information Technology
Raytheon Company
(Cell) +1.978.888.3696
-

880 Technology Park Drive
Billerica, MA 01821 USA



Thanks Danny, I'm going to have to mull this over. Not being the only around-the-sun system, this does call for some innovation here, perhaps at the vendor level; in the meantime, it sounds like hardware and software hosting solutions alone will not resolve this issue. Have you spoken to your SAM/TSAM about the deployment times? Are you using the PTC BIF? What else is in use, e.g. automated scripts for deployment? Certainly more information is useful if it can be provided without breaking any rules. Is the 5-8 hours just one node of a cluster?



David,

Answers to your questions:

Have you spoken to your SAM/TSAM about the deployment times?
[DNP]: We have been working with PTC GS / TS/ Product Management on this.

Are you using the PTC BIF?
[DNP]: We use the BIF. Note that we have a deployment path to production that goes through, at a minimum, Dev- and QA-level environments.

What else is in use, automated scripts for deployment; certainly more
information is useful if it can be provided without breaking any rules.
[DNP]: Anything that can be automated has been automated and we continue to
look for additional areas to automate.

Is the 5-8 hours just one node of a cluster?
[DNP]: This is for a 10-node cluster. The time includes everything from preparing the system for the build through executing a validation test to ensure everything is working properly.


Danny N. Poisson
PDM IT Solution Architect
Common PDM Technical Director

Corporate Information Technology
Raytheon Company
(Cell) +1.978.888.3696
-

880 Technology Park Drive
Billerica, MA 01821 USA



jessh
5-Regular Member
(To:klakshminarayan)

In general once you have a set of changes to apply to your production
environment, these should be applied to 1 node and then replicated to
other cluster nodes.

In the *worst* case this should be done via rsync or robocopy. It
should *never* be done by ad hoc per-node installation or file copying
-- as this almost unavoidably becomes a source of system inconsistencies
that then cause countless problems down the road.

*Ideally* you should use a source control system rather than simple rsync or robocopy. In this approach you make all changes to one node, check in/commit/push these changes to a source control repository -- along with a detailed comment as to the changes being made and their purpose -- and then check out/pull these changes to all other nodes. This ensures that you have a detailed history of changes to the software installation, including any file-based configuration, complete with quick and easy access to difference reports for all files. This is quick, easy, and free to set up via open source software like Git. Pulling changes from a Git repository is also quite fast and efficient (especially as compared to some older source control systems, free or commercial). A rough sketch of this flow follows below.
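
As a minimal sketch of that flow (the repository location, install path, and stop/start commands are assumptions, not something from this thread):

    # One-time setup on the node where changes are made: put the install under Git
    cd /opt/ptc/Windchill_10.2/Windchill
    git init && git add -A && git commit -m "Baseline Windchill installation"
    git remote add origin git@scmhost:windchill-install.git
    git push -u origin master

    # After applying a change on that node: record it with a descriptive message
    git add -A && git commit -m "Apply CPS patch and updated file-based configuration"
    git push

    # On each remaining cluster node: stop the node, pull, and restart
    # (substitute your own stop/start procedure)
    windchill stop && git pull && windchill start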

In either case, i.e. whether you're using rsync/robocopy or a source
control system, the only downtime for nodes after the original
installation is that required for a shutdown, pull/rsync, and restart.
If there are database changes (beyond simply adding new tables and
columns or that sort of purely additive thing), then these also must
generally be done during the downtime.

Prior to Windchill 10.2, one had to manage node-specific properties,
particularly in the case of the master, but every node at least had a
unique value for java.rmi.server.hostname. As of Windchill 10.2, the
master is dynamically elected (both initially and upon master failure)
and the only node specific properties required are those which are
specifically desired (particularly to run Solr on a specific node, since
there can only be one Solr instance in the cluster). In cases where
per-node changes are required (or even between test and production)
one can use a version control system's branching capabilities to manage
node or test vs. production specifics.
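
For instance, a branch-per-node (or per-environment) arrangement could look like the following; the branch names and the file being changed are purely illustrative:

    # Keep the shared installation on master and node specifics on per-node branches
    git checkout -b solr-node master
    # ...edit the node-specific properties/xconf for the Solr node, then:
    git add site.xconf
    git commit -m "Node-specific configuration for the Solr node"

    # When the shared installation changes, fold master into the node branch
    git checkout solr-node && git merge master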

If not running a cluster one should still be able to essentially clone
the production node, do the installation, testing, etc, there, and then
pull the changes back to the production node. Additionally, if using a
cluster, the node to which changes are originally applied doesn't have
to be one of the normal production cluster nodes either.

Finally, as of 10.2 M010, one should fairly easily be able to add a new cluster node by doing the following:

1. Add the new hostname to wt.cache.master.slaveHosts and propagate it to wt.properties via xconfmanager (see the sketch after this list)
2. Push this change to the source control repository
3. Pull this change to all existing cluster nodes (without restarting them)
   * As of 10.2 M010, this change to wt.properties will immediately be noticed and incorporated into the running server processes
4. Pull an installation from source control to the new cluster node, and start it up.
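
A possible form of step 1, with hypothetical node hostnames (the property name comes from the list above; verify the exact xconfmanager usage against your release):

    # Append the new node to the slave host list and propagate the change
    xconfmanager -s "wt.cache.master.slaveHosts=wcnode1 wcnode2 wcnode3" -t codebase/wt.properties -p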

--
Jess Holle

Jess, good recap. I think the point missed here is not technical in origin but business: how do you let system users get onto a system post-deployment without assurance that the new changes will not negatively impact a system that is used 24 hours a day?

Smaller deployments reduce failure risk, rollback effort, and data loss; say a WTPart or WTDocument is created or iterated after a change is deployed. Big deployments do carry more risk. Perhaps a snapshot capability for configuration in the API would allow someone to, say, roll back changes, while all data created in the past 6 hours gets administratively auto-adjusted to use the old baseline?

That is a heavy thought, but not impossible. Workflows and lifecycle templates are already iterated, but OIRs and preferences, for example, are not.

Obviously with clustering you can cycle nodes, but there is still some lack of elasticity. The old saying goes: now that PTC has improved clustering, you give a customer a cookie and they'll wisely ask for that glass of milk. The idea is not unreasonable at its core; R&D dollars versus opportunity cost just have to play in here. (And Danny, to be fair, I have no idea if this is even accurate as to what you were inquiring about on the blaster.)

David



jessh
5-Regular Member
(To:klakshminarayan)

On 2/25/2015 2:59 PM, DeMay, David wrote:
> Jess, good recap, I think the point missed here is not technical in
> origin, but business, how do you let systems users get on to a system
> post-deployment without insurance that the new changes will not
> negatively impact a system that is used 24 hours a day.
Okay, that seems much different than the original query, though -- which was about a parallel environment for disaster recovery or installing updates.

Ensuring quality in an update is a separate question -- and largely a matter of validation prior to production deployment. I suppose one might read the original query as being about a parallel test environment for initially installing and qualifying updates, though. In that case I believe the suggestions of cloning one's production environment to the extent possible (within reason) were appropriate.

If a bad update is applied, then in cases where the update can be safely rolled back this is an easy matter -- revert to an older commit in one's source control environment. In cases where this cannot safely be done, however, things get much uglier.
> Smaller deployments, to reduce failure risk and rollback, and data
> loss; say if a WTPart or WTDocument created/iterated post deployment
> of change. Big deployments do carry more risk. Perhaps a snapshot
> capacity for configuration in the API would allow someone to say
> rollback changes, but all data created in past 6 hours gets
> administratively auto-adjusted to use old baseline?
>
> That is heavy thought, but not impossible. Workflows and lifecycle
> templates already being iterated, but OIRs/preferences for example are
> not.
>
> Obviously with clustering you can cycle nodes, but there is still some
> lack of elasticity. The old saying goes now that PTC has improved
> clustering, you give a customer a cookie, they'll wisely ask for that
> glass of milk. The idea is not unreasonable at its core, just R&D
> dollars versus opportunity cost have to play in here. (And Danny, to
> be fair, I have no idea if this is even accurate as to what you were
> inquiring with the blaster about.)
Generalized rollback of new data to an old schema is not an especially
tractable problem by any stretch.

Apart from schema changes, however, yes, there are some downsides to non-iterated configuration stored in the database (domains, policies, groups, preferences, etc.) without any "snapshot" capability. That's really a separate issue from the software installation files and file-based configuration, which is mostly what I was getting at.

--
Jess Holle

All said, rollback and snapshot features would be a more convincing means to apply updates sooner, possibly faster, and more cheaply, whilst promoting security. Tradeoffs in everything, right?


Jess / David,

My apologies; maybe my request didn't belong in this thread. I saw a link from the topic of providing virtually unlimited uptime (HA/DR) to how we can also achieve near-zero downtime for deploying system builds. I am trying to make the next big leap in service uptime for a global application. We already provide a "Read Only" system for longer-than-normal system outages. The "Read Only" system is a snapshot in time from Production and is only used for reference. All users are warned (in multiple ways) that any changes made to the "Read Only" system will be lost. Of course we have full security audit logging on, and we can see who made changes when they should not have. The issue with standing up a "Read Only" system today is the length of time it takes to import the production DB.

As Jess stated, the complexity lies in the merger of core business objects (WTPart, WTDocs, ...) and all their associated workflows and relationships, not to mention the system configurations that are also maintained in the codebase, DB, LDAP, and vaults. I was hoping someone had figured this out.

As for the source control, build automation, and validation discussion, I believe we have everything that was recommended in place today. I am not saying it is 100% perfect, but we are always refining it to improve.

Thank you all for your insight on the topic.

Danny N. Poisson
PDM IT Solution Architect
Common PDM Technical Director

Corporate Information Technology
Raytheon Company
(Cell) +1.978.888.3696
-

880 Technology Park Drive
Billerica, MA 01821 USA




A database that is over 1 TB seems very excessive for Windchill. Is most of the space devoted to BLOBs? If so, it sounds like a revaulting job is in order.

Best Regards,

Carsten Lawrenz
KALYPSO <http://kalypso.com/?utm_source=internal&utm_medium=email&utm_campaign=kalypsosig>
MOBILE
415.378.6374
kalypsonian.com/Carsten.Lawrenz <http://kalypsonian.com/Carsten.Lawrenz>


