1-Visitor
February 24, 2015
Question

Windchill Parallel Environment.


Hello All,


I just wanted to know if anyone has set up a parallel Windchill environment for disaster recovery. Is there a way to set up the Windchill architecture so that if one Windchill system goes down or is being updated, we can switch to the other Windchill system?


Any ideas would be helpful.


Thanks,


Kiran Lakshminarayanan



1-Visitor
February 25, 2015
Thanks Danny, I'm going to have to mull this over. Not being the only
around-the-sun system does call for some innovation here, perhaps at the
vendor level. In the meantime, it sounds like hardware and software
hosting solutions alone will not resolve this issue. Have you spoken to
your SAM/TSAM about the deployment times? Are you using the PTC BIF? What
else is in use, e.g. automated scripts for deployment? More information
is certainly useful if it can be provided without breaking any rules. Is
the 5-8 hours just one node of a cluster?


On Wed, Feb 25, 2015 at 2:13 PM, Danny N Poisson wrote:

> David,
>
> To support a global application it is hard to find down-times that work
> for everyone.
>
> The amount of time depends on what we are deploying (new capabilities, bug
> fixes, data corrections, PTC critical patch sets), but on average it takes
> 5 - 8 hours from end to end.
>
>
> *Danny N. Poisson*
> PDM IT Solution Architect
> Common PDM Technical Director
>
> Corporate Information Technology
> Raytheon Company
> (Cell) +1.978.888.3696
>
> 880 Technology Park Drive
> Billerica, MA 01821 USA
>
> Business Travel
>
> PTO Plans
> 3/6
>
> *This message contains information that may be confidential and
> privileged. Unless you are the addressee (or authorized to receive mail for
> the addressee), you should not use, copy or disclose to anyone this message
> or any information contained in this message. If you have received this
> message in error, please so advise the sender by reply e-mail and delete
> this message. Thank you for your cooperation.*
1-Visitor
February 25, 2015
David,

Answers to your questions:

Have you spoken to your SAM/TSAM about the deployment times?
[DNP]: We have been working with PTC GS / TS/ Product Management on this.

Are you using the PTC BIF?
[DNP]: We use the BIF. Note that our deployment path to production goes
through, at a minimum, a Dev and a QA environment.

What else is in use, automated scripts for deployment; certainly more
information is useful if it can be provided without breaking any rules.
[DNP]: Anything that can be automated has been automated and we continue to
look for additional areas to automate.

Is the 5-8 hours just one node of a cluster?
[DNP]: This is for a 10-node cluster. The time covers everything from
preparing the system for the build through executing a validation test
to ensure everything is working properly.


Danny N. Poisson


12-Amethyst
February 25, 2015
In general once you have a set of changes to apply to your production
environment, these should be applied to 1 node and then replicated to
other cluster nodes.

In the *worst* case this should be done via rsync or robocopy. It
should *never* be done by ad hoc per-node installation or file copying
-- as this almost unavoidably becomes a source of system inconsistencies
that then cause countless problems down the road.

*Ideally* you should use a source control system rather than
simple rsync or robocopy. In this approach you make all changes to one
node, check-in/commit/push these changes to a source control repository
-- along with a detailed comment as to the changes being made and their
purpose, and then checkout/pull these changes to all other nodes. This
ensures that you have a detailed history of changes to the software
installation including any file-based configuration, complete with quick
and easy access to difference reports for all files. This is quick,
easy, and free to set up via open source software like Git. Pulling
changes from a git repository is also quite fast and efficient
(especially as compared to some older source control systems, free or
commercial).
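As a rough illustration of why per-node drift matters, the following Python sketch (not a PTC utility; `manifest` and `compare` are hypothetical helper names) fingerprints an installation directory and reports which files are missing, extra, or changed on a node relative to a reference node. With a Git-managed installation, running `git status` on each node gives similar information without any custom code.

```python
import hashlib
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large JARs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: str) -> dict:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): _sha256(path)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def compare(reference: dict, node: dict) -> dict:
    """Report how a node's manifest differs from the reference node's."""
    shared = reference.keys() & node.keys()
    return {
        "missing": sorted(reference.keys() - node.keys()),
        "extra": sorted(node.keys() - reference.keys()),
        "changed": sorted(k for k in shared if reference[k] != node[k]),
    }
```

Usage would be along the lines of `compare(manifest(reference_dir), manifest(node_dir))`, where both directories (hypothetical here) are reachable from the machine running the check, for example via a mount. Any non-empty list is a sign that some node was changed outside the single-source process.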

In either case, i.e. whether you're using rsync/robocopy or a source
control system, the only downtime for nodes after the original
installation is that required for a shutdown, pull/rsync, and restart.
If there are database changes (beyond simply adding new tables and
columns or that sort of purely additive thing), then these also must
generally be done during the downtime.
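Since the per-node downtime reduces to a shutdown, pull/rsync, and restart, a cluster can in principle be updated in rolling fashion so that the remaining nodes keep serving users. That assumes the update needs no non-additive database changes and, as an added assumption not stated above, that old and new node versions can briefly coexist in the cluster. The Python sketch below only generates such a plan; `Step` and `rolling_plan` are hypothetical names, and actually stopping, pulling, and starting nodes is left to whatever automation a site already uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    node: str
    action: str  # one of "stop", "pull", "start"


def rolling_plan(nodes: List[str], batch_size: int = 1) -> List[Step]:
    """Cycle nodes in batches; every other node stays in service meanwhile.

    Assumes the update needs no non-additive database changes and that old
    and new versions can briefly run side by side in the cluster.
    """
    if not 1 <= batch_size < len(nodes):
        raise ValueError("batch_size must leave at least one node in service")
    plan: List[Step] = []
    for offset in range(0, len(nodes), batch_size):
        batch = nodes[offset:offset + batch_size]
        # Each batch is fully cycled before the next batch is touched.
        for action in ("stop", "pull", "start"):
            plan.extend(Step(node, action) for node in batch)
    return plan
```

For a 10-node cluster such as the one mentioned earlier in the thread, `rolling_plan(nodes, batch_size=2)` would keep eight nodes serving at any moment, provided the steps are executed in order.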

Prior to Windchill 10.2, one had to manage node-specific properties,
particularly in the case of the master, but every node at least had a
unique value for java.rmi.server.hostname. As of Windchill 10.2, the
master is dynamically elected (both initially and upon master failure)
and the only node-specific properties required are those which are
specifically desired (particularly to run Solr on a specific node, since
there can only be one Solr instance in the cluster). In cases where
per-node changes *are* required (or even between test and production)
one can use a version control system's branching capabilities to manage
node-specific or test vs. production settings.
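Whichever way per-node differences are managed, it can help to audit that a node differs from the shared baseline only where intended. The Python sketch below is a simplified, hypothetical helper, not part of Windchill, and complements rather than replaces branching: it handles only plain `key=value` lines with `#` or `!` comments, not the full Java properties syntax (line continuations, escapes, `:` separators).

```python
from typing import Dict, Optional, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored by this simplified parser
            props[key.strip()] = value.strip()
    return props


def node_overrides(
    baseline: str, node: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (baseline value, node value)} for every key that differs."""
    base, local = parse_properties(baseline), parse_properties(node)
    return {
        key: (base.get(key), local.get(key))
        for key in sorted(base.keys() | local.keys())
        if base.get(key) != local.get(key)
    }
```

Ideally the output lists only deliberate differences, such as `java.rmi.server.hostname` before 10.2 or whatever setting pins Solr to one node; anything else is drift worth investigating.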

If not running a cluster, one should still be able to essentially clone
the production node, do the installation, testing, etc. there, and then
pull the changes back to the production node. Additionally, if using a
cluster, the node to which changes are originally applied doesn't have
to be one of the normal production cluster nodes either.

Finally, as of 10.2 M010, one should fairly easily be able to add a new
cluster node by doing the following:

1. Add a new hostname to wt.cache.master.slaveHosts and propagate it to
wt.properties via xconfmanager
2. Push this change to the source control repository
3. Pull this change to all existing cluster nodes (without restarting them)
* As of 10.2 M010, this change to wt.properties will immediately
be noticed and immediately be incorporated into the running
server processes
4. Pull an installation from source control to a new cluster node, and
start it up.
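Step 1 above can be scripted. The following Python sketch computes the updated host list and builds, without running, an xconfmanager command line. It assumes that `wt.cache.master.slaveHosts` holds space-separated hostnames and that xconfmanager's `-s` (set), `-t` (target file), and `-p` (propagate) options behave as in common usage; check both against the documentation for your release. The helper names are hypothetical.

```python
from typing import List

PROPERTY = "wt.cache.master.slaveHosts"


def add_slave_host(current: str, new_host: str) -> str:
    """Append new_host to a space-separated host list unless already present.

    Assumption: the property value is a space-separated list of hostnames.
    """
    hosts = current.split()
    if new_host not in hosts:
        hosts.append(new_host)
    return " ".join(hosts)


def xconf_command(value: str) -> List[str]:
    """Build (but do not run) an xconfmanager call that sets and propagates
    the property into codebase/wt.properties, relative to the Windchill home."""
    return ["xconfmanager", "-s", f"{PROPERTY}={value}",
            "-t", "codebase/wt.properties", "-p"]
```

The returned argument list can be handed to a site's existing deployment automation on the node where changes are made; the resulting file changes are then committed and pulled as in steps 2 through 4. Because `add_slave_host` is idempotent, re-running the script for the same host leaves the list unchanged.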

--
Jess Holle

1-Visitor
February 25, 2015
Jess, good recap. I think the point missed here is not technical in origin
but business: how do you let system users onto a system post-deployment
without assurance that the new changes will not negatively impact a system
that is used 24 hours a day?

Smaller deployments reduce the risk of failure, rollback, and data loss
(say, if a WTPart or WTDocument is created or iterated after a change is
deployed). Big deployments do carry more risk. Perhaps a snapshot
capability for configuration in the API would allow someone to, say, roll
back changes, with all data created in the past 6 hours administratively
auto-adjusted to use the old baseline?

That is heavy thought, but not impossible. Workflows and lifecycle
templates are already iterated, but OIRs and preferences, for example, are
not.

Obviously with clustering you can cycle nodes, but there is still some lack
of elasticity. Now that PTC has improved clustering, the old saying holds:
give a customer a cookie and they'll wisely ask for that glass of milk.
The idea is not unreasonable at its core; R&D dollars versus opportunity
cost just have to play in here. (And Danny, to be fair, I have no idea if
this is even accurate as to what you were inquiring with the blaster
about.)

David



12-Amethyst
February 25, 2015
On 2/25/2015 2:59 PM, DeMay, David wrote:
> Jess, good recap, I think the point missed here is not technical in
> origin, but business, how do you let systems users get on to a system
> post-deployment without insurance that the new changes will not
> negatively impact a system that is used 24 hours a day.
Okay, that seems quite different from the original query, though --
which was about a parallel environment for disaster recovery or
installing updates.

Ensuring quality in an update is a separate question -- and largely a
matter of validation prior to production deployment. I suppose one
might read the original query as being about a parallel test
environment for initially installing and qualifying updates, though. In
that case I believe the suggestion of cloning one's production
environment to the extent possible (within reason) is appropriate.

If a bad update is applied and it can be safely rolled back, this is an
easy matter -- revert to an older commit in one's source control
repository. In cases where this cannot safely be done, however, things
get much uglier.
> Smaller deployments, to reduce failure risk and rollback, and data
> loss; say if a WTPart or WTDocument created/iterated post deployment
> of change. Big deployments do carry more risk. Perhaps a snapshot
> capacity for configuration in the API would allow someone to say
> rollback changes, but all data created in past 6 hours gets
> administratively auto-adjusted to use old baseline?
>
> That is heavy thought, but not impossible. Workflows and lifecycle
> templates already being iterated, but OIRs/preferences for example are
> not.
>
> Obviously with clustering you can cycle nodes, but there is still some
> lack of elasticity. The old saying goes now that PTC has improved
> clustering, you give a customer a cookie, they'll wisely ask for that
> glass of milk. The idea is not unreasonable at its core, just R&D
> dollars versus opportunity cost have to play in here. (And Danny, to
> be fair, I have no idea if this is even accurate as to what you were
> inquiring with the blaster about.)
Generalized rollback of new data to an old schema is not an especially
tractable problem by any stretch.

Apart from schema changes, however, yes, there are some downsides to
non-iterated configuration stored in the database (domains, policies,
groups, preferences, etc.) without any "snapshot" capability. That's
really a separate issue from the software installation files and
file-based configuration, which is mostly what I was getting at.

--
Jess Holle

1-Visitor
February 25, 2015
All said, rollback and snapshot features would be a more convincing means
to apply updates sooner, possibly faster and more cheaply, while promoting
security. Tradeoffs in everything, right?

1-Visitor
February 25, 2015

Jess / David,

My apologies; maybe my request didn't belong in this thread. I saw a
link between providing virtually unlimited uptime (HA/DR) and how we can
also achieve near-zero downtime when deploying system builds. I am
trying to make the next big leap in service uptime for a global
application. We already provide a "Read Only" system for longer than
normal system outages. The "Read Only" system is a snapshot in time of
Production and is only used for reference. All users are warned (in
multiple ways) that any changes made to the "Read Only" system will be
lost. Of course, we have full security audit logging on, so we can see
who made changes when they should not have. The issue with standing up a
"Read Only" system today is the length of time it takes to import the
production DB.

As Jess stated, the complexity lies in merging the core business objects
(WTPart, WTDocs, ...) and all their associated workflows and
relationships, not to mention the system configurations that are also
maintained in the codebase, DB, LDAP, and vaults. I was hoping someone
had figured this out.

As for the source control, build automation, and validation discussion, I
believe we already have everything that was recommended in place. I am not
saying it is 100% perfect, but we are always refining it.

Thank you all for your insight on the topic.

Danny N. Poisson


1-Visitor
February 26, 2015

A database that is over 1 TB seems excessive for Windchill. Is most of the space devoted to BLOBs? If so, it sounds like a revaulting job is in order.

Best Regards,

Carsten Lawrenz
KALYPSO <http://kalypso.com/?utm_source=internal&utm_medium=email&utm_campaign=kalypsosig>
Mobile: 415.378.6374
kalypsonian.com/Carsten.Lawrenz <http://kalypsonian.com/Carsten.Lawrenz>



On Feb 25, 2015, at 1:10 PM, BINESHKUMAR S <mail@bineshkumar.me> wrote:

Exporting and importing large databases (1 TB or more), even using parallel data pump jobs on a well-tuned database, would take significant time unless you are using an advanced fast-clone technology.

Thanks
Binesh
Barry Wehmiller