Your understanding of redundancy filters is directionally correct. We want to penalize features that provide redundant information, as in your example of two features (motor power and motor current) that provide very similar information about a specific goal (for example, motor downtime in a future time period).
One thing to clarify: we are not evaluating the correlation between the features. Instead, we are interested in “information gain”: how much new, non-redundant information each new feature provides. The redundancy filter runs signals through an iterative process to identify the "next" feature that provides the most information gain in relation to the goal. At each iteration, we choose the feature that provides the highest information gain given the features already selected.
In your motor example, we would first identify the feature with the highest information gain on its own: motor power, with a mutual information (MI) of 0.6. At this first step nothing has been selected yet, so a feature's information gain is simply its MI with the goal. We then iterate through the remaining features to determine which one has the most information gain (IG), where: information gain for a new feature = [information we will have about the goal from both motor power AND the new feature] - [information motor power already gave us about the goal]. Once we have the feature with the most information gain, we cycle through the remaining features again to find the one with the largest information gain given that the first two features have already been selected.
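To make the iteration concrete, here is a minimal Python sketch of this kind of greedy selection. It is an illustration only, not the product's actual implementation: the function names, the tiny discrete dataset, and the extra "vibration" feature are all hypothetical, and it assumes MI is estimated from simple counts over discrete values.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Shannon entropy (in bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(features, goal):
    """I(features; goal) = H(features) + H(goal) - H(features, goal)."""
    if not features:
        return 0.0
    return entropy(features) + entropy([goal]) - entropy(features + [goal])


def rank_by_information_gain(candidates, goal):
    """Greedy ranking: at each step, pick the feature that adds the most
    information about the goal given the features already selected."""
    selected, ranking = [], []
    remaining = dict(candidates)
    while remaining:
        chosen = [candidates[name] for name in selected]
        base = mutual_information(chosen, goal)
        gains = {
            name: mutual_information(chosen + [column], goal) - base
            for name, column in remaining.items()
        }
        best = max(gains, key=gains.get)
        ranking.append((best, gains[best]))
        selected.append(best)
        del remaining[best]
    return ranking


# Hypothetical toy data: motor_current is a noisy copy of motor_power,
# and downtime occurs whenever power or vibration is high.
features = {
    "motor_power":   [0, 0, 0, 0, 1, 1, 1, 1],
    "motor_current": [0, 0, 0, 0, 1, 1, 1, 0],
    "vibration":     [0, 0, 0, 1, 0, 1, 0, 0],
}
downtime = [0, 0, 0, 1, 1, 1, 1, 1]

# Ranking by MI alone (no redundancy filter).
plain = sorted(
    features,
    key=lambda name: mutual_information([features[name]], downtime),
    reverse=True,
)
# Ranking by information gain (redundancy filter).
ranking = rank_by_information_gain(features, downtime)

print("MI only:         ", plain)
print("Information gain:", [(name, round(gain, 3)) for name, gain in ranking])
```

On this toy data, ranking by MI alone puts motor_current second, while the information-gain ranking moves it below vibration, because motor_current adds essentially nothing about downtime once motor_power has been selected.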
Below is an example showing the signal results without the redundancy filter and then with it. In this example, we want to find which features related to operating conditions in a factory provide the most information about time losses in a future production block. The feature TimeLossPlannedDowntimeCurrentPB (Planned Downtime) provides the highest MI on its own. Without redundancy filters, the next feature identified is TimeLossUnplannedDowntimeCurrentPB (Unplanned Downtime). With redundancy filters, TimeLossUnplannedDowntimeCurrentPB (Unplanned Downtime) drops down the list (but not out of it), because the information it provides is redundant with what the TimeLossPlannedDowntimeCurrentPB (Planned Downtime) feature already provides. Instead, with redundancy filters, the next feature that provides the most information gain is TimeLossScrapCurrentPB (Scrap), a feature that was much further down the list when evaluating mutual information alone. Going further down the list with the redundancy filter enabled, the features are prioritized by information gain: how much more information each one provides given that the features higher in the list have already been selected.
Hopefully this provides more detail on the capability and how it enables us to maximize “information gain”: how much new, non-redundant information each new feature provides.
