The Paradox of Knowledge: Using Correlations

Apr. 7, 2020
Abigail Jacobsen

Modeling from empirical data takes observed information and attempts to replicate it in a set of calculations. When incorporating those data into a model, there are a number of relationships to account for, including dependencies and correlations. Correlations are often omitted, for a variety of reasons, and that omission can lead to critical errors in your results. Some knowledge of the situation leads to a more credible representation of the relationships in the data, and added knowledge, perhaps from subject matter experts or other sources, helps refine the conclusions one can draw. Whether the correlations are direct or aggregate, involving simple mathematics or greater complexity, the model will ultimately be used in some form of analysis to project future outcomes. The knowledge brought to the model, and to an analysis with embedded correlations, improves our understanding of the inherent uncertainty in a given problem.

Correlation is the principal way to describe how variables in a dataset relate to one another. There may be general tendencies and patterns that drive the input risks to move together or in opposite directions. It is these relationships between variables that need to be expressed in a model to bolster its usefulness, and correlation is how we express them. It is important to remember that observed correlation between variables is not necessarily a causal relationship; it may be only a general tendency of paired behavior.
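To make that concrete, here is a minimal sketch in Python with NumPy (the data and variable names are invented purely for illustration) of measuring the correlation between two observed inputs:

```python
import numpy as np

# Hypothetical paired observations: say, material cost and labor cost
# recorded for the same set of past projects (illustrative values only).
material = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])
labor    = np.array([7.0, 8.2, 6.5, 9.1, 8.8, 7.6])

# Pearson correlation coefficient: +1 = move together perfectly,
# 0 = no linear relationship, -1 = move in perfect opposition.
r = np.corrcoef(material, labor)[0, 1]
print(f"Observed correlation: {r:.2f}")

# A high r here is a tendency of paired behavior, not proof that
# one cost drives the other.
```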

One significant aspect to note: positive correlations appear to increase uncertainty. Wait, you say, how is that possible? Knowledge is supposed to reduce uncertainty. Doesn’t knowing counteract unknowing? Think about it for a moment. The correlations included in the model reduce our uncertainty about reality while increasing the range of predicted values. What may seem illogical at the outset is really quite logical. If two (or more) risks are positively correlated, they tend to be high together and low together, so in Monte Carlo sampling their extremes reinforce rather than cancel, and their aggregate spans a wider range. In fact, failing to account for correlations that really are there understates the true range of outcomes and reduces the validity of the analysis.
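A quick simulation makes the effect visible. The sketch below (plain NumPy, with means, standard deviations, and a correlation of 0.8 all assumed purely for illustration) aggregates two risks with and without positive correlation and compares the spread of the totals:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Two risks, each Normal(mean=100, sd=20), aggregated by summing.
mean, sd = 100.0, 20.0

def simulate_total(rho):
    # Covariance matrix encoding the pairwise correlation rho.
    cov = [[sd**2, rho * sd**2],
           [rho * sd**2, sd**2]]
    samples = rng.multivariate_normal([mean, mean], cov, size=n)
    return samples.sum(axis=1)

for rho in (0.0, 0.8):
    total = simulate_total(rho)
    p5, p95 = np.percentile(total, [5, 95])
    print(f"rho={rho}: sd={total.std():6.1f}, 5th-95th range={p95 - p5:6.1f}")

# With rho=0.8 the standard deviation of the total is about
# sqrt(2 + 2*0.8) / sqrt(2) ~ 1.34 times the independent case:
# the aggregate range widens even though we know more.
```

The correlated case produces roughly a third more spread in the total, which is exactly the wider, more honest range described above.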

Correlations are easily incorporated into models set up for Monte Carlo simulation (MCS). MCS generates many ‘random’ samples, allowing the modeler to study a variety of scenarios and their impact on decisions. A correlation matrix defines the sampling relationship between any pair of input variables in the model, and a tool such as @RISK facilitates constructing that matrix. Once the correlations are in place, running the MCS produces results and scenarios that are more credible. We want decisions to be based on the best information available, and the correlations add to the knowledge we already build into the process.
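@RISK handles the mechanics internally, but the underlying idea can be sketched in a few lines of NumPy. The correlation matrix below is invented for illustration, and the Cholesky approach shown is one standard way to induce a target correlation among normal samples, not a description of @RISK’s own algorithm:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# A 3x3 correlation matrix for three input risks (illustrative values).
# It must be symmetric, have 1s on the diagonal, and be positive definite.
corr = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])

# Induce the correlation with a Cholesky factor: z holds independent
# standard-normal columns, and z @ L.T has the target correlation.
L = np.linalg.cholesky(corr)
z = rng.standard_normal((n, 3))
correlated = z @ L.T

# Verify the sampled correlations are close to the matrix we specified.
print(np.round(np.corrcoef(correlated, rowvar=False), 2))
```

Feeding correlated samples like these through the model, rather than independent ones, is what makes the resulting scenarios, and the decisions based on them, more credible.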
