Generally speaking, when making any choice – from selecting a stock right down to choosing a slice of a neatly divided cake – a person’s decision will inevitably become skewed by his or her own biases. People subconsciously do this all the time and plenty of academics in the world of psychology have written extensively about selection bias. However, the research presented in this article takes a different approach and looks at this bias from a statistician’s perspective; in particular, it examines the role that dichotomous valuation can play in selection bias.
Essentially, this means looking at a valuation from two different perspectives. With decision-making, this comes down to the way in which people view the results of their decisions. Using the cake analogy, imagine someone is looking at a cake that has just been sliced up. When choosing which slice to eat, people will subconsciously weigh up the marginal gain of taking a slice against the marginal loss of going without it. Would they enjoy a big slice with few toppings as much as a slimmer slice covered in sprinkles and cherries? The order of choosing matters too: the first chooser enjoys a first-mover advantage, while the last has no options left.
For another example, consider what a bachelor’s degree (BD) is worth to the annual income of a man aged 50 when we aren’t certain whether he holds one. Because ownership of the degree is uncertain, its value can be viewed from two perspectives. If he has a BD, the marginal gain is the difference between his current annual income and his estimated income had he no BD, all else being equal. If he has no BD, the marginal loss is the difference between his estimated annual income had he a BD and his current income, all else being equal. Ownership of the degree also interweaves with the other factors that determine his income.
While these are simple examples, the concept of selection bias has merit across the world of statistics. In a more complex situation, the cake cutter may have no idea about how large the cake is and how many people will get a slice. Similarly, in selecting features from many candidates to model and forecast data, the selector has no prior knowledge about a given feature’s belonging to the data-generating process; his or her purpose is to mitigate the ownership uncertainty. People should measure the value of a feature, an event, a property, or an outcome, by the expected marginal gain and marginal loss, incorporating the ownership uncertainty and interdependence of other factors. Essentially, the selection bias is defined as the expected difference between the marginal gain and the marginal loss.
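One way to write this definition down is sketched here. The notation is an assumption made for illustration and is not taken from the article: $v(S)$ is the payoff (for example, model performance) of a set $S$ of features, $P(S)$ is the prior probability that $S$ is the true set in the data-generating process, and $i$ is the feature being valued.

```latex
\begin{align*}
\gamma_i &= \sum_{S \ni i} P(S)\,\bigl[v(S) - v(S \setminus \{i\})\bigr]
  && \text{(expected marginal gain: $i$ is owned)} \\
\lambda_i &= \sum_{S \not\ni i} P(S)\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
  && \text{(expected marginal loss: $i$ is missing)} \\
\beta_i &= \gamma_i - \lambda_i
  && \text{(selection bias)}
\end{align*}
```

In this reading, the ownership uncertainty enters through the prior $P$, and the interdependence with other factors enters through the payoff $v$, which depends on the whole set $S$ rather than on $i$ alone.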
The role of selection bias
In his recent paper, “A theory of dichotomous valuation with applications to variable selection”, Dr Xingwei Hu, econometric expert at the International Monetary Fund, decided to look further into the role this bias plays in feature selection (otherwise known as variable selection). Feature selection, in regard to machine learning, is the process of selecting a subset of relevant features (variables, predictors, etc.) for use in model construction. Dr Hu’s approach uses game theory (assessing the payoffs for all coalitional scenarios of players), Bayesian theory (assuming three classes of equal-opportunity prior probability for candidate features; prior probability referring to one’s belief about the possible scenarios of the data-generating process), and mechanism design (setting the end goals as constraint equations for the valuation solutions). This allowed him to directly evaluate the overall performance of each variable in a broad set of modelling scenarios, creating a better way to pick and choose given variables.
According to Dr Hu, the use of four dichotomous value methods reduces overfitting by 90% within simulated linear models.
There are some benefits to this approach. By dichotomising the marginal effect (splitting it into the marginal gain and the marginal loss) and acknowledging the model uncertainty, an analyst can generalise the Shapley Value and the Banzhaf Value. In game theory, the Shapley Value, introduced by Nobel laureate Lloyd S. Shapley, is a solution for fairly distributing gains and costs among several actors working in coalition, even if their contributions are unequal. It averages each actor’s marginal contribution over every order in which the coalition could be assembled. The Banzhaf Value, which is simpler, takes the average marginal gain over all possible coalitions. By using dichotomisation, Dr Hu is able to introduce a concept of bias and discover the symmetry in the Banzhaf Value and the asymmetry in the Shapley Value. For the Banzhaf Value, the bias is zero for each player. If the payoff function has, on average, diminishing (or increasing) marginality as the coalition size grows, the Shapley Value has a negative (or positive, respectively) bias.
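The symmetry and asymmetry described above can be checked numerically on a toy game. The following is a minimal sketch under stated assumptions, not Dr Hu’s implementation: the function names, the two priors, and the square-root payoff are all hypothetical choices made for illustration. It assumes the value of a player is the expected marginal gain plus the expected marginal loss under a prior over the true coalition, and that the bias is their difference. With a uniform prior over all coalitions the sum is the Banzhaf Value; with a prior that is uniform over coalition sizes and then over coalitions of that size, the sum is the Shapley Value.

```python
from itertools import combinations
from math import comb, sqrt

def dichotomous_parts(v, n, i, prior):
    """Return (expected marginal gain, expected marginal loss) of player i.

    v     : payoff function taking a frozenset of players from 0..n-1
    prior : prior(s, n) = assumed probability of any ONE coalition of size s
    """
    others = [j for j in range(n) if j != i]
    gain = loss = 0.0
    for t in range(n):
        for combo in combinations(others, t):
            T = frozenset(combo)
            m = v(T | {i}) - v(T)         # marginal effect of i on T
            gain += prior(t + 1, n) * m   # true coalition is T plus i (i is owned)
            loss += prior(t, n) * m       # true coalition is T (i is missing)
    return gain, loss

def banzhaf_prior(s, n):
    # Every one of the 2**n coalitions is equally likely.
    return 1 / 2**n

def shapley_prior(s, n):
    # Every size 0..n is equally likely, then every coalition of that size.
    return 1 / ((n + 1) * comb(n, s))

def concave_payoff(S):
    # Hypothetical payoff with diminishing marginality as the coalition grows.
    return sqrt(len(S))

n = 4
g_b, l_b = dichotomous_parts(concave_payoff, n, 0, banzhaf_prior)
g_s, l_s = dichotomous_parts(concave_payoff, n, 0, shapley_prior)
print("Banzhaf-type value:", g_b + l_b, "bias:", g_b - l_b)
print("Shapley-type value:", g_s + l_s, "bias:", g_s - l_s)
```

In this symmetric four-player game each player’s Shapley Value is the total payoff divided by four, which the sum of gain and loss reproduces. The bias comes out as zero under the uniform prior and negative under the size-uniform prior, in line with the diminishing-marginality case described above.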
Introducing dichotomous valuation
From the outset, it’s important to recognise the applications of Dr Hu’s work across machine learning, statistics, and econometrics. This means that by developing the Shapley Value (which has become increasingly popular in a data-modelling and forecast context), statisticians can also supply solutions to reduce endowment biases in behavioural economics. At the same time, it’s important to understand that all data modelling is subject to both objective and subjective uncertainty.
So, when assessing a general data set, it makes sense to approach this decision with a blank slate and assume no prior knowledge about the relationship between covariates. As a result, this means someone is equally likely to select a given variable before the selection process begins (the equality of opportunity is formally justified by Keynes’ principle of indifference).
This leads onto the focus of Dr Hu’s work: dichotomous valuation. Simply put, this is a way of weighing up the two binary outcomes of a given variable on an equal footing. For instance, imagine a jury made up of 12 jurors. A given juror may have two pivotal situations, either turning a potentially losing vote into a winning one or vice versa. The probability of such pivotal situations measures the power of that juror. In the same way, when a player weighs a marginal gain against a marginal loss, the two can be combined to produce a dichotomous value.
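The jury example can be made concrete with a short calculation. This is a sketch under assumptions that the article does not state: each juror votes yes independently with probability one half, and a vote wins when at least `quota` jurors vote yes. The function name and its parameters are hypothetical.

```python
from math import comb

def pivotal_probability(n_jurors: int, quota: int, p_yes: float = 0.5) -> float:
    """Probability that a single juror's vote decides the outcome.

    The juror is pivotal exactly when quota - 1 of the other jurors vote yes:
    joining them turns a losing vote into a winning one, and leaving them
    turns a winning vote into a losing one.
    """
    others = n_jurors - 1
    k = quota - 1
    if not 0 <= k <= others:
        return 0.0
    # Binomial probability that exactly k of the other jurors vote yes.
    return comb(others, k) * p_yes**k * (1 - p_yes) ** (others - k)

# Simple majority of 12 jurors (at least 7 yes votes) versus unanimity.
print(pivotal_probability(12, 7))
print(pivotal_probability(12, 12))
```

Under these assumptions a juror is pivotal far more often under a simple majority rule (462 of the 2,048 equally likely ways the other 11 can vote) than under unanimity (1 of 2,048), so the same juror holds very different power under the two rules.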
The endowment bias
A benefit of this is that a dichotomous value is not a constant and is dependent on underlying data. This in itself is useful, but the real value for statisticians comes when the dichotomous value separates the marginal gain and marginal loss so a bias between the two can be formulated and therefore analysed. To use the technical term, the ‘endowment bias’ can be boiled down to the concept of people ascribing more value to things they own, rather than things they do not own (even when such things are exchangeable for value). However, mitigation of the bias changes somewhat when used in the context of Shapley Values and Banzhaf Values.
With Banzhaf Values, the bias is simply zero for all players. However, this ignores the uncertainty in the marginal effects. For example, imagine a company that is setting up a new division and deciding who is going to head it up. If the company does this internally, this means looking at candidates they already employ. As a result, they will know these candidates’ marginal production and how well-suited they are to the job. However, if they go external and hire someone in, then there is a degree of subjective uncertainty about how productive that person will be. Therefore, a positive endowment bias is preferred in a risk-averse valuation, other things remaining unchanged. In layman’s terms, this is very much a case of ‘better the devil you know’.
Turning briefly to the Shapley Value and its use in modelling and forecasting, the Shapley Value tends to show a substantial negative endowment bias. This counterintuitive property could lead users to draw undesirable inferences from the data and eventually limit the use of the value concept.
Therefore, to make valuations more objective, Dr Hu suggests systematically eliminating aggregate bias by unevenly weighting the two dichotomous marginal effects such that the weighted marginal gain matches the weighted marginal loss. According to Dr Hu’s work, the use of unbiased dichotomous value methods reduces overfitting by 90% within the studied simulations. In statistics and machine learning, an overfitted model is one that has fitted the noise in its training data rather than the underlying trend, so it predicts new data poorly.
This matters because it addresses a fundamental issue in machine learning, econometrics, and statistics, especially given Dr Hu’s discovery of several new properties of the widely used Shapley Value. By decomposing a value into an expected marginal loss and an expected marginal gain, statisticians can assume a particular type of equal-opportunity prior (by mechanism design, two of the four new value concepts are fair-division solutions to the expected performance of the latent data-generating process; another is an approximate fair-division solution). Also, the unbiased Shapley Value can be represented as a compound of another class of dichotomous value.
Based on this, new selection methods can be formulated which Dr Hu’s work suggests will succeed in removing the first-mover advantage – a significant source of skewing and bias. As touched upon, the key is how these gains and losses are weighted, and this can be changed in different ways depending on the selection system being used. The practical applications of this work are intriguing, and Dr Hu has already been able to use analysis of aggregated marginal loss and gain in a labour market to allocate the net profit between employed and unemployed workers. Depending on specific contexts, such ideas can be applied to other areas of economics, political science, and statistics.
What are you planning to study next?
Like a coin, everything has two sides; both head and tail should be valued properly. Therefore, the dichotomous valuation could be used in any decision-making analysis. For example, I have applied it to elections (https://doi.org/10.1007/s00182-006-0011-z). I have also used it to derive a fiscal-policy rule to allocate unemployment benefits (arXiv 1808.08563) and to integrate the rule within a monetary policy framework. Fair valuation of financial assets is also on my research agenda. Finally, forecasting GDP is always challenging; each predictor variable could contribute positively or noisily at the same time. I am looking for a way to filter out the noise.