Risk in Machine Learning and Life

Introduction

Until recently, the explanatory power of the risk-reward framework was lost on me. The ubiquity of advice that encourages risk-taking and the abundance of quotes that are variations on the theme of “no risk, no reward” made the concept of risk so familiar that the topic never felt novel or controversial enough to think about. I subconsciously trusted that such advice and quotes were probably true – that, indeed, an individual’s outcome in life could be attributed at least in part to the magnitudes and directions of the risks that the individual took. But I understood neither how nor why this was the case.

While realizing and developing the analogy between neural network training and life progress that I wrote about in my last blog post, I analyzed more closely the interaction between risk and achievement in machine learning systems and human systems alike. The result of that analysis was a deeper appreciation for the necessity of risk-taking in the search for optimal machine learning models and in the pursuit of personal fulfilment, which I describe throughout the rest of this post.

Here, I first elaborate on the claim that the learning rate hyperparameter is an embodiment of risk. I examine the effect of small, moderate, and large learning rates on the performance trajectory of machine learning models during training. I then return to the human domain, building on that context to justify how and why, in my view, risk is a necessary ingredient in the fulfilment of personal ambitions.

Learning Rate as a Measure and Modulator of Risk

In the context of neural network training, the learning rate hyperparameter governs the extent to which the parameters of a neural network are updated or changed during training. When the learning rate is very small, the parameters of the network change very slightly; when the learning rate is very large, the parameters of the network change very significantly; and when the learning rate is zero, the parameters of the network do not change at all. This maps well onto the commonsense understanding of the relationship between risk and reward in life: small risks generally yield small changes, large risks generally yield large changes, and the absence of risk altogether generally yields no change at all.
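As a minimal sketch of what “extent of change” means here, consider the plain gradient descent update, in which the gradient is scaled by the learning rate before being subtracted from the parameters. The function and variable names below are illustrative choices of mine, not taken from any particular framework:

    def gradient_descent_step(params, grads, learning_rate):
        """Apply one plain gradient descent update to a list of parameters."""
        # A learning_rate of 0.0 leaves the parameters untouched; a very large
        # learning_rate moves them far in the direction opposite the gradient.
        return [p - learning_rate * g for p, g in zip(params, grads)]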

The analogy becomes more complete when we acknowledge how parameter changes of different magnitudes, as induced by learning rates of different magnitudes, tend to affect the convergence behavior of machine learning systems during training (a small numerical sketch after the list illustrates all three regimes):

  • Very slight changes in the parameters of a machine learning model during training tend to move the state of the model towards the point of optimality, but only slightly; it is likely that the model, just narrowly inching in the right direction, will never reach the point of optimality in a reasonable timeframe under these conditions.
  • Very significant changes in the parameters of a machine learning model during training tend to move the state of the model significantly, but the movement may overshoot the point of optimality by a large margin; it is likely that the model, volleying and long jumping between distant suboptimal states that straddle the point of optimality, will never reach the point of optimality under these conditions.
  • The absence of change in the parameters of a machine learning model during training, of course, will not move the state of the model at all; unless it was luckily initialized there, the model will never reach the point of optimality under these conditions.
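To make these regimes concrete, here is a small, self-contained Python experiment on a toy one-dimensional loss, f(w) = (w - 3)^2, whose optimum sits at w = 3. The initial value, step count, and specific learning rates are illustrative choices of mine rather than anything canonical, but the qualitative behavior mirrors the list above:

    def train(learning_rate, steps=50, w=0.0, target=3.0):
        """Minimize f(w) = (w - target)^2 with plain gradient descent."""
        for _ in range(steps):
            grad = 2.0 * (w - target)      # derivative of (w - target)^2
            w = w - learning_rate * grad   # the update shown earlier
        return w

    for lr in (0.0, 0.001, 0.1, 1.05):
        print(f"learning rate {lr:>5}: final w = {train(lr):.4f} (optimum is 3.0)")

    # With these settings:
    #   lr = 0.0   -> w never moves from its initial value of 0.0
    #   lr = 0.001 -> w creeps toward 3.0 but is still far away after 50 steps
    #   lr = 0.1   -> w converges to within a tiny distance of 3.0
    #   lr = 1.05  -> w overshoots 3.0, oscillates, and diverges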

The assertion that a very small learning rate generally yields a model that never reaches the point of optimality, while a very large learning rate yields a model that overshoots it, suggests the existence of a moderate learning rate that achieves what the extremes do not. To me, this solidifies learning rate as a representation of risk magnitude: the relationship between risk magnitude and expected life outcome seems to parallel that between learning rate and expected model quality, as I discuss in the next section.

Risk as a Necessary Ingredient in the Pursuit of Personal Fulfilment

My understanding of risk as a necessary ingredient in the pursuit of personal fulfilment and the realization of one’s ambitions aligns with this analogy. When an individual iteratively takes very small risks, they make progress towards the ideal outcomes to which they aspire, but the progress may be too slow and incremental for the individual to ever achieve those outcomes. When an individual iteratively takes very large risks, they may overshoot the ideal outcomes to which they aspire in a favorable or unfavorable direction, and the resulting flip-flopping whiplash may prevent them from ever reaching the desired state. And when an individual takes no risks, they stagnate, moving no nearer to or further from the realization of their goals. It is when an individual iteratively takes risks of a moderate and appropriate size that they make sufficient progress and build sufficient momentum to satisfy the objective.

To me, this lesson seems to explain the classical and archetypal stories of great accomplishment and impact in entrepreneurial ventures spanning technology, academia, and medicine; I’ll save a presentation and discussion of such stories for a future post.

In conclusion, we can rationalize and build an intuition for the risk-reward framework in life by examining its influence in the context of machine learning. We observe that risk-taking seems vital to the achievement of an optimal outcome and that risk-sizing is an essential exercise in that pursuit.

Note: I do not attempt here to define or provide specific examples of very small risks, very large risks, the absence of risk, or perfectly-sized risks. The magnitude of a risk is, as I see it, extremely circumstantial, varying as a function of an infinite set of variables that includes much more than even an individual’s history and current state. To craft a scenario that effectively relays, appropriately weighs, and captures with nuance the interactions between all these variables in sizing a risk is infeasible.



