Wikipedia version

Stochastic gradient descent (usually shortened to SGD), also known as incremental gradient descent, is an iterative method for optimizing a differentiable objective function; it can be regarded as a stochastic approximation of gradient descent optimization.
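For reference, the standard textbook form of the update (not spelled out in the excerpt above): if the objective decomposes over the training samples as Q(w) = (1/n) Σ_i Q_i(w), each SGD step picks a single index i and moves the parameters against that sample's gradient alone,

    w ← w − η ∇Q_i(w)

where η is the learning rate. Full gradient descent would instead use the gradient of the whole sum at every step.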

A 2018 article implicitly credits Herbert Robbins and Sutton Monro with developing SGD in their 1951 paper entitled "A Stochastic Approximation Method"; see stochastic approximation for more information. The method is called stochastic because the samples are selected randomly (or shuffled) rather than processed as a single batch (as in standard gradient descent) or in the order they appear in the training set.
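To make the shuffled, per-sample update concrete, here is a minimal NumPy sketch of SGD fitting a one-variable least-squares model. The synthetic data, learning rate, and epoch count are illustrative assumptions, not details from the article above.

```python
# Minimal SGD sketch: fit y ≈ w*x + b by least squares, one sample at a time.
# Data, learning rate, and epoch count are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + 1 plus a little noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.standard_normal(200)

w, b = 0.0, 0.0          # parameters
eta = 0.05               # learning rate
for epoch in range(20):
    order = rng.permutation(len(X))   # shuffle: visit samples in random order
    for i in order:
        x_i, y_i = X[i, 0], y[i]
        err = (w * x_i + b) - y_i     # residual for this single sample
        # Gradient of the per-sample squared error 0.5 * err**2
        w -= eta * err * x_i
        b -= eta * err

print(f"w ≈ {w:.2f}, b ≈ {b:.2f}")    # should land near 3 and 1
```

The only difference from standard (batch) gradient descent here is the inner loop: instead of averaging the gradient over all 200 samples before each update, the parameters are nudged after every individual, randomly ordered sample.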
