This is just the mathematical background needed to understand the book. It covers the Math Appendix and Ch5 (the math tools chapter). I will collect substantive results from the book in a separate post.
For further reference:
- Bertsekas-Shreve 1978: Stochastic Optimal Control: The Discrete-Time Case (fairly advanced due to its generality and abstractness, to my surprise)
- Yong-Zhou 1999: Stochastic Controls: Hamiltonian Systems and HJB Equations (SMP & HJB)
- Fleming-Soner 2006: Controlled Markov Processes and Viscosity Solutions
- Oksendal-Sulem 2007: Applied Stochastic Control of Jump Diffusions
- Pham 2010: Continuous-time Stochastic Control and Optimization with Financial Applications
- Touzi 2013: Optimal Stochastic Control, Stochastic Target Problems, and Backward SDEs (advanced)
Appendix: Stochastic Calculus
definition of Brownian motion: a process $(W_t)_{t\ge 0}$ with $W_0 = 0$, continuous paths a.s., independent increments, and $W_t - W_s \sim N(0, t-s)$ for $s < t$;
definition of stochastic (Ito) integral. Shorter way:
for an adapted, $L^2$ process $(\sigma_t)_{t\ge 0}$, define
$$X_t = \int_0^t \sigma_s\,dW_s$$
this is called an Ito process, and often written as $dX_t = \sigma_t\,dW_t$. Can show that this SP is a martingale.
More generally, an Ito process can be written as
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t$$
generally, we just need $(\mu_t)$ and $(\sigma_t)$ to be adapted and satisfy certain integrability conditions. But in the special case where $\mu_t = \mu(t, X_t)$ and $\sigma_t = \sigma(t, X_t)$, the equation is called an SDE. Namewise, the book also mentions that Ito processes are stochastic processes satisfying SDEs with Brownian noise terms.
definition of stochastic (Ito) integral. Rigorous way:
- define the Ito integral for simple (piecewise-constant, adapted) processes
- prove that any adapted $\sigma \in L^2$ can be approximated, in $L^2$, by a sequence of simple processes
- define the Ito integral of $\sigma$ as the limiting value of the Ito integrals of the sequence of simple processes
Ito isometry: for adapted $\sigma \in L^2$,
$$\mathbb{E}\left[\left(\int_0^t \sigma_s\,dW_s\right)^2\right] = \mathbb{E}\left[\int_0^t \sigma_s^2\,ds\right]$$
infinitesimal generator: the generalization of the derivative of a function, to make it applicable to stochastic processes:
$$\mathcal{L}f(x) = \lim_{h\downarrow 0}\frac{\mathbb{E}\left[f(X_{t+h}) \mid X_t = x\right] - f(x)}{h}$$
For an Ito process satisfying a certain SDE, e.g., $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$, the generator is
$$\mathcal{L}f(x) = \mu(x)\,f'(x) + \tfrac{1}{2}\sigma^2(x)\,f''(x)$$
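As a quick numerical sanity check (my own sketch, not from the book), the following simulates the Ito integral $\int_0^T W_s\,dW_s$ as a limit of simple-process sums, and verifies the martingale property (zero mean) and the Ito isometry by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1000, 20000
dt = T / n_steps

# Brownian increments and paths: W has shape (n_paths, n_steps + 1)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), dW.cumsum(axis=1)])

# Ito integral with sigma_s = W_s: sum sigma at the LEFT endpoint times dW
# (evaluating at the left endpoint is what makes the integral a martingale)
I = (W[:, :-1] * dW).sum(axis=1)

print("E[I]          ~", I.mean(), "(martingale => ~0)")
print("E[I^2]        ~", (I**2).mean())
print("E[int W^2 ds] ~", (W[:, :-1]**2 * dt).sum(axis=1).mean(), "(isometry: should match)")
# Exact values: E[I] = 0, E[I^2] = E[int_0^T W_s^2 ds] = T^2/2 = 0.5
```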
Jump Processes
Poisson process $(N_t)_{t\ge 0}$, valued in $\{0, 1, 2, \dots\}$, with intensity param $\lambda > 0$, is a SP s.t.:
- $N_0 = 0$, a.s.
- $N_t$ has Poisson distro with param $\lambda t$: $\mathbb{P}(N_t = k) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}$
- has independent increments: $N_t - N_s$ is independent of $\sigma(N_u,\ u \le s)$
- has stationary increments: $N_t - N_s \overset{d}{=} N_{t-s}$
Classic result 1: the times between successive jumps of $N$ are independent and exponentially distributed with mean $1/\lambda$.
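A small simulation sketch of mine, using classic result 1 to build the Poisson process from iid exponential interarrival times and checking $\mathbb{E}[N_t] = \mathrm{Var}[N_t] = \lambda t$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t, n_paths = 2.5, 4.0, 50000

# Interarrival times are iid Exp(lam); jump times are their cumulative sums.
# N_t = number of jump times <= t.
gaps = rng.exponential(1.0 / lam, size=(n_paths, 64))  # 64 >> lam*t jumps, enough w.h.p.
jump_times = gaps.cumsum(axis=1)
N_t = (jump_times <= t).sum(axis=1)

print("E[N_t]   ~", N_t.mean(), " vs lam*t =", lam * t)
print("Var[N_t] ~", N_t.var(), " vs lam*t =", lam * t)
```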
compensated Poisson process: $\tilde N_t = N_t - \lambda t$. Note that this is a martingale.
as with BM, we can define stochastic integrals wrt compensated Poisson processes in a way that the resulting integral process is a martingale.
let $(\sigma_t)$ be adapted; define the stochastic integral of $\sigma$ wrt $\tilde N$ by:
$$\int_0^t \sigma_s\,d\tilde N_s = \sum_{n:\,\tau_n \le t} \sigma_{\tau_n^-} - \lambda\int_0^t \sigma_s\,ds$$
where the $\tau_n$'s are jump times.
- need $\sigma_{\tau_n^-}$ (the left limit), not $\sigma_{\tau_n}$, to make the integral a martingale (see the experiment below).
- alt. def: replace the first term with $\sum_{0 < s \le t} \sigma_{s^-}\,\Delta N_s$, where $\Delta N_s = N_s - N_{s^-}$, which in this case is either 0 or 1. Although this sums over a continuum of $s$, it is formally well defined: a.s. only the finitely many jump times in $[0,t]$ contribute nonzero terms.
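To see why the left limit matters, here is a quick experiment of mine with $\sigma_s = N_{s^-}$: using $\sigma_{\tau_n^-}$ in the jump sum gives mean zero, while using $\sigma_{\tau_n}$ adds exactly one extra unit per jump, biasing the mean by $\mathbb{E}[N_T] = \lambda T$:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, T, n_paths = 2.0, 3.0, 20000

diff_left, diff_right = [], []
for _ in range(n_paths):
    taus = np.cumsum(rng.exponential(1.0 / lam, size=40))
    taus = taus[taus <= T]                    # jump times in [0, T]
    n = len(taus)
    comp = lam * (T - taus).sum()             # lam * int_0^T N_s ds = lam * sum_n (T - tau_n)
    diff_left.append(n * (n - 1) / 2 - comp)  # sigma at tau_n^- : 0 + 1 + ... + (n-1)
    diff_right.append(n * (n + 1) / 2 - comp) # sigma at tau_n   : 1 + 2 + ... + n

print("left-limit version:", np.mean(diff_left), "(~0: martingale)")
print("right version     :", np.mean(diff_right), "(~lam*T =", lam * T, ")")
```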
Ito formula for Poisson process
recall we can write $N_t = \tilde N_t + \lambda t$.
Ito's formula for such processes is: suppose $X$ satisfies
$$dX_t = \mu_t\,dt + \gamma_{t^-}\,dN_t$$
for differentiable $f$. Then:
$$df(X_t) = \mu_t\,f'(X_t)\,dt + \left[f(X_{t^-} + \gamma_{t^-}) - f(X_{t^-})\right] dN_t$$
or, written in terms of the compensated Poisson process,
$$df(X_t) = \left\{\mu_t\,f'(X_t) + \lambda\left[f(X_{t^-} + \gamma_{t^-}) - f(X_{t^-})\right]\right\} dt + \left[f(X_{t^-} + \gamma_{t^-}) - f(X_{t^-})\right] d\tilde N_t$$
we also see from this (compensated version) formula that, when $\mu_t = \mu(X_t)$ and $\gamma_t = \gamma(X_t)$, the generator of the process is
$$\mathcal{L}f(x) = \mu(x)\,f'(x) + \lambda\left[f(x + \gamma(x)) - f(x)\right]$$
jump diffusion:
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t + \gamma_{t^-}\,dN_t$$
Ito formula for jump diffusion: for the above $X$, let $Y$ be defined by $Y_t = f(X_t)$; then:
$$dY_t = \left(\mu_t\,\partial_x f(X_t) + \tfrac{1}{2}\sigma_t^2\,\partial_{xx} f(X_t)\right) dt + \sigma_t\,\partial_x f(X_t)\,dW_t + \left[f(X_{t^-} + \gamma_{t^-}) - f(X_{t^-})\right] dN_t$$
again, it is common to write it using the compensated Poisson process $\tilde N$.
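A pathwise sanity check I find useful (my own sketch, constant coefficients): simulate one Euler path of a jump diffusion, accumulate $df(X_t)$ term by term from the formula above with $f(x) = x^2$, and compare with $f(X_T) - f(X_0)$ computed directly. The two sides should agree up to discretization error:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, gamma, lam = 0.1, 0.3, 0.5, 2.0
T, n = 1.0, 100000
dt = T / n
f, fx, fxx = lambda x: x**2, lambda x: 2 * x, lambda x: 2.0

X, lhs = 1.0, 0.0                          # X_0 = 1; lhs accumulates df(X_t)
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))
    jump = rng.random() < lam * dt         # at most one jump per small step
    lhs += (mu * fx(X) + 0.5 * sigma**2 * fxx(X)) * dt + sigma * fx(X) * dW
    X_cont = X + mu * dt + sigma * dW      # continuous part of the move (X_{t^-})
    if jump:
        lhs += f(X_cont + gamma) - f(X_cont)
    X = X_cont + (gamma if jump else 0.0)

print("Ito formula side:", lhs)
print("direct side     :", f(X) - f(1.0))  # should be close
```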
compound Poisson process $J_t = \sum_{n=1}^{N_t}\xi_n$ is built out of:
- a Poisson process $N$ with intensity $\lambda$
- a collection of iid RVs $\xi_1, \xi_2, \dots$ (independent of $N$), with common distro $F$
The process jumps when a Poisson event arrives, but the jump size is drawn from $F$.
We can show, as before:
$\tilde J$ defined by $\tilde J_t = J_t - \lambda\,\mathbb{E}[\xi_1]\,t$ is a martingale
we can define stochastic integrals wrt the compound Poisson process too.
note that $\mathbb{E}[J_t] = \lambda\,\mathbb{E}[\xi_1]\,t$
an Ito's formula for $Y_t = f(X_t)$, where $dX_t = \mu_t\,dt + \sigma_t\,dW_t + dJ_t$:
$$f(X_t) = f(X_0) + \int_0^t \left(\mu_s\,\partial_x f + \tfrac{1}{2}\sigma_s^2\,\partial_{xx} f\right)(X_s)\,ds + \int_0^t \sigma_s\,\partial_x f(X_s)\,dW_s + \sum_{0 < s \le t}\left[f(X_s) - f(X_{s^-})\right]$$
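Another quick check of mine: simulate a compound Poisson process with Gaussian jump sizes and verify $\mathbb{E}[J_t] = \lambda\,\mathbb{E}[\xi_1]\,t$ and that the compensated version has mean zero:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, t, n_paths = 3.0, 2.0, 50000
xi_mean, xi_std = 0.5, 1.0                 # jump sizes xi ~ N(0.5, 1)

N = rng.poisson(lam * t, size=n_paths)     # number of jumps on [0, t]
J = np.array([rng.normal(xi_mean, xi_std, size=k).sum() for k in N])

print("E[J_t]           ~", J.mean(), " vs lam*E[xi]*t =", lam * xi_mean * t)
print("compensated mean ~", (J - lam * xi_mean * t).mean(), "(should be ~0)")
```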
Doubly Stochastic Poisson Processes (Cox process)
these are jump processes which have a stochastic intensity
given a counting process $N$, we want its intensity process $(\lambda_t)$ to be stochastic
the approach is to give a way to compute the probability that an event arrives in $[t, t+dt)$, given the info we have at time $t$: to define $\mathbb{P}\left(N_{t+dt} - N_t = 1 \mid \mathcal{F}_t\right) = \lambda_t\,dt + o(dt)$, where $(\mathcal{F}_t)$ is the natural filtration generated by $(N, \lambda)$.
this means: $\mathbb{E}\left[dN_t \mid \mathcal{F}_t\right] = \lambda_t\,dt$
the driver of the intensity process can be diverse, leading to Feller/OU/Hawkes processes.
as before, we can define its compensated version $\tilde N_t = N_t - \int_0^t \lambda_s\,ds$, which is a martingale; we can define stochastic integrals wrt the compensated doubly stochastic Poisson process; we can derive an Ito's formula for such integral processes; and from it we can derive an expression for the generator of the joint process $(N_t, \lambda_t)$.
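A discretized simulation sketch (mine, with an ad hoc log-OU intensity rather than a model from the book): on a fine grid, draw an event in each step with probability $\lambda_t\,\Delta t$, and check that the compensated process $N_T - \int_0^T \lambda_s\,ds$ has mean zero:

```python
import numpy as np

rng = np.random.default_rng(5)
T, n, n_paths = 2.0, 2000, 5000
dt = T / n
kappa, theta, eta = 2.0, 0.0, 0.5          # OU params for Y; lambda_t = exp(Y_t) > 0

Y = np.zeros(n_paths)                      # log-intensity, OU dynamics
N = np.zeros(n_paths)                      # counting process
comp = np.zeros(n_paths)                   # compensator int_0^t lambda_s ds
for _ in range(n):
    lam = np.exp(Y)
    N += rng.random(n_paths) < lam * dt    # event w.p. lambda*dt (at most 1 per step)
    comp += lam * dt
    Y += kappa * (theta - Y) * dt + eta * np.sqrt(dt) * rng.normal(size=n_paths)

print("E[N_T - int lambda ds] ~", (N - comp).mean(), "(martingale => ~0)")
```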
Feynman-Kac
certain linear PDEs are related to SDEs.
Let $X$ be an Ito process satisfying $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$.
The generator of $X$ is then $\mathcal{L}_t$, where $\mathcal{L}_t f = \mu(t,x)\,\partial_x f + \tfrac{1}{2}\sigma^2(t,x)\,\partial_{xx} f$
Now suppose we try to solve the PDE $\left(\partial_t + \mathcal{L}_t\right) V(t,x) = 0$, with terminal condition $V(T,x) = G(x)$;
then we have a probabilistic representation of the solution $V$:
$$V(t,x) = \mathbb{E}\left[G(X_T) \mid X_t = x\right]$$
(note there is a typo in the book)
Consider the simplest example: $\mu = 0$, $\sigma = 1$, so the PDE is $\partial_t V + \tfrac{1}{2}\partial_{xx} V = 0$ with $V(T,x) = G(x)$.
Now introduce a BM $W$, and define $M_t = \mathbb{E}\left[G(W_T) \mid \mathcal{F}_t\right]$
$M$ is a martingale, and Markov: $M_t = V(t, W_t)$ for some function $V$
use Ito's lemma to write out
$$dM_t = \left(\partial_t V + \tfrac{1}{2}\partial_{xx} V\right)(t, W_t)\,dt + \partial_x V(t, W_t)\,dW_t$$
divide the above by $dt$, take conditional expectations and the limit $dt \downarrow 0$; since $M$ is a martingale, the drift must vanish, so we get $\partial_t V + \tfrac{1}{2}\partial_{xx} V = 0$
by definition, $V(T,x) = G(x)$. Thus, $V$ satisfies the PDE. Recall $V(t,x) = \mathbb{E}\left[G(W_T) \mid W_t = x\right]$.
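Numerically, Feynman-Kac is easy to check by Monte Carlo. Below is my sketch for the heat-equation example with $G(x) = x^2$, where the exact solution is $V(t,x) = x^2 + (T - t)$:

```python
import numpy as np

rng = np.random.default_rng(6)
T, t, x, n_paths = 1.0, 0.3, 0.7, 200000

# Feynman-Kac: V(t,x) = E[G(W_T) | W_t = x], with W_T - W_t ~ N(0, T - t)
W_T = x + rng.normal(0.0, np.sqrt(T - t), size=n_paths)
mc = (W_T**2).mean()

exact = x**2 + (T - t)   # solves dV/dt + 0.5 d^2V/dx^2 = 0, V(T,x) = x^2
print("Monte Carlo:", mc, " exact:", exact)
```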
Ch5 Stochastic Optimal Control and Stopping
A few motivating examples (just to be familiar with notation)
Merton Problem
value function: $H(x) = \sup_{\pi\in\mathcal{A}} \mathbb{E}\left[U(X_T^\pi)\right]$, where:
- at $t$, place $\pi_t$ dollars in the risky asset
- wealth level is $X_t^\pi$
state dynamics follow:
$$dX_t^\pi = \left(\pi_t(\mu - r) + r X_t^\pi\right) dt + \pi_t\,\sigma\,dW_t$$
$\mathcal{A}$ is the admissible set, the set of $\mathcal{F}$-predictable, self-financing strategies satisfying $\int_0^T \pi_t^2\,dt < \infty$ a.s. (to prevent doubling strategies)
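To get a feel for the notation, here is a sketch of mine (made-up parameters, not from the book) that Monte Carlo-evaluates the performance criterion $\mathbb{E}[U(X_T^\pi)]$ for log utility under constant-proportion strategies $\pi_t = p\,X_t$; for log utility the Merton fraction $p^* = (\mu - r)/\sigma^2$ should come out best:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, r, sigma, T, x0 = 0.08, 0.02, 0.25, 1.0, 1.0
n_steps, n_paths = 500, 20000
dt = T / n_steps

def expected_log_utility(p):
    """E[log X_T] under pi_t = p * X_t (constant fraction p of wealth in the risky asset)."""
    X = np.full(n_paths, x0)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X * (1 + (p * (mu - r) + r) * dt + p * sigma * dW)
        X = np.maximum(X, 1e-12)           # guard against Euler paths dipping <= 0
    return np.log(X).mean()

p_star = (mu - r) / sigma**2               # Merton fraction for log utility
for p in [0.0, 0.5 * p_star, p_star, 1.5 * p_star]:
    print(f"p = {p:.3f}: E[log X_T] ~ {expected_log_utility(p):.4f}")
```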
Optimal Liquidation
state dynamics follow:
- $dQ_t^\nu = -\nu_t\,dt$ (note the sign: liquidating at rate $\nu_t \ge 0$ depletes inventory)
$\mathcal{A}$ is the set of $\mathcal{F}$-predictable, non-negative, bounded strategies (excluding repurchasing of shares, and keeping the liquidation rate finite)
optimal Limit Order placement: identical value function expression, just change the control $\nu$ to $\delta$, which means that the agent posts a LO at price $S_t + \delta_t$ when the current stock price is $S_t$.
state dynamics:
- $M_t$ denotes (the counting process of) market orders
the book mentions a uniform distribution here, which I don't quite understand (see my guess below). The logic is straightforward though: the probability of your limit order getting executed is a decreasing function of $\delta$.
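My guess at the uniform-distribution mechanics (an assumption, not the book's exact construction; the $e^{-\kappa\delta}$ fill probability and all parameters below are hypothetical): when a market order arrives, whether it fills our LO posted at depth $\delta$ can be decided by comparing a $\mathrm{Uniform}(0,1)$ draw against a fill probability that decays in $\delta$:

```python
import numpy as np

rng = np.random.default_rng(8)
kappa, delta, lam_mo, T = 10.0, 0.05, 50.0, 1.0    # hypothetical parameters

n_mo = rng.poisson(lam_mo * T)                     # number of MO arrivals on [0, T]
fills = rng.random(n_mo) < np.exp(-kappa * delta)  # uniform draw vs fill probability
print("MOs:", n_mo, " fills:", fills.sum(),
      " empirical fill rate:", fills.mean() if n_mo else float("nan"),
      " vs exp(-kappa*delta) =", np.exp(-kappa * delta))
```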
below, the book gives 3 types of problems:
- control for diffusion processes
- control for counting processes
- control of stopping times
The derivation of DPP & HJB may seem pedantic, but one has to see such arguments at some point. I find the book’s exposition strikes a nice balance of rigor and accessibility (sacrificing a bit of generality compared with other treatments, say in Nisio’s book); therefore I will not skip the derivations in favor of a mere “cookbook”. Due to the similarities, I will only copy the derivation of DPP & HJB for control of diffusion processes.
general control problem for diffusion processes
problem statement:
$$\sup_{u\in\mathcal{A}} \mathbb{E}\left[G(X_T^u) + \int_0^T F(s, X_s^u, u_s)\,ds\right]$$
where:
$$dX_t^u = \mu(t, X_t^u, u_t)\,dt + \sigma(t, X_t^u, u_t)\,dW_t, \qquad X_0^u = x$$
Here $\mathcal{A}$ is the set of $\mathcal{F}$-predictable processes $u$ s.t. the state dynamics admit a strong solution. Also assume some nice properties of $\mu$ and $\sigma$, such as Lipschitz continuity.
note that predictability is necessary since otherwise the agent may be able to peek into the future to optimize her strategy.
we embed the optimization problem into a larger class of problems indexed by time, equal to the original problem at $t = 0$.
performance criterion (associated with control $u$):
$$H^u(t,x) = \mathbb{E}_{t,x}\left[G(X_T^u) + \int_t^T F(s, X_s^u, u_s)\,ds\right]$$
value function:
$$H(t,x) = \sup_{u\in\mathcal{A}} H^u(t,x)$$
First, we establish the DPP:
for all $(t,x)$ and stopping times $\theta$ valued in $[t,T]$, we have:
$$H(t,x) = \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H(\theta, X_\theta^u)\right]$$
We prove this by showing two-sided inequality.
using LIE (the tower property):
$$H^u(t,x) = \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H^u(\theta, X_\theta^u)\right]$$
we know that $H^u(\theta, X_\theta^u) \le H(\theta, X_\theta^u)$
taking the supremum inside on the RHS, and then taking the supremum over $u$ on the LHS, we finally get:
$$H(t,x) \le \sup_{u\in\mathcal{A}} \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H(\theta, X_\theta^u)\right]$$
Now, we show the reverse inequality.
assuming the value function is continuous in the space of controls, we pick a control $u^\epsilon$ such that it is almost perfect:
$$H^{u^\epsilon}(\theta, X_\theta) \ge H(\theta, X_\theta) - \epsilon$$
Take an arbitrary control $u$, and modify our almost-optimal control:
$$\tilde u_s = u_s\,\mathbb{1}_{\{s \le \theta\}} + u^\epsilon_s\,\mathbb{1}_{\{s > \theta\}}$$
Note that this modified control is almost-optimal after $\theta$, but suboptimal on $[t, \theta]$. Anyway, we have $H(t,x) \ge H^{\tilde u}(t,x)$
by LIE:
$$H^{\tilde u}(t,x) = \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H^{\tilde u}(\theta, X_\theta^u)\right]$$
the RHS above is equal to:
$$\mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H^{u^\epsilon}(\theta, X_\theta^u)\right]$$
(since $\tilde u$ coincides with $u^\epsilon$ after $\theta$); the RHS above satisfies the inequality:
$$\ge \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u_s)\,ds + H(\theta, X_\theta^u)\right] - \epsilon$$
let $\epsilon \downarrow 0$, and take the supremum over $u$ on the RHS; we arrive at the desired inequality.
the DPP is really a sequence of equations. An even more powerful equation can be found by looking at its infinitesimal version – DPE (HJB).
two key ideas in deriving DPE:
let $\theta$ be small: specifically, let $\theta = \tau_\epsilon \wedge (t+h)$, the minimum between the first time $X^u$ exits a ball of radius $\epsilon$ around the starting position, and a fixed small time $t+h$. note that if we let $h \downarrow 0$, we would eventually have $\theta = t+h$, since as $h$ shrinks, it is less and less likely that $X^u$ will exit the ball first.
assuming enough regularity of the value function, write the value function using Ito's lemma:
$$H(\theta, X_\theta^u) = H(t,x) + \int_t^\theta \left(\partial_s + \mathcal{L}^{u_s}\right) H(s, X_s^u)\,ds + \int_t^\theta \sigma(s, X_s^u, u_s)\,\partial_x H(s, X_s^u)\,dW_s$$
Note that here $u$ is an arbitrary control, and $\mathcal{L}^u$ is the infinitesimal generator of $X^u$. Note also that, as an example, $\mathcal{L}^u = \mu(t,x,u)\,\partial_x + \tfrac{1}{2}\sigma^2(t,x,u)\,\partial_{xx}$, so it is about the local behavior: the $u$ in $\mathcal{L}^u$ is an action, not the whole (time-indexed) strategy.
Now, we derive the HJB:
$$\partial_t H(t,x) + \sup_{u\in\mathcal{U}}\left\{\mathcal{L}^u H(t,x) + F(t,x,u)\right\} = 0, \qquad H(T,x) = G(x)$$
We prove this by showing two-sided inequality.
take an arbitrary control $u$ such that it is CONSTANT over $[t, \theta]$
as shown before (DPP),
$$H(t,x) \ge \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^u, u)\,ds + H(\theta, X_\theta^u)\right]$$
by our choice of $\theta$ (the state stays in a bounded ball before $\theta$), we can show the stochastic integral in the Ito expansion is indeed a martingale; substituting the expansion, we therefore have:
$$0 \ge \mathbb{E}_{t,x}\left[\int_t^\theta \left\{\left(\partial_s + \mathcal{L}^u\right) H(s, X_s^u) + F(s, X_s^u, u)\right\} ds\right]$$
Now we let $h \downarrow 0$, so that $\theta = t+h$
the RHS is then equal to $h\left[\left(\partial_t + \mathcal{L}^u\right) H(t,x) + F(t,x,u) + o(1)\right]$, where we used the mean value theorem
since this inequality holds for arbitrary $u$, dividing by $h$ and taking the supremum we have:
$$\partial_t H(t,x) + \sup_{u\in\mathcal{U}}\left\{\mathcal{L}^u H(t,x) + F(t,x,u)\right\} \le 0$$
note that the $u$ here is used a bit loosely. Sometimes it denotes the whole strategy, sometimes it denotes the constant action applied on $[t, \theta]$.
Now we show the reverse inequality, by showing that for the optimal control $u^*$, we have
$$H(t,x) = \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^{u^*}, u_s^*)\,ds + H(\theta, X_\theta^{u^*})\right]$$
by LIE, $H(t,x) = H^{u^*}(t,x) = \mathbb{E}_{t,x}\left[\int_t^\theta F(s, X_s^{u^*}, u_s^*)\,ds + H^{u^*}(\theta, X_\theta^{u^*})\right]$
apply Ito's lemma as before, writing $H(\theta, X_\theta^{u^*})$ in terms of $H(t,x)$, divide by $h$ and take the limit; we will find the desired (reverse) inequality.
Combining these two parts, we arrive at the DPE (HJB):
$$\partial_t H(t,x) + \sup_{u\in\mathcal{U}}\left\{\mathcal{L}^u H(t,x) + F(t,x,u)\right\} = 0, \qquad H(T,x) = G(x)$$
Note that the optimal control in the HJB is an action, and can be written in feedback form in terms of the value function. Substituting this optimal control back into the HJB, we get non-linear PDEs.
Often we define the (maximized) Hamiltonian as:
$$\mathcal{H}(t,x) = \sup_{u\in\mathcal{U}}\left\{\mathcal{L}^u H(t,x) + F(t,x,u)\right\}$$
Note that some authors define the Hamiltonian with generic costates, and optimality is associated with those costates being equal to the partial derivatives of the value function.
The DPE provides a necessary condition for optimality. We use verification theorems to prove sufficiency. Basically, a verification theorem says that if you can find a solution to the DPE, demonstrate that it is a classical solution (once differentiable in time and twice differentiable in the state vars), and the resulting control is admissible, then the solution is indeed the value function, and the resulting control is indeed the optimal Markov control. Under some more technical assumptions, one can show that the optimum over predictable controls is attained by a Markov control, and therefore we have found not just the optimal Markov control but the optimal predictable control.
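As a standard worked example of the feedback form (my addition, the classic Merton case rather than a quote from the book): substituting the Merton wealth dynamics from above into the HJB gives

```latex
% Merton: dX_t = (pi_t(mu - r) + r X_t) dt + pi_t sigma dW_t, maximize E[U(X_T)]
\[
  \partial_t H + \sup_{\pi}\Big\{ \big(\pi(\mu - r) + r x\big)\,\partial_x H
      + \tfrac{1}{2}\pi^2\sigma^2\,\partial_{xx} H \Big\} = 0,
  \qquad H(T, x) = U(x).
\]
% The first-order condition in pi (concave when \partial_{xx} H < 0) gives the
% feedback control in terms of the value function:
\[
  \pi^*(t, x) = -\frac{(\mu - r)\,\partial_x H(t, x)}{\sigma^2\,\partial_{xx} H(t, x)},
\]
% and substituting back yields the non-linear PDE
\[
  \partial_t H + r x\,\partial_x H
    - \frac{(\mu - r)^2}{2\sigma^2}\,\frac{(\partial_x H)^2}{\partial_{xx} H} = 0 .
\]
```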
(not so general) control problem for counting processes
just a special case to build intuition: the agent controls the intensity, and hence the frequency of jumps, of a counting process $N$.
performance criterion
$$H^\nu(t,n) = \mathbb{E}\left[G(N_T^\nu) + \int_t^T F(s, N_s^\nu, \nu_s)\,ds \,\middle|\, N_t^\nu = n\right]$$
value function $H(t,n) = \sup_{\nu\in\mathcal{A}} H^\nu(t,n)$, where:
- $\nu = (\nu_s)$ is the control process
- $N^\nu$ is a controlled doubly stochastic Poisson process, starting at $n$, with intensity $\nu_s$
- as a result, $N_t^\nu - \int_0^t \nu_s\,ds$ is a martingale
DPP:
$$H(t,n) = \sup_{\nu\in\mathcal{A}} \mathbb{E}\left[\int_t^\theta F(s, N_s^\nu, \nu_s)\,ds + H(\theta, N_\theta^\nu) \,\middle|\, N_t^\nu = n\right]$$
DPE:
$$\partial_t H(t,n) + \sup_{\nu}\left\{\nu\left[H(t, n+1) - H(t,n)\right] + F(t,n,\nu)\right\} = 0$$
Note that the generator of a counting process with intensity $\nu$ acts as
$$\mathcal{L}^\nu f(n) = \nu\left[f(n+1) - f(n)\right]$$
Thus, if we plug in the infinitesimal form, we get the DPE above.
if the expression inside the supremum is linear in $\nu$ (e.g., $F$ linear in $\nu$), then the optimal control is to make $\nu$ as large as possible or as small as possible depending on the sign of its coefficient, i.e., bang-bang controls.
Two ways to break this uninteresting feature:
- make the running reward $F$ non-linear (e.g., strictly concave) in $\nu$, so the pointwise optimizer is interior
- add another SP driven by the counting process, so this new SP is controlled indirectly, and have this new SP affect the performance
Similar DPE for the problem with jump-diffusions.
Stopping Problems
performance criterion
$$H^\tau(t,x) = \mathbb{E}_{t,x}\left[G(X_\tau)\right]$$
where $X$ is a jump diffusion following:
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t + \gamma(t, X_{t^-})\,dN_t$$
where $N$ is a multi-dim counting process with intensities $(\lambda^1, \dots, \lambda^m)$.
value function
$$H(t,x) = \sup_{\tau\in\mathcal{T}_{[t,T]}} H^\tau(t,x)$$
where $\mathcal{T}_{[t,T]}$ is the set of stopping times valued in $[t,T]$.
The difficult problem is to characterize the boundary of the stopping region. Again we have our DPP and DPE.
DPP:
$$H(t,x) = \sup_{\tau\in\mathcal{T}_{[t,T]}} \mathbb{E}_{t,x}\left[G(X_\tau)\,\mathbb{1}_{\{\tau < \theta\}} + H(\theta, X_\theta)\,\mathbb{1}_{\{\tau \ge \theta\}}\right]$$
for all stopping times $\theta$.
intuition: if $\tau$ occurs prior to $\theta$, then the agent's value is just the reward at $\tau$. If not, then at $\theta$ the agent receives the value function evaluated at the current state.
DPE: $H$ solves the variational inequality:
$$\max\left\{\left(\partial_t + \mathcal{L}\right) H,\; G - H\right\} = 0, \qquad \text{on } [0,T)\times\mathbb{R}^d$$
The proof follows Touzi 2013, which is nice as it is related to viscosity solutions (needed when value function itself is not smooth enough to differentiate)
Interpretation of this HJBQVI:
in the continuation region, $(\partial_t + \mathcal{L})H = 0$, so $H(t, X_t)$ is a martingale
in the stopping region, if you don't stop, the linear operator renders the drift negative: $(\partial_t + \mathcal{L})H \le 0$. But we pin the value function to the reward there ($H = G$) and stop, so the stopped value process is again a martingale (constant after stopping).
so the SP corresponding to the flow of the value function, $H(t\wedge\tau^*, X_{t\wedge\tau^*})$, is a martingale on the entire $[0,T]$.
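To make the variational inequality concrete, here is a small numerical sketch of mine (a discrete-time analogue on a binomial tree, not an example from the book): backward induction for an American-style put, where each step takes $\max\{\text{continuation}, G\}$ exactly as the HJBQVI suggests:

```python
import numpy as np

# Discrete analogue of max{(d_t + L)H, G - H} = 0: at each node take the max
# of the continuation value (the martingale/PDE part) and the reward G.
S0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 500
dt = T / n
u = np.exp(sigma * np.sqrt(dt))           # up factor
d = 1.0 / u                               # down factor
q = (np.exp(r * dt) - d) / (u - d)        # risk-neutral prob of an up move
disc = np.exp(-r * dt)

j = np.arange(n + 1)
S = S0 * u**j * d**(n - j)                # terminal prices
H = np.maximum(K - S, 0.0)                # terminal condition H(T, x) = G(x)
for _ in range(n):
    S = S[1:] / u                         # prices one step earlier
    cont = disc * (q * H[1:] + (1 - q) * H[:-1])    # continuation value
    H = np.maximum(cont, np.maximum(K - S, 0.0))    # stop vs continue
print("American-style put value:", H[0])  # > European value: early-exercise premium
```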
Finally, for combined stopping and control problems, we have a DPE of the same format, but now we also need to optimize in the continuation region; therefore the DPE reads:
$$\max\left\{\partial_t H + \sup_{u\in\mathcal{U}}\left\{\mathcal{L}^u H + F(\cdot, u)\right\},\; G - H\right\} = 0$$
You see now that in the continuation region, the value function needs to satisfy a general non-linear HJB, instead of a linear PDE.