Wednesday, December 18, 2019 - 10:00am to 12:00pm


6501 Gates & Hillman Centers


Daniel Clothiaux

For More Information, Contact:

Stacey Young,

The LTI is proud to announce the following PhD Thesis Proposal:

Dynamic Data Augmentation in Complex Low Data Domains


Ravi Starzl (Chair)
Jaime Carbonell
Oana Carja
Randy Viola (The Steadman Clinic)


Training data is a critical component of most supervised learning systems, but in many cases there is not enough (or any) labelled in-domain training data to get good results. To train systems in these cases, we apply a data augmentation framework as follows: first, we collect a small set of exemplars; we then generalize from them by applying a generative model, based on an underlying theory of the system, to create a broad range of synthetic data. We augment this with unsupervised learning, which pushes the generated data towards the unlabelled target data, and with active learning.

We first apply this approach to handwriting. We generate data by developing a physics- and spline-based handwriting generation system, applied to handwritten fonts, and by using a generative adversarial network to produce data that mimics the target dataset without requiring labels, followed by active learning. We then tested the validity of the set of handwritten fonts that we collected by running a large-scale qualitative and quantitative comparison between handwriting, fonts, and handwritten fonts, and found that handwritten fonts occupy a space between fonts and handwriting.

Finally, to show that our approach can be used more broadly than in handwriting alone, we will apply it to another domain: modeling the dynamic ecology of niches in the microbiome. We will model competition between two similar species that occupy the same niche, using both stochastic methods and information theory to create a set of simulated microbiomes, which can be used to elucidate the evolution of the system over time. We will then test our predictions empirically, both in vitro and in vivo.
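
As an illustration only, competition between two similar species in a single niche could be simulated with a Moran-style birth-death process, one common stochastic method. The proposal does not specify its model, so this technique, the function name `simulate_niche`, and all parameters are assumptions made for this sketch.

```python
import random


def simulate_niche(n_a, n_b, fitness_a=1.0, fitness_b=1.0, steps=10_000, seed=0):
    """Hypothetical sketch: two species competing in a niche of fixed size.

    Returns the abundance of species A after each step (Moran-style process).
    """
    rng = random.Random(seed)
    total = n_a + n_b  # the niche holds a fixed number of individuals
    history = [n_a]
    for _ in range(steps):
        if n_a == 0 or n_a == total:
            break  # one species has taken over the niche
        # Birth: choose the reproducing species in proportion to abundance * fitness.
        weight_a = n_a * fitness_a
        weight_b = (total - n_a) * fitness_b
        birth_is_a = rng.random() < weight_a / (weight_a + weight_b)
        # Death: the individual that is replaced is chosen uniformly at random.
        death_is_a = rng.random() < n_a / total
        n_a += int(birth_is_a) - int(death_is_a)
        history.append(n_a)
    return history


if __name__ == "__main__":
    trajectory = simulate_niche(50, 50, seed=1)
    print(len(trajectory), trajectory[-1])
```

Running the function over many seeds would give a set of simulated trajectories. With equal fitness values the two species drift neutrally, while a small fitness difference biases which species tends to take over the niche.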

For a copy of the thesis proposal, please go here


LTI PhD Thesis Proposal