Gustavo Lacerda
 

+1 9I7 655 87O7


I am a PhD student in Statistics at Columbia University, working with Liam Paninski. I am interested in machine learning and information theory, the geometry of exponential families, graphical models, state-space models, functional data analysis / spatial statistics, structured non-iid data (e.g. networks), low-rank methods (Independent Component/Subspace Analysis, manifold learning), sparsity, and techniques to make computation tractable.

CV                    Projects

academic history

Columbia University 2010-, PhD in Statistics

University of British Columbia 2008-2010, MSc in Computer Science

Carnegie Mellon University 2006-2008, programmer for HCII, researcher at Machine Learning Department

Universiteit van Amsterdam 2003-2005, MSc in Logic at ILLC

Bucknell University 1997-2001, B.S. in Mathematics and Computer Science

why so many places, so many degrees?

conferences and summer schools

IPAMGSS 2007       ICML/UAI 2008       SFI Summer School 2009
NIPS 2008, 2009       CogSci 2008, 2009.

technical tips (this is largely for myself)

Collaborative Q&A sites: mathoverflow, math.stackexchange, CrossValidated

If interested in computationally-intensive problems, learn Julia, learn distributed programming.

Learn R. Use IRC. Visit the #R channel on FreeNode and get your R questions answered. Emacs users can use IRC by doing "M-x erc".

Use my code: R-helpers

statistical puzzles

Suppose (Xi, Yi) are i.i.d. draws from a bivariate Gaussian with correlation ρ, which we are interested in estimating. Suppose further that marginally, X and Y are standard normal. Unfortunately, the data manager was using a spreadsheet and accidentally sorted the X column without sorting Y, losing the information of which X goes with which Y. Is the data useless?
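To play with the puzzle, here is a minimal Python sketch of the setup. The function names, the sample size, and the value ρ = 0.8 are all illustrative assumptions, not part of the puzzle. It simulates the pairs, performs the "spreadsheet accident", and compares the naive sample correlation before and after. It only sets up the problem; it does not settle whether the unpaired data still carry information about ρ.

```python
import math
import random


def simulate_pairs(n, rho, seed=0):
    """Draw n i.i.d. pairs from a bivariate Gaussian with standard normal
    marginals and correlation rho (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # y has unit variance and correlation rho with x
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        xs.append(x)
        ys.append(y)
    return xs, ys


def sample_correlation(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spreadsheet_accident(xs, ys):
    """Sort the X column alone, leaving Y in its original order."""
    return sorted(xs), list(ys)


# Assumed values, chosen only for the demonstration.
xs, ys = simulate_pairs(n=5000, rho=0.8, seed=1)
r_paired = sample_correlation(xs, ys)

xs_sorted, ys_kept = spreadsheet_accident(xs, ys)
r_scrambled = sample_correlation(xs_sorted, ys_kept)

print(f"paired: {r_paired:.3f}, after sorting X only: {r_scrambled:.3f}")
```

The naive estimator applied to the scrambled columns no longer tracks ρ, which is what makes the question interesting: any useful estimator has to work from the two marginal samples alone.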





blog

papers all publications Google Scholar

Identification of gene modules using a generative model for relational data (PDF, slides) - UBC Master's thesis (2010), supervised by Jennifer Bryan.

Discovering Cyclic Causal Models by ICA (UAI 2008) (paper, video lecture with slides) extends LiNGAM to discover cyclic models. The non-Gaussian model leads to a finer level of identifiability than what can be achieved in the Gaussian case (e.g. by Richardson's CCD), and allows us to relax the faithfulness assumption. We prove theorems about identifiability, specifically about when a unique model can be identified.

(draft) Upper-Bounding Proof Length with the Busy Beaver (2008) (PDF) - This note presents a Chaitin-esque result. I derive an (uncomputable) upper bound on the length of the shortest proof of any given statement, as a function of the length of the statement; and briefly discuss implications. Mathematically trivial, but original (to the best of my knowledge). Could possibly be useful if we ever have good estimates of BB for n large enough to encode an interesting question.

see all papers



tutorials

- Independent Component Analysis (ICA) (slides). Introduces ICA and tackles some very common misconceptions, 30 minutes.

- Introduction to Kolmogorov Complexity (with Liliana Salvador) (slides), 45 minutes.

- Introduction to Machine Learning and Bayesian inference (slides), 45 minutes.

demos

slice sampling



some things I like

argument mapping, bikes, bluegrass, contact improvisation, DreamWidth, emacs, functional programming, GiveWell, infoviz, musical instruments, open data, Quantified Self, wikis.

food for thought

"You and Your Research", by Richard Hamming

"Why People Are Irrational about Politics", by Michael Huemer

"Why I defend scoundrels", by Yvain

Paul Graham: "How to do Philosophy", "Why nerds are unpopular"

LessWrong: "Applause Lights", "Illusion of Transparency: Why No One Understands You"

Ribbonfarm: "A Big Little Idea Called Legibility"

Ben Goldacre: "The Information Architecture of Medicine is Broken"

blogs

Andrew Gelman - Statistical Modeling, Causal Inference, and Social Science

Cosma Shalizi - Three-Toed Sloth

Cathy O'Neil - mathbabe

Peter Gray - Freedom to Learn


This website is permanently under construction. You may notice that behind this frontpage is a MediaWiki site. Someday I'd like to have indexing. For now, keyword searches will have to do. RIP Xanadu.