Publications

From GusWiki

Revision as of 11:08, 8 December 2009 by Gusl (Talk | contribs)

See also: Tutorials, Class projects


Note: I take full responsibility for the content on this page and any other pages that don't have an "edit" link, as they are only editable by me. This notice also appears on my homepage. -- Gustavo Lacerda




Statistics on Structured Data

Essentially a write-up of the first two months of my MSc research (minus the "learning R" part).

Using simulations, we find that, in a maximum-likelihood setting, the true block structure is recovered most often when the clustering strength parameter is underestimated. This is perhaps not too surprising, given that the size of the data is fixed.

Causality

Generalizes the LiNGAM method to deal with cycles, and proposes stability as a partial solution to the underdetermination. Cyclic SEMs correspond to linear dynamical systems. The non-Gaussian model leads to a finer level of identifiability than what can be achieved in the Gaussian case (e.g. by Richardson's CCD), and allows us to relax the faithfulness assumption. We prove theorems about identifiability, specifically about when a unique model can be identified. Besides the new results, this paper also contains a novel presentation of the LiNGAM method.
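To make the link between cyclic SEMs and linear dynamical systems concrete, here is a minimal sketch (not code from the paper). It assumes the usual linear SEM form x = Bx + e and reads "stability" as: iterating x <- Bx + e converges, which for the two-variable cycle below happens when |b12 * b21| < 1. The function names and the coefficient values are made up for illustration.

```python
def iterate_sem(B, e, steps=200):
    """Iterate x <- B x + e starting from x = 0 and return the final state."""
    n = len(e)
    x = [0.0] * n
    for _ in range(steps):
        x = [sum(B[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
    return x


def equilibrium_2x2(B, e):
    """Closed-form solution of x = B x + e for a two-variable cycle
    (zero diagonal): x = (I - B)^{-1} e."""
    b12, b21 = B[0][1], B[1][0]
    det = 1.0 - b12 * b21
    return [(e[0] + b12 * e[1]) / det, (e[1] + b21 * e[0]) / det]


# Hypothetical coefficients, chosen only to contrast the two regimes.
stable_B = [[0.0, 0.5], [0.8, 0.0]]    # |b12 * b21| = 0.4 < 1
unstable_B = [[0.0, 2.0], [1.5, 0.0]]  # |b12 * b21| = 3.0 > 1
e = [1.0, -0.5]                        # one fixed draw of the error terms

x_stable = iterate_sem(stable_B, e)
x_star = equilibrium_2x2(stable_B, e)
x_unstable = iterate_sem(unstable_B, e, steps=50)
```

In the stable case the iteration settles at the equilibrium (I - B)^{-1} e, so the observed data can be read as equilibrium states of the dynamical system; in the unstable case the iterates grow without bound, even though x = Bx + e still has an algebraic solution.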


How to intelligently combine LiNGAM with methods based on conditional independence tests. This is useful when more than one, but not all, of the error terms may be Gaussian. (Future work: to make this smoother, use a Bayesian search that considers many equivalence classes.)


NLP / Information Retrieval

I think this paper was a blend of many people's independent projects. My part was building a bilingual Portuguese-English dictionary from a parallel corpus. This involved doing statistical word alignment before I knew anything about machine learning. Since we had a very large corpus, it worked out OK. I created a score that used proximity in location, a cognate heuristic, word-length correlations, and an assumption that synonyms do not appear in the same sentence. Finally, I bootstrapped from a hand-made dictionary of 100 word pairs.
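As a rough illustration of how such a score might combine its ingredients, here is a hypothetical sketch; it is not the original scoring function. The weights, the function names, the use of `difflib.SequenceMatcher` as the cognate heuristic, and a simple length ratio standing in for the word-length correlation are all assumptions. The synonym constraint and the bootstrapping step are left out.

```python
from difflib import SequenceMatcher


def position_proximity(i, src_len, j, tgt_len):
    """1.0 when both words sit at the same relative position in their sentences."""
    return 1.0 - abs(i / src_len - j / tgt_len)


def cognate_similarity(src_word, tgt_word):
    """Crude cognate heuristic: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, src_word.lower(), tgt_word.lower()).ratio()


def length_similarity(src_word, tgt_word):
    """Ratio of the shorter word length to the longer one."""
    a, b = len(src_word), len(tgt_word)
    return min(a, b) / max(a, b)


def pair_score(src_word, i, src_len, tgt_word, j, tgt_len,
               w_pos=0.4, w_cog=0.4, w_len=0.2):
    """Weighted sum of the three cues; the weights are arbitrary placeholders."""
    return (w_pos * position_proximity(i, src_len, j, tgt_len)
            + w_cog * cognate_similarity(src_word, tgt_word)
            + w_len * length_similarity(src_word, tgt_word))


# Candidate pairs from one hypothetical aligned sentence pair of 10 words each.
good = pair_score("universidade", 2, 10, "university", 2, 10)
bad = pair_score("universidade", 2, 10, "dog", 9, 10)
```

Summing such scores over every sentence pair in which two words co-occur, and keeping the best-scoring target word for each source word, would give a first-pass dictionary; a large corpus helps because noise in any single sentence pair averages out.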

Student modeling



Logic

subtitle: "Short theorems can't have arbitrarily long proofs as their shortest proof" (a.k.a. "Upper-Bounding Proof Length with the Busy Beaver")

This note presents a Chaitin-esque result. I derive an (uncomputable) upper bound on the length of the shortest proof of any given statement, as a function of the length of the statement, and briefly discuss implications. The result is mathematically trivial but, to the best of my knowledge, original. It could be useful if we ever have good estimates of BB(n) for n large enough to encode an interesting question (disclaimer: this seems VERY unlikely).
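One way a bound of this kind can arise is sketched below. This is a reconstruction under stated assumptions, not necessarily the argument or notation used in the note: F is a formal system with decidable proof checking, \ell_F(s) is the length of the shortest F-proof of s, S(m) is the maximum number of steps taken by any halting m-state Turing machine, and c is a constant depending only on F and the machine encoding.

```latex
% Assumptions (not taken from the note): F has decidable proof checking;
% S(m) = maximum running time of any halting m-state Turing machine;
% c depends only on F and on the encoding, not on s.
Let $M_s$ be a machine that writes $s$ on its tape, enumerates all strings
in order of length, and halts at the first one that is an $F$-proof of $s$.
Hard-coding $s$ costs about $|s|$ states and the proof search costs a
constant number $c$, so $M_s$ has at most $|s| + c$ states.
If $s$ is provable, $M_s$ halts, and it must run for at least
$\ell_F(s)$ steps to write out the proof it finds. Hence
\[
  \ell_F(s) \;\le\; \operatorname{time}(M_s) \;\le\; S(|s| + c).
\]
```

The bound is uncomputable because S is, which is why it would only become usable given independent estimates of S (or BB) at the relevant argument.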

Almost certainly my last excursion in mathematical logic.
