Read this paper in November 2012 (backdated post).
Compare this paper to the many similar ones offering general advice and observations on machine learning, such as the one by Andrew Ng.
These are the useful things Pedro chooses to highlight in this paper.
- Selection of an ML algorithm is simpler if you understand the three components: representation, evaluation (scoring function), optimization.
- Generalization is the goal. Use cross-validation, and measure success on held-out test data, not on the training data.
- Data and algorithms must be combined with domain knowledge and experience for good results. This is a good thing. Anti-Kaggle.
- Understand the bias vs. variance decomposition of error in overfitting. Use techniques like regularization to combat it.
- The curse of dimensionality: intuitions from two or three dimensions break down in high-dimensional spaces, and similarity-based reasoning gets harder as features are added.
- Theoretical guarantees in ML algorithms are not a criterion for practical decisions.
- Feature engineering is the most important contributor to success/failure of ML.
- More data > algorithmic sophistication, but adds scalability issues. Try the simplest learners first.
- Use ensemble methods. More models are better.
- “… Simpler hypotheses should be preferred because simplicity is a virtue in its own right, not because of a hypothetical connection with accuracy. This is probably what Occam meant in the first place.”
- Just because something has a representation doesn’t mean it can be learned.
- Correlation is not Causation.
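The cross-validation point above is easy to make concrete. A minimal sketch of k-fold splitting in plain Python (the name `k_fold_splits` is mine, not from the paper; real projects would use a library implementation):

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    for i in range(k):
        # Each fold serves as the held-out test set exactly once;
        # the model is trained on the remaining k-1 folds.
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

# Every sample lands in exactly one test fold, so each prediction
# is scored on data the model never trained on.
all_test = sorted(idx for _, test in k_fold_splits(20, k=5) for idx in test)
print(all_test == list(range(20)))  # → True
```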
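On the regularization bullet: a toy illustration of how a penalty shrinks coefficients and fights variance. This is a closed-form 1-D ridge regression without intercept, my own sketch rather than anything from the paper, on made-up data:

```python
def ridge_slope(xs, ys, lam):
    """Closed-form 1-D ridge regression (no intercept): minimizes
    sum((y - w*x)^2) + lam * w^2, giving w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]   # roughly y = x, plus noise

w_plain = ridge_slope(xs, ys, lam=0.0)   # ordinary least squares fit
w_reg = ridge_slope(xs, ys, lam=10.0)    # penalized fit
print(w_plain > w_reg > 0)  # → True: the penalty shrinks the slope toward zero
```

Larger `lam` trades a little bias for lower variance, which is exactly the overfitting knob the paper describes.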
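And for the ensemble bullet, the simplest possible combiner is majority voting over several models' class predictions. A sketch with hypothetical classifier outputs (the labels and model names are invented for illustration):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by majority vote.
    Ties go to the label seen first among the tied votes."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Three hypothetical classifiers predicting labels for four inputs:
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "dog"]
model_c = ["dog", "dog", "dog", "cat"]
print(majority_vote([model_a, model_b, model_c]))  # → ['cat', 'dog', 'dog', 'cat']
```

Bagging, boosting, and stacking are more sophisticated, but even this voting scheme shows why independent errors tend to cancel out.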