This textbook shows how to bring theoretical concepts from finance and econometrics to the data. Focusing on coding and data analysis with R, we show how to conduct research in empirical finance from scratch. We start by introducing the concepts of tidy data and coding principles using the tidyverse family of R packages. Code is provided to prepare common open-source and proprietary financial data sources (CRSP, Compustat, Mergent FISD, TRACE) and organize them in a database. We reuse these data in all subsequent chapters, which we keep as self-contained as possible. The empirical applications range from key concepts of empirical asset pricing (beta estimation, portfolio sorts, performance analysis, Fama-French factors) to modeling and machine learning applications (fixed effects estimation, clustering standard errors, difference-in-differences estimators, ridge regression, lasso, elastic net, random forests, neural networks) and portfolio optimization techniques.
Highlights
Self-contained chapters on the most important applications and methodologies in finance, which can easily be used for the reader's research or as a reference for courses on empirical finance.
Each chapter is reproducible in the sense that the reader can replicate every single figure, table, or number by simply copying and pasting the code we provide.
A full-fledged introduction to machine learning with tidymodels, based on tidy principles, shows how factor selection and option pricing can benefit from machine learning methods.
Chapter 2, on accessing and managing financial data, shows how to retrieve and prepare the most important datasets in financial economics: CRSP and Compustat. The chapter also contains detailed explanations of the most relevant data characteristics.
Each chapter provides exercises, based on established lectures and classes, that are designed to help students dig deeper. The exercises can be used for self-study or as a source of inspiration for teaching exercises.