Econometrics

Intro & definitions

 * Exogenous variables are entered INTO the model; endogenous ones are determined BY the model. Endogenous variables enter the model with an initial value, and the model then provides the mechanism that accounts for how they change over time. Changes in exogenous variables come not from the model but from outside. Say we came up with a model and now there is a policy change: we enter the policy change as exogenous variables and look at how the endogenous variables respond.
 * There is a gap between textbook econometrics (structural models that are solved analytically and that are - per economic theory - assumed to be true) and applied econometrics. Textbook econometrics yields structural parameters which can then be estimated - i.e. it is not the model itself that is estimated.
 * Structural parameters, as opposed to regular parameters, have a theoretical significance, i.e. they are not mere mathematical devices.
 * The disturbance term in a regression equation reflects the influence of those variables affecting the dependent variable that have not been included in the regression equation. Inertia - the persisting effect of variables excluded from the model and therefore absorbed into u - is probably the most frequent cause of positive autocorrelation.
 * First order condition: Suppose you have a differentiable function f(x) which you want to optimize by choosing x. If f(x) is utility or profit, you choose x (e.g. a consumption bundle or a quantity produced) to make f as large as possible; if f(x) is a cost function, you choose x to make f as small as possible. The FOC and SOC are the conditions that determine whether a candidate solution maximizes or minimizes the function. At the undergraduate level, you usually choose x∗ such that the derivative of f equals zero: f′(x∗) = 0. This is the FOC. The intuition is that a function attains an extremum (maximum or minimum) where its derivative is zero. (There are more subtleties: look up "interior vs corner solutions", "global vs local maximum/minimum", and "saddle point".) [Figure omitted: two curves, each with zero slope at x∗ - a maximum in the left graph, a minimum in the right.] However, finding x∗ where f′(x∗) = 0 is not enough to conclude that x∗ maximizes or minimizes the objective function: in both graphs the function has zero slope at x∗, yet x∗ is a maximizer in the left graph and a minimizer in the right. To tell them apart you need the SOC: f′′(x∗) < 0 for a maximizer, f′′(x∗) > 0 for a minimizer. Intuitively, if x∗ maximizes f, the slope of f is decreasing around x∗ - positive to the left of x∗ and negative to the right - so as x increases through the neighborhood of x∗, f′(x) decreases. The intuition for a minimizer is symmetric.
 * First order differential equation: the order corresponds to the highest-order derivative in the equation (if it contains a second derivative, it's a second-order differential equation).
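The FOC/SOC recipe above can be checked numerically; a minimal sketch, where the concave "profit" function f(x) = -(x - 2)² + 3 and its known maximizer x∗ = 2 are invented for illustration:

```python
def f(x):
    return -(x - 2.0) ** 2 + 3.0       # invented concave "profit" function


def deriv(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


x_star = 2.0                            # candidate found by solving f'(x) = 0 by hand
foc = deriv(f, x_star)                  # first-order condition: should be ~0
soc = deriv(lambda x: deriv(f, x), x_star)   # second derivative at x*

# FOC holds and SOC is negative, so x* is a maximizer
print(abs(foc) < 1e-6, soc < 0)
```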

Tricks/procedures

 * Why do we sometimes take the log of a time-series? What does this do to the time-series?
 * Because the coefficients obtained then directly give the respective elasticities, instead of having to take partial derivatives; i.e. the log-log functional form, where both the dependent and independent variables are log-transformed, is very convenient because a coefficient reads as the % change in y for a 1% change in x.
 * To normalize a skewed series, since e.g. OLS regression assumes that the errors, as estimated by the residuals, are normally distributed.
 * To linearize: it changes the dependence of Y on the Xs so that curved relationships look linear (i.e. it changes the functional form).
 * Substantively, sometimes the meaning of a change in a variable is more multiplicative than additive. For example, income. If you make $20,000 a year, a $5,000 raise is huge. If you make $200,000 a year, it is small. Taking logs reflects this: the difference between log(20,000) and log(25,000) is about 0.22, while the difference between log(200,000) and log(205,000) is only about 0.02.
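The elasticity point can be illustrated numerically; a sketch with invented data generated as y = A·x^b, so that the slope of the log-log regression recovers the elasticity b exactly:

```python
import numpy as np

x = np.linspace(1.0, 100.0, 50)
b = 0.7                       # true elasticity (invented)
y = 3.0 * x ** b              # y = A * x^b, noise omitted for clarity

# OLS of log(y) on log(x): the slope is the elasticity d log y / d log x
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(round(slope, 4))        # recovers b = 0.7
```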

VAR
 * A vector autoregression represents each variable as a function of the lagged values of all the variables in the system.
 * In vector autoregressions with no restrictions all variables are equally endogenous, i.e. determined interdependently via time lags. In SVAR models theoretical restrictions are imposed in order to permit the unambiguous identification of impulses and propagation mechanisms
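A reduced-form VAR(1) in two variables, x_t = A x_{t-1} + e_t, can be sketched as follows (the coefficient matrix A is invented, chosen with eigenvalues inside the unit circle so the system is stable):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2],     # each equation depends on the lags of BOTH
              [0.1, 0.4]])    # variables - that is what makes it a VAR
T = 200
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(size=2)   # lagged values + shock

# stability check: all eigenvalues of A lie inside the unit circle
print(np.abs(np.linalg.eigvals(A)).max() < 1)
```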

Time-series

 * Any random time-series variable has a data-generating process (e.g. a random walk or a martingale), which can be described in a univariate manner using only that time-series itself - say, the mean of the series plus some stochastic process.
 * A stochastic process is stationary when its unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as the mean and variance also do not change over time.
 * The most common cause of violation of stationarity is a trend in the mean, which can be due either to the presence of a unit root or of a deterministic trend. In the case of a unit root, stochastic shocks have permanent effects, and the process is not mean-reverting. In the latter case of a deterministic trend, the process is called a trend stationary process, and stochastic shocks have only transitory effects after which the variable tends toward a deterministically evolving (non-constant) mean.
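The unit-root vs deterministic-trend distinction can be simulated; a sketch with invented numbers: the random walk accumulates its shocks permanently, while the trend-stationary process only deviates transitorily from its trend line.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
shocks = rng.normal(size=T)

random_walk = np.cumsum(shocks)        # unit root: every shock is permanent
trend = 0.05 * np.arange(T)
trend_stationary = trend + shocks      # shocks are transitory deviations

# deviations from the trend stay bounded for the trend-stationary series,
# while the random walk wanders arbitrarily far from any fixed line
print(np.std(trend_stationary - trend))   # roughly the shock std
print(np.std(random_walk))                # much larger, grows with T
```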
 * If there exists a stationary linear combination of non-stationary random variables, the variables combined are said to be cointegrated
 * Cointegration
 * Helps identify the degree to which two variables are sensitive to the same average price over a specific period of time
 * Does not reflect whether the pairs would move in the same or opposite direction, but can tell you whether the distance between them remains the same over time
 * In the short term the movements of these variables may be unrelated; in the longer term, however, the variables may track a common average value.
 * Identifies variables that would not drift too far away from each other in the longer term and would revert to a mean distance between them
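The cointegration idea can be sketched numerically (invented series): x is a random walk, y = 2x + stationary noise; both are non-stationary, but the linear combination y - 2x is stationary, so the two series never drift far apart.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
x = np.cumsum(rng.normal(size=T))     # non-stationary random walk
y = 2.0 * x + rng.normal(size=T)      # shares x's stochastic trend

spread = y - 2.0 * x                  # the stationary linear combination
print(np.std(spread))                 # small and stable over time
print(np.std(x))                      # large, grows with the sample
```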
 * Time-series data is often regarded as generated by a data-generating process such as a random walk with drift.
 * Unique data ordering (linearly ordered)
 * Three types of time-series variables: rates/ratios, flows, and stocks, which have different timewise features - stocks are highly autocorrelated/path-dependent, while rates bear little relation to their own past.
 * Most time-series variables are interdependent and do not generate themselves (i.e. they are not simply exogenously given).
 * In-sample vs out-of-sample: suppose a sample of n = 10; divide it into two parts (a training set and a validation set) - e.g. the first 7 data points for estimating the model parameters and the next 3 data points for testing model performance. Using the fitted model, predictions for the first 7 data points are called the in-sample forecast, and predictions for the last 3 data points the out-of-sample forecast. This is the same idea as splitting the data into a training set and a validation set. Put differently: the difference between the y-predicted and y-observed in your sample is the in-sample loss, while, once you have your model and add new data (without adjusting the model), the difference between the new y-predicted and new y-observed is the out-of-sample loss.
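The n = 10 split described above can be sketched with invented data and a straight-line fit via `np.polyfit`:

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(10, dtype=float)
y = 1.5 * t + rng.normal(scale=0.1, size=10)   # invented linear series

train_t, train_y = t[:7], y[:7]                # first 7 points: estimation
test_t, test_y = t[7:], y[7:]                  # last 3 points: validation

coef = np.polyfit(train_t, train_y, 1)         # fit on the training set only
in_sample_loss = np.mean((np.polyval(coef, train_t) - train_y) ** 2)
out_of_sample_loss = np.mean((np.polyval(coef, test_t) - test_y) ** 2)
print(in_sample_loss, out_of_sample_loss)
```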
 * The target function y = g(x1, ..., z1) is unknown because it includes unknown components and non-observables (i.e., z). So a theoretical structural model with parameters is postulated, y = g(x1, ..., κ), where the target variable is known and we try to estimate the parameters. This requires the assumptions that the omitted variables don't matter for the structural parameter estimates and that we've captured the correct functional form (which we come up with based on theory). The whole point of conventional econometrics is: how do we consistently estimate our κ, rather than how close is our postulated g(·) to the unknown, true g(·).
 * Bias-variance decomposition: expected error splits into the variance of the estimator (seen in the sample residuals) plus the (squared) bias of the estimator.
 * Over- and underfitting of the model correspond to the bias-variance trade-off of your model estimator. Underfit = too much bias but low variance; overfit = too much variance but low bias.
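The over/underfitting trade-off can be illustrated with polynomial degree (data invented from a quadratic truth): a degree-0 fit underfits (high bias), a degree-9 fit through 10 points interpolates and overfits (high variance), visible by comparing in-sample error against error on a fresh draw.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 10)
truth = lambda u: 2 * u ** 2                 # invented true relationship
y = truth(t) + rng.normal(scale=0.05, size=10)
new_y = truth(t) + rng.normal(scale=0.05, size=10)   # fresh draw, same t

for degree in (0, 2, 9):
    coef = np.polyfit(t, y, degree)
    fit = np.polyval(coef, t)
    in_sample = np.mean((fit - y) ** 2)      # shrinks as degree grows
    out_sample = np.mean((fit - new_y) ** 2) # stops improving / worsens
    print(degree, in_sample, out_sample)
```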