Why is a random forest better than a regression model at catching weak alpha signals?
A factor may be useful only under specific conditions and behave like random noise otherwise. Its alpha is high during a short period, but that alpha is diluted by the long stretches in which the factor carries no information, so it looks insignificant on average. It is also possible that some signals are good at predicting catastrophes but of little use in a quiet environment.
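A small simulation can make the dilution concrete. The sketch below is purely illustrative: the 5% event frequency, the event-day slope of 0.5, the Gaussian noise, and the helper names `pearson` and `simulate` are all assumptions, not figures from any study. It measures the correlation between the signal and the realized return (an information coefficient) on all days, on event days only, and on quiet days only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation, used here as a simple information coefficient (IC)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_days=20000, event_prob=0.05, beta=0.5, seed=0):
    """Hypothetical data: the signal moves returns only on event days."""
    rng = random.Random(seed)
    signal, ret, is_event = [], [], []
    for _ in range(n_days):
        s = rng.gauss(0.0, 1.0)
        e = rng.random() < event_prob           # assumed event frequency
        r = (beta * s if e else 0.0) + rng.gauss(0.0, 1.0)
        signal.append(s)
        ret.append(r)
        is_event.append(e)
    return signal, ret, is_event


signal, ret, is_event = simulate()

# IC over the whole sample ("all-weather") versus IC inside each regime.
ic_all = pearson(signal, ret)
ic_event = pearson(
    [s for s, e in zip(signal, is_event) if e],
    [r for r, e in zip(ret, is_event) if e],
)
ic_quiet = pearson(
    [s for s, e in zip(signal, is_event) if not e],
    [r for r, e in zip(ret, is_event) if not e],
)

print(f"IC, all days:   {ic_all:.3f}")
print(f"IC, event days: {ic_event:.3f}")
print(f"IC, quiet days: {ic_quiet:.3f}")
```

Under these assumptions the event-day correlation should come out far larger than the all-days figure, because the rare informative days are averaged together with many uninformative ones.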
In the linear regression approach, we use the information ratio to measure a signal's performance across all market conditions ("all-weather"). That is unfair to the weak signals described above. For example, two well-known signals come from the shape of the volatility surface: the volatility surface skew and the spread between put and call options. Both result from an imbalance of supply and demand. The paper “Option Prices Leading Equity Prices: Do Option Traders Have an Information Advantage?” shows that these signals have significant predictive ability around information events, such as earnings announcements, but insignificant predictive ability in ordinary quiet periods. Because an OTM put option is regarded as a low-cost way to short a stock, these two measures are good at predicting big drops.
When we build decision trees, we implicitly perform factor timing: each split makes the use of one factor conditional on the values of the others, and a random forest prediction model aggregates many such trees. The randomized tree-building process explores many combinations of the conditions under which a signal is switched on or off. That is why we think a random forest is better than a regression model for very weak alpha signals.
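The contrast can be sketched on toy data with scikit-learn. Everything here is an assumption made for illustration: the data-generating process (a signal that matters only when an event flag is on), the availability of that flag as a feature, and the hyperparameters. It fits a linear regression and a small random forest on the same two features, then compares the out-of-sample correlation between forecast and realized return and the average forecast size on event days versus quiet days.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)


def make_data(n, event_prob=0.05, beta=0.5):
    """Hypothetical data: the signal predicts returns only when the event flag is on."""
    signal = rng.normal(size=n)
    event = (rng.random(n) < event_prob).astype(float)
    ret = beta * signal * event + rng.normal(size=n)
    return np.column_stack([signal, event]), ret


X_train, y_train = make_data(20_000)
X_test, y_test = make_data(20_000)

# Linear model: one fixed slope per feature, applied in every regime.
linear = LinearRegression().fit(X_train, y_train)

# Forest: shallow trees that can split on the event flag and then on the signal.
forest = RandomForestRegressor(
    n_estimators=100, max_depth=5, min_samples_leaf=50, random_state=0
).fit(X_train, y_train)

pred_linear = linear.predict(X_test)
pred_forest = forest.predict(X_test)


def ic(pred, realized):
    """Out-of-sample correlation between forecast and realized return."""
    return float(np.corrcoef(pred, realized)[0, 1])


ic_linear = ic(pred_linear, y_test)
ic_forest = ic(pred_forest, y_test)

# Average forecast magnitude inside and outside events shows the "timing".
in_event = X_test[:, 1] == 1.0
size_forest_event = float(np.mean(np.abs(pred_forest[in_event])))
size_forest_quiet = float(np.mean(np.abs(pred_forest[~in_event])))
size_linear_event = float(np.mean(np.abs(pred_linear[in_event])))
size_linear_quiet = float(np.mean(np.abs(pred_linear[~in_event])))

print(f"IC linear: {ic_linear:.3f}   IC forest: {ic_forest:.3f}")
print(f"forest |forecast| event: {size_forest_event:.3f}  quiet: {size_forest_quiet:.3f}")
print(f"linear |forecast| event: {size_linear_event:.3f}  quiet: {size_linear_quiet:.3f}")
```

In this toy setup the forest's forecasts should stay close to zero on quiet days and grow noticeably on event days, while the linear model applies one small slope everywhere. On real data the outcome depends on whether a usable conditioning variable is actually among the features.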