**Predicting ETF Returns using Information in Option Volume and Implied Volatility**

Author: David Popescu

Student Number: 12273716

University of Amsterdam, Amsterdam Business School

MSc Finance, Quantitative Finance Track

Master Thesis

Supervisor: Dr. Xiao Xiao

Date: 03/07/2022

**Statement of Originality**

This document is written by student David Popescu who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

**Acknowledgements**

I would like to express my gratitude to Dr. Xiao Xiao for their unmatched supervision. I am especially grateful to my parents for supporting my education and for helping me become the man I am today. I sincerely enjoyed conducting this research and I hope that the reader will find my paper enjoyable as well.

**Abstract**

The literature on ETF return prediction using option information is still in its early stages. This paper analyzes whether measures of option volume and option implied volatility can predict future returns of ETFs in the U.S. market. My methodological approach is twofold, consisting of portfolio analysis and cross-sectional regressions with controls, in order to evaluate the predictive potential from multiple points of view. The results indicate that one of the option volume measures, the options-to-shares ratio, is negatively linked to ETF returns at 5 and 10 days in the future, and that the choice of measure is of paramount importance. Moreover, I find that option implied volatility measures can strongly and positively predict next-day ETF returns. However, the implied volatility measures can negatively predict ETF returns at 10 days in the future when using options with expirations of 91 days.

*Keywords: Exchange-traded funds (ETFs), ETF returns, option volume, implied volatility, return predictability*

**Table of Contents**

1. Introduction

1.1 Motivation and Research Question

1.2 Brief Methodology and Results

1.3 Contribution to the Literature

2. Literature Review

2.1 Understanding ETFs

2.2 Advantages of Options

2.3 The Leading Market Debate

2.4 Option Volume

2.5 Implied Volatility

2.6 Option Volume and Implied Volatility Together

2.7 ETFs and Options

2.8 Hypotheses Formulation

3. Methodology

3.1 Independent Variables Construction

3.1.1 Put-Call Ratio

3.1.2 Options-to-Shares Ratio

3.1.3 IV Spread

3.1.4 Spread in Implied Volatility Innovations

3.2 Control variables

3.2.1 Beta

3.2.2 Size

3.2.3 Illiquidity

3.2.4 Short-term Reversal

3.2.5 Momentum

3.2.6 Realized Volatility

3.2.7 Realized-Implied Volatility Spread

3.2.8 Call/Put Open Interest

3.2.9 Bid-Ask Spread

3.3 Main Analysis Methodology

3.4 Robustness Analyses Methodology

4. Data

4.1 Descriptive Statistics

5. Results and Discussion

5.1 Main Analysis

5.1.1 Single Sorts

5.1.2 Double-Independent Sorts

5.1.3 Predictability over Time

5.1.4 Fama-MacBeth Regressions

5.1.5 Fama-MacBeth Regressions over Time

5.2 Robustness Checks

5.2.1 Continuous Sorting Procedure

5.2.2 Predictability using the Options-to-Share Ratio

5.2.3 Predictability using the Spread in Implied Volatility Innovations

5.2.4 Predictability using Options with Expirations of 91 Days

5.3 Discussion

6. Conclusion

Reference List

Appendix A

Appendix B

Appendix C

Appendix D

Appendix E

Appendix F

Appendix G

**1. Introduction**

**1.1 Motivation and Research Question**

How does one predict returns? This question has long been debated at all levels of the finance field. The motivation behind the debate is simple and stems from investors’ desire for high returns with minimal risk, although the answer is far from straightforward. Accordingly, the academic literature has continuously investigated the cross-section of future returns in the hope of finding valid factors and predictors. Harvey et al. (2016) provide a comprehensive study in which they show that more than 316 factors and predictors have been identified by the finance literature as related to the cross-section of returns. Nevertheless, the predictive power of some of them wore off after publication as investors became aware of their potential (McLean & Pontiff, 2016).

In this context, my study turns to the options market to identify valid predictors of future returns. Since 2012, the global derivatives market has been on a steadily rising trend (Fia, 2021). Due to the Covid-19 crisis, 2020 marked record figures for the options market, with more than 21 billion contracts traded worldwide as reported by the Futures Industry Association (Fia, 2021). Given this rise in popularity, I choose to study the informational content of both option volume measures and option implied volatility measures with regard to the cross-section of future returns.

Chan et al. (2002) argue that in a perfect market options should not contain any new information about the underlying securities from which they derive their prices. Nevertheless, the authors add that markets are not perfect in reality and that discrepancies between the two markets exist, which leaves room for exploitable trading opportunities for informed investors. First, Cremers and Weinbaum (2010) show that deviations from put-call parity, which ties option prices to stock prices, do occur and can be represented by the spread between the implied volatilities of call and put options. Accordingly, past literature has shown that the implied volatility spread positively predicts future returns (Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010). Second, measures of option volume can also be indicative of future returns if the choice between call and put options matches the positive or negative nature of the information about the underlying securities (Pan & Poteshman, 2006). Here, previous literature found that option volume measures negatively predict returns (Johnson & So, 2012; Pan & Poteshman, 2006).

However, in spite of the recent growing importance of ETFs in the passive investment market (Ben-David et al., 2017), the literature on ETF return prediction remains underdeveloped, with few notable studies. An exception is the work of Brown et al. (2021), which evaluates the predictive ability of ETF flows for next month’s ETF returns, but their paper is not related to options. Other studies related to option information and future returns focus only on the S&P 500 (Atilgan et al., 2015; Bali & Hovakimian, 2009; Chen & Liu, 2020; Han & Li, 2021) or on a small number of indices (Pan & Poteshman, 2006). Notably, all papers that use option information for prediction base their findings on one or very few ETFs, so their conclusions cannot be generalized to the ETF market as a whole. As a result, the intersection of the ETF return prediction literature and the literature on prediction using the information in option volume and implied volatility measures leaves an untapped research opportunity. Through this paper I intend to fill that gap and answer the following research question:

**Can returns of ETFs be predicted using the information in option implied volatilities and option trading volumes of ETFs?**

My main hypothesis is that measures of option volume of ETFs can significantly and negatively predict future ETF returns and that measures of option implied volatility of ETFs can significantly and positively predict ETF returns.

**1.2 Brief Methodology and Results**

In my analysis I will look at the U.S. ETF universe and use those ETFs with options available. For each ETF, I will examine the daily return and option data in the period January 2013 to December 2020. The main independent variables that I employ are the put-call ratio for option volume and the implied volatility spread for option implied volatility. I will use two methods to test my hypothesis. The first is a portfolio sorting analysis in which I will create various trading strategies based on portfolios of ETFs sorted on the two measures, using both single and double sorts. The second consists of Fama-MacBeth (1973) cross-sectional regressions of future ETF returns on the two independent variables accompanied by various control variables. Both methods use ETF returns over 1 day, as well as over longer horizons of 2, 5, and 10 days in the future.
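The two-step logic of a Fama-MacBeth regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation used in this thesis: the function name `fama_macbeth`, the array layout and the plain (non-Newey-West) standard errors are all assumptions made for the example.

```python
import numpy as np

def fama_macbeth(returns, predictors):
    """Illustrative two-step Fama-MacBeth estimate (hypothetical helper).

    returns:    array of shape (T, N), future returns of N ETFs on T dates
    predictors: array of shape (T, N, K), K predictors/controls per ETF and date
    Returns (mean_coefs, t_stats), each of length K + 1 (intercept first).
    """
    returns = np.asarray(returns, dtype=float)
    predictors = np.asarray(predictors, dtype=float)
    T, N, K = predictors.shape
    coefs = np.empty((T, K + 1))
    for t in range(T):
        # Step 1: cross-sectional OLS of returns on predictors for date t
        X = np.column_stack([np.ones(N), predictors[t]])
        coefs[t] = np.linalg.lstsq(X, returns[t], rcond=None)[0]
    # Step 2: average the date-by-date coefficients; the t-statistic uses the
    # time-series standard error (assumption: no autocorrelation adjustment)
    mean_coefs = coefs.mean(axis=0)
    std_err = coefs.std(axis=0, ddof=1) / np.sqrt(T)
    return mean_coefs, mean_coefs / std_err
```

The first step runs one cross-sectional regression per date; the second step treats the resulting coefficient series as a sample and tests whether its mean differs from zero.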

The results of the main analysis indicate that the put-call ratio is not a significant predictor of future ETF returns in any of the methods employed. Conversely, the implied volatility spread has the potential to strongly and positively predict ETF returns over 1 day in the future. This predictability rests mostly on the Fama-MacBeth (1973) regression results, which show that the measure becomes significant in the presence of control variables.

In addition, I employ several robustness checks to inspect my results from different perspectives. For the first one, I utilize an alternative way of double sorting which is based on a continuous sorting process. The results are similar to the main analysis but more internally valid. The following two robustness checks use the options-to-shares ratio and the spread in implied volatility innovations as alternative measures for option volume and implied volatility, respectively. The former is found to negatively predict ETF returns, in contrast with the main analysis, while the latter yields results similar to the main analysis. Lastly, I use an alternative expiration of 91 days for the implied volatility spread. The results confirm the conjecture arising from the main analysis that the implied volatility measure can become a significant negative predictor at longer return horizons.

**1.3 Contribution to the Literature**

My research extends the analysis of the predictive potential of option volume measures to the case of ETF returns. Here, I use a version of Pan and Poteshman’s (2006) put-call ratio which uses unsigned option volume data. According to the original findings of the authors, the put-call ratio with signed option volume can negatively predict future returns. My research also adds to the strand of literature which investigates the options-to-shares volume ratio and finds that it can also negatively predict stock returns (Ge et al., 2016; Han et al., 2017; Johnson & So, 2012).

This research is closely related to the work of Cremers and Weinbaum (2010), which analyzes deviations from put-call parity using the implied volatility spread. They further show that these deviations, as reflected by the implied volatility spread, are indicative of future returns, a finding supported by similar studies in various settings (Atilgan et al., 2015; Bali & Hovakimian, 2009; Han & Li, 2021; Jones et al., 2018; Lin et al., 2013). My study therefore complements this literature by investigating whether the predictive potential of the implied volatility spread is present in the case of ETF returns as well. In addition, I complement the research of An et al. (2014) by testing their spread in implied volatility innovations in the setting described above.

At the ETF level, this research augments the literature on ETF return prediction represented by Brown et al. (2021), although they use ETF flows as predictors. Likewise, I strengthen the dispersed literature that utilizes option measures as predictors for the returns of a few very popular ETFs or indices (Atilgan et al., 2015; Bali & Hovakimian, 2009; Han & Li, 2021; Pan & Poteshman, 2006).

Using the ETF market instead of the stock market comes with an advantage. A large body of literature (Amin & Lee, 1997; Atilgan et al., 2015; Cao et al., 2005; Chan et al., 2015; Han & Li, 2021; Jin et al., 2012; Jones et al., 2018; Lei et al., 2020; Lin et al., 2013) states that the options market becomes more important and the return predictors more relevant during informationally rich periods such as corporate events and announcements. Although convincing, this consideration is bypassed in this study since my sample includes equity ETFs that contain hundreds of stocks, all with different event dates, as well as non-equity ETFs. I therefore argue that accounting for these events is neither feasible nor necessary in this case, since their impact would be diversified away. Consequently, these events should not play a significant role in the results of this paper.

Finally, this research contributes to the debate about which market leads in impounding information into prices. One side of the debate holds that the stock market leads the options market, as theory would suggest (Chan et al., 1993; Stephan & Whaley, 1990), while the other side advocates for the informational lead of the options market (Anthony, 1988; Chakravarty et al., 2004; Cremers & Weinbaum, 2010; Easley et al., 1998; Jennings & Starks, 1986; Jin et al., 2012; Johnson & So, 2012; Manaster & Rendleman, 1982; Pan & Poteshman, 2006). Based on the results, I will assess which side of this empirical debate my research supports.

The remainder of the paper is organized as follows. Section 2 presents the findings of previous literature regarding ETFs, option volume and option implied volatility. Section 3 lays out the construction of the main variables and control variables, as well as the empirical methodology that I am going to use. Section 4 presents the data and descriptive statistics. Section 5 documents the results of the main analysis and the four robustness checks. Finally, Section 6 concludes, acknowledges limitations and offers suggestions for further research.

**2. Literature Review**

In this section I present the findings of the relevant research surrounding ETFs, option volume and option implied volatility. I begin with a brief description of the main entities that I investigate, the exchange-traded funds, as well as listing some advantages of options. Then I present the leading market debate, followed by findings of the previous literature related to the return predictability potential of option volume and implied volatility measures. I close the section by formulating my hypotheses for this paper.

**2.1 Understanding ETFs**

Since around 1995, exchange-traded funds, or ETFs, have gained popularity among investors looking for passive investment opportunities due to their low transaction costs and high intraday liquidity (Ben-David et al., 2017). The ETF market as a whole has been growing rapidly; in the U.S., by 2017 it accounted for more than 10% of the market capitalization of the stock exchanges, almost a third of aggregate daily trading volume and around 20% of aggregate short interest, as shown in a review by Ben-David et al. (2017). According to them, ETFs have gained a solid market share in the passive industry, which was previously dominated by closed-end funds, index funds and index mutual funds.

First and foremost, before diving into the relation between ETFs and options, a proper description of how ETFs work is needed. Ben-David et al. (2017) provide a comprehensive analysis of the literature on ETFs and their mechanics. At the heart of an ETF lies the mission to replicate an index. Ben-David et al. (2017) distinguish between two types of replication. On the one hand, physical replication entails holding all or a majority of the securities of a benchmark index in the bundled form of a basket, with weighting schemes similar to those of the index. On the other hand, synthetic replication involves tracking the performance of an index through derivative contracts. ETFs can use one or both of these methods. As a result, ETFs issue shares that aim to mimic the underlying securities, and these shares can be bought or sold short on the exchanges. Furthermore, arbitrage opportunities can arise when the ETF market price differs from the net asset value of the underlying securities basket. This can happen due to non-synchronous trading of the ETFs and their underlying securities, and becomes profitable once the divergence exceeds the transaction costs, according to Ben-David et al. (2017). A key aspect in the functioning of an ETF is therefore keeping its price aligned with the index or the basket of underlying securities that it tracks. This is done by arbitrage activity. Here again, Ben-David et al. (2017) identify two categories. On the primary market, designated intermediaries named authorized participants create or redeem shares of an ETF in order to correct the mispricing between the ETF and its underlying assets. More specifically, these activities represent the classical arbitrage process in which one buys the cheap security and sells the expensive one. On the secondary market, market makers and high-frequency funds deplete the arbitrage opportunity through price pressure stemming from long and short positions.
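The profitability condition for primary-market arbitrage described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function names and the single proportional `cost` parameter are hypothetical, and real creation/redemption involves fees, creation-unit sizes and execution risk that are ignored here.

```python
def etf_premium(price, nav):
    """Relative deviation of the ETF market price from its net asset value."""
    return (price - nav) / nav

def arbitrage_signal(price, nav, cost):
    """Return 'create', 'redeem' or 'none' (hypothetical helper).

    cost: assumed round-trip transaction cost as a fraction of NAV.
    """
    premium = etf_premium(price, nav)
    if premium > cost:
        # ETF is expensive: buy the basket, create ETF shares, sell them
        return "create"
    if premium < -cost:
        # ETF is cheap: buy ETF shares, redeem them, sell the basket
        return "redeem"
    # Divergence does not cover transaction costs
    return "none"
```

For example, with a price of 101, a NAV of 100 and an assumed cost of 0.5%, the 1% premium exceeds the cost and the sketch signals share creation.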

Second, ETF trading offers several advantages to the market. Most notably, Ben-David et al. (2017) argue that ETFs bring additional liquidity to the market, on top of the liquidity of the underlying assets. Hence, the trading of ETFs reveals more information, enhancing the price discovery process (Glosten et al., 2021). However, the authors also reveal that noise trading at the ETF level could result in mispricing due to, for example, behavioral reasons such as under- or overreaction. Moreover, recent research has shown that this enhanced liquidity can also take the form of ETFs being a way to bypass short-sale constraints and costs in the underlying assets (Li & Zhu, 2016) or even market-wide short-sale bans (Karmaziene & Sokolovski, 2022). This should not be surprising since ETFs lead the underlying assets during illiquidity crises (Marshall et al., 2015). Several studies (Lettau & Madhavan, 2018; Madhavan & Sobczyk, 2016) report that ETFs reinforce the efficiency of the market since they provide investors with a practical solution for trading on directional views, thus yielding new information that is not yet integrated into the underlying assets’ prices. Equally important, Wermers and Xue (2015) show that informed traders prefer using ETFs for trading on their convictions, and this becomes evident at times when ETFs lead the underlying assets. This might provide an additional explanation for the informed-investor story in the options market presented later in this paper.

**2.2 Advantages of Options**

Why do investors use the options market? The short answer is that options diversify the ways in which one can trade on a conviction. From a firm’s perspective, having traded options available enables information to be impounded faster into prices (Amin & Lee, 1997; Jennings & Starks, 1986). Likewise, Amin and Lee (1997) notice that private information in the options market influences stock prices both during earnings announcement periods and in normal periods. Finally, Mayhew and Mihov (2004) observe that options are listed for securities with high market capitalization, high trading volume and high volatility, the last of which is strongly positively correlated with the value of an option.

For traders, the advantages of trading options are numerous. First of all, the options literature widely agrees that embedded leverage is the most attractive feature of options (An et al., 2014; Bhattacharya, 1987; Black, 1975; Cremers & Weinbaum, 2010; Diamond & Verrecchia, 1987; Ge et al., 2016; Manaster & Rendleman, 1982). This embedded leverage is more accessible than leverage in the stock market, where high margin requirements make it harder to obtain (Manaster & Rendleman, 1982). As a result, an investor can potentially profit more from their convictions than in the stock market, which makes the options market an appealing trading venue for informed investors (Bhattacharya, 1987; Black, 1975; Manaster & Rendleman, 1982). Secondly, options may offer a lower-cost solution compared to the stock market when brokers’ fees are lower, liquidity is higher and when options are under- or overpriced (Amin & Lee, 1997; Bhattacharya, 1987; Manaster & Rendleman, 1982). Thirdly, the limited loss potential of buying options creates an embedded hedge, although it comes at a premium. Lastly, options are a viable alternative to short-selling when constraints such as an up-tick rule limit investors’ ability to profit from bear states in the traditional stock market (Bhattacharya, 1987; Black, 1975; Diamond & Verrecchia, 1987; Manaster & Rendleman, 1982). However, Bhattacharya (1987) and Ge et al. (2016) argue that many institutional investors are not mandated to use options, which hampers price discovery in the options market and renders the stock market more crowded than the options market. Nevertheless, the trading vehicle that brings the highest liquidity and the fewest frictions will be the dominant one in determining the value of the securities (Manaster & Rendleman, 1982), which makes the options market a potential contender for the more information-rich market. This brings me to the debate on the leading informational market, in which I will try to ascertain what side my study takes based on the evidence in the results section.

**2.3 The Leading Market Debate**

Since options derive their prices from the underlying asset prices, it is intuitive that in a perfect market the causality should move from the stock market to the options market as depicted by Easley et al. (1998). However, in reality the market is not perfect due to information asymmetry, leaving room for the options market to become an endogenous component in relation to the stock market. As a consequence, since the 1980s there has been an ongoing academic debate on establishing which market has a leading role in determining the stock prices.

On one side lies the strand of literature represented by Stephan and Whaley (1990) and Chan et al. (1993), which supports the view that the stock market leads the options market at the price and volume levels. However, Chan et al. (1993) show that this lead fades away when the mid-point of the bid-ask prices is used in place of the transaction prices.

Nevertheless, the other side of the debate, where the options market leads the stock market, finds much more empirical support (Anthony, 1988; Manaster & Rendleman, 1982). Additionally, the lead of the options market has been shown to last several days (Cremers & Weinbaum, 2010).

A vast number of papers show that the options market leads the stock market because it is a preferred trading venue for informed investors (Chakravarty et al., 2004; Easley et al., 1998; Jennings & Starks, 1986; Jin et al., 2012; Johnson & So, 2012; Pan & Poteshman, 2006). Some studies argue that this preference originates from the embedded leverage of options (Jennings & Starks, 1986; Jin et al., 2012; Johnson & So, 2012), while others consider that informed investors can access other sources of leverage and that their only intention is to disguise their trades (Amin & Lee, 1997; Muravyev et al., 2013). Related to my study, the lead of the options market over the stock market is also reflected in the predictive ability of implied volatility for stock returns (Cremers & Weinbaum, 2010; Lin et al., 2013). At the ETF level, Dong et al. (2021) find evidence of this lead when analyzing the Chinese CSI 300 ETF and its options. Bhattacharya (1987) validates the lead of the options market, but states that exploiting it becomes infeasible after transaction costs. Finally, Chakravarty et al. (2004) take a nonaligned position in this debate, showing that leading information originates in both markets. My own conviction is that the truth lies somewhere in the middle. Therefore, I intend through this study to take a neutral stance in this debate and challenge the two strands of literature by analyzing whether the information contained in option volume and implied volatility gives the options market a lead over ETF returns.

**2.4 Option Volume**

Turning to options, I now present the empirical evidence found so far in the literature on option volume. I begin by emphasizing that under the weak form of the efficient market hypothesis (Fama, 1970), technical analysis of volume adds no value as prices already reflect historical information. However, Pedersen (2015) offers a different picture of the market, calling it “efficiently inefficient” (p. 25): inefficient enough so that there are active trading opportunities and efficient enough so that these opportunities are exhaustible. In this setting, I claim that volume can convey information about future returns. This view is supported by Blume et al. (1994), who explain that the lag with which prices adjust to new information leaves enough space for technical analysts to profit from investigating volume. They further show that volume as a measure has descriptive content about the quality of the information in the market. Stock trading volume has been shown by past research to strongly predict future returns, although the findings are mixed about the sign of the relation (Blume et al., 1994; Brennan et al., 1998; Lee & Swaminathan, 2000; Lo & Wang, 2006). Owing to these arguments about traditional trading volume, I want to inquire whether the market is inefficient enough to allow option trading volume to reflect information about future ETF returns. Cao et al. (2005) bring supporting evidence that stock volume has no informational content while call option volume is positively related to future returns prior to a takeover event.

The literature on option volume comes with mixed findings on its potential for predicting returns. Using high-frequency data which was informationally categorized, Easley et al. (1998) found that signed option volume can predict future stock prices and that this phenomenon is due to stock market hedging with options. Above all, they document that stock returns are able to positively (negatively) predict call (put) option volume and not the other way around. In the same way, Chan et al. (2002) obtain puzzling results: using high-frequency data, they discover that net trade volume for stocks or options does predict stock prices, although the sign of the prediction contradicts what one would expect for puts and calls. However, when they shorten the time interval from 5 minutes to 100 seconds, the predictive ability of option net trade volume is no longer present. As a result, the relation between option trading volume and the price of the underlying security is not clear at the intraday level.

At the daily level, Pan and Poteshman (2006) develop a unique measure for option volume which they name the put-call ratio. It represents the ratio between the put volume and the total option volume. As this refers to one of the two main independent variables of this study, greater attention will be given to it. The main result of Pan and Poteshman (2006) indicates that a portfolio sorting strategy where one goes long on stocks with low put-call ratio and goes short on stocks with high put-call ratio generates a risk-adjusted return of 40 basis points for the following day, with a t-statistic of approximately 29. From an informational perspective, their results illustrate that informed investors choose to buy calls when there is good news and buy puts when there is bad news about the stock. In addition, the average put-call ratio in their sample is 0.3 indicating that, on average, the call volume is higher than the put volume. This implies that informed investors prefer the options market for trading more on positive news rather than on negative news. The put-call ratio remained a robust measure also when conducting the analysis using a cross-sectional regression approach. Furthermore, the authors reveal that the informational content of the put-call ratio lasted for three weeks, decreasing over time. This progressive fading of predictability suggests that the information in the put-call ratio is impounded into the stock price over this time.

A key point in Pan and Poteshman’s discussion is that their result does not reflect a market inefficiency, but rather private information being traded in the options market, since the coefficients do not change sign but only decrease in size over time.

Nevertheless, the put-call ratio may be a good predictor of future returns only when based on signed options data such as Pan and Poteshman (2006) used in their sample. They disclose that their options data is unique due to its classification into four types of trades and four types of investors, resulting in 16 categorical groups. Nonetheless, I will keep calling the main option volume variable the put-call ratio since it has the same structural form as that of Pan and Poteshman (2006), although their augmentations relating to signed option volume and different moneyness fall outside the scope of my study. As a result, the put-call ratio that I employ is a non-directional, unsigned option volume measure, and its results will be compared cautiously with those of Pan and Poteshman (2006), whose measure is the closest available counterpart.
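As an illustration of the unsigned measure and of the long-short sort described above, consider the following sketch. The function names, the use of five equal-weighted groups and the treatment of zero-volume days are assumptions made for the example; the construction actually used in this thesis is laid out in Section 3.

```python
import numpy as np

def put_call_ratio(put_volume, call_volume):
    """Unsigned put-call ratio: put volume over total option volume."""
    put_volume = np.asarray(put_volume, dtype=float)
    call_volume = np.asarray(call_volume, dtype=float)
    total = put_volume + call_volume
    # The ratio is undefined (NaN) when no options trade at all
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, put_volume / safe_total, np.nan)

def long_short_return(signal, next_returns, n_groups=5):
    """Return of going long the lowest-signal group and short the highest.

    Assumes equal weighting within groups (hypothetical choice).
    """
    signal = np.asarray(signal, dtype=float)
    next_returns = np.asarray(next_returns, dtype=float)
    keep = ~np.isnan(signal)
    signal, next_returns = signal[keep], next_returns[keep]
    # Sort assets by signal and split them into n_groups portfolios
    groups = np.array_split(np.argsort(signal), n_groups)
    return next_returns[groups[0]].mean() - next_returns[groups[-1]].mean()
```

A put volume of 30 contracts against a call volume of 70 gives a ratio of 0.3, which matches the sample average reported by Pan and Poteshman (2006).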

A rather new option volume measure is the options-to-shares ratio, or O/S ratio, coined in the study of Roll et al. (2010), where they investigate its relation with returns around earnings announcements. Moreover, the O/S ratio has been shown to negatively predict stock returns, suggesting that a high O/S ratio is a good indicator of the arrival of negative news (Ge et al., 2016; Johnson & So, 2012). This is highlighted by the long-short portfolio analysis of Johnson and So (2012), which sorts returns into deciles based on the O/S ratio and earns a 19.3% return per annum. Similar findings are reported by Han et al. (2017), although for stocks with sloped volatility smiles, which they interpret as an indication of directional trading. Lastly, the negative return predictive ability of the O/S ratio is also validated by means of Fama-MacBeth (1973) cross-sectional regressions (Ge et al., 2016; Han et al., 2017; Johnson & So, 2012).
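In its simplest form, the O/S ratio for security $i$ on day $t$ can be written as below. The symbols are introduced here for illustration only, and whether option volume is counted in contracts or converted to share equivalents is an assumption that differs across studies.

$$
O/S_{i,t} = \frac{\mathit{OptionVolume}_{i,t}}{\mathit{ShareVolume}_{i,t}}
$$

Under this definition, a high value means that trading in the options is heavy relative to trading in the underlying shares.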

**2.5 Implied Volatility**

My objective in this paper is to investigate the return predictability of information stemming from the options market. I aim to use an independent variable other than the option price, since option prices are determined by stock prices through the Black and Scholes (1973) formula. Therefore, I will use implied volatility, since it is specific to the options market and does not suffer from reverse causality with stock prices. As Bali and Hovakimian (2009) show that implied volatility alone cannot predict future returns, other variations of this measure will be employed.

The implied volatility spread, or IV spread, has been accepted in past literature as a valid predictor of future returns. Most notably, Cremers and Weinbaum (2010) show that deviations from put-call parity are equivalent to deviations of the IV spread. In this light, they show that stocks with a high IV spread outperform stocks with a low IV spread by 50 basis points in risk-adjusted returns on a weekly basis. Moreover, this result stems both from buying high IV spread stocks, which earn positive and significant subsequent returns, and from selling short low IV spread stocks, which earn negative and significant future returns. Other studies have also found that the IV spread positively predicts future stock returns at daily, weekly (Atilgan et al., 2015; Lin et al., 2013), monthly (Bali & Hovakimian, 2009) and quarterly horizons (Jones et al., 2018). All of these studies show that the results are robust to cross-sectional regression analyses using controls such as size, illiquidity and bid-ask spread. A surprising result is that the significance of the IV spread in the regression analysis increases when controls are included (Jones et al., 2018).

As for the persistence of the IV spread's predictive power, Cremers and Weinbaum (2010) find that it dies out over a couple of days, once the information is completely impounded into the stock price. In contrast, Jones et al. (2018) show that this predictability lasts from one to six months. According to Jin et al. (2012), the IV spread has the potential to reflect the direction of the buying pressure in the options market which is a result of the accumulated information of all investors.

At the same time, Cremers and Weinbaum (2010) report that trading based on IV spread has become more challenging over time, which supports the claim of McLean and Pontiff (2016) that stock market irregularities might fade in time as traders learn about them. Nevertheless, the return predictability of the IV spread is still positive and significant as shown in a recent study (Han & Li, 2021). In sum, all these arguments around the IV spread further strengthen the idea that the options market leads the stock market (Atilgan et al., 2015; Lin et al., 2013) and that the IV spread can be an indicator of informed investors preferring the options market to trade (Cremers & Weinbaum, 2010; Han & Li, 2021).

Other measures of implied volatility are the implied volatility innovations of An et al. (2014). They represent the change in implied volatilities for puts and calls separately. The authors find that securities with options that have had positive changes in call (put) implied volatilities will have high (low) returns over the next month. Their analysis is based on sorting stock returns into decile portfolios based on the call and/or put implied volatility innovations. Moreover, the predictability of returns using the implied volatility innovations is robust to the Fama-MacBeth (1973) regression using controls and has a decay timespan of more than three months according to An et al. (2014).

An et al. (2014) employ another variable in the form of the difference between the two implied volatility innovations to capture their joint effect. I will call it the spread in implied volatility innovations. An et al. (2014) reason that under positive news investors buy call options while refraining from buying put options, thereby increasing the call implied volatility innovation and decreasing the put implied volatility innovation. Taking the difference between the call and put innovations leads to a higher spread in implied volatility innovations which signals positive future returns, according to the authors. The rationale is reversed in the case of negative news. Therefore, the spread in implied volatility innovations positively predicts future returns (An et al., 2014), like the IV spread.

**2.6 Option Volume and Implied Volatility Together **

Option volume and implied volatility are rarely studied together in the literature. For example, An et al. (2014) observe that the predictability of the volatility innovations on stock returns is higher for stocks with high trading volume and option trading volumes. Another instance is provided by Cremers and Weinbaum (2010) who document a significant negative relation between the main independent variables of my study, namely, the IV spread and the put-call ratio. This is a logical result since good news would imply, firstly, that the volume of calls rises, leading to a decrease in the put-call ratio and, secondly, that the call implied volatility rises, causing the IV spread to rise as well. For bad news, the volume in puts rises and the put-call ratio rises as well, while the implied volatility in puts rises, leading to a decrease in the IV spread. However, Cremers and Weinbaum (2010) do not associate both of these measures with stock returns. As stated above, they note as well that the signed information in the put-call ratio used by Pan and Poteshman (2006) is not public, compared to the volatility spread which is publicly available. To this end, Zhou (2022) uses the publicly available O/S ratio as an option volume measure and shows that it is a significant negative cross-sectional predictor for future returns while controlling for the implied volatility innovations of An et al. (2014). This shows that publicly available option volume and implied volatility can be effective together in predicting future returns. Nonetheless, Zhou (2022) states that funds such as ETFs have been excluded from the sample, which leaves a gap in the literature that I will try to fill with this study. In the next subsection, I bring ETFs into the context of return prediction using option information and I provide the relevant literature on these topics.

**2.7 ETFs and Options **

Using the S&P 100 index options, Amin et al. (2004) found that stock returns are a significant predictor of option prices. For the same index, Canina and Figlewski (1993) found that implied volatility does not have any predictive content for the realized volatility. In a study of non-fundamental demand shocks, Brown et al. (2021) discovered that ETF flows negatively predict the next-month returns of ETFs with available options. On the other hand, the IV spread of the S&P 500 options has been shown to be a good indicator of jump risk which has a positive relation with the index returns (Bali & Hovakimian, 2009). Also using the S&P 500 index options, Atilgan et al. (2015) concluded that depending on the four reversed IV spread measures that they employ, an increase of 1% in a measure leads to a decrease in S&P 500 excess returns between 2.8% and 7.4% per year, thereby indicating a negative predictive relation of the reversed IV spreads and future index returns. This is not surprising since their IV spreads are reversed compared to my IV spread, meaning that they measure the differences between put and call implied volatilities instead of the opposite. This indicates that the return predictive ability should be positive in terms of my IV spread. In contrast, Han and Li (2021) found that the IV spread of the S&P 500 does not significantly predict future returns. A recent study also on the S&P 500 index has shown that implied volatility has more return predictive potential when analyzed for the ask prices instead of the bid prices (Chen & Liu, 2020).

Regarding option volume, Pan and Poteshman (2006) document that the put-call ratio is not a significant return predictor when analyzed in the options market of indices for four separate investor groups.

In sum, all these studies indicate that limited progress has been made in the scope of predictability of ETF returns using option information and that only a few very large ETFs have been considered. Therefore, this calls for a greater sample of various types of ETFs to be studied in order to give a diligent conclusion about a viable predictive analysis in this field.

**2.8 Hypotheses Formulation **

Lastly, the intersection of these three topics consisting of ETF returns, option volume and implied volatility remains an unexplored frontier in the literature around return prediction.

Therefore, through this study I aim to shed light on the predictive content of ETF option trading volumes and ETF option implied volatility for ETF returns. Firstly, in the case of option volume, positive news increases the call option volume and decreases the put option volume in anticipation that the market is going to be on a “bullish” trend and that the ETF returns are going to rise subsequently (Chan et al., 2002; Pan & Poteshman, 2006). Conversely, when there is negative news, the call option volume decreases while the put option volume increases, in anticipation of a “bearish” market which will cause the ETF returns to fall (Chan et al., 2002; Pan & Poteshman, 2006). This is in line with the negative relation between the independent variable for option volume, the put-call ratio, and future returns (Pan & Poteshman, 2006). As I use the O/S ratio for my robustness check on option volume, the rationale is somewhat similar: the O/S ratio increases (decreases) when the option volume increases (decreases) and the trading volume decreases (increases) with the arrival of bad (good) news (Ge et al., 2016; Han et al., 2017; Johnson & So, 2012). Once again, previous literature indicates that option volume in the form of the O/S ratio negatively predicts future returns (Ge et al., 2016; Han et al., 2017; Johnson & So, 2012). I therefore state my first set of hypotheses regarding the option volume measures:

*H*0a: The option trading volume measures do not predict ETF returns.

*H*1a: The option trading volume measures negatively predict ETF returns.

Secondly, for the implied volatility, positive news indicates a “bullish” market in the future, a fact that motivates traders to buy call options on ETFs (An et al., 2014; Bali & Hovakimian, 2009). This makes the calls overpriced and increases the call implied volatility along with the IV spread which dictates a higher ETF return the next day (An et al., 2014; Atilgan et al., 2015; Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010; Jones et al., 2018; Lin et al., 2013). At the same time, if negative news arrives, then this is indicative of a subsequent “bearish” market and, as a result, traders buy put options on the ETFs (An et al., 2014; Bali & Hovakimian, 2009). This makes the puts more expensive and therefore increases the put implied volatility and decreases the IV spread which will generate low ETF returns the next day (An et al., 2014; Atilgan et al., 2015; Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010; Jones et al., 2018; Lin et al., 2013). Both situations require that the investors possess a form of private information and that the market is slow to react (Cremers & Weinbaum, 2010). Correspondingly, both implied volatility variables of this study, the IV spread which is the main analysis variable and the spread of call and put volatility innovations which is the robustness variable, positively predict future returns (An et al., 2014; Atilgan et al., 2015; Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010; Jones et al., 2018; Lin et al., 2013). As a consequence, my second set of hypotheses relates to the implied volatility:

*H*0b: The option implied volatility measures do not predict ETF returns.

*H*1b: The option implied volatility measures positively predict ETF returns.

**3. Methodology **

This section presents the construction of the independent variables used for the main analysis and the robustness checks. Similarly, I document the construction of the control variables that I employ. Next, I lay down the methodological approach for the main analysis and for the alternative checks that I will use.

**3.1 Independent Variables Construction **
**3.1.1 Put-Call Ratio **

Firstly, as already stated, I will use the put-call ratio of Pan and Poteshman (2006) as the independent variable representing the option volume in my main analysis. The construction of the put-call ratio is straightforward: following Pan and Poteshman (2006), let P/Ci,t be the put-call ratio of the i^{th} ETF in the sample at day t. Then:

$$P/C_{i,t} = \frac{P_{i,t}}{P_{i,t} + C_{i,t}}, \tag{1}$$

where Pi,t and Ci,t represent the number of put and call option contracts traded by the market participants for ETF i at day t. Pan and Poteshman (2006) explain that when positive news arrives at day t for ETF i, investors will acquire calls, therefore increasing the volume of calls, Ci,t, while also decreasing the volume for puts, Pi,t, resulting in an overall decrease of the put-call ratio described above. Conversely, when negative news arrives at day t for ETF i, investors will acquire puts, thus increasing the volume for puts, Pi,t, and decreasing the volume for calls, Ci,t, resulting in an overall increase of the put-call ratio. According to Pan and Poteshman (2006), the put-call ratio is a dynamic measure as it has the potential to greatly change values from one day to another based on the information and convictions of the investors.
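As an illustration of equation (1), the following minimal Python sketch computes the put-call ratio for a single ETF-day. The function name, the volumes and the treatment of days without any option trading (returned as missing) are assumptions made for this example and are not taken from Pan and Poteshman (2006).

```python
def put_call_ratio(put_volume: float, call_volume: float) -> float:
    """Equation (1): put volume divided by total (put + call) option volume."""
    total = put_volume + call_volume
    if total == 0:
        # Assumption: a day without option trading is treated as missing.
        return float("nan")
    return put_volume / total


# Hypothetical contract volumes for one ETF on three different days.
neutral = put_call_ratio(500, 500)     # balanced trading
good_news = put_call_ratio(300, 900)   # call volume rises, put volume falls
bad_news = put_call_ratio(900, 300)    # put volume rises, call volume falls

print(neutral, good_news, bad_news)
```

The ratio is bounded between 0 and 1, and it moves down under the good-news scenario and up under the bad-news scenario described above.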

**3.1.2 Options-to-Shares Ratio **

Secondly, I will use the options-to-shares ratio, or O/S ratio, as the independent variable for option volume in one of the robustness checks in order to confirm that the results that I find for the put-call ratio are consistent for another measure of option volume. The reason that I do not use the O/S ratio as my main analysis volume variable is that it also includes total stock volume besides option volume as compared to the put-call ratio which consists only of the volumes of put and call options. Therefore, compared to the put-call ratio which shows the prevalence of positive or negative news in the options market, the O/S ratio should indicate an increased options market activity when “bearish” news arrives (Johnson & So, 2012). However, both measures should contain information regarding the cross-section of returns (Johnson & So, 2012) namely, that they negatively predict future ETF returns which is one of the primary purposes of my study. Similar to the put-call ratio, the O/S ratio is constructed using publicly available information about options.

To this end, Johnson and So (2012) define the O/S ratio as the ratio between the total option volume, consisting of the sum of call and put volume, and the total trading volume of ETF i at day t. To correctly scale against the share volume, the ratio is multiplied by 100 to account for the fact that an option contract is intended for 100 ETF shares:

$$O/S_{i,t} = \frac{P_{i,t} + C_{i,t}}{SV_{i,t}} \times 100, \tag{2}$$

where Pi,t is the daily volume of puts, Ci,t is the daily volume of calls and SVi,t is the daily share trading volume of ETF i. As with the put-call ratio, Roll et al. (2010) also describe the O/S ratio as a dynamic and volatile measure on a daily basis.
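Equation (2) can be sketched in Python as follows. The function name, the `contract_size` parameter and the example volumes are hypothetical and serve only to show the scaling by 100 shares per contract.

```python
def options_to_shares(put_volume: float, call_volume: float,
                      share_volume: float, contract_size: int = 100) -> float:
    """Equation (2): total option volume, scaled to shares, over share volume."""
    if share_volume == 0:
        # Assumption: a day without share trading is treated as missing.
        return float("nan")
    return (put_volume + call_volume) / share_volume * contract_size


# Hypothetical day: 2,000 put contracts, 3,000 call contracts, 10 million shares.
os_ratio = options_to_shares(2_000, 3_000, 10_000_000)
print(os_ratio)
```

In this example the 5,000 contracts correspond to 500,000 shares, so option trading represents 5% of the share volume on that day.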

**3.1.3 IV Spread **

Thirdly, before defining the IV spread, one needs to understand the mechanics behind it. In a perfect world where markets are simultaneously complete and competitive, options are redundant assets as new information gets incorporated instantaneously into prices since there are no transaction costs and no short-selling constraints (An et al., 2014; Jin et al., 2012). Thus, under these perfect arbitrage conditions, the simple put-call parity which depicts the relation between option prices, stock price and a risk-free security, should hold, thereby creating an equilibrium between the stock market and the options market (Amin et al., 2004; Bhattacharya, 1987; Cremers & Weinbaum, 2010). However, in the real world such a scenario is not applicable since these assumptions do not hold, which allows violations of the put-call parity to occur (Amin et al., 2004; Bhattacharya, 1987; Cremers & Weinbaum, 2010; Jin et al., 2012). Consequently, these violations represent only theoretical arbitrage opportunities, because they cannot be traded on once one considers frictions in the market such as the early-exercise possibility of American options, short-sale constraints, dividends, transaction costs, inequality between lending and borrowing rates, margins, taxes and others (Amin et al., 2004; Cremers & Weinbaum, 2010; Ofek & Richardson, 2003; Ofek et al., 2004).

In constructing the IV spread, I will utilize the intuitive methodological approaches of Cremers and Weinbaum (2010).

In perfect market conditions, Stoll (1969) affirms that the put-call parity for European options on stocks that are not paying dividends should hold:

𝐶 + PV(𝐾) = 𝑃 + 𝑆, (3)

where C is the call option price, P is the put option price, S is the ETF share price and PV(K) is the present value of the strike price K discounted at the risk-free rate. Reorganizing equation (3) gives:

𝐶 − 𝑃 = 𝑆 − PV(𝐾). (4)

According to the methodology of Cremers and Weinbaum (2010), the Black-Scholes (1973) formula should respect the put-call parity given any strictly positive value of the volatility, σ:

C^{BS}(𝜎) + 𝑃𝑉(𝐾) = P^{BS}(𝜎) + 𝑆, ∀𝜎 > 0, (5)

where C^{BS}(σ) and P^{BS}(σ) represent the call and put option prices obtained through the Black-Scholes (1973) formula (Cremers & Weinbaum, 2010). In this case, both of them are functions of the volatility σ. Similar to Cremers and Weinbaum (2010), subtracting equation (4) from equation (5) gives:

C^{BS}(𝜎) − 𝐶 = P^{BS}(𝜎) − 𝑃, ∀𝜎 > 0. (6)

Using the rationale of the implied volatility, the following definition of implied volatility for a call has to be true (Cremers & Weinbaum, 2010):

C^{BS}(𝐼𝑉^{𝑐𝑎𝑙𝑙}) = 𝐶. (7)

Conversely, I get the following equation for put implied volatility (Cremers & Weinbaum, 2010):

P^{BS}(𝐼𝑉^{𝑝𝑢𝑡}) = 𝑃. (8)

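Evaluating equation (6) at σ = IV^{call} and using equality (7) makes its left-hand side zero, so that P^{BS}(IV^{call}) = P. Since the Black-Scholes put price is strictly increasing in σ, comparing this with equality (8) gives: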

𝐼𝑉^{𝑐𝑎𝑙𝑙} = 𝐼𝑉^{𝑝𝑢𝑡}. (9)

As a result, Cremers and Weinbaum (2010) state that, regardless of whether option prices comply with the Black-Scholes (1973) formula, the put-call parity is equivalent to the equality between the call and put implied volatilities derived using the Black-Scholes formula. Furthermore, they acknowledge that American options allow for early exercise and that the put-call parity becomes an inequality in this case. Similar to past studies (Cremers & Weinbaum, 2010; Jin et al., 2012; Lin et al., 2013; Ofek et al., 2004), I incorporate the effect of the American options in my analysis. Therefore, as the premium for early exercise is assimilated given the log-normal distributional assumption, equation (9) no longer represents a no-arbitrage condition, as indicated by Cremers and Weinbaum (2010). In consequence, they abstain from calling deviations from equation (9) arbitrage opportunities, and instead regard them as divergences from the true values of the model. Nonetheless, these deviations still encompass information that can be used to predict future returns (Cremers & Weinbaum, 2010; Ofek et al., 2004).
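The derivation in equations (3) to (9) can be checked numerically. The sketch below is only an illustration for European options without dividends: it prices a call and a put with the Black-Scholes formula, verifies that parity holds for several volatilities, and then recovers implied volatilities by bisection. All inputs (spot, strike, rate, maturity, the 0.10 price perturbation) and all function names are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_prices(S, K, r, T, sigma):
    """Black-Scholes European call and put prices (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    pv_k = K * exp(-r * T)
    call = S * norm_cdf(d1) - pv_k * norm_cdf(d2)
    put = pv_k * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put


def implied_vol(price, S, K, r, T, is_call, lo=1e-4, hi=5.0, tol=1e-10):
    """Invert Black-Scholes by bisection; the price is increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        model = bs_prices(S, K, r, T, mid)[0 if is_call else 1]
        if model > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical at-the-money option with 30 days to maturity.
S, K, r, T = 100.0, 100.0, 0.02, 30 / 365

# Equation (5): parity holds for every positive volatility.
gaps = []
for sigma in (0.1, 0.2, 0.5):
    c, p = bs_prices(S, K, r, T, sigma)
    gaps.append((c - p) - (S - K * exp(-r * T)))

# Parity-consistent prices give equal implied volatilities, as in equation (9).
call_price, put_price = bs_prices(S, K, r, T, 0.2)
iv_call = implied_vol(call_price, S, K, r, T, is_call=True)
iv_put = implied_vol(put_price, S, K, r, T, is_call=False)

# A call priced 0.10 above its parity value produces a positive IV spread.
iv_call_rich = implied_vol(call_price + 0.10, S, K, r, T, is_call=True)
print(gaps, iv_call, iv_put, iv_call_rich - iv_put)
```

The last line shows the link used in the text: when the call is expensive relative to the put, the call implied volatility exceeds the put implied volatility.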

The deviations from equation (9) lead to the logical conclusions that calls are overpriced compared to puts when the implied volatility of calls is higher than the implied volatility of puts, and that puts are overpriced compared to calls when the implied volatility of puts is higher than the implied volatility of calls (Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010; Jones et al., 2018; Ofek et al., 2004). Previous literature regards the IV spread as an indicator of deviations from the put-call parity and defines it as the difference between the call implied volatility and put implied volatility with the same exercise price and time to expiration (Bali & Hovakimian, 2009; Chen & Liu, 2020; Cremers & Weinbaum, 2010; Han & Li, 2021). However, compared to previous literature which uses the IV spread, my study only employs at-the-money options with 30 days to maturity for the main analysis and 91 days to maturity for the robustness analysis. Therefore, no averaging across the IV spread of every option of ETF i at day t is needed. Let IVSi,t be the difference of the call and put implied volatilities for ETF i at day t:

𝐼𝑉𝑆_{𝑖,𝑡} = 𝐼𝑉_{𝑖,𝑡}^{𝑐𝑎𝑙𝑙}− 𝐼𝑉_{𝑖,𝑡}^{𝑝𝑢𝑡}. (10)

**3.1.4 Spread in Implied Volatility Innovations **

Finally, as a robustness replacement variable for the IV spread I will use the spread in
implied volatility innovations of An et al. (2014). Similar to their study, I create two implied
volatility innovation variables to account for the changes in implied volatilities of ETF call and
put options separately. Let ΔIV^{call}i,t be the call implied volatility innovation and ΔIV^{put}i,t be the put
implied volatility innovation for ETF i at day t:

𝛥𝐼𝑉_{𝑖,𝑡}^{𝑐𝑎𝑙𝑙} = 𝐼𝑉_{𝑖,𝑡}^{𝑐𝑎𝑙𝑙} − 𝐼𝑉_{𝑖,𝑡−1}^{𝑐𝑎𝑙𝑙}, (11)

𝛥𝐼𝑉_{𝑖,𝑡}^{𝑝𝑢𝑡}= 𝐼𝑉_{𝑖,𝑡}^{𝑝𝑢𝑡}− 𝐼𝑉_{𝑖,𝑡−1}^{𝑝𝑢𝑡}. (12)

These innovations represent changes in implied volatilities from day t − 1 to day t, for both call and put options of ETF i. In the spirit of the IV spread, An et al. (2014) take the difference between the two types of innovations and find that it is, on its own, a significant predictor of stock returns.

Therefore, let CPVi,t be the spread of implied volatility innovations equal to the difference of the call implied volatility innovation and put implied volatility innovation for ETF i at day t:

𝐶𝑃𝑉_{𝑖,𝑡} = Δ𝐼𝑉_{𝑖,𝑡}^{𝑐𝑎𝑙𝑙}− Δ𝐼𝑉_{𝑖,𝑡}^{𝑝𝑢𝑡}. (13)

According to An et al. (2014), this measure is a raw way of analyzing the call implied volatility innovation while controlling for the put volatility innovation. However, when reinterpreting their findings, a portfolio that is long on ETFs with high CPV should generate a significantly higher next-day return than a portfolio that is short on ETFs with low CPV, thereby indicating a positive relation between CPV and future returns.
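The implied volatility measures in equations (10) to (13) can be sketched for one ETF as follows. The implied volatility values and function names are hypothetical. The example also shows a property that follows directly from the definitions: CPV on day t equals the change in the IV spread from day t − 1 to day t.

```python
def iv_spread(iv_call, iv_put):
    """Equation (10): call minus put implied volatility, day by day."""
    return [c - p for c, p in zip(iv_call, iv_put)]


def cpv(iv_call, iv_put):
    """Equations (11)-(13): call innovation minus put innovation.

    The first day has no previous observation, so its value is None.
    """
    out = [None]
    for t in range(1, len(iv_call)):
        d_call = iv_call[t] - iv_call[t - 1]   # equation (11)
        d_put = iv_put[t] - iv_put[t - 1]      # equation (12)
        out.append(d_call - d_put)             # equation (13)
    return out


# Hypothetical at-the-money implied volatilities for one ETF over three days.
iv_call = [0.20, 0.22, 0.21]
iv_put = [0.21, 0.21, 0.23]

ivs = iv_spread(iv_call, iv_put)
cpv_values = cpv(iv_call, iv_put)
print(ivs)
print(cpv_values)
```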

**3.2 Control variables **

In this study, I intend to augment the Fama-MacBeth (1973) regression analysis of the effect of option volume and implied volatility with control variables which are proven to be significant predictors of returns. Therefore, below I explain how they are constructed.

**3.2.1 Beta **

Similarly to An et al. (2014), I use the beta coefficient for each ETF as a control variable. I obtain this measure for every ETF in my sample from the Beta Suite of the Wharton Research Data Services (Beta Suite, 2016). I retrieve the daily betas estimated over a monthly rolling window. According to the Beta Suite (2016), this coefficient is calculated in line with Scholes and Williams (1977) in order to consider non-synchronous trading activity in the market for each day:

$$R_{i,t} - rf_{t} = \alpha_{i} + \beta_{1,i}(R_{m,t-1} - rf_{t-1}) + \beta_{2,i}(R_{m,t} - rf_{t}) + \beta_{3,i}(R_{m,t+1} - rf_{t+1}) + \varepsilon_{i,t}, \tag{14}$$

where Ri,t represents the return of ETF i on day t, rft is the risk-free rate equal to the daily return of the 1-month Treasury Bill and Rm,t is the value-weighted return of the market on day t from CRSP (Beta Suite, 2016). Then, the market beta, Betai,t, for each ETF i at day t is calculated by summing up all the estimated beta coefficients (Beta Suite, 2016) from equation (14):

𝐵𝑒𝑡𝑎_{𝑖,𝑡} = 𝛽̂_{1,𝑖}+ 𝛽̂_{2,𝑖}+ 𝛽̂_{3,𝑖}. (15)
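For intuition only, equations (14) and (15) can be sketched as an ordinary least squares regression on lagged, contemporaneous and leading market excess returns, followed by a sum of the three slopes. This follows the equations as written in the text and is not the WRDS Beta Suite implementation; the helper names and the simulated data are hypothetical.

```python
import random


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_val = A[col][col]
        A[col] = [v / pivot_val for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def summed_beta(etf_excess, mkt_excess):
    """Equation (14) by OLS, then equation (15): sum of the three slopes."""
    n = len(mkt_excess)
    # Days 1..n-2 are used so that both the lag and the lead exist.
    X = [[1.0, mkt_excess[t - 1], mkt_excess[t], mkt_excess[t + 1]]
         for t in range(1, n - 1)]
    y = etf_excess[1:-1]
    _alpha, b_lag, b_now, b_lead = ols(X, y)
    return b_lag + b_now + b_lead


# Simulated data (hypothetical): true slopes 0.2, 0.5 and 0.1, no noise.
rng = random.Random(0)
mkt = [rng.gauss(0.0, 0.01) for _ in range(25)]
etf = ([0.0]
       + [0.0001 + 0.2 * mkt[t - 1] + 0.5 * mkt[t] + 0.1 * mkt[t + 1]
          for t in range(1, len(mkt) - 1)]
       + [0.0])

beta = summed_beta(etf, mkt)
print(round(beta, 6))
```

Because the simulated ETF return is an exact linear function of the three market terms, the regression recovers the slopes and their sum is 0.8.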

**3.2.2 Size **

ETF size consists of the natural logarithm of the market capitalization, MktCapi,t, of ETF i at day t, which is the closing ETF price, Si,t, multiplied by the number of shares outstanding of that day, SharesOuti,t, measured in millions (An et al., 2014; Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010; Ge et al., 2016; Zhou, 2022):

𝑆𝑖𝑧𝑒_{𝑖,𝑡} = ln(𝑀𝑘𝑡𝐶𝑎𝑝_{𝑖,𝑡}) = ln(𝑆ℎ𝑎𝑟𝑒𝑠𝑂𝑢𝑡_{𝑖,𝑡}× 𝑆_{𝑖,𝑡}). (16)

**3.2.3 Illiquidity **

The illiquidity measure is constructed based on the work of Amihud (2002) and it is equal to the absolute return, |Ri,t|, divided by the dollar trading volume of ETF i at day t. The dollar trading volume is further equal to the closing price, Si,t, multiplied by the share volume, SVi,t, of ETF i at day t. The illiquidity measure is multiplied by 10^{9} for a better presentation (Ge et al., 2016):

$$Illiq_{i,t} = \frac{|R_{i,t}|}{\text{Dollar trading volume}_{i,t}} \times 10^{9} = \frac{|R_{i,t}|}{S_{i,t} \times SV_{i,t}} \times 10^{9}. \tag{17}$$

**3.2.4 Short-term Reversal **

Let Revi,t be the short-term reversal equal to the return from day t – 1 to day t of ETF i (Jegadeesh, 1990):

𝑅𝑒𝑣_{𝑖,𝑡} = 𝑅_{𝑖,𝑡}. (18)

**3.2.5 Momentum **

Since Jegadeesh and Titman (1993) showed that momentum is a strong return predictor due to the under-reaction and delayed over-reaction of investors to news, I will employ it in my study as well. Let Momi,t be the momentum of ETF i at day t, equal to the return over the past trading week, from day t − 6 to day t − 1, which leaves out 1 day to account for the short-term reversal:

$$Mom_{i,t} = \frac{P_{i,t-1} - P_{i,t-6}}{P_{i,t-6}}. \tag{19}$$
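The size, illiquidity, short-term reversal and momentum controls in equations (16) to (19) can be sketched as below. The function names and the example inputs are hypothetical; `prices` stands for the daily closing prices that enter equation (19).

```python
from math import log


def size(shares_out_millions: float, close: float) -> float:
    """Equation (16): log of shares outstanding (millions) times closing price."""
    return log(shares_out_millions * close)


def amihud_illiquidity(ret: float, close: float, share_volume: float) -> float:
    """Equation (17): absolute return over dollar volume, scaled by 10^9."""
    return abs(ret) / (close * share_volume) * 1e9


def reversal(prices, t: int) -> float:
    """Equation (18): return from day t - 1 to day t."""
    return prices[t] / prices[t - 1] - 1.0


def momentum(prices, t: int) -> float:
    """Equation (19): return from day t - 6 to day t - 1."""
    return (prices[t - 1] - prices[t - 6]) / prices[t - 6]


# Hypothetical ETF: 200 million shares outstanding, closing price of 50.
example_size = size(200, 50.0)

# Hypothetical day: return of -1%, price 50, two million shares traded.
example_illiq = amihud_illiquidity(-0.01, 50.0, 2_000_000)

# Hypothetical closing prices for days 0..6; the measures are taken at t = 6.
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
example_rev = reversal(prices, 6)
example_mom = momentum(prices, 6)

print(example_size, example_illiq, example_rev, example_mom)
```

Note that the momentum window ends at day t − 1, so the most recent daily return enters only through the reversal control.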

**3.2.6 Realized Volatility **

Let Rvoli,t be the rolling realized volatility of ETF i at day t, equal to the standard deviation of daily ETF returns estimated over a past rolling period of 1 month (An et al., 2014; Bali & Hovakimian, 2009):

𝑅𝑣𝑜𝑙_{𝑖,𝑡}= √var(𝑅_{𝑖,𝑡−30:𝑡}). (20)

**3.2.7 Realized-Implied Volatility Spread **

An alternative for the realized volatility will be the realized-implied volatility spread of Bali and Hovakimian (2009) which can negatively predict future returns. Therefore, let the average implied volatility, Ivoli,t, be the mean between call and put implied volatilities of ETF i on day t (Bali & Hovakimian, 2009):

$$Ivol_{i,t} = \frac{IV_{i,t}^{call} + IV_{i,t}^{put}}{2}. \tag{21}$$

Then, as described by Bali and Hovakimian (2009), the realized-implied volatility spread, called RIvoli,t, is simply the difference between the realized volatility and the average implied volatility for ETF i at day t:

𝑅𝐼𝑣𝑜𝑙_{𝑖,𝑡} = 𝑅𝑣𝑜𝑙_{𝑖,𝑡}− 𝐼𝑣𝑜𝑙_{𝑖,𝑡}. (22)
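Equations (20) to (22) can be illustrated with the short sketch below. Two choices are assumptions of the example rather than statements from the text: the window is taken to include both day t − 30 and day t, and the sample (n − 1) standard deviation is used. The text also does not state whether realized volatility is annualized before it is compared with implied volatility, so the sketch leaves both in the units supplied.

```python
from statistics import stdev


def realized_vol(returns, t: int, window: int = 30) -> float:
    """Equation (20): standard deviation of daily returns over days t-window..t."""
    sample = returns[max(0, t - window): t + 1]
    return stdev(sample)


def average_implied_vol(iv_call: float, iv_put: float) -> float:
    """Equation (21): mean of call and put implied volatility."""
    return (iv_call + iv_put) / 2.0


def realized_implied_spread(rvol: float, ivol: float) -> float:
    """Equation (22): realized minus average implied volatility."""
    return rvol - ivol


# Hypothetical daily returns alternating between +1% and -1% for 32 days.
returns = [0.01, -0.01] * 16
rvol = realized_vol(returns, t=31)
ivol = average_implied_vol(0.20, 0.22)
rivol = realized_implied_spread(rvol, ivol)
print(rvol, ivol, rivol)
```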

**3.2.8 Call/Put Open Interest **

Similar to An et al. (2014), I employ the call/put open interest ratio as a control for option volume. An et al. (2014) find it insignificant when used in a cross-sectional regression for predicting future returns. Intuitively, it is formed by dividing the call open interest by the put open interest for an ETF i at day t:

$$C/P\ OI_{i,t} = \frac{\text{Call open interest}_{i,t}}{\text{Put open interest}_{i,t}}. \tag{23}$$

**3.2.9 Bid-Ask Spread **

Finally, several studies use the relative bid-ask spread as a liquidity control (Bali & Hovakimian, 2009; Pan & Poteshman, 2006; Zhou, 2022). Let BAi,t be the relative bid-ask spread which is equal to the difference between the ask and bid prices divided by the mid-point between the ask and bid prices of an ETF i at day t:

$$BA_{i,t} = \frac{\text{Ask price}_{i,t} - \text{Bid price}_{i,t}}{(\text{Ask price}_{i,t} + \text{Bid price}_{i,t})/2}. \tag{24}$$

Last, it is important to mention that the independent variables as well as the controls will be lagged to the time horizon of the ETF returns which I investigate. In the next subsection I lay down my main analysis process and walk through its various steps.
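The lagging step can be sketched as pairing the value of a predictor observed on day t with the return realized h days later. The function name and data are hypothetical, and the sketch assumes that the horizon-h return is the single-day return on day t + h; a cumulative return over the h days would require a different construction.

```python
def align_for_horizon(predictor, returns, horizon: int):
    """Pair the predictor on day t with the return on day t + horizon."""
    return [(predictor[t], returns[t + horizon])
            for t in range(len(returns) - horizon)]


# Hypothetical five-day sample for one ETF.
predictor = [1, 2, 3, 4, 5]
returns = [10, 20, 30, 40, 50]

pairs_1d = align_for_horizon(predictor, returns, horizon=1)
pairs_2d = align_for_horizon(predictor, returns, horizon=2)
print(pairs_1d)
print(len(pairs_2d))
```

Each additional day of horizon removes one usable observation from the end of the sample.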

**3.3 Main Analysis Methodology **

My main analysis consists of several components in order to build a comprehensive view about the predictive ability of option volume and implied volatility of an ETF for its future return.

In the first half of the main analysis I will perform a quintile portfolio sorting analysis in which I will investigate the returns of theoretical trading strategies for predicting returns using the two independent variables separately or jointly, as several return prediction papers before me did (An et al., 2014; Bali & Hovakimian, 2009; Brown et al., 2021; Cremers & Weinbaum, 2010; Han et al., 2017; Pan & Poteshman, 2006). In the second half I will analyze the predictive power of the two independent variables using the cross-sectional Fama-MacBeth (1973) regression, again, as performed in the past return-prediction literature (An et al., 2014; Bali & Hovakimian, 2009; Ge et al., 2016; Han et al., 2017; Zhou, 2022). Considering the metatheorem of Pedersen (2015), the quintile portfolio sorts and the Fama-MacBeth (1973) regressions should come to similar results about the predictive ability of the two independent variables on ETF returns. If, however, the results of the two methods do not coincide, this is indicative that the addition of the control variables can make a difference in the prediction process. Also, in both sections of the main analysis I will perform tests for checking the predictability potential of the independent variables over time. Then, four robustness checks will be performed to validate the findings from the main analysis and draw a knowledgeable conclusion regarding the results and economic significance of this research endeavor.

First of all, I will perform a single, or univariate, quintile portfolio sorting analysis for the put-call ratio and the IV spread separately. Following the past literature (An et al., 2014; Bali & Hovakimian, 2009; Brown et al., 2021; Cremers & Weinbaum, 2010; Han et al., 2017; Pan & Poteshman, 2006), I sort the next-day returns at t + 1 of all ETFs into quintile portfolios based on the put-call ratio of today, at time t. Naturally, the first quintile portfolio contains future ETF returns with the lowest levels of the put-call ratio and the fifth portfolio contains future ETF returns with the highest levels of the put-call ratio. I specify that the ETFs are equally weighted in each quintile portfolio (An et al., 2014) and that there are approximately 21 ETFs in each portfolio. For each quintile portfolio, I subtract the risk-free rate from the portfolio return, thus generating excess portfolio returns. Subsequently, I perform a long-short trading strategy that buys the fifth quintile portfolio of high put-call ratio ETFs and sells the first quintile portfolio of low put-call ratio ETFs.

The portfolios are rebalanced daily. This means that I repeat this procedure for every day in the sample, thus creating a series of 2014 daily ETF excess returns, which is 1 day less than the sample size due to lagging the put-call ratio in order to reflect the future predictability component.
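One day of the univariate sort can be sketched as follows. The function name, the ten hypothetical ETFs and the tie-breaking rule (ETFs with equal signals keep their input order) are assumptions of this illustration; in the actual sample each quintile holds roughly 21 ETFs.

```python
def quintile_sort(signal, next_returns, risk_free: float, n_groups: int = 5):
    """Sort ETFs on today's signal and average their next-day returns.

    Returns the equal-weighted excess return of each quintile and the
    return of the strategy that buys quintile 5 and sells quintile 1.
    """
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    n = len(order)
    groups = [order[k * n // n_groups:(k + 1) * n // n_groups]
              for k in range(n_groups)]
    excess = [sum(next_returns[i] for i in g) / len(g) - risk_free
              for g in groups]
    long_short = excess[-1] - excess[0]
    return excess, long_short


# Hypothetical cross-section of ten ETFs on one day.
signal = [0.1 * (i + 1) for i in range(10)]          # put-call ratio at day t
next_returns = [0.010 - 0.001 * i for i in range(10)]  # returns at day t + 1

excess, long_short = quintile_sort(signal, next_returns, risk_free=0.0001)
print(excess)
print(long_short)
```

Repeating this for every day in the sample produces the daily series of quintile and long-short returns; the risk-free rate cancels out in the long-short difference.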

Although not particularly relevant, this strategy and the double sorts can be considered as partially self-financing strategies since buying and selling excess returns entails lending and shorting the risk-free rate simultaneously, assuming equal lending and borrowing rates (Frazzini & Pedersen, 2014). In order to check that the excess returns of the strategy and the individual portfolios are not due to market risk factors or firm characteristics, I calculate abnormal returns (risk-adjusted returns) using the CAPM, the Fama and French (1993) three-factor model and the Fama and French (1993) model augmented with Carhart’s (1997) momentum factor. I present these three models, in that order, in equations (A1), (A2) and (A3) in Appendix A. For the excess and risk-adjusted returns of the strategies, I will also report the Newey-West (1987) t-statistics with one lag since the significance is particularly relevant here. Finally, I report the averages of the put-call ratio, IV spread, size and realized volatility for each quintile portfolio, as done by Bali and Hovakimian (2009). This is to ensure the validity of the procedure and check for the size and realized volatility in the quintile portfolios. For brevity, I do not restate the procedure for the IV spread; I repeat the entirety of the above with the IV spread as the sorting variable.
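A Newey-West (1987) t-statistic for the mean of a daily return series can be sketched as below. The sketch uses Bartlett weights and no small-sample correction, which is one common convention and an assumption here; the function name and the return series are hypothetical.

```python
from math import sqrt


def newey_west_tstat(x, lags: int = 1) -> float:
    """t-statistic of the mean of x with a Newey-West long-run variance."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]

    def autocov(j: int) -> float:
        # Sample autocovariance at lag j (divided by n, not n - j).
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    # Bartlett kernel: weight 1 - j / (lags + 1) on the lag-j autocovariance.
    long_run_var = autocov(0) + 2.0 * sum(
        (1.0 - j / (lags + 1)) * autocov(j) for j in range(1, lags + 1))
    return mean / sqrt(long_run_var / n)


# Hypothetical long-short returns that alternate between 1% and 2%.
strategy_returns = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]
t_one_lag = newey_west_tstat(strategy_returns, lags=1)
t_no_lag = newey_west_tstat(strategy_returns, lags=0)
print(t_one_lag, t_no_lag)
```

In this example the returns are negatively autocorrelated, so the one-lag correction lowers the estimated variance of the mean and raises the t-statistic relative to the uncorrected case; with positive autocorrelation the adjustment works in the opposite direction.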

Second, I will use double-independent portfolio sorts in order to test the return prediction viability of both the put-call ratio and IV spread in a theoretical trading strategy (An et al., 2014; Bali & Hovakimian, 2009; Cremers & Weinbaum, 2010). Daily, I perform two independent sorts based on the put-call ratio and the IV spread of future ETF returns into quintiles (An et al., 2014; Cremers & Weinbaum, 2010). As a result, the first quintile will consist of ETFs with the lowest values of the put-call ratio and IV spread and the fifth quintile will contain ETFs with the highest values of the put-call ratio and IV spread. Therefore, intersecting the two quintile sorts I get 5 × 5 = 25 portfolios which I will represent in a matrix format. Furthermore, the order of the two variables for sorting does not matter here since sorting first on the put-call ratio and then on the IV spread would yield a portfolio matrix which is the transpose of the matrix obtained by sorting first on the IV spread and then on the put-call ratio. I specify that the ETFs are equally weighted in each portfolio (An et al., 2014) and that there are roughly four ETFs in each portfolio. For each portfolio I subtract the risk-free rate from the portfolio return, thus generating excess portfolio returns. Subsequently, I perform a long-short trading strategy in which I buy the fifth quintile portfolio of high put-call ratio ETFs and sell the first quintile portfolio with low put-call ratio ETFs within each IV spread quintile (An et al., 2014). I present the average excess returns across these five long-short trading strategies. Also, taking advantage of the double sorts, I present the main long-short trading strategy which buys the portfolio with a low put-call ratio and high IV spread and sells the portfolio with a high put-call ratio and low IV spread. The argument behind this is that these two portfolios should predict the ETFs with the highest and lowest, respectively, future excess returns for each day which should make the long-short trading strategy profitable. Additionally, all the portfolios are rebalanced daily. This means that I repeat this process for each day in the sample, thus creating a series of 2014 daily ETF excess returns, which is 1 day less than the sample size. As already mentioned, this is due to lagging the put-call ratio and the IV spread in order to reflect the future predictability component. To check that the excess return of the main strategy is not due to market risk factors or firm characteristics, I calculate abnormal returns (risk-adjusted returns) according to factor equations (A1), (A2) and (A3) from Appendix A.
For the excess and risk-adjusted returns