
Confidence intervals for dummy percentage effects in loglinear regression models


 


Tim  Gunneweg  

Supervisor: Dr. K.J. (Kees Jan) van Garderen

ABSTRACT  

This paper considers confidence intervals for the percentage effect of dummy variables in semilogarithmic regression models. First, a method to construct exact confidence intervals in such models is introduced. Next, this method is tested by comparing it with normal approximated confidence intervals using Monte Carlo simulation. From the simulation experiment it is concluded that the normal approximated confidence intervals are misleading for small samples, due to the non-normality of the finite sample distribution of the percentage estimator. Furthermore, two adjustments to the new technique that can be used in the presence of heteroskedasticity are discussed and tested: a method based on FGLS results and a method based on heteroskedasticity-consistent estimators. The latter method performs better in terms of coverage probability for different model parameters, but the former might be improved using a small sample correction.


Table  of  Contents  

1. INTRODUCTION
2. THEORETICAL FRAMEWORK
   THE MODEL
   DIFFERENT ESTIMATORS OF $p_j$
   NORMAL APPROXIMATED CONFIDENCE INTERVALS
3. CONSTRUCTING CONFIDENCE INTERVALS
4. MONTE CARLO SIMULATION
   RESULTS
5. HETEROSKEDASTIC ERROR TERMS
   MONTE CARLO SIMULATION
   RESULTS
6. DISCUSSION
7. REFERENCES
APPENDIX A HYPERGEOMETRIC FUNCTIONS
APPENDIX B MATLAB COMPUTATIONS
   MATLAB CODE MONTE CARLO SIMULATION HOMOSCEDASTIC ERROR TERMS
   MATLAB CODE MONTE CARLO SIMULATION HETEROSKEDASTIC ERROR TERMS USING GLS
   MATLAB CODE MONTE CARLO SIMULATION HETEROSKEDASTIC ERROR TERMS USING HAC ESTIMATORS

 


1. Introduction  

In the economic literature, log transformations are often used to test the percentage impact of dummy regressors on a dependent variable. For example, Immergluck (2008) studies the effect of different financial regulators on investments of American banks in housing and community development. In the United States, banks are encouraged to invest in housing and community development by the Community Reinvestment Act (CRA), a piece of legislation that allows financial regulators to base their approval of certain banking activities partly on the value of a bank's investments in housing and community development. In the United States, there are four different regulators for different types of banks. Immergluck (2008) estimates a linear regression model with the log of a bank's CRA-qualified investments as dependent variable. As independent variables he uses various control variables together with dummies for the regulators. Immergluck (2008) concludes that the effect of two regulators is significant, based on the significance of the OLS estimates of the dummy coefficients. Furthermore, he finds the magnitudes of the three dummies to be 183%, 112% and 82% respectively, with the last one not being significant.

While concluding that the percentage effect is present and making a claim about its size, Immergluck (2008) does not provide confidence intervals for the percentage effect to show how concentrated the estimated magnitude is. Although not necessary for conclusions about the presence of a percentage effect, confidence intervals could be very useful to gain more insight into the spread of an effect. However, in the special case of the percentage effect of a dummy variable, the construction of confidence intervals is not as straightforward as it is for continuous variables, because of the binary nature of a dummy variable.

The characteristics of the percentage effect of dummy variables have been widely studied. Unlike for a continuous variable, the estimated coefficient of a dummy variable in a loglinear model cannot be interpreted as the percentage effect of that variable on the dependent variable. Instead, as van Garderen and Shah (2002) show, the percentage effect of a dummy variable should be estimated using the approximately unbiased Kennedy estimator, which is a function of the OLS estimate of the dummy coefficient and the OLS estimate of its variance. Furthermore, they argue that this Kennedy estimator should be used together with an approximately unbiased estimator of its variance to measure its spread. Nevertheless, they do not elaborate on how this spread should be interpreted exactly. These statistics could be used to compute normal approximations of the confidence intervals of the percentage effect when the distribution of the Kennedy estimator is close to normal. To examine the reliability of these approximations, Giles (2011) formulates an expression for the finite sample density function of the Kennedy estimator and concludes that it is far from normal, so that bootstrap methods should be used to estimate confidence intervals. However, the provided density function is incorrect, as van Garderen points out (personal communication, May 9, 2014). Furthermore, it is not clear whether bootstrap methods are optimal under these conditions.

The main goal of this study is to develop and test a method to construct exact confidence intervals for dummy variables in loglinear models. The resulting intervals are compared with confidence intervals based on the normal approximation of the percentage estimator.

The method to construct confidence intervals is examined in three ways. First, a technique to construct confidence intervals under perfect model assumptions is derived theoretically. Next, Monte Carlo simulation is used to compare confidence intervals resulting from this technique with confidence intervals based on the normal approximation of the Kennedy estimator, and it is demonstrated that the two techniques can lead to different results. Finally, the implications of heteroskedasticity are discussed, and two methods to address this issue are compared using Monte Carlo simulation.

   This  paper  is  organized  as  follows.  In  the  second  section,  the  theoretical   framework  is  developed  by  formulating  the  considered  model,  explaining   different  estimators  of  the  percentage  effect,  and  investigating  the  normal   approximation  based  on  the  Kennedy  estimator.  In  the  third  section,  a  method   for  constructing  confidence  intervals  is  derived  and  explained.  In  the  fourth   section,  the  first  simulation  experiment  is  explained  and  its  most  important   results  are  discussed.  In  the  fifth  section,  the  implications  of  heteroskedasticity   are  discussed  and  two  methods  to  solve  this  issue  are  compared  using  Monte   Carlo  simulation.  Finally,  in  the  sixth  section,  some  concluding  remarks  are   made.  


2. Theoretical  Framework  

In  the  following  section,  the  necessary  theoretical  framework  is  established.   First,  the  considered  model  is  specified.  Thereafter,  two  different  estimators  of   the  percentage  effect  are  discussed:  the  minimum  variance  unbiased  estimator   and  the  Kennedy  estimator.  Finally,  the  normal  approximation  of  confidence   intervals  of  the  Kennedy  estimator  is  provided.  

The  model  

The considered model can be specified as follows:

$$Y = \exp\left\{a + \sum_i b_i X_i + \sum_j c_j D_j + \varepsilon\right\}$$

where $\exp\{\cdot\}$ is defined as the element-wise exponential function, the $X_i$ are continuous variables, the $D_j$ are dummy variables, and $\varepsilon \sim N(0, \sigma^2 I_n)$. After taking the element-wise log of both sides of the model equation, the model becomes linear:

$$\log Y = a + \sum_i b_i X_i + \sum_j c_j D_j + \varepsilon \qquad (2.1)$$

Since the model is linear after the transformation, the optimal estimators of the coefficients can be obtained using OLS. In the continuous case, the coefficients of the resulting linear model, multiplied by 100, can be interpreted as the percentage effect of the independent variable on the dependent variable. To see why, differentiate both sides of (2.1) with respect to $X_i$ to obtain

$$p_i = 100 \cdot \frac{1}{Y}\frac{\partial Y}{\partial X_i} = 100\,\frac{\partial \ln Y}{\partial X_i} = 100\, b_i$$

In order to simplify notation, from now on $p_i$ is defined as the relative change, without the factor 100, so in this case $p_i = \frac{1}{Y}\frac{\partial Y}{\partial X_i} = \frac{\partial \ln Y}{\partial X_i} = b_i$.

For dummy variables this does not hold, since a dummy variable $D_j$ is binary and hence no continuous derivative of $Y$ with respect to $D_j$ exists. The percentage change $p_j$ of $Y$, from $Y_0$ to $Y_1$, resulting from the change of $D_j$ from 0 to 1, should be calculated directly as $p_j = \frac{Y_1 - Y_0}{Y_0}$ (van Garderen & Shah, 2002). Using (2.1), this leads to

$$p_j = \frac{Y_1 - Y_0}{Y_0} = \frac{Y_1}{Y_0} - 1 = \frac{\exp\{a + \sum_i b_i X_i + \sum_{l \neq j} c_l D_l + c_j \cdot 1 + \varepsilon\}}{\exp\{a + \sum_i b_i X_i + \sum_{l \neq j} c_l D_l + c_j \cdot 0 + \varepsilon\}} - 1 = \exp\{c_j\} - 1,$$

so

$$p_j = \exp\{c_j\} - 1. \qquad (2.2)$$

So an unbiased estimator of $p_j$ should have $\exp\{c_j\} - 1$ as its expectation. For example, with $c_j = 0.2231$ the implied percentage effect is $\exp\{0.2231\} - 1 \approx 0.25$, i.e. 25%.

Different estimators of $p_j$

Since in general $c$ (from now on, the subscript $j$ is dropped for clarity) is unknown, (2.2) cannot be used directly to calculate $p$; instead, $p$ has to be estimated using the OLS estimate $\hat{c}$ of $c$. A simple, but incorrect, solution to this problem, which is often used in the literature (see the references in van Garderen and Shah, 2002), would be to replace $c$ in (2.2) with its OLS estimate $\hat{c}$. However, it is easy to see that this results in a biased estimator of $p$ because

 

$$E\big[\exp\{\hat{c}\} - 1 \mid X\big] > \exp\big\{E[\hat{c} \mid X]\big\} - 1 = \exp\{c\} - 1,$$

where the inequality follows directly from Jensen's inequality and the strict convexity of the exponential mapping.
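A quick way to see the size of this bias is to simulate draws of $\hat{c}$ around a true $c$ and compare the average of $\exp\{\hat{c}\}-1$ with $\exp\{c\}-1$; the following minimal MATLAB sketch uses purely hypothetical values.

c = 0.2231; vc = 0.05;                   % hypothetical true coefficient and variance of its estimator
chat = c + sqrt(vc)*randn(1e6,1);        % simulated draws of the OLS estimator
disp([mean(exp(chat) - 1), exp(c) - 1])  % the naive average exceeds the true p = exp(c)-1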

Goldberger (1968) shows that the expected value of $\exp\{\hat{c}\}$ in fact equals $\exp\{c + \tfrac{1}{2}V(\hat{c})\}$, with $V(\hat{c})$ the variance of $\hat{c}$. Therefore, Kennedy (1981) argued for the following estimator of $p$:

 

$$\hat{p} = \exp\left\{\hat{c} - \tfrac{1}{2}\hat{V}(\hat{c})\right\} - 1 \qquad (2.3)$$

     

where $\hat{c}$ is the OLS estimate of $c$ and $\hat{V}(\hat{c})$ is the OLS estimate of its variance. In their study, van Garderen and Shah (2002) show that the Kennedy estimator is biased and that the minimum variance unbiased estimator of $p$ equals


$$\tilde{p} = \exp\{\hat{c}\}\;{}_0F_1\!\left(m;\ -\tfrac{1}{2}\,m\,\hat{V}(\hat{c})\right) - 1 \qquad (2.4)$$

     

where $\hat{c}$ and $\hat{V}(\hat{c})$ are the OLS estimates, $m = \frac{n-k}{2}$ with $n$ the number of observations and $k$ the number of regressors, and ${}_0F_1$ is the confluent hypergeometric (limit) function (for an explanation of the hypergeometric functions used, see Appendix A). They also show that the variance of (2.4) equals

 

$$V(\tilde{p}) = \exp\{2c\}\left[\exp\{V(\hat{c})\}\;{}_0F_1\!\left(m;\ \frac{V(\hat{c})^2}{4}\right) - 1\right].$$

   

Furthermore, they prove that the minimum variance unbiased estimator of $V(\tilde{p})$ is

$$\hat{V}(\tilde{p}) = \exp\{2\hat{c}\}\left\{\left[{}_0F_1\!\left(m;\ -\tfrac{1}{2}\,m\,\hat{V}(\hat{c})\right)\right]^2 - {}_0F_1\!\left(m;\ -2\,m\,\hat{V}(\hat{c})\right)\right\}.$$

In addition to the minimum variance unbiased estimator of the variance of $\tilde{p}$, van Garderen and Shah (2002) derive the following approximately unbiased estimator of its variance:

$$\hat{V}(\hat{p}) = \exp\{2\hat{c}\}\left[\exp\{-\hat{V}(\hat{c})\} - \exp\{-2\hat{V}(\hat{c})\}\right] \qquad (2.5)$$

     

In their research, they further show that the unbiased estimates of $p$ are very close to those calculated by the much more convenient Kennedy estimator. This leads them to suggest that, in most applications, Kennedy's estimator should be used together with their approximately unbiased estimator of the variance (2.5) when estimating the percentage impact of a dummy variable.
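To illustrate how (2.3) and (2.5) are computed from standard OLS output, consider the following minimal MATLAB sketch; the numerical inputs are hypothetical.

chat = 0.2231; vchat = 0.05;                       % hypothetical OLS estimate of c and its estimated variance
pken = exp(chat - 1/2*vchat) - 1;                  % Kennedy estimator (2.3)
vapp = exp(2*chat)*(exp(-vchat) - exp(-2*vchat));  % approximately unbiased variance estimator (2.5)
disp([pken vapp])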

Normal  Approximated  Confidence  Intervals  

Although arguing that (2.3) should be used to measure the size of a percentage effect and that (2.5) should be used to measure its variance, van Garderen and Shah do not explain how these two statistics should be used to construct precise confidence intervals for $p$. Using only a point estimator and an estimator of its variance, confidence intervals are usually approximated using the normal distribution. So in this case, approximate equal-tailed $1-\alpha$ confidence intervals could be constructed using


$$c_L(x,y) = \hat{p} - \sqrt{\hat{V}(\hat{p})}\; z_{\frac{1}{2}\alpha} \qquad \text{and} \qquad c_U(x,y) = \hat{p} + \sqrt{\hat{V}(\hat{p})}\; z_{\frac{1}{2}\alpha}, \qquad (2.6)$$

where $z_{\frac{1}{2}\alpha}$ is the value of the standard normal distribution such that the probability to its right equals $\frac{1}{2}\alpha$. Although asymptotically accurate, (2.6) could be misleading when the distribution of $\hat{p}$ is very different from normal. In order to find a basis for inference about $p$, Giles (2011) derives the finite sample distribution of $\hat{p}$, but his expression apparently contains an error. Van Garderen (personal communication, May 9, 2014) shows that the pdf equals

$$f_{\hat{p}}(p) = \frac{v^{v/2}}{2^{\frac{v}{4}+\frac{1}{2}}\sqrt{\pi}\,(d\sigma^2)^{\frac{v}{4}+\frac{1}{2}}}\;(1+p)^{-1}\exp\!\left\{-\frac{\big(\log(1+p)-c\big)^2}{2d\sigma^2}\right\}\; U\!\left(\frac{v}{4},\ \frac{1}{2},\ \frac{\big(v - c + \log(1+p)\big)^2}{2d\sigma^2}\right), \qquad p > -1, \qquad (2.7)$$

where $U$ (HypergeomU) is Tricomi's confluent hypergeometric function, $d\sigma^2$ is the variance of $\hat{c}$, and $v = n - k$ the degrees of freedom. Figure 1 shows a plot of this density function with parameter values $\sigma^2 = 0.5$, $v = 10$, $d = 0.5$, and $c = 0.25$. From Figure 1 it is clear that the finite sample distribution of $\hat{p}$ is far from normal, with the density function being positively skewed. Therefore, normal approximated confidence intervals could be misleading, since they are shifted to the left compared to exact confidence intervals. To test the scale of this error, an exact method to find confidence intervals for $p$ is developed in the next section.

 

Figure 1: pdf of $\hat{p}$ with $\sigma^2 = 0.5$, $v = 10$, $d = 0.5$, and $c = 0.25$

3. Constructing Confidence Intervals

In  this  section,  a  method  for  the  construction  of  exact  confidence  intervals  is   developed.  The  technique  is  based  on  confidence  intervals  for  the  coefficients  in   the  transformed  model  using    OLS  estimates.  

Consider the model as stated in (2.1). To distinguish between the stochastic random sample and an observed sample, first some notation is introduced. $X = (X_1\, X_2 \ldots\, D_1\, D_2 \ldots)$ and $Y$ represent the stochastic random variables, before the sample is observed, and $x$ and $y$ represent observed outcomes of the random sample $X$ and $Y$.

The goal is to find a two-sided $1-\alpha$ confidence interval $I$ for $p_j = \exp\{c_j\} - 1$, based on a random sample with explanatory variables $x$ and dependent variable $y$. In this context, a two-sided $1-\alpha$ confidence interval is defined by Bain and Engelhardt (1992, p. 360) as follows:

Definition 1. An interval $(c_L(x,y),\, c_U(x,y))$ is called a $100(1-\alpha)\%$ confidence interval for $p$ if $P\big(C_L(X,Y) < p < C_U(X,Y)\big) = 1 - \alpha$, where $\alpha \in (0,1)$, $x$ and $y$ are observed values of the random sample $X$ and $Y$, and $c_L(x,y)$ and $c_U(x,y)$ are functions of $x$ and $y$.

Note that in Definition 1 the statistics $C_L(X,Y)$ and $C_U(X,Y)$ are stochastic because they are functions of the random variables $X$ and $Y$. On the other hand, $c_L(x,y)$ and $c_U(x,y)$ are the observed values of these statistics in the case that the observed value of $X$ is $x$ and the observed value of $Y$ is $y$.

  In  this  context,  confidence  intervals  can  be  constructed  using  Theorem  1.    

 

Theorem  1.  

Let $\hat{c}$ be the OLS estimate of the coefficient of a dummy variable $D$ in a loglinear regression model as specified in (2.1), and $\hat{V}(\hat{c})$ the OLS estimate of its variance. An equal-tailed, two-sided $1-\alpha$ confidence interval of the relative change $p$ in $Y$, due to $D$ changing from 0 to 1, is given by $(c_L(x,y),\, c_U(x,y))$ with

$$c_L(x,y) = \exp\left\{\hat{c} - \sqrt{\hat{V}(\hat{c})}\; t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1, \qquad c_U(x,y) = \exp\left\{\hat{c} + \sqrt{\hat{V}(\hat{c})}\; t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1, \qquad (3.1)$$

where $n$ is the sample size, $k$ is the number of regressors, and $t_{\frac{1}{2}\alpha,\, n-k}$ is the value of the $t$ distribution with $n-k$ degrees of freedom such that the probability to the right of it is $\frac{1}{2}\alpha$.

Proof.

Using (3.1) in Definition 1 gives

$$P\big(C_L(X,Y) < p < C_U(X,Y)\big)$$
$$= P\left(\exp\left\{\hat{c} - \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1 \;<\; p \;<\; \exp\left\{\hat{c} + \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1\right)$$
$$= P\left(\hat{c} - \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k} \;<\; \log(p+1) \;<\; \hat{c} + \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k}\right)$$
$$= P\left(\hat{c} - \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k} \;<\; c \;<\; \hat{c} + \sqrt{\hat{V}(\hat{c})}\, t_{\frac{1}{2}\alpha,\, n-k}\right)$$
$$= P\left(-t_{\frac{1}{2}\alpha,\, n-k} \;<\; \frac{\hat{c} - c}{\sqrt{\hat{V}(\hat{c})}} \;<\; t_{\frac{1}{2}\alpha,\, n-k}\right) = 1 - \alpha,$$

where the equality replacing $\log(p+1)$ by $c$ follows from (2.2), and the last equality follows from the fact that $\frac{\hat{c} - c}{\sqrt{\hat{V}(\hat{c})}} \sim t(n-k)$. The equality of the tails follows from the symmetry of the problem. This completes the proof.
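To make the interval in Theorem 1 concrete, the following minimal MATLAB sketch evaluates (3.1) for hypothetical values of the OLS estimate, its estimated variance, the sample size and the number of regressors.

chat = 0.2231; vchat = 0.05;           % hypothetical OLS estimate and estimated variance
n = 50; k = 4; alpha = 0.05;           % hypothetical sample size, number of regressors, level
tv = tinv(1 - alpha/2, n - k);         % upper t-value with n-k degrees of freedom
cl = exp(chat - tv*sqrt(vchat)) - 1;   % lower confidence limit in (3.1)
cu = exp(chat + tv*sqrt(vchat)) - 1;   % upper confidence limit in (3.1)
disp([cl cu])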

4. Monte  Carlo  Simulation  

Monte  Carlo  simulation  is  used  to  examine  small  sample  properties  of  the  exact   confidence  intervals  (3.1)  and  their  normal  approximation  (2.6).  All  simulations   are  done  using  Matlab  and  can  be  found  in  appendix  B.  The  simulations  are   based  on  the  model:  

$$\log Y_i = a + b_1 X_{1i} + b_2 X_{2i} + c D_i + \varepsilon_i, \qquad \text{where } \varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2).$$

To cancel out the effect of specific values of the regressors, $X_1$ and $X_2$ are regenerated in each replication as standard normal variables. The first $\frac{1}{2}n$ values of the variable $D$ are set equal to 1 and the remaining $\frac{1}{2}n$ values are set equal to zero. The initial values of the model parameters are set to $a = 1$, $b_1 = b_2 = 0.2$, $c = 0.2231$ and $\sigma^2 = 0.25$, so that the percentage effect of $X_1$ and $X_2$ equals 20% and the percentage effect of $D$ equals $p = \exp\{c\} - 1 = 25\%$. Using (2.6) and (3.1), equal-tailed 95% confidence intervals are calculated for different sample sizes, using 10000 replications for each sample size.

Results  

In Table 1, the average values of the confidence limits are reported for both methods, together with their coverage probabilities. From Table 1, it is clear that the normal approximated confidence intervals are much narrower than the exact confidence intervals, especially for small sample sizes. Furthermore, the normal approximated intervals are shifted downwards compared to the exact intervals. As a result of these two effects, the coverage probability of the normal approximated intervals falls below the nominal level of 0.95; this is especially the case for small sample sizes. As the sample size increases, the normal approximation approaches the exact intervals, and thereby the coverage probability increases. This result is illustrated by Figures 2 and 3, where the confidence limits of the two methods and the coverage probabilities are plotted for different sample sizes. From these results, it is clear that normal approximated confidence intervals can be very misleading when sample sizes are small.

Table 1. Average confidence intervals and coverage probabilities for different sample sizes

           Exact confidence intervals        Normal approximation
n          cl        cu        CP            cl        cu        CP
10         -0.416    2.380     0.950         -0.574    1.069     0.858
20         -0.214    1.118     0.952         -0.314    0.812     0.912
30         -0.134    0.877     0.951         -0.206    0.706     0.926
40         -0.087    0.758     0.949         -0.143    0.642     0.930
50         -0.054    0.687     0.950         -0.100    0.600     0.934
60         -0.029    0.640     0.951         -0.069    0.570     0.939
70         -0.010    0.604     0.950         -0.045    0.545     0.940
80          0.005    0.575     0.949         -0.026    0.525     0.940
90          0.018    0.553     0.952         -0.010    0.510     0.944
100         0.029    0.535     0.950          0.004    0.496     0.943
150         0.067    0.475     0.951          0.050    0.451     0.946
250         0.106    0.420     0.949          0.095    0.406     0.947
500         0.146    0.366     0.949          0.140    0.359     0.948
1000        0.175    0.331     0.950          0.173    0.328     0.949
5000        0.216    0.285     0.950          0.215    0.285     0.950
10000       0.226    0.275     0.949          0.226    0.275     0.949

Figure 2: Confidence intervals of $p$ for different sample sizes (exact limits cl and cu and normal approximated limits Ncl and Ncu plotted against sample size)


 

Figure 3: Coverage probabilities of both confidence intervals for different sample sizes (exact CP and normal approximation CP plotted against sample size)

5. Heteroskedastic  Error  Terms  

Although Theorem 1 gives an exact method to construct confidence intervals, it rests on quite strong assumptions about the data generating process. In particular, the assumption of homoscedastic error terms is often violated in practice. In this section, the implications of heteroskedastic error terms are explored.

First, consider the model

$$y = \log Y = Xb + \varepsilon,$$

where $X$ contains the continuous and dummy variables, $b$ is the corresponding coefficient vector, and $\varepsilon \sim N(0, \Omega)$ with known positive definite diagonal matrix $\Omega$. The model can now be estimated using GLS: it is transformed by multiplying both sides of the model equation by $\Omega^{-\frac{1}{2}}$, which gives

$$\Omega^{-\frac{1}{2}}y = \Omega^{-\frac{1}{2}}Xb + \Omega^{-\frac{1}{2}}\varepsilon,$$

where $\Omega^{-\frac{1}{2}}\varepsilon \sim N(0, I_n)$. In this case, the estimated coefficients are $\hat{b}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$ with variance $\mathrm{Var}(\hat{b}_{GLS}) = (X'\Omega^{-1}X)^{-1}$. Since these estimates result from a linear regression equation, the t-statistic follows a t-distribution with $n-k$ degrees of freedom. Therefore, Theorem 1 can be applied using the GLS estimates. This results in the following expressions for the confidence limits:

$$c_L(x,y) = \exp\left\{\hat{c}_{GLS} - \sqrt{\hat{V}(\hat{c}_{GLS})}\; t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1, \qquad c_U(x,y) = \exp\left\{\hat{c}_{GLS} + \sqrt{\hat{V}(\hat{c}_{GLS})}\; t_{\frac{1}{2}\alpha,\, n-k}\right\} - 1. \qquad (5.1)$$
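As a minimal illustration of (5.1) with a known $\Omega$, the following MATLAB sketch uses a small simulated data set; all values and variable names are hypothetical and chosen only for illustration.

n = 50; k = 4; alpha = 0.05;                              % hypothetical sample size, regressors, level
x = [ones(n,1) randn(n,2) [ones(n/2,1); zeros(n/2,1)]];   % constant, two continuous regressors, dummy
omega = diag(0.25*x(:,k) + 0.64*(1 - x(:,k)));            % assumed known diagonal error covariance
logy = x*[1; 0.2; 0.2; 0.2231] + sqrt(diag(omega)).*randn(n,1); % simulated log dependent variable
bgls = (x'/omega*x)\(x'/omega*logy);                      % GLS coefficient estimates
vgls = inv(x'/omega*x);                                   % their covariance matrix
tv = tinv(1 - alpha/2, n - k);                            % upper t-value with n-k degrees of freedom
cl = exp(bgls(k) - tv*sqrt(vgls(k,k))) - 1;               % lower confidence limit in (5.1)
cu = exp(bgls(k) + tv*sqrt(vgls(k,k))) - 1;               % upper confidence limit in (5.1)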

However, in practice the structure of $\Omega$ is unknown, and consequently (5.1) cannot be used to construct confidence intervals. In the remainder of this section, two methods to solve this issue are discussed. The first method uses an estimate of the matrix $\Omega$ in combination with the results derived above. The second method is based on a heteroskedasticity-consistent estimator of the variance of the dummy coefficient.

The first method uses two-step FGLS to estimate the covariance matrix $\Omega$, by specifying the following model for $\sigma_i^2$:

$$\sigma_i^2 = \exp\{z_i'\gamma\},$$

where $z_i = (1, z_{2i}, \ldots, z_{pi})'$ is a vector of explanatory variables (Heij et al., 2004, p. 337). The exponential transformation is used to guarantee that the estimated values of $\sigma_i^2$ are positive. In the first step, OLS is applied to the model $\log Y = Xb + \varepsilon$. If $\hat{b}$ is consistent, the squared residuals $e_i^2$ of this regression are asymptotically unbiased estimates of $\sigma_i^2$. Therefore, in the second step, the following model is estimated to find the values of $\gamma$:

$$\log e_i^2 = z_i'\gamma + \eta_i.$$

The coefficients $\gamma_i$ are estimated consistently for $i = 2, \ldots, p$, but $\gamma_1$ should be estimated using $\hat{\gamma}_1 - E[\log \chi^2(1)] = \hat{\gamma}_1 + 1.27$, as Heij et al. (2004, p. 337) point out. Finally, $\Omega$ is estimated by $\hat{\sigma}_i^2 = \exp\{z_i'\hat{\gamma}\}$. Using this estimate of $\Omega$, (5.1) can be used to construct confidence intervals. The normal approximation (2.6) can be calculated by using the FGLS estimate of the coefficient and the estimated covariance matrix in (2.3) and (2.5).
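A minimal sketch of this two-step procedure, continuing from the hypothetical x and logy defined in the sketch below (5.1) and assuming, as in the simulations, that the error variance depends only on the dummy, so that $z_i = (1, D_i)'$:

b0 = x\logy;                          % step 1: OLS on the untransformed model
e = logy - x*b0;                      % OLS residuals
z = [ones(size(x,1),1) x(:,end)];     % variance regressors: constant and dummy
gam = z\log(e.^2);                    % step 2: regress log squared residuals on z
gam(1) = gam(1) + 1.27;               % intercept correction for E[log chi^2(1)]
omegahat = diag(exp(z*gam));          % estimated covariance matrix Omega
% omegahat can now replace the known Omega in the GLS formulas of (5.1)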

The second method uses White standard errors with a correction for small samples to estimate the variance of the coefficient estimator. This is a method to obtain heteroskedasticity-consistent estimators of the variance of $\hat{b}$. The biggest advantage of this method is that it can be used when no model for the variance is known. Instead of GLS, OLS is used to estimate the model parameters, and the variance of the coefficients is estimated using

$$\widehat{\mathrm{var}}(\hat{b}) = (X'X)^{-1}X'\,\mathrm{diag}\!\left(\frac{e_i^2}{1-h_{ii}}\right)X\,(X'X)^{-1}, \qquad (5.2)$$

where $h_{ii}$ is the $i$-th diagonal element of $H = X(X'X)^{-1}X'$. This estimator of the variance can be used in (5.1), together with the OLS estimate of the dummy coefficient, to construct confidence intervals. The normal approximation (2.6) can be calculated by using the OLS coefficient together with (5.2) in (2.3) and (2.5).
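A minimal sketch of the White-based interval, again continuing from the hypothetical x, logy and tv defined above; the adjustment $e_i^2/(1-h_{ii})$ corresponds to (5.2).

bols = x\logy;                                        % OLS coefficient estimates
e = logy - x*bols;                                    % OLS residuals
H = x*((x'*x)\x');                                    % hat matrix H = X(X'X)^(-1)X'
vb = (x'*x)\(x'*diag(e.^2./(1-diag(H)))*x)/(x'*x);    % HC variance estimator (5.2)
cl = exp(bols(end) - tv*sqrt(vb(end,end))) - 1;       % lower limit, as in (5.1) with OLS inputs
cu = exp(bols(end) + tv*sqrt(vb(end,end))) - 1;       % upper limit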

Monte  Carlo  simulation  

Monte Carlo simulation is used to compare the techniques developed in the previous section. The simulation is based on the same model as before, but now with $\varepsilon_i \sim N(0,\ \sigma_1^2 D_i + \sigma_2^2(1 - D_i))$, so the variance of $\varepsilon_i$ depends on the value of $D_i$. This specification of the heteroskedasticity is used because it is logical to assume that the variance differs between groups within the sample. Again, to cancel out the effect of specific values of the regressors, $X_1$ and $X_2$ are regenerated in each replication as standard normal variables. The first $\frac{1}{2}n$ values of the variable $D$ are set equal to 1 and the remaining $\frac{1}{2}n$ values are set equal to zero. The initial values of the model parameters are set to $a = 1$, $b_1 = b_2 = 0.2$, $c = 0.2231$, $\sigma_1^2 = 0.25$ and $\sigma_2^2 = 0.64$, so that the percentage effect of $X_1$ and $X_2$ equals 20% and the percentage effect of $D$ equals $p = \exp\{c\} - 1 = 25\%$. Using both techniques, equal-tailed 95% confidence intervals are calculated for different sample sizes, with 10000 replications for each sample size. In the GLS method, the covariance matrix is estimated using the dummy variable and a constant as regressors, so $z_i = (1, D_i)'$.

Results  

In Table 2, the average confidence intervals and coverage probabilities for both methods and their normal approximations are reported. It is clear that both methods significantly outperform their normal approximations. Again, this effect vanishes as the sample size increases. Furthermore, the coverage probability of the White method is larger than that of the GLS method for all sample sizes, with the difference being smaller for larger samples. This is illustrated by Figure 4, where the coverage probability of both methods is plotted for different sample sizes.


Table 2: Confidence intervals and coverage probabilities for $\sigma_1^2 = 0.25$ and $\sigma_2^2 = 0.64$

        White                       Normal approx. White        GLS                         Normal approx. GLS
n       cl       cu      CP         cl       cu      CP         cl       cu      CP         cl       cu      CP
10      -0.502   4.154   0.935      -0.769   1.288   0.814      -0.459   3.632   0.911      -0.695   1.256   0.795
20      -0.316   1.526   0.944      -0.486   0.961   0.886      -0.298   1.544   0.927      -0.467   0.974   0.874
30      -0.235   1.149   0.948      -0.360   0.834   0.902      -0.222   1.147   0.940      -0.346   0.835   0.903
50      -0.137   0.869   0.946      -0.218   0.709   0.917      -0.132   0.868   0.943      -0.213   0.708   0.921
100     -0.033   0.648   0.947      -0.077   0.579   0.935      -0.030   0.651   0.947      -0.074   0.581   0.936
250      0.062   0.482   0.952       0.044   0.457   0.946       0.062   0.481   0.950       0.043   0.456   0.947
500      0.113   0.408   0.949       0.103   0.396   0.947       0.116   0.412   0.949       0.106   0.400   0.948

Figure 4: Confidence intervals and coverage probabilities for $\sigma_1^2 = 0.25$ and $\sigma_2^2 = 0.64$ (White CP and GLS CP plotted against sample size)

Figure 5: Coverage probabilities for $\sigma_1^2 = 0.1$ and $\sigma_2^2 = 0.9$ (White CP and GLS CP plotted against sample size)


To test the effect of the magnitude of the heteroskedasticity, the simulation experiment is repeated with $\sigma_1^2 = 0.1$ and $\sigma_2^2 = 0.9$. The results are summarized in Table 3. Again, the coverage probability of the White confidence intervals is higher than that of the GLS confidence intervals. Furthermore, the normal approximations are unreliable in terms of coverage probability. These results indicate that the White confidence intervals perform better for different magnitudes of heteroskedasticity.

Table 3: Coverage probabilities for $\sigma_1^2 = 0.1$ and $\sigma_2^2 = 0.9$

        Exact White                 Normal approx. White        Exact GLS                   Normal approx. GLS
n       cl       cu      CP         cl       cu      CP         cl       cu      CP         cl       cu      CP
10      -0.515   4.995   0.928      -0.803   1.309   0.795      -0.468   3.774   0.902      -0.713   1.259   0.776
20      -0.327   1.687   0.937      -0.519   1.020   0.879      -0.308   1.623   0.922      -0.488   0.993   0.871
30      -0.244   1.232   0.943      -0.383   0.875   0.906      -0.230   1.216   0.932      -0.365   0.866   0.899
50      -0.150   0.922   0.947      -0.241   0.740   0.925      -0.141   0.920   0.937      -0.230   0.739   0.917
100     -0.047   0.677   0.947      -0.096   0.598   0.937      -0.045   0.673   0.943      -0.094   0.595   0.930
250      0.053   0.498   0.949       0.031   0.469   0.942       0.052   0.494   0.946       0.031   0.466   0.941
500      0.106   0.418   0.950       0.095   0.404   0.948       0.107   0.419   0.949       0.096   0.405   0.948

6. Discussion

Loglinear regression models are often used to model percentage effects in economic relations. For continuous variables, the interpretation of the estimated coefficients follows from differentiating the model with respect to the corresponding variable. Due to their binary character, the interpretation of dummy variables is not as straightforward, since no continuous derivative with respect to a dummy exists. In recent studies, unbiased and approximately unbiased estimators of the percentage effect of dummy variables have been developed and tested, together with unbiased and approximately unbiased estimators of their variance. In their research, van Garderen and Shah (2002) argue that the estimator provided by Kennedy (1981) can safely be used to estimate the size of a percentage effect. Furthermore, they derive a convenient approximately unbiased estimator of its variance. Although providing point estimates and measures of spread, none of the recent studies gives an exact method for the construction of confidence intervals for the percentage effect of dummy variables.

In this paper, an exact method to construct confidence intervals under perfect model assumptions is developed. Furthermore, two possible adjustments that can be made in the case of heteroskedastic error terms are discussed: a method based on HC estimates of the variance of the coefficient and a method based on two-step FGLS. Using Monte Carlo simulation, all methods are tested together with normal approximations based on Kennedy's (1981) estimator of the percentage effect and van Garderen and Shah's (2002) approximately unbiased estimator of its variance.

From the simulation experiment, it is clear that small sample confidence intervals based on the normal approximation can be misleading for two reasons: they are shifted to the left and they are much smaller than the exact intervals. Therefore, under classical model assumptions, the exact method should be preferred over the normal approximation when sample sizes are small.

In the case of heteroskedastic error terms, the method based on heteroskedasticity-consistent estimates of the variance of the dummy coefficient outperforms the method based on FGLS results in terms of coverage probability, for all sample sizes and different magnitudes of heteroskedasticity. These results are counterintuitive, since the FGLS method uses more information about the data generating process. However, one should be cautious in drawing conclusions from these seemingly strong results, for the following reason. In the first step of the FGLS estimation, the model of the variance is estimated by replacing $\sigma_i^2$ by $e_i^2$, because $e_i^2$ is an asymptotically unbiased estimate of $\sigma_i^2$. Nevertheless, it is known that under classical model assumptions $E[e_i^2] = \sigma^2(1 - h_{ii})$, where $h_{ii}$ is the $i$-th diagonal element of $H = X(X'X)^{-1}X'$. As a result, a small sample correction factor might be needed when estimating the value of $\sigma_i^2$. Therefore, more research is needed before conclusions about the best response to heteroskedastic error terms can be drawn.

                         


7. References  

Bain, L. J., & Engelhardt, M. (1992). Introduction to probability and mathematical statistics (Vol. 4). Belmont, CA: Duxbury Press.

van Garderen, K. J., & Shah, C. (2002). Exact interpretation of dummy variables in semilogarithmic equations. The Econometrics Journal, 5(1), 149-159.

Giles, D. E. (2011). Interpreting dummy variables in semi-logarithmic regression models: Exact distributional results (Working Paper No. 1101). Department of Economics, University of Victoria.

Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric methods with applications in business and economics (Vol. 5). Oxford University Press.

Immergluck, D. (2008). Out of the goodness of their hearts? Regulatory and regional impacts on bank investment in housing and community development in the United States. Journal of Urban Affairs, 30(1), 1-20.

Kennedy, P. E. (1981). Estimation with correctly interpreted dummy variables in semilogarithmic equations [The interpretation of dummy variables in semilogarithmic equations]. American Economic Review, 71(4).

                           


Appendix  A  Hypergeometric  functions  

The hypergeometric series ${}_pF_q(a_1,\ldots,a_p;\, b_1,\ldots,b_q;\, z)$ is defined as

$${}_pF_q(a_1,\ldots,a_p;\, b_1,\ldots,b_q;\, z) = \sum_{n=0}^{\infty}\frac{(a_1)_n\cdots(a_p)_n}{(b_1)_n\cdots(b_q)_n}\,\frac{z^n}{n!},$$

where $(x)_i$ represents the Pochhammer symbol, defined as $(x)_0 = 1$ and $(x)_i = x(x+1)\cdots(x+i-1)$.

Tricomi's confluent hypergeometric function, used in (2.7), can be defined in terms of hypergeometric series by

$$U(a, b, z) = \frac{\pi}{\sin(\pi b)}\left[\frac{{}_1F_1(a;\, b;\, z)}{\Gamma(1+a-b)\,\Gamma(b)} - z^{1-b}\,\frac{{}_1F_1(a-b+1;\, 2-b;\, z)}{\Gamma(a)\,\Gamma(2-b)}\right].$$
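As a rough illustration of how such series can be evaluated numerically, the helper below is a hypothetical sketch (not used in the simulations); it assumes the truncated sum converges quickly for the arguments of interest and should be saved as phyperg.m.

function f = phyperg(a, b, z, nterms)
% Truncated generalized hypergeometric series pFq(a; b; z): a and b are
% (possibly empty) vectors of upper and lower parameters, z a scalar.
f = 0; term = 1;
for n = 0:nterms
    f = f + term;
    term = term*prod(a + n)/prod(b + n)*z/(n + 1);   % ratio of consecutive series terms
end
end
% e.g. phyperg([], m, -0.5*m*vchat, 60) approximates the 0F1 factor in (2.4)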

Appendix  B  Matlab  computations  

Matlab code Monte Carlo simulation homoscedastic error terms

tic % start time measure
% Set initial values for the main parameters:
nobs = 10; nvar = 4; nreps = 10000; alpha = 0.05; p = 0.25;
sigma2 = 1/2; % error standard deviation (so the error variance is 0.25)
% Set initial values for the model elements:
b = 2/10*ones(nvar,1); % true betas = 0.2
b(nvar) = log(1+p); % dummy coefficient equals log(1+p)
b(1) = 1; % first beta (constant) equals 1
normv = norminv(1-1/2*alpha,0,1); % upper normal value for confidence intervals
tv = tinv(1-1/2*alpha,nobs-nvar); % upper t-value such that P(t > tv) = 1/2*alpha
% Create storage for the used arrays:
kenp = ones(nreps,1); % space for the Kennedy estimator
bout = zeros(nvar,nreps); % storage for coefficient estimates
y = zeros(nobs,1); % storage for y values
cl = zeros(nreps,1); cu = zeros(nreps,1); ncu = zeros(nreps,1); ncl = zeros(nreps,1); % storage for confidence limits
sum1 = 0; % initialize coverage counters
sum2 = 0;
% Loop in which one random sample is created and confidence intervals are constructed
for i = 1:nreps
    x = [ones(nobs,1) randn(nobs,nvar-1)]; % random x for every replication
    x(:,nvar) = vertcat(ones(nobs/2,1),zeros(nobs/2,1)); % replace last column of x by the dummy variable
    evec = sigma2*randn(nobs,1); % random sample of errors
    y = exp(x*b + evec); % generate true values of y
    logy = log(y); % transform y
    bout(:,i) = lscov(x,logy); % save OLS estimators of the betas
    e = logy - x*bout(:,i); % calculate residuals
    s2 = (transpose(e)*e)/(nobs-nvar); % calculate s^2
    varbout = inv(transpose(x)*x)*s2; % estimated variance of bout
    vardum = varbout(nvar,nvar); % variance of the estimated dummy coefficient
    c = bout(nvar,i); % dummy coefficient
    % Exact confidence intervals
    cl(i) = exp(c-tv*sqrt(vardum))-1; cu(i) = exp(c+tv*sqrt(vardum))-1;
    % count the number of times the confidence interval contains the true value of p
    if cl(i) < p && cu(i) > p
        sum1 = sum1 + 1;
    end
    % Normal approximation of the confidence intervals
    kenp(i) = exp(bout(nvar,i)-1/2*vardum)-1; % Kennedy estimator of p
    varapp = exp(2*c)*(exp(-vardum)-exp(-2*vardum)); % approximately unbiased estimator of the variance of kenp
    ncl(i) = kenp(i) - normv*sqrt(varapp); ncu(i) = kenp(i) + normv*sqrt(varapp);
    % count the number of times the approximated confidence interval contains the true value of p
    if ncl(i) < p && ncu(i) > p
        sum2 = sum2 + 1;
    end
end
conf = [mean(cl) mean(cu)]; % average exact confidence intervals
nconf = [mean(ncl) mean(ncu)]; % average approximated confidence intervals
perc = sum1/nreps; % coverage probability of the exact intervals
nperc = sum2/nreps; % coverage probability of the approximated intervals
Eind = [conf perc nconf nperc]; disp(Eind)
toc % end timing

 

Matlab code Monte Carlo simulation heteroskedastic error terms using GLS

tic; % start time measure
% Set initial values for the main parameters:
nobs = 500; nvar = 4; nreps = 10000; alpha = 0.05; p = 0.25;
sigma1 = 0.3162; sigma2 = 0.9487; % error standard deviations for the two groups
% Set initial values for the model elements:
b = 2/10*ones(nvar,1); % true betas = 0.2
b(nvar) = log(1+p); % dummy coefficient equals log(1+p)
b(1) = 1; % first beta (constant) equals 1
normv = norminv(1-1/2*alpha,0,1); % upper normal value
tv = tinv(1-1/2*alpha,nobs-nvar); % upper t-value such that P(t > tv) = 1/2*alpha
% Create storage for the used arrays:
kenp = ones(nreps,1); % space for the Kennedy estimator
bout = zeros(nvar,nreps); % storage for OLS estimates
boutgls = zeros(nvar,nreps); % storage for the GLS estimators
y = zeros(nobs,1); % storage for y values
cl = zeros(nreps,1); cu = zeros(nreps,1); ncu = zeros(nreps,1); ncl = zeros(nreps,1); % storage for confidence limits
sum1 = 0; % initialize coverage counters
sum2 = 0;
for i = 1:nreps
    x = [ones(nobs,1) 1/2*randn(nobs,nvar-1)]; % random x for every replication
    x(:,nvar) = vertcat(ones(nobs/2,1),zeros(nobs/2,1)); % replace last column of x by the dummy variable
    evec = (sigma1*x(:,nvar)+sigma2*(1-x(:,nvar))).*randn(nobs,1); % heteroskedastic random sample of errors
    y = exp(x*b + evec); % generate true values of y
    logy = log(y); % transform y
    % FGLS
    bout(:,i) = lscov(x,logy); % save OLS estimators of the betas
    e1 = logy - x*bout(:,i); % calculate OLS residuals
    z = [ones(nobs,1) x(:,nvar)]; % variance regressors: constant and dummy
    gam = lscov(z,log(e1.*e1)); % estimate the variance model
    gam(1,1) = gam(1,1) + 1.27; % intercept correction for E[log chi^2(1)]
    omega = diag(exp(z*gam)); % estimate the Omega matrix
    boutgls(:,i) = (transpose(x)*inv(omega)*x)\(transpose(x)*inv(omega)*logy); % estimate GLS coefficients
    egls = logy - x*boutgls(:,i); % GLS residuals
    s2 = (transpose(egls)*egls)/(nobs-nvar); % residual variance (not used further)
    varboutgls = inv(transpose(x)*inv(omega)*x); % estimated variance of boutgls
    vardum = varboutgls(nvar,nvar); % variance of the estimated dummy coefficient
    % calculate the confidence intervals
    c = boutgls(nvar,i); % GLS estimate of the dummy coefficient
    cl(i) = exp(c-tv*sqrt(vardum))-1; cu(i) = exp(c+tv*sqrt(vardum))-1;
    % count the number of times the confidence interval contains the true value of p
    if cl(i) < p && cu(i) > p
        sum1 = sum1 + 1;
    end
    % normal approximation of the confidence intervals
    kenp(i) = exp(boutgls(nvar,i)-1/2*vardum)-1; % Kennedy estimator of p
    varapp = exp(2*c)*(exp(-vardum)-exp(-2*vardum)); % approximately unbiased estimator of the variance of kenp
    ncl(i) = kenp(i) - normv*sqrt(varapp); ncu(i) = kenp(i) + normv*sqrt(varapp);
    % count the number of times the approximated confidence interval contains the true value of p
    if ncl(i) < p && ncu(i) > p
        sum2 = sum2 + 1;
    end
end
conf = [mean(cl) mean(cu)]; % average confidence intervals
nconf = [mean(ncl) mean(ncu)]; % average approximated confidence intervals
perc = sum1/nreps; % coverage probability of the exact intervals
nperc = sum2/nreps; % coverage probability of the approximated intervals
Eind = [nobs conf perc nconf nperc]; disp(Eind)
toc; % end timing

Matlab code Monte Carlo simulation heteroskedastic error terms using HAC estimators

tic; % start timing
% Set initial values for the main parameters:
nobs = 500; nvar = 4; nreps = 10000; alpha = 0.05; p = 0.25;
sigma1 = 0.3162; sigma2 = 0.9487; % error standard deviations for the two groups
hac = 2; % exponent used in the small sample correction of the squared residuals
% Set initial values for the model elements:
b = 2/10*ones(nvar,1); % true betas = 0.2
b(nvar) = log(1+p); % dummy coefficient equals log(1+p)
b(1) = 1; % first beta (constant) equals 1
% Create storage for the used arrays:
kenp = ones(nreps,1); % space for the Kennedy estimator
bout = zeros(nvar,nreps); % storage for OLS estimates
y = zeros(nobs,1); % storage for y values
cl = zeros(nreps,1); cu = zeros(nreps,1); ncu = zeros(nreps,1); ncl = zeros(nreps,1); % storage for confidence limits
hc = zeros(nobs,1);
normv = norminv(1-1/2*alpha,0,1); % upper normal value
tv = tinv(1-1/2*alpha,nobs-nvar); % upper t-value such that P(t > tv) = 1/2*alpha
sum1 = 0; % initialize coverage counters
sum2 = 0;
for i = 1:nreps
    x = [ones(nobs,1) 1/2*randn(nobs,nvar-1)]; % random x for every replication
    x(:,nvar) = vertcat(ones(nobs/2,1),zeros(nobs/2,1)); % replace last column of x by the dummy variable
    evec = (sigma1*x(:,nvar)+sigma2*(1-x(:,nvar))).*randn(nobs,1); % heteroskedastic random sample of errors
    y = exp(x*b + evec); % generate true values of y
    logy = log(y); % transform y
    % OLS
    bout(:,i) = lscov(x,logy); % save OLS estimators of the betas
    e = logy - x*bout(:,i); % calculate residuals
    H = x*inv(transpose(x)*x)*transpose(x); % hat matrix
    % estimate the variance of bout
    e2 = e.*e./((1-diag(H)).^(hac-1)); % adjusted squared residuals, as in (5.2)
    varbout = inv(transpose(x)*x)*transpose(x)*diag(e2)*x*inv(transpose(x)*x); % White estimate of the variance of bout
    vardum = varbout(nvar,nvar); % variance of the estimated dummy coefficient
    % calculate the confidence intervals
    c = bout(nvar,i); % OLS estimate of the dummy coefficient
    cl(i) = exp(c-tv*sqrt(vardum))-1; cu(i) = exp(c+tv*sqrt(vardum))-1;
    % count the number of times the confidence interval contains the true value of p
    if cl(i) < p && cu(i) > p
        sum1 = sum1 + 1;
    end
    kenp(i) = exp(bout(nvar,i)-1/2*vardum)-1; % Kennedy estimator of p
    varapp = exp(2*c)*(exp(-vardum)-exp(-2*vardum)); % approximately unbiased estimator of the variance of kenp
    % normal approximation of the confidence interval of p
    ncl(i) = kenp(i) - normv*sqrt(varapp); ncu(i) = kenp(i) + normv*sqrt(varapp);
    % count the number of times the approximated confidence interval contains the true value of p
    if ncl(i) < p && ncu(i) > p
        sum2 = sum2 + 1;
    end
end
conf = [mean(cl) mean(cu)]; % average confidence intervals
nconf = [mean(ncl) mean(ncu)]; % average approximated confidence intervals
perc = sum1/nreps; % coverage probability of the exact intervals
nperc = sum2/nreps; % coverage probability of the approximated intervals
Eind = [nobs conf perc nconf nperc]; disp(Eind)
toc; % end timing

 

