
Strengthening methods of diagnostic accuracy studies

Ochodo, E.A.

Publication date: 2014

Citation for published version (APA):
Ochodo, E. A. (2014). Strengthening methods of diagnostic accuracy studies. Boxpress.


Chapter 1

Overinterpretation and misreporting of diagnostic accuracy studies: Evidence of "Spin"

Eleanor A. Ochodo, Margriet C. de Haan, Johannes B. Reitsma, Lotty Hooft, Patrick M. Bossuyt, Mariska M.G. Leeflang

Radiology. 2013; 267(2): 581–8

Abstract

Purpose: To estimate the frequency of distorted presentation and overinterpretation of results in diagnostic accuracy studies.

Materials and Methods: MEDLINE was searched for diagnostic accuracy studies published between January and June 2010 in journals with an impact factor of 4 or higher. Articles included were primary studies of the accuracy of one or more tests in which the results were compared with a clinical reference standard. Two authors scored each article independently by using a pretested data-extraction form to identify actual overinterpretation and practices that facilitate overinterpretation, such as incomplete reporting of study methods or the use of inappropriate methods (potential overinterpretation). The frequency of overinterpretation was estimated in all studies and in a subgroup of imaging studies.

Results: Of the 126 articles, 39 (31%; 95% confidence interval [CI]: 23%, 39%) contained a form of actual overinterpretation, including 29 (23%; 95% CI: 16%, 30%) with an overly optimistic abstract, 10 (8%; 95% CI: 3%, 13%) with a discrepancy between the study aim and conclusion, and eight with conclusions based on selected subgroups. In our analysis of potential overinterpretation, authors of 89% (95% CI: 83%, 94%) of the studies did not include a sample size calculation, 88% (95% CI: 82%, 94%) did not state a test hypothesis, and 57% (95% CI: 48%, 66%) did not report CIs of accuracy measurements. In 43% (95% CI: 34%, 52%) of studies, authors were unclear about the intended role of the test, and in 3% (95% CI: 0%, 6%) they used inappropriate statistical tests. A subgroup analysis of imaging studies showed that 16 (30%; 95% CI: 17%, 43%) and 53 (100%; 95% CI: 92%, 100%) contained forms of actual and potential overinterpretation, respectively.

 


1.1  Introduction  

Reporting that distorts or misrepresents study results in order to make the interventions look favourable may lead to overinterpretation of study results. Such reporting is also referred to as 'spin' (1). Authors may overinterpret scientific reports by using exaggerated language, by presenting an abstract that is more optimistic than the main text, or by drawing favourable conclusions from results of selected subgroups (2,3). Overinterpretation may also be introduced by methodological shortcomings, such as failure to specify a study hypothesis, not making a sample size calculation, or using statistical tests that produce desirable results (1,3–6). These forms of misleading representation of the results of scientific research may compromise decision making in health care and thus the well-being of patients.

 

Overinterpretation has been shown to be common in randomized controlled trials. Boutron and colleagues identified overinterpretation in reports of randomized clinical trials with a clearly identified primary outcome showing statistically non-significant results: more than 40% of the reports had distorted interpretation in at least two sections of the main text (7).

 

Overinterpretation may also play a role in diagnostic accuracy studies. Such studies evaluate the ability of a test or marker to correctly identify those with the target condition. The clinical use of tests based on inflated conclusions may trigger physicians to make incorrect clinical decisions, thereby compromising patient safety. Exaggerated conclusions could also lead to unnecessary testing and avoidable health care costs (8). The purpose of this study was to estimate the frequency of distorted presentation and overinterpretation of results ("spin") in diagnostic accuracy studies.
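To make concrete what such studies estimate, the sketch below computes the two basic accuracy measures from an invented 2×2 table of index test results against a clinical reference standard; all counts are illustrative assumptions, not data from any study discussed here.

```python
# Illustrative 2x2 table of index test results vs. a clinical reference
# standard (invented counts). Diagnostic accuracy studies estimate
# measures such as sensitivity and specificity from tables like this.
tp, fn = 90, 10    # patients WITH the target condition: test positive / negative
fp, tn = 30, 120   # patients WITHOUT the target condition: test positive / negative

sensitivity = tp / (tp + fn)  # proportion of diseased correctly identified -> 0.90
specificity = tn / (tn + fp)  # proportion of non-diseased correctly identified -> 0.80
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```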

 

1.2 Materials and Methods

This study was based on a systematic search of the literature for diagnostic accuracy studies and an evaluation of the included study reports.


1.2.1. Literature search

Two authors (E.O., PhD student, 2 years of experience; M.L., assistant professor of clinical epidemiology, 8 years of experience) independently searched MEDLINE in January 2011 for diagnostic accuracy studies published between January and June 2010 in journals with an impact factor of 4 or higher. We focused on these journals because a prior study found that overinterpretation of the clinical applicability of molecular diagnostic tests was more likely in journals with higher impact factors (9). This impact factor cut-off was based on a previously published analysis of accuracy studies (10). We limited our search to retrieve the most recent studies indexed in MEDLINE.

 

The search combined a previously validated search strategy for diagnostic accuracy studies ("sensitivity AND specificity.sh" OR "specificit*.tw" OR "false negative.tw" OR "accuracy.tw", where ".sh" indicates subject heading and ".tw" indicates text word) (11) with a list of 622 International Standard Serial Numbers (ISSNs) of journals with an impact factor of 4 or higher, obtained from the medical library of the University of Amsterdam (see Appendix 1).
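As an illustration only (the chapter gives the filter terms but not the full query syntax), a query along these lines could be assembled programmatically; the field tags are Ovid-like and the ISSN values are placeholders for the 622-entry list in Appendix 1, not the actual search used.

```python
# Hypothetical sketch: combine the validated accuracy filter (11) with an
# ISSN-based journal restriction. Ovid-like field tags; placeholder ISSNs.
accuracy_filter = (
    '("sensitivity and specificity".sh. OR specificit*.tw. '
    'OR "false negative".tw. OR accuracy.tw.)'
)

issns = ["0000-0001", "0000-0002", "0000-0003"]  # placeholders, not the real list
journal_filter = "(" + " OR ".join(f"{issn}.is." for issn in issns) + ")"

query = f"{accuracy_filter} AND {journal_filter}"
print(query)
```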

 

Eligible for inclusion were primary studies that evaluated the diagnostic accuracy of one or more tests against a clinical reference standard. Excluded were non-English studies, animal studies and studies that did not report any accuracy measure.

 

One author (E.O.) identified potentially eligible articles by reading titles and abstracts. To ensure that no articles were missed, a second author (M.L.) independently screened a random sample of 1,000 titles and abstracts drawn from the articles that the search strategy yielded. We aimed to score a random sample of the potentially eligible articles, as outlined in the analysis and results sections. A summary of the search process is outlined in Figure 1.

Figure 1. Flow chart of search results

- 6,978 initial studies identified in MEDLINE
- 6,558 ineligible articles excluded after screening titles & abstracts
- 420 potentially eligible articles
- 140 articles randomly selected from the pool of potentially eligible articles
- 14 articles excluded after reading full texts. Reasons for exclusion (n): not an accuracy study (8); no accuracy measures reported (1); animal study (1); study evaluating laboratory strains (1); studies evaluating analytical sensitivity/specificity (2); article published in 2011 (1)
- 126 articles included in the final analysis

1.2.2. Definition of overinterpretation in diagnostic accuracy studies

Diagnostic accuracy studies vary in study question, design, type of test evaluated and number of tests evaluated (12,13). We aimed to use a definition of overinterpretation based on common features that could apply to a wide range of tests.

 

We defined overinterpretation as any reporting of diagnostic accuracy studies that makes tests look more favorable than the results justify. We further distinguished between actual and potential forms of overinterpretation. We defined actual overinterpretation as explicit overinterpretation of study results, and potential overinterpretation as practices that facilitate overinterpretation, such as incompletely reporting the applied study methods and assumptions or using inappropriate methods. Incomplete reporting of data may hinder objective appraisal of a paper and mislead readers into thinking that tests are favorable (3,4,14).

 

This definition of overinterpretation was based on items extracted from published literature on spin (1–3,5,6), the Standards for Reporting of Diagnostic Accuracy (STARD) (8) and the experience of content experts in the team. We first listed the potential items that could introduce overinterpretation in diagnostic accuracy studies, based on the experience of content experts in the team. We then searched MEDLINE for key literature published on poor reporting and interpretation of scientific reports up to January 2011. We identified key articles on misrepresentation of study findings in randomized controlled trials (7), molecular diagnostic tests (9), and in scientific research generally (1–3,5,6,15,16). From these articles, we extracted a list of potential items that could identify overinterpretation in diagnostic accuracy studies.

We then designed a data collection form containing this list of potential items and pre-tested it on 10 diagnostic accuracy studies published in 2011, which were independently evaluated by five authors. These studies were not included in the final analysis. The process of identifying overinterpretation is outlined in Figure 2.

         

Figure 2. Flow chart of the development of the definition of overinterpretation ("spin") in diagnostic accuracy studies

Potential list of spin items (sources: content experts (C), STARD [4, 8] and published literature):
- Study hypothesis not stated (C) [4, 8]
- Sample size calculation not stated (C) [3, 4, 6, 8]
- Study design: case-control study, non-consecutive or non-random sampling (C) [4, 8, 14]
- Subgroups not pre-specified (C) [18]
- Selective reporting of subgroups (C) [1]
- Thresholds of continuous test not pre-specified (C) [4, 8]
- Confidence intervals not reported (C) [4, 8]
- Favorable recommendations not reflecting reported accuracy measures (C) [14]
- Recommendations based on other test characteristics (e.g. cost, shelf life, storage, ease of use, user acceptability, adverse events) (C)
- Overoptimistic title (C)
- Overinterpretation of p-values* [1, 7]
- Overinterpretation of clinical applicability of tests [14]

Items excluded (reasons):
- Overinterpretation of p-values* (difficult to score, as many diagnostic studies do not report p-values)
- Overinterpretation of clinical applicability of tests (contextual and cannot be objectively scored when evaluating a wide range of tests)

Items excluded after pretesting of the data collection form (reasons):
- Overoptimistic title (diagnostic accuracy articles had neutral ways of writing titles)
- Study design: case-control study, non-consecutive or non-random sampling
- Favorable recommendations not reflecting accuracy measures (difficult to standardize due to the wide range of tests)
- Recommendations based on other test characteristics (difficult to score due to varied reporting)

Extra items included after pretesting the data collection form:
- Role of test under evaluation not stated
- Discrepancy between aim and conclusion in main text

Second list:
- Role of test under evaluation not stated
- Study hypothesis not stated
- Sample size calculation not stated
- Subgroups not pre-specified
- Study conclusions based on selective subgroups
- Thresholds of continuous test not pre-specified
- Confidence intervals not reported
- Discrepancy between aim and conclusion in main text
- Discrepancy between abstract and main text (overoptimistic results in abstract)

Extra items included after data extraction of eligible articles:
- Inappropriate statistical tests (item was rescored by two authors)

Final list:
Active spin:
- Discrepancy between abstract and main text (overoptimistic results in abstract)
- Study conclusions based on selective subgroups
- Discrepancy between aim and conclusion in main text

Passive spin:
- Role of test under evaluation not stated
- Study hypothesis not stated
- Sample size calculation not stated
- Subgroups not pre-specified
- Thresholds of continuous tests not pre-specified
- Confidence intervals not reported
- Inappropriate statistical tests

*In intervention studies, how p-values are interpreted has been used to evaluate overinterpretation (7). In diagnostic tests, p-values can be used to compare the accuracy measures of an index test with a pre-specified value or to compare the accuracy of multiple tests. Overinterpretation can occur when non-significant differences are reported as significant, or when a low pre-specified value or a poor comparator test is chosen so that the results look statistically significant. However, scoring this was difficult, as it depends on the pre-specified value (which was rarely stated in our sample of tests) and on the comparator.

1.2.3 Data Extraction

 

From the included articles, we extracted data on study characteristics and on items that can introduce overinterpretation, using the pre-tested data-extraction form (see Appendix 2). We looked for items that can introduce overinterpretation in the abstract (with special focus on the results and conclusion sections) and in the main text (introduction, methods, results and conclusion sections).

The actual forms of overinterpretation that we extracted included:

An overoptimistic abstract. This was considered to be present when the abstract reported only the best results while the main text had an array of results, or when stronger test recommendations or conclusions were reported in the abstract compared with the main text. For the latter, we evaluated the language used to make the recommendations. If the authors used affirmative language in the abstract while using conditional language in the main text, we scored this as a stronger abstract. Affirmative language included phrases such as 'is definitely', 'should be', 'excellent for use', 'strongly recommended'; conditional language included phrases such as 'appears to be', 'may be', 'could be', and 'probably should'.

Favorable study conclusions or test recommendations based on selected subgroups. We scored this as overinterpretation when multiple subgroups were reported in the methods or results section while the recommendations were based on only one of these subgroups. Caution must be employed when analyzing results of subgroup analyses, as they may have a high false positive rate due to the effect of multiple testing (17,18).

Discrepancy between study aim and conclusion. A conclusion that does not reflect the aim of the report may be indicative of flawed results (3,5). We evaluated the main text of the article to note whether the conclusion was in line with the specified aim of the study.


Not stating a test hypothesis. We evaluated whether a specific statistical test hypothesis was stated, such as one test being superior to another, or a specific measure of diagnostic accuracy surpassing a pre-specified value. The minimally acceptable accuracy values (null hypothesis) and anticipated performance values (alternative hypothesis) of a test under evaluation depend on the clinical context. These performance values can be obtained from pilot data, prior studies or, in cases of novel studies, from experts who may give estimates of clinically relevant performance values. Specifying a hypothesis a priori limits the chances of post hoc or subjective judgment about the test's accuracy and intended role (4,19). The anticipated or desirable performance measures also guide sample size calculations.

Not reporting a sample size calculation. We assessed whether the report stated the sample size required to give the study enough power to estimate test accuracy with sufficient precision, together with the method used to calculate it. When the sample size calculation made at the outset of a study is not reported, readers cannot know whether the sample size was sufficient to estimate the accuracy measures with enough precision (19).
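The chapter does not prescribe a particular formula, but a common precision-based approach illustrates what such a calculation involves. The sketch below follows the widely used Buderer-style calculation; all numbers are illustrative assumptions rather than values from any reviewed study.

```python
import math

def n_for_sensitivity(expected_sens, ci_half_width, prevalence, z=1.96):
    """Total participants needed so that the 95% CI around an expected
    sensitivity has the chosen half-width, inflated for disease prevalence
    (precision-based approach; cf. Buderer, Acad Emerg Med 1996)."""
    n_diseased = (z ** 2 * expected_sens * (1 - expected_sens)) / ci_half_width ** 2
    return math.ceil(n_diseased / prevalence)

# Illustrative assumptions: expected sensitivity 0.90, desired CI half-width
# 0.05, disease prevalence 0.30 in the study population.
print(n_for_sensitivity(0.90, 0.05, 0.30))  # -> 461 participants
```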

Not stating or unclearly stating the intended role of the test under evaluation. We assessed whether the role of the test was clearly defined in the main text. Before a test is evaluated and recommended, its intended role ought to be clearly defined. A new test may be used to replace an existing test, or may be used before (triage) or after (add-on) an existing test (20,21). The preferred accuracy values of a test depend on its role.

Not pre-specifying groups for subgroup analysis a priori in the methods section. We assessed whether the subgroups presented in the results section were pre-specified at the start of the study. Failure to pre-specify subgroups can lead to post hoc analyses motivated by initial inspection of the data and may give room for manipulating results to look favourable (17,18,22).

Not pre-specifying positivity thresholds of tests. For continuous tests, we assessed whether the threshold at which a test result was considered either positive or negative was pre-specified before the start of the study. Stating a threshold value after data collection and analysis may give room for manipulation to maximize a test characteristic (4,23,24).

Not stating confidence intervals of accuracy measures. We assessed whether the confidence intervals of the accuracy estimates were reported. Confidence intervals enable readers to appreciate the precision of the accuracy measures. Without these data, it is difficult to assess the clinical applicability of the tests (25–27).

Using inappropriate statistical tests. Here we evaluated the tests of significance used to compare the accuracy measures of the index test and the reference standard, or those used to compare the accuracy measures of multiple tests. In diagnostic accuracy studies, the appropriate test of significance depends on the role of the test under evaluation and on whether the tests are performed in different groups of patients (unpaired design) or in the same patients (paired design) (28).
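To make the paired/unpaired distinction concrete, the sketch below applies the two standard choices to invented 2×2 counts. The specific tests (McNemar's test for paired data, a chi-squared test for unpaired data) are conventional examples consistent with reference 28, not an analysis taken from the chapter.

```python
# Invented counts for illustration; not data from any reviewed study.
from scipy.stats import chi2_contingency
from statsmodels.stats.contingency_tables import mcnemar

# Paired design: both tests applied to the SAME diseased patients.
# Rows: test A positive/negative; columns: test B positive/negative.
# McNemar's test uses the discordant (off-diagonal) cells.
paired_table = [[45, 12],
                [4, 19]]
print("paired (McNemar) p =", mcnemar(paired_table, exact=True).pvalue)

# Unpaired design: each test applied to a DIFFERENT group of diseased patients.
# Rows: test A, test B; columns: true positives, false negatives.
unpaired_table = [[57, 23],
                  [49, 31]]
chi2, p, dof, expected = chi2_contingency(unpaired_table)
print("unpaired (chi-squared) p =", p)
```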

 

Two authors scored each article independently, reading the abstract and the main text sections (introduction, methods, results, discussion and conclusion). One author (E.O.) scored all the selected articles. The other five authors (M.H., resident in radiology, 4 years of experience; J.B.R., associate professor of clinical epidemiology, 18 years of experience; L.H., assistant professor of clinical epidemiology, 13 years of experience; P.B., professor of clinical epidemiology, 25 years of experience; and M.L.) scored the same articles in predetermined proportions. Disagreement was resolved by consensus or by a third party when needed.

   

1.2.4. Analysis

We calculated the number of articles needed to obtain a two-sided 95% confidence interval that extends from 30% to 50% around the sample proportion, using the exact (Clopper-Pearson) method (29,30).

We analyzed all the included studies and the subset of imaging studies to estimate the frequencies of actual and potential overinterpretation, using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina).
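For readers who want to reproduce the reasoning behind this sample size calculation, the sketch below searches for the smallest number of articles whose exact Clopper-Pearson interval around an expected proportion of 40% spans at most 20 percentage points (roughly 30% to 50%); the helper function is ours, not code from the study.

```python
# Sketch (assumed expected proportion of 40%, target CI width of 20 points).
from statsmodels.stats.proportion import proportion_confint

def exact_ci(successes, n):
    # method="beta" gives the exact Clopper-Pearson interval (29)
    return proportion_confint(successes, n, alpha=0.05, method="beta")

n = 10
while True:
    low, high = exact_ci(round(0.4 * n), n)
    if high - low <= 0.20:
        break
    n += 1
print(n, exact_ci(round(0.4 * n), n))  # smallest n meeting the width target
```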

 

1.3 Results

1.3.1 Search results

Our initial search yielded 6,978 articles. After reading the titles and abstracts, 6,558 articles were deemed ineligible. Of the remaining 420 potentially eligible articles, we randomly selected 140 articles for evaluation using STATA version 10.0 with the code "sample 140, count". We sampled more articles than indicated by our sample size calculation to make up for any false positive articles that would be present in the random selection. After assessing the full texts of these 140 articles, 14 studies were excluded (see Figure 1). In total, 126 studies were included in the final analysis.
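For illustration, the STATA sampling step has a direct equivalent in most languages; a minimal Python analogue (an assumption on our part, not the authors' tooling) is:

```python
# Draw 140 articles at random, without replacement, from the 420
# potentially eligible ones (mirrors STATA's "sample 140, count").
import random

random.seed(2011)                    # fixed seed for a reproducible draw
eligible_ids = list(range(1, 421))   # placeholder IDs for the 420 articles
selected = random.sample(eligible_ids, 140)
print(len(selected))                 # -> 140
```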

 

1.3.2 Characteristics of included articles

Details of the study characteristics are outlined in Table 1. In summary, the median impact factor of all the included articles was 5.5 (range, 4.0 to 16.2) and the median sample size was 151 (range, 12 to 20,765). Of all the tests evaluated in the included articles, imaging tests formed the largest group (n=53/126, 42%).

Table 1. Summary of study characteristics (N=126)

Journal impact factor, median [range]: 5.491 [4.014–16.225]
Sample size, median [range]: 150.5 [12–20,765]

Type of test evaluated, No. (% [95% CI]):
- Clinical diagnosis (history/physical examination): 13 (10) [5–16]
- Imaging test: 53 (42) [33–51]
- Biochemical test: 13 (10) [5–16]
- Molecular test: 14 (11) [6–17]
- Immunodiagnostic test: 14 (11) [6–17]
- Other: 19 (15) [9–21]

Study design (number of tests evaluated), No. (% [95% CI]):
- Single test: 64 (51) [42–60]
- Comparator test: 62 (49) [40–58]

Study design (data collection), No. (% [95% CI]):
- Cross-sectional: 102 (81) [74–88]
- Longitudinal (with follow-up): 17 (13) [7–19]
- Diagnostic randomized design: 1 (1) [0–2]
- Case-control: 2 (2) [0–4]
- Unclear: 5 (3) [0–6]

Sampling method, No. (% [95% CI]):
- Consecutive series: 60 (48) [39–56]
- Random series: 6 (5) [1–8]
- Convenience sampling: 23 (18) [11–25]
- Multistage stratified sampling: 1 (1) [0–2]
- Unclear: 36 (28) [21–37]

Number of groups patients are sampled from, No. (% [95% CI]):
- One group: 87 (69) [61–77]
- Different groups: 30 (24) [16–31]
- Unclear: 9 (7) [3–12]

1.3.3 Agreement

The inter-rater agreements for scoring both actual and potential overinterpretation are outlined in Appendix 3.

 

1.3.4 Actual overinterpretation

Of the 126 included articles, 39 (31% [95% CI: 23–39]) contained a form of actual overinterpretation (Table 2). The most frequent form of overinterpretation was an overoptimistic abstract (n=29/126, 23% [95% CI: 16–30]). Of these, 22 articles (17% [95% CI: 11–24]) had stronger test recommendations or conclusions in the abstract than in the main text, and 7 articles (6% [95% CI: 2–10]) selectively reported results in the abstract.


Of the included imaging studies (n=53), 16 articles (30% [95% CI: 17–43]) contained a form of actual overinterpretation (Table 2). As in the overall sample, the most frequent form of actual overinterpretation was an overoptimistic abstract (n=13/53, 25% [95% CI: 13–37]).

 

Table 2. Actual overinterpretation in diagnostic accuracy studies

Forms of actual overinterpretation (a) — All studies (N=126), No. (% [95% CI]); Imaging studies (N=53), No. (% [95% CI]):

- Overoptimistic abstract: 29 (23) [16–30]; imaging: 13 (25) [13–37]
- Stronger recommendations or conclusions in the abstract: 22 (17) [11–24]; imaging: 9 (17) [9–30]
- Selective reporting of results in the abstract: 7 (6) [2–10]; imaging: 4 (8) [3–20]
- Study conclusions based on selected subgroups: 8/80 (b) (10) [5–19]; imaging: 3/41 (c) (7) [2–21]
- Presence of discrepancy between aim and conclusion: 10 (8) [3–13]; imaging: 2 (4) [0–9]
- Overall proportion of articles with one or more forms of actual overinterpretation: 39 (31) [23–39]; imaging: 16 (30) [17–43]

(a) One study may fall under multiple categories.
(b) Eighty of the overall 126 articles analyzed subgroups.
(c) Forty-one of the 53 imaging articles analyzed subgroups.

 

1.3.5 Potential overinterpretation

Details of the potential forms of overinterpretation are outlined in Table 3. Of the 126 included articles, only 14 (11%) reported a sample size calculation, and only 15 (12%) reported an explicit study hypothesis. All imaging studies (n=53) contained a form of potential overinterpretation; of these, only 5 (9%) reported a sample size calculation and only 6 (11%) reported an explicit study hypothesis. Examples of overinterpretation are provided in Table 4.

   

Table 3. Potential overinterpretation in diagnostic accuracy studies

Forms of potential overinterpretation (a) — All studies (N=126), No. (% [95% CI]); Imaging studies (N=53), No. (% [95% CI]):

- Sample size calculation not reported: 112 (89) [83–94]; imaging: 48 (91) [82–99]
- Test hypothesis not stated: 111 (88) [82–94]; imaging: 47 (89) [80–98]
- Confidence interval of accuracy measures not reported: 72 (57) [48–66]; imaging: 26 (49) [35–63]
- Role of test not stated or unclear: 54 (43) [34–52]; imaging: 17 (32) [19–45]
- Groups for sub-group analysis not pre-specified in the methods section of main text: 25/80 (b) (31) [21–42]; imaging: 8/41 (c) (20) [8–36]
- Positivity thresholds of continuous tests not reported: 22/63 (d) (35) [24–48]; imaging: 6/25 (e) (24) [10–45]
- Use of inappropriate statistical tests: 4 (3) [0–6]; imaging: 3 (6) [0–12]
- Overall proportion of articles with one or more forms of potential overinterpretation: 125 (99) [98–100]; imaging: 53 (100) [92–100]

(a) One study may fall under multiple categories.
(b) Eighty of the overall 126 articles analyzed subgroups.
(c) Forty-one of the 53 imaging articles analyzed subgroups.
(d) Sixty-three of the overall 126 articles evaluated continuous tests.
(e) Twenty-five of the 53 imaging articles evaluated continuous tests.

     


Table 4. Examples of actual overinterpretation

1. An abstract with a stronger conclusion (31):

Conclusion in main text: "Detection of antigen in BAL using the MVista antigen appears to be a useful method (…) Additional studies are needed in patients with pulmonary histoplasmosis."

Conclusion in abstract: "Detection of antigen in BAL fluid complements antigen detection in serum and urine as an objective test for histoplasmosis."

2. Conclusions drawn from selected subgroups (32):

A study evaluated the aptness of 18F-desmethoxyfallypride (18F-DMFP) PET for the differential diagnosis of idiopathic parkinsonian syndrome (IPS) and non-IPS in a series of 81 patients with a clinical diagnosis of parkinsonism. The authors compared several 18F-DMFP PET indices for the discrimination of IPS and non-IPS and reported only the best sensitivity and specificity estimates. They concluded that 18F-DMFP PET was an accurate method for differential diagnosis.

3. Disconnect between the aim and conclusion of the study (33):

The study design described in this paper aimed to evaluate the sensitivity and specificity of the IgM anti-EV71 assay. However, the conclusion does not address accuracy; rather, it focuses on other measures of diagnostic performance.

Aim of study: "The aim of this study was to assess the performance of detecting IgM anti-EV71 for early diagnosis of patients with HFMD."

Conclusion: "The data here presented show that the detection of IgM anti-EV71 by ELISA affords a reliable convenient and prompt diagnosis of EV71. The whole assay takes 90 mins using readily available ELISA equipment, is easy to perform with low cost which make it suitable in clinical diagnosis as well as in public health utility."

Abbreviations: BAL, bronchoalveolar lavage; PET, positron emission tomography; IgM, immunoglobulin M; EV71, enterovirus 71; HFMD, hand, foot and mouth disease; ELISA, enzyme-linked immunosorbent assay.

1.4 Discussion

Our study shows that about three out of ten studies of the diagnostic accuracy of biomarkers or other medical tests published in journals with an impact factor of 4 or higher overinterpret their results, and that 99% of studies contain practices that facilitate overinterpretation. The most common form of actual overinterpretation is an overoptimistic abstract (about one in four), specifically the reporting of stronger conclusions or test recommendations in the abstract compared with the main text (about one in five). In terms of practices that facilitate overinterpretation ("potential overinterpretation"), the majority of studies failed to report an a priori formulated test hypothesis, did not include a corresponding sample size calculation, and did not include confidence intervals for accuracy measures.

 

A closely related study by Lumbreras and colleagues evaluated overinterpretation of the clinical applicability of molecular diagnostic tests only (9). Of the 108 evaluated articles, 61 (56%) had overinterpreted the clinical applicability of the molecular test under study.

 

A defining strength of our study is that we analyzed a sample of diagnostic accuracy studies evaluating a wide range of tests and defined overinterpretation in terms of common features that apply to most tests. To limit subjectivity, we systematically searched for diagnostic studies with a validated search strategy, and two authors independently scored the articles using a pretested data-extraction form.

 

The forms of overinterpretation that we found may have several implications for diagnostic research and practice. One of the most important consequences might be that diagnostic accuracy studies with optimistic conclusions are highly cited, leading to a cascade of inflated and questionable evidence in the literature. Subsequently, this may translate into the premature adoption of tests into clinical practice. A recently published review by Ioannidis and Panagiotou reported that highly cited biomarker studies often had inflated results: of the highly cited studies included in their review, 86% had larger effect sizes than the largest study of the same association, and 83% had larger effect sizes than subsequent meta-analyses (34).

Our study was largely limited to what was reported. For instance, there is no guarantee that subgroups or thresholds listed in the methods were indeed pre-specified. An alternative would be to look at study protocols but, unlike randomized trials, diagnostic accuracy studies do not always have registered protocols. Another limitation of our study was the considerable variation in the inter-rater agreement for scoring. The overall scoring of articles was difficult, as many articles suffered from poor reporting: many authors had not employed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines in preparing their manuscripts. The suboptimal use of STARD has also been documented in previous reports (36–41).

 

Comprehensively evaluating overinterpretation in diagnostic studies depends on the context in which the test is used. For instance, overinterpretation can occur when positive recommendations are made for the clinical use of tests even though the accuracy measures do not justify this. Because of the wide range of tests evaluated in our study, it was difficult to define a standard cutoff separating low from high accuracy. Preferred accuracy measures differ and depend on the type of test, the role of the test, the target condition, the setting in which the test is being evaluated, and the accuracy of other available methods. A sensitivity of 80% may be 'definitely useful' in one situation and useless in another.

 

Additionally, not reporting confidence intervals may be regarded as either actual or potential overinterpretation, depending on the context. For instance, reporting a very high point estimate such as 99% without confidence intervals, based on a small sample size such as 10 cases, may be regarded as actual overinterpretation. On the other hand, not reporting confidence intervals for moderate or high estimates with large sample sizes, or in comparative evaluations where a trial compares two tests and one is statistically superior, can be regarded as potential overinterpretation.
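A quick worked example (with assumed counts of 10 of 10 cases detected) shows how little such a point estimate constrains the truth:

```python
# Exact (Clopper-Pearson) 95% CI for an apparently perfect sensitivity
# estimated from only 10 diseased cases (assumed counts, for illustration).
from statsmodels.stats.proportion import proportion_confint

low, high = proportion_confint(10, 10, alpha=0.05, method="beta")
print(f"sensitivity 10/10 -> 95% CI {low:.2f} to {high:.2f}")  # ~0.69 to 1.00
```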

 

To curb the occurrence of overinterpretation and misreporting of results in diagnostic accuracy studies, we recommend that journals continuously emphasize that submitted accuracy manuscripts must be reported according to the STARD reporting guidelines. This may also diminish the methodological conditions that lead to overinterpretation. Readers largely depend on abstracts to draw conclusions about an article, and when full texts are not available, decisions may be made on abstracts alone (7,42,43). Hence, reviewers need to be more stringent when reading abstracts of submitted manuscripts to ensure that the abstracts are fair representations of the main texts. We hope that highlighting the forms of overinterpretation will enable peer reviewers to sift out overoptimistic reports of diagnostic accuracy studies and encourage investigators to be clearer in designing, more transparent in reporting, and more stringent in interpreting test accuracy studies.

 

References  

1. Fletcher RH, Black B. "Spin" in scientific writing: scientific mischief and legal jeopardy. Med. Law. 2007;26(3):511–25.

2. Horton R. The rhetoric of research. BMJ. 1995;310(6985):985–7.

3. Marco CA, Larkin GL. Research ethics: ethical issues of data reporting and the quest for authenticity. Acad. Emerg. Med. 2000;7(6):691–4.

4. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann. Intern. Med. 2003;138(1):W1–12.

5. Zinsmeister AR, Connor JT. Ten common statistical errors and how to avoid them. Am. J. Gastroenterol. 2008;103(2):262–6.

6. Scott IA, Greenberg PB, Poole PJ. Cautionary tales in the clinical interpretation of studies of diagnostic tests. Intern. Med. J. 2008;38(2):120–9.

7. Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA. 2010;303(20):2058–64.


9. Lumbreras B, Parker LA, Porta M, Pollán M, Ioannidis JPA, Hernández-Aguado I. Overinterpretation of clinical applicability in molecular diagnostic research. Clin. Chem. 2009;55(4):786–94.

10. Smidt N, Rutjes AWS, van der Windt DAWM, Ostelo RWJG, Reitsma JB, Bossuyt PM, et al. Quality of reporting of diagnostic accuracy studies. Radiology. 2005;235(2):347–53.

11. Devillé WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J. Clin. Epidemiol. 2000;53(1):65–9.

12. Knottnerus JA, Muris JW. Assessment of the accuracy of diagnostic tests: the cross-sectional study. J. Clin. Epidemiol. 2003;56(11):1118–28.

13. Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002;324(7338):669–71.

14. Chalmers I. Underreporting research is scientific misconduct. JAMA. 1990;263(10):1405–8.

15. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

16. Young NS, Ioannidis JPA, Al-Ubaydli O. Why current publication practices may distort science. PLoS Med. 2008;5(10):e201.

17. Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicine--reporting of subgroup analyses in clinical trials. N. Engl. J. Med. 2007;357(21):2189–94.

18. Lagakos SW. The challenge of subgroup analyses--reporting without distorting. N. Engl. J. Med. 2006;354(16):1667–9.

19. Pepe MS, Feng Z, Janes H, Bossuyt PM, Potter JD. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. J. Natl. Cancer Inst. 2008;100(20):1432–8.

20. Fritz JM, Wainner RS. Examining diagnostic tests: an evidence-based perspective. Phys. Ther. 2001;81(9):1546–64.

21. Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ. 2006;332(7549):1089–92.

22. Montori VM, Jaeschke R, Schünemann HJ, Bhandari M, Brozek JL, Devereaux PJ, et al. Users' guide to detecting misleading claims in clinical research reports. BMJ. 2004;329(7474):1093–6.

23. Ewald B. Post hoc choice of cut points introduced bias to diagnostic research. J. Clin. Epidemiol. 2006;59(8):798–801.

24. Leeflang MMG, Moons KGM, Reitsma JB, Zwinderman AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clin. Chem. 2008;54(4):729–37.

25. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ. 1999;318(7194):1322–3.

26. Habbema JDF, Eijkemans R, Krijnen P, Knottnerus JA. Analysis of data on the accuracy of diagnostic tests. In: Knottnerus JA, editor. The Evidence Base of Clinical Diagnosis. London: BMJ Publishing Group; 2002. p. 117–44.

27. Altman DG. Why we need confidence intervals. World J. Surg. 2005;29(5):554–6.

28. Hayen A, Macaskill P, Irwig L, Bossuyt P. Appropriate statistical methods are required to assess diagnostic tests for replacement, add-on, and triage. J. Clin. Epidemiol. 2010;63(8):883–91.

29. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26(4):404–13.

30. Hintze JL, NCSS. PASS 11 (Power Analysis and Sample Size). Kaysville, Utah; 2011.

31. Hage CA, Davis TE, Fuller D, Egan L, Witt JR, Wheat LJ, et al. Diagnosis of histoplasmosis by antigen detection in BAL fluid. Chest. 2010;137(3):623–8.

32. La Fougère C, Pöpperl G, Levin J, Wängler B, Böning G, Uebleis C, et al. The value of the dopamine D2/3 receptor ligand 18F-desmethoxyfallypride for the differentiation of idiopathic and nonidiopathic parkinsonian syndromes. J. Nucl. Med. 2010;51(4):581–7.

33. Xu F, Yan Q, Wang H, Niu J, Li L, Zhu F, et al. Performance of detecting IgM antibodies against enterovirus 71 for early diagnosis. PLoS One. 2010;5(6):e11388.


35. Bossuyt PMM. The thin line between hope and hype in biomarker research. JAMA. 2011;305(21):2229–30.

36. Paranjothy B, Shunmugam M, Azuara-Blanco A. The quality of reporting of diagnostic accuracy studies in glaucoma using scanning laser polarimetry. J. Glaucoma. 2007;16(8):670–5.

37. Bossuyt PMM. STARD statement: still room for improvement in the reporting of diagnostic accuracy studies. Radiology. 2008;248(3):713–4.

38. Wilczynski NL. Quality of reporting of diagnostic accuracy studies: no change since STARD statement publication--before-and-after study. Radiology. 2008;248(3):817–23.

39. Fontela PS, Pant Pai N, Schiller I, Dendukuri N, Ramsay A, Pai M. Quality and reporting of diagnostic accuracy studies in TB, HIV and malaria: evaluation using QUADAS and STARD standards. PLoS One. 2009;4(11):e7753.

40. Areia M, Soares M, Dinis-Ribeiro M. Quality reporting of endoscopic diagnostic studies in gastrointestinal journals: where do we stand on the use of the STARD and CONSORT statements? Endoscopy. 2010;42(2):138–47.

41. Selman TJ, Morris RK, Zamora J, Khan KS. The quality of reporting of primary test accuracy studies in obstetrics and gynaecology: application of the STARD criteria. BMC Womens Health. 2011;11:8.

42. Pitkin RM, Branagan MA, Burmeister LF. Accuracy of data in abstracts of published research articles. JAMA. 1999;281(12):1110–1.

43. Beller EM, Glasziou PP, Hopewell S, Altman DG. Reporting of effect direction and size in abstracts of systematic reviews. JAMA. 2011;306(18):1981–2.

     

Appendices  

Appendices and supplemental material can be accessed at http://pubs.rsna.org/journal/radiology
