2016 ICSA
APPLIED STATISTICS
SYMPOSIUM
Atlanta, GA
June 12-15, 2016
Abstracts
fies the “Separable After Screening” property. The theoretical analysis also includes a new result about the connection between PCA and factor analysis.
Truth, Knowledge, P-Values, Bayes, & Inductive Inference
Edel Pena
University of South Carolina
pena@stat.sc.edu
In the past few years the use of P-values in scientific research has been the subject of much, sometimes heated, discussion. The American Statistical Association even felt compelled to release an official statement on the issue in early March 2016, and one psychology journal went as far as banning P-values from the articles it publishes. The debate is also tied to important questions about reproducibility in scientific research. In fact, it goes to the core of inductive inference and of the different schools of thought (the significance testing approach, the Neyman-Pearson paradigm, the Bayesian approach, etc.) on how inductive inference should be done. In this talk I would like to delve into these issues and offer some viewpoints on whether P-values should be relegated to the dustbin of history or whether they are here to stay as a tool of scientific investigation. In particular, I will touch on the representation of knowledge and its updating based on observations, and ask the question: “When given the p-value, what does it provide in the context of the updated knowledge of the phenomenon under consideration?”
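One well-known calibration can make the closing question concrete. Sellke, Bayarri and Berger (2001) showed that, for a precise null hypothesis and p < 1/e, the Bayes factor in favour of H0 is bounded below by -e p ln(p). The minimal sketch below converts a p-value into that bound. It then gives the smallest posterior probability of H0 that the bound allows, assuming a prior probability of 0.5 on the null unless told otherwise. This illustrates one Bayesian reading of a p-value and is not presented as the speaker's position. The function names are hypothetical.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound -e * p * ln(p) on the Bayes factor for H0 (valid for 0 < p < 1/e)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound only applies for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_posterior_null(p: float, prior_null: float = 0.5) -> float:
    """Smallest posterior probability of H0 allowed by the bound, given a prior on H0."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = min_bayes_factor(p) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: min BF = {min_bayes_factor(p):.3f}, "
              f"min P(H0 | data) = {min_posterior_null(p):.3f}")
```

Under even prior odds, p = 0.05 corresponds to a posterior probability of H0 of at least about 0.29. The p-value and the updated probability of the null can therefore differ considerably.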
Regression in heterogeneous problems
Hanwen Huang
UGA
huanghw@uga.edu
We develop a new framework for modeling the impact of the sub-cluster structure of data on regression. The proposed framework is specifically designed for situations where the sample is not homogeneous, in the sense that the response variables in different regions of the covariate space are generated through different mechanisms. In such situations, the sample can be viewed as a composition of multiple data sets, each of which is homogeneous. Traditional linear and general nonlinear methods may not work well here because it is hard to find a single model that fits multiple data sets simultaneously. The proposed method is flexible enough to ensure that the data generated from different regions can be modeled using different functions. The key step of our method incorporates the k-means clustering idea into the traditional regression framework so that the regression and clustering tasks can be performed simultaneously. The k-means clustering algorithm is extended to solve the optimization problem in our model, which groups samples with similar response-covariate relationships together. General conditions under which the estimation of the model parameters is consistent are investigated. By adding appropriate penalty terms, the proposed model can perform variable selection to eliminate uninformative variables. The conditions under which the proposed model achieves asymptotic selection consistency are also studied. The effectiveness of the proposed method is demonstrated through simulations and real data analysis.
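A plain k-means-style "k regressions" loop illustrates the general idea of alternating between clustering and regression. Each sample is assigned to the cluster whose current linear fit gives it the smallest squared residual, and ordinary least squares is then refitted within each cluster. This is a minimal sketch under simplifying assumptions: linear fits, squared-error loss, random initial labels and no penalty terms. It is not the authors' estimator, and the names `fit_k_regressions` and `total_sse` and their parameters are hypothetical.

```python
import numpy as np


def fit_k_regressions(X, y, k, n_iter=100, seed=0):
    """Alternate per-cluster OLS refits and residual-based reassignment (k-means style).

    X: (n, d) covariates, y: (n,) responses. Returns labels (n,) and
    coefficients (k, d + 1), where column 0 holds the intercepts.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])      # prepend an intercept column
    labels = rng.integers(k, size=n)               # random initial clustering
    coefs = np.zeros((k, design.shape[1]))
    for _ in range(n_iter):
        # Refit step: least squares within each cluster that has enough points.
        for j in range(k):
            members = labels == j
            if members.sum() >= design.shape[1]:
                coefs[j], *_ = np.linalg.lstsq(design[members], y[members], rcond=None)
        # Assignment step: move each sample to the fit with the smallest squared residual.
        sq_resid = (y[:, None] - design @ coefs.T) ** 2   # shape (n, k)
        new_labels = sq_resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs


def total_sse(X, y, labels, coefs):
    """Sum of squared residuals, each sample scored by its assigned fit."""
    design = np.column_stack([np.ones(len(y)), X])
    fitted = np.einsum("ij,ij->i", design, coefs[labels])
    return float(np.sum((y - fitted) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, size=(300, 1))
    # Two mechanisms: the response follows a different line in each half of the covariate space.
    y = np.where(x[:, 0] < 0.5, 1 + 2 * x[:, 0], 3 - 4 * x[:, 0]) + rng.normal(0, 0.05, 300)
    labels, coefs = fit_k_regressions(x, y, k=2)
    print("cluster sizes:", np.bincount(labels, minlength=2))
    print("coefficients (intercept, slope):\n", coefs.round(2))
```

Each step can only lower the total squared error, so the loop stops at a local optimum. In practice several random starts would be compared.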
Session 71: Design of Experiments I
Minimax designs using clustering
Simon Mak and V. Roshan Vengazhiyil
Georgia Institute of Technology
smak6@gatech.edu
Minimax designs provide uniform coverage of a design space $\mathcal{X} \subseteq \mathbb{R}^p$ by minimizing the maximum distance from any point in this space to its nearest design point. Although minimax designs have many useful applications, e.g., for optimal sensor allocation or as space-filling designs for computer experiments, there has been little work on developing algorithms for generating these designs. In this paper, a new clustering-based method is presented for computing minimax designs on any convex and bounded design region. The computation time of this algorithm scales linearly in the dimension $p$, meaning our method can generate minimax designs efficiently for high-dimensional regions. Simulation studies and a real-world example show that the proposed algorithm provides improved minimax performance over existing methods on a variety of design regions. Finally, we introduce a new type of experimental design called a minimax projection design, and show that this proposed design provides better minimax performance on projected subspaces of $\mathcal{X}$ compared to existing designs.
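The sketch below makes the minimax criterion concrete in three steps:

- It approximates the design region by a dense uniform sample.
- It scores a design by its fill distance, i.e. the largest distance from a sample point to its nearest design point.
- It builds a design with a simple Lloyd-type clustering heuristic. Sample points are assigned to their nearest design point, and each design point is then moved to an approximate minimax center of its cluster, computed with the Badoiu-Clarkson iteration for the minimum enclosing ball.

This is an illustrative heuristic under those assumptions, not the authors' algorithm. All function names and defaults are hypothetical.

```python
import numpy as np


def fill_distance(design, points):
    """Max over `points` of the distance to the nearest design point (minimax criterion)."""
    dists = np.linalg.norm(points[:, None, :] - design[None, :, :], axis=2)
    return float(dists.min(axis=1).max())


def enclosing_ball_center(points, n_iter=100):
    """Badoiu-Clarkson iteration: step toward the farthest point with shrinking step size."""
    center = points.mean(axis=0)
    for i in range(1, n_iter + 1):
        farthest = points[np.argmax(np.linalg.norm(points - center, axis=1))]
        center = center + (farthest - center) / (i + 1)
    return center


def clustering_minimax_design(points, n_design, n_iter=20, seed=0):
    """Lloyd-type heuristic: cluster sample points, then use minimax centers, not means."""
    rng = np.random.default_rng(seed)
    design = points[rng.choice(len(points), size=n_design, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - design[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(n_design):
            members = points[labels == j]
            if len(members) > 0:
                design[j] = enclosing_ball_center(members)
    return design


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.random((2000, 2))          # dense sample of the unit square
    design = clustering_minimax_design(sample, n_design=20)
    random_design = sample[rng.choice(2000, size=20, replace=False)]
    print("clustered design fill distance:", round(fill_distance(design, sample), 3))
    print("random design fill distance:   ", round(fill_distance(random_design, sample), 3))
```

Each center is a convex combination of sample points, so the design stays inside any convex region the sample covers. An assignment step costs time proportional to N·n·p for N sample points and n design points, so the sketch is linear in p. However, a fixed-size sample covers a high-dimensional region only coarsely.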
Optimal Experimental Designs for Nonlinear Conjoint Analysis
Mercedes Esteban-Bravo¹, Agata Leszkiewicz² and Jose M. Vidal-Sanz¹
¹ Universidad Carlos III de Madrid
² Georgia State University
aleszkiewicz@gsu.edu
Estimators of choice-based multi-attribute preference models have a covariance matrix that depends on both the design matrix and the unknown parameters to be estimated from the data. As a consequence, researchers cannot design the experiment optimally (i.e., minimize the variance) without knowing those parameters. Several approaches have been considered in the literature, but they require prior assumptions about the parameter values that are often not available. Furthermore, the resulting design is neither optimal nor robust when the assumed values are far from the true parameters. In this paper, we develop efficient worst-case designs for choice-based conjoint analysis that account for customer heterogeneity. The method makes several contributions. First, we account for the uncertainty associated with all of the unknown parameters of the mixed logit model (both the mean and the elements of the covariance matrix of the heterogeneity distribution). Second, we allow the unknown parameters to be correlated. Third, the method is computationally efficient, which in practical applications is an advantage over, e.g., fully Bayesian designs. We conduct multiple simulations to evaluate the performance of this method. The worst-case designs computed for the logit and mixed logit models are indeed more robust than the local and Bayesian benchmarks when the prior guess about the parameters is far from their true values.
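A simplified setting can illustrate two things: how the information matrix depends on the unknown parameters, and the worst-case remedy. The setting is a plain multinomial logit model (no random coefficients), effects-coded two-level attributes, and a finite set of plausible parameter vectors standing in for the uncertainty region. The sketch computes the Fisher information of a choice design, its D-error, and the worst D-error over the parameter set. It then keeps the best of many random candidate designs under that worst-case criterion. This is not the authors' mixed-logit method or optimizer; the random search and all names are hypothetical stand-ins.

```python
import numpy as np


def mnl_information(design, beta):
    """Fisher information of a multinomial logit choice design.

    design: (S, J, K) attribute levels for S choice sets of J alternatives;
    beta: (K,) part-worths. The result depends on beta, which is the core difficulty.
    """
    k = design.shape[2]
    info = np.zeros((k, k))
    for X in design:                              # X: (J, K), one choice set
        utility = X @ beta
        p = np.exp(utility - utility.max())       # numerically stable softmax
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return info


def d_error(design, beta):
    """det(I)^(-1/K); infinite when the information matrix is singular."""
    k = design.shape[2]
    sign, logdet = np.linalg.slogdet(mnl_information(design, beta))
    return float(np.exp(-logdet / k)) if sign > 0 else np.inf


def worst_case_d_error(design, beta_set):
    """Largest D-error over a finite set of plausible parameter vectors."""
    return max(d_error(design, beta) for beta in beta_set)


def random_search_design(n_sets, n_alts, n_attrs, beta_set, n_candidates=300, seed=0):
    """Keep the random effects-coded (+1/-1) design with the smallest worst-case D-error."""
    rng = np.random.default_rng(seed)
    best, best_value = None, np.inf
    for _ in range(n_candidates):
        candidate = rng.choice([-1.0, 1.0], size=(n_sets, n_alts, n_attrs))
        value = worst_case_d_error(candidate, beta_set)
        if value < best_value:
            best, best_value = candidate, value
    return best, best_value


if __name__ == "__main__":
    # Hypothetical uncertainty set: a few guesses for three part-worths.
    betas = [np.array(b) for b in ([0.0, 0.0, 0.0], [1.0, -0.5, 0.5], [-1.0, 1.0, 0.0])]
    design, value = random_search_design(n_sets=8, n_alts=2, n_attrs=3, beta_set=betas)
    print("worst-case D-error of selected design:", round(value, 4))
```

Minimizing the maximum D-error, rather than the D-error at a single guess, is what guards against a poor prior guess. A realistic version would replace the random search with a proper optimizer and the MNL information with its mixed-logit counterpart.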
Obtaining locally D-optimal designs for binary response experiments via Particle Swarm Optimization
Joshua Lukemire¹, Abhyuday Mandal² and Weng Kee Wong³
¹ Emory University
² University of Georgia
³ University of California at Los Angeles
joshlukemire@gmail.com
Obtaining optimal designs for experiments in which the outcome is binary and is modeled by a generalized linear model is a difficult task because the optimal design depends on the model parameters. Theoretical results for these design problems are often unavailable, and computational methods must be used instead to obtain optimal designs. There are many such popular methods; however, they generally require either an explicit