AUEB Stats Seminars 25/2/2022: Subset selection for big data regression: an improved approach

Thu 24 Feb 2022 - 16:06

AUEB STATISTICS SEMINAR SERIES FEBRUARY 2022

Vasilis Chasiotis
Department of Statistics, AUEB, GR

Subset selection for big data regression: an improved approach

FRIDAY 25/2/2022
13:00

Room T102, AUEB New Building, 2 Troias str.
Alternatively, join via TEAMS here.

ABSTRACT

In the big data era, researchers face a series of problems. Big data arise in many settings, and even standard methodologies such as linear regression can become difficult or problematic with huge volumes of data. For example, traditional approaches to regression on big datasets may suffer from the large sample size, either because they involve inverting huge data matrices or because the data cannot fit in memory. One simple remedy is to select subdata on which to run the regression. Some approaches to big data regression in the current literature select data points using information criteria and provide algorithms for doing so. Some of these approaches are based on the combinatorial properties of an orthogonal array. In the present paper we aim to improve the algorithms proposed in these approaches. We describe an approach that provides a new algorithm, whose gain is shown through simulation experiments and the analysis of real data. A discussion of the parameters of the proposed algorithm is also provided to clarify the trade-offs between execution time and information gain.