Λέσχη Φίλων Στατιστικής - GrStats forum
AUEB Stats Seminars 25/2/2022: Subset selection for big data regression: an improved approach

For registration problems and other information, contact: grstats.forum@gmail.com or grstats@stat-athens.aueb.gr

grstats
Posts : 923
Join date : 2009-10-21
http://stat-athens.aueb.gr/~grstats/

Thu 24 Feb 2022 - 16:06


AUEB STATISTICS SEMINAR SERIES FEBRUARY 2022


Vasilis Chasiotis
Department of Statistics, AUEB, GR

Subset selection for big data regression: an improved approach

FRIDAY 25/2/2022
13:00

Room T102, AUEB New Building, 2 Troias str.
Alternatively, connect via TEAMS here.


ABSTRACT

In the big data era, researchers face a series of problems, and big data arise in many settings. Even standard methodologies such as linear regression can become difficult or problematic with huge volumes of data. For example, traditional approaches to regression on big datasets may suffer from the large sample size, either because they involve inverting huge data matrices or because the data cannot fit in memory. One simple remedy is to select subdata and run the regression on them. Some approaches to big data regression in the existing literature select data points using information criteria and provide algorithms for doing so; some of these rely on the combinatorial properties of an orthogonal array. In the present paper we aim to improve the algorithms proposed in these approaches. We describe a new algorithm whose gain is shown through simulation experiments and the analysis of real data. We also discuss the parameters of the proposed algorithm in order to clarify the trade-offs between execution time and information gain.
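
To make the idea of "selecting subdata to run the regression" concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk, and it does not use orthogonal arrays. It assumes one simple information-motivated rule (take the rows with the most extreme values in each covariate) and compares it with uniform random subsampling on simulated data. The function names, the sample sizes and the simulated model are all illustrative assumptions.

```python
import numpy as np


def select_extreme_rows(X, k):
    """Illustrative rule (an assumption, not the speaker's method):
    for each column in turn, take the r smallest and r largest values
    among the rows not yet chosen, with r = k // (2 * p)."""
    n, p = X.shape
    if not (2 * p <= k <= n):
        raise ValueError("need 2 * p <= k <= n")
    r = k // (2 * p)
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)          # rows still unused
        order = idx[np.argsort(X[idx, j])]       # sorted by column j
        picked = np.concatenate([order[:r], order[-r:]])
        chosen.append(picked)
        available[picked] = False
    return np.concatenate(chosen)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes...)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def log_det_information(X):
    """log det of A'A for the design with intercept (larger = more information)."""
    A = np.column_stack([np.ones(len(X)), X])
    _, logdet = np.linalg.slogdet(A.T @ A)
    return logdet


# Simulated data: all values below are hypothetical choices for the demo.
rng = np.random.default_rng(0)
n, p, k = 100_000, 5, 1_000
X = rng.standard_normal((n, p))
beta_true = np.arange(1, p + 2, dtype=float)     # intercept followed by p slopes
y = beta_true[0] + X @ beta_true[1:] + rng.standard_normal(n)

extreme_idx = select_extreme_rows(X, k)
random_idx = rng.choice(n, size=k, replace=False)

beta_extreme = fit_ols(X[extreme_idx], y[extreme_idx])
beta_random = fit_ols(X[random_idx], y[random_idx])

print("log det, extreme rows:", log_det_information(X[extreme_idx]))
print("log det, random rows: ", log_det_information(X[random_idx]))
```

Both fits use only k rows, so the regression step costs the same. The difference lies in how much information the chosen rows carry, which is the kind of trade-off between execution time and information gain that the abstract mentions.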