TY - JOUR
T1 - Multi-sorted inverse frequent itemsets mining for generating realistic no-SQL datasets
AU - Saccà, Domenico
AU - Serra, Edoardo
AU - Rullo, Antonino
N1 - Publisher Copyright: © 2021 Copyright for this paper by its authors.
PY - 2021
Y1 - 2021
AB - The development of novel platforms and techniques for emerging “Big Data” applications requires real-life datasets for data-driven experiments, which, however, are in most cases not accessible for reasons such as confidentiality, privacy, or simply insufficient availability. An interesting solution to ensure high-quality experimental findings is to synthesize datasets that reflect the patterns of real ones. A promising approach is based on inverse mining techniques such as inverse frequent itemset mining (IFM), which consists of generating a transactional dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent and infrequent ones. This paper describes an extension of IFM that considers more structured schemes for the datasets to be generated, as required in emerging big data applications, e.g., social network analytics.
KW - IFM
KW - Itemset mining
KW - No-SQL
UR - http://www.scopus.com/inward/record.url?scp=85118785910&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85118785910
SN - 1613-0073
VL - 2994
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 29th Italian Symposium on Advanced Database Systems, SEBD 2021
Y2 - 5 September 2021 through 9 September 2021
ER -