Generating Synthetic Discrete Datasets with Machine Learning

Giuseppe Manco, Ettore Ritacco, Antonino Rullo, Domenico Saccà, Edoardo Serra

Research output: Contribution to journal (conference article, peer-reviewed)


Real data are not always available, accessible, or sufficient, and in many cases they are incomplete and lack the semantic content needed to define optimization processes. In this paper we discuss synthetic data generation from two different perspectives. The core idea common to both is to analyze a limited set of real data, learn the main patterns that characterize them, and exploit this knowledge to generate brand-new data. The first perspective is constraint-based generation: it consists of generating a synthetic dataset that satisfies given support constraints on the frequent patterns of the real data. The second is based on probabilistic generative modeling: it treats synthetic generation as sampling from a parametric distribution learned on the real data, typically encoded as a neural network (e.g., Variational Autoencoders, Generative Adversarial Networks).
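The two perspectives can be illustrated with a minimal Python sketch on a toy transaction dataset. It is not the authors' method. `generate_from_constraints` is a hypothetical, exhaustive (exponential-time) search that only shows what inverse frequent itemset mining asks for, namely a dataset whose itemset supports equal given targets. `fit_bernoulli` and `sample_bernoulli` replace the neural generative models (VAEs, GANs) with the simplest parametric distribution, one independent Bernoulli variable per item. All function names, the `max_len` and `min_support` parameters, and the sample data are illustrative assumptions.

```python
import random
from itertools import combinations, product


def support(dataset, itemset):
    """Absolute support: number of transactions containing every item of `itemset`."""
    return sum(1 for t in dataset if itemset <= t)


def support_constraints(real, items, max_len=2, min_support=1):
    """Hypothetical helper: target supports of the itemsets (size <= max_len)
    that are frequent (support >= min_support) in the real data."""
    constraints = {}
    for k in range(1, max_len + 1):
        for combo in combinations(sorted(items), k):
            s = frozenset(combo)
            sup = support(real, s)
            if sup >= min_support:
                constraints[s] = sup
    return constraints


def generate_from_constraints(items, constraints, n):
    """Toy inverse frequent itemset mining: exhaustively search for n transactions
    whose supports equal the targets. Exponential in len(items); tiny inputs only.
    Returns None if no such dataset exists."""
    items = sorted(items)
    # Every possible transaction type (each subset of the item universe).
    types = [frozenset(i for i, bit in zip(items, bits) if bit)
             for bits in product([0, 1], repeat=len(items))]
    counts = [0] * len(types)  # how many copies of each type to emit

    def search(idx, remaining, partial):
        # Supports only grow as types are added, so exceeding a target is fatal.
        if any(partial[s] > target for s, target in constraints.items()):
            return False
        if idx == len(types):
            return remaining == 0 and all(
                partial[s] == target for s, target in constraints.items())
        for c in range(remaining, -1, -1):
            counts[idx] = c
            updated = {s: partial[s] + (c if s <= types[idx] else 0)
                       for s in constraints}
            if search(idx + 1, remaining - c, updated):
                return True
        counts[idx] = 0
        return False

    if not search(0, n, {s: 0 for s in constraints}):
        return None
    return [t for t, c in zip(types, counts) for _ in range(c)]


def fit_bernoulli(real, items):
    """Simplest parametric model: one independent Bernoulli per item,
    estimated by its relative frequency (a stand-in for a VAE or GAN)."""
    return {i: support(real, frozenset([i])) / len(real) for i in items}


def sample_bernoulli(params, n, seed=0):
    """Draw n synthetic transactions from the fitted Bernoulli model."""
    rng = random.Random(seed)
    return [frozenset(i for i, p in sorted(params.items()) if rng.random() < p)
            for _ in range(n)]


if __name__ == "__main__":
    real = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["a"],
                                   ["b", "c"], ["a", "c"], ["a", "b"])]
    items = {"a", "b", "c"}
    constraints = support_constraints(real, items)
    print(generate_from_constraints(items, constraints, n=len(real)))
    print(sample_bernoulli(fit_bernoulli(real, items), n=len(real)))
```

Because the search is exhaustive and the real dataset itself satisfies its own constraints, the constraint-based generator always returns a dataset with the same single-item and pair supports as `real`, though not necessarily the same transactions, since the support of {a, b, c} is left unconstrained. The Bernoulli sampler, by contrast, matches single-item frequencies only in expectation and ignores correlations between items, which is what a learned neural model is meant to capture.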

Original language: English
Pages (from-to): 341-350
Number of pages: 10
Journal: CEUR Workshop Proceedings
State: Published - 2022
Event: 30th Italian Symposium on Advanced Database Systems, SEBD 2022 - Tirrenia, Italy
Duration: 19 Jun 2022 to 20 Jun 2022


Keywords:
  • Constraints-based models
  • Data generation
  • Generative Adversarial Networks
  • Generative models
  • Inverse Frequent Itemset Mining
  • Synthetic dataset
  • Variational Autoencoder

