Abstract:
Age heaping, the systematic misreporting of ages due to digit preference or culturally salient ages, distorts demographic and survey data. Existing methods primarily address heaping at ages ending in the terminal digits 0 or 5 and are not designed to detect irregular or age-specific heaping patterns. We introduce the Heaped Age Adaptive Model (HAAM), a penalized EM framework that jointly estimates a smooth latent true age distribution and age-specific misreporting behavior. HAAM integrates PRISMA, a Poisson-robust smoothing procedure, with a flexible misreport kernel whose parameters ($\boldsymbol{\alpha}$, $\boldsymbol{\beta}$) govern the attractiveness and locality of reported ages. An $\ell_1$ sparsity penalty lets the model adaptively identify a small set of genuinely heaped ages without imposing any predetermined heaping structure. In simulation studies containing both digit-based and irregular heaped ages, HAAM closely recovers the true age distribution, removes artificial spikes, and correctly locates the ages that attract disproportionate reports. Compared with classical digit-heaping indices and graduation techniques, HAAM provides both better correction and richer diagnostic insight. HAAM thus offers a general, data-driven tool for mitigating age heaping in demographic, epidemiological, and historical datasets.