Astrostatistical Challenges

As the data volume and complexity of astronomical findings have increased enormously in recent decades, a paradigm shift is underway in the very nature of observational astronomy. While in the past a single astronomer might observe a handful of objects, today data mining of large digital sky archives obtained at all wavelengths of light is becoming a major mode of study. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted datasets. In recognition of this need, the National Virtual Observatory initiative has recently emerged as a top priority in the NAS Taylor/McKee Decadal Report on astronomy for 2000-2010, with the goal of federating numerous large digital sky archives and developing tools to explore and understand these vast volumes of data.

Statistical challenges in astronomy today involve many more
problems than can be addressed by any single statistical method,
any single statistical field, or any single statistician. Innumerable
issues arise in the scientific interpretation of astronomical studies.
Some issues
involve sampling, multivariate and survival analysis, while others
involve image and spatial analysis, signal processing or time
series analysis. Nonlinear regression is needed to model the
spectra of astronomical objects in terms of continuum and line
components deriving from the quantum mechanical properties of
matter. Here are a few of the questions that arise:
Other statistical issues do not appear in research journals but rather arise deep inside the complex machinery of modern observatories. Many testing, monitoring, compressing, fitting and even intelligent decision-making operations are embedded in the operation, calibration and data reduction processes of a contemporary astronomical satellite. With advances in high-speed radiation-hardened chips and high-data-rate detectors, sophisticated data analysis operations often take place on board. Telemetered data are then subject to pipeline processing, which provides the basic input to hundreds of astronomical studies. Most of these codes are developed by engineers and scientists who have little formal training in statistics or applied mathematics.

Another level of astrostatistical challenge has emerged with the Virtual Observatory. Most Virtual Observatory efforts have focused on the computational aspects of data access and mining from distributed, heterogeneous databases. But after scientists have collected the sub-datasets of interest, powerful statistical techniques should be brought to bear to help them make astrophysical inferences.

We have begun this effort with the creation of a prototype VOStat Web service, where scientists can interactively request a statistical analysis using remotely hosted software on data located at other sites and receive near-real-time answers. VOStat is being developed under a Focused Research Group, led by PI Babu and funded by the NSF Division of Mathematical Sciences.
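
To make the nonlinear regression problem mentioned earlier more concrete, the following is a minimal sketch of fitting a spectrum as a flat continuum plus a single Gaussian line. It is not taken from VOStat or any observatory pipeline. The function names (`model`, `fit_line`), the wavelength grid and the parameter values are all hypothetical, and the synthetic data are noise-free. The technique shown is a brute-force grid search over the two nonlinear parameters (line centre and width), with the two linear parameters (continuum level and line amplitude) solved by ordinary least squares at each grid point. Real analyses would typically use an iterative optimizer and account for measurement errors.

```python
import math


def model(x, continuum, amplitude, centre, width):
    """Flat continuum plus one Gaussian line (illustrative model only)."""
    return continuum + amplitude * math.exp(-0.5 * ((x - centre) / width) ** 2)


def fit_line(xs, ys, centres, widths):
    """Grid-search centre and width; solve continuum and amplitude linearly.

    Returns (continuum, amplitude, centre, width) minimizing the sum of
    squared residuals over the supplied grid.
    """
    n = len(xs)
    sy = sum(ys)
    best = None
    for centre in centres:
        for width in widths:
            # Gaussian profile evaluated for this trial centre and width.
            g = [math.exp(-0.5 * ((x - centre) / width) ** 2) for x in xs]
            sg = sum(g)
            sgg = sum(v * v for v in g)
            sgy = sum(v * y for v, y in zip(g, ys))
            # 2x2 normal equations for y ~ continuum * 1 + amplitude * g.
            det = n * sgg - sg * sg
            if abs(det) < 1e-12:
                continue  # profile is numerically constant; skip this point
            continuum = (sy * sgg - sg * sgy) / det
            amplitude = (n * sgy - sg * sy) / det
            sse = sum((y - continuum - amplitude * v) ** 2
                      for y, v in zip(ys, g))
            if best is None or sse < best[0]:
                best = (sse, continuum, amplitude, centre, width)
    return best[1:]


if __name__ == "__main__":
    # Hypothetical wavelength grid and noise-free synthetic spectrum.
    xs = [6540 + 0.5 * i for i in range(81)]
    ys = [model(x, 1.0, 3.0, 6563.0, 2.0) for x in xs]
    centres = [6555 + 0.5 * i for i in range(33)]
    widths = [0.5 + 0.25 * i for i in range(15)]
    print(fit_line(xs, ys, centres, widths))
```

Separating the linear from the nonlinear parameters keeps the search two-dimensional, which is why a plain grid is workable here. With several blended lines or noisy data, the grid grows quickly and a dedicated nonlinear least-squares routine becomes the more practical choice.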