Penn State University, Eberly College of Science, Center for Astrostatistics

Astrostatistical Challenges

As the volume and complexity of astronomical data have increased enormously in recent decades, a paradigm shift is underway in the very nature of observational astronomy. While in the past a single astronomer might observe a handful of objects, today data mining of large digital sky archives obtained at all wavelengths of light is becoming a major mode of study. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted datasets. In recognition of this need, the National Virtual Observatory initiative has recently emerged as a top priority of the NAS Taylor/McKee Decadal Report on astronomy for 2000-2010. Its aim is to federate numerous large digital sky archives and to develop tools for exploring and understanding these vast volumes of data.

Astronomy today poses many more statistical problems than can be addressed by any single statistical method, any single statistical field, or any single statistician. Innumerable issues arise in the scientific interpretation of astronomical studies. Some involve sampling, multivariate analysis and survival analysis, while others involve image and spatial analysis, signal processing or time series analysis. Nonlinear regression is needed to model the spectra of astronomical objects in terms of continuum and line components deriving from the quantum mechanical properties of matter. Here are a few of the questions that arise:

  • Is a collection of objects chosen for study an unbiased sample of the vast underlying population? When should a collection of objects be divided into two or more classes?
  • What is the intrinsic relationship between two properties of a class, particularly in the presence of confounding variables such as redshift?
  • How can we answer such questions in the presence of flux-limited samples and flux-dependent error bars?
  • When is a blip in a spectrum or image a real signal rather than noise?
  • How do we characterize blips embedded in larger structures?
  • When is a signal variable rather than constant?
  • How do we characterize the vast range of periodic, correlated and stochastic variations, from the Doppler wobble of normal stars due to invisible planets, to X-ray manifestations of accretion onto black holes, to gamma-ray bursts from the exotic end-states of stellar evolution?
  • How do we understand the 3-to-6-dimensional spatial point processes representing the locations and motions of stars in the Galaxy or of galaxies in the Universe?
  • How do we understand the structure of continuous entities like the cosmic microwave background or the interstellar medium?
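
To make one of these questions concrete, consider the "blip" question in its simplest form: a detection cell in a photon-counting image contains some number of counts, and we ask how often background alone would produce at least that many. The sketch below assumes a Poisson background with a known expected level; the function name and the example numbers are hypothetical and chosen only for illustration. It is a minimal sketch of one textbook approach, not a description of any particular observatory pipeline.

```python
import math


def poisson_tail_probability(observed_counts: int, expected_background: float) -> float:
    """Probability of at least `observed_counts` events arising from background alone.

    Assumes the background is Poisson with a known mean `expected_background`.
    """
    if observed_counts <= 0:
        return 1.0
    # P(N >= n) = 1 - sum_{k=0}^{n-1} exp(-b) * b**k / k!
    term = math.exp(-expected_background)  # k = 0 term
    cumulative = term
    for k in range(1, observed_counts):
        term *= expected_background / k  # build b**k / k! iteratively
        cumulative += term
    return max(0.0, 1.0 - cumulative)


# Hypothetical example: 12 counts in a cell where 3.0 are expected from background.
p_value = poisson_tail_probability(12, 3.0)
print(f"chance probability from background alone: {p_value:.2e}")
```

A small chance probability in one cell is not the whole answer. When many cells are searched, the number of trials must be taken into account, and an uncertain background level or a blip embedded in a larger structure calls for more careful modeling.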

Other statistical issues do not appear in research journals but rather arise deep inside the complex machinery of modern observatories. Many testing, monitoring, compressing, fitting and even intelligent decision-making operations are embedded in the operation, calibration and data reduction process of a contemporary astronomical satellite. With advances in high-speed radiation-hardened chips and high-data-rate detectors, sophisticated data analysis operations often take place on-board. Telemetered data are then subject to pipeline processing, which provides the basic input to hundreds of astronomical studies. Most of the codes are developed by engineers and scientists who have little formal training in statistics or applied mathematics.

Another level of astrostatistical challenge has emerged with the Virtual Observatory. Most Virtual Observatory efforts have focused on computational aspects of data access and mining from distributed, heterogeneous databases. But after scientists have collected the sub-datasets of interest, powerful statistical techniques should be brought to bear to help them make astrophysical inferences. We have begun this effort with the creation of a prototype VOStat Web service, where scientists can interactively request a statistical analysis that runs software hosted at one location on data hosted at others, and receive near-real-time answers. VOStat is being developed under a Focused Research Group funded by the NSF Division of Mathematical Sciences and led by PI Babu.