Publications by Author: Goodman, Alyssa A

2014
Alyssa A Goodman, Alberto Pepe, Alexander Blocker, Christine L Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, Yolanda Gil, Paul Groth, Margaret Hedstrom, David W Hogg, Vinay Kashyap, Ashish Mahabal, Aneta Siemiginowska, and Aleksandra Slavkovic. 2014. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLoS Computational Biology, 10, 4, Pp. e1003542.
Christopher N Beaumont, Alyssa A Goodman, Sarah Kendrew, Jonathan P Williams, and Robert Simpson. 2014. “The Milky Way Project: Leveraging Citizen Science and Machine Learning to Detect Interstellar Bubbles.” The Astrophysical Journal Supplement Series, 214, Pp. 3.

We present Brut, an algorithm to identify bubbles in infrared images of the Galactic midplane. Brut is based on the Random Forest algorithm, and uses bubbles identified by >35,000 citizen scientists from the Milky Way Project to discover the identifying characteristics of bubbles in images from the Spitzer Space Telescope. We demonstrate that Brut's ability to identify bubbles is comparable to that of expert astronomers. We use Brut to re-assess the bubbles in the Milky Way Project catalog, and find that 10%-30% of the objects in this catalog are non-bubble interlopers. Relative to these interlopers, high-reliability bubbles are more confined to the mid-plane, and display a stronger excess of young stellar objects along and within bubble rims. Furthermore, Brut is able to discover bubbles missed by previous searches—particularly bubbles near bright sources which have low contrast relative to their surroundings. Brut demonstrates the synergies that exist between citizen scientists, professional scientists, and machine learning techniques. In cases where "untrained" citizens can identify patterns that machines cannot detect without training, machine learning algorithms like Brut can use the output of citizen science projects as input training sets, offering tremendous opportunities to speed the pace of scientific discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weaknesses of each approach if deployed alone.
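The pipeline described above pairs crowdsourced labels with supervised learning. As a rough, hedged illustration of that pattern (not the actual Brut code, with invented features and toy data), a scikit-learn Random Forest can be trained on citizen-labeled image patches and then used to score new candidates:

    # Minimal sketch: train a Random Forest on citizen-science labels, then
    # score new image patches. Feature extraction is stubbed out here; Brut
    # itself uses much richer image features from Spitzer mosaics.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical feature table: one row per image patch, columns are
    # summary statistics of the patch (e.g., ring contrast, color ratios).
    n_patches = 5000
    features = rng.normal(size=(n_patches, 6))
    # Hypothetical citizen-science votes reduced to binary labels
    labels = (features[:, 0] + 0.5 * features[:, 1]
              + rng.normal(scale=0.5, size=n_patches)) > 0

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # Probability that each held-out patch contains a bubble
    scores = clf.predict_proba(X_test)[:, 1]
    print("held-out accuracy:", clf.score(X_test, y_test))
    print("high-confidence bubble candidates:", int((scores > 0.8).sum()))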

2013
Nathan E Sanders, Chris Faesi, and Alyssa A Goodman. 2013. “A New Approach to Developing Interactive Software Modules through Graduate Education.” arXiv.org.
We discuss a set of fifteen new interactive, educational, online software modules developed by Harvard University graduate students to demonstrate various concepts related to astronomy and physics. Their achievement demonstrates that online software tools for education and outreach on specialized topics can be produced while simultaneously fulfilling project-based learning objectives. We describe a set of technologies suitable for module development and present in detail four examples of modules developed by the students. We offer recommendations for incorporating educational software development within a graduate curriculum and conclude by discussing the relevance of this novel approach to new online learning environments like edX.
Christopher N Beaumont, Stella SR Offner, Rahul Shetty, Simon CO Glover, and Alyssa A Goodman. 2013. “Quantifying Observational Projection Effects Using Molecular Cloud Simulations.” The Astrophysical Journal, 777, Pp. 173.

The physical properties of molecular clouds are often measured using spectral-line observations, which provide the only probes of the clouds' velocity structure. It is hard, though, to assess whether and to what extent intensity features in position-position-velocity (PPV) space correspond to "real" density structures in position-position-position (PPP) space. In this paper, we create synthetic molecular cloud spectral-line maps of simulated molecular clouds, and present a new technique for measuring the reality of individual PPV structures. Using a dendrogram algorithm, we identify hierarchical structures in both PPP and PPV space. Our procedure projects density structures identified in PPP space into corresponding intensity structures in PPV space and then measures the geometric overlap of the projected structures with structures identified from the synthetic observation. The fractional overlap between a PPP and PPV structure quantifies how well the synthetic observation recovers information about the three-dimensional structure. Applying this machinery to a set of synthetic observations of CO isotopes, we measure how well spectral-line measurements recover mass, size, velocity dispersion, and virial parameter for a simulated star-forming region. By disabling various steps of our analysis, we investigate how much opacity, chemistry, and gravity affect measurements of physical properties extracted from PPV cubes. For the simulations used here, which offer a decent, but not perfect, match to the properties of a star-forming region like Perseus, our results suggest that superposition induces a ~40% uncertainty in masses, sizes, and velocity dispersions derived from 13CO (J = 1-0). As would be expected, superposition and confusion are worst in regions where the filling factor of emitting material is large. The virial parameter is most affected by superposition, such that estimates of the virial parameter derived from PPV and PPP information typically disagree by a factor of ~2. This uncertainty makes it particularly difficult to judge whether gravitational or kinetic energy dominates a given region, since the majority of virial parameter measurements fall within a factor of two of the equipartition level α ≈ 2.
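The core bookkeeping step described above (projecting a PPP density structure into PPV and measuring its geometric overlap with a structure identified in the synthetic observation) reduces to comparing boolean masks on a shared grid. The sketch below shows only that overlap metric, with toy NumPy masks standing in for the paper's dendrogram structures, which are hierarchies rather than single masks:

    # Minimal sketch: fractional overlap between a PPP structure projected
    # into PPV and a structure identified directly in the synthetic PPV cube.
    # Both structures are represented as boolean masks on a common (x, y, v)
    # grid; building those masks (dendrograms, radiative transfer) is the
    # hard part and is not shown.
    import numpy as np

    def fractional_overlap(projected_mask, observed_mask):
        """Fraction of the projected PPP structure falling inside the
        observed PPV structure, and vice versa."""
        projected = projected_mask.astype(bool)
        observed = observed_mask.astype(bool)
        both = np.logical_and(projected, observed).sum()
        return both / projected.sum(), both / observed.sum()

    # Toy example: two partially overlapping blobs in a small PPV cube
    shape = (32, 32, 16)
    from_ppp = np.zeros(shape, dtype=bool)
    from_obs = np.zeros(shape, dtype=bool)
    from_ppp[10:20, 10:20, 4:10] = True
    from_obs[14:24, 12:22, 6:12] = True

    f_proj, f_obs = fractional_overlap(from_ppp, from_obs)
    print(f"overlap fractions (projected, observed): {f_proj:.2f}, {f_obs:.2f}")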

2012
Alyssa A Goodman. 2012. “Principles of High-Dimensional Data Visualization in Astronomy.” Astronomische Nachrichten, 333, 5-6, Pp. 505-514. (See also the Astrobites commentary on this article.)

In the case of high-dimensional data sets, though, interactive exploratory data visualization can give far more insight than an approach where data processing and statistical analysis are followed, rather than accompanied, by visualization. This paper attempts to chart a course toward “linked view” systems, where multiple views of high-dimensional data sets update live as a researcher selects, highlights, or otherwise manipulates one of several open views. For example, imagine a researcher looking at a 3D volume visualization of simulated or observed data, and simultaneously viewing statistical displays of the data set’s properties (such as an x-y plot of temperature vs. velocity, or a histogram of vorticities). Then, imagine that when the researcher selects an interesting group of points in any one of these displays, the same points become a highlighted subset in all other open displays. Selections can be graphical or algorithmic, and they can be combined and saved. For tabular (ASCII) data, this kind of analysis has long been possible, even though it has been under-used in Astronomy. The bigger issue for Astronomy and several other “high-dimensional” fields is the need for systems that allow full integration of images and data cubes within a linked-view environment. The paper concludes its history and analysis of the present situation with suggestions that look toward cooperatively-developed open-source modular software as a way to create an evolving, flexible, high-dimensional, linked-view visualization environment useful in astrophysical research.
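The selection behavior described in this abstract (select points in any one view, see the same subset highlighted in every other open view) boils down to a small publish/subscribe pattern over a shared boolean mask. The sketch below is a schematic of that pattern only; it is not drawn from any particular package, and the view classes are invented for illustration:

    # Minimal sketch of linked-view selection: every open view registers with
    # a shared hub; when any view changes the selected subset, the hub pushes
    # the new boolean mask to all other views so they can re-highlight.
    import numpy as np

    class SelectionHub:
        def __init__(self, n_points):
            self.mask = np.zeros(n_points, dtype=bool)
            self._views = []

        def register(self, view):
            self._views.append(view)

        def select(self, mask, source=None):
            self.mask = np.asarray(mask, dtype=bool)
            for view in self._views:
                if view is not source:
                    view.highlight(self.mask)

    class View:
        """Stand-in for any open display (x-y plot, histogram, 3D volume)."""
        def __init__(self, name, hub):
            self.name = name
            hub.register(self)

        def highlight(self, mask):
            print(f"{self.name}: highlighting {mask.sum()} points")

    data = np.random.default_rng(1).normal(size=(1000, 2))
    hub = SelectionHub(len(data))
    View("temperature-velocity plot", hub)
    View("vorticity histogram", hub)

    # A graphical lasso in, say, the 3D volume view ultimately yields a mask:
    hub.select(data[:, 0] > 1.0)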

2011
Alyssa A Goodman. 2011. “A Guide to Comparisons of Star Formation Simulations with Observations.” Computational Star Formation. IAU.

We review an approach to observation-theory comparisons we call "Taste-Testing." In this approach, synthetic observations are made of numerical simulations, and then both real and synthetic observations are "tasted" (compared) using a variety of statistical tests. We first lay out arguments for bringing theory to observational space rather than observations to theory space. Next, we explain that generating synthetic observations is only a step along the way to the quantitative, statistical, taste tests that offer the most insight. We offer a set of examples focused on polarimetry, scattering and emission by dust, and spectral-line mapping in star-forming regions. We conclude with a discussion of the connection between statistical tests used to date and the physics we seek to understand. In particular, we suggest that the "lognormal" nature of molecular clouds can be created by the interaction of many random processes, as can the lognormal nature of the IMF, so that the fact that both the "Clump Mass Function" (CMF) and IMF appear lognormal does not necessarily imply a direct relationship between them.
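The closing argument, that a lognormal shape can arise simply because many independent random factors act multiplicatively, is easy to demonstrate numerically. The sketch below is only that generic demonstration (a central-limit argument in log space); it does not analyze any cloud or IMF data:

    # Minimal sketch: the product of many independent positive random factors
    # is approximately lognormal, because the sum of their logs is
    # approximately normal (central limit theorem).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    n_samples, n_factors = 100_000, 20
    # Each toy "mass" is the product of 20 independent multiplicative factors
    factors = rng.uniform(0.5, 1.5, size=(n_samples, n_factors))
    masses = factors.prod(axis=1)

    # If the masses are lognormal, log(mass) should look Gaussian
    log_masses = np.log(masses)
    print(f"skew of log(mass): {stats.skew(log_masses):+.3f} (near 0 for lognormal)")
    print(f"excess kurtosis:   {stats.kurtosis(log_masses):+.3f} (near 0 for lognormal)")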

Christopher N Beaumont, Jonathan P Williams, and Alyssa A Goodman. 2011. “Classifying Structures in the Interstellar Medium with Support Vector Machines: The G16.05-0.57 Supernova Remnant.” The Astrophysical Journal, 741, Pp. 14.

We apply Support Vector Machines (SVMs)—a machine learning algorithm—to the task of classifying structures in the interstellar medium (ISM). As a case study, we present a position-position-velocity (PPV) data cube of 12CO J = 3-2 emission toward G16.05-0.57, a supernova remnant that lies behind the M17 molecular cloud. Despite the fact that these two objects partially overlap in PPV space, the two structures can easily be distinguished by eye based on their distinct morphologies. The SVM algorithm is able to infer these morphological distinctions, and associate individual pixels with each object at >90% accuracy. This case study suggests that similar techniques may be applicable to classifying other structures in the ISM—a task that has thus far proven difficult to automate.
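As a generic, hedged illustration of this kind of pixel classification (not the published pipeline, and with toy data and invented feature choices), a scikit-learn SVM can be trained on a small set of hand-labeled PPV pixels and then used to classify the rest:

    # Minimal sketch: classify PPV pixels into two overlapping structures with
    # a Support Vector Machine, given a small set of hand-labeled training
    # pixels. Features per pixel: (x, y, v, intensity); a real application
    # would use richer, morphology-aware features.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    n = 400

    # Toy PPV pixels: two populations offset mainly in velocity and intensity
    cloud = np.column_stack([rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n),
                             rng.normal(20.0, 2.0, n), rng.normal(5.0, 1.0, n)])
    remnant = np.column_stack([rng.normal(1.0, 1.0, n), rng.normal(0.5, 1.0, n),
                               rng.normal(60.0, 5.0, n), rng.normal(2.0, 1.0, n)])
    pixels = np.vstack([cloud, remnant])
    truth = np.array([0] * n + [1] * n)

    # Pretend only ~5% of the pixels were labeled by eye
    labeled = rng.choice(len(pixels), size=len(pixels) // 20, replace=False)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(pixels[labeled], truth[labeled])

    print("accuracy over all pixels:", (clf.predict(pixels) == truth).mean())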

2009
Alyssa A Goodman and Curtis Wong. 2009. “Bringing the Night Sky Closer: Discoveries in the Data Deluge.” In The Fourth Paradigm: Data-Intensive Scientific Discovery.
Throughout history, astronomers have been accustomed to data falling from the sky. But our relatively newfound ability to store the sky's data in "clouds" offers us fascinating new ways to access, distribute, use, and analyze data, both in research and in education. Here we consider three interrelated questions: (1) What trends have we seen, and will soon see, in the growth of image and data collection from telescopes? (2) How might we address the growing challenge of finding the proverbial needle in the haystack of this data to facilitate scientific discovery? (3) What visualization and analytic opportunities does the future hold?
Alyssa A Goodman, Erik W Rosolowsky, Michelle A Borkin, Jonathan B Foster, Michael Halle, Jens Kauffmann, and Jaime E Pineda. 2009. “A role for self-gravity at multiple length scales in the process of star formation.” Nature, 457, 7225, Pp. 63-6.
Self-gravity plays a decisive role in the final stages of star formation, where dense cores (size approximately 0.1 parsecs) inside molecular clouds collapse to form star-plus-disk systems. But self-gravity's role at earlier times (and on larger length scales, such as approximately 1 parsec) is unclear; some molecular cloud simulations that do not include self-gravity suggest that 'turbulent fragmentation' alone is sufficient to create a mass distribution of dense cores that resembles, and sets, the stellar initial mass function. Here we report a 'dendrogram' (hierarchical tree-diagram) analysis that reveals that self-gravity plays a significant role over the full range of possible scales traced by 13CO observations in the L1448 molecular cloud, but not everywhere in the observed region. In particular, more than 90 per cent of the compact 'pre-stellar cores' traced by peaks of dust emission are projected on the sky within one of the dendrogram's self-gravitating 'leaves'. As these peaks mark the locations of already-forming stars, or of those probably about to form, a self-gravitating cocoon seems a critical condition for their existence. Turbulent fragmentation simulations without self-gravity, even of unmagnetized isothermal material, can yield mass and velocity power spectra very similar to what is observed in clouds like L1448. But a dendrogram of such a simulation shows that nearly all the gas in it (much more than in the observations) appears to be self-gravitating. A potentially significant role for gravity in 'non-self-gravitating' simulations suggests inconsistency in simulation assumptions and output, and that it is necessary to include self-gravity in any realistic simulation of the star-formation process on subparsec scales.
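Whether a structure counts as 'self-gravitating' in an analysis like this ultimately rests on comparing kinetic and gravitational energy, commonly summarized by the virial parameter α = 5σ²R/(GM). The sketch below implements only that textbook definition (not the paper's dendrogram-based energy bookkeeping), with made-up illustrative numbers:

    # Minimal sketch: virial parameter alpha = 5 * sigma_v^2 * R / (G * M),
    # used as a rough self-gravity criterion (alpha <~ 2 suggests a bound
    # structure).
    import astropy.units as u
    from astropy.constants import G

    def virial_parameter(sigma_v, radius, mass):
        """Return alpha = 5 sigma_v^2 R / (G M) as a plain float."""
        alpha = 5 * sigma_v**2 * radius / (G * mass)
        return float(alpha.to(u.dimensionless_unscaled))

    # Illustrative (made-up) numbers for a roughly parsec-scale structure
    alpha = virial_parameter(sigma_v=0.5 * u.km / u.s,
                             radius=1.0 * u.pc, mass=300 * u.Msun)
    verdict = "likely self-gravitating" if alpha < 2 else "likely unbound"
    print(f"alpha = {alpha:.2f} -> {verdict}")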
Alyssa A Goodman. 2009. “Seeing Science.” Proceedings of the International Festival of Scientific Visualization. Tokyo, Japan: Universal Academy Press.
The ability to represent scientific data and concepts visually is becoming increasingly important due to the unprecedented exponential growth of computational power during the present digital age. The data sets and simulations scientists in all fields can now create are literally thousands of times as large as those created just 20 years ago. Historically successful methods for data visualization can, and should, be applied to today's huge data sets, but new approaches, also enabled by technology, are needed as well. Increasingly, "modular craftsmanship" will be applied, as relevant functionality from the graphically and technically best tools for a job are combined as-needed, without low-level programming.