Have you ever noticed how large your photo library has grown since mobile technology made it possible to capture every moment? Do you know the frustration of sifting through an endless stream of images to find the “keepers”? If so, you are not alone. NASA feels your pain.
NASA – the US federal agency responsible for space programs and aerospace research – also has a tough time keeping up with its archives. Every day, NASA’s hundreds of satellites capture and transmit back information about our planet, our solar system and the universe beyond. That’s a lot to process.
Deep-space craft send data back to Earth on the order of megabytes per second; Earth orbiters send back gigabytes per second. Missions in planning today will stream over 24 TB of data per day – the equivalent of 10 billion single-spaced typewritten pages. Climate-change data repositories alone are projected to reach 350 petabytes (that’s 350 million gigabytes) by 2030.
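The page comparison is easy to sanity-check. A quick sketch, assuming a single-spaced typewritten page holds roughly 2.4 KB of plain text (an assumption, not a figure from the article):

```python
# Sanity-check the article's figures.
# Assumptions: 1 TB = 10**12 bytes, and a single-spaced typewritten
# page holds about 2.4 KB of text (hypothetical round number).
BYTES_PER_PAGE = 2.4e3          # assumed plain-text page size
daily_stream_bytes = 24e12      # 24 TB per day (planned missions)

pages_per_day = daily_stream_bytes / BYTES_PER_PAGE
print(f"{pages_per_day:.1e} pages/day")   # 1.0e+10, i.e. 10 billion

# 350 PB expressed in GB: 350e15 / 1e9 = 350 million GB
climate_archive_gb = 350e15 / 1e9
print(f"{climate_archive_gb / 1e6:.0f} million GB")  # 350
```

At 2.4 KB per page, 24 TB works out to exactly 10 billion pages, so the article’s comparison holds.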
Why does NASA need all this data? Well, for one, its mission is to “reach for new heights and reveal the unknown for the benefit of humankind.” This includes tackling big challenges facing our planet, such as climate change, sea-level rise and extreme weather events. It also means better understanding the origin and destiny of our universe. Meanwhile, studying space weather is on the agenda too, as NASA plans to send humans to asteroids and Mars.
Big challenges require big (data) solutions. NASA is continuously working on innovative methods for creating, storing, transmitting and analyzing information. Analysis in particular has recently opened new avenues for value creation. NASA faces a dilemma at the core of the rise of big data: the volume, velocity and variety of data are overwhelming traditional techniques for management and analysis. New frameworks, infrastructure and algorithms have been key to a broader shift from standard analysis (interpretation: what to look for) to advanced analytics (exploration: how to look).
The car-sized Curiosity rover on Mars uploads all the readings of its onboard sensors four times a day. NASA’s Jet Propulsion Laboratory built an analytics solution around Elasticsearch, a system also used by Netflix and Goldman Sachs, to enable real-time feedback. This means mission control can now rapidly respond to the rover’s environment and change its operations. In the same vein, the agency’s flight operations teams use their Mission Data Processing and Control System (MPCS) to process mission data and provide visualizations in real time.
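To give a flavor of what such a lookup involves, here is a minimal sketch of an Elasticsearch query body for “the latest readings from one instrument.” The index and field names (“telemetry”, “instrument”, “timestamp”) are hypothetical illustrations, not NASA’s actual schema:

```python
# Hedged sketch: an Elasticsearch query-DSL body that asks for the ten
# most recent readings from one instrument. Field and index names are
# made up for illustration; only the DSL shape (bool filter + sort)
# reflects how Elasticsearch queries are actually structured.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term":  {"instrument": "weather_station"}},   # which sensor
                {"range": {"timestamp": {"gte": "now-6h"}}},    # last six hours
            ]
        }
    },
    "sort": [{"timestamp": "desc"}],  # newest first
    "size": 10,
}

# With the official Python client, this body would be passed to a
# search call against the (hypothetical) "telemetry" index. Here we
# just confirm the query narrows results by instrument and recency.
filters = query["query"]["bool"]["filter"]
print(len(filters))  # 2
```

Because filters like these are cheap for Elasticsearch to evaluate, operators can re-run them continuously, which is what makes near-real-time dashboards over a telemetry stream practical.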
Back on Earth, NASA is leveraging the power of crowds to bring all hands on deck. The NASA Open Data project makes climate and earth science data publicly available (in partnership with Amazon Web Services). In addition to the raw data, NASA Earth Exchange provides users with tools for processing and visualizing data, including supercomputing, earth systems modeling, and workflow management. Using the open-source graph database Neo4j, the agency has been able to reduce time spent sifting through keyword-search results. NASA also maintains a Lessons Learned database, which contains insights from every mission dating back to the 1960s.
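Why does a graph cut down on keyword sifting? A toy sketch, with entirely hypothetical data, shows the difference: keyword search matches every record containing a word, while a graph query follows typed relationships to just the relevant lessons:

```python
# Illustrative only (hypothetical lessons, not NASA's database): keyword
# search vs. graph search over a lessons-learned archive. A keyword hit
# matches any record containing the word; a graph query follows typed
# edges, as a Neo4j traversal would, to reach only the lessons linked
# to a given subsystem and mission type.

lessons = {
    "L-001": "Battery cell venting during thermal-vac test",
    "L-002": "Battery vendor paperwork filed late",
    "L-003": "Solar array latch failure in deployment",
}

# Typed edges: (lesson) -CONCERNS-> (subsystem), (lesson) -FROM-> (mission type)
edges = {
    "L-001": {"CONCERNS": "power_subsystem", "FROM": "orbiter"},
    "L-002": {"CONCERNS": "procurement",     "FROM": "orbiter"},
    "L-003": {"CONCERNS": "mechanisms",      "FROM": "lander"},
}

def keyword_search(term):
    """Flat search: every lesson whose text mentions the term."""
    return sorted(lid for lid, text in lessons.items() if term in text.lower())

def graph_search(subsystem, mission):
    """Graph-style search: follow typed edges to matching lessons only."""
    return sorted(lid for lid, e in edges.items()
                  if e["CONCERNS"] == subsystem and e["FROM"] == mission)

print(keyword_search("battery"))                    # ['L-001', 'L-002']
print(graph_search("power_subsystem", "orbiter"))   # ['L-001']
```

The keyword query returns both the engineering lesson and the unrelated paperwork record; the graph query returns only the lesson actually connected to the power subsystem, which is the kind of narrowing the agency gets from its Neo4j deployment.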
These are just a few examples from a broad array of projects NASA is pursuing to extract value from a growing reservoir of data. The potential opportunity is mind-bending: insights could shape our understanding of the universe and support our efforts to expand beyond a single planet. However, substantial challenges remain on the data front. While NASA is making strides toward data integration, it still operates in a world of discipline-specific Distributed Active Archive Centers. There is no fully coordinated data life-cycle management, and just the task of finding relevant data can be daunting. Perhaps more importantly, a skills gap persists while the human and algorithmic capabilities needed to help these rocket scientists find the “keepers” in their data streams catch up.