
Project Proposal During IT Summit - Normalization of Space Mission Data Sets

See also Mission Data - Developments

I just posted about my work on Space Mission Data Sets (particularly Mission Parameters). Since I will be in San Francisco for the NASA IT Summit for nearly a week (Aug 15th-19th), and there are at least a few folks who will be there who have expressed interest in doing a hackathon of sorts, I thought I would at least throw my hat into the ring for a proposal of what we actually work on.

My proposal is to have folks write code, or import data into tools like Excel or whatever they are familiar with, to normalize sets of data so that they can be imported into a collaborative, crowdsourcing framework that I am working on. I list a couple of sources of this data below.

Much of this data is what I will call "lightly verified". Enthusiasts like those who run the sites listed below curate this information from multiple sources, but a few people can only do so much, especially since there have been over 4,800 orbital launches since Sputnik. The challenge is to somehow fill in each data set's gaps by correlating and overlaying the data.

So somehow we need a set of tools, written in something like Python or Java, which can pull all these data sets in and do some simple correlation among the Flight Missions and their parameters. Perhaps this could be done with some simple Bayesian analysis, root mean square, or something similar. The desired outcome is to automatically approve data elements which are the same among 3-4 different data sets, and then report on the ones on which different data sets disagree.
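As a rough illustration of that outcome, here is a minimal Python sketch. It is not a Bayesian analysis, just exact-match counting, and the function name `reconcile`, the status labels, and the `min_agree` threshold are all assumptions made up for this example.

```python
from collections import Counter

def reconcile(reports, min_agree=3):
    """Classify one field (e.g. launch date) reported by several data sets.

    reports: dict mapping a source name to its value, or None if missing.
    Returns (status, details), where status is one of
    "approved", "disputed", "unverified", or "missing".
    """
    values = [v for v in reports.values() if v is not None]
    if not values:
        return ("missing", None)
    counts = Counter(values)
    if len(counts) > 1:
        # Sources disagree: report every candidate value with its supporters.
        by_value = {}
        for source, value in reports.items():
            if value is not None:
                by_value.setdefault(value, []).append(source)
        return ("disputed", by_value)
    value = values[0]
    if counts[value] >= min_agree:
        # Enough independent sources agree, so approve automatically.
        return ("approved", value)
    # Only one or two sources, all in agreement: keep, but flag for review.
    return ("unverified", value)
```

Calling `reconcile({"a": "X", "b": "X", "c": "X"})` would approve the value, while any disagreement comes back as "disputed" along with which source said what, so a person can review it.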

 

Examples

Let's say that two data sets say that the Voyager 2 spacecraft was launched on August 22, 1977, but one says it was launched on August 21. Somehow we should be able to automatically set some thresholds and throw out the August 21 data point. Let's say that one data point says it was the "Voyager 2" spacecraft and one says "Voyager2". We should somehow be able to deduce that these are really the same spacecraft.
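Both examples can be sketched in a few lines of Python. This is only one possible approach: the helper names `normalize_name` and `majority_value` and the 0.6 threshold are assumptions for illustration, and the dates are the hypothetical ones from the example above.

```python
import re
from collections import Counter

def normalize_name(name):
    """Lowercase and drop spaces/punctuation so simple name variants compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def majority_value(values, threshold=0.6):
    """Return the most common value if its share of reports meets the threshold.

    Returns None when no value is common enough to trust automatically.
    """
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) >= threshold else None

# The two spellings collapse to the same key.
same_craft = normalize_name("Voyager 2") == normalize_name("Voyager2")

# Two of three reports agree, so the outlier is dropped.
launch_date = majority_value(["1977-08-22", "1977-08-22", "1977-08-21"])
```

Note that this simple normalization would still treat "Voyager 2" and "Voyager II" as different spacecraft, so a real tool would likely need an alias table or fuzzy matching on top.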

 

Software libraries & packages

I just did some simple Google searches in order to find some packages that we may be able to use to accomplish this task:

  • http://code.google.com/p/google-refine/ - An open source package for dealing with "messy data", which is definitely the state of this data. It is an installed platform which you then use as a service by uploading the data and using the interface to parse and collapse it. I have actually used this package a little in the past and it is pretty incredible.
  • http://code.google.com/p/bayesian-inference/ - Python package for Bayesian inference

 

Data Sets

1. NSSDC Spacecraft Data

My summer intern Justin actually wrote a great scraper for the NSSDC web site. You can find it here: http://scraperwiki.com/scrapers/nasa_nssdc_id_table_scraper/  But it currently only scrapes the NSSDC ID, Launch Date, and Spacecraft Name. The next step is to go one level deeper and scrape the actual pages themselves, like this: http://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do?id=2009-031A
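A first step toward that deeper scrape could be to turn each scraped NSSDC ID into the URL of its detail page. The sketch below assumes the URL pattern shown above holds for every ID, and that the scraper's rows are dicts with an `nssdc_id` key (a made-up shape for illustration). It does no network access.

```python
from urllib.parse import urlencode

# Base of the detail page URL shown in the example link above.
DETAIL_BASE = "http://nssdc.gsfc.nasa.gov/nmc/spacecraftDisplay.do"

def detail_url(nssdc_id):
    """Build the detail page URL for one NSSDC ID."""
    return DETAIL_BASE + "?" + urlencode({"id": nssdc_id})

def detail_urls(rows):
    """Map each NSSDC ID in the scraped rows to its detail page URL.

    rows: iterable of dicts with an 'nssdc_id' key (hypothetical shape).
    Rows with an empty or missing ID are skipped.
    """
    return {
        row["nssdc_id"]: detail_url(row["nssdc_id"])
        for row in rows
        if row.get("nssdc_id")
    }
```

Each resulting URL could then be fetched and parsed by a second scraper.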

 

2. Jonathan's Space Report

JSR's page contains many fixed-width text files, each of which contains thousands of lines of data. Here are some of them:
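Fixed-width files are easy to parse once the column positions are known. The sketch below is only an illustration: the column names and offsets in `COLUMNS` are invented and would need to be replaced with the real layout of each file, and the sample line is made-up data.

```python
# Hypothetical layout: (field name, start, end) as 0-based slice offsets.
COLUMNS = [
    ("launch_id", 0, 12),
    ("launch_date", 12, 24),
    ("name", 24, 50),
]

def parse_fixed_width(line, columns=COLUMNS):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in columns}

def parse_lines(lines, columns=COLUMNS):
    """Parse every non-blank line, skipping lines that start with '#'."""
    return [
        parse_fixed_width(line, columns)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]

# Made-up sample record padded to the hypothetical column widths.
sample = "2000-001A".ljust(12) + "2000 Jan 01".ljust(12) + "Example Sat 1"
record = parse_fixed_width(sample)
```

The parsed dicts can then be fed into whatever normalization and comparison step follows.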

 

3. Other Sources

Unfortunately, many of the other sites found at the link below are not data at all. So that means that this material would need to be scraped from the sites. ScraperWiki is an incredible tool which allows for the easy extracting of data systematically from websites. Check it out here: http://www.scraperwiki.org The site provides a "cron" type system so that every day the tool runs your Python, Ruby, or Java code to do the scraping and the data extracted is literally hosted on their site in a few different formats (CSV, JSON, etc.) and you can download the new data very easily. Check it out.

I have spent many hours combing the web and doing clever searches to find places where folks (mostly enthusiasts) have assembled Space Mission Data. I have found over 30 different sites, in varying levels of maturity, messiness, and accuracy. See: http://nasatweet.com/wiki/Mission_Data_-_Sources

Space Mission Data - Project Developments

I have posted a number of times now on how important I feel it is for historical and future Space Mission Data to be available on the web in a linked, browsable, and crowdsourced framework. For those not familiar, by Mission Data I mean the mission parameters, such as the rocket engine that was used, the date of launch, the mission timeline, and the many other details of the mission.

So, instead of just talking about it, I felt it was very important just to do something about it. So I have! I started working on the wiki that I help run, nasatweet.com. And I found a neat framework, called Semantic MediaWiki (SMW), which is used across the web and is open source, but backed by some serious companies such as Ontoprise and Paul Allen's Vulcan. They describe the framework as a "powerful and flexible collaborative database."

I must mention that there are many sites on the internet which have much of this data already very well filled out and indexed. Some of my favorites are Gunter's Space Page, Astronautix, and Jonathan's Space Report*. While many of these sites have a very impressive amount of data on them, as far as I can tell none of them are data driven in the sense that you can slice the data any way you want or view it from different perspectives. For instance, what if I want to see all spacecraft that were launched from Vandenberg in the 1990s which had an Earth science mission? If this same set of data were entered systematically in some type of structured (MySQL) or semi-structured (NoSQL, or Semantic MediaWiki in my case) store, then queries could be created that could easily show these "views".
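To make the idea of a "view" concrete, here is a small sketch of that Vandenberg question as a query. It uses Python's built-in SQLite as a stand-in for the structured option, not Semantic MediaWiki; the table layout and the rows are invented purely for illustration.

```python
import sqlite3

# In-memory database with a made-up schema and made-up example rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE missions "
    "(name TEXT, launch_site TEXT, launch_date TEXT, mission_type TEXT)"
)
conn.executemany(
    "INSERT INTO missions VALUES (?, ?, ?, ?)",
    [
        ("Example-A", "Vandenberg", "1995-04-01", "Earth science"),
        ("Example-B", "Vandenberg", "2003-06-15", "Earth science"),
        ("Example-C", "Cape Canaveral", "1997-10-15", "Planetary"),
        ("Example-D", "Vandenberg", "1992-02-10", "Communications"),
    ],
)

def earth_science_from_vandenberg_1990s(conn):
    """Names of Earth science missions launched from Vandenberg in the 1990s."""
    rows = conn.execute(
        "SELECT name FROM missions "
        "WHERE launch_site = ? AND mission_type = ? "
        "AND launch_date BETWEEN '1990-01-01' AND '1999-12-31' "
        "ORDER BY launch_date",
        ("Vandenberg", "Earth science"),
    )
    return [row[0] for row in rows]
```

With the sample rows above, only "Example-A" matches all three conditions. Changing the site, decade, or mission type gives a different view of the same data without touching the data itself.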

Implementation

So I just started building the structure of the data in the SMW framework and I was able to do it quite quickly. I basically chose a few fundamental data structures:

  1. Space mission
    1. ...has a description
    2. ...has a launch vehicle, and vehicle details
    3. ...has mission events
    4. ...has a launch site and a launch pad
    5. ...has a launch date & time
    6. ...has an image
  2. Mission event
    1. ...has a start time
    2. ...has an end time
    3. ...has a description
  3. Launch vehicle
    1. ...has a description
    2. ...has an image
  4. Launch site
    1. ...has a lat/long
    2. ...has a description
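For readers who think in code, the outline above maps roughly onto the following Python dataclasses. This is only a sketch of the shape of the data, not how Semantic MediaWiki stores it; the class and field names are my own, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class LaunchVehicle:
    name: str
    description: str = ""
    image_url: Optional[str] = None

@dataclass
class LaunchSite:
    name: str
    latitude: float
    longitude: float
    description: str = ""

@dataclass
class MissionEvent:
    description: str
    start_time: datetime
    end_time: Optional[datetime] = None

@dataclass
class SpaceMission:
    name: str
    description: str
    launch_vehicle: LaunchVehicle
    launch_site: LaunchSite
    launch_pad: str
    launch_time: datetime
    vehicle_details: str = ""
    image_url: Optional[str] = None
    events: List[MissionEvent] = field(default_factory=list)

# Placeholder values only, to show how the pieces link together.
vehicle = LaunchVehicle(name="Example Rocket")
site = LaunchSite(name="Example Site", latitude=0.0, longitude=0.0)
mission = SpaceMission(
    name="Example Mission",
    description="Placeholder mission",
    launch_vehicle=vehicle,
    launch_site=site,
    launch_pad="Pad 1",
    launch_time=datetime(2030, 1, 2, 12, 0, 0),
)
mission.events.append(
    MissionEvent(description="Liftoff", start_time=mission.launch_time)
)
```

Because missions reference shared vehicle and site objects, listing every mission for a given site or vehicle becomes a simple filter.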

 

Samples

So here are some samples of how I have implemented this already:

  1. This is a great example of an actual mission, STS-135. It has all data fields filled out and a couple of mission events as well: http://nasatweet.com/wiki/STS-135 Make sure to log in and try to edit the page so you get a feeling for what the form looks like. Selecting the Launch vehicle automatically picks from the Launch vehicles already entered into the database, so that autopopulates. Notice that you can click on any Property (like Launch date, description, site, etc.) and it will allow you to pivot on that Property and see values for all entries in the database.
  2. When you view a Launch site, like KSC: http://nasatweet.com/wiki/Kennedy_Space_Center,_Pad_39A You will be shown all the missions in the database which launched here dynamically, so this is very data driven.
  3. Viewing a Launch vehicle, like this http://nasatweet.com/wiki/Delta_IV you see the specific launch pad and vehicle configuration for it.
  4. Adding a new mission, launch vehicle, or site is very easy. For mission, going to this page http://nasatweet.com/wiki/Category:Missions lets you just add one at the top. Then you will be prompted with a form to enter all the data you saw previously.

 

Epilogue

But just having this data available does not help us solve the problems themselves; there is another step needed. Our bright innovators across our agency and across our nation need to synthesize this data and information into knowledge to make key technical decisions that can enable the next generation of scientific discovery.

 

 

* In addition to the three pages listed above, I have spent some time building a compilation of all these Space Mission Data sites. Please feel free to contribute: http://nasatweet.com/wiki/Mission_Data_-_Sources

 

Adapting spaceflight mission clocks on the web & Spacepoints

 

So, I just tweeted [1] with @bonkoif and @jetforme regarding the possibility of building a crowdsourced version of the iOS app Mission Clock [2], such that there could be a community site with an up-to-date list of all space launches in as much detail as the incredible app by @jetforme.

@bonkoif and I chatted on the phone right away and discussed the details and thought that it was definitely do-able and that he could start coding if I put together some basic “requirements”, which in Software Engineering speak, is just simply a description of what the software will do. One of the rubs though was the need to build a structure for the crowdsourcing piece.

Then, immediately thereafter, I read about the discussion of SPACEPOINTS [3] that @harbingeralpha, @tim846, and @txflygirl had at SpaceUp Houston (aside: I was at the first SpaceUp in San Diego in March 2010). I immediately put the two things together. And since I had talked big ideas many times with @harbingeralpha, I knew exactly where he was coming from.

So, I propose that we connect the two ideas: give folks spacepoints for contributing to this mission launch time/event database on the web. This would provide the incentive for folks to contribute and ALSO provide a way to allow for reputation and promoting folks to become moderators (see specs below).

I would love to get feedback, I think this could be really sweet. My next step of this plan is to do the same for spacecraft in general, so build a database of all spacecraft with all their technical specs.


HERE ARE THE DETAILS!

So what follows is a very basic layout of the software requirements for the site that will drive all of this:

Version 0 of Software Requirements:

i. BASICS
1. This site should run on a standard web server
2. This site should be designed more like an Apple product than a Microsoft product, in that it has an incredibly simple and minimalistic user interface (UI) and experience (UX)
3. This site should have an industry standard authentication mechanism (e.g. OAuth), such that folks can log in using their Facebook, Twitter, or Google login
4. This site should not aim too high, but provide very simple functionality, like that of the iOS app, Mission Clock

ii. NAVIGATION and DATA
5. This site should allow for a user to browse nearly all missions from a top level page, and then drill down to view mission details
    a. This top level page should be in the form of a table or a very short block for each mission
    b. From the main page, the user should see: the title of the mission, the launch site, a live countdown/up clock (e.g. using JavaScript), the title of the next upcoming event, and the number of events created for that mission
    c. When a user clicks on an individual mission, they should be able to see: window open, window duration, window close, launch vehicle, launch site, overview text, and a listing of all events (which have details of: event title, time start, time end, all in DOY, T-, L-, and a realtime countdown clock)
6. Every data field as listed above should have an accompanying history, URL, and comment text. This is very similar to what a wiki provides, in that every change is kept, and has a comment, so this makes it very easy for changes to be reviewed. In addition, each change should have a referencing URL which provides the citation which will allow for the new data update to be verified.
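The clock formats in requirement 5 come down to simple time arithmetic. The sketch below shows it in Python for clarity, though on the site itself it would presumably run in the browser. The function names and the exact `DD:HH:MM:SS` output layout are assumptions, not part of the requirements.

```python
from datetime import datetime, timezone

def format_clock(event_time, now, prefix="T"):
    """Format time relative to an event, e.g. 'T-01:01:29:45' (days:hours:minutes:seconds).

    A '-' sign means the event is still in the future; '+' means it has passed.
    Use prefix='L' for a clock relative to launch.
    """
    delta = int((event_time - now).total_seconds())
    sign = "-" if delta > 0 else "+"
    days, rem = divmod(abs(delta), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{prefix}{sign}{days:02d}:{hours:02d}:{minutes:02d}:{seconds:02d}"

def day_of_year(moment):
    """Format a time as DOY:HH:MM:SS (day of year, zero-padded to three digits)."""
    return moment.strftime("%j:%H:%M:%S")

# Placeholder times, both in UTC.
launch = datetime(2030, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2030, 1, 1, 10, 30, 15, tzinfo=timezone.utc)
countdown = format_clock(launch, now)
```

A live clock would simply recompute this once per second with the current time.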

iii. USERS and SPACEPOINTS
7. Once a user logs in, they should be able to see added features of being able to add/modify/remove missions&events
8. This functionality should be granular based on how many SPACEPOINTS the user has
9. If a user is new, they can make changes, but those changes will not show up unless they are moderated first
10. Users should be able to be set up in the following roles: standard user, moderator, administrator
    a. Standard users will have the ability to recommend changes, but none of these users’ changes will show up until accepted by a moderator
    b. Moderators have the ability to see all the data field additions/modifications/removals which have been recommended and can with one click approve or deny. All moderators can see these actions which all other moderators have made. Moderators can approve their own additions/modifications/removals.
    c. Administrators have the ability to do all that moderators can, plus they can edit all the history/URLs/comments on all data fields. Administrators can also change user rights and ban users.
11. Every user should have a simple profile page which lists their current ranking, SPACEPOINTS, and history of activity. Only on this page can other users see the unapproved additions/modifications/removals which other users have made. Also on this profile page, each user can customize profile text (up to 160 characters) and one link to their web presence (twitter, blog, etc.)
12. Every time a user gets an addition/modification/removal approved, they should get X [TBD] more SPACEPOINTS.
13. Every time a user gets an addition/modification/removal denied, they should get X [TBD] fewer SPACEPOINTS.
14. Other users can give each other “thumbs up” which will give the receiving user X [TBD] more SPACEPOINTS.
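To show how requirements 6 through 14 could fit together, here is a minimal Python sketch of the review workflow. The point values are placeholders for the "X [TBD]" amounts above, and every class, function, and field name is made up for illustration.

```python
from dataclasses import dataclass

# Placeholder values standing in for the "X [TBD]" amounts.
POINTS_APPROVED = 10
POINTS_DENIED = -5
POINTS_THUMBS_UP = 1

@dataclass
class User:
    name: str
    role: str = "standard"  # "standard", "moderator", or "administrator"
    spacepoints: int = 0

@dataclass
class Change:
    author: User
    field_name: str
    new_value: str
    source_url: str   # citation that lets the update be verified
    comment: str
    status: str = "pending"

def review(change, reviewer, approve):
    """Approve or deny a pending change and adjust the author's SPACEPOINTS."""
    if reviewer.role not in ("moderator", "administrator"):
        raise PermissionError("only moderators or administrators can review")
    if change.status != "pending":
        raise ValueError("change has already been reviewed")
    change.status = "approved" if approve else "denied"
    change.author.spacepoints += POINTS_APPROVED if approve else POINTS_DENIED
    return change

def thumbs_up(giver, receiver):
    """Give another user a thumbs up, which earns them SPACEPOINTS."""
    if giver is receiver:
        raise ValueError("users cannot give themselves a thumbs up")
    receiver.spacepoints += POINTS_THUMBS_UP
```

In this sketch moderators can review their own changes, as requirement 10b allows. Blocking a self-awarded thumbs up is my own assumption; the requirements do not say either way.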

---

[1] - http://yfrog.com/message/thread/id/1_78258843255189504 & http://yfrog.com/message/thread/id/1_78232169285300224
[2] - http://latencyzero.com/products/missionclock & http://itunes.apple.com/us/app/missionclock/id324594672?mt=8
[3] - http://spaceuphouston.org/?p=694


 

One image from our internal wiki with spacecraft data for inspiration:

IMAGE 1

IMAGE 2

- - - - - - - - --
Jon Verville, Information Based System Engineer, NASA GSFC/585
 _ AETDwiki & Collaborative Engineering Environment, Project Lead (GSFC/500)
 _ NASA Software Engineering/7150 Handbook, Lead Architect (HQ Office of the Chief Engineer)

NASA GSFC  ::  B23, W415  ::  301.286.xxxx  ::  on the web { http://about.me/jonverve }
= = = = = = = =

New Goddard Cafeteria Vendor Selected

Yup, you heard right, Goddard is getting full Starbucks service (with Barista), Sushi, Subway, and other international options. I am so pumped!

From: GSFC-PAO
Date: Wed, 11 May 2011 14:54:15
Subject: New Center Cafeteria Vendor Selected

Colleagues:

It gives me great pleasure to announce that I.L. Creations has been selected
as the Center’s new cafeteria contractor!  In addition to providing vending,
mobile truck, and catering services to the Center, I.L. Creations will
provide new and exciting food service options in Buildings 1, 21, and 34.

On June 13, 2011, I.L. Creations will hold its grand opening event in the
newly renovated and remodeled Building 21 space, where it will provide full
service cafeteria options including healthy eating, vegan, and international
cuisine choices.  On the same day, I.L. Creations will reopen the Building
34 dining area, where  "grab and go", hot and cold deli and panini
sandwiches, and Starbucks coffee products will be offered.

Later this summer, Building 1 will be reopened by I.L. Creations and will
provide the following franchised food and beverage service options – Subway,
Roasters Chicken Out, Starbucks (with Barista) and Hana Noodles and Sushi.
Finally, in addition to providing catering services made available through
the Building 21 cafeteria, I.L. Creations has teamed with Bruce’s Catering to
provide a mobile food truck, and with Canteen for vending machine services.

  ...

Wordle - NASA Technology Roadmap: Instruments and Computing Panel

Yesterday I attended a workshop sponsored by the National Academies of Science, which was one of a series of events as part of a public forum on the future of NASA Technology. This particular workshop was on a sub area of technology, Instruments and Computing. I took copious notes and decided to create what is called a word cloud from my notes. Enjoy.

Also, I would include more information about why I am interested and attended this meeting, but I think it's safer to keep this to myself until a bit later. :)

Here is the site which has information about this meeting: http://www8.nationalacademies.org/cp/meetingview.aspx?MeetingID=5085&Meet...

Wordle

Interesting Facts From NASATweetup Participants

Here is my best attempt at capturing what everyone said! I also put it on the wiki for this event. Feel FREE to edit and update, add ones I missed and so on. Thanks! (you can log in with your twitter account on the wiki to edit)

http://nasatweet.com/wiki/STS134_Fun_facts

  1. Im a violinist in salt lake city
  2. nothing interesting about me
  3. Im from brazil and my 2 year old is in love with space
  4. Im a professional librarian + belly dance
  5. three time jeopardy champion
  6. I drove motor home here
  7. Im a PR professor
  8. I operate a brewery
  9. I work for MacWorld magazine, and I cannot fix your computer
  10. Im a ventriloquist
  11. In my luggage is equal bits computer equipment and yarn
  12. I can see Barack Obama’s condo from my house in Hawaii
  13. in first grade, my parents gave me a blue flight suit, and I wore it every day to school
  14. I appreciate the support staff, give the guys setting up the lights a round of applause
  15. worked in three national parks before landing back in civilization
  16. I had a space shuttle operators guide and went through it page by page during launch prep
  17. can tie my shoes like anyone you never met
  18. 4,627 days ago, went to space camp, studying space physics as a result
  19. I have a two year old daughter, and another daughter on the way
  20. formula 1 world rally fanatic
  21. had careers as dress designer, illustrator, graphic designer, I just cant make up my mind
  22. I have 15 chickens in my back yard
  23. shared a plane ride here with commander Chris Hadfield (astronaut)
  24. Im an assistant scout master
  25. I recently won the lottery, because I was selected to attend nasa tweetup
  26. hard core packer fan, and did bring cheese curds for #brewup
  27. environmental scientist from Kansas
  28. I have been a Disney cast member for ten years
  29. have a bad Cadbury egg addiction
  30. remember being 4 years old in Scotland, wanting to be at NASA, and here I am 40 years later
  31. this year I took my students to space camp, students said in response to hearing me come here, “oh my gosh, my teacher totally knows someone at NASA”
  32. spending the next couple days trying not to swear like a sailor
  33. I manage a tv station
  34. i recently lost 115 lbs!
  35. I work for twitter! And I was part of the flight team that launched blue origins first rocket!
  36. I am a founding member of twitter beer club and member of Mensa
  37. a tornado landed in my subdivision last night!
  38. I am a children's librarian
  39. I still have all my newspaper clippings from Apollo 11
  40. I am lucky enough to share this event with my better half
  41. Im a national spokesperson for women in IT
  42. this is my third attempt to see a launch, I know it will happen
  43. I have a banner with me that has been to the top of mount everest
  44. I work at amazon with mechanical turk!
  45. I am a huge college basketball fan
  46. I will shoot the launch on color infrared film tomorrow
  47. my sons middle name is Apollo
  48. I was born 99 days before mankind landed on the moon
  49. we helped flat Stanley get down to Antarctica
  50. I built an airplane, an rv-12, that I am flying now
  51. in a very old video game, my ship was called the challenger before the challenger existed
  52. clare grant went to space camp as a kid
  53. seth green brought a poster for you to sign
  54. youth and family pastor from Nebraska
  55. formerly in the air force, worked on ejection systems for b1-lancer
  56. work for twitter in washington DC and im a space camp alum and I still have the manual
  57. my family name is on the Stanley cup, twice
  58. started my first company at the age of 12
  59. I work for thinkgeek.com
  60. im the second oldest person whose parents met online
  61. I represent boeing defense
  62. I have a working QR code on my...chest
  63. ive never been to space camp, but ive seen the movie
  64. I am a meteorologist and im going to Kwajalein
  65. I read 450 books in 2010
  66. I cant spell, and im a speech writer for my congressman!
  67. there have been only 3 sitting presidents who have come to a launch, and this is the first presidential family coming to a launch

Space tweeps! I need your ideas for social media curation on the STS134 NASA Tweetup.


I need your ideas for social media curation on STS134 NASA Tweetup. So, I am using this post to gather feedback from the #NASATweetup community (and anyone else wanting to help) about the best ways to use the web and technology to help tell the story of those experiencing the #NASATweetup. I am helping with the 134 tweetup event and finding creative ways of curating or amplifying the stories of those experiencing the tweetup.

By virtue of using twitter, we clearly are helping give a voice to everyone who attends and make available their first hand perspective. But, the challenge, for those observing remotely, is this creates quite a bit of social media noise. Now, don’t get me wrong, noise is great in a football stadium, car race, or even during a tweetup itself.

But if someone wants to casually observe the high points live, or someone comes by afterwards and wants to see the “top tweets”, pictures, or videos, where do they go? How do they sift through without becoming overwhelmed? Retweets help; blog post summaries help; but what other ways can we “curate” this social media stream?

As of right now, I am planning on doing what I did at a past tweetup I volunteered at, which is to use a tool called Storify (they just launched publicly). It's a tool that lets you take social media artifacts and link them together in the form of a story. You can view the one I created for the Sun Earth Day tweetup here: http://storify.com/jonverve/sun-earth-day-2011-nasa-tweetup

The Hook

But, beyond Storify, I am looking to you (yes, you!) for help in giving us ideas and examples of how we can do this in a better way. Please post your ideas below on this blog and share with me anything from abstract ideas to web links to examples. Also, feel free to mention me on twitter or email me at jonverve@gmail.com. These ideas could be simple or complex, they just have to be do-able by me and whomever else ends up lending a hand (perhaps you?!) So if you are interested in volunteering to help (in person or remotely) to implement your ideas, contact me!

I look forward to hearing about your killer ideas below... (you can simply log in with your twitter/facebook account below)

- Jon

PS. I am also considering using a platform like Django or PHP to enable some really killer functionality, but to be honest, without much prep, I probably won’t get to this before the Tweetup. If anyone out there in the interwebs wants to help me, please shoot me a gmail @ jonverve.

PPS. A few more great Storify examples:
- http://storify.com/abcnews/uk-gears-up-for-royal-wedding
- http://storify.com/breakingnews/japan-earthquake-and-tsunami

Further Details on Data, Openness, and Open Source

First of all, NASA spacecraft designs will never be made on public social media, such as Facebook, Twitter, etc., because of ITAR Category 8 (Category VIII-Aircraft, [Spacecraft] and Associated Equipment) and Category 15 (Category XV-Spacecraft Systems and Associated Equipment).  When I talk about using "social media", I really should be using the term Social Business Software or socially enabled enterprise software, which implies it's ALWAYS behind some type of security perimeter.

Within NASA we have engineers who rely upon tools from vendors which help them address and solve the problems they deal with on a daily basis.  We will never as an agency get rid of some of these, nor should we -- they save incalculable time and effort because they are laser focused on solving one task.  We pay big bucks for some of these software packages and rightly should continue, as relying on an open source alternative which is not even 20% as good as the vendor solution would cause us to have to hire 2-3 more advanced engineers at $150,000/person/year.  At that point $20-80k for a license looks quite affordable.

BUT, I think we should work on the engineering architecture, and there is no reason why we cannot use open source in this area, especially because, in my opinion, that is where open source software is the most mature.  Packages such as Twisted, AMP, OpenCalais, and potentially Drupal, WordPress, and others provide wonderful architectures which I think can be leveraged where it makes sense to.

SpaceLog.Org - Use Of Open Source to Organize NASA Data

I want to call your attention to an incredible site that, by example, re-thinks the way we present NASA data – and a story that describes the process which the team behind its creation used, which is equally as incredible.
SpaceLog.Org
Read the stories of early space exploration from the original NASA transcripts. Now open to the public in a searchable, linkable format.
All of the original human spaceflight missions, such as Mercury and Apollo, have all their transcripts published online, but the drawback is their format. It’s all in scanned PDF format – images.  So you can read through the transcripts line by line, but if you are looking for a particular item, or text, it would be a very manual process. Well, this group from the UK, /dev/fort, thought that these logs needed to be made more accessible and wanted to develop a framework to get them on the web in a more accessible format.
It is tremendous. It is not only well designed, but is very easy to use and the data is at your fingertips. I personally would love to see sites built by NASA (or its representatives) take more of this form, especially where there is more data and less narrative. This is especially needed within NASA, where we do our work on state of the art spacecraft -- often designed with less than state of the art engineering tools. We tend to rely on the "tried and true" tools, such as Word, Excel, PowerPoint, email, share drives, and SharePoint. Attached below is a 3% zoom of a spreadsheet of all the missions that Goddard has launched and their metadata. Imagine trying to use this data collaboratively or making it accessible to more than just a few engineers! Good luck…
/dev/fort
The story of the development of this tool is actually just as impressive as the tool itself. /dev/fort is actually an old fort that folks go to in order to code up projects, but it's very different from other hack-a-thon type events, in that it is unplugged. Unplugged? They don’t ban computers, because much of their work is actually coding, but what they ban are connections – no Internet, no IM, nothing on the information superhighway at your fingertips -- there is Internet, but you have to hike up a hill to get to it. So this also means no distractions. I would say more, but you have to read for yourself the article about this development; you will be glued to the page as I was.


Missiondata

Game Changing Engineering :: Respect for Data, Culture of Openness, and Social Software

This entry is cross posted on openNASA.com. Leave a comment here or on the cross post.

So I have been thinking quite a bit about new possibilities for doing our engineering work of building spacecraft at NASA Goddard. Much of the work we do is truly world-class, and routinely we push the edges of science and engineering capability by leading the way technically.

But I find that often we are like the mad scientist who invents new technology that is going to change our lives, but can't seem to find his wallet. It seems that we often cannot do some very practical, day-to-day activities to keep our "capability engine" well tuned, poised, and ready to strike at solving the next big problem. I think there are tremendous opportunities for us at Goddard and more broadly across NASA to improve the way we do engineering and to introduce some new tools that will allow us to stop re-inventing the wheel and focus more on solving the titanic challenges we face every day. There are three areas which I believe can tremendously help. They are the title of this article. I will dive into each of them below.
