5 solvable tech problems in science
By Andy Chase, Aug 27, 2016.
Story of these interviews
For my senior project at Oregon State University, I was assigned to work with NASA JPL to build a science tool for researchers. The problems below are the result of interviews I conducted as part of customer validation.
Problems
(All interviews referenced are dated 2015-11-19 through 2015-11-24)
(P.1) Software packaging is inadequate
Documentation isn’t always there
- (Hutchings Q.40) [Talking about derived data] “can’t figure out what’s going on without documentation on how the product was gridded”
- (O’Neill Q.16) “I take somebody’s stuff and sometimes it takes a little bit of time– to see how it’s supposed to be used?”
Outside code doesn’t work / Isn’t fully tested
- (O’Neill Q.15) “Yeah, occasionally, sometimes I get code from other people that [laughs] doesn’t work? It’s because it works on their stuff and not mine.”
- (Hutchings Q.27) [About software bugs:] “So you know how endemic they are then– there are mistakes everywhere!” (see the sketch below)
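
Both complaints point at the same gap: shared code that ships without a usage note or a test that runs anywhere but the author’s machine. As a purely hypothetical sketch (none of the interviewees’ code is shown here), a shared routine with a short docstring and one portable test might look like this in Python:

```python
# Hypothetical example of shared code that carries its own documentation
# and a test; the function and the numbers are invented for illustration.
import numpy as np

def grid_mean(field, weights=None):
    """Weighted mean of a 2-D gridded field.

    field   : array-like, shape (nlat, nlon)
    weights : optional array of the same shape; defaults to equal weights.
    """
    field = np.asarray(field, dtype=float)
    weights = np.ones_like(field) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(field * weights) / np.sum(weights))

def test_grid_mean_unweighted():
    # A test that runs on anyone's machine says more than "it works on my data".
    assert grid_mean([[1.0, 2.0], [3.0, 4.0]]) == 2.5
```

Even this much lets a stranger see how the routine is supposed to be used and check that it still works on their own inputs.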
Code not available in all languages
- (Hutchings Q.25) “It all comes down to where you find your code, so I’ve used R– because there was code available.”
Outside code isn’t trusted
- (O’Neill Q.18) “Like, I mean I have had people give me code that I didn’t trust them so I didn’t use it”
Researchers are open to using software packages
- (Scientists via Kuuipo Q.13) [Are scientists open to using packages?] “Yes, yeah. Especially Open Source tools and libraries,” – “[For example] students will start off by learning R then they will quickly start using all the libraries that can manipulate the statistics geographically.”
- (O’Neill Q.13) “Sometimes [I look for] utilities like [ellipse routines]. And yeah, I download it and try it once and if it works like it’s supposed to, then ‘that’s cool.’”
- (Hutchings Q.26) “It helps to have access to people’s code when they have solved problems” (Q.23) “I think we’re now in a world where free sharing information and algorithms is a good thing to do.”
(P.2) Work is often re-done / Wasted work
Researchers write software that isn’t saved or reused
- (Chelton, not noted) – Feel free to re-implement the algorithms I listed in my paper for finding eddies
- (O’Neill, Q.14) Shares only some code and only with certain people – “Yeah. I share it fairly freely. I share my stuff– at least the stuff I know– I’m pretty sure it’s not buggy [laughing]”
Derived Products aren’t trusted
- (Shell) A lot of work goes into derived products but many researchers don’t use them
(P.3) Version control is inadequate
- (O’Neill Q.28) “I think I ended up having to ask the computer guy to get the backup because they do backups every night– yeah version control would be very good.”
(P.4) Knowledge is not shared / Researchers have to learn about things outside their domain
- (O’Neill Q.25) Had to learn how things were encoded for visualization – “you end up getting into the details of like how these things get encoded and as a researcher it’s not–”
- (Hutchings Q.47) “Well we don’t even realize that it’s that easy to get the data haha, that’s funny.”
- (Kennedy, Q.3) “that’s a lot to manage and it requires a certain level of expertise and interest in doing the computer management and all that stuff which not everyone has”
- (Scientists via Kuuipo Q.16) “researchers are still creating their own data” – “and don’t even know that other researchers exist or that other data exists”
- (O’Neill Q.17) Researchers don’t always use existing formats – “but sometimes people have like their own binary format or something”
(P.5) Data can be hard to work with
Not indexed in the right way (temporally)
- (O’Neill, Q.4) Going through to find time series is a pain – “so you have to look through [millions of files to find one point in each one] and it’s kind of a pain” (see the sketch below)
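
The interviews don’t name a toolchain, but one common workaround is to treat a directory of granules as a single logical dataset instead of looping over the files by hand. The sketch below assumes Python with xarray and NetCDF granules holding an sst variable; the file pattern, variable name, and coordinates are illustrative guesses, not details from the interviews.

```python
# Hedged sketch: pull a single-point time series out of many NetCDF granules.
# The glob pattern, variable name ("sst"), and coordinates are assumptions.
import xarray as xr

# Lazily combine every granule along its time coordinate; data is only
# loaded when values are actually requested.
ds = xr.open_mfdataset("granules/*.nc", combine="by_coords")

# Take the series at the grid point nearest to 45°N, 125°W.
series = ds["sst"].sel(lat=45.0, lon=-125.0, method="nearest")

print(series.to_series().head())
```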
Data not being in the right format / poorly documented formats
- (O’Neill Q.17) “but sometimes people have like their own binary format or something– or it’s just put into an unformatted binary file”
- (Hutchings Q.16) “if the data is not provided with a way of reading it– no one else can use it so– as you said it’s useless.” (see the sketch below)
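
To make the contrast concrete, here is a hypothetical sketch that reads a gridded field from two kinds of files: a raw binary dump, where the dtype, byte order, and grid shape have to be known out of band, and a self-describing NetCDF file, where that information travels with the data. The file names, the sst variable, and the 720×1440 grid are assumptions for illustration only.

```python
# Hedged sketch: undocumented binary dump vs. self-describing NetCDF.
# Every concrete detail (file names, dtype, shape, variable name) is assumed.
import numpy as np
from netCDF4 import Dataset

# Raw "unformatted" binary: nothing in the file says it is big-endian
# float32 on a 720 x 1440 grid, so the reader has to be told separately.
field_raw = np.fromfile("field.dat", dtype=">f4").reshape(720, 1440)

# NetCDF: variable names, shapes, units, and fill values live in the file
# itself, so someone else's data can be read without guesswork.
with Dataset("field.nc") as nc:
    sst = nc.variables["sst"][:]
    print(nc.variables["sst"].units)
```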
Researchers have to deal with a lot of data
- (Kennedy) Landsat: 100 TB
- (Jamon) Landsat: 100 GB currently, 100 TB ideally
- (O’Neill) Various datasets: 70 TB
- (Shell) Various datasets: 10 TB
Data goes away (or at least used to)
- (O’Neill, Q.30) “now you don’t have to worry about: is it going to be there in two years? or something..”
- (Shell) You don’t want to lose access right before the deadline
Transcript sources
Here are the edited transcripts from the interviews: