November 20

Meeting with Dr. Larry O'Neill

Assistant Professor

Air-sea interactions, satellite meteorology and oceanography, atmospheric boundary layer and ocean mixed layer dynamics.

A Bit about Research Data

Working on Data Remotely

Acquiring Datasets

Using Outside Code / Sharing Code

Front (Oceanography)

Resistance to Change in Research Tools

Making Research Movies

Version Control

It Takes a Generation

A Bit about Research Data

Q.1

So a bit about our project [...] the initial project was essentially trying to make it easier for earth researchers to do research using remotely sensed images [ok, that's good]; how we're going about that, we're still in the process of figuring out.

We're really looking for information about how earth researchers use remotely sensed images and how that works, especially any challenges or problems that people run into.

Yeah, ok, well this is good. Have you talked to anyone else around here?

Um, yeah, so far I've talked to Ted Strub, Ku'uipo from GIS, and this morning Robert Kennedy.

Ok, yeah, that's a good one to talk to. Yeah, so you probably know by now that we all really use remote sensing differently

Yeah, that's right, especially oceanographers vs...

...land surface people. Yeah, so I'll just tell you a little bit about what I do. I'm probably going to be a little bit like Ted Strub-- he uses a lot of ocean data, especially near the coast. I end up using a lot of ocean data as well, like ocean winds and moisture and temperature fields. I also use quite a bit of cloud remote sensing and things like that. And so I do a lot of-- my visualization is really bad [laughing]

yeah, but some of our datasets are really large, like they measure into the tens or hundreds of gigabytes, and so-- is your project more about visualizing data or about working with large datasets?

Q.2

Well, specifically the team we're working with is the NASA/JPL GIBS team, so they produce images from data and then serve those to researchers to, hopefully, ideally, help them do research--

so maybe visualization-- I know Ku'uipo said that earth researchers like using GIS systems or prepared images for visualization purposes, but not necessarily for actual research, 'cause they prefer to use the raw datasets themselves

Yeah, and that's where I am at, so I actually don't know any GIS systems. You know, at times I've wanted to, but most of my stuff is very-- kind of raw, so I do a lot of processing of the data from various sources, and you know, I like to deal with the raw data. And GIS isn't as good for that, but on the other hand it produces much better visualization, and I think there are a lot more tools in there for doing that-- but I don't do as much of that, so as far as GIS goes, no.

I end up doing a lot of my stuff in either MATLAB or Fortran or just, like, shell scripting. I'm not very good at it [laughing]; you guys are probably much better at shell scripting and stuff. If it's anything other than simple stuff, I have to look it up online and see if anyone else has done it [laughing]. So yeah, that's one thing: people like me sometimes end up having to deal with very large datasets, and shell scripting especially is very useful---

yeah, as far as visualization stuff goes, I think that's a real challenge, because a lot of times I just want to make a simple map or something-- you know, JPL has really nice things; I'm actually on their user working group for PO.DAAC [yeah, I saw that]. So what's really nice is that sometimes you just want, like, a day's worth of data, and you just go and say "show something interesting or something that I want", and it's really nice because you can go online and do it without having to-- 'cause otherwise you have to download the data, and you have to know how to read it or write the reader. And sometimes if you just want to look at a day's worth of, like, temperature or something, say someone asks you for it or you want to look at it, it could take you a couple hours to make that map if you don't have the data-- so that stuff is really nice.

So it's really nice to have it segmented temporally, is what you are saying?

Yeah

Q.3

What about spatially as well: do you ever identify an area of interest but have to download more than that? [oh yeah]

Yeah, actually one thing I think is really interesting-- I don't know if this might be completely off topic, but it's maybe something that is interesting-- so you end up downloading and getting one data file for each time, so you know, it's a 2D field at one time. It might cover the globe, so you have like, you know, two thousand grid points in longitude and maybe 500 in latitude or something, and you know, maybe--

so the data is stored in these 2D files, and if you want to, say, do a time series at one point in there, you actually have to load the whole file for every time in one of these systems, or you have to look through a bunch of 'em. So the data is really 3D, but the file system is really good for 2D, and I've always thought about-- you know, if you just want to do a time series at a point instead of a map or something, it's really a problem, 'cause you know, getting the data--
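The mismatch described here, data that is really 3D but delivered as one 2D file per time, comes down to storage order. The sketch below is a minimal illustration, assuming a hypothetical float grid of shape (time, lat, lon) laid out in row-major order; the grid sizes loosely follow the 2000 by 500 example in the transcript and are not from any real product. It computes where each sample of a single point's time series sits in two possible layouts.

```python
def offset_map_major(t, i, j, nlat, nlon):
    """Element offset of (t, i, j) when data is stored as one 2D map per time: (time, lat, lon)."""
    return (t * nlat + i) * nlon + j


def offset_point_major(t, i, j, ntime, nlon):
    """Element offset of (t, i, j) when the time axis varies fastest: (lat, lon, time)."""
    return (i * nlon + j) * ntime + t


def time_series_offsets(layout, i, j, ntime, nlat, nlon):
    """Offsets (in elements) read when extracting the full time series at grid point (i, j)."""
    if layout == "map_major":
        return [offset_map_major(t, i, j, nlat, nlon) for t in range(ntime)]
    if layout == "point_major":
        return [offset_point_major(t, i, j, ntime, nlon) for t in range(ntime)]
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    ntime, nlat, nlon = 365, 500, 2000  # one year of daily global maps (illustrative)
    a = time_series_offsets("map_major", 250, 1000, ntime, nlat, nlon)
    b = time_series_offsets("point_major", 250, 1000, ntime, nlat, nlon)
    # Distance between consecutive samples of the same point, in elements.
    print(a[1] - a[0], b[1] - b[0])  # 1000000 1
```

In the map-per-time layout, consecutive samples of one point are a whole map apart (a million values here), which is why a point time series ends up touching every file; storing time fastest makes the series contiguous but scatters each map. Chunked storage, as offered by netCDF-4/HDF5, is the usual compromise between the two.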

For someone like me, I think there might be solutions to this, but as far as earth researchers go, it hasn't really trickled down, so I don't know.

Q.4

So the knowledge is out there; it must be available?

I don't know if it is or not. I'm sure it is, 'cause it seems like something that's really-- like other people-- this happens in other types of data structures or something; it must be some sort of data structure problem, and I just-- you know, like, I have data that I can show you, you know, I have--

[Looks up computer stuff]

A lot of monitors! We should keep a record of which researcher has the most monitors haha [Yeah]

--so I have, like, this dataset for sea surface temperature, so each one of these names is a file name, and each file is just a 2D field. A good recent example: someone asked me about the temperature at a mooring that they had off of Hawaii-- so it's one point-- so there's one number in every single one of these files, each of which has a million numbers, so you have to look through all of them and it's kind of a pain. Anyway, that's one thing that I think of: the difference between a map and a time series.
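If each file is a plain binary grid, the value at one mooring location can be read without loading the whole million-value field, by seeking straight to its byte offset. A minimal sketch, assuming hypothetical headerless files of little-endian 32-bit floats in row-major (lat, lon) order; real SST products are usually netCDF or HDF, where the library's own subsetting does the equivalent.

```python
import os
import struct
import tempfile


def read_point(path, i, j, nlon, fmt="<f"):
    """Read the value at row i, column j of a headerless row-major grid without loading the file."""
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        f.seek((i * nlon + j) * size)  # jump straight to the one value we need
        (value,) = struct.unpack(fmt, f.read(size))
    return value


def point_time_series(paths, i, j, nlon):
    """One value per file, in the order given (paths assumed sorted by time)."""
    return [read_point(p, i, j, nlon) for p in paths]


if __name__ == "__main__":
    nlat, nlon = 4, 5
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for day in range(3):
            # Synthetic field: value = 100*day + 10*row + column.
            values = [float(100 * day + 10 * r + c) for r in range(nlat) for c in range(nlon)]
            path = os.path.join(d, f"sst_day{day:03d}.bin")
            with open(path, "wb") as f:
                f.write(struct.pack(f"<{len(values)}f", *values))
            paths.append(path)
        print(point_time_series(paths, 2, 3, nlon))  # [23.0, 123.0, 223.0]
```

Each file still has to be opened, but only four bytes are read from each instead of the full field.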

Working on Data Remotely

Q.5

Okay, well, staying on this topic: this morning I was talking to Robert, and he said one development that could be done is keeping the data where it comes from, like USGS, and working on it there-- he said "bring the algorithms to the data" vs. vice versa

Do you think that's something you would take advantage of if it were available, or do you personally like having the data right on your own computer or in your own data center?

A lot of times I like having it available, probably because I think the datasets that I'm involved with are kind of "medium sized", on the order of hundreds of gigabytes or tens of hundreds of gigabytes-- which now is.. it used to be that was an intractable amount and now it's.. [small]

So I have a computer down in the environmental computing center; I just have a RAID and it's 70 terabytes-- and it's mostly full now [laughing]-- so I have a bit of storage, and actually that was only like $20,000 a couple of years ago, so storage is really cheap--

A fair amount of money for like an intern student

Well, yeah, yeah, it is, but as far as some of these computers go, you know, it's actually not bad.

Q.6

So you have a lot of experience then managing that kind of data-- because you set up that RAID yourself, or--

Yeah, yeah, I set up the RAID-- it's a RAID-2; it has about 48TB of unique storage and it RAIDs to ~72TB. So yeah, I have a lot of these datasets-- like, I have a bunch.

So back to your original question-- yeah, it is nice to be able to access some of these bigger datasets. I think Robert Kennedy deals with a lot of Landsat data, and Landsat is, like, images of the earth at around 20 meter resolution, so every single day you have, you know, hundreds of gigabytes of all sorts of different things. For that, it's a huge problem; he tries to store it locally, so for him it makes more sense.

For me, it's not quite as much-- although, like, I don't know if you guys are thinking about OPeNDAP servers and things like that

Is that the same thing as PO.DAAC?

Yeah, yeah. So I guess it's a protocol-- I know JPL and PO.DAAC have this thing, it's called (pnep?); basically you can query data from scripts-- you can point to-- I think it's a web address or something. I don't use it very much.
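For reference, an OPeNDAP (DAP2) request can ask the server for just a slice of a variable by appending a constraint expression to the dataset URL, so only the subset crosses the network. The sketch below only builds such a URL; the server address, file name, and variable name (`analysed_sst`) are hypothetical, and it is not the "(pnep?)" tool mentioned in the transcript, which is not identified there.

```python
def dap2_subset_url(dataset_url, variable, slices, response=".ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    slices: one (start, stride, stop) tuple per dimension, in the variable's
    dimension order; DAP2 indices are zero-based and the stop is inclusive.
    response: ".ascii" for a text response, ".dods" for the binary DAP2 encoding.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical server and variable: first time step, a 101 x 101 lat/lon box.
    print(dap2_subset_url(
        "https://example.org/opendap/sst/20150101.nc",
        "analysed_sst",
        [(0, 1, 0), (100, 1, 200), (300, 1, 400)],
    ))
```

Some HTTP clients percent-encode the square brackets before sending; the server reads the same constraint either way.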

Q.7

That's what I think is really interesting: Robert said LP-DAAC does exist [Yeah], but not very many people use it [No] because they can already just get the data some other way(?)

Yeah-- right now storage is still kind of cheap, so you know-- for me, I end up using the data for a lot of different projects and stuff, so sometimes it's nice to have it so that I can--

So you don't have to find it again, or..?

Yeah. So you don't have to find it again, and sometimes I just need a lot, like I'll want 5 years of global wind data-- each file is gonna be a couple megabytes, and then every day, so 365 times maybe 5MB. And if I have this script that does something to the data-- analyzes it somehow-- then the first time I run the script it starts loading, and if I've made some sort of mistake and it screws up, then if the data wasn't cached or something, I'd have to download it over the network every time I, you know--
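At the numbers quoted, five years of daily files at about 5 MB each is roughly 365 × 5 × 5 MB ≈ 9 GB per pass, which is why re-downloading on every failed script run hurts. A minimal local-cache sketch: `fetch` is a hypothetical stand-in for whatever actually downloads a file, and the file name is illustrative.

```python
import os
import tempfile


def fetch_cached(name, fetch, cache_dir):
    """Return the local path for `name`, calling fetch(name) only if it is not cached yet.

    `fetch` is a hypothetical stand-in for the real download (FTP, HTTP, ...) and
    must return the file's bytes.
    """
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        data = fetch(name)
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # a half-written download never appears under the final name
    return path


if __name__ == "__main__":
    calls = []

    def fake_fetch(name):
        calls.append(name)
        return b"\x00" * 16

    with tempfile.TemporaryDirectory() as d:
        for _ in range(3):  # e.g. three attempts at running the same analysis script
            fetch_cached("wind_2015001.nc", fake_fetch, d)
    print(len(calls))  # 1
```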

Well the idea would be you would be running the script over there, so

Oh! Right, okay yeah--

So it wouldn't be pulling the data as you ran it; it would go through

Oh yeah yeah, okay. That's something else [Face becomes scrunched up with intense thinking and focus]

Q.8

But I can already see that it's like complicated enough where you are like: "Well I kind of like how I'm already doing it"

Yeah, it's like-- I do kind of like having it locally. I have done some of this "where you run it remotely"; you do some scripting, and that's one of the things I'm not very good at-- you know, a lot of the scripting was, I think, Perl or JavaScript or something, and we don't learn that--

You really like your Fortran, MATLAB, you're not open to maybe--

I'm open to changing, but at this point sometimes I'm kind of like crunched for time or a deadline or something

because I know-- this might not help you, and I know people don't like changing, but they say once you learn one of these higher-level languages you can be something like 10x more productive than in Fortran

Yeah, I wouldn't doubt that. I actually use a lot of-- I actually mostly use MATLAB because..

That's more like JavaScript; that's higher level, so that makes sense

Yeah, I do it there. I learned it in grad school, and I have a core of probably a couple hundred analysis routines that I've written over--

Q.9

You have your like-- util dump?

Kind of, yeah, yeah. And so it's stuff that, you know, over the years I've found all the mistakes in, and now I'm kind of comfortable with what they do-- so sometimes when you go to a remote thing you don't always have access to that, so having it locally is good; that's one drawback sometimes of the--

We've talked about this at the user working group meeting-- 'cause I guess PO.DAAC has something they're playing with, like a PO.DAAC Labs type thing. They're working on something, and I guess one of the things they had is kind of LDAP for whatever-- where you can do simple subsetting of the data locally and do some stuff. So one of the things I've wondered about is, if it's popular and lots of people do that, are they going to have the amount of resources-- you know, if everyone were trying to do things with the data at once--

So you're worried about basically sharing

Oh ya ya ya, sharing. But I haven't used it very much, so I imagine it's still pretty good right now because not many people use it hahaha, unfortunately

Acquiring Datasets

Q.10

Ok, so another question I have is more about your research in general. How do you personally acquire the datasets that you're working with?

So-- for the longest time it's just been simple FTP. I just have, like, a bash script that loops through and uses anonymous FTP, or like wget, you know, put in a bash script

Q.11

What datasets are you working with? Scatterometry, right?

Yeah, so wind scatterometry; any sea surface temperature dataset that there is, I've probably either got or used; some GOES-- there's a satellite called GOES (a geostationary satellite), so it's pretty good for pretty pictures of the earth's clouds-- so I use some of that data...

Q.12

So you FTP it down and keep it around-- in case you want to ever go back?

Yeah, in case I want to go back. So this is an example [shows script on screen]: it's just anonymous FTP, and it builds the file name. I have to go through the anonymous FTP or hosting server and figure out the directory structure, and then I have to build the file name that I want.

The reason that I do that is 'cause a lot of these datasets are continually updating, because the satellites keep collecting data, and every couple of months, you know, when I say "hey, I need--" I want to look at the last couple months of data and I haven't run it in a while, I'll just run the script and it'll take 20 minutes or something to download. And here-- well, I guess everywhere now, but here they've got some really good connections, like I can get some really good speeds, so this stuff downloads pretty quickly.
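A rough Python equivalent of the kind of script described: build each remote file name from the date, skip files already on disk, and fetch the rest over anonymous FTP. The year/day-of-year directory layout, file-name pattern, and host are hypothetical placeholders; a real archive's layout has to be read off its server, as described above.

```python
import datetime as dt
import ftplib
import os

HOST = "ftp.example.org"  # hypothetical anonymous-FTP host


def remote_path(day):
    """Hypothetical layout: /pub/sst/YYYY/DDD/sst_YYYYDDD.nc (DDD = day of year)."""
    doy = day.timetuple().tm_yday
    return f"/pub/sst/{day.year}/{doy:03d}/sst_{day.year}{doy:03d}.nc"


def missing_paths(start, end, have):
    """Remote paths for each day in [start, end] whose file name is not in `have`."""
    out, day = [], start
    while day <= end:
        path = remote_path(day)
        if path.rsplit("/", 1)[1] not in have:
            out.append(path)
        day += dt.timedelta(days=1)
    return out


def download(paths, dest_dir, host=HOST):
    """Fetch each remote path into dest_dir over anonymous FTP."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        for path in paths:
            local = os.path.join(dest_dir, path.rsplit("/", 1)[1])
            with open(local, "wb") as f:
                ftp.retrbinary(f"RETR {path}", f.write)


if __name__ == "__main__":
    have = {"sst_2015364.nc"}  # in practice, e.g. set(os.listdir(dest_dir))
    for p in missing_paths(dt.date(2015, 12, 30), dt.date(2016, 1, 2), have):
        print(p)  # download(...) would fetch these
```

Re-running it after a couple of months only fetches the days added since the last run.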

Q.13

So PO.DAAC is supposed to be pretty similar to that, right? Isn't that what PO.DAAC is supposed to replace?

Yeah.

So you're on the user working group, but [it's] not really something you use much?

Yeah, that sort of stuff I haven't used too much, and the reason is that I think their design is set up to be used by, you know, novice to intermediate users who are still getting used to it or just want small parts of the data or something.

So you're worried that it's not all there, or?

No, I just-- I worry about that-- well, not worry-- but sometimes I need access to more than it's designed to handle. A lot of times they end up using global datasets for many years, and the tools they have aren't designed for things that are that big.

Yeah, so that was my thing. There's actually someone else in the working group who tries as hard as he can to "break" some of their tools, because he knows a lot more about the backend stuff-- I don't know anything about the backend-- and he breaks it pretty easily with fairly small types of commands. He tries to get, like, a tenth or a hundredth of the amount of data that I would try to get, and that breaks their stuff, because one of the things is it just takes way too long. Like, if he tries to subset, you know, temperature within maybe 500 km of Hawaii or something, and you want it for two years, you enter that command into LDAP and it essentially freezes [laughs], or it takes a long, long time. And one of the reasons is-- I think they are working on fixing this-- it's how their backend stores the data and compresses it, and so when you read it in...

So they store it compressed and don't uncompress it until there's a user call-- so when there's a user call, it takes a file, uncompresses it, reads it in, and then keeps it in cache, but the cache expires really quickly. And I guess they were thinking about changing the file system to-- I think there's another one, I forget what it's called, you guys might know-- a different file system structure where the data is already compressed, and so when you read it from disk, the disk does the uncompressing.
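A toy model of the backend behavior described here, offered only to illustrate why large requests stall: files are kept gzip-compressed, decompressed on each request, and held in a cache that expires quickly. The class, its names, and the TTL are hypothetical; this is not PO.DAAC's actual implementation.

```python
import gzip
import time


class DecompressingReader:
    """Toy model: files are stored gzip-compressed and decompressed on request.

    Decompressed bytes stay cached for `ttl` seconds; after that, the next request
    pays the decompression cost again.
    """

    def __init__(self, store, ttl=60.0, clock=time.monotonic):
        self.store = store      # name -> compressed bytes
        self.ttl = ttl
        self.clock = clock      # injectable so the expiry can be demonstrated
        self._cache = {}        # name -> (expiry time, decompressed bytes)
        self.decompressions = 0

    def read(self, name):
        now = self.clock()
        hit = self._cache.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]
        data = gzip.decompress(self.store[name])
        self.decompressions += 1
        self._cache[name] = (now + self.ttl, data)
        return data


if __name__ == "__main__":
    t = [0.0]
    reader = DecompressingReader({"sst_2015001": gzip.compress(b"x" * 1000)},
                                 ttl=5.0, clock=lambda: t[0])
    reader.read("sst_2015001")
    reader.read("sst_2015001")  # within the TTL: served from cache
    t[0] = 10.0
    reader.read("sst_2015001")  # expired: decompressed again
    print(reader.decompressions)  # 2
```

With a short TTL, a two-year subset touches hundreds of files and pays the decompression cost for nearly every one. A file system with transparent compression, such as ZFS, moves that work into the storage layer, so applications read plain bytes.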

yeah, it's ZFS

Yeah, ZFS! That's it [laughs] so they were thinking about doing that but it wasn't trivial

Yeah, they have a lot of people there, but.. they also have a lot of stuff to do

Yeah, Well no, it's a great idea if they can do it but I think--

Okay, so I get it: you have something that works, and it's like-- why would you want to spend lots of time re-inventing the wheel when you already have the data

Yeah! That would be it. So if it could be made so that-- I mean, it's a great idea. I've thought about maybe having-- you know, because other people from that college have sometimes worked on similar datasets or something-- not everyone and not very many people, but having local copies or a local thing where you can--

They should just mail hard drives like every month with all the data and then just everyone shares it! [Laughing]

Yeah, that would be awesome! [Tone of voice: seeing a long awaited package has arrived] "Ah! that's my new.. temperature dataset. Oh man!" [laughing]

Using Outside Code / Sharing Code

Q.14

Okay, going back to what you were saying about having your utils that you work with.. how do you feel about sharing code? I know some researchers are really hesitant about using outside source code.

Yeah. I share it fairly freely. I share my stuff-- at least the stuff I'm pretty sure isn't buggy [laughing]

Um, yeah, it's just-- I mean, there's some code I have-- like, in MATLAB it's really easy to not document your code and stuff, and it's really easy to write bad, bad code-- and I've got some of that, and also some nicer stuff.

Sometimes with your own code-- yeah, you guys know this-- it works for your project, and you're not 100% sure that if someone else uses it and enters a parameter space it wasn't designed for, it won't fail or something. Sometimes it's kind of like pride, because you worked really hard just to get it to work for yourself, and it works really well on your stuff, but if somebody uses it for something slightly different than what it was designed for, then they're gonna say "AH! This guy writes shitty code" [laughs]

Q.15

But what about the other way around, do you ever use other people's code in your work?

Yeah, occasionally, sometimes I get code from other people that [laughs] doesn't work? It's because it works on their stuff and not mine.

Q.16

But it's just random, individual researchers, not necessarily packages?

Yes, yes. Packages I do. Well, both. Yeah, so some days I'll take somebody's stuff, and sometimes it takes a little bit of time to see how it's supposed to be used, or maybe how you're trying to use it is slightly different than what it was meant for, and sometimes you can just look at it and see-- you know, maybe it's a simple thing, like your array is transposed from what it's supposed to be when you're putting it in or something. But yeah, usually it's just learning how to use it, basically.

Q.17

How do you learn about these things, is it word of mouth or do you actually spend time looking up different packages?

Both. Yeah, so sometimes, like, data readers are a big one. Sometimes you get a dataset-- you know, now it's becoming better with netCDF or HDF formats, which are, you know, easy to read in MATLAB-- but sometimes people have their own binary format or something, or it's just put into an unformatted binary file, and you're like "What the hell is this?" [laughing]

Yeah, and so you just have to know-- if it's gridded data or something, you just have to know how it was formatted-- so yeah, definitely that.
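For the "unformatted binary" case, one common variant is a Fortran unformatted sequential file, where each record is wrapped in byte-count markers. A minimal reader, assuming 4-byte little-endian markers and 32-bit float payloads; marker size and byte order are compiler- and platform-dependent, so both are assumptions to check against the data provider. A file written with stream or direct access has no markers and is just a flat array.

```python
import struct


def read_fortran_records(buf, marker="<i"):
    """Split the bytes of a Fortran unformatted *sequential* file into record payloads.

    Assumes every record is framed by the same byte count before and after the
    payload, stored as a 4-byte little-endian integer (compiler-dependent).
    """
    msize = struct.calcsize(marker)
    records, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(marker, buf, pos)
        start = pos + msize
        payload = buf[start:start + n]
        (n_end,) = struct.unpack_from(marker, buf, start + n)
        if n_end != n or len(payload) != n:
            raise ValueError(f"record markers disagree at byte {pos}")
        records.append(payload)
        pos = start + n + msize
    return records


def as_float32(payload):
    """Interpret a record payload as little-endian 32-bit floats."""
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))


if __name__ == "__main__":
    def frame(payload):
        return struct.pack("<i", len(payload)) + payload + struct.pack("<i", len(payload))

    # A header record with the grid size, then one record of data.
    buf = frame(struct.pack("<2i", 2, 3)) + frame(struct.pack("<6f", *[float(k) for k in range(6)]))
    header, data = read_fortran_records(buf)
    nlat, nlon = struct.unpack("<2i", header)
    print(nlat, nlon, as_float32(data))  # 2 3 [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

When the leading and trailing markers disagree, the assumed marker size or byte order is usually wrong.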

Sometimes I look for utilities. One example that comes to mind: a couple years ago I needed to fit an ellipse to points. I had points, and I wanted a least-squares fit of an ellipse to those points, and I didn't want to spend a whole day writing the code to do that, so you just google it [and there it is, laughter], and somebody, like a mathematician who is very good, wrote these bulletproof ellipse routines. And yeah, I download it and try it once, and if it works like it's supposed to, then "that's cool"
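For a sense of what such a routine does, here is a plain algebraic least-squares conic fit: solve for A, B, C, D, E in A x² + B xy + C y² + D x + E y = 1 via the normal equations, then check that the result is an ellipse (B² − 4AC < 0). This is a simple textbook approach, not the routine the speaker downloaded and not a constrained "direct" ellipse fit; it also assumes the ellipse does not pass through the origin (centering the points first helps).

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: the points do not determine a conic")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ellipse(points):
    """Algebraic least-squares fit of A x^2 + B xy + C y^2 + D x + E y = 1.

    Returns [A, B, C, D, E]; raises if the best-fit conic is not an ellipse.
    """
    rows = [[x * x, x * y, y * y, x, y] for x, y in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] for r in rows) for i in range(5)]  # right-hand side is all ones
    coeffs = solve(ata, atb)
    a, b, c = coeffs[:3]
    if b * b - 4 * a * c >= 0:
        raise ValueError("best-fit conic is not an ellipse")
    return coeffs


if __name__ == "__main__":
    # Points on x^2/4 + y^2 = 1, so the fit should give A = 0.25, C = 1, others near 0.
    pts = [(2 * math.cos(k * math.pi / 6), math.sin(k * math.pi / 6)) for k in range(12)]
    print([round(v, 6) for v in fit_ellipse(pts)])
```

Checking the fit on points from a known ellipse, as in the demo, is the same "does it give something reasonable" test described in the transcript.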

Q.18

One thing I'm always curious about, me personally, is how you can be sure that code doesn't have bugs, necessarily-- it looks like you have the right answer but you don't [yeah]-- especially because I know a lot of researchers don't put their code at the end of their study; you might hide it, you might even delete it. How do you know?

Yeah, so it's a little bit of faith.

I mean, you know, there are two things: 1) yeah, you use it and see if it gives you something reasonable-- for the ellipse example, if the ellipse looks like it fits okay, then it's pretty good; 2) the other thing is, if I try to do it myself, you know, sometimes you do it right the first time and sometimes you don't. It's the same thing with my own code: how good are you, and are you sure about your own stuff?

Yeah, like, over time I've kind of gotten better. I don't find many instances where I downloaded a piece of code that was just wrong. For instance, for most of the stuff I find on MATLAB Central, people can "rate" it or something

Haha, five stars = it's right

or you can read the comments for that stuff. The stuff you get from other researchers, yeah, you kind of trust the reputation. Like, I have had people give me code where I didn't trust them, so I didn't use it-- but most of the time you're working with someone that you know is good, so yeah

Q.19

Their reputation extends onto the code-- even though you said your own code maybe isn't the best all the time?

Yeah. Although, with the stuff I've written-- obviously over time you get better, but also some of the stuff you use a lot, you've used it so many times that you've fixed it, all of the.. [well polished?] yeah, well polished. I have some routines that are really good, bulletproof: any time I get a funny answer from one, I go look to see if there's a code error or something, so I've looked through it so many times that, yeah, I'm pretty confident in it.

Q.20

Going into this project I didn't even know that researchers wrote their own code; I just figured they used available tools, so it's been a pretty eye-opening experience to see that intersection between researchers and software

When I was an undergrad I took a Fortran class, and then a regular shell scripting kind of class, and then when I was in grad school I sat in on a MATLAB class, but I basically taught myself MATLAB. Once you learn the basic things about what code is-- like a loop, conditional statements-- the other statements you can kind of-- they seem logical. It's like speaking a language or something: once you can at least ask where the bathroom is, it's not very hard to say "Oh, I'm thirsty, I want a beer" [laughing]

I end up doing a lot of coding, but it's not necessarily very difficult coding-- just analyzing the data and stuff like that; it's not super challenging coding, I don't think.

Front (Oceanography)

Q.21

[So where do you think your project is going?] Well, you know, we're a little bit lost at the moment-- [originally] the idea was to help researchers find things. For example, an ocean example: your research is more into winds, is that right?

Yeah, actually I do what's called air-sea interaction, so I look at both: how the ocean and atmosphere influence each other. So one of the things is surface winds, and it also has to do with a lot of ocean data-- ocean eddies--

So that's what I was going to mention: I talked to Dr. Chelton, and eddies would have been an example where you might not know exactly where and when they happen, but from what I understand, finding eddies isn't too easy-- it took him a long time to find algorithms to do that

Yeah, so actually Dudley was my grad school advisor, and I've worked with him a bit on the project. One thing that's kind of interesting: I'm looking a lot at how storms affect the ocean, and one example is fronts-- [pulls out images]

Yeah Dudley mentioned fronts

Yeah, so there are fronts in the atmosphere, so this is looking at, like, hemisphere fronts or something. I'll just give you an example of something I'm working on to see if it helps. So here's the east coast of the United States [points to the map], and this is a cold front here--

so it's a low pressure system here, with this huge front, and this is kind of an indication of the temperature at the top of the clouds, so the tops of the clouds are really cold and the surface is warm. So it's this huge cloud system, and it's really windy here, with a lot of rain and snow and stuff, and there's this big ocean front that comes under here. So I'm studying how storms like this are affected by the ocean front, and one thing is that you get-- so this is an example of a surface wind field, showing how it converges air--

so air converges along the front, and that causes upward motion that causes rain underneath. And it turns out these long tails funnel stuff, and with time they kind of propagate along. I'm trying to figure out ways of tracking fronts, how they evolve and stuff, and it's been difficult [laughing]-- it's been difficult to do that, so I've ended up doing other things. It might be an example of something..
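One simple, commonly used starting point for locating fronts in a gridded field (SST, say) is to flag grid cells where the horizontal gradient magnitude exceeds a threshold; tracking then means linking those flagged regions from one time to the next. The sketch below covers only the detection step, with a hypothetical grid spacing and threshold; it is not the method the speaker uses.

```python
import math


def gradient_magnitude(field, dx, dy):
    """|grad T| by centered differences on interior points of a 2D list (rows = y, cols = x).

    Edge rows and columns are left at 0.0 for simplicity.
    """
    ny, nx = len(field), len(field[0])
    g = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            dtdx = (field[i][j + 1] - field[i][j - 1]) / (2 * dx)
            dtdy = (field[i + 1][j] - field[i - 1][j]) / (2 * dy)
            g[i][j] = math.hypot(dtdx, dtdy)
    return g


def front_mask(field, dx, dy, threshold):
    """True where the gradient magnitude exceeds `threshold` (field units per unit distance)."""
    return [[v > threshold for v in row] for row in gradient_magnitude(field, dx, dy)]


if __name__ == "__main__":
    # Synthetic 5 x 6 SST field (deg C) with a sharp step between columns 2 and 3;
    # hypothetical 25 km spacing and a 0.1 deg C/km threshold.
    sst = [[10.0] * 3 + [20.0] * 3 for _ in range(5)]
    for row in front_mask(sst, 25.0, 25.0, 0.1):
        print("".join("#" if v else "." for v in row))
```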

Q.22

Something you are tracking or monitoring?

 

Yeah, monitoring or something, so if that helps too, that might be an example for you to look at. I was looking at weather models-- here's that same front in the weather model. So this was the satellite data here and this is the model, so there are some differences, so yeah, trying to compare them. So things with, like, sharp funnel-- eddies are one thing; they have little fronts around the eddy--

Resistance to Change in Research Tools

Q.23

Thanks. [...] One big thing that I've noticed: I think there's some resistance in terms of how information gets out there. Like, the GIS people would say "GIS systems can be used for analysis", but convincing a researcher that a tool is reliable, or can actually be used for analysis, would be a huge barrier, and people wouldn't be able to cross that

Yeah, I would be open to something except there's a learning curve to use these systems. Sometimes you don't want to climb that hill, but yeah I mean I would be open if there's an easier way to do something.

Yeah, learning-- you don't want to take risks in terms of time if there's no payoff

Possibly, or sometimes you get a system set up-- like Dudley, when I was in grad school, they did everything in, like, Fortran-- they do everything in this old, OLD 1970s... [IDL?]

No, no, it's a Fortran-based program, except it was for-- they adapted it. So back in, like, the early 80s there was this old printer system they used to make these maps; it was really-- it was really slow, but it was this plotting system, basically a dot plotter, they called it. And so it had this code that did it, and basically to make a map you had to write a thing that said "GOTO POSITION X Y", put a DOT, and then "goto next one" and put a DOT. So anyway, he adapted this system to make his own plotting thing, and that's what he likes.

and I tried using it once, and it's not-- it's ughhhh. It doesn't do very much, and so I started just using MATLAB, so he used to give me all sorts of crap: "Oh, MATLAB :(", "Oh, you have to pay for a license :(", "You can't take it everywhere :(", and "Oh, they might go out of business, and then you're up shit creek!" [laughing] and I got all this grief

Q.24

He might go out of business, and then how are you going to use his plotter?

Yeah! So now, you know, he's retiring, and he's going to take that with him, and now I'm with MATLAB, and people are coming up with better things, and I'm going to say: [Creaky, old voice] "Oh, MATLAB, it's great :}" [laughing]. And it's, you know, not because MATLAB is great, it's just because it's all I know-- well, kind of what I know. Yeah, I do want to learn some of these other things, because it's really cool, there's a lot of-- I'd like to make better visualizations, and that's hard. Making research movies seems simple.

Making Research Movies

Q.25

Are you talking about when it shows changes over time?

Yeah, so if I were to make-- I've done this, I've made animations of, like, this front or something, and the easiest thing-- MATLAB has something like an MPEG encoder, and it's not very good; it comes out really grainy with a really big file size. So I end up making PNGs of all my images, and then I have a folder with a thousand PNGs, one for each frame, and then I use this UNIX utility called FFMPEG. [..]

I played around a lot with it, but I felt in the process of doing that I was completely over my head. I mean, you end up getting into the details of how these things get encoded, and as a researcher it's not-- it's like your axes are the same in every frame, but somehow in the file every frame has to have information about where the little axes are, and so it makes this huge-- to me it seems like it made this huge file size, and parts of it didn't necessarily need to be repeated. I don't know-- I'm sure there's a better solution out there than what I found

Q.26

there are people that specialize in encoding video specifically-- that's like its own field

Definitely-- yeah, so I feel completely-- and you know, I tried to-- [reassuring-yourself voice] "oh, I'm a smart guy, I'll figure this out"-- and you get into it and you're like "No! This is very complicated stuff", so I use FFMPEG. I'm one of the only people I know who even got that far-- most people just use MATLAB, just the simple tool, which isn't very...

Customizable?

Yeah, not very customizable, and it doesn't do a very good job, I don't think. FFMPEG does seem to do a little bit better if you can give it just-- but then FFMPEG has its own limitations: you have to name the files sequentially

Have to have enough zeroes [padding]

YES! And you can only have ten thousand [images]-- at least from what I can see-- [..]

So yeah, that's maybe my two cents: I would use animations a lot more if [it were easier]. You know, using FFMPEG is not that hard, I guess; you can just write your loop in MATLAB and make a map for each frame, save all your maps into a separate directory, and then just point FFMPEG to that directory.
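The zero-padding problem can be sidestepped by choosing the pad width from the frame count before writing any files. A minimal sketch, assuming frames named `frame_00000.png` and so on in one directory (names hypothetical); the ffmpeg options shown (`-framerate`, `-i` with a printf-style pattern, `-c:v libx264`, `-pix_fmt yuv420p`) are standard ones, but the values chosen here are just one reasonable setting.

```python
def pad_width(n_frames):
    """Digits needed so every frame index 0..n_frames-1 has the same width (at least 4)."""
    return max(4, len(str(n_frames - 1)))


def frame_names(n_frames, prefix="frame_", ext=".png"):
    """Sequential, zero-padded file names for n_frames frames, starting at 0."""
    w = pad_width(n_frames)
    return [f"{prefix}{k:0{w}d}{ext}" for k in range(n_frames)]


def ffmpeg_command(width, fps=24, prefix="frame_", ext=".png", out="movie.mp4"):
    """Argument list that encodes the numbered frames into an H.264 MP4 with ffmpeg."""
    return [
        "ffmpeg",
        "-framerate", str(fps),            # frames per second of the input sequence
        "-i", f"{prefix}%0{width}d{ext}",  # printf-style pattern matching frame_names()
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",             # widely playable pixel format
        out,
    ]


if __name__ == "__main__":
    n = 12000
    names = frame_names(n)
    print(names[0], names[-1])  # frame_00000.png frame_11999.png
    print(" ".join(ffmpeg_command(pad_width(n))))
```

Run it from the frames directory, e.g. `subprocess.run(ffmpeg_command(pad_width(n)), cwd=frames_dir, check=True)`, where `frames_dir` is a hypothetical path. A fixed 4-digit pad runs out of names at 9,999, which may be where a ten-thousand-image ceiling comes from. With `yuv420p`, libx264 also needs an even frame width and height, which some saved figures do not have.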

Version Control

Q.27

So you mostly work by yourself on your research projects? I know there are a lot of other co-authors, but specifically-- do you view research projects as more like an indie novel or more like writing [meant making] an indie movie?

That's a good question. I guess starting off in my career-- I mean you interact a lot with your advisor and maybe some other people on your committee, but it was mostly by myself, and as I've gotten further in my career I have noticed-- yeah, now I work more with people. I still do a lot of-- somebody says, "Oh can you..." you know, "Can we look at this fruit-- this thing and this variable", and I just kind of sit here and do it and then print it off and then show them. But yeah, we don't have a very good system for, you know-- if you work on a little bit of code or something, somebody else can go and look at your code and say...

Q.28

You guys don't use version control? (!)

No [laughs]

Actually, I used to work with the Naval Research Lab down in Monterey-- so they run like an operational weather model and they have-- so people are continuously working on various elements of the model and they have version control for that, so-- but they really need it.

But with this stuff-- we don't work together enough where that would be as useful-- I would really like-- for an individual, sometimes the really good code I use all the time-- actually I did this recently, where a script I use to read in a lot of my datasets-- and it's kind of really nice-- it subsets and all this stuff-- and I made a small change and it broke it. [laughing] And it was like [regret voice] "I just want to go back to the old one", but I can't-- and I know that at the very least I should make copies or something. I think I ended up having to ask the computer guy to get the backup, because they do backups every night-- yeah, version control would be very good.

and it definitely sucked because I don't know how much trouble it would be.. [laughing] yeah we need that.

and it depends on what level of researcher you get, but for me-- now that I'm kind of on the faculty doing the sort-- making maps and analysis-- I end up doing that a lot less than I used to-- I just feel like sometimes I just don't have enough time to learn some of this newer stuff

It Takes a Generation

Q.29

I know at some level you get students to do all your code for you

[Laughing] Yeah, I'm not there yet. Yeah, well, I have an undergrad student-- and I think she'll continue on to be a grad student under me, but she's-- she's learning MATLAB and stuff, but yeah, it would be kind of nice to eventually be able to share some of my code with her-- so that she could learn how to use it or something-- 'cause I mean there's some of my code that's okay.

Q.30

You got that one folder that's like "Perfect Code"

Haha, yeah, there's some stuff that would be-- a lot of my stuff would be good for her to learn or something-- but from the student perspective, usually it's good to just kind of-- when they're starting out, to kind of let them develop their own stuff, 'cause you know in the long term it will be good for them to learn how to do that and-- it kind of gives them some examples, but then when you start in grad school it's really good to be able to say "ok, can you look at this dataset" and let them go at it and figure out how to download it and how to read it in and how to extract what they need.

Letting them develop their own stuff while doing this, it might be soon that LDAP becomes more prevalent

So it takes a generation

Yeah, I'd be really happy to figure that out, especially now that these datasets-- there are these repositories now, so you don't have to worry about "is it going to be there in two years?" or something...

Q.31

That definitely used to be a problem with the web; things would be shut down, never to be seen again

Yeah, with research data sometimes this is a problem-- you have a dataset you really like and the grant on that ends and it doesn't get renewed, and so that dataset doesn't get-- sometimes it just gets removed, and that's it, it's gone. And so if you didn't download it, haha-- but with PO.DAAC & JPL that's not a problem...

Until NASA gets defunded

Well.. that's where my funding comes from soooo..

[Laughing]