Meeting with Dr. Jennifer Hutchings
Sea ice dynamics
Difficulties with Remote Sensing: Uncertainty and Limited Resolutions
Sharing Data in Ice Research & Binary Formats
Code: Students using Python, and Trusting and Sharing Code
Sharing Data Part 2
Personal: Do you read your own research, and why become an Ice Scientist?
General Public: Untapped resource?
Discovering NASA Web Services Exist
First off, I want to tell you a bit about my project so you know where I'm coming from. I'm working with NASA JPL to help build a project for Earth researchers; specifically, we're trying to use remotely sensed images to help researchers do their work.
[Who are you working with at NASA?] Well, specifically it's the GIBS team, which is the Global Imagery Browse Services team; they basically make the WorldView maps, and the people are Lewis McGibbney and Charles Thompson.
Hm, I think I'm familiar with the data.. but not the people :)
Okay, so my first question is: what challenges do you think researchers, ice researchers specifically, face when using remotely sensed images in their research?
Oh, wow, that's a big question. So the biggest challenge is that a lot of the remotely sensed products (not necessarily the images, but the products) are not necessarily provided with... uncertainty?
Uncertainty? You mean like bounds of error?
Bounds of error and bias. And this is actually a huge area of research, because for some of the products we don't necessarily know what the uncertainties and biases are. I'm thinking in particular of some of the ice products that are out there...
Like SAR? I saw you used SAR in your research.. the scatterometry?
Yes, I do. I've used the RADARSAT Geophysical Processor System [RGPS], which Ron Kwok worked on, which looks at ice drift. Yeah! One of the biggest problems with that product is that it isn't fully documented. So that can be a huge challenge when you realize you're working with a dataset and you don't understand what's causing the differences between the data you're working on and another dataset.
So I'm interested: you said remote sensing... you don't use the images at all in your work?
I do use the images. Typically I use visual or thermal images [like MODIS?], MODIS, yeah. Normally it's Level 1B, basically the calibrated stuff; I'm not interested in calibrating my own data [laughing].
I noticed you have a Ph.D. in remote sensing is that right?
My Ph.D. advisor was doing remote sensing, and I used remote sensing in my thesis; sometimes these titles get mixed up. The majority of my Ph.D. was in modeling.
Ok-- I was interested in that because I noticed-- some of your latest work is deploying buoys and bots-- so you're actually collecting data on the ground [Yeah] not necessarily using remote sensing.
Yeah, that's right.
I saw that in earlier work you looked at tracking ice using remote sensing, but it wasn't reliable enough to not need buoys; it seems like you mostly used it to figure out where to put the buoys.
Yeah, so what happens is that with satellite overpasses you only get a snapshot of the ice pack somewhere between every six hours, in particular locations, and every three days, depending on the orbits. So I'm interested in the sub-synoptic scale-- [What's that?] which means "below the weather scale". The synoptic timescale is the time a weather system exists over, which can be two or three days to a week. A lot of the work I'm doing is on inertial motion of the ice, or tidal motion of the ice pack, which are all on sub-daily timescales, and so you can't actually look at that with remote sensing.
So when you're trying to understand how ice deforms and moves, you have to look at it at high time resolution as well as high spatial resolution, and that's where we need to blend remote sensing and buoy data together.
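As a rough, back-of-the-envelope illustration of why sub-daily ice motion slips between overpasses (not a method from the interview): the inertial period is T = 2π/f, with Coriolis parameter f = 2Ω sin φ, which is roughly 12 hours near the pole, and the main semidiurnal tide (M2) has a period of about 12.42 hours. Resolving an oscillation needs more than two samples per period (the Nyquist criterion). The function names and sampling intervals below are hypothetical.

```python
import math

OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s


def inertial_period_hours(lat_deg: float) -> float:
    """Inertial period T = 2*pi / |f|, with Coriolis parameter f = 2*Omega*sin(lat)."""
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))
    return 2.0 * math.pi / abs(f) / 3600.0


def resolves(sample_interval_hours: float, period_hours: float) -> bool:
    """Nyquist check: the sampling interval must be shorter than half the period."""
    return sample_interval_hours < period_hours / 2.0


# Hypothetical sampling intervals: hourly fixes vs. 6-hour and 3-day revisits, at 88 N
period = inertial_period_hours(88.0)
for interval in (1.0, 6.0, 72.0):
    print(interval, resolves(interval, period))
```

A six-hour revisit sits right at the Nyquist limit for a roughly 12-hour oscillation, and daily to three-day revisits are far too coarse, which is consistent with needing buoys for this kind of motion.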
So how do you blend that?
So-- I don't blend it right now [laughing] I'll say-- That's a high level product that would be quite difficult to make.
So you mostly use [remote sensing] for visualization purposes?
Yeah, I'll use the imagery for visualization, to see what the drift field looks like, and then overplot the buoys on that, and I use the buoy information for time series analysis. So yeah, you end up synthesizing different datasets together that are not necessarily compatible with each other.
How so? The resolutions are so different?
Yeah. And the position errors for remote sensing products are very different from GPS.
Because of clouds?
Actually, for tracking ice drift, what's happening is you're doing feature tracking on the pack. You can do that in many different ways; it's often a cross-correlation analysis between two images to find those features, and there's an error associated with that.
So you might be tracking the noise, not the ice
So you're hoping you are tracking the ice, and the methods are normally quite good at doing that, but the position error associated with that can be quite large. Yeah, sorry, I'm sending you in all different directions here. [No no, it's great.] The fundamental thing it all comes down to is that we need to know what the uncertainties in the products are to be able to use them.
Cool, makes sense. [..] How small are the scales you can track?
So with drifting buoys, with GPS, I'm limited to about 1 km; I can't track below that.
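The cross-correlation feature tracking described above can be sketched in a few lines. This is a minimal, hypothetical example, assuming two co-registered images of the same size stored as NumPy arrays; it finds the whole-pixel shift that best aligns them using FFT-based (circular) cross-correlation. Real drift products are generally more elaborate (sub-image windows, sub-pixel refinement, quality filtering), so treat this only as an illustration of where position error comes from: the answer is quantized to whole pixels, so even a perfect match carries up to half a pixel of error.

```python
import numpy as np


def estimate_shift(img_a: np.ndarray, img_b: np.ndarray) -> tuple[int, int]:
    """Return the integer (row, col) shift that best maps img_a onto img_b.

    Uses FFT-based circular cross-correlation; the peak of the
    correlation surface marks the most likely displacement.
    """
    a = img_a - img_a.mean()  # remove the mean so brightness offsets don't dominate
    b = img_b - img_b.mean()
    # Cross-correlation via the convolution theorem
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around)
    shift = [int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape)]
    return shift[0], shift[1]


# Hypothetical usage: a synthetic "image" displaced by 3 rows and -2 columns
rng = np.random.default_rng(0)
first = rng.random((64, 64))
second = np.roll(first, (3, -2), axis=(0, 1))
print(estimate_shift(first, second))  # (3, -2)
```

Multiplying the shift by the pixel spacing and dividing by the time between the two images gives a drift-velocity estimate, with the pixel quantization (and any mismatch) carried into the result.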
So you'd like to track more but you can't?
No, well, I could if I used differential GPS methods, so you can go down in scale if you improve your position error. So there's a fundamental limit on what you can actually track with remote sensing, which is not a bad thing; we just need to know what the limit is.
I should point out that a lot of people, when they talk about their frustration with using remote sensing imagery, will bring up the issue of error not being provided in some cases, but I actually think that for many of the products, identifying what those errors are is a high-level research question.
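The trade-off between position error and the smallest scale you can resolve can be made concrete with simple error propagation. A minimal sketch, assuming two buoys a distance L apart, each position fix carrying an independent, uncorrelated error σ (real GPS errors at nearby receivers are often correlated), and deformation measured as a strain rate over an interval Δt. The relative displacement uses four fixes, so its error is 2σ and the strain-rate error is 2σ/(LΔt). The function names and numbers are hypothetical.

```python
def strain_rate_error(sigma_m: float, separation_m: float, dt_s: float) -> float:
    """One-sigma strain-rate error (1/s) from two buoys.

    Relative displacement = (x2(t1) - x1(t1)) - (x2(t0) - x1(t0)) uses four
    position fixes; with independent errors sigma each, its error is
    sqrt(4 * sigma**2) = 2 * sigma. Dividing by separation and interval gives
    the strain-rate error.
    """
    return 2.0 * sigma_m / (separation_m * dt_s)


def min_separation_m(sigma_m: float, dt_s: float, max_error_per_s: float) -> float:
    """Smallest buoy separation whose strain-rate error stays below max_error_per_s."""
    return 2.0 * sigma_m / (max_error_per_s * dt_s)


# Hypothetical numbers: halving the position error halves the smallest usable scale
print(min_separation_m(sigma_m=5.0, dt_s=3600.0, max_error_per_s=1e-6))
print(min_separation_m(sigma_m=2.5, dt_s=3600.0, max_error_per_s=1e-6))
```

Under these assumptions the minimum resolvable scale is directly proportional to the position error, which is the sense in which better positioning lets you "go down in scale".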
If you're looking for recommendations for what a database should provide, this might be a tricky one.
I'm just trying to see, in general, what problems researchers face, but I do think this is really helpful [..]
So another question I had is about the status of sharing science in research. I've heard before that in some disciplines data sharing is pretty poor, and a lot of people are repeating the same data collection. Tell me about the status of sharing in ice research. I saw that you run an Ice Watch service; was that in response to seeing a lot of fragmented data?
It was! Yeah, actually it wasn't my idea; it came out of a 2009 workshop that Climate and Cryosphere [CliC] sponsored, and the theme of the workshop was to coordinate and standardize our measurement methods, and it was for industry field work specifically.
There's always a need for sharing data and-- if you're sharing data that's difficult to collect-- it helps to standardize your method for collecting that. That's in relation to this particular field data, it's not necessarily how you would treat remote sensed data.
Yeah, I noticed that's all on-the-ground data; you don't combine it all.
Yes, it is and I-- at the moment we're all just working on collecting the data and then-- the next step is providing it in formats the users need.
So right now you're not providing the data for people?
It's freely available in .csv format! Yeah, and I'm beta testing a different format that ice charting groups use, called SIGRID-3 [*writes it down*]. Yeah, don't worry, it's all jargon [laughs].
Yeah, I know, maybe this is more atmospheric and ocean, but I know they use NetCDF and HDF formats; I guess there's a wide range. A researcher I talked to a couple of days ago mentioned that some research groups provide data in binary formats and don't say how to use them.
So the problem with binary formats is that unless you're given a piece of code, or the algorithm for how to read it, and you know whether it's big-endian or little-endian, you can't use it. I've seen people provide binary data with no additional metadata as a way of preventing others from using the data.
What do you mean?
It's like: to zeroth order, people say they're providing the data, but if the data isn't provided with a way of reading it, no one else can use it, so, as you said, it's useless.
So if you're a researcher who's protective of your data, maybe that's what you want. There are two things that could be at work: they could be protective of their data, or they might just not have enough time.
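To see why a headerless binary file is unusable without that information, here is a minimal sketch using Python's standard `struct` module. It assumes a hypothetical file of 32-bit floats written big-endian with no header; reading the same bytes with the wrong byte order silently produces garbage rather than an error, and nothing in the bytes themselves says which byte order, element type, or array shape is correct.

```python
import struct

# Hypothetical data: four 32-bit floats written big-endian, no header or metadata
values = [1.5, -2.25, 100.0, 0.1]
raw = struct.pack(">4f", *values)


def read_float32(buf: bytes, big_endian: bool) -> list[float]:
    """Decode a buffer of 32-bit floats with an explicitly chosen byte order."""
    order = ">" if big_endian else "<"
    count = len(buf) // 4
    return list(struct.unpack(f"{order}{count}f", buf))


print(read_float32(raw, big_endian=True))   # the original values (0.1 rounded to float32)
print(read_float32(raw, big_endian=False))  # same bytes, wrong byte order: nonsense values
```

With NumPy the same choice is expressed through the dtype string, for example `>f4` versus `<f4`; self-describing formats such as NetCDF record this information alongside the data.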
So have you run into that-- have any ships said they didn't want to provide their data?
Yeah, I've run into oil industry ships that didn't provide their data, but said they would once the embargo period, which is a couple of years, is over.
Is that just because that's a trade secret?
Yeah, they just don't want to share where they've been, and that you can respect because they have an embargo period.
What's more frustrating is when you're trying to use data that other people have created and they do provide it to you, but you can't actually use it-- I've run into that.
Oh really? So they provide it to you but you can't use it?
It's impossible to use, at the moment. It might just be that more communication is required to extract the information that's needed, but it's really hard to do that.
I'm a big advocate for people providing data in NetCDF format. The problem is that it's expensive to get data into NetCDF format.
What is that, just writing the converters?
Well, getting anything into a data format takes time and man-hours, because you have to write a piece of code to do it.
So what do you work in when you work with data? MATLAB, IDL?
I do use IDL mostly, I have written some C code for processing imagery. I will use MATLAB, my students are using Python.
[Excited] Oh, really? That's the first time I've actually heard any reference to a student or teacher using it
Really? It's becoming the standard.
I've heard that, but when I talk to researchers they say: Well, I learned MATLAB in grad school and when my students come to me for help, I can only help them in MATLAB..
I'm stuck with IDL because-- IDL is like breathing to me it's so easy for me to write in IDL, but.. for my students I would-- never suggest they learn IDL
Really? Because it's old, or not as easy to use?
And it's not free, and I think we're now in a world where freely sharing information and algorithms is a good thing to do. It seems that the Python community is growing, so I think Python is the way forward.
So it's a good, strong reference on that one
Yeah, the MATLAB community is also good; I see shared MATLAB code on the internet.
Yeah, I was talking to a researcher who says [exaggerating]: if he's looking for something and it has five stars on MATLAB Central, he just uses it and trusts it.
That's pretty amazing, I wouldn't trust everything.
It all comes down to where you find your code, so I've used R-- because there was code available.
So, going off of that, how do you know if it's giving you the right answer? Previous researchers have said that if it looks right, they trust it. Do you dive into reading it?
When I'm using someone else's code, if it's a researcher I work with, I trust it. If I find bugs, we report them. Really, quality control is mostly logical checks on what you're actually seeing in the data. I don't know, I can't really describe how I do my debugging; it's really involved.
You always have to spend time convincing yourself that you're getting the right answer [laughing]. It's a process. It helps to have access to people's code when they have solved problems-- yeah I don't know how you trust someone's code.
That's just something I'm curious about, because I know that in studies you don't publish the code you used to get the results, so no one can really audit it without basically repeating your whole experiment. I'm from CS, so we spend a LOT of time looking at other people's code and finding bugs; it's just a constant process of finding mistakes.
So you know how endemic they are, then: there are mistakes everywhere! [yeah]
There is a movement in research, in geophysics, towards having models and code archived [as part of the studies?]. Yeah, if you publish something, you should be able to go back and look at the code. It's a discussion that's happening now; I haven't seen it really implemented.
From talking to researchers, it sounds like there's a huge pride thing there, where researchers don't necessarily want to publish what they've been working on, because when you put something out there, it's like you're putting your name on it [mhm]. And with all that they put in, there's pressure, deadlines; they just want to get the right answer, they're not concerned with quality.
You're talking about quality of the code: so how well commented it is, and how usable it is for someone else. I would say that's totally true.
And whether it works for someone else's parameter space. Even in the open source community, I would say people don't want to publish their code unless they've put a lot of time into it.
I think that makes sense. Yeah.
Going back to the Ice Watch service, would you consider it a success? Or do you think there's still people that aren't really sharing data or are collecting data that already exists?
I think Ice Watch is a success, because people who are collecting data are now collecting it in the same format. We haven't entrained the whole Sea-Ice community yet, but I think it's achievable. I mean, that's just one small case, though, one particular type of data [Because you are mostly looking at ships?]. It's just ships; it's just visual observations of Sea-Ice from the bridge of the ship, so...
So what happens to the buoy data that you put out with Alison Kohout from New Zealand; does that get shared at all?
Ah, so the stuff that Allie did: she shared her data with me, but I'm not actually sure what she does with her data. So there are two different databases for Sea-Ice drifting buoy data. One is the International Arctic Buoy Programme, and all of my drifters send data in real time to the International Arctic Buoy Programme, and then it's distributed on the Global Telecommunication System, the GTS, which goes to weather services.
Okay. So the weather services are using the data?
So the weather services are using the data. The International Programme for Antarctic Buoys, which is kind of the sister to the IABP, doesn't have quite the same logistics in place, so they archive data, and it might go out on the GTS, but I'm not sure if all of it does, and I have no idea where you access the data. So Allie's buoys... that'd be a question for her; I'm not sure what she actually did. But that data doesn't go into a public forum where it's immediately available.
Is that just not a priority?
It's not a priority. I think that program doesn't have the infrastructure that the Arctic Buoy Programme has, because you need someone in charge of the program who has time to make sure the data is going from one place to another... yeah.
Yeah, and skills required for doing that.
Yeah, but the important thing is that the data gets archived.
This is more unrelated, but I was curious, I wanted to ask at least one researcher this: Do you ever read your own research after it's published?
[Big grin, laughing] Yes I do haha, I read my papers all the time because I forget what I've done.
Are you serious? So you are just like: "Oh I wonder what I did then?"
Yup, I am serious hahaha- they're kind of like bookmarks of haha where you were at- at that particular time.
I also have to re-read them when people read my papers and think they see something in there, and I'm like: "Really, did I write that?" [hmm, interesting]
So you have to kind of become an expert in your own work then.
haha yeah, or re-expert [laughing]
What inspired you to go into this field and become a Sea-Ice scientist?
Oh wow, I wanted to go to Antarctica since I was eight. [Really?] Does that answer the question?
Did you ever go?
I've been once. So when I finished my physics degree, I decided I'd like to go into Earth science. I asked in my department if they knew anyone who would be good to take on a Ph.D. student doing anything Earth-related, and the only person I talked to was Seymour Laxon, who became my Ph.D. advisor. He was studying ice, and as soon as I heard he was studying ice: I'm doing that...
"Sign me up!"
Yeah, I was hooked at the first bite. So I don't know if I really thought too deeply about why; it's just something I really wanted to do.
Have you ever regretted it?
For our project, originally the idea was to help people identify things when they didn't know where in space and time those things were located; the idea was to find things visually. I don't really know of any good examples for ice research, but we were looking at things like ocean eddies [mhm] and coastal pollution hazards. One thing we ended up discovering is that this won't really be useful to most of these researchers, because they really prefer to work with the raw data, you know, Level 1, Level 2, not necessarily... whatever level we're working at...
Derived data products are more useful to the general public.
Yeah, that's what we were thinking. One direction we could go is creating more of a tool for the general public to just explore the data and have fun, basically, versus helping researchers.
Yeah, the best tools I've ever come across are satellite track simulators: software where you can plot up the images with the exact footprint shape of the sensors, so you can actually build swaths of images. The other thing that's useful for researchers is understandable gridding tools. Modellers like to have their data gridded, and remote sensing data is often provided in gridded format, so it's really important that we understand how that gridding is done. At the moment I'm working with a dataset where there's no documentation on how the gridding is done.
Just so I understand-- are you talking about-- I know satellites float over and they're not going in Latitude/Longitude-- are you talking about where they map the datasets to..?
So just mapping. If you have a model that you want to compare directly with satellite data, you need to map the satellite data into real space, or if you have a field campaign...
Like for images for example, you have to map those points onto the pixels? [Yeah] So understanding how that's done, basically?
Yeah, I've actually got some really cool IDL code that does that really slickly, so you could basically have a satellite track analyzer that plots up the data for you.
From the raw data?
Yeah, so that's kind of cool. So when it comes to gridded products, which the public use all the time, I find it frustrating that documentation on how those gridded products were created is not provided. At the moment I'm working with the RGPS product, and it has a publicly available gridded product that does not look anything like my drifting buoy data. It's missing signal somewhere; that's all I can say. It's frightening.
Could it be pushing wrong data out there? Or maybe your model just doesn't match up somehow?
I don't know what's going on, but I can't figure out what's going on without documentation on how the product was gridded.
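To show why the gridding method matters, here is a minimal, hypothetical sketch of one common choice, simple bin averaging (sometimes called "drop in the bucket"), on a regular latitude/longitude grid. It is much simpler than real sea-ice products, which typically use polar projections and may weight samples by sensor footprint or distance. A different grid, weighting scheme, or treatment of empty cells can change the gridded field, which is exactly why undocumented gridding is hard to compare against buoy data.

```python
import numpy as np


def grid_mean(lat, lon, values, lat_edges, lon_edges):
    """Bin-average scattered samples onto a regular lat/lon grid.

    Each sample is assigned to the single cell containing it, and each cell
    holds the mean of its samples. Empty cells are NaN.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = [np.asarray(lat_edges, dtype=float), np.asarray(lon_edges, dtype=float)]
    sums, _, _ = np.histogram2d(lat, lon, bins=bins, weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0/0 gives NaN for cells with no samples


# Hypothetical swath samples and a 2 x 2 grid of 1-degree cells
grid = grid_mean(lat=[70.2, 70.4, 71.5], lon=[-150.5, -150.2, -149.5],
                 values=[1.0, 3.0, 10.0],
                 lat_edges=[70.0, 71.0, 72.0], lon_edges=[-151.0, -150.0, -149.0])
print(grid)
```

Here two samples fall in one cell and are averaged, one sample fills another cell, and the remaining cells are left empty; a nearest-neighbour or distance-weighted scheme would fill and smooth those cells differently.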
[..] So that's the end of my questions, but since you mentioned Python, one thing I wanted to show you was an idea I came up with as a replacement, which we kind of scratched, but I thought was interesting [hands over paper prototype]. The idea was basically to try as hard as we can to streamline acquiring datasets from NASA.
Okay, so you can select regions and time in the classic way.
When I showed that to other researchers, what I heard first of all was that acquiring data currently is not difficult. In fact, there are a lot of already-streamlined ways that they just don't use, because they can just FTP in and download it.
Yeah, that's pretty much what I do. Actually, I find that any time there's a new web interface to get hold of data, I have to spend an extra half hour to a few hours learning how to go through the interface to get the data. But from a researcher's perspective, I'll figure out a way to get the data; that's not the worry. From the public's perspective, being able to get hold of data in a transparent way is probably important, I would imagine.
I don't know how much the public deals with scientific data.
Yeah, who knows how much the public is using MODIS imagery.
Actually, I should show you something, have you seen this? Ok, so talking about the public using NASA data, there is a website, a blog that has a wide following of people that like Sea-Ice, for whatever reason I don't know why.
Maybe they all just want to go to Antarctica
Maybe they just enjoy looking at pretty pictures of ice, but what they do is this: the blog has access to data. The guy who is making it is pulling data from different websites where it's made available and presenting it here in a nice way, and then people come to this blog for information about what the ice pack's doing. And this guy is not a scientist; I think he might be an engineer, and he does this in his free time.
Just interest in ice?
Just interest in ice. And they [laughs]... they're doing crowdsourced science. They participated in something called the Sea Ice Outlook, where members of the blog voted on when we would see the summer minimum of sea ice and what the extent of the ice would be at the summer minimum in the Arctic. Kind of wacky stuff, but they're actually using data on this site! And they're parsing it, and it would be hard to find. So look, they've got satellite imagery; they're pulling it from somewhere. This guy has plotted up trends in ice extent and area by month, so they're taking data from the National Snow and Ice Data Center.
so imagine these guys if they could search the NASA database and do anything they wanted with the data.
Yeah, I think most of the NASA data is available for public consumption-- maybe a bit hidden?
Yeah. So they're definitely using the MODIS imagery that NASA uses, and probably AVHRR as well, but I've seen mostly MODIS. And they also tend to pass a lot of data from researchers on to the public. You know, they'll see something interesting that a researcher has done and pretty it up.
That's interesting. That could be a trend: the general public getting more interested in Earth research and playing around with it themselves.
And participating in Earth research. I think there are a lot of people out there who, if you give them data, will actually look at it, not just be passive, you know. NASA has always provided pretty imagery to the public that everyone loves (the whole space telescope stuff comes to mind right now because that's in the news), but I think we underestimate how much the public would use the data too.
I think in the computer science community we rely vastly on general interest. I mean, the actual computer science academics are doing things that the general public isn't going to use for the next 10 to 20 years, but I think a lot of what's pushing us forward is coming from people who don't have CS degrees, or are just doing it for fun, or are in countries where they can't easily get CS degrees. So that's an interesting thought: just making it easier for people to use could lead to more eyes looking at the data, and more discoveries.
Yeah, and that would be the polar opposite from what I need. I basically just don't want you messing with the data [laughing] I need to have the raw levels of the raw data there, and documented.
[..] so helping Earth researchers is a tall order, so trying to figure that out is.. hard.
Yeah... so one of the things that's kind of cool is being able to look through imagery in real time. I use that all the time while I'm tracking field campaigns. At the moment... do you know NASA LANCE?
Yeah, that's the one that provides data every three hours
Yeah! So the near-real-time site has these little thumbnails of MODIS imagery, and you're sitting there searching through them trying to find which one, and you start memorizing the orbit paths so you can remember which part of the screen you're going to find that particular location on [laughing] the planet. That site could do with a bit of a "spiff up" to make the data more...
Well, actually, the team that I work with, GIBS, takes that imagery and puts it on a "Google Maps"-type service called WorldView, and then you can see it show up as it comes out.
O.k.! Now we're talking haha
So that's already available. Their dream is that researchers will use those images in their work.
So the connection there is to make the WorldView imagery searchable
[Reserved] Yeah... so that's basically something my client thinks is useful
It would be.
[Excited] Really?! So searchable in terms of visual- what are you looking for?
Looking for a position on the planet at a particular time.
[Disheartened] Okay... so spatially and temporally.
[Yeah, just] tell me what's there
Yeah, so that's available. You can actually see what datasets are available by putting in the space and time you're looking at, and it will tell you what datasets are available and give you tons of metadata about the different datasets. It's called... it's not WorldView [thinking search.earthdata.nasa.gov]... if you look up "NASA Search"... there's a word for it, but there's a service they provide that uses the data from the team I'm working with, and it will give you not only satellite images but also the drones, the ones that fly over and take pictures; it'll show you those.
So, if you are looking for a space and time and you want to see all the data NASA has-- you can put in the space and time and it will show you everything
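The space-and-time search described here is what NASA's Earthdata Search interface offers, and it is built on the Common Metadata Repository (CMR). A minimal sketch of building a granule search URL, assuming the CMR granule search endpoint and its `short_name`, `bounding_box` (west, south, east, north) and `temporal` parameters; verify these against the CMR Search API documentation before relying on them. The collection short name `MOD021KM` (MODIS Terra Level 1B, 1 km) is only an illustrative choice, and the code builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Assumed CMR granule search endpoint (JSON response format)
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def granule_search_url(short_name: str, bbox: tuple, start: str, end: str,
                       page_size: int = 20) -> str:
    """Build a CMR granule query for one collection over a box and a time range.

    bbox is (west, south, east, north) in degrees; start/end are ISO 8601 UTC strings.
    """
    west, south, east, north = bbox
    params = {
        "short_name": short_name,
        "bounding_box": f"{west},{south},{east},{north}",
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


# Hypothetical query: MODIS Terra L1B over a box north of Alaska for one day
print(granule_search_url("MOD021KM", (-160, 70, -140, 75),
                         "2015-03-01T00:00:00Z", "2015-03-02T00:00:00Z"))
```

Fetching that URL (for example with `urllib.request`) would return matching granule metadata as JSON, which is the programmatic counterpart of typing a place and time into the search page.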
That's pretty cool
I guess the big problem is just telling people about it [..] Researchers will often just talk to people and bypass the easy-to-use tool.
Well we don't even realize that it's that easy to get the data haha, that's funny.