Cloud Fill Algorithm
When a pixel has missing data, we search an expanding zone around that pixel, looking for "good" data. All "good" values in that zone around the pixel (such as, say, 30 to 40 km away) are averaged together to produce the replacement value. What gets entertaining is that there comes a point when it is better to search forward and backward in time (+/- 8 days, for example) than it is to search farther and farther away. So the short answer is that it is a simple averaging method; the complexity is that the "zone" we look at isn't constrained to the current time slice.
Gaps in the data
The Level-3 data that one gets from NASA's Ocean Color site have gaps in the coverage, often due to clouds. The number of gaps depends on the variable being examined. SST4, for example, is among the least disturbed by water vapor, and it has very good coverage compared to, say, Chl. Further, when a gap is present in an 8-day file, that means there has been a persistent inability to get a "good" pixel value in any of the daily files. Often we have analyses that are better performed if we can use some consistent approach to filling these gaps. What we describe in this white-paper is the algorithm applied to the ancillary data here at the Ocean Productivity web site, before they are sent into the NPP algorithms. It is a minor detail, but one cannot gap-fill the NPP values directly, since that would project "full sun" NPP values into an area with diminished light (because of the clouds), and the estimate would not be correct. The proper way is to gap-fill the ancillary data, and then calculate NPP.
There are different methods for interpolating data to fill missing coverage, ranging from simple linear interpolation, to bicubic splines, to more elegant statistical methods. What we apply here is a simple method for estimating the average value of the pixel. We search through expanding, predefined zones until good values are found, and those values are then averaged together to represent the pixel at the center of the search.
Expanding spatial zones
The search distance around the pixel runs from 0 to 400 km. The first zone spans 0 to 20 km; the second, 20 to 30 km; the third, 30 to 40 km; and so on in 10 km steps out to 60 km. Starting at 60 km the step size is 20 km (60 to 80, then 80 to 100, etc.), and once we reach 300 km the step size is 50 km. There are 19 possible search zones in total.
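The zone boundaries above can be generated programmatically. The sketch below is illustrative (the function name and list representation are my own, not from the original code), but the ring edges follow the scheme just described:

```python
def zone_edges():
    """Outer edges (km) of the concentric search rings: 0-20 km first,
    then 10 km steps to 60 km, 20 km steps to 300 km, and 50 km steps
    out to the 400 km limit."""
    edges = [0, 20]                    # first zone: 0 to 20 km
    edges += list(range(30, 61, 10))   # 20-30, 30-40, 40-50, 50-60
    edges += list(range(80, 301, 20))  # 60-80 ... 280-300
    edges += [350, 400]                # 300-350, 350-400
    return edges

edges = zone_edges()
# consecutive edge pairs give the (inner, outer) bounds of each zone
zones = list(zip(edges[:-1], edges[1:]))
```

With these edges, `zones` holds 19 (inner, outer) pairs, from (0, 20) out to (350, 400), and zone six is (60, 80), matching the zone numbering used below.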
Expanding temporal slices
Along with the spatial coverage around the "gap", we also have the same spatial coverage at different snapshots in time (i.e., observations 8 days before and after the pixel of interest, etc.). The 19 search zones that are applied to the current 8-day composite hdf file are also applied to the hdf files before and after that file. If we are still searching for "good" data, we then look at two files before/after the file of interest, and so on. For example, zones one through six (0 to 20 km through 60 to 80 km) are searched first on the 8-day file that contains the gap. However, the data in zones one, two, and three in the 8-day files before and after are more highly correlated with the pixel of interest than zone seven on the original file, so zones one, two, and three in those files are searched before zone seven at time zero.
Here's the search order for the expanding spatial zones and expanding temporal slices.
You can see that the search switches from time 0 (first column) to time +/- 8 days after spatial zone six, searching spatial zones one, two, and three, and then goes back to time 0, zone seven, etc.
This search pattern applied to 8-day files is the key to the fill algorithm. The first data that are found as "good" (after a complete search of the files and zones defined by the searchkey) are averaged and inserted into the gap.
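The fill loop can be sketched as follows. This is a simplified stand-in, not the operational code: the names (`fill_gap`, `stack`, `is_good`) are hypothetical, distances are taken in pixel units rather than km on the real grid, and each searchkey step is assumed to pool all files it covers (e.g. the files before and after together) before averaging:

```python
import math

def fill_gap(stack, t, y, x, searchkey, zones, is_good):
    """Fill one missing pixel by the expanding-zone search.

    stack     : dict mapping time index -> 2-D grid of values
    t, y, x   : time slice and pixel location of the gap
    searchkey : ordered steps, each a (time_offsets, zone_number) pair
    zones     : list of (inner, outer) ring boundaries
    is_good   : predicate marking a value as usable
    """
    for offsets, zone in searchkey:
        inner, outer = zones[zone - 1]
        good = []
        for dt in offsets:            # pool every file this step covers
            grid = stack.get(t + dt)
            if grid is None:
                continue
            for j, row in enumerate(grid):
                for i, v in enumerate(row):
                    d = math.hypot(j - y, i - x)
                    if inner <= d < outer and is_good(v):
                        good.append(v)
        if good:                      # first step with good data wins
            return sum(good) / len(good)
    return None                       # gap could not be filled
```

Each searchkey step is exhausted completely before moving on, so the first step that turns up any "good" values determines the fill, exactly as described above.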
Application to data
In practice, I do not use more than +/- seven 8-day files surrounding the hdf file of interest to help determine a missing value. I started this a few years back when there were persistent gaps that weren't being filled near the Arabian Sea. In general, such an expansive search is not needed. In the simplest case of a big gap that has data in the 8-day files before and after, the pixel of interest simply takes a value very similar to a linear interpolation of the values before and after.
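That simplest case reduces to elementary arithmetic. If the same location holds good data in the files immediately before and after the gap, the search finds exactly those two values, and their average is the temporal midpoint of a linear interpolation (the values below are made up for illustration):

```python
# Good values at the gap's location in the 8-day files before and after:
before, after = 2.0, 4.0
# Averaging them is the same as linearly interpolating to the middle slice.
fill = (before + after) / 2.0
```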
This approach gap-fills data in a fashion that is based on global statistics. As long as the gaps are not too long-lasting, the fill results give estimates that appear visually "reasonable". However, this method does not account for changing correlation distances, either spatially or temporally. A full-blown kriging approach, applied with data both forward and backward in time and solved for each missing pixel at every unique location (for every 8-day time slice), would be the ideal case. At the moment I can't do that, but if you come across an approach that generalizes to what has just been described, please do let me know. Until then, this is the best guess possible for the global solutions that we work with.