
Article

pubs.acs.org/est

Improving the Accuracy of Daily PM2.5 Distributions Derived from the Fusion of Ground-Level Measurements with Aerosol Optical Depth Observations, a Case Study in North China
Baolei Lv,†,‡ Yongtao Hu,§ Howard H. Chang,⊥ Armistead G. Russell,*,§ and Yuqi Bai*,†,‡

† The Ministry of Education Key Laboratory for Earth System Modeling, Center for Earth System Science, Tsinghua University, Beijing 100084, China
‡ Joint Center for Global Change Studies, Beijing 100875, China
§ School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
⊥ Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, United States

ABSTRACT: The accuracy of estimated fine particulate matter (PM2.5) concentrations obtained by fusing station-based measurements with satellite-based aerosol optical depth (AOD) is often reduced when the spatial and temporal variations in PM2.5 and missing AOD observations are not accounted for. In this study, a city-specific linear regression model was first developed to fill in missing AOD data. A novel interpolation-based variable, the PM2.5 spatial interpolator (PMSI2.5), was also introduced to account for the spatial dependence in PM2.5 across grid cells. A Bayesian hierarchical model was then developed to estimate spatiotemporal relationships between AOD and PM2.5. These methods were evaluated through a city-specific 10-fold cross-validation procedure in a case study in North China in 2014. The cross-validation R2 was 0.61 when PMSI2.5 was included and 0.48 when PMSI2.5 was excluded. The gap-filled AOD values also effectively improved predicted PM2.5 concentrations, with an R2 of 0.78. Daily ground-level PM2.5 concentration fields at a 12 km resolution were predicted with complete spatial and temporal coverage. This study also indicates that model prediction performance should be assessed by accounting for monitor clustering, because model accuracy in spatial prediction can be misinterpreted when validation monitors are randomly selected.

Received: December 10, 2015; Revised: March 23, 2016; Accepted: April 4, 2016; Published: April 4, 2016
© 2016 American Chemical Society. Environ. Sci. Technol. 2016, 50, 4752−4759. DOI: 10.1021/acs.est.5b05940

■ INTRODUCTION
Fine particulate matter, a complex mixture of particles with aerodynamic diameters of 2.5 μm or less (PM2.5), has been a major component of the severe air pollution experienced in many cities in China in recent years. PM2.5 is the dominant pollutant of concern, especially during the winter.1−3 Since PM2.5 can efficiently penetrate into human lungs and bronchi,4,5 long-term and short-term exposures to PM2.5 can increase premature mortality and morbidity.5

One major limitation of ground-level measurements of PM2.5 is the sparse spatial coverage of monitoring networks. Alternatively, satellite-based aerosol optical depth (AOD) observations have been used to estimate continuous spatial and temporal patterns of ground-level PM2.5 concentrations.6−8 The optical properties of atmospheric aerosols are sensitive to the characteristics of the constituent particles, including their size and composition. A large fraction of the particles that make up PM2.5 mass scatter sunlight, leading to a strong association between PM2.5 mass and observed AOD.9,10 All else being equal, as the number (and mass) of particles increases, light scattering increases, leading to an increase in AOD. While the association between PM2.5 and AOD is typically strong, the relationship can change with the particle size distribution, particle composition, mixing height, humidity, and other factors.

Early studies used various linear regression techniques to estimate PM2.5 concentrations from AOD together with other spatial and spatiotemporal predictors, such as land cover, elevation, meteorological parameters, and indicators for holidays.10,11 More complex models were then developed, including the linear mixed effect (LME) model,12,13 the geographically weighted regression (GWR) model,14,15 a remote sensing formula,16 a semisupervised learning approach based on multiple factors,17 and complex ensemble models.18 Atmospheric chemistry models have also been used to determine the relationships between AOD and PM2.5.8,19 Among these approaches, treating the linear slopes and intercepts between AOD
and PM2.5 at different locations and time points as spatially and temporally correlated random effects has been shown to be promising in a study conducted in the southeastern United States.20 However, to our knowledge, such a statistical model structure has not previously been applied in China.

AOD data derived from satellite instruments, such as the Moderate Resolution Imaging Spectroradiometer (MODIS), are often missing because of clouds, high surface albedos21 (e.g., snow and ice), and, potentially, very high PM2.5 levels.22,23 If missing AOD values are related to high PM2.5 pollution levels, average PM2.5 concentrations estimated using only the retrieved AOD fields will be underestimated. Several attempts have been made to address this issue. Kloog et al. filled in missing AOD values by smoothing the mean PM2.5 levels across the study area.24,25 Van Donkelaar et al. used a sampling bias correction factor method.7,26 Such corrections were usually month- or year-specific,19,27 so the bias could not be effectively corrected when the mean was calculated over a small number of days. Just et al. predicted PM2.5 levels within grid cells with missing AOD by using PM2.5 levels in neighboring grid cells, based on season-specific spatial patterns.28 Ma et al. used ordinary Kriging to interpolate the available retrieved AOD values.14 However, their method could have significant uncertainties if only a limited number of AOD values were present.29

In this study, we applied the Bayesian model proposed by Chang et al.20 in North China to estimate spatially and temporally varying coefficients in a linear regression setting. We also used spatially interpolated PM2.5 concentrations as a predictor and filled the missing AOD values with novel methods. The rest of the paper is organized as follows. The study area, data, statistical model, and missing AOD gap-filling method are introduced in Materials and Methods. Model estimation, performance evaluation, and PM2.5 prediction results are presented in Results and Discussion.

■ MATERIALS AND METHODS
Study Area. Figure 1 shows the North China study area. This region includes five provincial administrative divisions: Shandong, Hebei, Liaoning, Beijing, and Tianjin. Economic development in this region is highly dependent on heavy industries, such as iron and steel production and cement manufacturing, which are typical sources of PM2.5 and its precursors.30 A growing vehicle population and rising electricity demand also worsen the air quality in this region.31 The annual mean PM2.5 concentration in this region was 93 μg/m3 in 2014, about nine times the World Health Organization (WHO) recommended guideline of 10 μg/m3. Long-term exposure to such heavily polluted air is a major concern for more than 250 million people in this area.32,33 Many atmospheric chemistry modeling applications in this region, such as those using the Community Multiscale Air Quality model (CMAQ), have been run at a 12 × 12 km grid resolution on a Lambert Conformal Conic map projection. Therefore, the same grid was chosen in this study to enable comparison and fusion with atmospheric chemistry model output in future studies. The whole area was divided into 6561 grid cells (81 × 81).

Figure 1. Study area and the locations of PM2.5 monitoring stations (points). The right panel depicts the average satellite-based AOD values in 2014.

Ground-Level PM2.5 Data. Hourly ground-level PM2.5 concentrations were obtained from the China National Urban Air Quality Real-time Publishing Platform (http://113.108.142.147:20035/emcpublish/). Calibration and quality control of the monitors are conducted by the China National Environmental Monitoring Center.34 Two hundred ninety-eight monitors from 53 cities in the study area are located in 169 grid cells, as described in Table S1, Supporting Information (SI). To facilitate computation in the data fusion process, at each time point we averaged the PM2.5 concentrations from monitors located in the same grid cell. It is worth noting that point measurements of PM2.5 concentrations were thus used to represent area-average PM2.5 levels over a grid cell. This may introduce bias due to a change of support: the grid-average concentrations may not fully reflect the spatial variation in PM2.5 captured by multiple monitors within a grid cell.

MODIS AOD Data. MODIS is a sensor on board two of the U.S. National Aeronautics and Space Administration (NASA)'s Earth Observing System (EOS) satellites: Terra and Aqua. These sun-synchronous satellites provide two column aerosol observations every day, at approximately 10:30 a.m. (Terra) and 1:30 p.m. (Aqua) local solar time. The MODIS AOD values are constrained between −0.05 and 5.0 to avoid bias introduced during the retrieval procedure. A detailed description of the MODIS AOD retrieval procedure is given in Remer et al.35 and Levy et al.21 In this study, MODIS AOD Level 2 data for 2014 were obtained from the Level 1 and Atmosphere Archive and Distribution System (LAADS, https://ladsweb.nascom.nasa.gov/). The nominal resolution of this data set is 10 × 10 km at nadir. A nearest neighbor approach was used to regrid the data to the 12 × 12 km grid.

MODIS AOD Missing Data Pattern. For each of the 169 grid cells containing at least one ground monitoring station, averaged PM2.5 concentrations were computed over three subsets of days.
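The grid-cell averaging described above (co-located monitors averaged to their cell at each time point, then per-cell means over AOD-available days, AOD-missing days, and all days, as plotted in Figure 2a) can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the record layout (cell index, day, PM2.5 value, plus a set of cell-days with a retrieved AOD) and the function names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def grid_daily_means(records):
    """Average co-located monitors: (cell, day, pm25) rows -> {(cell, day): mean}."""
    buckets = defaultdict(list)
    for cell, day, pm25 in records:
        buckets[(cell, day)].append(pm25)
    return {key: mean(vals) for key, vals in buckets.items()}


def subset_means(daily, aod_available):
    """Per-cell mean PM2.5 on AOD-available days, AOD-missing days, and all days.

    `daily` maps (cell, day) -> PM2.5 and `aod_available` is the set of
    (cell, day) pairs with a retrieved AOD value (hypothetical input format).
    """
    groups = defaultdict(lambda: {"available": [], "missing": [], "all": []})
    for (cell, day), pm25 in daily.items():
        subset = "available" if (cell, day) in aod_available else "missing"
        groups[cell][subset].append(pm25)
        groups[cell]["all"].append(pm25)
    # An empty subset (e.g., a cell with no AOD-missing days) is reported as None.
    return {
        cell: {name: (mean(vals) if vals else None) for name, vals in g.items()}
        for cell, g in groups.items()
    }


# Toy example: two monitors share cell 7, and AOD was retrieved only on day 1.
records = [(7, 1, 40.0), (7, 1, 60.0), (7, 2, 150.0), (7, 3, 110.0)]
daily = grid_daily_means(records)
print(subset_means(daily, aod_available={(7, 1)}))
```

In this toy input the AOD-missing mean (130 μg/m3) exceeds the AOD-available mean (50 μg/m3), which is the direction of bias that Figure 2a shows for the real monitor data.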
The three subsets were days when AOD observations were available, days when AOD values were missing, and the whole year of 2014 (plotted, respectively, as black, red, and blue points in Figure 2a). The averages were plotted in ascending order of each grid cell's mean PM2.5 concentration. Mean PM2.5 concentrations were consistently higher on days when AOD observations were missing than on days when AOD data were present (Figure 2a).

Figure 2. (a) Average PM2.5 concentrations; (b) daily SVVR within the study area.

To quantitatively measure the spatial and temporal patterns of missing AOD values, the SVVR (spatial valid value ratio, defined in the SI) and TVVR (temporal valid value ratio, defined in the SI) were used. Daily SVVR values, shown in Figure 2b, indicate that missing AOD was more prevalent in winter than in other seasons. The annual TVVR was generally lower in grid cells in the northwestern grassland areas and in urban areas (Figure S1). In winter, the TVVR was quite low (as low as 10%) in most grid cells, e.g., in heavily polluted southern Hebei. Because the missing AOD values were associated with higher pollution levels (Figure 2a), long-term PM2.5 estimates will be negatively biased if only the retrieved AOD values are used in data fusion.

MODIS AOD Gap Filling. To overcome the problem of missing AOD data, we proposed a novel two-step method. First, a city- and season-specific formula was introduced, based on the assumption of a linear relationship between PM2.5 and AOD:10

$$\mathrm{AOD}^{e}_{i,j} = a_s\,\overline{\mathrm{AOD}}^{o}_{s,j}\,\frac{\mathrm{PM}_{i,j}}{\overline{\mathrm{PM}}_{s,j}} + b_s \tag{1}$$

where $\mathrm{AOD}^{e}_{i,j}$ is the estimated AOD value on day i in grid cell j; $\mathrm{PM}_{i,j}$ is the daily mean PM2.5 concentration observed on day i in grid cell j; $\overline{\mathrm{AOD}}^{o}_{s,j}$ is the average retrieved AOD value in grid cell j in season s, where s = 1 denotes the warm season (April 16 to October 15) and s = 2 denotes the cold season (the remaining days of the year); $\overline{\mathrm{PM}}_{s,j}$ is the average PM2.5 concentration in grid cell j in season s; and $a_s$ and $b_s$ are the fitted coefficients for season s. The linear relationship was fitted separately for each of the 53 cities. The city-specific models were then used to predict missing AOD values in grid cells containing PM2.5 monitors. Second, we combined the estimated and retrieved AOD values to interpolate missing AOD values in the remaining grid cells without PM2.5 monitors, using ordinary Kriging (OK) with an exponential covariance function, and obtained a final AOD data set with complete spatial coverage for each day.

Interpolated Ground PM2.5 Data. We proposed a novel variable, based on the interpolation of PM2.5 observations, to be used in combination with AOD. Spatial interpolation uses the spatial dependence among observations to obtain optimal weights for predicting PM2.5 levels at unobserved locations. The uncertainties (prediction standard errors) of the interpolated PM2.5 concentration field are not spatially uniform: the closer a grid cell is to the monitors, the lower its uncertainty.29 To keep the uncertainties of the interpolated values generally uniform, a 10-fold leave-10%-cities-out method was used to generate the interpolated PM2.5 (PMI2.5) values with the OK method. Specifically, we first randomly removed the monitors in 10% of the 53 cities within the study area. PMI2.5 values at the removed sites were obtained by kriging the monitors from the remaining cities. This procedure was repeated 10 times to obtain interpolated PMI2.5 values at all monitors.

To further control the uncertainties of the variable PMI2.5, we defined the final PM2.5 spatial interpolator (PMSI2.5) as follows:

$$\mathrm{PMSI}_{2.5} = \mathrm{PMI}_{2.5}\,\frac{\sigma_t}{\sigma(\mathrm{PMI}_{2.5})} = \mathrm{PMI}_{2.5}\,\frac{\mathrm{PMI}_{2.5} \times f(\mathrm{PMI}_{2.5})}{\mathrm{SE}(\mathrm{PMI}_{2.5})} \tag{2}$$

and

$$f(x) = 10\% \times \frac{1}{x/\alpha + 1} \tag{3}$$

where σ(PMI2.5) and σt refer, respectively, to the uncertainty of PMI2.5 and the target uncertainty in that grid cell. The function f is a continuous, inverse-proportional function that constrains the uncertainty of PMI2.5 to a relatively small proportion of the original interpolated value (PMI2.5). The proportion begins at 10% and decreases with increasing PMI2.5; the parameter α determines the rate of decrease. When PMI2.5 equals α, f is 5%. In this study, α was set to 700, based on an optimization that balances decreasing uncertainties against decreasing variance information in PMSI2.5 as α increases.

Other Spatiotemporal Predictors. First, we obtained meteorological data from the NCEP (National Centers for Environmental Prediction) FNL (Final) Operational Global Analysis, which is on 1° × 1° grids and is prepared operationally every 6 h. This product is from the Global Data Assimilation System (GDAS), which continuously collects observational data from the Global Telecommunications System (GTS) and other sources. In this study, we used ground temperature (T), relative humidity (RH), and planetary boundary layer height (PBL) at 6:00 UTC (14:00 Beijing local standard time). The second data source included land cover and elevation data; land cover data were obtained from the MODIS land-cover classification.
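The computational pieces defined earlier in this section (the eq 1 AOD estimate, the PMSI2.5 rescaling of eqs 2 and 3, and ordinary Kriging with an exponential covariance) can be sketched in dependency-free Python. This is a hedged illustration, not the authors' implementation: fitting a_s and b_s by ordinary least squares, the covariance parameters (`sill`, `range_km`), and all function names are assumptions; only α = 700 and the 10%/5% anchor points of f come from the text. The kriging here is generic OK, not the tapered conditional Kriging used later for the model coefficients.

```python
import math


def fit_eq1(aod_retrieved, pm, aod_season_mean, pm_season_mean):
    """Fit a_s and b_s of eq 1 for one city and season by ordinary least squares.

    x = mean retrieved AOD * PM_ij / mean PM (the eq 1 regressor); y = retrieved
    AOD on the same cell-days. Using OLS here is an assumption.
    """
    x = [aod_season_mean * p / pm_season_mean for p in pm]
    n = len(x)
    mx, my = sum(x) / n, sum(aod_retrieved) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, aod_retrieved))
    a_s = sxy / sxx
    return a_s, my - a_s * mx


def eq1(pm_ij, a_s, b_s, aod_season_mean, pm_season_mean):
    """Estimated AOD for a cell-day that has a PM2.5 observation but no retrieval."""
    return a_s * aod_season_mean * pm_ij / pm_season_mean + b_s


def f_eq3(x, alpha=700.0):
    """Target fraction of eq 3: 10% at x = 0, falling to 5% at x = alpha."""
    return 0.10 / (x / alpha + 1.0)


def pmsi_eq2(pmi, se, alpha=700.0):
    """Eq 2: rescale PMI2.5 by sigma_t / SE, where sigma_t = PMI2.5 * f(PMI2.5)."""
    sigma_t = pmi * f_eq3(pmi, alpha)
    return pmi * sigma_t / se


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, sill=1.0, range_km=50.0):
    """Ordinary kriging with exponential covariance C(h) = sill * exp(-h / range_km).

    Returns (estimate, kriging variance). The sill and range are placeholders;
    in practice they would be fitted to an empirical variogram.
    """
    def cov(p, q):
        return sill * math.exp(-math.dist(p, q) / range_km)

    n = len(points)
    # System [[C, 1], [1^T, 0]] [w; mu] = [c0; 1]; the last row forces sum(w) = 1.
    a = [[cov(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    c0 = [cov(p, target) for p in points]
    sol = solve(a, c0 + [1.0])
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * v for wi, v in zip(w, values))
    variance = sill - sum(wi * ci for wi, ci in zip(w, c0)) - mu
    return estimate, variance
```

For example, `pmsi_eq2(100.0, 20.0)` returns 43.75 (up to floating-point rounding): f(100) = 0.0875, so σt = 8.75 μg/m3 and the interpolated value is rescaled by 8.75/20. In the workflow described above, `se` would be the kriging standard error, i.e., the square root of the variance returned by `ordinary_kriging`.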
For land cover, we used the International Geosphere-Biosphere Program (IGBP) classification of this product for 2010 (http://lpdaac.usgs.gov). The data set was assigned to the base grid, and two parameters were used: green fractions and urban fractions. Finally, the mean elevation for each grid cell was obtained from the USGS GTOPO30 elevation data set (https://lta.cr.usgs.gov/GTOPO30).

Statistical Model Framework. A linear regression framework was assumed to model the relationships between the MODIS AOD values and ground PM2.5 measurements20 (eq 4):

$$\mathrm{PM}(s,t) = \alpha_0(s,t) \times \mathrm{AOD} + \alpha_1(s,t) + \varepsilon(s,t) \tag{4}$$

where PM(s,t) denotes the PM2.5 concentration measured at location s and time t. Location s indexes a monitor-containing grid cell, georeferenced as in Figure 1, and time t indexes a day within the study period. The location- and time-specific slope and intercept are designated α0(s,t) and α1(s,t), respectively. The residual errors ε(s,t) are assumed to be normally distributed with mean zero and variance σ2. The model coefficients have both temporally and spatially dependent structure. At a second level, the regression coefficients are decomposed as

$$\alpha_0(s,t) = \beta_0(s) + \beta_0(t) + \lambda_0 Z_0 \tag{5}$$

and

$$\alpha_1(s,t) = \beta_1(s) + \beta_1(t) + \lambda_1 Z_1 \tag{6}$$

where βi(s) and βi(t) refer, respectively, to the underlying independent spatial and temporal processes that determine the spatiotemporal variations of the slope (i = 0) and intercept (i = 1). The parameter vectors λi are fixed-effect coefficients for the meteorological and land cover variables: Z0 for the slope (i = 0) and Z1 for the intercept (i = 1). Specifically, eq 5 models the relationship between AOD and PM2.5, allowing interactions between AOD and the other predictors, and eq 6 models the underlying processes that have direct effects on PM2.5 levels. PMSI2.5 is a predictor included in Z1 to capture additional variation in PM2.5 not explained by AOD and the other predictors.

In this study, we used a tapered conditional Kriging method to interpolate the spatial coefficients βi(s) in eqs 5 and 6, accounting for spatial dependence.20 For days without PM2.5−AOD pairs, we used a linear autocorrelation function to model the temporal coefficients βi(t), which specifies that βi(t) is proportional to the weighted average of the neighboring days.20 Therefore, by interpolating over the fitted AOD field, we could obtain spatially and temporally dependent coefficient fields with complete coverage. We used the Markov Chain Monte Carlo (MCMC) method for model fitting within the Bayesian framework. One advantage of a Bayesian hierarchical framework is that the model can account for uncertainties in parameter estimates and PM2.5 predictions. This facilitates incorporating the model outputs into human exposure and health impact evaluations that consider exposure uncertainties.36,37 It is worth noting that the fitted model can only be used to predict PM2.5 from AOD values within the temporal extent of the available data, because day-specific AOD−PM2.5 relationships beyond this period (i.e., temporal extrapolation) were not estimated in our model structure.

Model Evaluation Workflow. We evaluated the effectiveness and performance of the AOD gap-filling method, the novel variable PMSI2.5 as an additional predictor, and the statistical model formulation based on out-of-sample prediction. We designed a workflow (depicted in Figure S2) to evaluate the three methods using both 10-fold and spatial cross-validation experiments. For the 10-fold cross-validation, the data were randomly divided into 10 groups. In each experiment, nine groups were used to fill the missing AOD values and to fit two AOD−PM2.5 models (eq 4), with and without the variable PMSI2.5. The remaining group served as the test data set, in which the predicted PM2.5 values were compared with the observed measurements. The PM2.5 concentrations in the test data set could be predicted using either the retrieved AOD values or the filled AOD values. The evaluation was repeated 10 times, withholding a different group each time.

We performed a 10-fold spatial cross-validation by leaving out either 10% of the cities or 10% of the monitors each time. We considered leaving observations out by city, instead of by individual monitor, to address a weakness that has been reported in the literature.14,27 Specifically, given the density of monitors in cities, there were typically one or more monitors near any given monitor. Hence, removing individual monitors one at a time in a cross-validation experiment may falsely suggest that the model can perform spatial predictions well. Results from removing monitors by groups of cities, however, provide confidence in the model's ability to predict at locations far away from monitors.

Model prediction performance was evaluated using the normalized mean absolute error (NME), root mean square error (RMSE), R2, and the linear regression coefficients (along with their standard errors). NME and RMSE are defined in the SI.

■ RESULTS AND DISCUSSION
Model Parameters Estimation. The mean PM2.5 concentration from monitors in the final data set was 58.8 μg/m3, which is 1.7 times the China Class I standard. On average, AOD and PMSI2.5 were significantly associated with PM2.5, with their 95% posterior intervals not including zero (Table 1) and with positive coefficients indicating positive linear relationships with PM2.5 levels. The mean uncertainty of the variable PMSI2.5 was less than 5 μg/m3. Hence, because the PMSI2.5 coefficient is close to 1 (Table 1), including PMSI2.5 incorporated an average uncertainty of approximately 5 μg/m3 into the estimates. Both PBL and terrain height had 95% posterior intervals lying entirely below zero (Table 1).

Table 1. Statistical Assessment of the Estimated Coefficients

terms in λ1     Est(c)     SE(d)      Low95(e)    Up95(f)
intercept       23.44      18.80      −13.19      60.99
AOD             21.66      3.93       15.34       30.81
PMSI2.5         1.16       0.016      1.12        1.19
Green.fac(a)    0.21       0.31       −0.4        0.82
Urban.fac(b)    0.64       1.82       −2.97       4.20
elevation       −0.012     0.004      −0.02       −0.0044
temperature     0.04       0.062      −0.084      0.163
PBL             −0.0013    0.00047    −0.0022     −0.00035
RH              −0.033     0.016      −0.065      0.000065

(a) Green.fac refers to the fraction of green land in each grid cell. (b) Urban.fac refers to the fraction of urban area in each grid cell. (c) Est denotes the estimated coefficients. (d) SE denotes the standard errors of the estimated coefficients. (e) Low95 denotes the lower bound of the 95% posterior interval. (f) Up95 denotes the upper bound of the 95% posterior interval.
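The statements above about which coefficients are distinguishable from zero can be checked directly against the posterior intervals in Table 1. The sketch below copies a subset of the Table 1 rows (the dictionary name and layout are illustrative) and flags each term whose 95% posterior interval excludes zero; it only restates the tabulated numbers and does not refit the model.

```python
# (estimate, lower 95% bound, upper 95% bound) copied from Table 1 (subset of terms).
TABLE1 = {
    "intercept": (23.44, -13.19, 60.99),
    "AOD": (21.66, 15.34, 30.81),
    "PMSI2.5": (1.16, 1.12, 1.19),
    "Green.fac": (0.21, -0.4, 0.82),
    "Urban.fac": (0.64, -2.97, 4.20),
    "elevation": (-0.012, -0.02, -0.0044),
    "temperature": (0.04, -0.084, 0.163),
    "PBL": (-0.0013, -0.0022, -0.00035),
}


def excludes_zero(low, up):
    """True when the whole 95% posterior interval lies on one side of zero."""
    return low > 0 or up < 0


for term, (est, low, up) in TABLE1.items():
    if excludes_zero(low, up):
        sign = "positive" if est > 0 else "negative"
        print(f"{term}: {sign} (interval {low} to {up})")
    else:
        print(f"{term}: interval includes zero")
```

Running it marks AOD and PMSI2.5 as positive and elevation and PBL as negative, matching the text; the remaining listed terms have intervals that straddle zero.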
PBL and terrain height were therefore negatively associated with PM2.5 levels. The day-specific random slopes and intercepts (β0(t) and β1(t) in eqs 5 and 6, respectively) were larger in winter because PM2.5 concentrations were higher (Figure S3, SI).

Model Comparisons and PMSI2.5 Evaluations. In model fitting, R2 improved by 0.07 (from 0.66 to 0.73) when the variable PMSI2.5 was included in the model. In the 10-fold leave-10%-cities-out cross-validation procedure, R2 improved by a large margin of 0.13, from 0.48 (Figure 3b) to 0.61 (Figure 3c). The RMSE also decreased by 3.46 μg/m3, from 26.99 μg/m3 to 23.53 μg/m3. In general, the model with PMSI2.5 predicted PM2.5 more accurately. Comparing the performance of the model without AOD values (Figure 3a) with that of the model without PMSI2.5 (Figure 3b) shows that the variable PMSI2.5 explained more of the variance in PM2.5 than the AOD values did. In summary, PMSI2.5 contains additional spatial information on PM2.5 that is not captured by the spatiotemporal predictors, including AOD. When the random leave-10%-monitors-out cross-validation method was used, the R2 was 0.68 (Figure 3d), which is similar to the R2 of 0.69 reported by Zheng et al.27 using the leave-one-monitor-out validation method but 0.07 larger than that obtained with the leave-10%-cities-out cross-validation method in this study. Hence, performing cross-validation at the individual monitor level could exaggerate the model's prediction performance.

Figure 3. Model evaluations using the 10-fold leave-10%-cities-out cross-validation method for the (a) Meteo (meteorological variables) + LU (land use variables) + PMSI2.5 model, (b) AOD + Meteo + LU model, and (c) full model; (d) model evaluation using the 10-fold leave-10%-monitors-out cross-validation method for the full model. No AOD gap-filling was performed in these models.

The annual PM2.5 levels were calculated by averaging the PM2.5 concentrations estimated from the retrieved AOD values. Pollution levels in the Hebei cities along the foot of the Taihang Mountains were much higher than those in other cities. The full model with PMSI2.5 better characterized this belt-shaped pollution zone (Figure S4). Moreover, the full model with PMSI2.5 predicted lower pollution levels in the northern part of the study area than the partial model did, with lower prediction standard errors (Figure S4b,d). The standard errors in urban areas were generally lower due to the greater density of AOD−PM2.5 pairs.

However, an estimation bias was apparent even in the predictions of the full model. The annual means of the observed PM2.5 concentrations in the heavily polluted Hebei cities were approximately 120 μg/m3, whereas the highest estimated PM2.5 concentrations (full model) were no larger than 100 μg/m3 (Figure S4c). Spatially, the estimated PM2.5 concentrations were not smooth, especially in winter (Figure S5d). Moreover, the full model using retrieved AOD values failed to capture the expanded pollution area in southern Hebei in autumn and winter (Figure S5). In all of these cases, the model predicted PM2.5 concentrations only when AOD values were available. The observed mean PM2.5 levels on days with available AOD were lower than the annual mean PM2.5 levels (Figure 2a), which accounts for the underestimation.

Figure 4. (a) Scatterplot of the AOD values fitted by eq 1 versus the retrieved AOD values; (b) prediction performance using the reconstructed AOD with the full model; (c) annual average AOD values with gap-filling.

AOD Gap-Filling Evaluation. There were approximately 40000 missing AOD values in the grid cells with PM2.5 monitors. Our missing AOD imputation method was effective at improving PM2.5 predictions. In the model-fitting procedure, the model overestimated AOD when the retrieved AOD values were low (Figure 4a). Although the R2 was low (0.36), the constructed and retrieved AOD showed a good linear relationship when the retrieved AOD was less than 1.5. By visually checking the interpolated full AOD data field, we found that the spatial variations of the AOD values were well captured. There were almost no abnormal spatial patterns in
the daily AOD. When used to predict PM2.5, the full model performed well with the reconstructed data set, giving an R2 of 0.78, an NME of 0.27, and an RMSE of 33.39 μg/m3 in the cross-validation experiment (Figure 4b). Interestingly, the cross-validation performance with the constructed AOD was better than that of the model with retrieved AOD. In part, this is because the linear relationship between AOD and PM2.5 enabled our model to leverage extra spatial information from the PM2.5 monitors.

PM2.5 Concentration Estimation. The model using the "full" AOD data set predicted the seasonal and annual averages of the PM2.5 concentrations well (Figure 5). Winter PM2.5 concentrations were much higher than those estimated using the retrieved AOD only; the mean PM2.5 concentration in winter approached 200 μg/m3. The heavy pollution mainly occurred in central and southern Hebei Province, western Shandong Province, Tianjin, and Beijing. PM2.5 levels were also high in central Liaoning Province around its capital city, Shenyang. This area is part of the Northeast Plain, which is traditionally a zone of heavy industry. PM2.5 concentrations were generally lower than 40 μg/m3 in northwestern Hebei Province and in the portion of Inner Mongolia within the domain, which is mainly covered by grassland and forests. It is worth noting that there is a small area of higher PM2.5 levels in eastern Inner Mongolia near Liaoning Province. This area is the location of an industrial city, Chifeng, which is the second most polluted city in Inner Mongolia, with an annual mean PM2.5 level of nearly 50 μg/m3. The successful prediction of this pollution hotspot amid clean surroundings further demonstrates the model's effectiveness. PM2.5 levels were also lower on the Shandong and Liaodong peninsulas and in south-central Shandong Province.

Figure 5. Annual and seasonal mean estimations of the PM2.5 concentrations (μg/m3) using the full model and reconstructed AOD data set.

A previous study by Ma et al.14 used the GWR method and obtained R2 = 0.64 in 10-fold cross-validation, while we obtained R2 = 0.61 with our stricter evaluation process, which accounts for spatial clustering. They predicted mean PM2.5 concentrations in southern Hebei Province of around 90 μg/m3 in 2013 and 130 μg/m3 in winter. In contrast, for 2014, our improved model predicted PM2.5 levels of around 120 μg/m3 overall and 180 μg/m3 in winter, which were closer to observations. Zheng et al.27 evaluated their LME model in the Beijing−Tianjin−Hebei area, with R2 = 0.77 and NME = 0.22, using the more optimistic leave-one-monitor-out cross-validation method. They did not report estimated seasonal variations, although they predicted annual mean PM2.5 concentrations well by using a correction factor. Xie et al. predicted PM2.5 levels in Beijing with a cross-validation R2 of 0.75.13 Their better performance relative to this study is partly owing to their smaller study area. Without considering missing AOD values, however, they significantly underestimated annual mean PM2.5 levels. In our study, the data set with complete spatial coverage could characterize the evolution of serious pollution episodes on a regional scale well; an example pollution episode is presented in the SI (Figure S6).

In summary, our revised model and reconstructed AOD data set accurately predict ground-level PM2.5 concentrations in North China. First, we used a Bayesian hierarchical framework to model the daily, grid cell-specific linear relationships between AOD and PM2.5 concentrations. Second, we exploited the spatial dependence in all of the observed PM2.5 levels by developing an interpolated PM2.5 variable, PMSI2.5. The new variable PMSI2.5 significantly improved the model's prediction performance: R2 increased by 0.13, to 0.61, in the cross-validation study. Third, we reconstructed the AOD values and thus obtained daily PM2.5 pollution maps with complete spatial and temporal coverage, which proved useful for capturing the evolution of PM2.5 pollution episodes in North China.

This study also demonstrates the potential misinterpretation of model accuracy when a completely random, leave-one (or 10%) cross-validation method is used for model evaluation.
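The difference between random and city-grouped validation can be made concrete with a short sketch. It is illustrative only, not the authors' code: the 20-city, 3-monitor network is hypothetical, round-robin assignment of shuffled cities (or monitors) is just one simple way to form 10 folds, and because NME is defined only in the SI, the NME form (sum of absolute errors over sum of observations) and the use of the squared Pearson correlation for R2 are assumptions. The helper `leaky_fraction` (a hypothetical name) counts held-out monitors that still have a same-city monitor in the training folds.

```python
import math
import random


def folds_by_city(monitors, k=10, seed=0):
    """Assign whole cities to folds, so all monitors in a city are held out together."""
    cities = sorted({city for _, city in monitors})
    random.Random(seed).shuffle(cities)
    city_fold = {city: i % k for i, city in enumerate(cities)}
    return {mid: city_fold[city] for mid, city in monitors}


def folds_by_monitor(monitors, k=10, seed=0):
    """Assign individual monitors to folds at random, ignoring city membership."""
    ids = [mid for mid, _ in monitors]
    random.Random(seed).shuffle(ids)
    return {mid: i % k for i, mid in enumerate(ids)}


def leaky_fraction(monitors, folds):
    """Share of monitors with a same-city monitor in a different (training) fold."""
    leaky = 0
    for mid, city in monitors:
        if any(c == city and folds[other] != folds[mid]
               for other, c in monitors if other != mid):
            leaky += 1
    return leaky / len(monitors)


def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nme(obs, pred):
    # Assumed form sum|P - O| / sum(O); the paper defines NME in its SI.
    return sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)


def r2(obs, pred):
    # Assumed form: squared Pearson correlation of observed vs predicted values.
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    sop = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    soo = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    return sop * sop / (soo * spp)


# Hypothetical network: 20 cities with 3 monitors each.
monitors = [(f"m{c}-{i}", f"city{c}") for c in range(20) for i in range(3)]
print(leaky_fraction(monitors, folds_by_city(monitors)))     # 0.0 by construction
print(leaky_fraction(monitors, folds_by_monitor(monitors)))  # typically close to 1
print(rmse([10, 20, 30], [12, 18, 33]), nme([10, 20, 30], [12, 18, 33]))
```

With city-grouped folds the leaky fraction is 0.0 by construction; with random monitor folds, most held-out monitors keep a same-city neighbor in the training data, which is the mechanism the text identifies behind the more optimistic monitor-level statistics.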
Environmental Science & Technology Article

prediction performance. In our case, a completely random, exhaustive leave-10%-monitors-out procedure gave an R2, NME, and RMSE of 0.68, 0.26, and 21.40 μg/m3, respectively. When monitors were instead grouped by city before removal, performance was lower, with corresponding values of 0.61, 0.28, and 23.53 μg/m3. Air quality monitors are almost never uniformly distributed, because they are preferentially located in cities. When they are clustered in this way, randomly removing a monitor typically leaves one or more other monitors nearby. The values to be predicted can then often be captured by measurements from an adjacent monitor that was not removed. Model prediction performance should therefore be assessed in a way that accounts for geographical monitor clustering.

■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.est.5b05940.
Variables NME, RMSE, SVVR, and TVVR, Table S1, and Figures S1−S6 (PDF)

■ AUTHOR INFORMATION
Corresponding Authors
*Tel: +1-404-894-3079. E-mail: ted.russell@gatech.edu.
*Tel: +86-10-62795269. E-mail: yuqibai@tsinghua.edu.cn.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
Y.H. and A.G.R.’s work for this publication was made possible in part by funding from the USEPA under Grant Nos. RD834799, RD833866, and RD835217. H.C.’s work for the Bayesian model was supported by National Institutes of Health Grant No. R21ES023763. Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the supporting agencies. Further, the US Government does not endorse the purchase of any commercial products or services mentioned in the publication. This study was also supported by the State Environmental Protection Key Laboratory of Sources and Control of Air Pollution Complex (No. SCAPC201406) and by Tsinghua University (Nos. 20131089277 and 553302001).

■ REFERENCES
(1) Wang, L.; Wei, Z.; Yang, J.; Zhang, Y.; Zhang, F.; Su, J.; Meng, C.; Zhang, Q. The 2013 severe haze over southern Hebei, China: model evaluation, source apportionment, and policy implications. Atmos. Chem. Phys. 2014, 14, 3151−3173.
(2) Zhang, R.; Jing, J.; Tao, J.; Hsu, S.-C.; Wang, G.; Cao, J.; Lee, C.; Zhu, L.; Chen, Z.; Zhao, Y. Chemical characterization and source apportionment of PM2.5 in Beijing: seasonal perspective. Atmos. Chem. Phys. 2013, 13 (14), 7053−7074.
(3) He, K.; Yang, F.; Ma, Y.; Zhang, Q.; Yao, X.; Chan, C. K.; Cadle, S.; Chan, T.; Mulawa, P. The characteristics of PM2.5 in Beijing, China. Atmos. Environ. 2001, 35 (29), 4959−4970.
(4) Nel, A. Air pollution-related illness: effects of particles. Science 2005, 308 (5723), 804−806.
(5) Pope, C. A., III; Dockery, D. W. Health effects of fine particulate air pollution: lines that connect. J. Air Waste Manage. Assoc. 2006, 56 (6), 709−742.
(6) Lin, J.; Nielsen, C. P.; Zhao, Y.; Lei, Y.; Liu, Y.; McElroy, M. B. Recent changes in particulate air pollution over China observed from space and the ground: effectiveness of emission control. Environ. Sci. Technol. 2010, 44 (20), 7771−7776.
(7) Van Donkelaar, A.; Martin, R. V.; Levy, R. C.; da Silva, A. M.; Krzyzanowski, M.; Chubarova, N. E.; Semutnikova, E.; Cohen, A. J. Satellite-based estimates of ground-level fine particulate matter during extreme events: A case study of the Moscow fires in 2010. Atmos. Environ. 2011, 45 (34), 6225−6232.
(8) Van Donkelaar, A.; Martin, R. V.; Park, R. J. Estimating ground-level PM2.5 using aerosol optical depth determined from satellite remote sensing. J. Geophys. Res. 2006, 111 (D21), DOI: 10.1029/2005JD006996.
(9) Kahn, R.; Banerjee, P.; McDonald, D.; Diner, D. J. Sensitivity of multiangle imaging to aerosol optical depth and to pure-particle size distribution and composition over ocean. J. Geophys. Res. 1998, 103 (D24), 32195−32213.
(10) Liu, Y.; Sarnat, J. A.; Kilaru, V.; Jacob, D. J.; Koutrakis, P. Estimating ground-level PM2.5 in the eastern United States using satellite remote sensing. Environ. Sci. Technol. 2005, 39 (9), 3269−3278.
(11) Wang, J.; Christopher, S. A. Intercomparison between satellite-derived aerosol optical thickness and PM2.5 mass: implications for air quality studies. Geophys. Res. Lett. 2003, 30 (21), DOI: 10.1029/2003GL018174.
(12) Lee, H.; Liu, Y.; Coull, B.; Schwartz, J.; Koutrakis, P. A novel calibration approach of MODIS AOD data to predict PM2.5 concentrations. Atmos. Chem. Phys. 2011, 11 (15), 7991−8002.
(13) Xie, Y.; Wang, Y.; Zhang, K.; Dong, W.; Lv, B.; Bai, Y. Daily estimation of ground-level PM2.5 concentrations over Beijing using 3 km resolution MODIS AOD. Environ. Sci. Technol. 2015, 49, 12280.
(14) Ma, Z.; Hu, X.; Huang, L.; Bi, J.; Liu, Y. Estimating ground-level PM2.5 in China using satellite remote sensing. Environ. Sci. Technol. 2014, 48 (13), 7436−7444.
(15) Song, W.; Jia, H.; Huang, J.; Zhang, Y. A satellite-based geographically weighted regression model for regional PM2.5 estimation over the Pearl River Delta region in China. Remote Sensing of Environment 2014, 154, 1−7.
(16) Lin, C.; Li, Y.; Yuan, Z.; Lau, A. K.; Li, C.; Fung, J. C. Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM2.5. Remote Sensing of Environment 2015, 156, 117−128.
(17) Zheng, Y.; Liu, F.; Hsieh, H.-P. U-Air: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM, 2013; pp 1436−1444.
(18) Lary, D. J.; Faruque, F. S.; Malakar, N.; Moore, A.; Roscoe, B.; Adams, Z. L.; Eggelston, Y. Estimating the global abundance of ground level presence of particulate matter (PM2.5). Geospatial Health 2014, 8 (3), 611−630.
(19) Geng, G.; Zhang, Q.; Martin, R. V.; van Donkelaar, A.; Huo, H.; Che, H.; Lin, J.; He, K. Estimating long-term PM2.5 concentrations in China using satellite-based aerosol optical depth and a chemical transport model. Remote Sensing of Environment 2015, 166, 262−270.
(20) Chang, H. H.; Hu, X.; Liu, Y. Calibrating MODIS aerosol optical depth for predicting daily PM2.5 concentrations via statistical downscaling. J. Exposure Sci. Environ. Epidemiol. 2014, 24 (4), 398−404.
(21) Levy, R. C.; Remer, L. A.; Kleidman, R. G.; Mattoo, S.; Ichoku, C.; Kahn, R.; Eck, T. Global evaluation of the Collection 5 MODIS dark-target aerosol products over land. Atmos. Chem. Phys. 2010, 10 (21), 10399−10420.
(22) Tao, M.; Chen, L.; Su, L.; Tao, J. Satellite observation of regional haze pollution over the North China Plain. J. Geophys. Res. 2012, 117 (D12), DOI: 10.1029/2012JD017915.
(23) Engel-Cox, J. A.; Holloman, C. H.; Coutant, B. W.; Hoff, R. M. Qualitative and quantitative evaluation of MODIS satellite sensor data for regional and urban scale air quality. Atmos. Environ. 2004, 38 (16), 2495−2509.
(24) Kloog, I.; Chudnovsky, A. A.; Just, A. C.; Nordio, F.; Koutrakis, P.; Coull, B. A.; Lyapustin, A.; Wang, Y.; Schwartz, J. A new hybrid spatio-temporal model for estimating daily multi-year PM2.5 concentrations across northeastern USA using high resolution aerosol optical depth data. Atmos. Environ. 2014, 95, 581−590.
(25) Kloog, I.; Koutrakis, P.; Coull, B. A.; Lee, H. J.; Schwartz, J. Assessing temporally and spatially resolved PM2.5 exposures for epidemiological studies using satellite aerosol optical depth measurements. Atmos. Environ. 2011, 45 (35), 6267−6275.
(26) Van Donkelaar, A.; Martin, R. V.; Pasch, A. N.; Szykman, J. J.; Zhang, L.; Wang, Y. X.; Chen, D. Improving the accuracy of daily satellite-derived ground-level fine aerosol concentration estimates for North America. Environ. Sci. Technol. 2012, 46 (21), 11971−11978.
(27) Zheng, Y.; Zhang, Q.; Liu, Y.; Geng, G.; He, K. Estimating ground-level PM2.5 concentrations over three megalopolises in China using satellite-derived aerosol optical depth measurements. Atmos. Environ. 2016, 124, 232−242.
(28) Just, A. C.; Wright, R. O.; Schwartz, J.; Coull, B. A.; Baccarelli, A. A.; Tellez-Rojo, M. M.; Moody, E.; Wang, Y.; Lyapustin, A.; Kloog, I. Using high-resolution satellite aerosol optical depth to estimate daily PM2.5 geographical distribution in Mexico City. Environ. Sci. Technol. 2015, 49 (14), 8576−8584.
(29) Oliver, M. A.; Webster, R. Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information System 1990, 4 (3), 313−332.
(30) Zhang, Q.; Streets, D. G.; Carmichael, G. R.; He, K.; Huo, H.; Kannari, A.; Klimont, Z.; Park, I.; Reddy, S.; Fu, J. Asian emissions in 2006 for the NASA INTEX-B mission. Atmos. Chem. Phys. 2009, 9 (14), 5131−5153.
(31) Lv, B.; Zhang, B.; Bai, Y. A systematic analysis of PM2.5 in Beijing and its sources from 2000 to 2012. Atmos. Environ. 2016, 124, 98−108.
(32) Yang, L.; Cheng, S.; Wang, X.; Nie, W.; Xu, P.; Gao, X.; Yuan, C.; Wang, W. Source identification and health impact of PM2.5 in a heavily polluted urban atmosphere in China. Atmos. Environ. 2013, 75, 265−269.
(33) Madaniyazi, L.; Nagashima, T.; Guo, Y.; Yu, W.; Tong, S. Projecting Fine Particulate Matter-related Mortality in East China. Environ. Sci. Technol. 2015, 49 (18), 11141−11150.
(34) Jiang, J.; Zhou, W.; Cheng, Z.; Wang, S.; He, K.; Hao, J. Particulate matter distributions in China during a winter period with frequent pollution episodes (January 2013). Aerosol Air Qual. Res. 2015, 15 (2), 494.
(35) Remer, L. A.; Kaufman, Y.; Tanré, D.; Mattoo, S.; Chu, D.; Martins, J. V.; Li, R.-R.; Ichoku, C.; Levy, R.; Kleidman, R. The MODIS aerosol algorithm, products, and validation. J. Atmos. Sci. 2005, 62 (4), 947−973.
(36) Gryparis, A.; Paciorek, C. J.; Zeka, A.; Schwartz, J.; Coull, B. A. Measurement error caused by spatial misalignment in environmental epidemiology. Biostatistics 2009, 10 (2), 258−274.
(37) Szpiro, A. A.; Paciorek, C. J.; Sheppard, L. Does more accurate exposure prediction necessarily improve health effect estimates? Epidemiology (Cambridge, Mass.) 2011, 22 (5), 680.
