Prediction and Modeling of Dry Seasons Air Pollution Changes Using Multiple Linear Regression Model: A Case Study of Port Harcourt and Its Environs, Niger Delta, Nigeria

The influence of meteorological parameters on air pollutants over Port Harcourt and its environs in the dry season was modeled using a multiple linear regression model. Results indicated that meteorological parameters significantly influenced pollutant concentrations. However, the linear relationships between meteorological parameters and pollutant concentrations were poor, so meteorological parameters are poor predictor variables of air pollutant concentrations in the area. Pollution roses of the pollutant dispersion pattern in the study area showed that pollutant concentrations increase with increased wind speed, indicating that wind speed exerts a positive influence on pollutant concentration levels. Yearly prediction of air pollutants was also carried out using ten years of data from previous studies conducted in the study area. A regression model was developed with year as the predictor variable, and the resulting relationship between air pollutants and year was used to predict annual dry-season pollutant concentrations for the next fifteen years.


INTRODUCTION
Air quality impacts on the environment can be quantified by simulating environmental conditions with an analytical tool known as modeling (Okpala et al., 2013). Modeling is a systematic method in which mathematical equations are used to simulate real-life environmental situations and to predict the future behaviour of air pollutants. It assists in studying and predicting the impacts of various environmental components, in viewing the environment as a system by representing a simplified version of it mathematically, and in predicting, testing and comparing reasonable alternative situations (Okpala et al., 2013). Air pollution models are an effective and efficient way to understand how various air pollution scenarios interact with meteorology, topography and existing air quality characteristics (Okpala et al., 2013). The relatively high concentration of air pollutants in Port Harcourt can be attributed mainly to industrial activities, such as oil and gas related activities, and to vehicular emissions (Antai, 2016). Geographical and meteorological conditions of the study area can also influence local background concentrations of air pollutants, since there is a relationship between air pollution and meteorological variables. Air pollution modeling is thus the development of a functional relationship between air pollutant concentrations and other control variables. Most conventional models have proved inaccurate (Esplin, 1995). These models depend on detailed knowledge of pollutant sources and of the topography of the surrounding environment (Elangasinghe et al., 2014). In this study, a multiple linear regression (MLR) model was developed and applied to predict the variations of air pollutant concentrations with the meteorological parameters of the study area.
This study highlights how the relationships between measured air pollutant concentrations and meteorological parameters were modeled using multiple linear regression and a generalized additive model.

II. METHODOLOGY
Method of Data Analysis and Modeling
The mean concentration of air pollutants was computed using Equation (1), the standard deviation using Equation (2), and the standard error estimate using Equation (3), where s is the standard deviation, Xmeas,i is the measured i-th data point, X̄ is the mean and N is the total number of data points.
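The three descriptive statistics named above can be sketched in a few lines. This is a minimal illustration under assumed textbook definitions, since Equations (1) to (3) are not reproduced here: the mean is the sum of the measurements divided by N, the standard deviation uses the sample (N - 1) denominator, and the standard error is s divided by the square root of N. The function name `describe` and the sample readings are hypothetical.

```python
import math

def describe(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / n
    # Sample standard deviation with an N - 1 denominator. This is an
    # assumption: the source's own Equation (2) may use N instead.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # Standard error of the mean, assumed to be s / sqrt(N).
    se = s / math.sqrt(n)
    return mean, s, se

# Hypothetical readings for illustration only (not data from the study).
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, s, se = describe(readings)
print(f"mean={mean:.3f}, s={s:.3f}, se={se:.3f}")
```

If the study used the population form of the standard deviation, replacing `n - 1` with `n` is the only change needed.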

Coefficient of variation of air pollutants
The coefficient of variation of each parameter was computed using Equation (4).

Computation of Exceedance Factor (EF)
A factor known as the Exceedance Factor (CPCB, 2006) was used to determine pollutant compliance with national and international standards. The Exceedance Factor (EF) was calculated using Equation (5), where Ci is the measured concentration of the i-th parameter in the ambient air and Cstd is the regulatory standard recommended for the i-th parameter.
For EF < 100, the parameter is said to be within the permissible limit, and for EF > 100, the parameter is said to exceed the permissible limit. The EF for each pollutant was computed based on the Federal Ministry of Environment (FMEnv) stipulated permissible limits as contained in FEPA (1991, 1992) and the National Ambient Air Quality Standards (NAAQS).
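The coefficient of variation and the compliance check can be illustrated as follows. This is a sketch under two assumptions that the text implies but does not spell out: the coefficient of variation is taken as 100 × s / mean, and the EF is taken as 100 × Ci / Cstd, which is the percentage form consistent with the threshold of 100 used above. The function names and the example concentrations are hypothetical.

```python
def coefficient_of_variation(mean, std_dev):
    """Coefficient of variation in percent (assumed form: 100 * s / mean)."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * std_dev / mean

def exceedance_factor(measured, standard):
    """Exceedance Factor in percent (assumed form: 100 * Ci / Cstd)."""
    if standard <= 0:
        raise ValueError("regulatory standard must be positive")
    return 100.0 * measured / standard

def classify(ef):
    """Apply the rule stated in the text; EF == 100 is not covered by it."""
    if ef < 100.0:
        return "within permissible limit"
    if ef > 100.0:
        return "exceeds permissible limit"
    return "at the limit"

# Hypothetical values for illustration only (not data from the study).
ef = exceedance_factor(measured=12.0, standard=10.0)
print(ef, classify(ef))
print(coefficient_of_variation(mean=5.0, std_dev=1.0))
```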

Model Development
Multiple linear regression (MLR) models were applied to predict the variations of pollutant concentrations with meteorological parameters. The following steps were applied in the model building process.
i. Data was collected through field measurement.
ii. Data was prepared and analysed using statistical software.
iii. Appropriate variables were selected as input parameters.
iv. Models were built using the variables.
v. Models were tested and validated.
vi. Pollutants were predicted using the built models.
A multiple linear regression (MLR) modeling approach was employed to model the influence of meteorological variations on air pollutants. In the fundamental regression equations on which the modeling was based, Yi and yi are model outcomes or outputs, X1, X2, ..., Xn are predictor variables, b0, b1, b2, ..., bn are regression coefficients, and εi is the error term, called the residual. The MLR technique was used to predict air pollutant concentrations in the study area using wind speed (Ws), wind direction (Wd), temperature (Temp), air pressure (Ap) and relative humidity (Rh) as predictor variables. The regressions were performed using IBM SPSS (Statistical Package for the Social Sciences) software. A stepwise regression approach was used to determine the relationship between air pollutants and each individual meteorological parameter, using Equations (7) and (8).
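For readers without SPSS, the general form Y = b0 + b1·X1 + ... + bn·Xn + ε can be fitted by ordinary least squares in plain Python. The sketch below is not the study's procedure: it solves the normal equations directly, uses synthetic data, and assumes each predictor has been expressed as a small deviation from its mean so that the normal equations stay well conditioned. All names (`fit_mlr`, `predict`, `true_coef`) and all numbers are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, row):
    """Evaluate b0 + b1*X1 + ... + bn*Xn for one observation."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

# Synthetic, centred predictors standing in for Ws, Wd, Temp, Ap and Rh.
rng = random.Random(0)
NAMES = ["Ws", "Wd", "Temp", "Ap", "Rh"]
met = [[rng.uniform(-1.0, 1.0) for _ in NAMES] for _ in range(60)]
true_coef = [5.0, 0.8, 0.0, 0.3, -0.1, -0.4]  # hypothetical b0..b5
conc = [predict(true_coef, row) + rng.gauss(0.0, 0.2) for row in met]

coef = fit_mlr(met, conc)                          # all five predictors
ws_only = fit_mlr([[row[0]] for row in met], conc)  # wind speed alone
print([round(b, 3) for b in coef], [round(b, 3) for b in ws_only])
```

Fitting one predictor at a time, as in `ws_only`, mirrors the per-parameter models reported in the results; it is not a reimplementation of the stepwise selection routine in SPSS.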

Model Validation
Model performance was evaluated in line with guidelines instituted by EPA (2007). Specific analyses were performed to validate the model outputs against measured data. Both quantitative (statistical) and qualitative (visual) methods were adopted, and measured data were paired against predicted values. Statistical parameters such as the mean square error (MSE) and root mean square error (RMSE) were used to validate and determine the quality of the prediction models. In addition, a measure of goodness of fit known as the coefficient of determination, R-square (R²), was used to determine the total variability in the dependent variables that is accounted for by the model equations.
The mean square error (MSE) was computed as the mean squared difference between predicted and measured values using Equation (9), while the root mean square error was computed using Equation (10), where N is the number of measured data or observations. The sum of square error (SSE) was calculated using Equation (11), and the sum of squares of the regression model (SSM) was computed using Equation (12).
The residual sum of squares (RSS) was computed using Equation (13), and the residual sum of square error (SSR) was computed using Equation (14).
The total sum of squares (SST) was computed using Equation (15).
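The validation statistics listed above can be sketched as follows, using common textbook definitions rather than the study's Equations (9) to (15), which are not reproduced here. The sketch assumes MSE is the residual sum of squares divided by N, SSM is the sum of squared deviations of the predictions from the measured mean, and R² is SSM / SST. The function name and the sample data are hypothetical.

```python
import math

def validation_metrics(measured, predicted):
    """Return MSE, RMSE, SSM, SSR, SST and R-square for paired values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    mean_meas = sum(measured) / n
    # Residual sum of squares: measured minus predicted.
    ssr = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Model sum of squares: predicted minus the measured mean.
    ssm = sum((p - mean_meas) ** 2 for p in predicted)
    # Total sum of squares: measured minus the measured mean.
    sst = sum((m - mean_meas) ** 2 for m in measured)
    mse = ssr / n  # assumed divisor N; some conventions use degrees of freedom
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "SSM": ssm,
        "SSR": ssr,
        "SST": sst,
        # SSM / SST equals 1 - SSR / SST only for a least-squares fit
        # that includes an intercept.
        "R2": ssm / sst,
    }

# Hypothetical paired values for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
result = validation_metrics(measured, predicted)
print(result)
```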

Coefficient of Determination, R-square (R²)
The coefficient of determination is the proportion of the total sample variability explained by the regression models and indicates how well the models fit the data. The coefficient of determination was computed using Equation (

III. PRESENTATION OF RESULT
(i) Variation of Volatile Organic Compounds (VOCs) with Meteorological Parameters in the Dry Season
The results (shown in Figure 2 (a-e)) indicated that VOCs varied significantly with temperature and correlated positively with wind speed. The stepwise linear regression models (shown in Table 1) show that the linear relationships between VOCs and wind speed, wind direction, relative humidity and air pressure are not significant at the 0.05 significance level. However, the relationship between ambient temperature and VOC concentrations is significant at the 0.01 level for a two-tailed test, with a coefficient of determination (R²) of 0.015. This implies that although VOCs vary significantly with temperature, only 1.5% of the variation can be explained. Results (Table 1) further indicated that wind speed, wind direction, relative humidity and air pressure respectively accounted for 1.8%, 0.18%, 0.14% and 0.014% of the variation.
A multiple linear regression model for the prediction of VOCs was developed using all the meteorological parameters as predictor variables, and is given in Equation (17). The mean square error (MSE) and the root mean square error (RMSE) were computed to be 31.999 ppm and 5.6568 ppm respectively. The model sum of squares (SSM), residual sum of squares (SSR) and total sum of squares (SST) were computed to be 159.996 ppm, 3567.538 ppm and 3727.534 ppm respectively, as shown in Table 2. The result (Table 2) showed that meteorological parameters significantly (P-value < 0.05) influence the concentrations of VOCs in the area. However, the goodness of fit (Figure 3) shows a poor linear relationship between VOCs and meteorological parameters, with a coefficient of determination (R²) of 0.043. This implies that meteorological parameters accounted for only 4.3% of the variation of VOC concentrations in the area.
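The figures reported for the VOC model can be cross-checked with a few lines of arithmetic. The sketch below uses only the values quoted in the text. The divisor of five (one per meteorological predictor) is an assumption: dividing SSM by it happens to reproduce the reported MSE, but the source does not state how that value was obtained.

```python
import math

# Values as reported in the text for the VOC model.
ssm = 159.996    # model sum of squares
ssr = 3567.538   # residual sum of squares
sst = 3727.534   # total sum of squares
n_predictors = 5  # Ws, Wd, Temp, Ap, Rh (assumed divisor)

# For a least-squares fit with an intercept, SSM + SSR should equal SST.
decomposition_gap = abs((ssm + ssr) - sst)

# Share of total variation attributed to the model.
r_squared = ssm / sst

# Model sum of squares per predictor, and its square root.
mean_square = ssm / n_predictors
root_mean_square = math.sqrt(mean_square)

print(f"gap={decomposition_gap:.6f}")
print(f"R2={r_squared:.3f}")
print(f"SSM/5={mean_square:.3f}, sqrt={root_mean_square:.4f}")
```

The three sums of squares are mutually consistent, and SSM / SST rounds to the reported R² of 0.043.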
The goodness of fit between predicted and measured concentrations of VOCs is shown in Figure 3, while the predicted values are plotted against measured values in Figure 4.
dx.doi.org/10.22161/ijeab/3.3.25  ISSN: 2456-1878  www.ijeab.com
The results (shown in Figure 5 (a-e)) showed that concentrations of CO correlated significantly and positively with wind speed. The stepwise linear regression models (shown in Table 3) show that the linear relationships between concentrations of CO and wind direction, relative humidity, temperature and air pressure are not significant at the 0.05 significance level. However, the relationship between wind speed and concentrations of CO is highly significant at the 0.01 level for a two-tailed test, with a coefficient of determination (R²) of 0.088. This implies that although concentrations of CO vary positively with wind speed, only 8.8% of the variation can be explained.

Results (shown in Figure 6) between predicted and measured values showed a poor linear relationship between CO concentrations and meteorological parameters, with a coefficient of determination (R²) of 0.120. This implies that meteorological parameters accounted for only 12.0% of the variation of CO concentrations in the area in the dry season. The goodness of fit between predicted and measured concentrations of CO is shown in Figure 6, while the predicted values are plotted against measured values in Figure 7.
The results (shown in Figure 8 (a-e)) indicated that PM2.5 varied significantly with relative humidity and temperature and increased with wind speed and air pressure. The stepwise linear regression models (shown in Table 5) show that the linear relationships between PM2.5 and wind speed, wind direction and air pressure are not significant at the 0.05 significance level. However, the relationship between relative humidity and concentrations of PM2.5 particulate matter is highly significant at the 0.01 level for a two-tailed test, with a coefficient of determination (R²) of 0.047. This implies that although PM2.5 varies significantly with relative humidity, only 4.7% of the variation can be explained.
The result (Table 6) showed that meteorological parameters significantly (P-value < 0.05) influence the concentrations of PM2.5 in the area. However, the goodness of fit (Figure 9) shows a poor linear relationship between PM2.5 and meteorological parameters, with a coefficient of determination (R²) of 0.125. This implies that only 12.5% of the variation of PM2.5 concentrations can be explained by the meteorological parameters. The goodness of fit between predicted and measured concentrations of PM2.5 is shown in Figure 9, while the predicted values are plotted against measured values in Figure 10.
Pollution roses of pollutants in the study area (Figure 11 (a-c)) showed that pollutant concentrations increase with increased wind speed. Low concentrations of pollutants were obtained at low wind speed and vice versa. This implies that wind speed has a positive influence on the concentration levels of pollutants in the study area.

Fig.11 (a-c): Pollution Roses of Pollutants in the Study Area in the Dry Season
The pollutant polar plots (Figure 12 (a-c)) showed that concentrations of pollutants in the area are associated with wind speeds up to 3.5 m/s. It is also observed from Figure 12 (a-c) that pollutant concentrations increase with increased wind speed (Folorunsho et al., 1995). Surface polar plots of pollutant concentrations in the study area revealed that high concentrations of SO2, NO2, NH3, H2S and VOCs are associated with the south-west and south-east directions and are dispersed toward the north-east and north-west directions (Jimmy et al., 2013). This may imply that the sources of these pollutants are in the southern part, which is the coastal region of the study area. Industrial activities, especially in the Eleme area (refineries, petrochemical company, fertilizer companies, industrial waste management facilities, civil construction, gas flaring and vehicular movement), and the release of black carbon (black soot) from illegal refineries in the coastal area may be the sources of these pollutants (Antai, 2017).
The figure also indicated that concentrations of CO are associated with the south-west, south-east and north-east directions and are dispersed toward the north-west direction. This may imply that sources of this pollutant are in both the southern and northern parts, which are the coastal and upland areas. Industrial activities, vehicular exhaust emissions, gas flaring and oil and gas exploitation in the Eleme, Port Harcourt, Obio/Akpor and Etche areas might be the sources of this pollutant. Similarly, concentrations of methane (CH4) and particulate matter (TSP, PM10 and PM2.5) are associated with both northern and southern directions. This shows that activities in both the coastal and upland areas are responsible for the release of these pollutants into the environment (Kochubovski et al., 2012).
In other words, industrial activities, vehicular exhaust emissions, civil construction, the release of black carbon (black soot) from illegal refineries in the coastal area, gas flaring and oil and gas exploitation in the Eleme, Port Harcourt, Obio/Akpor, Etche and Ikwerre areas may be the sources of CH4 and particulate matter in the air of the study area during the dry season.
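The directional associations described above rest on grouping concentrations by wind direction and wind speed. A minimal sketch of that grouping is given below. It assumes eight 45-degree sectors centred on the compass points, wind direction in degrees measured clockwise from north, and wind speed bins ending at 3.5 m/s. The bin edges, function names and records are hypothetical and are not taken from the study.

```python
from collections import defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def sector_of(wind_dir_deg):
    """Map a direction in degrees (0 = north, clockwise) to one of 8 sectors."""
    return SECTORS[int(((wind_dir_deg % 360) + 22.5) // 45) % 8]

def speed_bin(wind_speed, edges):
    """Return the (low, high) bin containing wind_speed, or None if outside."""
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]
        # The top edge is included so a reading of exactly 3.5 m/s is kept.
        if lo <= wind_speed < hi or (is_last and wind_speed == hi):
            return (lo, hi)
    return None

def mean_by_sector_and_speed(records, edges=(0.0, 1.0, 2.0, 3.5)):
    """Average concentration per (sector, speed bin).

    records: iterable of (wind_dir_deg, wind_speed_ms, concentration).
    Readings whose speed falls outside the bins are skipped.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for wd, ws, conc in records:
        sbin = speed_bin(ws, edges)
        if sbin is None:
            continue
        cell = cells[(sector_of(wd), sbin)]
        cell[0] += conc
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}

# Hypothetical records for illustration only.
records = [(225, 0.5, 2.0), (230, 0.8, 4.0), (135, 3.0, 10.0), (350, 3.5, 6.0)]
table = mean_by_sector_and_speed(records)
for key in sorted(table):
    print(key, table[key])
```

Plotting each cell's mean on polar axes, with direction as the angle and wind speed as the radius, gives a chart of the same kind as a pollution rose or polar plot.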

Yearly Prediction for 15 Years for Dry Seasons
Yearly prediction of air pollutants was carried out using ten years of data from previous studies conducted in the study area.
The prediction was done using regression analysis with year as the predictor variable, which established the relationship between air pollutants and year. The annual prediction of pollutant concentrations was made for the dry seasons. The prediction models for each pollutant in the dry season are presented in Equations (20) to (29). The prediction was made for a period of fifteen years (2017 to 2031), and the results of the annual prediction are presented in Table 7.
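A year-only prediction model of this kind reduces to a simple linear regression of concentration on year. The sketch below illustrates the idea with a closed-form least-squares fit. The ten-year record (assumed here to span 2007 to 2016) and its values are hypothetical, and the fitted line is not one of the study's Equations (20) to (29).

```python
def fit_trend(years, values):
    """Least-squares line: concentration = intercept + slope * year."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical ten-year dry-season record, for illustration only.
years = list(range(2007, 2017))
values = [10.0 + 0.5 * (yr - 2007) for yr in years]

intercept, slope = fit_trend(years, values)

# Extrapolate over the fifteen-year horizon named in the text (2017 to 2031).
forecast = {yr: intercept + slope * yr for yr in range(2017, 2032)}
for yr, conc in forecast.items():
    print(yr, round(conc, 2))
```

A straight-line extrapolation over fifteen years assumes the historical trend continues unchanged, so the forecasts are only as reliable as that assumption.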