Prediction and optimization of performance and emission characteristics of a dual fuel engine using machine learning

Current research in engine, fuel and lubricant development aims to protect the environment by reducing harmful emissions. Testing under various conditions is mandatory before a product is released, in order to meet the Sustainable Development Goals of the United Nations. Such experimentation and testing under various operating conditions is a time-consuming and tiresome process, and it also wastes manpower, money, precious time and scarce resources. Intelligent techniques such as Machine Learning (ML) have proven useful in almost all domains by learning to reproduce the results they are trained on. This advantage is used here to predict the performance and emission characteristics of a dual fuel engine. In this study, the experimental data are obtained from a single cylinder CI engine operated in dual fuel mode, using biogas and diesel as the primary and secondary fuel respectively. The input parameters, namely biogas flow rate, methane fraction (MF), torque and intake temperature, are used to predict the output parameters. The output parameters of the study include the performance attributes brake thermal efficiency and secondary fuel energy ratio, and the emission attributes HC, CO, NOx and smoke. The proposed model uses a Random Forest Regressor and is trained on 324 distinct observations recorded through physical experimentation. The model is validated using the R² score, which is 0.997 for the given dataset when trained and tested in the ratio 85:15. The trained model is then used to compute the outputs for any new values of the input attributes. The values of the input parameters that give maximum thermal efficiency and minimum emissions are found using Lagrangian optimization. The optimized values are a torque of 12.48 Nm, a biogas flow rate of 8.29 l/min, a methane fraction of 72.8% and an intake temperature of 68.3 °C.


Introduction
The constant depletion of conventional fuels and stringent emission norms are motivating researchers to find better alternative fuels for stationary and automotive engines. Biogas is one of the promising alternative fuels for IC engines. It is a mixture of methane (50-70%), carbon dioxide (30-45%) and other gases. The presence of carbon dioxide reduces the calorific value and ignitability of biogas, so increasing the methane fraction improves its calorific value. Carbon dioxide can be removed to raise the methane fraction through various processes such as water scrubbing, chemical separation, membrane separation, pressure swing adsorption and cryogenic separation. Because of its high self-ignition temperature, biogas is very difficult to ignite without a pilot fuel, and increasing the intake temperature may enhance its combustion. It is therefore important for researchers to model and quantify the relationship between the input parameters and the different gases that are emitted. Finding optimal values of the engine input parameters that give minimal exhaust emissions with increased efficiency is vital to justify the use of biogas. Finding these values experimentally requires a huge amount of testing, which consumes substantial resources including fuel, manpower, budget and time. Industry 4.0 focuses on the digital transformation of industry, with most production processes automated, thereby reducing resource utilization at all levels. In alignment with Industry 4.0, much work has been carried out to develop predictive models using machine learning, thereby reducing manpower, fuel, time and budget. The major contributions of the paper are given below.
The data obtained from experimentation, with biogas flow rate, methane fraction, torque and intake charge temperature as the engine input parameters and the various gaseous emissions as output parameters, are used to develop a predictive model using a Random Forest Regressor. The model is validated using various metrics and with testing data. Finally, the values of the input parameters that may minimize gaseous emissions are computed using Lagrangian optimization.
The paper is organized as follows. A detailed literature review on biofuel usage and on the use of predictive approaches in the thermal industry is given in Section 2. The experimental setup and methodology are discussed in Section 3. Modeling using the data collected from experiments is explained in Section 4. A detailed analysis comparing predicted and experimental data is presented in Section 5. The optimization procedure used to arrive at probable input values is explored in Section 6. Finally, conclusions are presented in Section 7.

Literature review
There are two possible methods to utilize biogas in a CI engine: HCCI (Homogeneous Charge Compression Ignition) and dual fuel mode. In dual fuel mode, a pilot fuel is used to ignite the biogas, which has a high self-ignition temperature. Adding biogas in CI engines reduces brake thermal efficiency because of the CO2 in biogas, which reduces flame speed and combustion temperature [1,2]. Combustion temperature can be increased with a higher compression ratio, which improves combustion and reduces HC and CO emissions [3]. Diesel substitution of up to 48% has been achieved, and Duc and Wattanavichien [4] observed around 50% diesel substitution in dual fuel mode in an IDI (Indirect Injection) engine.
Fuel conversion efficiencies are lower in dual fuel mode than in diesel-only mode at low and medium loads, but similar at high loads. Mustafi et al. [5] also reported comparable BSEC (Brake Specific Energy Consumption) for the two modes. Varying the injected quantity of pilot fuel can be used to regulate the emission parameters [5,6]. Brake thermal efficiency can be improved by increasing the intake temperature, compression ratio and oxygen content [3,7,8]. Supplying extra oxygen reduces combustion instability and increases diesel replacement; this reduces methane emissions, while its effects on CO emissions are inconsistent [7]. In dual fuel mode, Sorathia and Yadav [9] noted higher coolant losses and lower exhaust losses, the net result being a thermal efficiency similar to that of diesel-only mode. Reduced percentage exergy destruction and greater exergy efficiency were also recorded in dual fuel mode.
Barik and Murugan [10] studied biogas-fueled dual fuel operation with biogas flow rates of 0-0.6 kg/h. Replacement of up to 30% of the diesel was accomplished at full load. Because CO2 displaces more air and decreases the burning rate, volumetric efficiency declined and BSEC increased to high values in dual fuel mode. Better performance and emission indices are observed at a biogas flow rate of 0.9 kg/h. Rahman and Ramesh [11] examined the effects of methane fraction in the 24-68% range; however, the impact of the full methane fraction range (from raw biogas to pure methane) was not investigated. The CO2 in biogas has an effect comparable to EGR, reducing NOx but increasing CO and HC emissions. The homogeneous biogas-air charge replaces part of the heterogeneous diesel combustion and so reduces smoke emissions [12]. Barik and Sivalingam [13] recorded low exhaust temperature, high HC and CO emissions, and low NOx and smoke emissions.
Prediction models have been applied in the mechanical industry by many researchers. Girish Kanta et al. [14] predicted machining parameters using Artificial Neural Networks coupled with a Genetic Algorithm and validated the agreement of the predicted data with experimental data. Pham et al. [15] used random forest combined with particle swarm optimization to predict the undrained shear strength of soil.

Experimental setup
In this investigation, a standard single-cylinder, four-stroke, direct injection CI engine (6 kW maximum power) was used. The test setup was provided with an arrangement for introducing simulated biogas for dual fuel operation (see Fig. 1). Biogas was synthesized by blending compressed CH4 and CO2 delivered from separate cylinders through pressure-regulating valves and thermal mass flow meters. The two gas flow rates were regulated independently to adjust the composition of the simulated biogas, and biogas purity was expressed as the concentration of methane by volume. CH4 and CO2 were introduced through a Y-shaped nozzle connected to the manifold and mixed in a cylindrical chamber. The nozzle was located sufficiently far upstream of the cylinder that flow oscillations induced by the piston and valves did not affect the flow meter readings. Further downstream, a honeycomb-structured heater mounted in the intake manifold preheated the biogas-air mixture; its heating rate was controlled with a voltage regulator. The air flow rate and diesel flow rate were measured with an orifice meter and a burette respectively. The four main exhaust emissions (HC, CO, NOx and smoke) were measured with a portable gas analyser and a smoke meter.
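Setting the composition of the simulated biogas from the two metered streams is simple volumetric arithmetic. The following is a minimal sketch, assuming both thermal mass flow meters report flow in l/min at the same reference conditions and that the gases mix ideally by volume; the function names are hypothetical helpers, not part of the authors' setup.

```python
def blend_flows(total_lpm: float, methane_fraction: float) -> tuple:
    """Split a target simulated-biogas flow (l/min) into CH4 and CO2
    set-points for a given methane volume fraction (0-1).
    Assumes both meters read at the same reference conditions."""
    if not 0.0 <= methane_fraction <= 1.0:
        raise ValueError("methane_fraction must lie between 0 and 1")
    ch4_lpm = total_lpm * methane_fraction
    co2_lpm = total_lpm - ch4_lpm
    return ch4_lpm, co2_lpm


def methane_fraction_of(ch4_lpm: float, co2_lpm: float) -> float:
    """Methane volume fraction of the blend from the two metered flows."""
    return ch4_lpm / (ch4_lpm + co2_lpm)


if __name__ == "__main__":
    # Example: the optimum reported in the paper (8.29 l/min at 72.8% CH4).
    ch4, co2 = blend_flows(8.29, 0.728)
    print(f"CH4 {ch4:.3f} l/min, CO2 {co2:.3f} l/min")  # CH4 6.035 l/min, CO2 2.255 l/min
    print(f"MF check: {methane_fraction_of(ch4, co2):.3f}")
```

If the meters are calibrated for different gases or reference conditions, each reading would first need converting to a common basis before this split is meaningful.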

Experimental methodology
A detailed factorial analysis was conducted to investigate the effects of biogas flow rate, methane fraction, torque and intake charge temperature on performance and emission characteristics under biogas-diesel dual fuel operation. The range of each input parameter was chosen to cover the engine's entire operating range. The biogas flow rate range (2-16 l/min) corresponds to 10-90% of the overall energy released during combustion. The methane fraction was varied from 50 to 100%, spanning the methane content of naturally derived biogas up to pure methane. To investigate the effects of charge preheating, the intake temperature was varied from room conditions (35 °C) to the working limit of the heater (100 °C). At a constant speed of 1900 rpm, the applied load ranged from 20 to 90% of the rated engine load. The operating parameters are given in Table 1. The method proposed by Moffat [14] is used to estimate the uncertainties in the output parameters; the uncertainty was below 3% for all parameters. A total of 324 independent observations were recorded and stored in an Excel file. The combinations of operating parameters along with the emission characteristics are shown in Table 2.
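Two calculations in this paragraph can be made concrete: the share of combustion energy supplied by biogas, and the propagation of measurement uncertainties to a derived result. The sketch below is illustrative only. The heating values are assumed typical figures (methane about 35.8 MJ/m³ at 0 °C and 1 atm, diesel about 42.5 MJ/kg), not values from the paper, and the uncertainty routine is the root-sum-square combination of independent uncertainties that underlies Moffat-style analysis, with partial derivatives estimated numerically; it is not the authors' own calculation.

```python
import math
from typing import Callable, Sequence

# Assumed heating values (illustrative, not taken from the paper).
LHV_CH4_KJ_PER_M3 = 35.8e3      # methane, ~0 °C and 1 atm
LHV_DIESEL_KJ_PER_KG = 42.5e3   # typical diesel


def biogas_energy_share(q_biogas_lpm: float, mf: float,
                        diesel_kg_per_h: float) -> float:
    """Fraction (0-1) of the total fuel energy supplied by biogas.
    Only the methane part of the biogas carries heating value; CO2 is inert."""
    biogas_kw = q_biogas_lpm / 60_000.0 * mf * LHV_CH4_KJ_PER_M3   # l/min -> m3/s
    diesel_kw = diesel_kg_per_h / 3_600.0 * LHV_DIESEL_KJ_PER_KG   # kg/h -> kg/s
    return biogas_kw / (biogas_kw + diesel_kw)


def propagated_uncertainty(f: Callable[..., float], x: Sequence[float],
                           dx: Sequence[float], rel_step: float = 1e-6) -> float:
    """Root-sum-square propagation of independent input uncertainties,
    dR = sqrt(sum_i (dR/dx_i * dx_i)^2), with each partial derivative
    estimated by a central difference around the nominal point x."""
    total = 0.0
    for i, (xi, dxi) in enumerate(zip(x, dx)):
        h = rel_step * max(1.0, abs(xi))
        up = list(x)
        down = list(x)
        up[i] = xi + h
        down[i] = xi - h
        dr_dxi = (f(*up) - f(*down)) / (2.0 * h)
        total += (dr_dxi * dxi) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical operating point and instrument uncertainties.
    point = [8.29, 0.728, 0.5]   # biogas l/min, methane fraction, diesel kg/h
    unc = [0.1, 0.01, 0.01]
    share = biogas_energy_share(*point)
    u = propagated_uncertainty(biogas_energy_share, point, unc)
    print(f"biogas energy share {share:.3f} +/- {u:.3f}")
```

The same `propagated_uncertainty` helper applies to any derived quantity (for example brake thermal efficiency) once it is written as a function of the measured inputs.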

Modeling of system using machine learning
Data collected from physical experimentation is stored in Microsoft Excel. Load, biogas flow rate, methane fraction and intake temperature are taken as predictor variables, and carbon monoxide, hydrocarbon, smoke, nitrogen oxides and brake thermal efficiency as response variables. A total of 324 independent experiments were performed by varying the input parameters, and the results were recorded. An initial analysis of the data is presented in Table 3.
The data is preprocessed, a random forest model is trained on it, and the model is then used to find the optimal values of the input features. Figure 2 depicts the steps carried out to arrive at the optimal values.

Data preprocessing
Every value obtained from experimentation was recorded manually and is therefore prone to errors. Before modeling, the entire dataset is preprocessed so that the values are suitable for processing by machine learning algorithms. The following steps are carried out:
- The dataset is checked for missing values. Two features (load and intake temperature) have 15 missing values in common. The mean of each feature is calculated and used to fill in its missing values.
- Correlation analysis is then performed to understand the linearity between predictor and response variables. Specifically, the correlation between the input variables and each output variable is calculated using Pearson's correlation coefficient. Correlation values range from −1 to +1, indicating perfect negative to perfect positive linear correlation; a strong linear relationship exists only if the value is close to +1 or −1. Figures 3 and 4 present the correlation coefficients. The correlations between the input and output features are low, lying near the middle of the −1 to +1 range, which justifies the use of methods suited to non-linear relationships.
- The values are recorded on different scales, so each feature has a different numeric range. Such differences in scale make the results of the model harder to interpret, and because the algorithms operate directly on the numeric values, features with larger ranges have a disproportionate influence on the model. To avoid this, the training dataset is normalized to the range 0 to 1 using min-max normalization. Figure 5 presents a snapshot of the normalized values of the input parameters.
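The three preprocessing steps above can be sketched with the Python standard library alone. This is a minimal illustration, assuming each feature is held as a plain list with `None` marking a missing reading; the helper names are hypothetical, and the min-max bounds are learned from the training rows only and then reused for any test rows (which may therefore fall slightly outside 0-1).

```python
import math
from statistics import mean
from typing import List, Optional, Tuple


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Fill missing readings (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient: -1 (perfect negative linear relation)
    to +1 (perfect positive); values near 0 indicate little linear relation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_min_max(train_column: List[float]) -> Tuple[float, float]:
    """Learn the scaling bounds from the training data only."""
    return min(train_column), max(train_column)


def apply_min_max(column: List[float], lo: float, hi: float) -> List[float]:
    """Min-max normalisation x' = (x - lo) / (hi - lo); maps training data to [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]   # constant feature: nothing to scale
    return [(v - lo) / span for v in column]


if __name__ == "__main__":
    # Hypothetical intake temperatures (°C) with one missing reading.
    temp = impute_mean([35.0, None, 68.0, 100.0])
    lo, hi = fit_min_max(temp)
    print(temp, apply_min_max(temp, lo, hi))
```

Learning the bounds on the training rows alone keeps information from the test rows out of the model, which matters when the test score is used to judge the 85:15 split.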

Prediction modeling
4.2.1 Random forest regressor
Random forest is a supervised ensemble learning algorithm that can be used for both regression and classification. Based on Classification and Regression Trees (CART), the algorithm constructs an ensemble of decision trees during training. Each regression tree is grown on a bootstrap sample, i.e. a random sample drawn from the dataset with replacement, and these trees serve as base learners. Each node in a regression tree represents a binary test on a selected predictor variable; the variable and split point are chosen to minimize the residual sum of squares (equivalently, the MSE) of the data flowing down the left and right branches. The mean of the predictions from all trees is taken as the final prediction. The advantage of CART is that it can fit the data very closely, giving low bias; but since its result depends entirely on the training data, a single tree may suffer from high variance. Random forest reduces this variance by considering only m randomly selected predictors (m ≤ n, where n is the total number of predictors) at each split and by averaging the results of many trees, which has been shown to reduce high variance and overfitting. The performance may be validated using R² as the metric.
Method:
(a) Create a bootstrap sample D' from the dataset D.
(b) Construct a regression tree for the bootstrap sample D'.
(c) Repeat steps (a) and (b) T times, where T is the number of trees in the forest.
(d) Compute the predicted value from each tree; the mean of these predictions is the output.
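Steps (a) to (d) can be illustrated with a compact, pure-Python random forest. This is a teaching sketch of the algorithm as described, not the implementation used in the study. Tree growth is limited only by a minimum leaf size, and if none of the m sampled predictors yields a valid split the node simply becomes a leaf, a simplification compared with library implementations. The class and parameter names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Union

Tree = Union[float, tuple]   # a leaf value, or (feature, threshold, left, right)


def sse(values: List[float]) -> float:
    """Residual sum of squares of values around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X: List[Sequence[float]], y: List[float], m_try: int,
              min_leaf: int, rng: random.Random) -> Tree:
    """CART-style regression tree: each node is a binary test x[j] <= t,
    chosen among m_try randomly sampled predictors to minimise the
    left + right residual sum of squares."""
    if len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)                                   # leaf predicts the mean
    best = None
    n_features = len(X[0])
    for j in rng.sample(range(n_features), min(m_try, n_features)):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:                                     # no valid split found
        return mean(y)
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m_try, min_leaf, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m_try, min_leaf, rng))


def predict_tree(node: Tree, x: Sequence[float]) -> float:
    """Follow the binary tests down to a leaf."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


class RandomForestSketch:
    def __init__(self, n_trees: int = 50, m_try: int = 2, min_leaf: int = 2, seed: int = 0):
        self.n_trees, self.m_try, self.min_leaf = n_trees, m_try, min_leaf
        self.rng = random.Random(seed)
        self.trees: List[Tree] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "RandomForestSketch":
        n = len(y)
        self.trees = []
        for _ in range(self.n_trees):                         # step (c): repeat T times
            idx = [self.rng.randrange(n) for _ in range(n)]   # step (a): bootstrap D'
            self.trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                        self.m_try, self.min_leaf, self.rng))  # step (b)
        return self

    def predict(self, x: Sequence[float]) -> float:
        return mean(predict_tree(tree, x) for tree in self.trees)  # step (d): average
```

With four predictors, as in this study, `m_try` would be somewhere between 1 and 4; smaller values decorrelate the trees more at the cost of weaker individual trees.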

Implementation
A random forest regressor is used to model the nonlinear relationship between the predictor and response variables. It is implemented in Python with Jupyter Notebook as the frontend. The random forest model is trained on the normalized training dataset, and its performance is improved by tuning various hyperparameters; Table 4 presents the hyperparameters used for tuning. After fine tuning, a regression forest is built, part of which is shown in Figure 6.
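The 85:15 split and the hyperparameter tuning described above follow a common pattern, sketched here with the standard library. The candidate values in the example grid and the scoring function are hypothetical placeholders, since the values of Table 4 are not reproduced here; in practice the score would be, for instance, the R² of a model fitted on the training rows and evaluated on held-out rows.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def split_indices(n: int, test_fraction: float = 0.15,
                  seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into training and testing sets
    (test_fraction=0.15 gives the 85:15 ratio)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def grid_search(grid: Dict[str, Sequence], score: Callable[..., float]):
    """Evaluate every combination of hyperparameter values and return the
    combination with the highest score, together with that score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


if __name__ == "__main__":
    train_idx, test_idx = split_indices(324)
    print(len(train_idx), len(test_idx))   # 275 49

    def toy_score(n_trees: int, min_leaf: int) -> float:
        # Hypothetical stand-in for a validation R².
        return 1.0 - 1.0 / n_trees - 0.01 * min_leaf

    grid = {"n_trees": [50, 100, 200], "min_leaf": [1, 2, 5]}   # hypothetical grid
    print(grid_search(grid, toy_score))
```

Exhaustive search grows multiplicatively with each added hyperparameter, so the grid is usually kept to a few values per parameter.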
The deviation between the predicted and actual values is termed the error or loss. The model is evaluated using the Root Mean Square Error (RMSE) given in equation (1), where y_i is the actual value, ŷ_i the predicted value and N the number of samples:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$
Figure 7 shows the actual and predicted values of all response variables (carbon monoxide, hydrocarbon, smoke, nitrogen oxides and brake thermal efficiency). The values range from 0.07 (the minimum value of carbon monoxide) to 1260 (the maximum value of nitrogen oxides).
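RMSE is expressed in the units of the response, so with responses ranging from 0.07 (CO) to 1260 (NOx) it is only comparable per variable, or on normalized values. A minimal sketch of the calculation; the variable names and the numbers in the example are made up purely for illustration.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_per_response(actual: Dict[str, Sequence[float]],
                      predicted: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """RMSE computed separately for each response variable, since their
    scales (and units) differ."""
    return {name: rmse(actual[name], predicted[name]) for name in actual}


if __name__ == "__main__":
    # Made-up numbers, only to illustrate the calculation.
    actual = {"CO": [0.07, 0.12, 0.20], "NOx": [850.0, 1010.0, 1260.0]}
    predicted = {"CO": [0.08, 0.11, 0.21], "NOx": [840.0, 1030.0, 1250.0]}
    print(rmse_per_response(actual, predicted))
```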

Optimization
The use of optimization in various applications has been discussed by several researchers [16,17]. Once modeling is complete, the final task is to find the values of the input features that produce minimum exhaust emissions with increased brake thermal efficiency. This is done by solving a constrained optimization problem with a Lagrangian optimizer. The boundary values of the input variables are shown in Table 5. Two optimizations are performed: the first minimizes the exhaust gas output and the second maximizes the brake thermal efficiency. The boundary conditions in Table 5 are used as constraints for both optimizations, and the objective function is evaluated using the trained Random Forest Regressor. The results of the two optimizations are averaged to arrive at the final values.
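The two-stage procedure (minimize emissions, maximize brake thermal efficiency, then average the two optimal input vectors) can be sketched as below. Note the substitution: this uses a bounded grid search rather than the Lagrangian optimizer used in the study, because a random forest's prediction is piecewise constant and offers no useful gradient. The objective functions in the example are toy stand-ins for the trained model, and the bounds are assumptions taken from the experimental ranges in the methodology section; the torque range is hypothetical, since Table 5 is not reproduced here.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]
Point = Dict[str, float]


def grid_points(bounds: Bounds, steps: int) -> List[Point]:
    """All points of a regular grid with `steps` values per input, inside the bounds."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    names = list(bounds)
    axes = [[lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
            for lo, hi in bounds.values()]
    return [dict(zip(names, combo)) for combo in itertools.product(*axes)]


def bounded_optimum(f: Callable[[Point], float], bounds: Bounds,
                    steps: int = 11, maximize: bool = False) -> Point:
    """Grid point that minimises (or maximises) f within the box constraints."""
    sign = -1.0 if maximize else 1.0
    return min(grid_points(bounds, steps), key=lambda p: sign * f(p))


def averaged_optimum(emission: Callable[[Point], float], bte: Callable[[Point], float],
                     bounds: Bounds, steps: int = 11) -> Point:
    """Minimise emissions and maximise brake thermal efficiency separately,
    then average the two optimal input vectors."""
    low_emission = bounded_optimum(emission, bounds, steps)
    high_bte = bounded_optimum(bte, bounds, steps, maximize=True)
    return {k: (low_emission[k] + high_bte[k]) / 2.0 for k in bounds}


if __name__ == "__main__":
    bounds = {"biogas_lpm": (2.0, 16.0), "methane_pct": (50.0, 100.0),
              "intake_temp_c": (35.0, 100.0),
              "torque_nm": (4.0, 20.0)}   # torque range is hypothetical

    def emission(p: Point) -> float:      # toy surrogate, not the trained model
        return (p["biogas_lpm"] - 6.0) ** 2 + (p["intake_temp_c"] - 60.0) ** 2 / 50.0

    def bte(p: Point) -> float:           # toy surrogate, not the trained model
        return -(p["torque_nm"] - 14.0) ** 2 - (p["methane_pct"] - 80.0) ** 2 / 100.0

    print(averaged_optimum(emission, bte, bounds))
```

Averaging the two optima is simple but does not guarantee that the averaged point is good for either objective; evaluating the model at the averaged point is a cheap check.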

Results and discussion
The R² values are recorded for different proportions of training and testing data. Figure 8, which shows R² for these proportions, reveals that the model stabilizes at training:testing ratios of 85:15 and above. An R² close to 1 (0.997 at 85:15) indicates that the model explains nearly all of the variability in the output data.
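For reference, the coefficient of determination is defined below in its usual form, with y_i the actual values, ŷ_i the predictions and ȳ the mean of the actual values over the N test samples. Written this way it also ties R² to the RMSE of equation (1): if the reported R² of 0.997 is computed in this standard way, the mean squared residual is about 0.3% of the variance of the test data.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}
```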
The R² score of the proposed model is compared with those of models trained using multilinear regression, support vector regression and k-nearest neighbours (KNN). Figure 9 compares the models based on R² score; the proposed model outperforms the other models and its results are satisfactory.
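Of the baselines, k-nearest neighbours regression is the simplest to state: a prediction is the mean response of the k training points closest to the query. A minimal sketch, assuming Euclidean distance on min-max-normalized inputs; the value of k and the distance used for the comparison in Figure 9 are not stated, so both are assumptions here.

```python
import math
from statistics import mean
from typing import List, Sequence


def knn_predict(X_train: List[Sequence[float]], y_train: List[float],
                x: Sequence[float], k: int = 5) -> float:
    """Mean response of the k training rows nearest to x (Euclidean distance)."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training rows")
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return mean(y_train[i] for i in nearest)


if __name__ == "__main__":
    # Hypothetical normalized inputs (flow, methane fraction) and a response.
    X = [[0.0, 0.2], [0.3, 0.5], [0.6, 0.4], [1.0, 0.9]]
    y = [0.1, 0.35, 0.55, 0.9]
    print(knn_predict(X, y, [0.5, 0.5], k=2))
```

Because KNN relies directly on distances, the min-max normalization described in the preprocessing step matters even more here than for the random forest, whose splits are insensitive to feature scaling.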

Conclusions
Environmental degradation can be reduced by optimizing the operating parameters of dual fuel engines, using machine learning for prediction and Lagrangian optimization. The optimized operating parameters for minimum emissions and maximum thermal efficiency are a torque of 12.48 Nm, a biogas flow rate of 8.29 l/min, a methane fraction of 72.8% and an intake temperature of 68.3 °C.