# Applications of induction to science

Examples of induction include fitting distributions to data and finding relationships between observations. For example, counts of microbial cultures in a food matrix can be summarised in a histogram, as shown in figure 1. These observations may follow a normal distribution or some other type of distribution. Specific tests should be run to check how well the data fit a distribution, and the interpretation of the result depends on the type of distribution. In this way a theory is built, for example that the counts of microbe x in matrix y follow a normal distribution.
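As a minimal sketch of such a test, assuming that simulated values stand in for the log10 counts and that normality is the hypothesis of interest, the Shapiro-Wilk test and a normal Q-Q plot from base R could be used. The variable names are illustrative only.

```r
# Simulated stand-in for log10(Counts); real observations would replace this
set.seed(65)
x <- rnorm(500)

# Shapiro-Wilk test: the null hypothesis is that x comes from a normal distribution
sw <- shapiro.test(x)
print(sw)

# Visual check: points close to the reference line are consistent with normality
qqnorm(x)
qqline(x)
```

A small p-value is evidence against normality. A large p-value does not prove normality; it only means the test found no evidence against it.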

The function "fitdistr" from the library MASS can help in fitting the distribution. For the Normal, log-Normal, exponential and Poisson distributions the closed-form maximum-likelihood estimates (MLEs) and exact standard errors are used, so the "start" argument should not be supplied. Here is the R code for creating the plot and fitting the distribution. Note that the data were simulated using R's random number generator and the "rnorm" function.

    ### R code for creating the histogram
    ### ---------------------------------
    set.seed(65)
    x <- rnorm(500)
    ### We keep the breaks of the histogram (xb) for later use
    xb <- hist(x, xlab = "log10 (Counts)", main = "Fitting a Distribution",
               freq = FALSE)$breaks
    ### ---------------------------------------
    ### End of code for histogram
    ### ---------------------------------------

    ### ---------------------------------------
    ### Code for fitting a distribution
    ### ---------------------------------------
    library(MASS)
    fitted_x <- fitdistr(x, "Normal")
    print(fitted_x)
    ### results
    #       mean            sd
    #  -0.001520167    1.062770441
    #  ( 0.047528539)  ( 0.033607752)
    print(fitted_x$estimate)
    ### results
    #         mean           sd
    # -0.001520167  1.062770441
    print(fitted_x$sd)
    ### results
    #       mean         sd
    # 0.04752854 0.03360775
    print(fitted_x$loglik)
    ### results
    # [1] -739.9088
    ### ---------------------------------------
    ### End of code for fitting a distribution
    ### ---------------------------------------

    ### ---------------------------------------
    ### Code for plotting the fitted distribution
    ### ---------------------------------------
    ### We now use the fitted results to plot the density curve over the histogram
    ### We use the breaks saved before to build the grid for the dnorm function
    xbc <- seq(min(xb), max(xb), length = 50)
    lines(xbc, dnorm(xbc, fitted_x$estimate[1], fitted_x$estimate[2]), lty = 3)
    # add a box around the plot
    box()
    ### ---------------------------------------
    ### End of code for plotting the fitted distribution
    ### ---------------------------------------
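To make the closed-form estimates more concrete, the following sketch recomputes them by hand for the same simulated data and compares them with "fitdistr". It assumes the standard formulas for the normal distribution: the MLE of the standard deviation divides by n rather than n - 1, and the standard errors are sigma/sqrt(n) for the mean and sigma/sqrt(2n) for the standard deviation. The names mu_hat, sigma_hat, se_mu and se_sigma are illustrative.

```r
library(MASS)

# Same simulated data as in the fitting example
set.seed(65)
x <- rnorm(500)
n <- length(x)

# Closed-form maximum-likelihood estimates for the normal distribution
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))  # divides by n, not n - 1

# Standard errors of the two estimates (assumed standard formulas)
se_mu    <- sigma_hat / sqrt(n)
se_sigma <- sigma_hat / sqrt(2 * n)

# Compare the hand computation with fitdistr
fitted_x <- fitdistr(x, "normal")
print(rbind(by_hand  = c(mean = mu_hat, sd = sigma_hat),
            fitdistr = fitted_x$estimate))
print(rbind(by_hand  = c(mean = se_mu, sd = se_sigma),
            fitdistr = fitted_x$sd))
```

The two rows of each printed table should agree, which is why no starting values are needed for this distribution.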

Another example is the investigation of the relationship between two variables using observations. Figure 2 shows a scatter plot of such observations, presented as the log of the counts of a pathogen in a food matrix observed over time (the values come from the hills dataset in the R library MASS, used here as stand-in data). A regression line was fitted to the scatter plot. The regression describes the relationship and can be used to predict growth in predictive microbiology modelling.

The graph gives a first impression of how the counts grow over time. A linear regression of the variables can be used to investigate this relationship further and to build a predictive model for the growth of the microbe over time. The code for the graph and for the linear regression (results and diagnostics) is presented in the following box.

    ### Load the library
    library(MASS)
    ### Load the dataset
    attach(hills)
    ### Plotting code
    ### --------------------
    plot(time, dist, main = "Finding a Relationship", ylab = "log10 (Counts)")
    abline(lm(dist ~ time))
    ### --------------------
    ### End of plotting
    ### --------------------
    ### Linear regression
    ### --------------------
    summary(lm(dist ~ time))
    ### Results
    ### --------------------
    # Call:
    # lm(formula = dist ~ time)
    #
    # Residuals:
    #      Min       1Q   Median       3Q      Max
    # -6.63742 -0.55581  0.02566  0.97145  6.78845
    #
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    # (Intercept)  1.65347    0.57408    2.88  0.00693 **
    # time         0.10151    0.00755   13.45 6.08e-15 ***
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    #
    # Residual standard error: 2.203 on 33 degrees of freedom
    # Multiple R-squared:  0.8456, Adjusted R-squared:  0.841
    # F-statistic: 180.8 on 1 and 33 DF,  p-value: 6.084e-15
    ### --------------------
    ### Plot the regression diagnostics
    ### --------------------
    par(mfrow = c(2, 2))
    plot(lm(dist ~ time))
    ### --------------------
    ### De-load the dataset
    detach(hills)

The plot of the regression diagnostics created with R is presented in figure 3.

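The results of the regression show that the linear model fits the observations well (multiple R-squared of 0.8456) and that both estimates are significant. The overall model and theory produced from this inductive analysis is an equation that predicts the growth of microbes. Using the estimated coefficients from the regression output, it reads as follows:

log10(Counts) = 1.653 + 0.1015 × time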
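The fitted coefficients can also be read from the model object and used for prediction. The following is a minimal sketch under stated assumptions: it refits the same model with the data frame passed explicitly instead of "attach", and the new time points in new_times are hypothetical values chosen only for illustration.

```r
library(MASS)

# Fit the same model as in the box above, passing the data frame explicitly
fit <- lm(dist ~ time, data = hills)

# Estimated intercept and slope of the fitted line
print(coef(fit))

# Predictions at hypothetical new time points, with 95% prediction intervals
new_times <- data.frame(time = c(30, 60, 120))
pred <- predict(fit, newdata = new_times, interval = "prediction", level = 0.95)
print(cbind(new_times, pred))
```

The "fit" column is the point prediction from the equation, while "lwr" and "upr" bound where a single new observation would be expected to fall. Predictions outside the observed range of time should be treated with caution.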

In these two examples two new theories were induced: that log10(Counts) follows a normal distribution, and that the growth of the microbes (the log10(Counts)) is linearly related to time, keeping other factors constant. Using these and other theories, a new theory can be deduced and validated, as we will see in the section Applications of deduction to science.

--Ilias.soumpasis 11:08, 21 November 2008 (UTC)