Applications of induction to science

From DeductiveThinking Wiki

Examples of induction include fitting distributions to data and finding relationships between observations. For example, counts of microbe cultures in a food matrix can give a histogram such as the one shown in Figure 1. These observations could follow a normal distribution or some other type of distribution. Specific goodness-of-fit tests should be run to check how well the data fit a candidate distribution, and the interpretation of the results should depend on the type of distribution. In this way a theory is built, for example that the log10 counts of microbe x in matrix y follow a normal distribution.

Figure 1. Example of fitting a distribution to data
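The paragraph above says that specific tests should be run before a fitted distribution is accepted. A minimal sketch in R, assuming the Shapiro-Wilk test (shapiro.test from base R's stats package) as the normality test and the same simulated data as the R code that follows; the 0.05 significance level is an illustrative choice, not part of the original analysis.

```r
# Simulate the same data used in this article (seed 65, 500 standard normal draws)
set.seed(65)
x <- rnorm(500)

# Shapiro-Wilk test: the null hypothesis is that x comes from a normal distribution
sw <- shapiro.test(x)
print(sw)

# Illustrative decision rule with an assumed significance level of 0.05
alpha <- 0.05
if (sw$p.value > alpha) {
  message("No evidence against normality at alpha = ", alpha)
} else {
  message("Normality rejected at alpha = ", alpha)
}

# Graphical check: points close to the reference line support normality
qqnorm(x, main = "Normal Q-Q plot")
qqline(x)
```

A non-significant result does not prove normality; it only means the test found no evidence against it, which is why the Q-Q plot is shown alongside.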

The function "fitdistr" from the MASS package can help in fitting the distribution by maximum likelihood. For the Normal, log-Normal, exponential and Poisson distributions the closed-form MLEs (and exact standard errors) are used, and start should not be supplied. The R code below creates the plot and fits the distribution. Note that the data were simulated with R's random number generator using the "rnorm" function.

### R code for creating the histogram 
### ---------------------------------

set.seed(65)
x      <- rnorm(500) 

### We keep the breaks of a histogram (xb) for later use
xb <- hist(x, xlab = "log10 (Counts)", main = "Fitting a Distribution", freq = FALSE)$breaks

### ---------------------------------------
### End of code for histogram
### ---------------------------------------
### ---------------------------------------
### Code for fitting a distribution
### ---------------------------------------
library(MASS)
fitted_x<-fitdistr(x, "Normal")
print(fitted_x)
### results
#       mean            sd     
#  -0.001520167    1.062770441 
# ( 0.047528539) ( 0.033607752)

print(fitted_x$estimate)
### results
#        mean           sd 
#-0.001520167  1.062770441 

print(fitted_x$sd)
### results
#      mean         sd 
#0.04752854 0.03360775 

print(fitted_x$loglik)
### results
#[1] -739.9088

### ---------------------------------------
### End of code for fitting a distribution
### ---------------------------------------
### ---------------------------------------
### Code for plotting the fitted distribution
### ---------------------------------------

### We now use the fitted estimates to draw the density curve over the histogram
### The saved breaks (xb) set the x range over which dnorm is evaluated
xbc <- seq(min(xb), max(xb), length.out = 50)
lines(xbc, dnorm(xbc, mean = fitted_x$estimate["mean"], sd = fitted_x$estimate["sd"]), lty = 3)

# add a box around the plot
box() 

### ---------------------------------------
### End of code for plotting the fitted distribution
### ---------------------------------------
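Because "fitdistr" uses closed-form MLEs for the normal distribution, its printed results can be reproduced by hand. The sketch below, assuming the same simulated data, recomputes the MLE of the mean (the sample mean), the MLE of the standard deviation (which divides by n rather than n - 1), their standard errors sd/sqrt(n) and sd/sqrt(2n), and the log-likelihood, then compares them with "fitdistr".

```r
library(MASS)

# Same simulated data as above
set.seed(65)
x <- rnorm(500)
n <- length(x)

# Closed-form maximum likelihood estimates for the normal distribution
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))   # divides by n, not n - 1

# Standard errors derived from the Fisher information of the normal model
se_mu    <- sigma_hat / sqrt(n)
se_sigma <- sigma_hat / sqrt(2 * n)

# Log-likelihood evaluated at the estimates
loglik <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))

# Fit with fitdistr and compare
fit <- fitdistr(x, "normal")
manual <- rbind(estimate = c(mean = mu_hat, sd = sigma_hat),
                se       = c(se_mu, se_sigma))
print(manual)
print(fit)
print(isTRUE(all.equal(unname(fit$estimate), c(mu_hat, sigma_hat))))
print(isTRUE(all.equal(unname(fit$sd), c(se_mu, se_sigma))))
print(isTRUE(all.equal(fit$loglik, loglik)))
```

The n divisor is the reason the MLE of sd is slightly smaller than the value returned by sd(x).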

Another example is investigating the relationship between two variables from observations. Figure 2 shows a scatter plot of observations representing the log10 counts of a pathogen cultured in a food matrix and observed over time, with a fitted regression line. These are not real microbial data: the hills data set from the R package MASS (record times in Scottish hill races) is used as stand-in data, with its dist column playing the role of log10 counts and its time column the role of time. A regression of this kind describes the relationship and can be used to predict growth in predictive microbiology modelling.

Figure 2. Exploring a relationship
Figure 3. Plot regression diagnostics

The graph gives a first idea of how the counts grow over time. A linear regression can investigate this relationship further and yield a predictive model for the growth of the microbe over time. The code for the graph and for the linear regression (results and diagnostics) is presented in the following box.

### Load the library (MASS provides the hills data set)
library(MASS)
### Fit the linear model once and reuse it below
fit <- lm(dist ~ time, data = hills)
### Plotting code
### --------------------
plot(dist ~ time, data = hills,
     main = "Finding a Relationship",
     ylab = "log10 (Counts)")
abline(fit)
### --------------------
### End of plotting
### --------------------
### Linear regression
### --------------------
summary(fit)
### Results
### --------------------
# Call:
# lm(formula = dist ~ time, data = hills)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -6.63742 -0.55581  0.02566  0.97145  6.78845 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  1.65347    0.57408    2.88  0.00693 ** 
# time         0.10151    0.00755   13.45 6.08e-15 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
# 
# Residual standard error: 2.203 on 33 degrees of freedom
# Multiple R-squared: 0.8456,	Adjusted R-squared: 0.841 
# F-statistic: 180.8 on 1 and 33 DF,  p-value: 6.084e-15 
### --------------------
### Plot the regression diagnostics (2 x 2 panel layout)
### --------------------
par(mfrow = c(2, 2))
plot(fit)
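The least-squares estimates reported by the regression summary can also be obtained from the closed-form formulas for simple linear regression: slope = cov(time, dist) / var(time) and intercept = mean(dist) - slope * mean(time). A minimal sketch on the same hills data, comparing the hand computation with coef() and the residual standard error from summary():

```r
library(MASS)   # provides the hills data set

# Columns used as stand-ins: time (predictor) and dist (response)
x <- hills$time
y <- hills$dist

# Closed-form ordinary least squares estimates for y = b0 + b1 * x
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# Residual standard error: sqrt(RSS / (n - 2)) for a model with two coefficients
res <- y - (b0 + b1 * x)
rse <- sqrt(sum(res^2) / (length(y) - 2))

# Compare with lm
fit <- lm(dist ~ time, data = hills)
print(c(intercept = b0, slope = b1, rse = rse))
print(coef(fit))
print(summary(fit)$sigma)
```

The n - 2 divisor matches the 33 degrees of freedom in the summary (35 observations minus two estimated coefficients).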

The regression diagnostic plots created with R are shown in Figure 3.
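The diagnostic plots can be complemented by numeric checks. A minimal sketch, assuming Cook's distance with the common 4/n rule of thumb, and standardized residuals beyond plus or minus 2, as the cut-offs for flagging observations; both cut-offs are conventions chosen for illustration, not strict tests.

```r
library(MASS)

fit <- lm(dist ~ time, data = hills)

# Cook's distance for each observation (names are the race names in hills)
cd <- cooks.distance(fit)

# Flag observations above the 4/n rule-of-thumb cut-off
cutoff <- 4 / nrow(hills)
influential <- cd[cd > cutoff]
print(sort(influential, decreasing = TRUE))

# Standardized residuals larger than 2 in absolute value
rs <- rstandard(fit)
big_resid <- rs[abs(rs) > 2]
print(big_resid)
```

Flagged observations are candidates for closer inspection rather than automatic removal.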

The regression output shows a strong linear relationship (multiple R-squared = 0.8456), and both estimates are statistically significant (p = 0.00693 for the intercept and p = 6.08e-15 for the slope). The overall model, and the theory produced from this inductive analysis, is an equation that predicts the growth of microbes:

$$\log_{10}(\text{Counts}) = 1.65347 + 0.10151 \times \text{time}$$
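As a worked illustration of using the equation above for prediction, the sketch below evaluates it at a few hypothetical time values (50, 100 and 150, chosen arbitrarily) and compares the hand calculation with predict() on the refitted model, also requesting a 95% prediction interval. For example, at time = 100 the equation gives 1.65347 + 0.10151 × 100 = 11.80447.

```r
library(MASS)

fit <- lm(dist ~ time, data = hills)

# Hypothetical time points at which to predict
new_times <- data.frame(time = c(50, 100, 150))

# Hand calculation with the rounded coefficients from the equation
by_hand <- 1.65347 + 0.10151 * new_times$time

# Model-based predictions with 95% prediction intervals
pred <- predict(fit, newdata = new_times, interval = "prediction", level = 0.95)
print(cbind(new_times, by_hand, pred))
```

The hand calculation uses coefficients rounded to five decimals, so it may differ from predict() in the last digits.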

In these two examples two new theories were induced: that log10(Counts) follows a normal distribution, and that the growth of the microbes (log10(Counts)) is linearly related to time, other factors being held constant. Using these and other theories, a new theory can be deduced and validated, as we will see in the section Applications of deduction to science.

--Ilias.soumpasis 11:08, 21 November 2008 (UTC)