GOOGLE ADS

dimanche 24 avril 2022

How do I make predictions using an ordered factor coefficient in R?

I'm currently trying to develop my understanding of ordered factors in R and using them as dependent variables in a linear model. I understand the outputs.L,.Q and.C represent linear, quadratic and cubic but I'm wondering what is the "newx" that can be used the equations below to derive estimates for each level of my ordered factor.

I thought the "newx" was derived from the contr.poly() function but using this leads to a mismatch between my equation and the results derived from the predict() function. Can anyone help me understand what "newx" should be?

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)
d$x = factor(d$x, labels=c("none", "some", "more", "a lot"))
Coefs <- coef(lm(y~ordered(x), d))
newx <- contr.poly(4)
predict(lm(y~ordered(x), d), newdata = data.frame(x = as.factor(c("none", "some", "more", "a lot"))))
Coefs[1]+(Coefs[2]*newx[1,1])+(Coefs[3]*newx[1,2]^2)+(Coefs[4]*newx[1,3]^3)
Coefs[1]+(Coefs[2]*newx[2,1])+(Coefs[3]*newx[2,2]^2)+(Coefs[4]*newx[2,3]^3)
Coefs[1]+(Coefs[2]*newx[3,1])+(Coefs[3]*newx[3,2]^2)+(Coefs[4]*newx[3,3]^3)
Coefs[1]+(Coefs[2]*newx[4,1])+(Coefs[3]*newx[4,2]^2)+(Coefs[4]*newx[4,3]^3)


Solution of the problem

Just give R a data frame with x values drawn from the levels of the factor ("none", "some", etc.), and it will do the rest.

I changed your setup slightly to change the type of x to ordered() within the data frame (this will carry through all of the computations).

d$x = ordered(d$x, labels=c("none", "some", "more", "a lot"))           
m1 <- lm(y~x, d) ## save fitted object
Coefs <- coef(m1)

Now we can predict():

predict(m1, newdata = data.frame(x=c("none","more"))) 
## 1 2
## 2.993959 6.997342

(didn't have to explicitly say that the new x was ordered())

If you want to dig a little bit deeper into the computations, you can look at the model matrix:

model.matrix(~unique(d$x)) 

For each level of the factor, these are the values R multiplies the coefficients by to generate the prediction (e.g. for level = "none", 1*b0 + (-0.67)*b1 + 0.5*b2 - 0.223*b3)

 (Intercept) unique(d$x).L unique(d$x).Q unique(d$x).C     
1 1 -0.6708204 0.5 -0.2236068
2 1 -0.2236068 -0.5 0.6708204
3 1 0.2236068 -0.5 -0.6708204
4 1 0.6708204 0.5 0.2236068

For even more detail, look at ?poly or the source code of poly() (although neither of these is easy!)

Aucun commentaire:

Enregistrer un commentaire

Comment utiliseriez-vous .reduce() sur des arguments au lieu d'un tableau ou d'un objet spécifique&nbsp;?

Je veux définir une fonction.flatten qui aplatit plusieurs éléments en un seul tableau. Je sais que ce qui suit n'est pas possible, mais...