Select Variables for a Formula Response or the RHS of a Formula
Select variables from a data frame whose names begin with a certain character string.
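Select(data = list(), prefix = "y",
       lhs = NULL, rhs = NULL, rhs2 = NULL, rhs3 = NULL,
       as.character = FALSE, as.formula.arg = FALSE, tilde = TRUE,
       exclude = NULL, sort.arg = TRUE)
data: A data frame or a matrix.
prefix: A vector of character strings, or a logical.
If a character, then the variables chosen from data are those whose names begin with the value of prefix.
If a logical, then only TRUE is accepted and all the variables in data are chosen.
lhs: A character string. The response of a formula.
rhs: A character string.
Included as part of the RHS of a formula.
Set rhs = "0" to suppress the intercept.
rhs2, rhs3: Same as rhs but appended to its RHS,
i.e., paste0(rhs, " + ", rhs2, " + ", rhs3).
If used, rhs should be used first, then possibly rhs2, and then possibly rhs3.
as.character: Logical. Return the answer as a character string?
as.formula.arg: Logical. Is the answer a formula?
tilde: Logical.
If as.character and as.formula.arg are both TRUE, then include the tilde in the formula?
exclude: Vector of character strings. Exclude these variables explicitly.
sort.arg: Logical. Sort the variables?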
This is meant as a utility function to avoid manually
(i) making a cbind call to construct
a big matrix response,
and
(ii) constructing a formula involving a lot of terms.
The savings arise when the variables of interest
all begin with a common prefix, e.g., the character "y".
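To make the prefix idea concrete, here is a minimal base-R sketch of prefix-based column selection. The helper select_prefix() is hypothetical and is not the VGAM source; it only mimics the documented behaviour of a character prefix, exclude and sort.arg, returning NULL when no column name matches.

```r
# Hypothetical helper (not part of VGAM): pick the columns of `data`
# whose names start with any of the strings in `prefix`.
select_prefix <- function(data, prefix = "y", exclude = NULL,
                          sort.arg = TRUE) {
  nms <- colnames(data)
  # TRUE for each column name that begins with at least one prefix
  keep <- Reduce(`|`, lapply(prefix, function(p) startsWith(nms, p)))
  chosen <- setdiff(nms[keep], exclude)   # drop explicitly excluded names
  if (length(chosen) == 0) return(NULL)   # nothing matched
  if (sort.arg) chosen <- sort(chosen)
  as.matrix(data[, chosen, drop = FALSE])
}

df <- data.frame(y2 = 1:3, x1 = 4:6, y1 = 7:9)
select_prefix(df, "y")                 # columns y1 and y2, like with(df, cbind(y1, y2))
select_prefix(df, c("x", "y"))         # columns x1, y1, y2
select_prefix(df, "y", exclude = "y1") # column y2 only
select_prefix(df, "z")                 # NULL
```

Because no column name starts with a blank, select_prefix(df, " ") also returns NULL, matching the documented trick of setting prefix = " " to select no variables.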
If as.character = FALSE and
as.formula.arg = FALSE, then a matrix such
as cbind(y1, y2, y3) is returned.
If as.character = TRUE and
as.formula.arg = FALSE, then a character string such
as "cbind(y1, y2, y3)" is returned.
If as.character = FALSE and
as.formula.arg = TRUE, then a formula such
as lhs ~ y1 + y2 + y3 is returned.
If as.character = TRUE and
as.formula.arg = TRUE, then a character string such
as "lhs ~ y1 + y2 + y3" is returned.
See the examples below.
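The string and formula forms above can be built in base R with paste() and as.formula(). The sketch below is a hypothetical illustration of those outputs, not VGAM's internal code; it also shows that a "0" term on the right-hand side removes the intercept, which is what rhs = "0" is used for in the examples.

```r
# Hypothetical illustration (not VGAM's code): build the string and
# formula forms from a vector of selected variable names.
vars <- c("y1", "y2", "y3")

# Analogue of as.character = TRUE, as.formula.arg = FALSE
cbind_string <- paste0("cbind(", paste(vars, collapse = ", "), ")")
print(cbind_string)    # "cbind(y1, y2, y3)"

# Analogue of as.character = TRUE, as.formula.arg = TRUE
formula_string <- paste("lhs ~", paste(vars, collapse = " + "))
print(formula_string)  # "lhs ~ y1 + y2 + y3"

# Analogue of as.character = FALSE, as.formula.arg = TRUE
fml <- as.formula(formula_string)

# A "0" term suppresses the intercept, wherever it appears on the RHS
no_int <- as.formula(paste("x3.tij ~",
                           paste(c("0", "t01", "t02"), collapse = " + ")))
attr(terms(no_int), "intercept")   # 0
```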
By default, if no variables beginning with the value of prefix
are found, then NULL is returned.
Setting prefix = " " is a way of selecting no variables.
This function is somewhat experimental at this stage and
may change in the near future.
Some of its utility may be better achieved using
subset and its select argument,
e.g., subset(pdata, TRUE, select = y01:y10).
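For comparison, here is what the subset() alternative looks like on a small data frame. The data values are made up for illustration. Note that select = y01:y03 picks a contiguous block of columns by position, from y01 through y03, so it relies on the column order rather than on a name prefix.

```r
# Toy data frame standing in for pdata (hypothetical values)
pdata <- data.frame(y01 = c(1, 0, 1), y02 = c(0, 0, 1),
                    y03 = c(1, 0, 0), x2 = c(5, 6, 7))

# subset = TRUE keeps every row; select = y01:y03 keeps columns y01 to y03
ymat <- subset(pdata, TRUE, select = y01:y03)
print(names(ymat))   # "y01" "y02" "y03"
```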
For some models, such as posbernoulli.t, the
order of the variables in the xij argument is
crucial; therefore, care must be taken with the
argument sort.arg.
In some instances, it may be good to rename variables
y1 to y01,
y2 to y02, etc.,
when there are variables such as
y14, because sorting the names as character strings places y14 before y2.
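A short base-R illustration of why the renaming helps: sort() orders character names alphabetically, not numerically. The zero-padding with sprintf() is just one possible way to rename, and it assumes every name is "y" followed only by digits.

```r
# Character sorting is not numeric: "y14" sorts before "y2"
vars <- c("y1", "y2", "y14")
print(sort(vars))     # "y1"  "y14" "y2"

# One way to zero-pad the numeric suffix (assumes names like "y<digits>")
padded <- sprintf("y%02d", as.integer(sub("^y", "", vars)))
print(padded)         # "y01" "y02" "y14"
print(sort(padded))   # "y01" "y02" "y14"
```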
Currently subsetcol() and Select() are identical.
One of these functions might be withdrawn in the future.
T. W. Yee.
library("VGAM")  # Provides Select(), pneumo and Huggins89table1
Pneumo <- pneumo
colnames(Pneumo) <- c("y1", "y2", "y3", "x2")  # The "y" variables are the responses
Pneumo$x1 <- 1; Pneumo$x3 <- 3; Pneumo$x <- 0; Pneumo$x4 <- 4 # Add these
Select(data = Pneumo) # Same as with(Pneumo, cbind(y1, y2, y3))
Select(Pneumo, "x")
Select(Pneumo, "x", sort = FALSE, as.char = TRUE)
Select(Pneumo, "x", exclude = "x1")
Select(Pneumo, "x", exclude = "x1", as.char = TRUE)
Select(Pneumo, c("x", "y"))
Select(Pneumo, "z") # Now returns a NULL
Select(Pneumo, " ") # Now returns a NULL
Select(Pneumo, prefix = TRUE, as.formula = TRUE)
Select(Pneumo, "x", exclude = c("x3", "x1"), as.formula = TRUE,
lhs = "cbind(y1, y2, y3)", rhs = "0")
Select(Pneumo, "x", exclude = "x1", as.formula = TRUE, as.char = TRUE,
lhs = "cbind(y1, y2, y3)", rhs = "0")
# Now a 'real' example:
Huggins89table1 <- transform(Huggins89table1, x3.tij = t01)
tab1 <- subset(Huggins89table1,
rowSums(Select(Huggins89table1, "y")) > 0)
# Same as
# subset(Huggins89table1, y01 + y02 + y03 + y04 + y05 + y06 + y07 + y08 + y09 + y10 > 0)
# Long way to do it:
fit.th <-
vglm(cbind(y01, y02, y03, y04, y05, y06, y07, y08, y09, y10) ~ x2 + x3.tij,
xij = list(x3.tij ~ t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
t09 + t10 - 1),
posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
data = tab1, trace = TRUE,
form2 = ~ x2 + x3.tij + t01 + t02 + t03 + t04 + t05 + t06 + t07 + t08 +
t09 + t10)
# Short way to do it:
Fit.th <- vglm(Select(tab1, "y") ~ x2 + x3.tij,
xij = list(Select(tab1, "t", as.formula = TRUE,
sort = FALSE, lhs = "x3.tij", rhs = "0")),
posbernoulli.t(parallel.t = TRUE ~ x2 + x3.tij),
data = tab1, trace = TRUE,
form2 = Select(tab1, prefix = TRUE, as.formula = TRUE))