Is there an R package to calculate 1st order transition matrix from a frequency table?

Solution for Is there an R package to calculate 1st order transition matrix from a frequency table?
is Given Below:

I have a frequency table aggregated from 800 millions of records and am wondering if I can use a package to calculate 1st order transition matrix from the frequency table, which is not symmetric because some state just never happened again. A sample of the frequency table is:

library(data.table)
model.data <- data.table(state1 = c(3, 1, 2, 3), state2 = c(1, 2, 1, 2), Freq = c(1,2,3,4))

model.data looks like this:

state1state2n
311
122
213
324

Using the package pollster, I can compute the proportion table:

library(pollster)
crosstab(model.data, state1, state2, Freq)
state112n
101002
210003
320805

However, the symmetric transition matrix I am looking for is:

state1123n
1010002
2100003
3208005

That is, I still want to include the state 3 even though no one transitioned to it, and the code should be able to automatically find out 3 needs to be appended with a column of 0s.

I am not sure if the markovchain package with the markovchainFit function is going to handle my 800 million rows of data that I need to transform into a list of millions of sequences, due to memory constraints and slow computing speed.

Does anyone know?

It appeared you might have known about the stats::xtabs function since, since the result you are asking us to work with appears to be the result of the base::as.data.frame.table function which converts the “wide” result from a table call into a “long” data.frame representation of the same data. (But maybe not since you posted the pollster code that adds an additional confusing column.) Here we will be reversing that procedure so we can recover a matrix (which R table objects inherit from).

Do notice that I’m using your data object, but not using pkg:pollster code, since your tables didn’t appear to be based on that data.table object.

How to get the zero column, … just put in a single zero data element in the state2=3 “column” position. You only need to add one data point in state2 for the entire column, but it obviously needs to be from some state1 value. It can be from any of the state 1 values:

model.data <- data.table(state1 = c(3, 1, 2, 3,  3), 
                         state2 = c(1, 2, 1, 2,  3), 
                         Freq = c(1,2,3,4, 0))
xtabs(Freq~state1+state2, model.data)
#------------
      state2
state1 1 2 3
     1 0 2 0
     2 3 0 0
     3 1 4 0

Note added: Just to show that this works in the “pollster” tidyverse environment …

> library(pollster)
> crosstab(model.data, state1, state2, Freq)
# A tibble: 3 x 5
  state1   `1`   `2`   `3`     n
   <dbl> <dbl> <dbl> <dbl> <dbl>
1      1     0   100     0     2
2      2   100     0     0     3
3      3    20    80     0     5

And further note that the “n” column would need to be dropped if you wanted to make a transition matrix. (I can’t quite figure out what it represents.)

Regarding how to make a transition matrix (if that’s what is needed, then divide the matrix by the rowSums result, since transition matrices need to have each row sum to unity)

 mat <- xtabs(Freq~state1+state2, model.data)

 trans_mat <- mat/rowSums(mat)
 trans_mat
#-----
       state2
state1   1   2   3
     1 0.0 1.0 0.0
     2 1.0 0.0 0.0
     3 0.2 0.8 0.0

Now you can calculate the state at any discrete interval using matrix multiplication: See ?'%*%' or matrix exponentiation ?expm::expm

Here’s further coding a plots related to matrix operations on transition matrices to generate Markov simulations:
Simple Markov Chain in R (visualization)

There are further statistical operations on Markov sequences delivered in the markovchain package, but I did not see that it had anything for actually construction a transition matrix from data. I could be wrong about that since I only read the first 5 packages of the vignette. (They seemed to assume everyone would know how to do that, although when I wrote the code for the answer I linked above, I needed to go back to my books for a refresher.)

An option with igraph

model.data %>%
  setorder(state1) %>%
  graph_from_data_frame() %>%
  as_adjacency_matrix(attr = "Freq", sparse = FALSE) %>%
  proportions(1)  # 1 sets rows as the margin, similar to `prop.table`

gives

    1   2 3
1 0.0 1.0 0
2 1.0 0.0 0
3 0.2 0.8 0

Or with base R

> proportions(xtabs(Freq ~ ., model.data), 1)
      state2
state1   1   2
     1 0.0 1.0
     2 1.0 0.0
     3 0.2 0.8