Log_sum_exp(matrix) by column or by row
Is it possible to compute log_sum_exp(matrix) and get back a column vector (applying the operation across each row) or a row vector (applying it across each column)? I am trying to speed up the forward algorithm in a longitudinal hidden Markov model. Right now it loops through each person, each period, and all the possible states, which is quite slow. I want to see whether I can increase the speed by doing this log_sum_exp operation for every person together, in a vectorized manner.
This is the code I am running. The forward-algorithm part under model{} loops over every person (I), every period (T), and every state (S). I want to vectorize that portion of the code. The sticking point is the use of log_sum_exp. Thank you.
data {
  int<lower=1> S;        // num states
  int<lower=1> I;        // num customers
  int<lower=1> T;        // num calibration periods
  int<lower=0> Y[I,T];   // observed behavior
  int<lower=1> K;        // binomial number of trials
}
parameters {
  matrix[S*(S-1), I] z;                            // indep. normals for transition prob.
  cholesky_factor_corr[S*(S-1)] L_Omega;           // Cholesky corr. for transition prob.
  row_vector[S*(S-1)] mu_theta;                    // mean of unconstrained transition prob.
  vector<lower=0,upper=pi()/2>[S*(S-1)] tau_unif;  // scaled variance of transition prob.
  simplex[S+1] mu_unif_phi;                        // transformed state-dependent prob.
  simplex[S] ppi;                                  // initial prob.
}
transformed parameters {
  matrix[I,S*(S-1)] theta;        // individual transition parameters
  vector<lower=0>[S*(S-1)] tau;   // variance of transition prob.
  matrix[S,S] log_Q[I];           // log transition prob.
  for (s in 1:S*(S-1))
    tau[s] = 2.5 * tan(tau_unif[s]);
  theta = rep_matrix(mu_theta, I) + (diag_pre_multiply(tau, L_Omega) * z)';  // heterogeneity for transition prob.
  for (i in 1:I) {
    for (k in 1:S) {
      row_vector[S-1] ttheta;
      ttheta = theta[i, ((S-1)*(k-1)+1):((S-1)*k)];
      // append a zero at the end, so the vector has size S,
      // then compute the softmax (multinomial logistic link)
      log_Q[i,k,] = to_row_vector(log_softmax(to_vector(append_col(ttheta, 0))));
    }
  }
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  mu_theta ~ normal(0, 2);
  mu_unif_phi ~ dirichlet(rep_vector(1.0, S+1));
  {
    vector[S] mu_phi = head(cumulative_sum(mu_unif_phi), S);
    // Forward algorithm
    for (i in 1:I) {
      real acc[S];
      real gamma[T,S];
      for (k in 1:S)
        gamma[1,k] = log(ppi[k]) + binomial_lpmf(Y[i,1] | K, mu_phi[k]);
      for (t in 2:T) {
        for (k in 1:S) {
          for (j in 1:S)
            acc[j] = gamma[t-1,j] + log_Q[i,j,k] + binomial_lpmf(Y[i,t] | K, mu_phi[k]);
          gamma[t,k] = log_sum_exp(acc);
        }
      }
      target += log_sum_exp(gamma[T]);
    }
  }
}
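To make the question concrete, here is a small NumPy sketch (not Stan; the helper name and example matrix are mine) of the two reductions I am after: one log_sum_exp per row, or one per column, of the same matrix.

```python
import numpy as np

def logsumexp(mat, axis):
    """Numerically stable log(sum(exp(mat))) along the given axis."""
    m = mat.max(axis=axis, keepdims=True)  # max-shift to avoid overflow
    return (m + np.log(np.exp(mat - m).sum(axis=axis, keepdims=True))).squeeze(axis=axis)

# Log-probabilities; each row sums to 1 in probability space.
M = np.log(np.array([[0.2, 0.8],
                     [0.5, 0.5]]))

by_row = logsumexp(M, axis=1)  # one value per row -> a "column" of results
by_col = logsumexp(M, axis=0)  # one value per column -> a "row" of results
```

Since each row of exp(M) sums to 1, `by_row` is all zeros, while `by_col` gives the log of each column sum.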
I’m not exactly sure if you can do a whole matrix, but I think there’s a way to speed up at least part of your forward algorithm implementation. Here’s an example in Stan of an HMM with gamma state-dependent distributions:
data {
  int<lower=0> T;      // length of the time series
  vector[T] steps;     // step lengths
  int<lower=1> N;      // number of states
}
parameters {
  positive_ordered[N] mu;    // mean of gamma - ordered
  vector<lower=0>[N] sigma;  // SD of gamma
  simplex[N] gamma[N];       // tpm
}
transformed parameters {
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;
  // transform mean and SD to shape and rate
  for (n in 1:N) {
    shape[n] = mu[n]*mu[n] / (sigma[n]*sigma[n]);
    rate[n] = mu[n] / (sigma[n]*sigma[n]);
  }
}
model {
  vector[N] logp;
  vector[N] logptemp;
  matrix[N,N] log_gamma_tr;
  // priors
  mu[1] ~ normal(1, 1);
  mu[2] ~ normal(5, 1);
  sigma ~ student_t(3, 0, 1);
  // transpose the tpm and take the log of each entry
  for (i in 1:N)
    for (j in 1:N)
      log_gamma_tr[j,i] = log(gamma[i,j]);
  // initialisation: uniform initial distribution plus first emission
  for (n in 1:N)
    logp[n] = -log(N) + gamma_lpdf(steps[1] | shape[n], rate[n]);
  // likelihood computation
  for (t in 2:T) {
    for (n in 1:N) {
      logptemp[n] = log_sum_exp(to_vector(log_gamma_tr[n]) + logp)
                    + gamma_lpdf(steps[t] | shape[n], rate[n]);
    }
    logp = logptemp;
  }
  target += log_sum_exp(logp);
}
In the likelihood computation, you can use log_sum_exp to perform the operation row by row. Hope that speeds it up a bit.
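As a sanity check of that recursion, here is a small NumPy sketch (names and example values are mine) showing what the per-state log_sum_exp step computes: in probability space it is just the matrix-vector product of the transposed transition matrix with the previous forward vector, times the emission probabilities.

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v))) for a 1-D array."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

rng = np.random.default_rng(0)
N = 3
log_gamma = np.log(rng.dirichlet(np.ones(N), size=N))  # log tpm; rows sum to 1
log_gamma_tr = log_gamma.T
logp = np.full(N, -np.log(N))                          # uniform initial distribution
log_emission = np.log(rng.dirichlet(np.ones(N)))       # stand-in for the gamma_lpdf terms

# Loop form, as in the Stan model above.
loop = np.array([logsumexp(log_gamma_tr[n] + logp) + log_emission[n]
                 for n in range(N)])

# Row-wise form: broadcasting adds logp to every row of log_gamma_tr,
# then one logsumexp per row.
vec = np.array([logsumexp(row) for row in (log_gamma_tr + logp)]) + log_emission
```

Exponentiating `loop` recovers `(Q' * p) .* e`, which is why the whole forward step can be phrased as one row-wise log_sum_exp over a matrix.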
paul June 25, 2020, 12:07pm 3
You can also use the ' operator to transpose the matrix. However, log_sum_exp works with both vectors (columns) and row_vectors, so you just give it whichever you need to sum.
I am trying to apply log_sum_exp to a matrix but my goal is to perform the actual computation by row or by column of that matrix, and then get the corresponding vector or row_vector as the result.
paul June 25, 2020, 5:25pm 5
log_sum_exp returns a real, so applied to a whole matrix it sums every element. You need to break it down with a for-loop, I think.
If it were me, I would write it as a function. Something like…
vector matrix_logsumexp(matrix Mat) {
  int NR = rows(Mat);
  vector[NR] res;
  for (i in 1:NR) {
    res[i] = log_sum_exp(Mat[i,]);
  }
  return res;
} // matrix_logsumexp
Then you can expose the function and test it in R to make sure it is doing what you think it should (useful for us non-experts with complex functions). Is this what you were thinking of?
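For that kind of check, a direct translation into the host language is handy. Here is a NumPy version of the Stan function above (the function name mirrors the Stan one; the test matrix is mine):

```python
import numpy as np

def matrix_logsumexp(mat):
    """Row-wise log_sum_exp: returns one value per row,
    mirroring the Stan matrix_logsumexp function."""
    m = mat.max(axis=1, keepdims=True)  # max-shift for numerical stability
    return (m + np.log(np.exp(mat - m).sum(axis=1, keepdims=True))).ravel()

M = np.array([[0.0, 0.0],               # log(1), log(1)
              [np.log(3.0), 0.0]])      # log(3), log(1)
res = matrix_logsumexp(M)               # -> [log(2), log(4)]
```

Comparing the Stan output against something like this (or R's equivalent) catches indexing mistakes quickly.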
Hello,
did this ever get implemented? I’ve been trying to vectorise my foraging model and I have hit the same stumbling block.
Unfortunately, no. I just loop over each row or column of the matrix.
There is an R function that can take a matrix as an input: TargetScore::logsumexp(matrix, 2). TargetScore is part of the Bioconductor libraries.
ah. that’s a shame. Thanks for replying!
I spent yesterday trying to vectorise my slow model, and I think I managed it all except for this final stage. And sadly, the resulting model was slower rather than faster!
I guess I can live with slow.
We don’t have a function that applies functions row-wise to matrices. But even if we did, it wouldn’t be faster than the function @paul wrote above, because there are no shared computations among the rows.