Log_sum_exp(matrix) by column or by row
Is it possible to compute log_sum_exp(matrix) and get back a column vector (applying the operation across each row) or a row vector (applying it across each column)? I am trying to speed up the forward algorithm in a longitudinal hidden Markov model. Right now it loops through each person, each period, and all the possible states, which is quite slow. I want to see whether I can increase the speed by doing this log_sum_exp operation for every person together, in a vectorized manner.
This is the code I am running. The forward-algorithm part under model{} loops over every person (I), every period (T), and every state (S). I want to vectorize that portion of the code. The sticking point is the use of log_sum_exp. Thank you.
data {
  int<lower=1> S;        // num states
  int<lower=1> I;        // num customers
  int<lower=1> T;        // num calibration periods
  int<lower=0> Y[I,T];   // observed behavior
  int<lower=1> K;        // binomial number of trials
}
parameters {
  matrix[S*(S-1), I] z;                            // indep. normals for transition prob.
  cholesky_factor_corr[S*(S-1)] L_Omega;           // Cholesky corr. for transition prob.
  row_vector[S*(S-1)] mu_theta;                    // mean of unconstrained transition prob.
  vector<lower=0,upper=pi()/2>[S*(S-1)] tau_unif;  // scaled variance of transition prob.
  simplex[S+1] mu_unif_phi;                        // transformed state-dependent prob.
  simplex[S] ppi;                                  // initial prob.
}
transformed parameters {
  matrix[I,S*(S-1)] theta;        // individual transition parameters
  vector<lower=0>[S*(S-1)] tau;   // variance of transition prob.
  matrix[S,S] log_Q[I];           // log transition prob.
  for (s in 1:S*(S-1))
    tau[s] = 2.5 * tan(tau_unif[s]);
  theta = rep_matrix(mu_theta, I) + (diag_pre_multiply(tau, L_Omega) * z)';  // heterogeneity for transition prob.
  for (i in 1:I) {
    for (k in 1:S) {
      row_vector[S-1] ttheta;
      ttheta = theta[i, ((S-1)*(k-1)+1):((S-1)*k)];
      // append a zero at the end, so the vector has size S,
      // then compute the softmax (multinomial logistic link)
      log_Q[i,k,] = to_row_vector(log_softmax(to_vector(append_col(ttheta, 0))));
    }
  }
}
model {
  to_vector(z) ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  mu_theta ~ normal(0, 2);
  mu_unif_phi ~ dirichlet(rep_vector(1.0, S+1));
  {
    vector[S] mu_phi = head(cumulative_sum(mu_unif_phi), S);
    // Forward algorithm
    for (i in 1:I) {
      real acc[S];
      real gamma[T,S];
      for (k in 1:S)
        gamma[1,k] = log(ppi[k]) + binomial_lpmf(Y[i,1] | K, mu_phi[k]);
      for (t in 2:T) {
        for (k in 1:S) {
          for (j in 1:S)
            acc[j] = gamma[t-1,j] + log_Q[i,j,k] + binomial_lpmf(Y[i,t] | K, mu_phi[k]);
          gamma[t,k] = log_sum_exp(acc);
        }
      }
      target += log_sum_exp(gamma[T]);
    }
  }
}
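To make the question concrete, here is a small NumPy sketch (not Stan; the helper name and example matrix are mine) of the two reductions I am after: one log_sum_exp per row, or one per column, of the same matrix.

```python
import numpy as np

def logsumexp(mat, axis):
    """Numerically stable log(sum(exp(mat))) along the given axis."""
    m = mat.max(axis=axis, keepdims=True)  # max-shift to avoid overflow
    return (m + np.log(np.exp(mat - m).sum(axis=axis, keepdims=True))).squeeze(axis=axis)

# Log-probabilities; each row sums to 1 in probability space.
M = np.log(np.array([[0.2, 0.8],
                     [0.5, 0.5]]))

by_row = logsumexp(M, axis=1)  # one value per row -> a "column" of results
by_col = logsumexp(M, axis=0)  # one value per column -> a "row" of results
```

Since each row of exp(M) sums to 1, `by_row` is all zeros, while `by_col` gives the log of each column sum.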
I’m not exactly sure if you can do a whole matrix, but I think there’s a way to speed up at least part of your forward algorithm implementation. Here’s an example in Stan of an HMM with gamma state-dependent distributions:
data {
  int<lower=0> T;      // length of the time series
  vector[T] steps;     // step lengths
  int<lower=1> N;      // number of states
}
parameters {
  positive_ordered[N] mu;    // mean of gamma - ordered
  vector<lower=0>[N] sigma;  // SD of gamma
  simplex[N] gamma[N];       // tpm
}
transformed parameters {
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;
  // transform mean and SD to shape and rate
  for (n in 1:N) {
    shape[n] = mu[n]*mu[n] / (sigma[n]*sigma[n]);
    rate[n] = mu[n] / (sigma[n]*sigma[n]);
  }
}
model {
  vector[N] logp;
  vector[N] logptemp;
  matrix[N,N] log_gamma_tr;
  // priors
  mu[1] ~ normal(1, 1);
  mu[2] ~ normal(5, 1);
  sigma ~ student_t(3, 0, 1);
  // transpose the tpm and take the log of each entry
  for (i in 1:N)
    for (j in 1:N)
      log_gamma_tr[j,i] = log(gamma[i,j]);
  // initialisation: uniform initial distribution plus first emission
  for (n in 1:N)
    logp[n] = -log(N) + gamma_lpdf(steps[1] | shape[n], rate[n]);
  // likelihood computation
  for (t in 2:T) {
    for (n in 1:N) {
      logptemp[n] = log_sum_exp(to_vector(log_gamma_tr[n]) + logp)
                    + gamma_lpdf(steps[t] | shape[n], rate[n]);
    }
    logp = logptemp;
  }
  target += log_sum_exp(logp);
}
In the likelihood computation, you can use log_sum_exp to perform the operation row by row. Hope that speeds it up a bit.
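As a sanity check of that recursion, here is a small NumPy sketch (names and example values are mine) showing what the per-state log_sum_exp step computes: in probability space it is just the matrix-vector product of the transposed transition matrix with the previous forward vector, times the emission probabilities.

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v))) for a 1-D array."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

rng = np.random.default_rng(0)
N = 3
log_gamma = np.log(rng.dirichlet(np.ones(N), size=N))  # log tpm; rows sum to 1
log_gamma_tr = log_gamma.T
logp = np.full(N, -np.log(N))                          # uniform initial distribution
log_emission = np.log(rng.dirichlet(np.ones(N)))       # stand-in for the gamma_lpdf terms

# Loop form, as in the Stan model above.
loop = np.array([logsumexp(log_gamma_tr[n] + logp) + log_emission[n]
                 for n in range(N)])

# Row-wise form: broadcasting adds logp to every row of log_gamma_tr,
# then one logsumexp per row.
vec = np.array([logsumexp(row) for row in (log_gamma_tr + logp)]) + log_emission
```

Exponentiating `loop` recovers `(Q' * p) .* e`, which is why the whole forward step can be phrased as one row-wise log_sum_exp over a matrix.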
paul June 25, 2020, 12:07pm 3
You can also use the ' operator to transpose the matrix. However, log_sum_exp works with both vectors (columns) and row_vectors, so you just give it whichever you need to sum.
I am trying to apply log_sum_exp to a matrix but my goal is to perform the actual computation by row or by column of that matrix, and then get the corresponding vector or row_vector as the result.
paul June 25, 2020, 5:25pm 5
log_sum_exp returns a real, so applied to a whole matrix it sums every element. You need to break it down with a for-loop, I think.
If it were me, I would write it as a function. Something like…
vector matrix_logsumexp(matrix Mat) {
  int NR = rows(Mat);
  vector[NR] res;
  for (i in 1:NR) {
    res[i] = log_sum_exp(Mat[i,]);
  }
  return res;
} // matrix_logsumexp
Then you can expose the function and test it in R to make sure it is doing what you think it should (useful for us non-experts with complex functions). Is this what you were thinking of?
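For that kind of check, a direct translation into the host language is handy. Here is a NumPy version of the Stan function above (the function name mirrors the Stan one; the test matrix is mine):

```python
import numpy as np

def matrix_logsumexp(mat):
    """Row-wise log_sum_exp: returns one value per row,
    mirroring the Stan matrix_logsumexp function."""
    m = mat.max(axis=1, keepdims=True)  # max-shift for numerical stability
    return (m + np.log(np.exp(mat - m).sum(axis=1, keepdims=True))).ravel()

M = np.array([[0.0, 0.0],               # log(1), log(1)
              [np.log(3.0), 0.0]])      # log(3), log(1)
res = matrix_logsumexp(M)               # -> [log(2), log(4)]
```

Comparing the Stan output against something like this (or R's equivalent) catches indexing mistakes quickly.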
Hello,
did this ever get implemented? I’ve been trying to vectorise my foraging model and I have hit the same stumbling block.
Unfortunately, no. I just loop over each row or column of the matrix.
There is an R function that can take a matrix as an input: TargetScore::logsumexp(matrix, 2). TargetScore is part of the Bioconductor libraries.
ah. that’s a shame. Thanks for replying!
I spent yesterday trying to vectorise my slow model, and I think I managed it all except for this final stage. And sadly, the resulting model was slower rather than faster!
I guess I can live with slow.
We don’t have a function that applies functions row-wise to matrices. But even if we did, it wouldn’t be faster than the function @paul wrote above, because there are no shared computations among the rows.