add function dotproduct and unit tests by tnaake · Pull Request #17 · rformassspectrometry/MsCoreUtils (original) (raw)
Dear @sgibb
thanks for pointing this out.
- Concerning 1. Given the input, this is expected (but not what we intend). As we have the exponent
n=2
the differences in m/z will be intensified, especially200^n
/201^n
and100^n
and101^n
will change the weightsws1
andws2
tremendously (with thedotproduct
function we will calculate the similarity betweenws1
andws2
). If the function gets "aligned" m/z values, the similarity is calculated as follows and as expected (this means, the function should only get the same/aligned m/z values whenn != 0
):
x1 <- data.frame(mz = c(101, NA, 201), intensity = c(1, 0, 1))
y1 <- data.frame(mz = c(101, 102, 201), intensity = c(1, 1, 1))
dotproduct(x1, y1, m=0.3, n=1) ## 0.8294594
dotproduct(x1, y1, m=0.3, n=2) ## 0.9413171
x2 <- data.frame(mz = c(1, NA, 201), intensity = c(1, 0, 1))
y2 <- data.frame(mz = c(1, 102, 201), intensity = c(1, 1, 1))
dotproduct(x2, y2, m=0.3, n=1) ## 0.795221
dotproduct(x2, y2, m=0.3, n=2) ## 0.9378086
For two spectra with "lower" m/z value, we get a lower similarity score. If we set a higher n
we will get a higher similarity score.
- Concerning 2. There was a small error in my text before:
n
has to be0
(all m/z values will be 1).
x1 <- data.frame(mz = c(101, NA, 201), intensity = c(1, 0, 1))
y1 <- data.frame(mz = c(101, 102, 201), intensity = c(1, 1, 1))
x2 <- data.frame(mz = c(1, NA, 201), intensity = c(1, 0, 1))
y2 <- data.frame(mz = c(1, 102, 201), intensity = c(1, 1, 1))
dotproduct(x1, y1, m=0.3, n=0) ## 0.6666667
dotproduct(x2, y2, m=0.3, n=0) ## 0.6666667
This means 2 of 3 shared peaks. I used the formula implemented in this dotproduct
function (containing weights for m/z and intensities) before and it was also used in some publications. What we could implement is, that n=0
and m=1
(default), i.e. m/z values will not be used, but only intensity values as they are.
x1 <- data.frame(mz = c(101, NA, 201), intensity = c(1, 0, 1))
y1 <- data.frame(mz = c(101, 102, 201), intensity = c(1, 1, 1))
x2 <- data.frame(mz = c(101, NA, 201), intensity = c(3, 0, 5))
y2 <- data.frame(mz = c(101, 102, 201), intensity = c(3,4, 5))
dotproduct(x1, y1, m=1, n=0) ## 0.6666667
dotproduct(x2, y2, m=1, n=0) ## 0.68