Randomistic Data Elements

Rounding Error Propagation: Bias and Uncertainty

ForsChem Research Reports, 2024

Any rounding operation of a value causes loss of information and, thus, introduces error. Two types of error are involved: systematic error (bias) and random error (uncertainty). Uncertainty is always introduced, for any type of rounding employed. Bias is directly introduced only when lower ("floor") and upper ("ceiling") types of rounding are used. Central rounding is in principle unbiased, but bias may emerge in the case of nonlinear operations. The purpose of this report is to discuss the propagation of both types of rounding error when rounded values are used in common mathematical operations. The basic mathematical operations considered are addition/subtraction, multiplication, and natural powers. These operations can be used to evaluate the propagation of error in power series, which are then used to describe error propagation for any arbitrary nonlinear function. Although power series approximations can be obtained about any arbitrary reference value, it is highly recommended to use the corresponding rounded value as the reference. The error propagation expressions obtained are implemented in the R language to facilitate the calculations. A couple of examples are included to illustrate the evaluation of error propagation. These examples also show that truncating the power series after the linear term already provides a good estimate of error propagation (using the rounded value as the reference point for the power series expansion).
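
As an illustration of the idea (not the report's own code), the following minimal R sketch propagates the uncertainty of a centrally rounded value through a nonlinear function using only the linear term of the power series, assuming a rounding resolution delta and a roughly uniform rounding error on [-delta/2, delta/2]; the function names are hypothetical.

```r
# Minimal sketch (not the report's code): first-order propagation of
# central-rounding uncertainty through a nonlinear function f(x).
# Central rounding to resolution delta leaves an approximately uniform error
# on [-delta/2, delta/2]: zero bias, standard deviation delta/sqrt(12).

propagate_rounding <- function(f, df, x_rounded, delta) {
  u_x <- delta / sqrt(12)            # standard uncertainty of the rounded input
  u_f <- abs(df(x_rounded)) * u_x    # linear-term propagation about x_rounded
  list(bias = 0, uncertainty = u_f)
}

# Example: f(x) = x^2, value rounded to 2 decimal places (delta = 0.01)
f  <- function(x) x^2
df <- function(x) 2 * x
x_r   <- 3.14
delta <- 0.01
propagate_rounding(f, df, x_r, delta)

# Monte Carlo check: simulate the unknown true value around x_r
x_true <- x_r + runif(1e5, -delta / 2, delta / 2)
sd(f(x_true))   # close to the linear estimate 2 * 3.14 * 0.01 / sqrt(12)
```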

Constrained Randomistic Variables

ForsChem Research Reports, 2024

Randomistic variables integrate the realms of deterministic and random variables. Randomistic variables are represented by probability distribution functions and, in the case of continuous variables, also by probability density functions (just like random variables). Any randomistic variable can be subject to external constraints on its possible values, so the resulting probability distribution of the constrained variable may differ from the probability distribution of the original variable. In this report, general expressions for analytically determining the probability distribution functions (or probability density functions) of constrained randomistic variables are presented. These expressions are extended to constraints involving multiple, independent randomistic variables. Several illustrative examples, with different degrees of difficulty, are included. These examples show that constrained randomistic variables represent the solution to a wide variety of problems, including systems of algebraic equations, inequalities, magic squares, and others. Further improvements in analytical and numerical methods for finding constrained probability functions would be highly desirable.
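
The following minimal R sketch (illustrative only, not taken from the report) shows the basic renormalization idea for a single continuous variable subject to one constraint: the constrained density is the original density restricted to the feasible set and divided by the probability of satisfying the constraint. The function name and the example constraint are assumptions.

```r
# Minimal sketch (illustrative only): density of a variable subject to an
# external constraint = original density on the feasible set, renormalized.

constrained_density <- function(f, constraint, lower, upper) {
  # Probability of satisfying the constraint (numerical integration)
  Z <- integrate(function(x) f(x) * constraint(x), lower, upper)$value
  function(x) ifelse(constraint(x), f(x) / Z, 0)
}

# Example: standard normal variable constrained to x^2 <= 1 (i.e. |x| <= 1)
f_c <- constrained_density(dnorm, function(x) as.numeric(x^2 <= 1),
                           lower = -10, upper = 10)
f_c(0)                        # larger than dnorm(0), due to renormalization
integrate(f_c, -1, 1)$value   # ~1: the constrained density is properly normalized
```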

Local Average Probabilities of Randomistic Variables

ForsChem Research Reports, 2024

Local indistinguishability of the values of a randomistic variable (due to resolution limitations, measurement uncertainty, or any other cause) has a discretization effect on the probability distribution function of the variable. In this report, analytical expressions for determining the probability distributions obtained after locally averaging variable values are presented. As a particular case, local conditional averaging is observed when the discretization of one variable affects the probability distribution function of a dependent variable. These expressions are then applied to some representative examples in order to illustrate the procedure. In the case of continuous variables, after locally averaging a variable, the original probability density function transforms into a series of step-like, locally uniform functions, resembling a histogram. As the size of the local region considered decreases, the resulting probability distribution function converges to the original, exact distribution function. On the other hand, as the local region size increases, the distribution function resembles a histogram with fewer bins, until a single uniform distribution is finally obtained.
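
A minimal R sketch of this idea (illustrative only, not the report's implementation), assuming local regions of fixed width h aligned with multiples of h:

```r
# Minimal sketch (illustrative only): locally averaging a continuous density
# over intervals of width h yields a piecewise-uniform ("histogram-like")
# density; as h -> 0 it approaches the original density.

local_average_density <- function(f, h) {
  function(x) {
    lo <- floor(x / h) * h     # left edge of the local region containing x
    sapply(seq_along(x), function(i)
      integrate(f, lo[i], lo[i] + h)$value / h)  # average density on the region
  }
}

f_h <- local_average_density(dnorm, h = 0.5)
f_h(c(0.1, 0.4))        # identical values: both points fall in the region [0, 0.5)
f_h(0.1) - dnorm(0.1)   # discrepancy that shrinks as h decreases
```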

Essay: Common Pitfalls in Experimental Design

ForsChem Research Reports, 2024

Experimentation is the core of scientific research. Performing an experiment can be considered equivalent to asking Nature a question and waiting for an answer. Understanding a natural phenomenon usually requires performing many experiments until a satisfactory model of the phenomenon is obtained. There are infinitely many possible ways to plan a set of experiments for researching a certain phenomenon, and some are more efficient than others. Experimental Design, also known as Design of Experiments (DoE), provides a systematic approach for obtaining efficient experimental arrangements for different research problems. Experimental Design emerged almost a century ago, based on statistical analysis. Within a few decades of their development, DoE methods became widely used in all fields of Science and Engineering. Unfortunately, these valuable tools are often employed without proper understanding, resulting in potentially erroneous conclusions. The purpose of this essay is to discuss several mistakes that may occur due to the incorrect use of DoE methods.

The Realm of Randomistic Variables

ForsChem Research Reports, 2018

Randomistic variables are variables whose behavior can be either random or deterministic. They are, therefore, a generalization that integrates determinism and randomness into a single concept. In fact, randomness emerges as soon as information about a deterministic variable is missing. On the other hand, determinism emerges when the variation of a random variable approaches zero. In this report, some definitions of properties and operators on randomistic variables are presented. In particular, the generalized n-order moment operator of a randomistic distribution, as well as the n-order moment variation of the distribution, is discussed. These operators are generalized because the order n can be any real number rather than only a nonnegative integer. This definition gives rise to complex values of the moments in certain cases, i.e., when non-integer-order moments are considered for randomistic variables that may take negative values. Furthermore, randomistic variables themselves may take imaginary or complex values, giving rise to particular behaviors such as negative variances, complex variances, and zero variance for non-deterministic variables. The definition of a deterministic variable is revisited, concluding that for a deterministic variable, all n-order moment variations are equal to zero for any real value of n. In addition, some comments about the differences between discrete and continuous randomistic variables are presented, in order to illustrate that there is only a subtle difference between the two types of variables, and thus they can be treated within the same mathematical framework.
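
As a simple numerical illustration (not taken from the report), a non-integer-order moment of a variable that takes negative values can be estimated from a sample by casting the sample to complex numbers, making the complex-valued result explicit; the function name is hypothetical.

```r
# Minimal sketch (illustrative only): a sample estimate of the n-order moment
# E[X^n] where the order n may be any real number. For variables that take
# negative values, non-integer orders produce complex-valued moments, so the
# sample is cast to complex before exponentiation.

generalized_moment <- function(x, n) {
  mean(as.complex(x)^n)
}

set.seed(1)
x <- rnorm(1e5)                 # a variable that takes negative values
generalized_moment(x, 2)        # ~1 + 0i (ordinary second moment)
generalized_moment(x, 0.5)      # complex-valued half-order moment
```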

Understanding Work, Heat, and the First Law of Thermodynamics 2: Examples

ForsChem Research Reports, 2024

The First Law of Thermodynamics represents the principle of energy conservation applied to the interaction between different macroscopic systems. The traditional mathematical description of the First Law (e.g. dU = TdS - PdV) is rather simplistic and lacks universal validity, as it holds only when several implicit assumptions are met. For example, it considers only the mechanical work associated with a change in volume of the system, completely neglecting other types of work. In addition, it employs the concept of entropy, which is not only ambiguous but also accounts only for heat associated with a temperature difference, neglecting other types of heat transfer that may take place at mesoscopic and/or microscopic levels. It also does not consider mass transfer effects. In the previous report of this series, a more general representation of the First Law was obtained, considering different conditions and different types of interactions between the systems. In this report, the expression previously obtained is applied to different representative examples, involving macroscopic systems with no volume change, gas systems with volume change, and even a case where mass transfer between the systems takes place.

ForsChem Research 100th Report: A 7-Year Retrospective

ForsChem Research Reports, 2022

The current report celebrates the 100th report published by ForsChem Research, as well as the 7th year since the beginning of the ForsChem Research Project. In this publication, a brief review of the evolution of ForsChem Research is presented, highlighting the most important contributions published in ForsChem Research Reports. A graphical bibliometric analysis is also included to illustrate the evolution of the works published during its first 7 years, and their impact as measured by ResearchGate (RG) statistics. In addition, a selection of the author's top 10 favorite reports is presented. Finally, future plans for ForsChem Research are briefly outlined.

Optimal Model Structure Identification. 1. Multiple Linear Regression

ForsChem Research Reports, 2023

This is the first part of a series of reports discussing different strategies for optimizing the structure of mathematical models fitted from experimental data. In this report, the concept of randomistic models is introduced along with the general formulation of the multi-objective optimization problem of model structure identification. Different approaches can be used to solve this problem, depending on the set of possible models considered. In the case of mathematical models with linear parameters, a stepwise multiple linear regression procedure can be used. In particular, a stepwise strategy in both directions (backward elimination and forward selection) is suggested, based on the selection of relevant terms for the model, prioritized by their absolute linear correlation coefficients with respect to the response variable, followed by the identification of statistically significant or explanatory terms based on optimal significance levels. Two additional constraints can be included: a lower limit on the normality value of the residuals (checking the normality assumption), and a lower limit on the residual standard error (avoiding model overfitting). This stepwise strategy, which successfully overcomes several limitations of conventional stepwise regression, is implemented as a function (steplm) in the R language, and different examples are presented to illustrate its use.
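
The following R sketch illustrates only the general idea of correlation-prioritized forward selection with a significance check; it is a simplified stand-in written for this summary, not the steplm function described in the report, and the significance level, function name, and fallback behavior are assumptions.

```r
# Simplified sketch (not steplm): forward selection where candidate terms are
# ranked by absolute linear correlation with the response, and a term is kept
# only if its coefficient is significant at level alpha (assumed 0.05 here).

forward_by_correlation <- function(data, response, alpha = 0.05) {
  candidates <- setdiff(names(data), response)
  # Rank candidate terms by absolute linear correlation with the response
  r <- sapply(candidates, function(v) abs(cor(data[[v]], data[[response]])))
  selected <- character(0)
  for (term in candidates[order(r, decreasing = TRUE)]) {
    trial <- c(selected, term)
    fit   <- lm(reformulate(trial, response), data = data)
    coefs <- summary(fit)$coefficients
    pval  <- if (term %in% rownames(coefs)) coefs[term, "Pr(>|t|)"] else NA
    if (!is.na(pval) && pval < alpha) selected <- trial  # keep significant terms
  }
  if (length(selected) == 0) selected <- "1"             # intercept-only fallback
  lm(reformulate(selected, response), data = data)
}

# Example with simulated data
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- 2 * d$x1 - d$x2 + rnorm(100, sd = 0.5)
summary(forward_by_correlation(d, "y"))
```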

Normal Distribution and Transcendental Functions: Mathematical and Historical Relations

ForsChem Research Reports, 2022

In this report, some mathematical relations between the normal probability functions (probability density and cumulative probability) and other transcendental functions (including the exponential, error, trigonometric, gamma, beta, and hypergeometric functions) are presented. The mathematical expressions are accompanied by historical remarks and anecdotes, showing that Mathematics and History are a rich source of relations, some of them unexpected. The normal bell, normal single-wave, and normal exponential transcendental functions are also introduced.
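
As a small numerical check of one such well-known relation (not taken from the report), the normal cumulative probability can be compared against the error function, with the error function itself written through the regularized lower incomplete gamma function:

```r
# Minimal sketch (illustrative only) verifying the well-known relation
# Phi(x) = (1 + erf(x / sqrt(2))) / 2, where for x >= 0 the error function is
# erf(z) = P(1/2, z^2), i.e. pgamma(z^2, shape = 1/2) in R.

x <- seq(0, 3, by = 0.5)
phi_direct  <- pnorm(x)                                   # normal cumulative probability
phi_via_erf <- 0.5 * (1 + pgamma(x^2 / 2, shape = 0.5))   # erf(x/sqrt(2)) = P(1/2, x^2/2)
max(abs(phi_direct - phi_via_erf))                        # ~0 (machine-precision agreement)
```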

Discretization of Probability Distributions: Random, Deterministic and Randomistic Sampling

ForsChem Research Reports, 2019

Sampling procedures are commonly used to extract a finite number of elements from a particular probability distribution. This discretization of the probability distribution is usually performed using pseudo-random number generators. This type of discretization, known as random sampling, requires suitable functions for transforming standard uniform random numbers into random numbers following any arbitrary probability distribution. While random sampling resembles the natural behavior of experimentation, individual samples do not necessarily preserve all the properties of the original probability distribution, including its cumulative probability and its moments. The match between the cumulative probability observed in a sample and that of the original distribution can be determined using the random goodness-of-fit criterion. Random samples seldom achieve a 100% fit to the original distribution. Deterministic sampling methods, on the other hand, always present a 100% random goodness-of-fit, but their values are always the same for a given sample size. One particular case of deterministic sampling is optimal sampling, which not only ensures goodness-of-fit but also preserves the moments of the original distribution. Finally, randomistic sampling combines the fitness of deterministic sampling with the changing behavior of random samples, resulting in an interesting alternative for representing random variables, particularly in applications involving Monte Carlo methods, where the sample is expected to represent the properties of the full distribution.
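
The following R sketch (illustrative only) contrasts random sampling with a simple deterministic sample built from quantiles at equally spaced probability levels; this midpoint-quantile scheme is an assumption used for illustration, not the optimal or randomistic sampling methods described in the report.

```r
# Minimal sketch (illustrative only): random sampling vs a simple deterministic
# sample built from quantiles at equally spaced probability levels. For a fixed
# size, the deterministic sample never changes and tracks the cumulative
# distribution closely; the random sample changes on every draw.

n <- 20
random_sample        <- rnorm(n)                        # pseudo-random sampling
deterministic_sample <- qnorm((seq_len(n) - 0.5) / n)   # midpoint quantiles

# How well each sample reproduces moments of the standard normal distribution
c(mean(random_sample), mean(deterministic_sample))      # target: 0
c(var(random_sample),  var(deterministic_sample))       # target: 1

# Goodness-of-fit gap between sample and distribution (Kolmogorov-Smirnov)
ks.test(random_sample, "pnorm")$statistic               # varies from draw to draw
ks.test(deterministic_sample, "pnorm")$statistic        # small and fixed
```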