Representing real systems in models is one of the oldest tasks of science; working with models has been common practice for hundreds of years.
Real systems are often subject to a large number of external influences on their behavior. Disturbances, whether inherent to the system or external, can arise and affect the system's behavior negatively. In classical model building such disturbances are subsumed under the label "noise" and are then often ignored.
Classical models thus represent only the ideal case, the absence of disturbances. Hence reality is mapped insufficiently.
Stochastic modelling is a method for representing complex systems in a manageable number of mathematical models that take the special properties of the real system into account. Characteristics such as disturbances or noise in the variables of a real system are modelled as well. Hence the resulting models reflect reality better.
Such models are particularly useful for predictions, for instance in combination with Data Mining models. Data Mining tries to identify hidden information in large datasets by searching them for recurrent patterns. The results of Data Mining improve when stochastic models account for the noise in the data.
Stochastic modelling is not a new discipline; it has been applied in several fields of research for quite some time. Stochastic approaches are used, for instance, to model queues or to address uncertainty in production planning.
Classical Data Mining often works on the basis of statistical methods. Such methods are per se able to handle random data, but this property is then cancelled out by unsuitable data models.
Deficits of existing systems
Classical stochastic methods require the real system to be transformed into one of the standard models (e.g. from queuing theory). The advantage of stochastic models is then paid for with the problem of unsuitable base models, especially when the focus is not only on simulation but on actually optimizing certain decision variables of the system.
Data Mining methods are suited to well-structured data. This means that the majority of the available data must either be discarded or be prepared in a lengthy and resource-intensive process.
Classical approaches therefore fail because
- model building represents the system only indirectly,
- the majority of the existing data is not used and
- the quality of the results is reduced by new sources of error.
Solution approach
The weakness of reducing real systems to classical standard models is to be countered by extending the modelling approaches with stochastic components.
Admittedly, many optimization methods from graph theory cannot be generalized directly to graphs with stochastic components, but nature-inspired search methods can find useful solutions efficiently.
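As a rough illustration of such a search method, the following sketch applies simulated annealing to a small round-trip problem on a graph whose edge weights are random. The six locations, the 10 % noise level, the cooling schedule and all names are assumptions made for this example; they are not taken from the research project itself.

```python
import math
import random

# Hypothetical example: six locations on a unit circle. The travel time on
# each edge is random with mean = Euclidean distance and std = 10 % of it.
N = 6
POINTS = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
          for k in range(N)]
MEAN = [[math.dist(a, b) for b in POINTS] for a in POINTS]
STD = [[0.1 * m for m in row] for row in MEAN]


def mean_length(tour):
    """Expected length of a closed tour, computed from the edge means."""
    return sum(MEAN[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


def sampled_length(tour, rng):
    """One random realisation of the tour length (noisy edge weights)."""
    total = 0.0
    for i in range(len(tour)):
        a, b = tour[i], tour[(i + 1) % len(tour)]
        total += max(0.0, rng.gauss(MEAN[a][b], STD[a][b]))
    return total


def estimated_cost(tour, rng, samples=30):
    """Monte Carlo estimate of the expected tour length."""
    return sum(sampled_length(tour, rng) for _ in range(samples)) / samples


def anneal(start, steps=2000, t0=1.0, seed=0):
    """Simulated annealing that only ever sees noisy cost estimates."""
    rng = random.Random(seed)
    tour, cost = start[:], estimated_cost(start, rng)
    best, best_cost = tour[:], cost
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9  # linear cooling schedule
        i, j = sorted(rng.sample(range(len(tour)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse a segment
        cand_cost = estimated_cost(cand, rng)
        # Always accept improvements; accept worse tours with a probability
        # that shrinks as the temperature falls.
        if cand_cost < cost or rng.random() < math.exp((cost - cand_cost) / temp):
            tour, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = tour[:], cost
    return best, best_cost


START = [0, 2, 4, 1, 5, 3]  # deliberately poor starting tour
BEST, BEST_COST = anneal(START)
```

Because the search only sees noisy estimates, `BEST_COST` tends to be slightly optimistic; `mean_length(BEST)` gives the expected length of the returned tour under the assumed edge means.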
Petri nets, which are regarded as a standard in process organization, can be augmented with stochastic components and used as a basis for sound forecasts.
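How a Petri net can carry stochastic components may be sketched as follows: each transition is given an exponentially distributed firing delay, and the enabled transitions race against each other. The two-place net, the rates, the single-server firing semantics and all names below are assumptions made for illustration only.

```python
import random

# Hypothetical closed net: three staff members cycle between two places.
# Each transition fires after an exponentially distributed delay (its rate).
TRANSITIONS = [
    {"name": "start", "inputs": {"idle": 1}, "outputs": {"busy": 1}, "rate": 2.0},
    {"name": "finish", "inputs": {"busy": 1}, "outputs": {"idle": 1}, "rate": 1.0},
]
INITIAL = {"idle": 3, "busy": 0}


def is_enabled(marking, tr):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[p] >= n for p, n in tr["inputs"].items())


def simulate(marking, transitions, t_end, seed=0):
    """Simulate the net up to time t_end.

    Returns the final marking and the time-averaged token count per place.
    """
    rng = random.Random(seed)
    marking = dict(marking)
    token_time = {p: 0.0 for p in marking}
    t = 0.0
    while t < t_end:
        active = [tr for tr in transitions if is_enabled(marking, tr)]
        if active:
            # The minimum of independent exponential delays is again
            # exponential, with the sum of the individual rates.
            delay = rng.expovariate(sum(tr["rate"] for tr in active))
        else:
            delay = float("inf")  # dead marking: nothing can fire any more
        step = min(delay, t_end - t)
        for p in marking:
            token_time[p] += marking[p] * step
        t += step
        if delay > step:  # time horizon reached before the next firing
            break
        # The winner of the race is chosen in proportion to its rate.
        tr = rng.choices(active, weights=[a["rate"] for a in active])[0]
        for p, n in tr["inputs"].items():
            marking[p] -= n
        for p, n in tr["outputs"].items():
            marking[p] += n
    averages = {p: token_time[p] / t_end for p in marking}
    return marking, averages


FINAL, AVERAGES = simulate(INITIAL, TRANSITIONS, t_end=1000.0)
```

The time-averaged markings in `AVERAGES` are the kind of quantity a forecast could be built on, for example the average number of busy resources over the simulated horizon.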
Data Processing for Data Mining
Because existing Data Mining methods build on structured data, the second focus is on automatically structuring unstructured data. In doing so, the stochastic characteristics of the data must not be destroyed by simple classification.
Suitable methods can be found, for instance, in the area of semantic analysis:
· Probabilistic Latent Semantic Analysis (PLSA) uses information about the correlations between the words of a text to evaluate relationships automatically and to determine the probability with which documents belong to so-called latent aspects.
· Concept Node Patterns (CNP) search for connections between terms in texts based on given patterns. In this way, cause-effect relationships in the text can be recognized and made usable.
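To make the idea of latent aspects more concrete, the following is a minimal sketch of PLSA fitted with the expectation-maximization algorithm on a toy document-word count matrix. The vocabulary, the counts, the number of aspects and all function names are hypothetical and serve only as an illustration.

```python
import math
import random

# Hypothetical toy corpus: rows are documents, columns are word counts.
VOCAB = ["fever", "cough", "flu", "fracture", "cast", "bone"]
COUNTS = [
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 1],
    [0, 0, 0, 4, 2, 3],
    [0, 1, 0, 2, 3, 2],
]


def normalize(row):
    """Scale a row to sum to 1 (uniform fallback for an all-zero row)."""
    total = sum(row)
    if total > 0:
        return [x / total for x in row]
    return [1.0 / len(row)] * len(row)


def plsa(counts, n_aspects, iterations=50, seed=0):
    """Fit P(aspect | document) and P(word | aspect) with EM."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_aspects)]
    history = []  # log-likelihood of the parameters before each update
    for _ in range(iterations):
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_aspects)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_aspects)]
                p_w_d = sum(joint)  # P(word | document) under the model
                if p_w_d <= 0.0:
                    continue
                log_lik += c * math.log(p_w_d)
                for z in range(n_aspects):
                    # E-step: expected count of (d, w) attributed to aspect z
                    resp = c * joint[z] / p_w_d
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        history.append(log_lik)
        # M-step: re-estimate both distributions from the expected counts
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z, history


P_Z_D, P_W_Z, HISTORY = plsa(COUNTS, n_aspects=2)
```

Each row of `P_Z_D` gives the probabilities with which one document belongs to the latent aspects, so a document keeps a graded, stochastic description instead of being forced into a single class.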
Predictions from stochastic data
The refinements in modelling and data processing described above help in various ways to improve predictions of system behavior.
In this research field, Medical Data Mining serves as the prototypical application area: the automatic evaluation of socio-demographic data and clinical studies in order to predict prospective patient numbers differentiated by medical indication, the so-called patient populations.
Extensive prediction problems that rest on stochastic, partly unstructured data arise in the medical context. One central question is the optimal design of care systems that are based on regional patient populations. Especially in planning medical care centers (MVZs), the differentiated analysis of individual patient populations, i.e. the prospective patient numbers, plays an influential role.
Stochastic modelling allows the intrinsic randomness of prospective patient flows and morbidity rates to be considered directly and is therefore suitable for the micro-geographical analysis of patient populations. In particular, it permits meaningful statements about the quality of the estimates of patient potential.
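A statement about the quality of such an estimate could, for example, take the form of a prediction interval. The following sketch simulates the yearly patient number of one micro-region by Monte Carlo sampling; the age groups, resident numbers and incidence rates are invented for illustration and are not data from the project.

```python
import random
import statistics

# Hypothetical micro-region: residents per age group and an assumed
# yearly probability that a resident becomes a patient for one indication.
AGE_GROUPS = {
    "0-17": (300, 0.01),
    "18-64": (900, 0.03),
    "65+": (400, 0.08),
}


def simulate_patients(groups, rng):
    """Draw one possible yearly patient number (one Bernoulli trial per resident)."""
    return sum(
        sum(1 for _ in range(residents) if rng.random() < rate)
        for residents, rate in groups.values()
    )


def predict(groups, runs=500, seed=0):
    """Return the mean and an empirical 90 % interval of the patient number."""
    rng = random.Random(seed)
    draws = sorted(simulate_patients(groups, rng) for _ in range(runs))
    lower = draws[runs * 5 // 100]        # empirical 5 % quantile
    upper = draws[runs * 95 // 100 - 1]   # empirical 95 % quantile
    return statistics.mean(draws), lower, upper


EXPECTED = sum(residents * rate for residents, rate in AGE_GROUPS.values())
MEAN_PATIENTS, LOWER, UPPER = predict(AGE_GROUPS)
```

The width of the interval between `LOWER` and `UPPER` indicates how reliable the point estimate is, which a purely deterministic calculation such as `EXPECTED` cannot express.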
The central advantages of the approach outlined above are high precision, differentiated analysis options by indication and the integration of comprehensive stochastic information. This enables better planning of medical care structures, in which planning is based not only on the economic demand structures but in particular on the prospective ones.
Contact person: Prof. Dr. Oliver Wendt, phone: +49 631 2053771