Cluster analysis Documentation

General 

This program is under development. Check your results carefully.

The Python script 'clusterana.py' extracts more detailed information from the data produced by the script clusterstat.py. It requires a completed or partly completed run of clusterstat.py and reads the restart file generated by clusterstat to obtain the system information. In particular, the script makes it possible to check the accuracy of the cluster expansion when selected clusters are included or neglected.

Usage

To start calculating the clusters, type:
clusterana  [-opt]  input_file
Possible options can be obtained with
clusterana -h
-------------------------------------------------------------------------------
Purpose:     Statistical analysis for the cluster expansion
             of a potential surface.
Usage: clusterana [-deb -ver -h -? ] inputfile.inp
-deb        Run in debug mode (more detailed logging).
-D <name>   The name directory of a clusterstat run.
-ver        Print version info.
-h -?       Print this help text.

Further analysis of data produced by statistical analysis program 'clusterstat'.
This requires an (at least partly) completed run of 'clusterstat'. The system
information is obtained from the restart file of 'clusterstat'.
-------------------------------------------------------------------------------

Input Documentation

The input needed by clusterana generally follows the rules of the usual MCTDH input. The input is organized in three sections:

Section       Description
RUN           What is to be done.
EXPANSION     Specification of the expansion terms (optional).
COORDINATES   Modification of the coordinate ranges (optional).

Example inputs can be found in $MCTDH_DIR/inputs/clusters/statistics.

RUN-SECTION

Required keywords

Keyword   Description

name = S The 'name' directory of the previous clusterstat run and directory to which output files are written.
Optional keywords

Keyword   Description

steps = I (,I1) The number of Monte-Carlo steps to be processed. If one value is given, the first I steps are read; if two values are given, steps I to I1 are read. Default: all steps are processed.
logminmax = I (,I1) Enable logging of the set of samples with the smallest and largest values of the true potential, the approximated potential and the difference between them. The first number denotes the maximum number of samples to be logged; the second, if given, denotes the minimum number of samples to be skipped before the next sample can be logged. The second number can be used to avoid logging every sample while the random walker is in a region of the PES that produces large errors.
count-samples = R,R1 (,S) Count the number of samples for which the contribution of a cluster is NOT in the region between R and R1. S is a unit, i.e., one of "cm-1", "eV", "meV", "au", "kcal/mol" or "kJ/mol".
densities = S (,S1)  Calculate 1D and/or 2D densities (histograms) for the coordinate vectors of the random walker. S and S1 can each be one of "1D" and "2D". If "1D" is set, all one-dimensional densities are calculated; if "2D" is set, all two-dimensional densities are calculated. Requires keyword "bins".
bins = I The number of bins used in the calculation of the densities.
outunit = S Unit in which the output is written. Default: same as used in the clusterstat run.
overwrite Allow overwriting of existing files in the 'name' directory. Similar to option -w in the command line.
mean Write order-dependent and reference-dependent cumulative mean error of the cluster expansion to file.
rms Write order-dependent and reference-dependent cumulative root-mean-square error of the cluster expansion to file.
auto Calculate the auto-correlation of the trajectory of the random walker and write it to file (to check the quality of the random walk).
cluster-statistics Write detailed statistics for each cluster to file.
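
As a rough illustration of what two of the keywords above report, the following sketch counts the samples lying outside an interval (the quantity described for 'count-samples') and estimates a normalized auto-correlation of a one-dimensional series (in the spirit of 'auto'). The function names, the inclusive interval bounds and the normalization are assumptions made for this example; they are not taken from the clusterana source.

```python
def count_outside(values, lower, upper):
    """Count the values that are NOT inside [lower, upper].

    Assumption: both bounds are treated as inclusive.
    """
    return sum(1 for v in values if not (lower <= v <= upper))


def autocorrelation(series, lag):
    """Normalized auto-correlation of a 1D series at the given lag.

    Assumption: the mean is subtracted and the result is divided by the
    lag-0 value, so that autocorrelation(series, 0) == 1.0.
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    norm = sum(d * d for d in dev)
    if norm == 0.0:
        raise ValueError("series is constant; auto-correlation is undefined")
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / norm


# Hypothetical cluster contributions (arbitrary units) and a toy trajectory.
contributions = [-5.0, 0.5, 3.0, 10.0]
print(count_outside(contributions, 0.0, 5.0))
print(autocorrelation([1.0, -1.0, 1.0, -1.0], 1))
```

In a check of this kind, an auto-correlation that decays only slowly with the lag would suggest strongly correlated samples.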


EXPANSION-SECTION

The EXPANSION-SECTION, if given, defines which of the clusters calculated in the clusterstat run are used to approximate the PES. Only the clusters specified here are included in the calculation of the mean and RMS values requested in the RUN-SECTION.

For a detailed description of how to define the cluster expansion, see the documentation of clusterstat.


COORDINATES-SECTION

The COORDINATES-SECTION can be used to alter the range of the coordinates of the random walker. All samples for which the walker has left the specified range are ignored (except in the calculation of the auto-correlation function). The range can be specified in two ways. One possibility is to define a range as described in the documentation of clusterstat. A second possibility is to provide a function which tests whether the walker is in a certain region. In this case the COORDINATES-SECTION may contain only two keywords:

Keyword   Description
user-source = S The path (relative or absolute) to the module containing the routine which tests the coordinate vector. See the documentation on using the 'user-source' keyword.
routine = S Name of the routine provided by the module given with 'user-source' which is used to test the coordinate vector.

The routine specified above receives a NumPy array (type float) containing a coordinate vector and must return True or False to indicate whether the sample belonging to that coordinate vector is valid. If True is returned, the sample is treated as valid and included in the statistics; otherwise the sample is skipped.
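
A minimal sketch of such a routine is shown below. The routine name 'in_region', the choice of the first two coordinates and the box limits are hypothetical and serve only as an illustration; the only contract taken from the description above is that the routine receives the coordinate vector and returns True or False. The code uses plain indexing and comparisons, so it works on a NumPy float array without importing NumPy itself.

```python
# Hypothetical limits for the first two coordinates (illustrative values only).
LOWER = (0.5, 0.5)
UPPER = (3.0, 3.0)


def in_region(q):
    """Return True if the sample at coordinate vector q should be kept.

    q is the coordinate vector of the random walker (a NumPy float array
    when called by clusterana; any indexable sequence of numbers works here).
    """
    for i, (lo, hi) in enumerate(zip(LOWER, UPPER)):
        value = float(q[i])
        if value < lo or value > hi:
            # The walker has left the box in coordinate i: skip this sample.
            return False
    # All tested coordinates are inside the box: include the sample.
    return True
```

If this code were saved in a module and that module's path given with 'user-source', the keyword 'routine' would be set to in_region.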

Output Documentation

The output generated by clusterana consists of a number of ASCII files containing different types of data. The following files are generated: