The Python script 'clusterstat.py' is used perform a random walk
over
a potential energy surface and generate raw data concerning a
particular cluster expansion defined in the input file.
Within the clusterexpansion a highdimensional potential V is
expanded into nparticle interaction terms, called the clusters.
(Elsewhere, this expansion is also refered to as 'HDMR' or
'nmode representation'.) The
raw data can be
further analyzed with the Python script 'clusterana.py'.
The program is executed in three stages:
Remark:
The script relies on many features of the NumPy/SciPy
package which is not
installed by default on many platforms. Please make sure that NumPy
1.1 or later is installed. (Note: for Apple computers the Numpy/SciPy
developers strongly recommentd using the official Python distribution
and not the version shipped with Mac OS X.)
Possible options can be obtained with
clusterstat [opt] input_file
clusterstat h
 Purpose: Statistical analysis for the cluster expansion of a potential surface.
Usage: clusterstat [mnd c D deb w ver h ? ] inputfile.inp
mnd Make name directory c Continuation run n <int> Perform <int> steps (ignore value in input file) t <file> Use a trajectory stored in <file> D <dir> Denotes the directory where files are written to (name in .inp file ignored). deb Run in debug mode (more detailed logging) w Allow to overwrite existing data. ver Print version info h ? Print this help text. 
Note: Continuation runs (c option) can be used to restart a
crashed or canceled job.
If c is given, the program attempts to read a previously
created restartfile in the name directory and reads the current state.
Section  Description 

RUN  What is to be done. 
COORDINATE  Definition of the physical coordinates and their ranges. 
MODES  Definition of logical coordinates, i.e., the modecombination scheme (similar to the definition of singleparticle functions). 
REFERENCE  Reference points around which the PES is expanded. 
EXPANSION  Specification of the expansion terms. 
Example inputs can be found in $MCTDH_DIR/inputs/clusters/statistics.





name = S  The 'name' directory to which output files are written.  
usersource = S  The path (relative or absolute) to the module containing the potential energy routine and, if required, a probe routine and a switching function. See using the 'usersource' keyword.  
steps = I  The number of steps of the random walker.  
stepnorm = R  The size, i.e, the Euclidean norm, of a step of the random walker  





title = S  The title of the calculation.  
potentialroutine = S  The name of the routine that evaluates the PES for a given coordinate vector (see below). The routine must be found in the module set by the 'usersource' keyword (see above). Default: S = "potential"  
proberoutine = S  The name of the routine that evaluates a probe function for a given coordinate vector (see below). This routine can be different from the PES routine (see below). The routine must be found in the module set by the 'usersource' keyword (see ). Default: the same routine set by the 'potentialroutine' keyword  
temperature = R(,S)  The 'temperature' used in the MonteCarlo process to
calculate the trajectory of the random walker. S can be a unit
(see 'outunit' keyword for possible units). Default unit is "au"
. Note: there is no multiplication with the Boltzmann constant.
The number given here really is k_{B}T = β^{1}
already. This keyword can be omitted if 'trajectory' is set. 

outunit = S  By default, all output is written in atomic units ("au"). It can, however, be changed to other units. 'outunit' can be one of "cm1", "eV", "meV", "au", "kcal/mol" or "kJ/mol".  
trajectory = S  Equivalent to option t. The name of or path to a file containing a trajectory. If trajectory is given, no random walk is performed but the coordinates are obtained from this file. Format: ASCII, one set of whitespace separated coordinate values (float) per line. Optional additional column containing a weight. See coordinates file in output documentation. 

overwrite  Allow overwriting of existing files in the 'name' directory. Similar to option w in the command line.  
mean  While generating the clusterdata calculate the cumulative and orderdependent mean error of the expansion. See mean_error_order file in output documentation.  
rms  While generating the clusterdata calculate the cumulative and orderdependent rootmeansquare of the error of the expansion. See rms_error_order file in output documentation.  
walkonly  Do not generate any data concerning the clusterexpansion but only perform a random walk following the probefunction.  
ignoreweights  If trajectory = S or comand line option t is set, possibly existing weight factors in the trajectory file are ignored and all coordinates are weighted equally. 
The potentialroutine and proberoutine keywords
The potentialroutine keyword is used to specify the name of the PES routine in the module specified as 'usersource'. The subroutine receives a 1D numpy array (type float) containing a coordinate vector and must return a single float number, the potential energy for the given coordinates in atomic units. The program assumes that the result is in atomic units.
The proberoutine keyword is used to specify the name of a probe routine. The probe routine is only used to determine the trajectory of the random walker during the MonteCarlo process. I.e, the trajectory of the random walker follows the Boltzmann distribution of the probe function (~exp(βV_{probe})). If the probefunction is not specified, the routine specified by the potentialroutine keyword is used.
Simple example of a PES subroutine:
#!/usr/bin/env python
#
# file: my_pes.py
import numpy
def potential(Q):
"My PES routine  just a harmonic oscillator"
return numpy.dot(Q,Q)/2
The coordinates are set up one per line, starting with the coordinate label, followed by the start of the coordinate range and its end.
In addition to the range, also a scaling factor for the random walker can be added. Since the step size of the random walker is always of the same length, one can add a weighting factor for each component after the range (default is unity). If the weighting factor is set to zero, the initial value of coordinate is not altered.
Note: The initial position of the random walker is set to the first reference point.
Example
MODESSECTION
Q1, Q2 # mode 1
Q3 # mode 2
Q5, Q4 # mode 3
Q6 # mode 4
endmodessection
Within the REFERENCESECTION, these reference points are specified one per line between the beginreference and endreference statements. The first entry in a line is the label of the coordinate as specified in the COORDINATESSECTION. The label is followed by whitespace separated coordinate values for each reference point. Note: The initial position of the random walker is set to the first reference point.
Other keywords
In the case of multiple reference points additional keywords can be
used
to define how the clusters are summed. By default all clusters will be
summed with the same relative weight.




weights = R,R1,...  The clusters for each reference point are summed together with weights R,R1, etc. If weights are given, there must be as many weights as reference points. weights cannot be negative and the sum must be unity. If neither weights nor a switching function are given, all weights are set to 1/N where N is the number of reference points. 
switchfunction = S  Alternatively to the weights keyword, a switching function can be used to generate coordinate dependent weights. S is the name of a routine in the module specified by the 'usersource' keyword in the RUNSECTION (see below). 
Using a switching function
Using a userdefined switching function is similar to defining the
potential energy routine.
A routine (or other callable object) of the name set by the
'switchfunction' keyword must be found in the module or library defined by
the 'usersource' keyword in the runsection. The switching function
receives a complete coordinate vector containing the currunt position of the
random walker (1D numpy array, type float) and must
return a 1D float array (or any other ordered sequence) of weights.
The array must contain as
many entries as there are reference points. There must not be negative
elements and the sum of all elements must be unity.
Example: two reference points connected by a switching function
REFERENCESECTION beginreference # label ref1 ref2 Q1 0.0 0.0 Q2 0.0 0.0 Q3 0.0 0.0 Q4 1.0 1.0 Q5 1.0 1.0 Q6 1.0 1.0 endreference switchfunction = my_switch_routine endreferencesection
Complete modes are specified by integers, while single coordinates are specified by their labels as given within the COORDINATESECTION or PRIMITIVEBASISSECTION. The integer defining a mode is determined by its position in the MODESSECTION, i.e., the first mode defined in the MODESSECTION is adressed by the integer 1, the second by 2, etc. For instance, the integer 3 will refer to mode three containing the coordinates 'Q4' and 'Q5' in the example above.
A tuple is a list of mode integers and/or coordinate labels enclosed in round brackets. For example the tuple (1,2,Q4) defines a cluster containing the coordinates 'Q1', 'Q2', 'Q3' and 'Q4', where the first two modes are completely taken into account by specification through integers, but only one coordinate ('Q4') of mode '3' is included. The same cluster could of course also be specified by the tuple (Q1,Q2,Q3,Q4).
Ordering of the tuples
Within a tuple, the mode number (whether or not a mode is specified by
its integer or only parts of it are included by a coordinate label)
must always increase from left to right so that the smallest mode number is in
the most left position. Also, a mode cannot be specified multiply, for
example when a mode integer and a coordinate label are used.
Within the example above, the tuples (2,1,3) and (2,3,Q5) are invalid;
the first because of the order of integers, the second because of
double definition of the coordinate label 'Q5' as it is part of mode
3.
The empty tuple
The empty tuple () is used for clusters with no coordinate
dependence, i.e, a singe (reference) point (see REFERENCESECTION).
Using wildcards
If a lage number of clusters is to be calculated, specification of all
single clusters can be cumbersome. The character '*' can be used as a
wildcard. Wildcards always represent mode integers and cannot be used for
coordinate labels. Depending on the position within a tuple
a wildcard is expanded to all possible mode integers for this particular position.
Examples using the mode definition above:
The tuple (*,3,*) is expanded into (1,3,4),(2,3,4).
The tuple (*,Q5) is expanded into (1,Q5),(2,Q5).
The tuple (*,*,4) is an acronym for the tuples (1,2,4), (1,3,4) and (2,3,4).
The tuple (*,*,2) is invalid as the first mode integer after the wildcards is too small.
The tuple (*,*) is expanded to tuples (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
Example 1
Expansion to second order around a single point:
EXPANSIONSECTION
(),(*),(*,*)
endexpansionsection
Example 2:
Full expansion to first order around logical
coordinate 4 plus two additional 2ndorder clusters:
EXPANSIONSECTION
(4)
(*,4)
(*,3,4)
endexpansionsection
1. Initialization
In the very beginning the program opens a log file clusterstat.log which remains open until termination of the program. It recieves all messages sent through the logging system and contains timestamps, log levels and messages.
Also in the very beginning a file input containing a string representation of the original input is generated. This file is reproduced from allready processed input and is not merely a copy of the original input file.
In Addition, the restart file is generated. This file is continuously updated to allow restarting the calculation after a crash or manual interruption. Note: The restart file is generated using the Python builtin cPickle module and can be easily inspected and altered in an interactive Python shell (see cPickle documentation).
2. Random walk
After the initialization, if required, a random walk using a Metropolis algorithm is performed. During the random walk the file coordinates is generated. It contains the coordinate vector for each accepted MonteCarlo step in ASCII format, one vector per line with whitespace separated floating point numbers. Also generated at this stage is the file probe. The nth line of this file contains the energy returned by the probe function of the nth accepted MonteCarlo step, the mean of the first n energies and the rootmeansquare of the first n energies. These files are not generated if the trajectory is read from file.
3. Calcululation of the Clusters
When the trajectory is generated, the clusters are calculated for
each coordinate vector and reference point. The following files are
created (all ASCII and whitespace separated for multiple columns):