# SAGECal


Read INSTALL for installation instructions. This file gives a brief guide to using SAGECal. Warning: this file may be out of date; run `sagecal -h` to see up-to-date options.
Read the [contributing guide](https://github.com/nlesc-dirac/sagecal/blob/master/CONTRIBUTING.md).
Code documentation can be found here.
Input to sagecal must be in CASA MS format; make sure to create a column in the MS to write the output data as well. The data can be in raw or averaged form, and initial calibration using other software can also have been applied.
Use Duchamp to create a mask for the image. Use buildsky to create a sky model (see the README file in the top-level directory). Also create a proper cluster file. Special options to buildsky: `-o 1` (NOTE: not `-o 2`).
Alternatively, create these files by hand according to the following formats.
#### 2b) Cluster file format:
`cluster_id chunk_size source1 source2 ...`, e.g.:
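An illustrative cluster file (the source names here are hypothetical; they must match names in the sky model):

```
## cluster_id chunk_size source1 source2 ...
0 1 P0C1 P0C2
2 3 P11C2 P11C1 P13C1
```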
Note: clusters given negative cluster_id values will not be subtracted from the data. chunk_size: the number of hybrid solutions to find during one solve run. E.g., if `-t 120` is used to select 120 timeslots, cluster 0 (chunk size 1) will find a solution using the full 120 timeslots, while cluster 2 (chunk size 3) will solve for every 120/3=40 timeslots.
#### 2c) Sky model format:
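The column layout below follows the LSM text format written by buildsky; treat it as illustrative and check buildsky's output for the authoritative column order (h m s is RA, d m s is Dec):

```
## name h m s d m s I Q U V spectral_index RM extent_X(rad) extent_Y(rad) pos_angle(rad) freq0
```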
or
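A sketch of the variant with 3rd-order spectral indices (verify against buildsky output):

```
## name h m s d m s I Q U V spectral_index1 spectral_index2 spectral_index3 RM extent_X(rad) extent_Y(rad) pos_angle(rad) freq0
```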
e.g.:
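An illustrative point source entry (the values are made up):

```
## a hypothetical unpolarized point source at 115 MHz reference frequency
P1C1 0 12 42.996 85 43 21.514 0.030498 0 0 0 -5.713060 0 0 0 0 115039062.0
```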
Note: Comments starting with '#' are allowed in both sky model and cluster files.
Note: 3rd order spectral indices are also supported; use the `-F 1` option in sagecal.
Note: Spectral indices use the natural logarithm, `exp(ln(I0) + p1*ln(f/f0) + p2*ln(f/f0)^2 + ...)`, so if you have a model with common logarithms like `10^(log(I0) + q1*log(f/f0) + q2*log(f/f0)^2 + ...)`, then the conversion is `p1 = q1`, `p2 = q2/ln(10)`, `p3 = q3/ln(10)^2`, and so on.
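As a sanity check, the two parameterizations can be compared numerically (a standalone sketch, not part of sagecal; the flux and spectral-term values are hypothetical):

```python
import math

def flux_natural(I0, p, f, f0):
    """Flux using natural-log spectral terms:
    exp(ln(I0) + p1*y + p2*y^2 + ...), with y = ln(f/f0)."""
    y = math.log(f / f0)
    return math.exp(math.log(I0) + sum(pk * y ** (k + 1) for k, pk in enumerate(p)))

def flux_common(I0, q, f, f0):
    """Flux using common-log spectral terms:
    10**(log10(I0) + q1*x + q2*x^2 + ...), with x = log10(f/f0)."""
    x = math.log10(f / f0)
    return 10 ** (math.log10(I0) + sum(qk * x ** (k + 1) for k, qk in enumerate(q)))

# Convert common-log terms q to natural-log terms p: p_k = q_k / ln(10)^(k-1)
q = [-0.7, 0.1, 0.02]            # hypothetical common-log spectral terms
p = [qk / math.log(10) ** k for k, qk in enumerate(q)]
I0, f0, f = 1.5, 150e6, 120e6    # hypothetical flux (Jy) and frequencies (Hz)
print(abs(flux_natural(I0, p, f, f0) - flux_common(I0, q, f, f0)) < 1e-9)  # True
```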
Optionally: make sure your machine has one or two working NVIDIA GPU cards (or Intel Xeon Phi MICs) to use sagecal with GPU acceleration. Recommended usage (with GPUs):
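A minimal sketch of an invocation, assuming `-d`, `-s` and `-c` name the MS, sky model and cluster file (check `sagecal -h` for the exact flags and defaults):

```
sagecal -d my.MS -s skymodel.txt -c clusters.txt -t 60 -p solutions.txt -E 1
```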
Choose your solution interval (e.g. `-t 60`) so that it is big enough to get a decent solution but not so big that the parameters vary too much (about 20 minutes per solution is reasonable).
Note: It is also possible to calibrate more than one MS together; see section 4 below. Note: To fully use GPU acceleration, use the `-E 1` option.
Simulations: With `-a 1`, only a simulation of the sky model is done. With `-a 1` and `-p 'solutions_file'`, the simulation is done with the sky model corrupted by the solutions in 'solutions_file'. With `-a 1`, `-p 'solutions_file'` and `-z 'ignore_file'`, the simulation uses the solutions in 'solutions_file' but ignores the cluster ids listed in 'ignore_file'. E.g., if you need to ignore cluster ids '1', '10' and '999', create a text file:
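```
1
10
999
```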
and use it as the 'ignore_file'.
Use mpirun to run sagecal-mpi, for example:
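A sketch assuming OpenMPI-style mpirun flags (`-np`, `-machinefile`); the sagecal-mpi options are described below:

```
mpirun -np 11 -machinefile ./machines sagecal-mpi -f 'MS*pattern' -A 30 -P 2 -r 5
```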
Specific options:

- `-np 11` : 11 processes: starts 10 slaves + 1 master.
- `./machines` : a file listing the host names of the 11 (or fewer) nodes used (the 1st name is the master, normally the node where you invoke mpirun).
- `-f 'MS*pattern'` : search for MS names that match this pattern and calibrate all of them together. The total number of MS being calibrated can be higher than the actual number of slaves (multiplexing).
- `-A 30` : 30 ADMM iterations.
- `-P 2` : the polynomial in frequency has 2 terms.
- `-Q` : changes the type of polynomial used (e.g. `-Q 2` gives Bernstein polynomials).
- `-r 5` : regularization factor is 5.0.
- `-G textfile` : each cluster can have a different regularization factor, instead of using the `-r` option when the regularization is the same for all clusters.

MPI specific options:

- `/scratch/users/sarod` : this is where MPI stores temp files (the default is probably `/tmp`).
- `--mca ...` : various options to tune the networking and scheduling.
Note: the number of slaves (`-np` option) can be lower than the number of MS being calibrated; the program will divide the workload among the available slaves.
The rest of the options are similar to sagecal.
All SAGECal solutions are stored as text files. Lines starting with '#' are comments. The first non-comment line includes some general information, i.e. `freq(MHz) bandwidth(MHz) time_interval(min) stations clusters effective_clusters`.
The remaining lines contain the solutions for each cluster as a separate column; the first column is just a counter. Let's say there are K effective clusters and N stations. Then there will be K+1 columns; the first column starts from 0 and increases to 8N-1, which can be used to count the row number, and this repeats for each time interval. Rows 0 to 7 belong to the solutions for the 1st station, rows 8 to 15 to the 2nd station, and so on. Each 8 rows of any given column represent the 8 values of a 2x2 Jones matrix. Let's say these are S0,S1,S2,S3,S4,S5,S6 and S7. Then the Jones matrix is [S0+j*S1, S4+j*S5; S2+j*S3, S6+j*S7] (the ';' denotes the end of the 1st row of the 2x2 matrix).
When a cluster has a chunk size > 1, there will be more than one solution per time interval, so this cluster will have more than one column in the solution file; the exact number of columns equals the chunk size.
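The mapping from 8 consecutive rows of a column to a Jones matrix can be sketched as follows (a standalone illustration, not part of sagecal):

```python
def jones_from_solutions(s):
    """Build the 2x2 complex Jones matrix from the 8 real values S0..S7
    taken from 8 consecutive rows of one column of a solutions file:
    [S0+j*S1, S4+j*S5; S2+j*S3, S6+j*S7]."""
    return [[complex(s[0], s[1]), complex(s[4], s[5])],
            [complex(s[2], s[3]), complex(s[6], s[7])]]

# A unit-gain, leakage-free solution corresponds to the identity matrix:
J = jones_from_solutions([1, 0, 0, 0, 0, 0, 1, 0])
print(J)  # [[(1+0j), 0j], [0j, (1+0j)]]
```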