Crombie Tools
Describes the layout of an analysis workspace created by crombie workspace.
A workspace is where all analysis-specific configurations are placed. You can create a workspace named Workspace as follows.
mkdir Workspace
cd Workspace
crombie workspace
The new workspace is created from the CrombieTools/templates directory and has the following layout.
Workspace/
|-- docs
|   |-- Workspace.tex
|   |-- download.sh
|   |-- figs
|   `-- presentation.tex
|-- plotter
|   |-- CrombiePlotterConfig.sh
|   |-- MCConfig.txt
|   `-- cuts.py
`-- slimmer
    |-- CrombieSlimmingConfig.sh
    |-- FlatSkimmer.sh
    |-- JobScriptList.txt
    `-- runSlimmer.py
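If you want to confirm that a workspace still has the template files the rest of this page relies on, a small check like the one below can help. This is a hypothetical helper, not part of CrombieTools: the file list is copied by hand from the tree above, and it only checks plotter/ and slimmer/, since the name of the .tex file in docs/ may follow the workspace name.

```python
from pathlib import Path

# Template files expected in plotter/ and slimmer/, copied by hand from the
# layout above (this list is an assumption, not read from CrombieTools).
EXPECTED_FILES = {
    "plotter": ["CrombiePlotterConfig.sh", "MCConfig.txt", "cuts.py"],
    "slimmer": ["CrombieSlimmingConfig.sh", "FlatSkimmer.sh",
                "JobScriptList.txt", "runSlimmer.py"],
}


def missing_template_files(workspace):
    """Return 'subdir/name' for every expected file missing under workspace."""
    root = Path(workspace)
    return [
        f"{subdir}/{name}"
        for subdir, names in EXPECTED_FILES.items()
        for name in names
        if not (root / subdir / name).exists()
    ]
```

Calling missing_template_files("Workspace") on a freshly created workspace should return an empty list if the templates match the layout shown here.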
Each subdirectory has a distinct function, described below.
The first thing often needed in an analysis is slimming files into flat trees. This layout assumes two essential steps.
1. Changing the file format. The user can go from large, inclusive ntuples to something small and flat. Crombie Tools currently supports two ways to do this: submitting batch jobs through LXBATCH to run on files on EOS, or running on files interactively in a terminal. So that this step needs to be run less often, users are encouraged not to skim events heavily here, since that can be done in the next step.
2. Skimming events from trees. This can be done with a simple cut string, as well as with a Good Run list.
To understand how to do these steps, look inside the file slimmer/CrombieSlimmingConfig.sh. The variables all start with Crombie so that their names do not overlap with any environment variables set by other tools the user may have. The meaning of each variable is listed below.
These are the environment variables used in the slimming and skimming of an analysis.
CrombieFilesPerJob | This specifies the number of files on EOS each LXBATCH job will read. Keep this constant between resubmissions, because it directly determines which files are run together. The other variables, such as those setting the queue and number of cores, can be changed between resubmissions without affecting which files are grouped. |
CrombieQueue | Specifies the LXBATCH queue that jobs are submitted to. Examples are 8nm, 1nh, 8nh, 1nd, 2nd, 1nw, 2nw, and 2nw4cores. |
CrombieNLocalProcs | The number of processors used for local slimming, skimming, plotting, etc. By default, all available processors are used, which may not always be desirable. |
CrombieFileBase | The name of each flat file you work with starts with <base>_, and that base is set here so that your LXBATCH jobs produce correctly named files. These prefixes are useful for recording what each ntuple was previously used for. |
CrombieEosDir | This can be one of two things. The submission tool determines which of the two you have set by checking whether a local file with that name exists. |
CrombieRegDir | Unlike crombie submitlxbatch, crombie terminalslim does not necessarily run on files on EOS. This variable sets the folder in which to look for datasets. Note that it only takes a relative directory, not a list of directories. |
CrombieTempDir | The location where the direct LXBATCH output is stored. The tool checks this directory when determining which files have and have not been successfully created. |
CrombieFullDir | This is simply the location of the hadded LXBATCH output; that is, all the files of the same dataset are combined here. This directory also holds a list of the original locations of the datasets on EOS, so that changes in dataset location can be detected between separate runs. |
CrombieSkimDir | This is the location of the flat trees after they have been run through a good runs skim and any other added cuts. This variable is not required, but it is used in the FlatSkimmer.sh template. |
CrombieDirList | If left blank, all of the datasets in the CrombieEosDir or CrombieRegDir will be run on. Otherwise, this variable should name a local .txt file that has a list of datasets that you want to run on. |
CrombieSlimmerScript | This names the script that the LXBATCH job will run. Make sure that the script is executable. If no arguments are passed, make sure the script compiles everything that it needs (with `LoadMacro('..+')` for example). |
CrombieJobScriptList | This variable names a local .txt file that lists the relative paths of all files that must be copied to the LXBATCH node for the job to complete. This will often include needed macros and headers. All of these files must be in the slimmer subdirectory; full path names or paths using .. will not work. |
CrombieCheckerScript | Names a script that checks the output of each file processed in the LXBATCH job. The script should return a non-zero exit code if there is a problem with the output file, and exit code 5 for fatal errors, which abort the job. The job will also abort if the script returns any non-zero exit code for the hadded output of the LXBATCH job. These errors are reported in a local file, LxbatchFileChecks.log. |
CrombieGoodRuns | Like CrombieSkimDir, this variable is only used by the FlatSkimmer.sh template, so it is optional. It names the location of the good runs JSON file for the skimmer to use. |
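To see why CrombieFilesPerJob has to stay fixed between resubmissions, here is a sketch of splitting a file list into jobs. It assumes, hypothetically, that the file list is sorted and cut into consecutive chunks; the actual CrombieTools grouping logic may differ, but any scheme of this kind changes which files share a job when the chunk size changes.

```python
def group_files(file_names, files_per_job):
    """Split file_names into consecutive groups of at most files_per_job files.

    Sorting first makes the grouping independent of the order in which the
    files were listed (an assumption made for this sketch).
    """
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    ordered = sorted(file_names)
    return [ordered[i:i + files_per_job]
            for i in range(0, len(ordered), files_per_job)]
```

With five files, group_files(files, 2)[1] and group_files(files, 3)[1] contain different files, so resubmitting "job 1" with a changed value would read a different set of inputs than the original job did.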
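The exit-code convention for CrombieCheckerScript can be sketched as a small Python script. The source does not say how the checker is invoked, so this assumes, hypothetically, that it receives the path of one output file as its only argument. The checks themselves (missing file is fatal, empty file is a problem) are placeholders; a real checker would more likely open the file with ROOT and inspect the tree.

```python
import sys
from pathlib import Path

OK = 0
BAD_OUTPUT = 1  # any non-zero code flags a problem with this output file
FATAL = 5       # exit code 5 is the "fatal, abort the job" code described above


def check_output(path):
    """Return an exit code for one output file (placeholder checks only)."""
    p = Path(path)
    if not p.exists():
        return FATAL       # hypothetical choice: a missing file aborts the job
    if p.stat().st_size == 0:
        return BAD_OUTPUT  # an empty file is reported as a problem
    return OK


# Only act as a command-line checker when given exactly one argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(check_output(sys.argv[1]))
```

Keeping the logic in check_output() separate from the command-line entry point makes it easy to test the checks without submitting a job.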
This tool is now deprecated; see the documentation of the new tool.
There is a tool for generating flat tree classes for the user. The variables that you want to include in a flat tree can be specified in OutTree.txt, or whatever you rename it to. The format of each entry in the configuration file is <branchName>/<type>=<default>, where <branchName> is simply the name of the branch. Valid entries for <type> are the following:
<type> | C++ type |
F | float |
I | int |
i | unsigned int |
L | long |
l | unsigned long |
O | bool |
Any other type is assumed to be a TObject, and the header file for that TObject is included automatically. This feature has not been tested extensively, since I don't use it.
You can also prefix a type with V for (pointers to) vectors of that type. After writing a tree configuration file named OutTree.txt, just run
crombie oldmaketree OutTree
This makes a class that contains your tree. This is done automatically for you in the default runSlimmer.py template. You can access each branch of the tree through a public member with the same name as the <branchName>, and fill the whole tree with the function Fill() at the end of each event. If you do not set a value for a branch in a particular event, the branch will be filled with <default>. You can then write the tree to a file via WriteToFile(*TFile,"<WhatYouWantToCallTree>"). There are also other overloaded write and creation functions that use a file name you specify in the constructor. See slimmer.cc for an example of how to write a flat tree using this class.
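To make the OutTree.txt format and the fill-with-default behaviour concrete, here is a pure-Python sketch. It is not the generated class: parse_branch() and FlatTreeSketch are hypothetical names, defaults are kept as text, and representing V types as std::vector<T>* is an assumption based on the "(pointers to) vectors" wording above. Note that a TObject class name beginning with V would be misread as a vector by this simple parser.

```python
# <type> codes from the table above, mapped to C++ type names.
TYPE_CODES = {
    "F": "float",
    "I": "int",
    "i": "unsigned int",
    "L": "long",
    "l": "unsigned long",
    "O": "bool",
}


def parse_branch(line):
    """Parse '<branchName>/<type>=<default>' into (name, cpp_type, default)."""
    name, rest = line.strip().split("/", 1)
    type_code, _, default = rest.partition("=")
    is_vector = type_code.startswith("V")
    base = type_code[1:] if is_vector else type_code
    cpp_type = TYPE_CODES.get(base, base)  # unknown codes: treated as a TObject class
    if is_vector:
        cpp_type = f"std::vector<{cpp_type}>*"
    return name, cpp_type, default


class FlatTreeSketch:
    """Stand-in for the generated class: one attribute per branch.

    fill() mirrors Fill(): it stores the current values as a row and then
    resets every branch to its default, so unset branches get <default>.
    """

    def __init__(self, config_lines):
        self._defaults = {}
        for line in config_lines:
            if line.strip():
                name, _, default = parse_branch(line)
                self._defaults[name] = default
        self.rows = []
        self._reset()

    def _reset(self):
        for name, default in self._defaults.items():
            setattr(self, name, default)

    def fill(self):
        self.rows.append({name: getattr(self, name) for name in self._defaults})
        self._reset()
```

For example, with the hypothetical config ["met/F=-5", "nJets/I=0"], setting t.met and calling t.fill() twice records the set value in the first row and the default "-5" in the second.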
The environment variables used for skimming the flat trees afterwards are actually optional if you edit slimmer/FlatSkimmer.sh accordingly. The user is encouraged to edit FlatSkimmer.sh, which makes use of a flexible tool, crombie skim. Note that crombie skim is not included in the command line references because using it interactively is not recommended; it is much more efficient and stable to write a script like FlatSkimmer.sh. crombie skim takes files from an input directory, skims them, and places them in a separate directory. The help message below shows how it can be customized.
usage: crombie skim [-h] [--numproc NUM] [--indir DIR] [--outdir DIR]
                    [--json FILE] [--cut CUT] [--tree NAME]
                    [--copy [NAMES [NAMES ...]]] [--run EXPR] [--lumi EXPR]
                    [--freq NUM] [--filters [FILE [FILE ...]]] [--duplicate]

Slims the contents of one directory into another one

optional arguments:
  -h, --help            show this help message and exit
  --numproc NUM, -n NUM
                        Number of processes that FlatSkimmer will spawn.
  --indir DIR, -i DIR   Directory that contains input files to be slimmed.
  --outdir DIR, -o DIR  Directory where slimmed stuff will be placed.
  --json FILE, -j FILE  Good runs json file location to be used.
  --cut CUT, -c CUT     Cut used in slimming.
  --tree NAME, -t NAME  Name of tree that will be slimmed.
  --copy [NAMES [NAMES ...]]
                        List other object names to copy into the slimmed file.
  --run EXPR, -r EXPR   Set the expression for Run Number.
  --lumi EXPR, -l EXPR  Set the expression for Lumi Number.
  --freq NUM, -f NUM    Set the reporting frequency.
  --filters [FILE [FILE ...]], -e [FILE [FILE ...]]
                        Set the filter files.
  --duplicate, -d       Turn on duplicate checking.
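If you prefer to drive the skim from Python rather than editing the shell script, the command can be assembled from the same environment variables. This is a sketch under the assumption that the skim reads from CrombieFullDir and writes to CrombieSkimDir, as the variable descriptions above suggest; build_skim_command() is a hypothetical helper, and it only uses flags shown in the help message.

```python
import os


def build_skim_command(tree, cut="", env=None):
    """Build a 'crombie skim' argument list from the Crombie* variables.

    Reads CrombieFullDir (input) and CrombieSkimDir (output), and adds
    CrombieNLocalProcs and CrombieGoodRuns only when they are set.
    """
    env = os.environ if env is None else env
    cmd = [
        "crombie", "skim",
        "--indir", env["CrombieFullDir"],
        "--outdir", env["CrombieSkimDir"],
        "--tree", tree,
    ]
    if env.get("CrombieNLocalProcs"):
        cmd += ["--numproc", env["CrombieNLocalProcs"]]
    if env.get("CrombieGoodRuns"):
        cmd += ["--json", env["CrombieGoodRuns"]]
    if cut:
        cmd += ["--cut", cut]
    return cmd
```

You could then run it with subprocess.run(build_skim_command("events", cut="met > 200"), check=True), where the tree name and cut are placeholders for your own. Passing a list instead of a single shell string avoids quoting problems with cut strings that contain spaces or >.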
After running FlatSkimmer.sh, you should have your small ntuples ready to work with.
The next subdirectory of a workspace is the plotting directory, plotter. It comes with its own list of environment variables.
CrombieMCConfig | This names the file that will be read to set the backgrounds for plots, limit trees, datacards, etc. See the MC Configuration instructions below for details on how to set up this background configuration. |
CrombieSignalConfig | This variable names a file like CrombieMCConfig, but the files listed in it are assumed to be signal files, not background files. |
CrombieExcept_* | Here * is replaced by the name of a region, and the variable points to a file that designates replacements to the background configuration for that region. See the instructions for Adjustment Configuration below to set up this file correctly. |
CrombieLuminosity | This variable just gives the luminosity used to make plots and limit workspaces. |
CrombieInFilesDir | This variable names the directory containing the ntuple files that are being used for plotting and tree making. It should usually match the latest value you had for CrombieSkimDir. |
CrombieOutPlotDir | This variable names the directory where an automatically configured PlotStack object will place all of the plots. |
CrombieOutLimitTreeDir | This variable names the directory where an automatically configured LimitTreeMaker object will place all of the tree files. |
CrombieCutsFile | This variable gives the name of the Python file to be loaded as CrombieTools.LoadConfig.cuts when the user imports CrombieTools.LoadConfig in a working directory with this environment variable set. |
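The CrombieExcept_* naming scheme can be read back from the environment with a few lines of Python. This sketch is illustrative only: region_adjustments() and mc_configs_for_region() are hypothetical helpers, not CrombieTools functions. The second one returns the base CrombieMCConfig followed by the region's adjustment file, matching the "read the base config, then the adjusting one" order described in the MC configuration instructions.

```python
import os

EXCEPT_PREFIX = "CrombieExcept_"


def region_adjustments(env=None):
    """Map region name -> adjustment file from CrombieExcept_<region> variables."""
    env = os.environ if env is None else env
    return {
        key[len(EXCEPT_PREFIX):]: value
        for key, value in env.items()
        if key.startswith(EXCEPT_PREFIX) and len(key) > len(EXCEPT_PREFIX)
    }


def mc_configs_for_region(region, env=None):
    """Config files to read, in order: CrombieMCConfig, then the region's
    adjustment file if a CrombieExcept_<region> variable is set."""
    env = os.environ if env is None else env
    files = [env["CrombieMCConfig"]]
    adjustment = region_adjustments(env).get(region)
    if adjustment:
        files.append(adjustment)
    return files
```

With the hypothetical setting CrombieExcept_signal=SignalRegion.txt, the "signal" region reads MCConfig.txt and then SignalRegion.txt, while every other region reads only the base file.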
Each analysis will probably use multiple MC samples. You can keep track of all of them with one simple MC Config.
You will generally have one main configuration file listing most of your background samples. Signal samples should be kept in a separate configuration file, since each configuration file is marked as either signal or background when it is read. Each sample should be contained in a single .root file. The MC Config keeps track of these files, one row per file. The elements of each row should be in this order:
<LimitTreeName> <FileName> <CrossSection> <LegendEntry> <FillColorOrLineStyle>
The elements are space delimited.
Limit Tree Name | This is the base name of the tree that LimitTreeMaker will make for this file. The name should be unique for each file if you use LimitTreeMaker. For other analyses, this can instead be used only to differentiate signal and background. In that case, putting . in this column copies the value from the previous line. |
File Name | This is the name of the file for the given sample. The file name does not need to be absolute, as the input directory is set in FileConfigReader::SetInDirectory(), usually by reading the environment configuration. |
Cross Section | This should be the cross section of the sample in pb. |
Legend Entry | This is the legend entry used for the given file in all of the stack plots made with this config. If you want spaces in your legend entry, use _ instead (since the elements of the config are space delimited); these are all replaced with spaces by FileConfigReader::ReadMCConfig(). Repeating the same legend entry on consecutive lines merges those files into the same stack element. As a shortcut, you can put . as the Legend Entry to reuse the legend entry from the previous line. |
Fill Color or Line Style | For background MC, this specifies the fill color, using the Color_t enums from ROOT. If you wish to give a custom RGB color, just make this entry For signal samples, this entry should give the linestyle you wish to use for the sample. |
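The row format and the shortcuts above can be illustrated with a small parser. This is a sketch, not FileConfigReader::ReadMCConfig(): parse_mc_config() and stack_elements() are hypothetical names, skip/INGROUP/ENDGROUP lines are not handled, and applying "." to the color column is an assumption based on the INGROUP example further down this page. The stacking helper uses itertools.groupby, which groups only consecutive equal keys, matching the "repeated next to each other" rule.

```python
from itertools import groupby


def parse_mc_config(lines):
    """Parse '<LimitTreeName> <FileName> <CrossSection> <LegendEntry> <FillColorOrLineStyle>' rows.

    '.' in the tree-name, legend or color column repeats the previous row's
    value, and '_' in a legend entry becomes a space.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 space-delimited fields: {line!r}")
        tree, file_name, xsec, legend, color = fields
        if entries:
            prev = entries[-1]
            tree = prev["tree"] if tree == "." else tree
            legend = prev["legend"] if legend == "." else legend
            color = prev["color"] if color == "." else color
        entries.append({
            "tree": tree,
            "file": file_name,
            "xsec_pb": float(xsec),             # cross section in pb
            "legend": legend.replace("_", " "),
            "color": color,                     # Color_t value or line style, kept as text
        })
    return entries


def stack_elements(entries):
    """Group consecutive rows with the same legend entry into one stack element."""
    return [(legend, [e["file"] for e in group])
            for legend, group in groupby(entries, key=lambda e: e["legend"])]
```

For instance, a row "ZJets zjets_a.root 10.0 Z_to_nunu 800" followed by ". zjets_b.root 5.0 . ." (hypothetical samples) yields two entries that share one stack element labelled "Z to nunu".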
To avoid duplicate entries in multiple configurations, there is an easy way to swap out MC samples for different ones while keeping the rest of the samples the same. If a line starts with the keyword skip instead of a tree name and then lists a file, the MCReader will erase the MCFileInfo for that file. Such a line simply contains:
skip <FileName>
A configuration file with lines like this can also contain lines like those in the base configuration, which makes it easy to swap out files. After reading the base config, just read the adjusting configuration before making limit trees or plotting.
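Here is a sketch of how an adjusting configuration could be applied on top of a base one. It is a hypothetical helper that works on raw five-field rows, not the MCReader itself; in particular, appending new rows at the end (which affects stack order) is an assumption, since the source does not say where replacement samples are placed.

```python
def apply_adjustment(base_rows, adjust_lines):
    """Apply an adjusting config to base rows (lists of 5 fields).

    'skip <FileName>' removes the row for that file; any other non-blank
    line is appended as a new row (placement at the end is an assumption).
    The base rows are left unmodified.
    """
    rows = [list(r) for r in base_rows]
    for line in adjust_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "skip":
            if len(fields) != 2:
                raise ValueError(f"expected 'skip <FileName>': {line!r}")
            rows = [r for r in rows if r[1] != fields[1]]
        else:
            rows.append(fields)
    return rows
```

For example, an adjusting file containing "skip zjets_lo.root" and a new ZJets row pointing at zjets_nlo.root (hypothetical names) swaps one sample for the other while every other background stays as configured.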
MC samples that are created using different generators, hadronizers, etc. can easily be merged together without changing the configured cross section. The different samples are weighted in such a way as to minimize the total statistical uncertainty.
To merge multiple samples, start the group with a line containing only:
INGROUP
Separate each set of samples with this same delimiter. After the last set of samples to be merged, place the line:
ENDGROUP
For example, if you want to merge three types of samples in your plots, your MC Config would look like this:
INGROUP
example type0_file0.root 0.5 LegendEntry 600
. type0_file1.root 0.5 . .
INGROUP
. type1_file0.root 0.2 . .
. type1_file1.root 0.6 . .
. type1_file2.root 0.2 . .
INGROUP
. type2_file0.root 1.0 . .
ENDGROUP
This merges three different samples of a process with a cross section of 1.0 pb in such a way that their total plotted cross section is 1.0 pb and their statistical uncertainty is minimized.
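The exact weighting used by CrombieTools is not spelled out here. One standard way to combine several independent estimates of the same quantity with minimal statistical uncertainty is inverse-variance weighting, sketched below for a single histogram bin; treat it as a swapped-in illustration of the idea rather than the tool's implementation. Because the weights sum to 1, groups that each describe the same 1.0 pb process still add up to 1.0 pb. For unweighted MC, each group's variance scales roughly like 1/N for N generated events, so this roughly amounts to weighting groups by their event counts.

```python
def combine_estimates(estimates):
    """Inverse-variance weighted combination of (value, variance) pairs.

    Returns (combined_value, combined_variance, weights).  The weights sum
    to 1, and the combined variance is never larger than the smallest
    input variance.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")
    inverse = [1.0 / var for _, var in estimates]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    value = sum(w * v for w, (v, _) in zip(weights, estimates))
    return value, 1.0 / total, weights
```

Applied bin by bin, each group (for example, each INGROUP block) supplies its own yield and variance for that bin.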
Other directories can of course be added by hand. There are still ways to source the old configuration files if you need them, and all of the command line and Python tools are still available. Just make sure that if you change the configuration in a separate directory, the changes are also reflected in your miscellaneous directory, since an analysis should be as tightly coupled as possible.