1 # Analysis Workspace {#workspace}
3 @brief Describes the layout of an analysis workspace created by `crombie workspace`.
5 A workspace is where all analysis-specific configurations are placed.
6 You can create a workspace with the name `Workspace` like the following.
12 A new workspace will be created from the `CrombieTools/templates` directory
13 and have the following layout.
20 | `-- presentation.tex
22 | |-- CrombiePlotterConfig.sh
26 |-- CrombieSlimmingConfig.sh
(Note to self: this listing was generated using `tree Workspace/ --charset ASCII` after making a new workspace.)
Each subdirectory has a distinct function, described below.
39 # Slimming {#slimming}
41 The first thing that will often be needed in an analysis is slimming
42 files into flat trees.
43 There are two essential steps that this layout assumes you need.
47 <li> <strong>Changing the file format</strong>
48 The user can go from some large, inclusive ntuples to something small and flat.
Crombie Tools currently supports two methods to do this.
One is to submit LXBATCH jobs that run on files on EOS.
51 The other method is to run on files interactively in a terminal.
52 To allow the user to run this step less often, they are encouraged to
53 not perform too much skimming of events in this step,
as that can be performed in the next step.
56 <li> <strong>Skimming events from trees</strong>
57 This can be done with a simple cut string, as well as using a Good Run list.
61 To understand how to do these steps, take a look inside of the file `slimmer/CrombieSlimmingConfig.sh`.
62 The variables all start with `Crombie` to ensure that the names do not overlap with any
63 environment variables that may be set by other tools used by the user.
The meaning of each variable is listed below.
66 # Environment Variables {#envconfig}
68 These are the environment variables used in the slimming and skimming of an analysis.
70 <table cellpadding=20>
72 <td align="left" valign="top">
73 <code>CrombieFilesPerJob</code>
76 This specifies the number of files on EOS each LXBATCH job will read.
Keep this constant between resubmissions, because it directly determines which files each job reads.
The other variables, which set up the queue and the number of cores, do not affect this.
83 <td align="left" valign="top">
84 <code>CrombieQueue</code>
Specifies the LXBATCH queue that jobs will be submitted to.
88 Examples are `8nm`, `1nh`, `8nh`, `1nd`, `2nd`, `1nw`, `2nw`, and `2nw4cores`.
92 <td align="left" valign="top">
93 <code>CrombieNLocalProcs</code>
96 The number of processors used for local slimming, skimming, plotting, etc.
97 The default number uses all available processors, which is perhaps not always desired.
101 <td align="left" valign="top">
102 <code>CrombieFileBase</code>
The name of each flat file you work with starts with `<base>_`, and that base is
set here so that your LXBATCH jobs produce the correct names.
These prefixes are useful for knowing what each ntuple was used for previously.
111 <td align="left" valign="top">
112 <code>CrombieEosDir</code>
115 This can be one of two things.
117 - A directory on EOS where you will look for all datasets.
118 - A local .txt file which has a list of directories to look for datasets.
The submission tool determines which of the two you set
by checking whether a local file with that name exists.
125 <td align="left" valign="top">
126 <code>CrombieRegDir</code>
Unlike `crombie submitlxbatch`, `crombie terminalslim` does not necessarily run on EOS.
This variable sets the folder it looks in for datasets.
131 Note that this only takes a relative directory, not a list of directories.
135 <td align="left" valign="top">
136 <code>CrombieTempDir</code>
139 A location to store the direct LXBATCH output.
140 This is the directory that will be checked when the tool is trying to determine
141 what files have and have not been successfully created.
145 <td align="left" valign="top">
146 <code>CrombieFullDir</code>
This is simply the location of the hadded LXBATCH output.
150 That is, all the files of the same dataset will be combined.
151 This directory will also hold a list of the original locations of the datasets
152 on EOS so that differences in dataset location can be detected for separate runs.
156 <td align="left" valign="top">
157 <code>CrombieSkimDir</code>
160 This is the location of the flat trees run through a good runs skim as well
161 as any other cuts added.
162 This is not a necessary variable, but is used in the template of `FlatSkimmer.sh`.
166 <td align="left" valign="top">
167 <code>CrombieDirList</code>
170 If left blank, all of the datasets in the `CrombieEosDir` or `CrombieRegDir` will be run on.
Otherwise, this variable should name a local .txt file that has a list of the datasets that should be run on.
176 <td align="left" valign="top">
177 <code>CrombieSlimmerScript</code>
180 This names the script that the LXBATCH job will run.
181 Make sure that the script is executable (`chmod +x`),
182 and that it takes two arguments:
184 - The input file name
185 - The output file name
187 If no arguments are passed, make sure the script compiles everything
188 that it needs (with `LoadMacro('..+')` for example).
192 <td align="left" valign="top">
193 <code>CrombieJobScriptList</code>
196 This variable names a local .txt file that names the relative paths of all files that should be copied
197 to the LXBATCH node for the job to be completed.
198 This will often include macros and headers needed.
199 All of these files must be in the `slimmer` subdirectory.
200 Full path names or using `..` will not work.
204 <td align="left" valign="top">
205 <code>CrombieCheckerScript</code>
208 Names a script that checks the output of each file run on in the LXBATCH job.
209 The script should return a non-zero exit code if there is a problem with the output file.
210 The script should return exit code 5 for fatal errors that will abort the job.
211 The job will also abort for any non-zero exit code for the hadded output of the LXBATCH job.
212 These errors will be reported in a local file, `LxbatchFileChecks.log`.
216 <td align="left" valign="top">
217 <code>CrombieGoodRuns</code>
This variable is also used only by the template of `FlatSkimmer.sh`, so it is optional.
It names the location of the good runs JSON for the skimmer to use.
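The exit-code convention described for `CrombieCheckerScript` can be illustrated with a minimal Python sketch. It assumes the output file path arrives as the first command-line argument, treats a missing file as fatal, and treats an empty file as an ordinary problem. Those choices, and the names `check_output` and `main`, are illustrative assumptions rather than anything Crombie Tools prescribes. A real checker would more likely open the file with ROOT and inspect the tree.

```python
import os

FATAL = 5    # documented: exit code 5 is a fatal error that aborts the job
PROBLEM = 1  # documented: any non-zero code flags a bad output file
OK = 0


def check_output(path):
    """Return an exit code for one output file (hypothetical checks)."""
    if not os.path.isfile(path):
        return FATAL      # assumption: a missing file is treated as fatal
    if os.path.getsize(path) == 0:
        return PROBLEM    # assumption: an empty file is a non-fatal problem
    return OK


def main(argv):
    """Assumption: the file to check is passed as the first argument."""
    if len(argv) != 2:
        return FATAL
    return check_output(argv[1])

# In an executable script this would end with:
#     import sys; sys.exit(main(sys.argv))
```

Returning the code from `main` instead of exiting inside it keeps the checks easy to exercise by hand before submitting jobs.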
226 # Generating flat trees for output {#flattrees}
This is now deprecated; see [here](@ref maketree) for documentation of the new tool.
230 There is a tool for generating flat tree classes for the user.
231 The variables that you want to include in a flat tree can be specified in `OutTree.txt` or whatever you rename it to.
232 The format of the configuration file is `<branchName>/<type>=<default>`.
233 `<branchName>` should be easy to understand.
234 Valid entries for `<type>` are the following:
237 <tr><td>`F`</td><td>float</td></tr>
238 <tr><td>`I`</td><td>int</td></tr>
239 <tr><td>`i`</td><td>unsigned int</td></tr>
240 <tr><td>`L`</td><td>long</td></tr>
241 <tr><td>`l`</td><td>unsigned long</td></tr>
242 <tr><td>`O`</td><td>bool</td></tr>
245 Any other types will be assumed to be TObjects.
246 The header file for the listed TObject will be included automatically.
247 This feature is not tested extensively since I don't use it.
You can also prefix a type with `V` for (pointers to) vectors of these types.
250 After writing a tree configuration file with name `OutTree.txt`, just run

    crombie oldmaketree OutTree

254 This makes a class that contains your tree.
255 This is done automatically for you in the default `runSlimmer.py` template.
You can access each branch of the tree via a public member with the same name as `<branchName>`
and fill the whole tree with the function `Fill()` at the end of each event.
258 If you do not set a value for a particular event, the branch will be filled with `<default>`.
259 You can then write the tree to a file via `WriteToFile(*TFile,"<WhatYouWantToCallTree>")`.
There are also other overloaded write and creation functions that use a file name you specify with the initializer.
261 See `slimmer.cc` for an example of how to write a flat tree using this class.
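To make the `<branchName>/<type>=<default>` format concrete, here is a minimal Python sketch that splits one configuration line into its parts. It is not the code `crombie oldmaketree` uses. The function name `parse_branch`, the treatment of a missing `=<default>` as an empty default, and the restriction of the `V` prefix to the basic types in the table are all assumptions made for illustration.

```python
# Type codes from the table above, mapped to the types it lists.
BASIC_TYPES = {
    "F": "float",
    "I": "int",
    "i": "unsigned int",
    "L": "long",
    "l": "unsigned long",
    "O": "bool",
}


def parse_branch(line):
    """Split '<branchName>/<type>=<default>' into a small dictionary."""
    name, rest = line.strip().split("/", 1)
    type_code, _, default = rest.partition("=")
    # Assumption: 'V' marks a vector only when followed by a basic type code.
    is_vector = type_code.startswith("V") and type_code[1:] in BASIC_TYPES
    base = type_code[1:] if is_vector else type_code
    # Anything not in the table is taken to be a TObject class name.
    cpp_type = BASIC_TYPES.get(base, base)
    return {"name": name, "type": cpp_type,
            "vector": is_vector, "default": default}
```

For a hypothetical line such as `met/F=-5`, this yields the name `met`, the type `float`, and the default `-5`.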
265 The environment variables used for skimming the flat trees afterwards are
266 actually optional, if you edit the file `slimmer/FlatSkimmer.sh` to accommodate that.
267 The user is encouraged to edit `FlatSkimmer.sh`,
268 which makes use of a flexible tool, `crombie skim`.
269 Note, `crombie skim` is not included in the [command line references](bin/README.md)
270 because it is not recommended that this command is used interactively.
271 It is much more efficient and stable to create a script like `FlatSkimmer.sh`.
272 `crombie skim` takes files from an input directory, skims them, and places them in
273 a separate directory.
274 Here is the help message to help you understand how to customize this.

    usage: crombie skim [-h] [--numproc NUM] [--indir DIR] [--outdir DIR]
                        [--json FILE] [--cut CUT] [--tree NAME]
                        [--copy [NAMES [NAMES ...]]] [--run EXPR] [--lumi EXPR]
                        [--freq NUM] [--filters [FILE [FILE ...]]] [--duplicate]

    Slims the contents of one directory into another one

      -h, --help            show this help message and exit
      --numproc NUM, -n NUM
                            Number of processes that FlatSkimmer will spawn.
      --indir DIR, -i DIR   Directory that contains input files to be slimmed.
      --outdir DIR, -o DIR  Directory where slimmed stuff will be placed.
      --json FILE, -j FILE  Good runs json file location to be used.
      --cut CUT, -c CUT     Cut used in slimming.
      --tree NAME, -t NAME  Name of tree that will be slimmed.
      --copy [NAMES [NAMES ...]]
                            List other object names to copy into the slimmed file.
      --run EXPR, -r EXPR   Set the expression for Run Number.
      --lumi EXPR, -l EXPR  Set the expression for Lumi Number.
      --freq NUM, -f NUM    Set the reporting frequency.
      --filters [FILE [FILE ...]], -e [FILE [FILE ...]]
                            Set the filter files.
      --duplicate, -d       Turn on duplicate checking.

301 After running `FlatSkimmer.sh`, you should have your small ntuples ready to work with.
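As an illustration of how the skimming variables could feed `crombie skim`, the following Python sketch assembles an argument list from the environment. The flags are taken from the help message above. The mapping of `CrombieFullDir` to `--indir` and `CrombieSkimDir` to `--outdir`, as well as the function name `skim_command`, are assumptions, and the template `FlatSkimmer.sh` may do this differently.

```python
import os


def skim_command(env=None, cut=None):
    """Build an argument list for `crombie skim` from Crombie* variables."""
    env = os.environ if env is None else env
    cmd = ["crombie", "skim",
           "--indir", env["CrombieFullDir"],    # assumed input: hadded output
           "--outdir", env["CrombieSkimDir"]]   # assumed output: skimmed trees
    # Optional settings are added only when set and non-empty.
    if env.get("CrombieNLocalProcs"):
        cmd += ["--numproc", env["CrombieNLocalProcs"]]
    if env.get("CrombieGoodRuns"):
        cmd += ["--json", env["CrombieGoodRuns"]]
    if cut:
        cmd += ["--cut", cut]
    return cmd
```

The resulting list can be handed to `subprocess.call`. Keeping it as a list avoids shell quoting problems when the cut string contains spaces.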
303 # Plotting {#plotting}
305 The next subdirectory of a workspace is the plotting directory.
This comes with its own list of environment variables.
308 <table cellpadding=20>
310 <td align="left" valign="top">
311 <code>CrombieMCConfig</code>
314 This names the file that will be read to set the backgrounds for plots,
315 limit trees, datacards, etc.
316 See the [MC Configuration](@ref formatmc) for
details on how to set up this background configuration.
321 <td align="left" valign="top">
322 <code>CrombieSignalConfig</code>
325 This variable names a file like `CrombieMCConfig`,
326 but files listed in the named location are assumed to be
327 signal files, not background files.
331 <td align="left" valign="top">
332 <code>CrombieExcept_*</code>
The `*` in the name of this variable is replaced by the name of a region,
and the variable points to a file that designates replacements to
the background configuration for that region.
338 See the instructions for [Adjustment Configuration](@ref formatmc) below
339 to set up this file correctly.
343 <td align="left" valign="top">
344 <code>CrombieLuminosity</code>
347 This variable just gives the luminosity used to make plots and limit workspaces.
351 <td align="left" valign="top">
352 <code>CrombieInFilesDir</code>
355 This variable names the directory containing the ntuple files that are being used
356 for plotting and tree making.
357 It should usually match the latest value you had for `CrombieSkimDir`.
361 <td align="left" valign="top">
362 <code>CrombieOutPlotDir</code>
365 This variable names the directory where an automatically configured PlotStack object
366 will place all of the plots.
370 <td align="left" valign="top">
371 <code>CrombieOutLimitTreeDir</code>
374 This variable names the directory where an automatically configured LimitTreeMaker object
375 will place all of the tree files.
379 <td align="left" valign="top">
380 <code>CrombieCutsFile</code>
This variable gives the name of the Python file to be loaded as the function `CrombieTools.LoadConfig.cuts`
when the user imports `CrombieTools.LoadConfig` in a working directory with this environment variable set.
389 # Formatting MC Configuration Files {#formatmc}
391 Each analysis will probably make use of multiple MC Samples.
392 You can keep track of those all with one simple MC Config.
394 ## Base Configuration
396 You will generally have one main configuration file with most of your background samples listed.
Signal samples should be kept in a separate configuration file, since the samples in each file are marked as signal or background when it is read.
398 Each sample should be contained in a single `.root` file.
399 The MC Config will keep track of these files, one row at a time.
400 The order of the elements should be this:

    <LimitTreeName> <FileName> <CrossSection> <LegendEntry> <FillColorOrLineStyle>

404 The elements are space delimited.
406 <table cellpadding=5>
408 <td align="left" valign="top" width="15%">
411 <td align="left" valign="top">
412 This is the base of the tree that will be made by LimitTreeMaker for this file.
413 The name should be unique for each file if using LimitTreeMaker.
414 For other analyses, this can instead be used only to differentiate signal and background.
In this case, putting `.` in the config file will copy the value from the previous line.
419 <td align="left" valign="top">
422 <td align="left" valign="top">
423 This is the name of the file for the given sample.
424 The file name does not need to be absolute, as the input directory is set in
425 FileConfigReader::SetInDirectory(), usually by reading
426 the [environment configuration](@ref envconfig).
430 <td align="left" valign="top">
433 <td align="left" valign="top">
434 This should be the cross section of the sample in pb.
438 <td align="left" valign="top">
441 <td align="left" valign="top">
442 This is the legend entry that will be made in all of the stack plots using this config for the given file.
443 If you want to have spaces in your legend entry, place `_` instead (since the elements of the config are space delimited).
444 These are all replaced with spaces by FileConfigReader::ReadMCConfig().
Repeating a legend entry on adjacent lines will cause the corresponding files to merge into the same stack element.
446 A shortcut to using the legend entry in the previous line is to just put `.` as the Legend Entry.
450 <td align="left" valign="top">
451 Fill Color or Line Style
453 <td align="left" valign="top">
454 For background MC, this specifies the fill color, using the Color_t enums from ROOT.
If you wish to give a custom RGB color, make this entry `rgb` and follow it with the red, blue, and green components, space delimited, each out of 255.
456 If the legend entry of this line matches the entry in the previous line, the color is ignored (but must still be in the config).
457 Again, placing a `.` in this case is a useful shortcut.
459 For signal samples, this entry should give the linestyle you wish to use for the sample.
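The row format and the `.` shortcuts described above can be sketched in Python as follows. This is not `FileConfigReader::ReadMCConfig()`, only an illustration of the rules in the table. The function name `read_mc_config` and the dictionary keys are hypothetical, and the trailing color or line style is kept as raw strings (one entry, or `rgb` followed by three components in the order given).

```python
def read_mc_config(lines):
    """Parse base-configuration rows into dictionaries (illustrative only)."""
    entries = []
    prev = None
    for raw in lines:
        fields = raw.split()          # elements are space delimited
        if not fields:
            continue                  # skip blank lines
        if len(fields) < 5:
            raise ValueError("expected at least 5 elements: %r" % raw)
        tree, filename, xsec, legend = fields[:4]
        style = fields[4:]            # e.g. ['600'] or ['rgb', '255', '0', '0']
        if prev is not None:
            # A '.' copies the corresponding value from the previous row.
            if tree == ".":
                tree = prev["tree"]
            if legend == ".":
                legend = prev["legend"]
            if style == ["."]:
                style = prev["style"]
        entry = {
            "tree": tree,
            "file": filename,
            "xsec": float(xsec),                 # cross section in pb
            "legend": legend.replace("_", " "),  # '_' stands in for a space
            "style": style,
        }
        entries.append(entry)
        prev = entry
    return entries
```

Rows that end up with the same legend entry on adjacent lines would then be drawn as a single stack element.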
464 ## Adjustment Configuration
To avoid having duplicate entries in multiple configurations, there is an easy way to switch out MC samples for different ones, while keeping the rest of the samples the same.
467 If a line starts with the keyword `skip` instead of a tree name and then lists a file, the MCReader will erase the MCFileInfo for that file.
468 A line like this simply contains:
472 A configuration file with lines like this can also contain lines like those in the base configuration.
473 This makes it easy to swap out files.
After reading one config, just read the adjustment configuration afterwards, before making limit trees or plotting.
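The effect of reading an adjustment configuration after a base configuration can be sketched in Python. The sketch assumes a `skip` line names a single file in its second column, as described above, and it does not resolve `.` shortcuts. The function name `apply_adjustment` and the sample names in the usage note are hypothetical.

```python
def apply_adjustment(base_rows, adjustment_rows):
    """Drop rows named by 'skip <FileName>' lines, then append new rows."""
    rows = [r.split() for r in base_rows if r.split()]
    for raw in adjustment_rows:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "skip":
            if len(fields) > 1:
                # Remove every row whose file name (second column) matches.
                rows = [r for r in rows if r[1] != fields[1]]
        else:
            # Any other line is treated like a base-configuration row.
            rows.append(fields)
    return rows
```

For example, a base configuration listing `a.root` and `b.root`, adjusted by `skip a.root` plus a row for `a_new.root`, ends up with `b.root` and `a_new.root`.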
476 ## Merging MC Samples
478 MC samples that are created using different generators or hadronizers, etc. can be easily merged together without changing the configured cross section.
The different samples are weighted in such a way as to minimize the total statistical uncertainty.
481 To merge multiple samples, simply start the process by making a lone line.
485 Separate each set of samples with this delimiter.
486 After the last set of samples to be merged, place the line.
490 For example, if you want to merge three types of samples in your plots, your MC Config would look like this.
493 example type0_file0.root 0.5 LegendEntry 600
494 . type0_file1.root 0.5 . .
496 . type1_file0.root 0.2 . .
497 . type1_file1.root 0.6 . .
498 . type1_file2.root 0.2 . .
500 . type2_file0.root 1.0 . .
This will merge three different samples of a process with a cross section of 1.0 in such a way that their total plotted cross section is 1.0 and their statistical uncertainty is minimized.
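This page does not spell out the weighting formula, so the following Python sketch shows only one standard way to get such a result and should not be read as the Crombie Tools implementation. It assumes each set of samples models the same process with a relative statistical variance proportional to one over its number of generated events. Under that assumption, weights proportional to the event counts sum to one (so the total cross section is preserved) and minimize the variance of the combination. The function names are hypothetical.

```python
def merge_weights(event_counts):
    """Weights proportional to event counts; they sum to one."""
    total = float(sum(event_counts))
    return [n / total for n in event_counts]


def combined_variance(weights, event_counts):
    """Relative variance of the weighted combination (variance_k ~ 1/N_k)."""
    return sum(w * w / n for w, n in zip(weights, event_counts))
```

With three sets of 100, 300, and 600 events, the weights are 0.1, 0.3, and 0.6, and the combined relative variance is 1/1000, which is lower than the variance obtained with equal weights of 1/3 each.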
507 @todo Document the documentation subdirectory
511 Other directories can of course be added by hand.
There are still ways to source the old configuration files if you need them,
513 and all of the command line and python tools are still available.
514 Just be careful that if you change the configuration in a separate directory,
515 the changes will be reflected in your miscellaneous directory.
516 An analysis should be as tightly coupled as possible.