Crombie Tools
README.md
1 # Analysis Workspace {#workspace}
2 
3 @brief Describes the layout of an analysis workspace created by `crombie workspace`.
4 
5 A workspace is where all analysis-specific configurations are placed.
6 You can create a workspace with the name `Workspace` like the following.
7 
8  mkdir Workspace
9  cd Workspace
10  crombie workspace
11 
12 A new workspace will be created from the `CrombieTools/templates` directory
13 and have the following layout.
14 
15  Workspace/
16  |-- docs
17  |   |-- Workspace.tex
18  |   |-- download.sh
19  |   |-- figs
20  |   `-- presentation.tex
21  |-- plotter
22  |   |-- CrombiePlotterConfig.sh
23  |   |-- MCConfig.txt
24  |   `-- cuts.py
25  `-- slimmer
26      |-- CrombieSlimmingConfig.sh
27      |-- FlatSkimmer.sh
28      |-- JobScriptList.txt
29      `-- runSlimmer.py
30 
31 <!---
32  Note to self: this was generated using
33  tree Workspace/ --charset ASCII
34  after making a new workspace
35 -->
36 
37 Each subdirectory has a distinct function, described below.
38 
39 # Slimming {#slimming}
40 
41 The first thing that will often be needed in an analysis is slimming
42 files into flat trees.
43 There are two essential steps that this layout assumes you need.
44 
45 <ol>
46 
47  <li> <strong>Changing the file format</strong>
48  The user can go from some large, inclusive ntuples to something small and flat.
49  Crombie Tools currently supports two methods to do this in batch jobs.
50  One is to submit jobs to LXBATCH that run on files on EOS.
51  The other method is to run on files interactively in a terminal.
52  To allow the user to run this step less often, they are encouraged to
53  not perform too much skimming of events in this step,
54  as that can be performed in the next step.
55 
56  <li> <strong>Skimming events from trees</strong>
57  This can be done with a simple cut string, as well as using a Good Run list.
58 
59 </ol>
60 
61 To understand how to do these steps, take a look inside of the file `slimmer/CrombieSlimmingConfig.sh`.
62 The variables all start with `Crombie` to ensure that the names do not overlap with any
63 environment variables that may be set by other tools used by the user.
64 The meaning of each variable is listed below.
65 
66 # Environment Variables {#envconfig}
67 
68 These are the environment variables used in the slimming and skimming of an analysis.
69 
70 <table cellpadding=20>
71  <tr>
72  <td align="left" valign="top">
73  <code>CrombieFilesPerJob</code>
74  </td>
75  <td align="left">
76  This specifies the number of files on EOS each LXBATCH job will read.
77  Keep this constant between resubmissions, because it directly determines which files
78  are run together.
79  The other variables setting up the queue and number of cores won't change anything.
80  </td>
81  </tr>
82  <tr>
83  <td align="left" valign="top">
84  <code>CrombieQueue</code>
85  </td>
86  <td align="left">
87  Specifies which LXBATCH queue will be submitted to.
88  Examples are `8nm`, `1nh`, `8nh`, `1nd`, `2nd`, `1nw`, `2nw`, and `2nw4cores`.
89  </td>
90  </tr>
91  <tr>
92  <td align="left" valign="top">
93  <code>CrombieNLocalProcs</code>
94  </td>
95  <td align="left">
96  The number of processors used for local slimming, skimming, plotting, etc.
97  The default number uses all available processors, which is perhaps not always desired.
98  </td>
99  </tr>
100  <tr>
101  <td align="left" valign="top">
102  <code>CrombieFileBase</code>
103  </td>
104  <td align="left">
105  Each file name of the flat files you'll work with starts with `<base>_` and that base is
106  set here so that your LXBATCH jobs make the correct name.
107  These namespaces are useful for knowing what each ntuple was used for previously.
108  </td>
109  </tr>
110  <tr>
111  <td align="left" valign="top">
112  <code>CrombieEosDir</code>
113  </td>
114  <td align="left">
115  This can be one of two things.
116 
117  - A directory on EOS where you will look for all datasets.
118  - A local .txt file which has a list of directories to look for datasets.
119 
120  The submission tool will figure out which one of the two you set this variable
121  as by checking if a local file with that name exists.
122  </td>
123  </tr>
124  <tr>
125  <td align="left" valign="top">
126  <code>CrombieRegDir</code>
127  </td>
128  <td align="left">
129  Unlike `crombie submitlxbatch`, `crombie terminalslim` does not necessarily run on EOS.
130  This is the variable that sets what folder to look in for datasets.
131  Note that this only takes a relative directory, not a list of directories.
132  </td>
133  </tr>
134  <tr>
135  <td align="left" valign="top">
136  <code>CrombieTempDir</code>
137  </td>
138  <td align="left">
139  A location to store the direct LXBATCH output.
140  This is the directory that will be checked when the tool is trying to determine
141  what files have and have not been successfully created.
142  </td>
143  </tr>
144  <tr>
145  <td align="left" valign="top">
146  <code>CrombieFullDir</code>
147  </td>
148  <td align="left">
149  This is simply the location of the hadded LXBATCH output.
150  That is, all the files of the same dataset will be combined.
151  This directory will also hold a list of the original locations of the datasets
152  on EOS so that differences in dataset location can be detected for separate runs.
153  </td>
154  </tr>
155  <tr>
156  <td align="left" valign="top">
157  <code>CrombieSkimDir</code>
158  </td>
159  <td align="left">
160  This is the location of the flat trees run through a good runs skim as well
161  as any other cuts added.
162  This is not a necessary variable, but is used in the template of `FlatSkimmer.sh`.
163  </td>
164  </tr>
165  <tr>
166  <td align="left" valign="top">
167  <code>CrombieDirList</code>
168  </td>
169  <td align="left">
170  If left blank, all of the datasets in the `CrombieEosDir` or `CrombieRegDir` will be run on.
171  Otherwise, this variable should name a local .txt file that has a list of datasets that
172  you want to run on.
173  </td>
174  </tr>
175  <tr>
176  <td align="left" valign="top">
177  <code>CrombieSlimmerScript</code>
178  </td>
179  <td align="left">
180  This names the script that the LXBATCH job will run.
181  Make sure that the script is executable (`chmod +x`),
182  and that it takes two arguments:
183 
184  - The input file name
185  - The output file name
186 
187  If no arguments are passed, make sure the script compiles everything
188  that it needs (with `LoadMacro('..+')` for example).
189  </td>
190  </tr>
191  <tr>
192  <td align="left" valign="top">
193  <code>CrombieJobScriptList</code>
194  </td>
195  <td align="left">
196  This variable names a local .txt file that names the relative paths of all files that should be copied
197  to the LXBATCH node for the job to be completed.
198  This will often include macros and headers needed.
199  All of these files must be in the `slimmer` subdirectory.
200  Full path names or using `..` will not work.
201  </td>
202  </tr>
203  <tr>
204  <td align="left" valign="top">
205  <code>CrombieCheckerScript</code>
206  </td>
207  <td align="left">
208  Names a script that checks the output of each file run on in the LXBATCH job.
209  The script should return a non-zero exit code if there is a problem with the output file.
210  The script should return exit code 5 for fatal errors that will abort the job.
211  The job will also abort for any non-zero exit code for the hadded output of the LXBATCH job.
212  These errors will be reported in a local file, `LxbatchFileChecks.log`.
213  </td>
214  </tr>
215  <tr>
216  <td align="left" valign="top">
217  <code>CrombieGoodRuns</code>
218  </td>
219  <td align="left">
220  This is a variable also only used by the template of `FlatSkimmer.sh`, so it's optional.
221  It names the location of the good runs JSON for the skimmer to use.
222  </td>
223  </tr>
224 </table>
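
As a rough illustration of the contract described for `CrombieSlimmerScript`, the skeleton below accepts an input and an output file name, and only performs its setup step when called without arguments. The functions `compile_macros` and `slim_file` are hypothetical placeholders, and the shipped `runSlimmer.py` template may be organized differently.

```python
def compile_macros():
    """Hypothetical one-time setup step.

    A real script would compile what it needs here, for example with
    LoadMacro('..+') as mentioned for CrombieSlimmerScript.
    """
    return "compiled"


def slim_file(in_name, out_name):
    """Hypothetical per-file step: read in_name and write out_name."""
    return "slimmed %s into %s" % (in_name, out_name)


def main(argv):
    """Dispatch on the arguments; argv[0] is the script name."""
    args = argv[1:]
    if not args:
        # No arguments: only make sure everything is compiled.
        print(compile_macros())
        return 0
    if len(args) != 2:
        print("usage: %s [INPUT_FILE OUTPUT_FILE]" % argv[0])
        return 2
    compile_macros()
    print(slim_file(args[0], args[1]))
    return 0
```

An executable version of this sketch would end by calling `sys.exit(main(sys.argv))`.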
225 
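A checker in the spirit of `CrombieCheckerScript` could look like the sketch below. Only the exit-code convention (non-zero for a problem, 5 for a fatal error) comes from the table above. The assumption that the file to check arrives as a path, and the specific checks performed, are illustrative only.

```python
import os

EXIT_OK = 0
EXIT_BAD_FILE = 1  # any non-zero code flags a problem with the output file
EXIT_FATAL = 5     # documented as the code that aborts the job


def check_output(path):
    """Return an exit code describing the state of the file at path."""
    if not path:
        # Nothing to check; treated as unrecoverable in this sketch (assumption).
        return EXIT_FATAL
    if not os.path.isfile(path):
        return EXIT_BAD_FILE
    if os.path.getsize(path) == 0:
        return EXIT_BAD_FILE
    # A real checker would open the file with ROOT here and inspect its tree.
    return EXIT_OK
```

A script built on this would pass the result to `sys.exit()`, so that the calling tool sees the code as the process exit status.
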
226 # Generating flat trees for output {#flattrees}
227 
228 This is now deprecated; see [here](@ref maketree) for documentation of the new tool.
229 
230 There is a tool for generating flat tree classes for the user.
231 The variables that you want to include in a flat tree can be specified in `OutTree.txt` or whatever you rename it to.
232 The format of the configuration file is `<branchName>/<type>=<default>`.
233 `<branchName>` should be easy to understand.
234 Valid entries for `<type>` are the following:
235 
236 <table>
237  <tr><td>`F`</td><td>float</td></tr>
238  <tr><td>`I`</td><td>int</td></tr>
239  <tr><td>`i`</td><td>unsigned int</td></tr>
240  <tr><td>`L`</td><td>long</td></tr>
241  <tr><td>`l`</td><td>unsigned long</td></tr>
242  <tr><td>`O`</td><td>bool</td></tr>
243 </table>
244 
245 Any other types will be assumed to be TObjects.
246 The header file for the listed TObject will be included automatically.
247 This feature is not tested extensively since I don't use it.
248 
249 You can also preface a type with `V` for (pointers of) vectors of these types.
250 After writing a tree configuration file with name `OutTree.txt`, just run
251 
252  crombie oldmaketree OutTree
253 
254 This makes a class that contains your tree.
255 This is done automatically for you in the default `runSlimmer.py` template.
256 You can access each branch of the tree via a public member with the same name as the `<branchName>`
257 and fill the whole tree with the function `Fill()` at the end of each event.
258 If you do not set a value for a particular event, the branch will be filled with `<default>`.
259 You can then write the tree to a file via `WriteToFile(*TFile,"<WhatYouWantToCallTree>")`.
260 There are also other overloaded write and creation functions that use a file name you specify with the initializer.
261 See `slimmer.cc` for an example of how to write a flat tree using this class.
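
To make the `<branchName>/<type>=<default>` format concrete, here is a small Python sketch that splits such lines into their parts. It is not the parser used by `crombie oldmaketree`. The branch names and defaults in the example are made up, and the handling of a `V` prefix is one reading of the description above.

```python
# Type codes from the table above; anything else is treated as a TObject class.
TYPE_CODES = {
    "F": "float",
    "I": "int",
    "i": "unsigned int",
    "L": "long",
    "l": "unsigned long",
    "O": "bool",
}


def parse_branch(line):
    """Split '<branchName>/<type>=<default>' into its parts."""
    name, _, rest = line.strip().partition("/")
    type_code, _, default = rest.partition("=")
    # A leading 'V' in front of a known code marks a vector branch.
    is_vector = (
        len(type_code) > 1
        and type_code[0] == "V"
        and type_code[1:] in TYPE_CODES
    )
    base = type_code[1:] if is_vector else type_code
    return {
        "name": name,
        "type": TYPE_CODES.get(base, base),  # unknown codes pass through as class names
        "vector": is_vector,
        "default": default,  # empty string when no default is given
    }


# Hypothetical contents of an OutTree.txt file. The last line omits a default;
# whether vector branches accept one is not specified above.
EXAMPLE = """\
met/F=-5
nJets/I=0
isGood/O=false
jetPt/VF
"""

branches = [parse_branch(line) for line in EXAMPLE.splitlines() if line.strip()]
```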
262 
263 # Skimming
264 
265 The environment variables used for skimming the flat trees afterwards are
266 actually optional, if you edit the file `slimmer/FlatSkimmer.sh` to accommodate that.
267 The user is encouraged to edit `FlatSkimmer.sh`,
268 which makes use of a flexible tool, `crombie skim`.
269 Note, `crombie skim` is not included in the [command line references](bin/README.md)
270 because it is not recommended that this command is used interactively.
271 It is much more efficient and stable to create a script like `FlatSkimmer.sh`.
272 `crombie skim` takes files from an input directory, skims them, and places them in
273 a separate directory.
274 Here is the help message to help you understand how to customize this.
275 
276  usage: crombie skim [-h] [--numproc NUM] [--indir DIR] [--outdir DIR]
277                      [--json FILE] [--cut CUT] [--tree NAME]
278                      [--copy [NAMES [NAMES ...]]] [--run EXPR] [--lumi EXPR]
279                      [--freq NUM] [--filters [FILE [FILE ...]]] [--duplicate]
280 
281  Slims the contents of one directory into another one
282 
283  optional arguments:
284    -h, --help            show this help message and exit
285    --numproc NUM, -n NUM
286                          Number of processes that FlatSkimmer will spawn.
287    --indir DIR, -i DIR   Directory that contains input files to be slimmed.
288    --outdir DIR, -o DIR  Directory where slimmed stuff will be placed.
289    --json FILE, -j FILE  Good runs json file location to be used.
290    --cut CUT, -c CUT     Cut used in slimming.
291    --tree NAME, -t NAME  Name of tree that will be slimmed.
292    --copy [NAMES [NAMES ...]]
293                          List other object names to copy into the slimmed file.
294    --run EXPR, -r EXPR   Set the expression for Run Number.
295    --lumi EXPR, -l EXPR  Set the expression for Lumi Number.
296    --freq NUM, -f NUM    Set the reporting frequency.
297    --filters [FILE [FILE ...]], -e [FILE [FILE ...]]
298                          Set the filter files.
299    --duplicate, -d       Turn on duplicate checking.
300 
301 After running `FlatSkimmer.sh`, you should have your small ntuples ready to work with.
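
The sketch below shows one way the documented `crombie skim` options could be assembled from the slimming environment variables. It is written in Python rather than shell and is not the contents of the `FlatSkimmer.sh` template. The choice of `CrombieFullDir` as input directory, and the cut string and tree name in the example, are assumptions.

```python
def build_skim_command(env, cut, tree):
    """Assemble a `crombie skim` argument list from workspace variables."""
    command = [
        "crombie", "skim",
        "--indir", env["CrombieFullDir"],   # assumed input: the hadded files
        "--outdir", env["CrombieSkimDir"],
        "--cut", cut,
        "--tree", tree,
    ]
    # The good-runs JSON and the process count are optional in this sketch.
    if env.get("CrombieGoodRuns"):
        command += ["--json", env["CrombieGoodRuns"]]
    if env.get("CrombieNLocalProcs"):
        command += ["--numproc", env["CrombieNLocalProcs"]]
    return command


# Hypothetical values; a real run would use os.environ after sourcing
# slimmer/CrombieSlimmingConfig.sh.
example_env = {
    "CrombieFullDir": "/tmp/full",
    "CrombieSkimDir": "/tmp/skim",
    "CrombieGoodRuns": "good_runs.json",
}
command = build_skim_command(example_env, cut="met > 200", tree="events")
```

Passing the resulting list to `subprocess.call` avoids shell quoting problems with cut strings that contain spaces or comparison operators.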
302 
303 # Plotting {#plotting}
304 
305 The next subdirectory of a workspace is the plotting directory.
306 This comes with its own list of environment variables.
307 
308 <table cellpadding=20>
309  <tr>
310  <td align="left" valign="top">
311  <code>CrombieMCConfig</code>
312  </td>
313  <td align="left">
314  This names the file that will be read to set the backgrounds for plots,
315  limit trees, datacards, etc.
316  See the [MC Configuration](@ref formatmc) for
317  details on how to set up this background configuration.
318  </td>
319  </tr>
320  <tr>
321  <td align="left" valign="top">
322  <code>CrombieSignalConfig</code>
323  </td>
324  <td align="left">
325  This variable names a file like `CrombieMCConfig`,
326  but files listed in the named location are assumed to be
327  signal files, not background files.
328  </td>
329  </tr>
330  <tr>
331  <td align="left" valign="top">
332  <code>CrombieExcept_*</code>
333  </td>
334  <td align="left">
335  The `*` in this variable's name stands for the name of a region,
336  and the variable's value points to a file that designates replacements to
337  the background configuration for the named region.
338  See the instructions for [Adjustment Configuration](@ref formatmc) below
339  to set up this file correctly.
340  </td>
341  </tr>
342  <tr>
343  <td align="left" valign="top">
344  <code>CrombieLuminosity</code>
345  </td>
346  <td align="left">
347  This variable just gives the luminosity used to make plots and limit workspaces.
348  </td>
349  </tr>
350  <tr>
351  <td align="left" valign="top">
352  <code>CrombieInFilesDir</code>
353  </td>
354  <td align="left">
355  This variable names the directory containing the ntuple files that are being used
356  for plotting and tree making.
357  It should usually match the latest value you had for `CrombieSkimDir`.
358  </td>
359  </tr>
360  <tr>
361  <td align="left" valign="top">
362  <code>CrombieOutPlotDir</code>
363  </td>
364  <td align="left">
365  This variable names the directory where an automatically configured PlotStack object
366  will place all of the plots.
367  </td>
368  </tr>
369  <tr>
370  <td align="left" valign="top">
371  <code>CrombieOutLimitTreeDir</code>
372  </td>
373  <td align="left">
374  This variable names the directory where an automatically configured LimitTreeMaker object
375  will place all of the tree files.
376  </td>
377  </tr>
378  <tr>
379  <td align="left" valign="top">
380  <code>CrombieCutsFile</code>
381  </td>
382  <td align="left">
383  This variable gives the name of the Python file to be loaded as the function `CrombieTools.LoadConfig.cuts`
384  when the user imports CrombieTools.LoadConfig in a working directory with this environment variable set.
385  </td>
386  </tr>
387 </table>
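
Because every `CrombieExcept_*` variable encodes a region in its name, the per-region adjustment files can be collected with a simple prefix scan. The following Python sketch illustrates the naming convention only and is not how CrombieTools reads its configuration. The region and file names are hypothetical.

```python
def region_exceptions(env):
    """Map region name -> adjustment file from CrombieExcept_<region> variables."""
    prefix = "CrombieExcept_"
    return {
        name[len(prefix):]: value
        for name, value in env.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }


# Hypothetical environment; a real script would pass os.environ instead.
example_env = {
    "CrombieMCConfig": "MCConfig.txt",
    "CrombieExcept_signal": "SignalAdjust.txt",
    "CrombieExcept_zmm": "ZmmAdjust.txt",
}
exceptions = region_exceptions(example_env)
```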
388 
389 # Formatting MC Configuration Files {#formatmc}
390 
391 Each analysis will probably make use of multiple MC Samples.
392 You can keep track of those all with one simple MC Config.
393 
394 ## Base Configuration
395 
396 You will generally have one main configuration file with most of your background samples listed.
397 Signal samples should be kept in a separate configuration file, since the samples in each file are marked as either signal or background when it is read.
398 Each sample should be contained in a single `.root` file.
399 The MC Config will keep track of these files, one row at a time.
400 The order of the elements should be this:
401 
402  <LimitTreeName> <FileName> <CrossSection> <LegendEntry> <FillColorOrLineStyle>
403 
404 The elements are space delimited.
405 
406 <table cellpadding=5>
407  <tr>
408  <td align="left" valign="top" width="15%">
409  Limit Tree Name
410  </td>
411  <td align="left" valign="top">
412  This is the base of the tree that will be made by LimitTreeMaker for this file.
413  The name should be unique for each file if using LimitTreeMaker.
414  For other analyses, this can instead be used only to differentiate signal and background.
415  In this case, putting `.` in this column will copy the tree name from the previous line.
416  </td>
417  </tr>
418  <tr>
419  <td align="left" valign="top">
420  File Name
421  </td>
422  <td align="left" valign="top">
423  This is the name of the file for the given sample.
424  The file name does not need to be absolute, as the input directory is set in
425  FileConfigReader::SetInDirectory(), usually by reading
426  the [environment configuration](@ref envconfig).
427  </td>
428  </tr>
429  <tr>
430  <td align="left" valign="top">
431  Cross Section
432  </td>
433  <td align="left" valign="top">
434  This should be the cross section of the sample in pb.
435  </td>
436  </tr>
437  <tr>
438  <td align="left" valign="top">
439  Legend Entry
440  </td>
441  <td align="left" valign="top">
442  This is the legend entry that will be made in all of the stack plots using this config for the given file.
443  If you want to have spaces in your legend entry, place `_` instead (since the elements of the config are space delimited).
444  These are all replaced with spaces by FileConfigReader::ReadMCConfig().
445  Legend entries being repeated next to each other will cause multiple files to merge into the same stack element.
446  A shortcut to using the legend entry in the previous line is to just put `.` as the Legend Entry.
447  </td>
448  </tr>
449  <tr>
450  <td align="left" valign="top">
451  Fill Color or Line Style
452  </td>
453  <td align="left" valign="top">
454  For background MC, this specifies the fill color, using the Color_t enums from ROOT.
455  If you wish to give a custom RGB color, just make this entry `rgb` and follow that with the red, blue, and green components space delimited out of 255.
456  If the legend entry of this line matches the entry in the previous line, the color is ignored (but must still be in the config).
457  Again, placing a `.` in this case is a useful shortcut.
458 
459  For signal samples, this entry should give the linestyle you wish to use for the sample.
460  </td>
461  </tr>
462 </table>
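
The row format and the `.` and `_` shortcuts described in this table can be sketched in Python as follows. This is an illustration of the rules as written here, not the implementation of FileConfigReader::ReadMCConfig(). The sample rows are made up.

```python
def read_mc_config(lines):
    """Parse base-configuration rows into a list of dictionaries."""
    samples = []
    for raw in lines:
        fields = raw.split()
        if len(fields) < 5:
            continue  # blank or incomplete rows are ignored in this sketch
        tree, file_name, cross_section, legend = fields[:4]
        color = fields[4:]
        previous = samples[-1] if samples else None
        if "." in (tree, legend, color[0]) and previous is None:
            raise ValueError("'.' used on the first row: %r" % raw)
        if tree == ".":
            tree = previous["tree"]
        # '_' stands for a space in legend entries.
        legend = previous["legend"] if legend == "." else legend.replace("_", " ")
        if color[0] == ".":
            color = previous["color"]
        elif color[0] == "rgb":
            # Keep the three components in the order they appear in the file.
            color = tuple(int(value) for value in color[1:4])
        else:
            color = color[0]
        samples.append({
            "tree": tree,
            "file": file_name,
            "cross_section": float(cross_section),
            "legend": legend,
            "color": color,
        })
    return samples


# Hypothetical rows; the file names and numbers are made up.
EXAMPLE = [
    "zjets zjets_low.root 10.5 Z_+_jets 600",
    ". zjets_high.root 2.5 . .",
    "top ttbar.root 800.0 Top_quark rgb 255 200 0",
]
samples = read_mc_config(EXAMPLE)
```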
463 
464 ## Adjustment Configuration
465 
466 To avoid having duplicate entries in multiple configurations, there is an easy way to switch out MC samples for different ones, while keeping the rest of the samples the same.
467 If a line starts with the keyword `skip` instead of a tree name and then lists a file, the MCReader will erase the MCFileInfo for that file.
468 A line like this simply contains:
469 
470  skip <FileName>
471 
472 A configuration file with lines like this can also contain lines like those in the base configuration.
473 This makes it easy to swap out files.
474 After reading one config, just read the adjustment configuration before making limit trees or plotting.
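
The effect of reading an adjustment configuration after a base configuration can be pictured with the Python sketch below. The samples are held in a plain dictionary keyed by file name, which is a simplification chosen for this example. That a regular row replaces an existing entry for the same file is an assumption.

```python
def apply_adjustments(samples, adjustment_lines):
    """Return a new {file_name: fields} mapping after applying an adjustment config."""
    adjusted = dict(samples)
    for raw in adjustment_lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "skip":
            # 'skip <FileName>' erases the entry for that file.
            adjusted.pop(fields[1], None)
        else:
            # A regular row adds a sample (or replaces one, by assumption).
            adjusted[fields[1]] = fields
    return adjusted


# Hypothetical base configuration and adjustment lines.
base = {
    "a.root": ["bkg", "a.root", "1.0", "A", "600"],
    "b.root": ["bkg", "b.root", "2.0", "B", "632"],
}
adjusted = apply_adjustments(base, ["skip a.root", "bkg c.root 3.0 C 416"])
```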
475 
476 ## Merging MC Samples
477 
478 MC samples that are created using different generators or hadronizers, etc. can be easily merged together without changing the configured cross section.
479 The different samples are weighted in such a way as to minimize the total statistical uncertainty.
480 
481 To merge multiple samples, simply start the process with a line containing only the following.
482 
483  INGROUP
484 
485 Separate each set of samples with this delimiter.
486 After the last set of samples to be merged, place the line.
487 
488  ENDGROUP
489 
490 For example, if you want to merge three types of samples in your plots, your MC Config would look like this.
491 
492  INGROUP
493  example type0_file0.root 0.5 LegendEntry 600
494  . type0_file1.root 0.5 . .
495  INGROUP
496  . type1_file0.root 0.2 . .
497  . type1_file1.root 0.6 . .
498  . type1_file2.root 0.2 . .
499  INGROUP
500  . type2_file0.root 1.0 . .
501  ENDGROUP
502 
503 This will merge three different samples of a process with a cross section of 1.0 in such a way that their total plotted cross section is 1.0 and their statistical uncertainty is minimized.
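
As a sanity check on a grouped configuration like the one above, the Python sketch below sums the cross sections inside each `INGROUP` set. It does not reproduce the weighting that CrombieTools applies when merging. It only shows how the delimiters partition the rows.

```python
def group_cross_sections(lines):
    """Return the summed cross section of each INGROUP set, in file order."""
    totals = []
    in_group = False
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "INGROUP":
            totals.append(0.0)  # every INGROUP line opens a new set
            in_group = True
        elif fields[0] == "ENDGROUP":
            in_group = False
        elif in_group:
            totals[-1] += float(fields[2])  # third column is the cross section
    return totals


# The example configuration from this section.
CONFIG = """\
INGROUP
example type0_file0.root 0.5 LegendEntry 600
. type0_file1.root 0.5 . .
INGROUP
. type1_file0.root 0.2 . .
. type1_file1.root 0.6 . .
. type1_file2.root 0.2 . .
INGROUP
. type2_file0.root 1.0 . .
ENDGROUP
"""

totals = group_cross_sections(CONFIG.splitlines())
```

For this configuration there are three sets, and each one sums to 1.0 up to floating-point rounding.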
504 
505 # Documentation
506 
507 @todo Document the documentation subdirectory
508 
509 # Miscellaneous
510 
511 Other directories can of course be added by hand.
512 There are certain ways to still source the old configuration files if you need them,
513 and all of the command line and python tools are still available.
514 Just be careful that if you change the configuration in a separate directory,
515 the changes will be reflected in your miscellaneous directory.
516 An analysis should be as tightly coupled as possible.