New data#8

Open
ecole41 wants to merge 6 commits into main from new_data


ecole41 commented May 21, 2026

Collaborator

This PR contains the data_maps.csv file, which maps the datasets within SIMUnet to their corresponding NNPDF dataset names. The copy_data.py script can then be used to copy the SIMUnet simufac file into this repo under the NNPDF dataset name.
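The copy step described above could look roughly like the following. This is only a sketch and not the actual copy_data.py: the column names `simunet_name` and `nnpdf_name`, the `SIMU_<name>.yaml` file pattern and the `NOT FOUND` marker for unmapped datasets are assumptions made for illustration.

```python
import csv
import shutil
from pathlib import Path


def copy_mapped_files(map_csv, src_dir, dst_dir):
    """Copy each SIMUnet file into dst_dir, renamed after its NNPDF dataset.

    Assumed (hypothetical) layout: the CSV has the columns `simunet_name`
    and `nnpdf_name`, and the files are called `SIMU_<name>.yaml`.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    with open(map_csv, newline="") as stream:
        for row in csv.DictReader(stream, skipinitialspace=True):
            simunet_name = row["simunet_name"].strip()
            nnpdf_name = row["nnpdf_name"].strip()
            source = src_dir / f"SIMU_{simunet_name}.yaml"
            # Skip datasets without an NNPDF counterpart or without a source file
            if not nnpdf_name or nnpdf_name == "NOT FOUND" or not source.exists():
                skipped.append(simunet_name)
                continue
            shutil.copy(source, dst_dir / f"SIMU_{nnpdf_name}.yaml")
            copied.append(nnpdf_name)
    return copied, skipped
```

Returning the skipped names makes it easy to list the datasets that still need a manual mapping.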

@ecole41 ecole41 requested a review from scarlehoff May 21, 2026 09:41
@scarlehoff scarlehoff changed the base branch from nnpdf_as_a_library to main May 21, 2026 12:17
@scarlehoff

scarlehoff commented May 21, 2026

Owner

I've added a converter script that seems to be working ok. I've also added one example of a converted dataset. This one is simple enough that it can be inspected by hand (I actually ran the script over the entire csv and it worked out of the box for all of them, but I want to polish it before pushing a lot of data that will be hard to review).

The conversion of FkTables is a bit trickier because of the compound files, which need to be autodiscovered, and for that I need to download the theory and check... but I think it'll be fine.

The only remaining questions that I have right now are:

  1. The naming. What do we want to do with that?

We have two options:

  • Handcraft a new name for each dataset that might need it
  • Automatically inject "7/8/13 TeV" (if it is written anywhere) or "UNKNOWN" as the energy, and a generic _OBS for the observable when needed, so that the name complies with NNPDF notation.
  2. The QCD c-factors.
    I guess the datasets that don't have [QCD] in the .csv file have no NNLO corrections. Do we want to include them, or should we ignore them?
    Should we burn the [QCD] factors into the fktables as we do in NNPDF, or do you prefer to have them separate?

  3. Which theory should be the baseline.
    I will convert the fktables from theory 270 and merge them with theory 40_000_000 to create theory 40_000_270. Is that ok?
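For the second naming option, the automatic injection could be sketched as below. The function name, its arguments and the `EXPERIMENT_PROCESS_ENERGY_OBSERVABLE` layout are assumptions read off names such as CMS_Z0_7TEV_DIMUON; the converter script itself may do this differently.

```python
import re

# Only the energies mentioned in the discussion (7, 8 and 13 TeV) are recognised;
# anything else falls back to "UNKNOWN".
_ENERGY = re.compile(r"(?<!\d)(7|8|13)[\s_]*TEV", re.IGNORECASE)


def nnpdf_style_name(experiment, process, description, observable=None):
    """Build a name of the (assumed) form EXPERIMENT_PROCESS_ENERGY_OBSERVABLE.

    `description` is any free text where the energy might be written,
    for instance the old dataset name or its metadata.
    """
    match = _ENERGY.search(description)
    energy = f"{match.group(1)}TEV" if match else "UNKNOWN"
    # Fall back to a generic observable label when none is given
    return "_".join([experiment, process, energy, observable or "OBS"])
```

For example, `nnpdf_style_name("CMS", "Z0", "dimuon at 8 TeV", "DIMUON")` gives `CMS_Z0_8TEV_DIMUON`, while a description with no recognisable energy and no observable gives a name ending in `_UNKNOWN_OBS`.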

TODO: sadly, the kinematics in the old validphys were a mess, and if you are relying on that I don't think there's any clean way of moving from kin1, kin2, kin3 to meaningful names.
We could, however, take some shortcuts if these are needed to, e.g., apply cuts. But for the time being I'll leave kin1, kin2, kin3.

@ElieHammou

Collaborator

Thanks a lot for the work!

Sorry about the delay, here is what I think about your questions, @scarlehoff:

  1. I would stick to an NNPDF convention, even at the price of adding an "UNKNOWN" label to start off. We can then assess which datasets lack this info and try to recover it case by case.

  2. I would port everything to the new format and possibly make a note of the datasets which are only NLO. They can still be useful for closure test exercises for us.
    For the c-factors, we definitely want an option not to include them, so that we can assess their impact on the PDF-SMEFT interplay. If burning them into the FK-table prevents that, I think it is not a good idea. I would be interested to discuss these options further though.

  3. That sounds perfect to me!
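On point 2, keeping the c-factors separate amounts to applying them at prediction time rather than storing them inside the FK-table. A minimal sketch, assuming the c-factors are multiplicative bin-by-bin factors (the function name is hypothetical and not part of validphys):

```python
def apply_cfactors(predictions, cfactors=None):
    """Return per-bin predictions, optionally rescaled by multiplicative c-factors.

    With cfactors=None the bare FK-table predictions are returned unchanged,
    which is the switch needed to study the impact of the c-factors.
    """
    if cfactors is None:
        return list(predictions)
    if len(cfactors) != len(predictions):
        raise ValueError("need exactly one c-factor per data point")
    return [p * k for p, k in zip(predictions, cfactors)]
```

Burning the factors into the FK-table would correspond to storing only the rescaled result, at which point the option to switch them off is lost.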

@scarlehoff

scarlehoff commented May 29, 2026

Owner

Ok, so I have now gone through the entire set of data and it converts fine :)

The only ones left unconverted are:

CMSDY1D12: would this be equivalent to CMS_Z0_7TEV_DIMUON but for 8 TeV? (why are we missing this one in NNPDF?)

And

BETA_DECAY_OBS, NOT FOUND, , EFT_LO
MESON1_OBS, NOT FOUND, , EFT_LO
MESON2_OBS, NOT FOUND, , EFT_LO
PV_OBS, NOT FOUND, , EFT_LO
ATLAS_WPWM_13TEV_HMT_DIF-LEP-PM, NOT FOUND, ['EWK'], EFT_LO

These are not available, at least in this branch. Are they new in SIMUnet?

The important "TO DO" here is now, of course, to check that the conversion worked as expected.

I guess the easiest thing to do, if you already have a comparison of chi2 for a given PDF for all these datasets, is to check that the chi2 hasn't changed. If it hasn't, it means that everything (data and theory) converted fine. Otherwise we might need to figure out where the possible problems are.
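Such a check could be as simple as the sketch below, assuming the chi2 values before and after the conversion are available as plain dictionaries keyed by dataset name (how they are computed is outside the scope of this snippet, and the function name is hypothetical).

```python
import math


def chi2_regressions(reference, converted, rel_tol=1e-6):
    """Return the datasets whose chi2 is missing or changed after conversion.

    Both arguments map dataset name -> chi2 for the same PDF; the result maps
    each problematic dataset to its (reference, converted) pair.
    """
    problems = {}
    for name, old in reference.items():
        new = converted.get(name)
        if new is None or not math.isclose(old, new, rel_tol=rel_tol):
            problems[name] = (old, new)
    return problems
```

An empty result would mean that both data and theory converted consistently within the chosen tolerance; any entry points at a dataset to inspect by hand.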

@scarlehoff

Owner

I've prepared the theory now, but I'm afraid not all necessary FkTables are available in theory 270. Many of the datasets do not have anything there (for instance, the ones with WHEL in the name) although there are SIMU factors.
What is the deal with those?

@ecole41

ecole41 commented Jun 10, 2026

Collaborator Author

Thanks for all the work. I can't seem to find any metadata for the CMSDY1D12 dataset; I will ask Luca and Maria (it looks as though they implemented the simu factors and K-factors for this dataset).

As for the _OBS datasets, these have been added for a separate project, so you can ignore them. If we want to add them into SIMUnet we can think about doing this later.

The ATLAS_WPWM_13TEV_HMT_DIF-LEP-PM dataset relates to this future test dataset in NNPDF: NNPDF/nnpdf#2382.

The datasets which don't have FK-tables are the ones for which we use the fixed predictions in the simu_fac files, so FK-tables aren't needed there.
