Paul-Ehrlich-Institut

Analysis of Data and Linkage Quality

Since the vaccination data and the data from the health insurance companies cannot be linked exclusively via one unique identifier per person, errors may occur during the linkage. Two types of linkage error are possible in this study: incorrect matches and missing matches. Missing matches are further divided into completely missing matches and partially missing matches.

Incorrect matches occur when the information from the health insurance data is assigned to the wrong person in the vaccination data due to an identical pseudonym. This can only happen if the information used to form the pseudonym is identical for both persons or if, due to one or more input errors, the information in the data appears to be identical.
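The two routes to an incorrect match can be illustrated with a minimal sketch. The construction below is an assumption made purely for illustration: the function `make_pseudonym`, the use of SHA-256, the normalisation, and the example names are hypothetical and are not the pseudonymisation procedure used in the study.

```python
import hashlib


def make_pseudonym(first_name: str, last_name: str, birth_date: str) -> str:
    """Hypothetical pseudonym: SHA-256 over the normalised name and date of birth."""
    parts = (first_name, last_name, birth_date)
    key = "|".join(part.strip().lower() for part in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Case 1: two different people share name and date of birth,
# so their pseudonyms are identical and their data can be mixed up.
person_a = make_pseudonym("Anna", "Schmidt", "1980-05-17")
person_b = make_pseudonym("Anna", "Schmidt", "1980-05-17")
identical_information_collides = person_a == person_b

# Case 2: a third person is really called "Schmitt", but the name was
# entered as "Schmidt". The input error makes the data appear identical.
person_c_correct = make_pseudonym("Anna", "Schmitt", "1980-05-17")
person_c_as_entered = make_pseudonym("Anna", "Schmidt", "1980-05-17")
input_error_collides = person_c_as_entered == person_a
```

In both cases the linkage step sees only the pseudonym and therefore cannot tell the persons apart.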

Missing matches, on the other hand, occur if a person has received at least one COVID-19 vaccination, but it was not possible to assign this information to the person's health insurance data. This can happen, for example, if a person's name or date of birth was not correctly recorded in at least one of the two vaccination records, or if the person changed their name between vaccination and the analysis of the statutory health insurance data. A completely missing match means that the information on all the COVID-19 vaccinations that a person actually received is missing. If matches are only partially missing, one potential scenario is that the information for the second vaccination was linked, but the information for the first vaccination is missing.
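The distinction between completely and partially missing matches can be made concrete with a small sketch. It assumes, hypothetically, that each vaccination record carries its own pseudonym and that a dose is linked only if that pseudonym equals the person's pseudonym in the health insurance data. The function name `classify_person` and the placeholder pseudonyms are illustrative only.

```python
def classify_person(insurance_pseudonym: str, dose_pseudonyms: list[str]) -> str:
    """Classify one vaccinated person by how many of their doses could be linked."""
    if not dose_pseudonyms:
        raise ValueError("expects at least one recorded vaccination")
    # A dose counts as linked only if its pseudonym matches the insurance data.
    n_linked = sum(1 for p in dose_pseudonyms if p == insurance_pseudonym)
    if n_linked == len(dose_pseudonyms):
        return "fully linked"
    if n_linked == 0:
        return "completely missing"
    return "partially missing"


# First dose recorded with a misspelt name (different pseudonym), second dose correct.
partial = classify_person("abc", ["xyz", "abc"])
# Name changed after both vaccinations: neither dose can be assigned.
complete_miss = classify_person("abc", ["xyz", "xyz"])
# Both doses recorded consistently with the insurance data.
all_linked = classify_person("abc", ["abc", "abc"])
```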

It can be assumed that missing matches occur significantly more often than incorrect matches, but the exact extent of both problems is unclear. One of the main goals of the feasibility study is therefore to estimate how often such linkage errors occur in practice. Some conclusions can be drawn from the ASHIP vaccination data and the pseudonyms generated from them using names and dates of birth. For example, the proportion of incorrect matches can be estimated, because individuals in this data set can be distinguished by their health insurance number pseudonym. The proportion of missing matches cannot be estimated as easily, and various evaluations should be carried out to estimate it. There are also several possible linkage procedures. Taking the postcode into account when linking, for example, can reduce the proportion of incorrect matches, but at the price of a higher proportion of missing matches. For future use of the data, proposals should be drawn up as to which type of linkage (with or without postcode, which pseudonym) is best suited to which form of analysis.
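The trade-off between the two linkage variants can be sketched on toy data. Everything in the example is an assumption for illustration: the record layout, the function `linkage_error_shares`, and the six made-up records. The share of persons whose linkage key is shared with someone else stands in for incorrect matches, and the share of persons whose records fall under more than one key stands in for missing matches. The study's actual evaluations may differ.

```python
from collections import defaultdict

# (name/date-of-birth pseudonym, postcode, health insurance number pseudonym)
# The insurance number pseudonym serves as ground truth for who is who.
records = [
    ("p1", "10115", "K1"),
    ("p1", "10115", "K1"),
    ("p1", "80331", "K2"),  # different person, same name and date of birth
    ("p2", "50667", "K3"),
    ("p2", "20095", "K3"),  # same person, moved between vaccinations
    ("p3", "01067", "K4"),
]


def linkage_error_shares(records, use_postcode):
    """Return (share of persons with a shared key, share of persons split over keys)."""
    persons_per_key = defaultdict(set)
    keys_per_person = defaultdict(set)
    for name_dob, postcode, insurance_id in records:
        key = (name_dob, postcode) if use_postcode else (name_dob,)
        persons_per_key[key].add(insurance_id)
        keys_per_person[insurance_id].add(key)
    n_persons = len(keys_per_person)
    # Persons who share a linkage key with at least one other person.
    ambiguous = {p for ps in persons_per_key.values() if len(ps) > 1 for p in ps}
    # Persons whose records are spread over more than one linkage key.
    split = [p for p, keys in keys_per_person.items() if len(keys) > 1]
    return len(ambiguous) / n_persons, len(split) / n_persons


without_postcode = linkage_error_shares(records, use_postcode=False)
with_postcode = linkage_error_shares(records, use_postcode=True)
```

On this toy data, adding the postcode separates the two persons who share a name and date of birth, but it splits the records of the person who moved.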

The working group also carried out a simulation study to investigate how different degrees of linkage error affect the analysis results. With proportions of missing matches that can realistically be expected (up to 20%), no significant systematic errors in the analysis of vaccine side effects are to be expected if the self-controlled case series method is used for evaluation.
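The following toy simulation hints at why the self-controlled case series method can tolerate completely missing matches. It is not the working group's simulation. All parameters are assumptions chosen for illustration: a 365-day observation period, a 28-day risk window, a baseline event rate, a true relative incidence of 3, and missingness that is independent of the outcome. Partially missing matches, which could shift or hide a risk window, are not modelled. Because every simulated person has the same window lengths, the conditional estimate of the relative incidence reduces to a ratio of event rates.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's algorithm (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_sccs(n_persons: int, missing_share: float,
                  true_ri: float = 3.0, seed: int = 1) -> float:
    """Estimate the relative incidence when a share of persons cannot be linked."""
    rng = random.Random(seed)
    risk_days, control_days = 28, 337  # together: 365-day observation period
    base_rate = 1e-4                   # assumed events per person-day
    risk_events = control_events = 0
    for _ in range(n_persons):
        in_risk = poisson(rng, base_rate * true_ri * risk_days)
        in_control = poisson(rng, base_rate * control_days)
        if rng.random() < missing_share:
            # Completely missing match: no vaccination date is known,
            # so the person contributes nothing to the estimate.
            continue
        risk_events += in_risk
        control_events += in_control
    # Ratio of the event rate in the risk window to the rate in the control time.
    return (risk_events / risk_days) / (control_events / control_days)


estimate_full = simulate_sccs(200_000, missing_share=0.0)
estimate_missing = simulate_sccs(200_000, missing_share=0.2)
```

Under these assumptions, dropping 20% of persons reduces the number of usable events and hence the precision, but both estimates should lie close to the true value of 3.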