Why do you want to know this...

The three main qualities of a diffraction data set

During data collection you want to optimise the following three properties: resolution, completeness and accuracy.

In reality you will have to make compromises between these three due to limited time, crystal decay, etc. To make the best trade-off you need to know your crystals, equipment and, importantly, the problem you want to solve with the data. Below we'll look at several different ways you may need to use the data.

Electron density maps

One common use of a dataset is to calculate an electron density map. Remember that the electron density at a point x,y,z is given by:

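$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F(hkl)|\,\exp\left[-2\pi i(hx+ky+lz) + i\alpha(hkl)\right]$$

Here V is the unit cell volume, |F(hkl)| the amplitude and \alpha(hkl) the phase of reflection hkl.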

As you can see, the density is the sum of the contributions of all reflections, and the importance of each reflection is proportional to its amplitude.

RESOLUTION: Low resolution reflections tend to be strong and therefore make a large contribution to the map. High resolution reflections are weaker, but there are many more of them and they provide the detail needed for many interpretations. For good electron density maps you want to capture as high a resolution as possible, but without losing the low resolution data.

COMPLETENESS: Every reflection that is not measured causes a loss of signal proportional to its amplitude. In other words, a missed reflection is equivalent to a measurement with a 100% error.

ACCURACY: Errors in the measurements correspond to noise in the density map. In a typical dataset the measurement error is only a few percent, increasing to perhaps 50% or so at high resolution. A poor measurement is therefore normally still a lot better than no measurement at all. It is also useful to realize that, even for a highly refined structure, map noise due to errors in the Fobs measurements is small compared to the noise due to errors in Fcalc and phase errors.

CONCLUSION: For density maps, resolution and data completeness are most important.
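The statement that a missed reflection acts like a 100% error can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses a hypothetical one-dimensional "crystal" of four point atoms in a cell of length 1 (so the 1/V factor is dropped), and all names (`structure_factors`, `density`, `scale_reflection`, the atom list) are made up for this example. It compares the rms change in the map when one reflection is left out with the change caused by a 10% amplitude error on the same reflection.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """F(h) for a toy 1D cell; atoms is a list of (fractional x, scattering weight)."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }

def density(F, n_grid=200):
    """rho(x) = sum_h F(h) exp(-2 pi i h x), sampled on a grid (cell length 1)."""
    return [
        sum(Fh * cmath.exp(-2j * math.pi * h * (i / n_grid)) for h, Fh in F.items()).real
        for i in range(n_grid)
    ]

def rms_difference(map_a, map_b):
    """Root-mean-square difference between two maps on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(map_a, map_b)) / len(map_a))

def scale_reflection(F, h, factor):
    """Copy of F with reflection h (h != 0) and its Friedel mate -h scaled by factor."""
    G = dict(F)
    G[h] *= factor
    G[-h] *= factor
    return G

# Hypothetical toy structure: (fractional coordinate, scattering weight)
atoms = [(0.10, 6.0), (0.35, 8.0), (0.40, 7.0), (0.80, 6.0)]
F = structure_factors(atoms, h_max=15)
reference = density(F)

h = 3
missing = rms_difference(reference, density(scale_reflection(F, h, 0.0)))
ten_percent = rms_difference(reference, density(scale_reflection(F, h, 0.9)))

print(f"|F({h})| = {abs(F[h]):.2f}")
print(f"rms map error, reflection missing:  {missing:.3f}")
print(f"rms map error, 10% amplitude error: {ten_percent:.3f}")
```

In this toy model the map error scales linearly with the error in the amplitude, so leaving the reflection out costs ten times as much as mismeasuring it by 10%, and the cost of omission grows with the amplitude of the reflection.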

Structure refinement

During structure refinement we use our data as the observations against which the model is fitted. An important factor in refinement is the observation/parameter ratio. The larger the ratio, the more stable the refinement and the fewer problems with model bias. A large number of observations also allows you to refine a larger number of parameters.

RESOLUTION: The best way to get a large number of observations is to obtain high resolution data. High resolution reflections are also sensitive to small errors in atomic parameters and therefore allow precise positioning of atoms during refinement.

COMPLETENESS: A reduced completeness clearly leads to fewer reflections to refine against.

ACCURACY: Since errors in Fcalc tend to be significantly larger than errors in Fobs, measurement accuracy is not as critical.
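To get a feeling for how strongly the observation/parameter ratio depends on resolution, the following rough sketch counts reciprocal lattice points inside the resolution sphere. It rests on several assumptions that are not part of the text above: a primitive lattice, 100% completeness, Friedel mates merged, and four parameters per atom (x, y, z and an isotropic B-factor). The function names and the example cell are hypothetical.

```python
import math

def unique_reflections(cell_volume, d_min, n_rotations=1):
    """Approximate number of unique reflections to resolution d_min (Angstrom).

    The sphere of radius 1/d_min holds about (4/3) pi V / d_min^3 lattice points.
    Friedel's law halves this, and the n_rotations rotational symmetry
    operations of the point group reduce it further (primitive lattice assumed).
    """
    lattice_points = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    return lattice_points / (2.0 * n_rotations)

def obs_param_ratio(cell_volume, d_min, n_rotations, n_atoms, params_per_atom=4):
    """Observations per refined parameter, assuming a complete dataset."""
    return unique_reflections(cell_volume, d_min, n_rotations) / (n_atoms * params_per_atom)

# Hypothetical example: 200,000 cubic Angstrom cell, point group 222 (4 rotations),
# 2000 atoms in the asymmetric unit.
for d_min in (3.0, 2.5, 2.0, 1.5):
    ratio = obs_param_ratio(200000.0, d_min, 4, 2000)
    print(f"d_min = {d_min:.1f} A: about {ratio:.1f} observations per parameter")
```

Because the number of reflections grows with 1/d^3, halving d_min in this estimate gives eight times as many observations for the same model.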

As with density maps, resolution and completeness are more important than accuracy. There is, however, a difference. For the electron density map, weaker reflections are not as important as stronger reflections. During refinement, however, we minimize the difference between observed and calculated structure factors. For least squares the target function is:

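$$Q = \sum_{hkl} w(hkl)\,\left(|F_{obs}(hkl)| - k\,|F_{calc}(hkl)|\right)^2$$

Here w(hkl) is the weight of reflection hkl and k is a scale factor.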

So in refinement you don't look at the absolute value of each structure factor; you compare the observed and calculated structure factors. A weak reflection can therefore definitely "tell" the refinement program that the model needs adjustment, namely when Fcalc is large. So for refinement, the more the merrier: you should try to collect every reflection you can lay your hands on.
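The point about weak reflections can be made concrete with a small numerical sketch. It assumes the usual form of the least-squares target, a weighted sum of (|Fobs| - k|Fcalc|)^2 over all reflections. The reflection list, the unit weights and the function name `ls_terms` are invented for illustration only.

```python
def ls_terms(reflections, k=1.0):
    """Per-reflection terms w * (|Fobs| - k * |Fcalc|)**2 of a least-squares target."""
    return {
        hkl: w * (f_obs - k * f_calc) ** 2
        for hkl, (f_obs, f_calc, w) in reflections.items()
    }

# Hypothetical values: hkl -> (|Fobs|, |Fcalc|, weight)
reflections = {
    (1, 0, 0): (950.0, 930.0, 1.0),  # strong reflection, model agrees well
    (2, 1, 3): (15.0, 180.0, 1.0),   # weak reflection, model predicts it strong
    (4, 4, 1): (12.0, 14.0, 1.0),    # weak reflection, model agrees
}

terms = ls_terms(reflections)
total = sum(terms.values())
for hkl, term in terms.items():
    print(f"{hkl}: term = {term:9.1f} ({100.0 * term / total:5.1f}% of target)")
```

With these made-up numbers the weak reflection (2, 1, 3) dominates the target, because the model wrongly predicts it to be strong. Had it not been measured, that disagreement would have gone unnoticed.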

Molecular replacement

Molecular replacement needs to find six parameters to define the position and orientation of a search model in the new unit cell.

RESOLUTION: When the search model is rather different from the protein of interest, it will not give a good correspondence with the higher resolution data. So in practice we tend to use only the low resolution data, say 4 to 10 Angstrom.

COMPLETENESS: With only six parameters to determine we clearly don't need lots of data. Incompleteness due to randomly missed reflections is not a serious problem. Missing large chunks of reflections should, however, be avoided.

ACCURACY: Molecular replacement works in Patterson space, i.e. on intensities rather than amplitudes. Since the intensity is the square of the amplitude, molecular replacement is dominated by the strongest reflections. These are also the easiest ones to measure accurately, so normally no special precautions are needed, except perhaps to avoid overloading strong low resolution reflections.

Isomorphous replacement phasing

Isomorphous replacement phasing compares Fnative and Fderivative, where the latter is the diffraction of a crystal with a bound heavy atom.

RESOLUTION: Binding of the heavy atom normally leads to structural changes, which lead to non-isomorphism. This effect is stronger at high resolution, so one normally cannot use the high resolution data. In addition, medium or low resolution (3 to 5 Angstrom) is often sufficient to solve the phase problem, as additional phase information can be obtained by density modification and, especially powerful, from a partial model.

COMPLETENESS: If 10% of reflections are randomly missing from both the native and the derivative dataset, then only 81% of reflections will have both an Fnative and an Fderivative and can be used for phasing. Low levels of (random) incompleteness can, however, be tolerated, as missing phases can be retrieved by density modification.

ACCURACY: The difference between Fnative and Fderivative is generally on the order of 10 to 25%, considerably more than the normal measurement error. In addition, non-isomorphism often makes a significant contribution to the errors. Nevertheless, since we are interested in the difference between two amplitudes, measurement errors become more significant.
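Both the 81% figure and the growing weight of measurement errors in a difference follow from simple arithmetic, sketched below. The assumptions are mine: reflections are missing independently and at random in the two datasets, and both amplitudes carry independent random errors of the same relative size, so the error in the difference is sqrt(2) times the error in one amplitude. Non-isomorphism is ignored, and the function names are hypothetical.

```python
import math

def paired_completeness(c_native, c_derivative):
    """Fraction of reflections present in both datasets (independent, random gaps)."""
    return c_native * c_derivative

def difference_signal_to_noise(rel_difference, rel_error):
    """Signal/noise of |Fderivative| - |Fnative|, both inputs as fractions of |F|.

    Independent errors of equal size add in quadrature, giving sqrt(2) * rel_error.
    """
    return rel_difference / (math.sqrt(2.0) * rel_error)

print(f"paired completeness: {paired_completeness(0.90, 0.90):.2f}")
for rel_error in (0.03, 0.05, 0.10):
    snr = difference_signal_to_noise(0.15, rel_error)
    print(f"15% isomorphous difference, {rel_error:.0%} error: signal/noise = {snr:.1f}")
```

A measurement error that looks harmless relative to the amplitude itself is several times more significant relative to a 15% difference.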

Anomalous dispersion phasing

Phasing based on anomalous dispersion compares F+ and F-, the Friedel mates (or Bijvoet pairs).

RESOLUTION: Unlike isomorphous replacement phasing, anomalous dispersion does not suffer from non-isomorphism, and in theory the anomalous signal even increases with resolution. Collecting high resolution data is thus useful. However, it is still true that phases to ~3 Angstrom resolution are often sufficient, so resolution should not be pushed to the limit.

COMPLETENESS: What was said for isomorphous replacement phasing applies here as well.

ACCURACY: The difference between F+ and F- is generally on the order of 1 to 3%, which is normally less than the measurement error. This is made worse by the fact that the error in ||F+| - |F-|| is larger than the errors in F+ and F- themselves. We are therefore interested in a signal that is small compared to the measurement error. Accordingly, accuracy is paramount for successful anomalous dispersion phasing. Errors that lead to outliers in the data are particularly damaging, as they can generate anomalous differences that are several times larger than they should be. Detecting heavy atom positions happens in Patterson space and is thus sensitive to the strongest coefficients, which would be due to these outliers. The best way to reduce or prevent outliers is to collect a highly redundant dataset and to reduce radiation damage by not pushing for the highest possible resolution.
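How much redundancy is needed to lift a 1 to 3% signal above the measurement error can be estimated with a back-of-the-envelope sketch. It assumes purely random, independent errors, so that averaging n measurements reduces the error by sqrt(n), and it takes the error in the difference as sqrt(2) times the error in each averaged amplitude. Systematic errors and radiation damage, which the text warns about, are not modelled. The function names are hypothetical, and "multiplicity" here means the number of measurements of F+ and of F- each.

```python
import math

def anomalous_signal_to_noise(rel_signal, rel_error, multiplicity=1):
    """Signal/noise of |F+| - |F-|, with signal and error as fractions of |F|."""
    sigma_mean = rel_error / math.sqrt(multiplicity)  # error of each averaged amplitude
    sigma_diff = math.sqrt(2.0) * sigma_mean          # error of the difference
    return rel_signal / sigma_diff

def multiplicity_needed(rel_signal, rel_error, target_snr=1.0):
    """Smallest multiplicity for which signal/noise reaches target_snr."""
    return math.ceil(2.0 * (target_snr * rel_error / rel_signal) ** 2)

rel_signal, rel_error = 0.02, 0.05  # hypothetical: 2% anomalous signal, 5% error
print(f"single measurement: signal/noise = "
      f"{anomalous_signal_to_noise(rel_signal, rel_error):.2f}")
n = multiplicity_needed(rel_signal, rel_error)
print(f"multiplicity for signal/noise >= 1: {n}")
print(f"signal/noise at that multiplicity: "
      f"{anomalous_signal_to_noise(rel_signal, rel_error, n):.2f}")
```

Under these idealised assumptions the required multiplicity rises with the square of the error-to-signal ratio, which is why a small improvement in measurement accuracy can save a lot of redundancy.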