It can take a lot of sweat and tears to finally get that crystal you were looking for. Now of course you want to get the best possible data from that crystal and, heaven forbid, you don't want to mess it up during the data collection stage.

Once you have your data you will spend hours, days, even weeks at the computer to get the best possible crystal structure. With good data this all goes faster and better, and it will actually be fun.

There is going to be an exam :-)

During data collection you want to optimise the following three properties:

- Resolution
- Completeness
- Accuracy

In reality you will have to make compromises between these three due to limited time, crystal decay, etc. To make the best trade-off you need to know your crystals, equipment and, importantly, the problem you want to solve with the data. Below we'll look at several different ways you may need to use the data.

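One common use of a dataset is to calculate an electron density map. Remember that the electron density at a point x,y,z is given by:

$$\rho(x,y,z) = \frac{1}{V} \sum_{hkl} |F(hkl)| \, e^{i\alpha(hkl)} \, e^{-2\pi i (hx + ky + lz)}$$

where V is the unit cell volume, |F(hkl)| the amplitude and α(hkl) the phase of reflection hkl.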

As you can see, the density is the sum of the contributions of each
reflection and the importance of each reflection is proportional to its
amplitude. **RESOLUTION:** Low resolution reflections tend to be strong and
therefore make a large contribution to the map. High resolution reflections
are weaker, but there are many more of them and they provide the detail
needed for many interpretations. For good electron density maps you want to
capture as high a resolution as possible but without losing the low resolution.
**COMPLETENESS:** Every reflection not measured causes a loss of signal
proportional to the amplitude of the reflection. In other words, a missed
reflection is equivalent to a measurement with a 100% error. **ACCURACY:**
Errors in the measurement correspond to noise in the density map. In a typical
dataset measurement error is just several percent, increasing to perhaps 50%
or so at high resolution. Therefore a poor measurement is normally still a lot
better than no measurement at all. It is also useful to realize that, even for
a highly refined structure, map noise due to errors in Fobs measurement is small
compared to the noise from errors in Fcalc and in the phases. **CONCLUSION:** For
density maps, resolution and data completeness are most important.
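
As a rough illustration of the completeness and accuracy arguments, the sketch below computes a one-dimensional Fourier synthesis from a handful of made-up amplitudes and phases. All values and function names are hypothetical and not taken from a real dataset, and the 1/V scale factor is left out. It compares the map change caused by omitting the strongest reflection with the change caused by a 5% error in that same amplitude.

```python
import math

# Hypothetical 1D "reflections": index h -> (amplitude, phase in radians).
REFLECTIONS = {
    1: (100.0, 0.3),
    2: (60.0, 1.2),
    3: (35.0, 2.5),
    4: (15.0, 4.0),
    5: (8.0, 5.1),
}

def density(reflections, x):
    """1D Fourier synthesis at fractional coordinate x.

    Each Friedel pair (h, -h) is folded into one cosine term; the F(0) term
    and the 1/V scale are omitted because only map differences are compared.
    """
    return 2.0 * sum(
        amp * math.cos(2.0 * math.pi * h * x - phase)
        for h, (amp, phase) in reflections.items()
    )

def rms_map_difference(refl_a, refl_b, n_points=200):
    """Root-mean-square difference between two maps sampled over one cell."""
    total = 0.0
    for i in range(n_points):
        x = i / n_points
        total += (density(refl_a, x) - density(refl_b, x)) ** 2
    return math.sqrt(total / n_points)

# Case 1: the strongest reflection was not measured at all.
missing = {h: v for h, v in REFLECTIONS.items() if h != 1}

# Case 2: the same reflection was measured with a 5% amplitude error.
noisy = dict(REFLECTIONS)
noisy[1] = (REFLECTIONS[1][0] * 1.05, REFLECTIONS[1][1])

rms_missing = rms_map_difference(REFLECTIONS, missing)
rms_noisy = rms_map_difference(REFLECTIONS, noisy)
print(f"RMS map change, reflection missing: {rms_missing:.2f}")
print(f"RMS map change, 5% amplitude error: {rms_noisy:.2f}")
```

In this toy case the missing reflection perturbs the map 20 times more than the 5% error does, which is simply the ratio of a 100% error to a 5% error.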

During structure refinement we use our data as observations to fit our
model against. An important factor in refinement is the observation/parameter
ratio. The larger the ratio, the more stable the refinement and the fewer
problems with model bias. Large numbers of observations also allow
you to refine a larger number of parameters. **RESOLUTION:** the best way
to get a large number of observations is to obtain high resolution data. High
resolution reflections are also sensitive to small errors in atomic parameters
and therefore allow precise positioning during refinement. **COMPLETENESS:**
A reduced completeness clearly leads to fewer reflections to refine against.
**ACCURACY:** Since errors in Fcalc tend to be significantly larger than
errors in Fobs, measurement accuracy is not as critical.
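
To get a feel for how strongly resolution drives the observation/parameter ratio, here is a minimal sketch. It assumes that the number of reciprocal lattice points within the resolution sphere is about (4/3)πV/d³, that Friedel symmetry and the rotational symmetry of the crystal reduce this to the unique set, and that each atom carries four parameters (x, y, z, B). The crystal described by the constants is hypothetical, and restraints, which add observations in practice, are ignored.

```python
import math

def approx_unique_reflections(cell_volume, d_min, n_rotations=1):
    """Rough number of unique reflections to resolution d_min (Angstrom).

    cell_volume is in cubic Angstrom; n_rotations is the number of rotational
    point-group operations (1 for P1, 4 for point group 222, ...).
    """
    total = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    # Divide by 2 for Friedel symmetry and by the rotational symmetry.
    return total / (2.0 * n_rotations)

def obs_to_param_ratio(cell_volume, d_min, n_atoms, n_rotations=1,
                       params_per_atom=4):
    """Observations per refined parameter for n_atoms in the asymmetric unit."""
    n_obs = approx_unique_reflections(cell_volume, d_min, n_rotations)
    return n_obs / (n_atoms * params_per_atom)

# Hypothetical crystal: 200,000 cubic Angstrom cell, point group 222,
# 1500 atoms in the asymmetric unit.
CELL_VOLUME = 200_000.0
N_ROTATIONS = 4
N_ATOMS = 1500

for d_min in (3.0, 2.5, 2.0, 1.5):
    ratio = obs_to_param_ratio(CELL_VOLUME, d_min, N_ATOMS, N_ROTATIONS)
    print(f"{d_min:.1f} A: observation/parameter ratio ~ {ratio:.1f}")
```

Because the count scales with 1/d³, going from 3.0 to 1.5 Angstrom gives eight times as many observations for the same number of parameters.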

Similar to density maps, resolution and completeness are more important than accuracy. There is however a difference. For the electron density map, weaker reflections are not as important as stronger reflections. However, during refinement we minimize the difference between observed and calculated structure factors. For least squares, with a weight w for each reflection, the target function is:

$$Q = \sum_{hkl} w(hkl) \, \bigl( |F_{obs}(hkl)| - |F_{calc}(hkl)| \bigr)^2$$

So in refinement you don't look at the absolute value of each structure factor, but you compare the observed and calculated structure factors. Therefore a weak reflection can definitely "tell" the refinement program that the model needs adjustment, namely when Fcalc is large. So for refinement, the more the merrier, and you should try to collect every reflection you can lay your hands on.
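
A small sketch of a weighted least-squares residual makes the point about weak reflections concrete. The function name and the three example reflections are made up for illustration, and unit weights are assumed when none are given.

```python
def ls_terms(f_obs, f_calc, weights=None):
    """Per-reflection terms w * (Fobs - Fcalc)**2 of a least-squares target."""
    if weights is None:
        weights = [1.0] * len(f_obs)  # assumption: unit weights
    return [w * (fo - fc) ** 2 for w, fo, fc in zip(weights, f_obs, f_calc)]

# Hypothetical reflections: (label, Fobs, Fcalc)
reflections = [
    ("strong, well fitted", 100.0, 98.0),
    ("weak, well fitted", 5.0, 6.0),
    ("weak, Fcalc too large", 5.0, 40.0),
]

f_obs = [r[1] for r in reflections]
f_calc = [r[2] for r in reflections]
terms = ls_terms(f_obs, f_calc)

for (label, _, _), term in zip(reflections, terms):
    print(f"{label:24s} contributes {term:8.1f}")
print(f"total target value: {sum(terms):.1f}")
```

The weak reflection with a large Fcalc dominates the sum, even though it would contribute almost nothing to an electron density map.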

Molecular replacement needs to find six parameters to define the
position and orientation of a search model in the new unit cell.
**RESOLUTION:** When the search model is rather different from the
protein of interest it will not be able to give a good correspondence to
the higher resolution data. So in practice we tend to only use the low
resolution data, say 4 to 10 Angstrom. **COMPLETENESS:** With only
six parameters to determine we clearly don't need lots of data. Incompleteness
due to randomly missed reflections is not a serious problem. Missing large
chunks of reflections should however be avoided. **ACCURACY:** Molecular
replacement works in Patterson space, i.e. on intensities rather than
amplitudes. Since the intensity is the square of the amplitude, molecular
replacement is dominated by the strongest reflections. These are also the
easiest ones to measure accurately so normally no special precautions are
needed, except perhaps the need to prevent overloading strong low
resolution reflections.
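
The effect of squaring can be seen with a few made-up numbers. The sketch below (hypothetical amplitudes, hypothetical helper name) compares how much of the total is carried by the two strongest reflections when summing amplitudes versus intensities.

```python
def top_share(values, n_top):
    """Fraction of the total sum contributed by the n_top largest values."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:n_top]) / sum(ordered)

# Hypothetical amplitudes for ten reflections.
amplitudes = [100.0, 80.0, 50.0, 30.0, 20.0, 10.0, 10.0, 5.0, 5.0, 5.0]
intensities = [f ** 2 for f in amplitudes]

amplitude_share = top_share(amplitudes, 2)
intensity_share = top_share(intensities, 2)
print(f"two strongest reflections: {amplitude_share:.0%} of summed amplitudes")
print(f"two strongest reflections: {intensity_share:.0%} of summed intensities")
```

For these numbers the two strongest reflections carry a clearly larger share of the summed intensities than of the summed amplitudes, which is why a Patterson-based search is so sensitive to them.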

Isomorphous replacement phasing compares Fnative and Fderivative, where
the latter is measured from a crystal with a bound heavy atom.
**RESOLUTION:** Binding of the heavy atom normally leads to structural
changes, which lead to non-isomorphism. This effect is stronger at high
resolution. Therefore one can normally not use the high resolution data. In
addition, medium or low resolution (3 to 5 Angstrom) is often sufficient to
solve the phase problem as additional phase information can be obtained by
density modification and, especially powerful, from a partial model.
**COMPLETENESS:** If 10% of reflections are randomly missing from both
native and derivative datasets, then only 81% of reflections will have both
an Fnative and Fderivative and can be used for phasing. Low levels of (random)
incompleteness can however be tolerated as missing phases can be retrieved
by density modification. **ACCURACY:** The difference between Fnative and
Fderivative is generally on the order of 10 to 25%, considerably more than
the normal measurement error. In addition, non-isomorphism often makes a
significant contribution to errors. Nevertheless, since we are interested
in the difference between two amplitudes, measurement errors become more
significant.
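
Both numerical arguments in this section can be checked with a few lines. The sketch assumes that missing reflections in the two datasets are independent and that measurement errors are independent, random, and the same fraction of the amplitude in both datasets. The function names and example amplitudes are hypothetical.

```python
import math

def paired_completeness(c_native, c_derivative):
    """Fraction of reflections present in both datasets (independent losses)."""
    return c_native * c_derivative

def relative_error_of_difference(f_native, f_derivative, rel_sigma):
    """Error of |Fderivative - Fnative| relative to the difference itself.

    rel_sigma is the fractional measurement error of each amplitude; the
    errors of the two amplitudes are added in quadrature.
    """
    difference = abs(f_derivative - f_native)
    sigma_difference = math.hypot(rel_sigma * f_native, rel_sigma * f_derivative)
    return sigma_difference / difference

# 10% of reflections randomly missing from each dataset.
print(f"usable pairs: {paired_completeness(0.9, 0.9):.0%}")

# Hypothetical reflection: 15% isomorphous difference, 3% measurement error.
rel_err = relative_error_of_difference(100.0, 115.0, 0.03)
print(f"relative error of the isomorphous difference: {rel_err:.0%}")
```

A 3% error on each amplitude becomes an error of roughly 30% on a 15% difference, which is why measurement error matters more here than for maps or refinement.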

Phasing based on anomalous dispersion compares F+ and F-, the amplitudes of
Friedel mates (or Bijvoet pairs). **RESOLUTION:** Unlike isomorphous
replacement phasing, anomalous dispersion does not suffer from
non-isomorphism and in theory the anomalous signal even increases with
resolution. Collecting high resolution data is thus useful. However, it is
still true that ~3 Angstrom resolution phases are often sufficient so
resolution should not be pushed to the limit. **COMPLETENESS:** What was
said for isomorphous replacement phasing applies here as well. **ACCURACY:**
The difference between F+ and F- is generally on the order of 1 to 3%, which is
normally less than the measurement error. This is made worse by the fact that
the error in ||F+| - |F-|| is larger than the errors in F+ and F- themselves.
We are therefore interested in a signal that is small compared to measurement
error. Accordingly, accuracy is paramount to successful anomalous dispersion
phasing. Errors leading to outliers in the data are particularly damaging as
they can generate anomalous differences that are several times what they should
have been. Detecting heavy atom positions happens in Patterson space and is
thus sensitive to the strongest coefficients, which would be due to these
outliers. The best way to reduce/prevent outliers is to collect a highly
redundant dataset and reduce radiation damage by not pushing for the
highest possible resolution.
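
The benefit of redundancy can be estimated with a simple error model. The sketch below assumes purely random, independent measurement errors, so that averaging n measurements reduces the error by a factor of the square root of n. It ignores systematic errors and radiation damage, which in practice limit the gain. The function name and the numbers (a 2% anomalous difference, a 4% error per measurement) are hypothetical.

```python
import math

def anomalous_signal_to_noise(rel_signal, rel_sigma, multiplicity):
    """Signal-to-noise of |F+| - |F-| for a given multiplicity.

    rel_signal:   anomalous difference as a fraction of the amplitude
    rel_sigma:    error of a single measurement as a fraction of the amplitude
    multiplicity: number of measurements averaged for F+ and for F- each
    """
    sigma_mean = rel_sigma / math.sqrt(multiplicity)
    # The difference of two averaged amplitudes carries both errors.
    sigma_difference = math.sqrt(2.0) * sigma_mean
    return rel_signal / sigma_difference

REL_SIGNAL = 0.02  # hypothetical 2% anomalous difference
REL_SIGMA = 0.04   # hypothetical 4% error per measurement

for n in (1, 2, 4, 8, 16):
    s2n = anomalous_signal_to_noise(REL_SIGNAL, REL_SIGMA, n)
    print(f"multiplicity {n:2d}: signal/noise ~ {s2n:.2f}")
```

With a single measurement of each Friedel mate the signal is well below the noise in this example, and the signal-to-noise ratio only doubles for every fourfold increase in multiplicity.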