Optimizing diffraction data collection


To optimize data collection we first have to know what we consider to be optimal. As we saw at the beginning this depends on what the data is going to be used for. In most cases you want to achieve the following:

In practice you will have to compromise between these parameters, with most of the compromises being between time and the other three. In the previous section about data collection geometries we discussed how to measure a highly complete data set. Here we will therefore focus mostly on how to optimize data accuracy and resolution.

A data collection experiment can be thought of as consisting of two parts: i) generating the signal, and ii) detecting the signal. The goal is to boost the signal and to reduce sources of noise during detection. In addition, the user has to decide on a data collection strategy.


Benefits of increasing the signal


In an X-ray diffraction experiment you observe (count) the number of photons that hit a detector pixel in a given amount of time. This is a probabilistic process that obeys "counting statistics", also called Poisson statistics. There are three important properties of counting statistics to remember:

The first property means that the signal/noise ratio is N/SQRT(N) = SQRT(N). So for N=100 the signal/noise ratio is 10. If we expose 4 times as long, N would be 400 and the signal/noise ratio 20. So increasing the signal also increases the noise, but it improves the signal/noise ratio. This is the reason why it is beneficial to boost the signal in a diffraction experiment. It is also important to realise that we can never get rid of the "counting noise"; we can only try to minimize all other sources of noise.
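The arithmetic is easy to check; here is a minimal sketch in Python, with purely illustrative photon counts:

    import math

    def signal_to_noise(counts):
        # Poisson statistics: the noise on N counted photons is sqrt(N)
        return counts / math.sqrt(counts)   # equals sqrt(counts)

    print(signal_to_noise(100))   # 10.0
    print(signal_to_noise(400))   # 20.0: four times the exposure doubles the S/N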

There are several ways to boost the signal during the experiment:

As will be described in the next section, the signal in a diffraction experiment is the sum of true diffraction by the crystal and random scatter of X-rays, which generates the "background". Note that all methods of increasing the signal increase diffraction and background in equal proportion (except perhaps increasing the crystal volume, which boosts the diffraction signal more than the background scatter).


Background and diffraction signals


During a diffraction experiment X-rays are scattered in two ways.

If we call the number of diffracted and background photons that hit a detector pixel Nd and Nb, respectively, then the standard deviation for that pixel is SQRT(Nd+Nb). So background contributes to the noise in a pixel's value and should thus be minimized. The impact of background is especially strong for the weakest, mostly high resolution, reflections where the noise in a pixel is dominated by the background. Reducing background scatter is therefore especially important for these weak reflections.
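To make this concrete, here is a small sketch (with illustrative numbers only) of how background hurts a weak spot far more than a strong one:

    import math

    def spot_signal_to_noise(n_diff, n_bkg):
        # S/N of a spot when n_diff diffracted and n_bkg background photons
        # are counted in the same pixels (Poisson noise on their sum)
        return n_diff / math.sqrt(n_diff + n_bkg)

    print(spot_signal_to_noise(10000, 100))   # ~99: a strong spot barely notices the background
    print(spot_signal_to_noise(25, 0))        # 5.0: a weak spot without background
    print(spot_signal_to_noise(25, 100))      # ~2.2: the same weak spot drowns in background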

Reducing generation of background scatter

Background scatter originates from the interaction of the X-ray beam with matter. We obviously must place the crystal in the beam to see any signal, but we should try to remove all other matter from the direct beam path as far as possible.

Reducing measurement of background scatter

After we have minimized the generation of background scatter there are still a few tricks to reduce the impact of the remaining background scatter on our measurement. The principle is that you only have to worry about background photons that you measure at the same time and place as diffracted photons. You can use the following four methods.

Reduce spot size

If you can reduce your spot size by focussing the X-ray beam (increasing its brilliance), then each spot takes up fewer detector pixels. Since focussing does not increase the background radiation per pixel, the signal/noise ratio will go up. In principle it also helps to use small pixels, so that you can map the integration area tightly around the true spot shape. However, since pixels are usually already small relative to the spot size, this benefit is limited.

Increase the detector distance

Radiation emitted by a point source decreases as the square of the distance to the source. This also applies to background radiation. Interestingly, when you have a parallel beam, or a beam that is focussed on the detector, its intensity does not decrease with distance. Accordingly, doubling the detector distance reduces the background scatter per pixel by a factor of 4 without affecting the diffracted intensity. You should therefore place your detector at the maximum distance that still captures the highest resolution data.
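A quick sketch of the inverse-square argument (the spot size, background rate and photon counts below are arbitrary assumptions):

    import math

    def spot_snr(n_diff, bkg_per_pixel, n_pixels):
        return n_diff / math.sqrt(n_diff + bkg_per_pixel * n_pixels)

    n_diff, n_pixels = 50, 9                 # assumed weak spot covering 3x3 pixels
    for d in (1, 2, 3):                      # detector distance relative to a reference
        bkg = 40 / d**2                      # background per pixel falls as 1/D^2
        print(d, round(spot_snr(n_diff, bkg, n_pixels), 2))
    # -> roughly 2.5, 4.2 and 5.3: background-limited spots gain S/N with distance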

Reduce the oscillation width per image

So far we have considered the reciprocal lattice to consist of points. However, it is more realistic to think of the lattice points as small spheres. Diffraction starts when such a sphere first touches the Ewald sphere and stops when it has passed through completely. Therefore, each reflection diffracts over a small angular range. This range is what we refer to as the mosaicity of the crystal (the mosaicity is thought to originate from small misalignments of the many small crystallites that make up the crystal: a mosaic of crystallites). I will use the name diffraction range to indicate the angular range over which a reflection is observed.

You collect signal over the diffraction range. The signal is indicated in Figure 1 as a parabola. At the same time you collect background over the full oscillation range (delta-phi) of all images that contain (part of) the reflection. This is indicated by the rectangles in Figure 1. Ideally, the oscillation range is the same as the diffraction range (compare Figs. 1A & 1B). When a reflection spans more than one image (Figs. 1C & 1D), the background is the sum of the backgrounds of the individual images. The ideal situation can be approached by reducing the oscillation width per diffraction image. This is called fine-slicing, see Figure 1.


Figure 1: Background contributions with varying oscillation widths.
A) When the oscillation width (delta-phi) is large relative to the mosaicity then unnecessary background is accumulated before and after the reflection diffracts.
B) In an almost ideal case the oscillation range (nearly) coincides with the range over which the spot diffracts giving a minimal background.
C) When delta-phi is only slightly larger than the mosaicity, many reflections will span two images, again giving a large background contribution.
D) As delta-phi gets smaller it can bracket the actual reflection more and more precisely.

The recommended oscillation range is approximately 1/3 to 1/2 of the mosaicity. However, this recommendation has to be balanced against detector readout noise: if your reflection spans N images, you will accumulate N times as much readout noise as for a single image. For image plates you also have to consider the extra readout time needed for fine-slicing. Furthermore, some programs are optimized for narrow oscillation ranges whereas others prefer to see wider ranges. Another point to note is that as the oscillation ranges get smaller, the accuracy of the oscillation camera becomes increasingly important. Therefore you shouldn't push fine-slicing too far.
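A very simplified model of this trade-off is sketched below. The diffracted counts, background rate and readout variance are made-up numbers, and the number of images a reflection spans is only crudely approximated, but it shows why there is an optimum oscillation width:

    import math

    def spot_snr(n_diff, delta_phi, diffraction_range, bkg_per_degree, readout_var):
        n_images = math.ceil(diffraction_range / delta_phi)   # images the spot spans
        background = bkg_per_degree * n_images * delta_phi    # background recorded with the spot
        variance = n_diff + background + n_images * readout_var
        return n_diff / math.sqrt(variance)

    for dphi in (1.0, 0.5, 0.25, 0.1, 0.02):
        print(dphi, round(spot_snr(200, dphi, 0.3, 300, 4), 2))
    # S/N first improves as excess background is trimmed away, then slowly degrades
    # again when readout noise from many thin slices starts to dominate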

Guard slits

Background scatter created inside the collimator can be removed by guard slits. These are placed as close to the crystal as possible and form an opening slightly wider than the X-ray beam. This way they intercept background scatter generated within the collimator without being hit by the primary beam themselves, which would create new scatter.


Detection noise


Above we have looked at noise due to counting statistics and background scatter. A further source of noise in the experiment arises during detection and during readout of the detector. Three main types of detection noise can be distinguished.

(Note: I made up the noise names above. Does anyone know if there are official terms for them?)

Time-dependent noise is a noise signal that accumulates over time independent of the real signal. Normally time-dependent noise is not a severe problem unless very long exposures are used. Examples are:

Cosmic rays.
These are rare events that affect only very few pixels.
Zingers.
Zingers are caused by radioactive decay of material in the detector. They occur in CCD detectors due to decay of thorium in their glass fiber-optic tapers.
Dark current.
Dark current is a small signal that is measured even if no photons hit the detector (that's why it is called dark current). Dark current is typically found in electronic detectors and is induced by thermal motion. CCD chips are therefore cooled to minimize this effect.
Signal loss.
Here we don't really introduce noise but instead lose signal, which leads to a reduced signal/noise ratio. This situation occurs in image plates, where the storage phosphors slowly lose signal; the loss is a function of time and temperature.

Detection noise is caused during the detection and readout process. The true signal to be measured is the number of photons that hit a pixel in a certain time span. However, most detectors measure a secondary signal (S) that is proportional to the number of photons (N) that fell on a detector pixel. So S = p * N, where p is the proportionality constant (the amount of signal contributed by a single photon). Unfortunately, p is not a true constant; instead, the contribution of each photon is p +/- sigma-p. For instance, the number of phosphor centers in an image plate that get activated by an X-ray photon is a chance process. Similarly, the readout signal induced by laser irradiation of the image plate is again probabilistic. Finally, the readout signal is measured by a photomultiplier tube, which introduces yet more uncertainty.
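To see how this gain variability adds to the counting noise, here is a small Monte-Carlo sketch (Python with numpy; all numbers are illustrative assumptions, not properties of any real detector):

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(n_mean, gain, gain_sigma, trials=50_000):
        # Integrating detector: each of a Poisson number of photons contributes
        # gain +/- gain_sigma to the recorded signal S
        n_photons = rng.poisson(n_mean, trials)
        s = np.array([rng.normal(gain, gain_sigma, k).sum() for k in n_photons])
        return s.mean() / s.std()            # effective signal/noise ratio

    for sigma_p in (0.0, 0.25, 0.5):         # spread of the per-photon contribution
        print(sigma_p, round(simulate(100, 1.0, sigma_p), 2))
    # -> about 10.0, 9.7 and 8.9: gain variability degrades the pure
    #    counting-statistics S/N of sqrt(100) = 10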

Detector-dependent noise is specific to a detection system and does not depend on the measured signal or exposure time. For example, detector-dependent noise is high in photographic film (where it is called chemical fog). Image plate and CCD detectors have readout noise levels roughly equivalent to 1 or 2 X-ray photons per pixel. Ideally, detector-dependent noise should be small relative to the signal, and this is normally the case.


X-ray detectors

As shown above, the properties of the detector affect data collection and depending on the problem at hand one detector may be more suitable than another. Below follows a brief list of detectors with their main properties.

X-ray film
This type of detector is now obsolete. Its only strength is a high spatial resolution. This is why it has remained in use longest in data collection of large viruses where closely separated spots have to be resolved.
Image plate (IP) detectors
These are the general purpose workhorses in protein crystallography. They have a large active surface, sufficient spatial resolution, and are relatively affordable. Their main drawback is that readout is not as fast as the electronic detectors. This is not a real concern at home where exposure times are often much longer than the readout times. However, at modern synchrotrons IP detectors are just too slow.
CCD detectors
CCD detectors are almost the opposite of IP detectors: they are very fast, have a small active surface, and are expensive. They also "suffer" more from time-dependent noise, both dark current and zingers. CCD detectors with a larger active surface are made by coupling the chip to a fiber-optic taper and by tiling multiple CCDs together. CCDs shine when very intense signals are collected very rapidly, i.e. at a synchrotron. In this situation time-dependent noise is small and you really need the readout speed.
Counting detectors
Counting detectors were not mentioned in the section on detection noise because they are theoretically wonderful detectors that introduce virtually no noise into the measurements. They can often even filter out cosmic rays based on their higher energy. However, their active surface is rather small and they have poor spatial resolution. In addition, they cannot handle intense signals, which makes them useless at synchrotrons. They work best when measuring weak signals, where you really want to minimize noise.
Amorphous selenium detectors
These detectors were not mentioned before because they are too new. Like CCDs they are electronic integrating detectors that accumulate a charge proportional to the signal. The big difference is that an aSe detector is large and therefore does not need fiber optics or tiling to create a large aperture. Another difference is that you can read out individual pixels, unlike CCDs where you have to read out rows of data. Readout is fast, spatial resolution is excellent, and there is potential to make it behave like a counting detector. Drawbacks are a larger readout noise and the fact that the technology, which was developed for medical X-ray imaging, needs more time and further optimization to prove itself as a strong contender for our application.

Data collection strategy


Apart from the signal and noise issues, there are a number of practical considerations and decisions that have to be made. These are listed below.

Is it worth collecting
Can it be processed and is the data going to be useful
Detector distance
Where should you place the detector
Detector swing-out
Placing the detector off-centre
Exposure time
How long should you expose per image
Low & high resolution data collection
How to capture both the low and high resolution data
Wavelength
How does the choice of wavelength affect your data
Micro-crystals
Considerations on data collection from micro crystals
Polarization & Lorentz factor
How do polarization and the Lorentz factor affect your data
MAD data collection
Special tricks for MAD data collection


Is it worth collecting

It is only worth collecting a dataset if it will be able to answer the questions you are interested in. A perfect 1.2 Å dataset may not be worth collecting if you already have 1.0 Å data. Conversely, a 3.5 Å dataset may be fine if you just want to determine whether compound X bound to your protein, for which you already know the structure. Along similar lines, poor spot shape, twinning, etc. can be tolerated if you don't need the highest data accuracy, but will be fatal if you plan to do anomalous phasing. A pretty good indicator of a dataset not worth collecting is one that you can't (auto)index even though you know all experimental parameters (beam center, detector distance, etc.) have been set correctly. In case of doubt, just collect the data and figure it out later (Michael Rossmann's "shoot first, ask questions later" strategy).


Detector distance

Current software requires that the spots in your diffraction image are well separated. Since the distance between spots depends linearly on the detector distance, the crystal-to-detector distance is the prime variable for controlling spot separation. (You could also change the wavelength, but this is not commonly done.) The separation between neighbouring spots along a unit cell axis of length A, using a wavelength L and a detector distance D, is approximately:

separation = 1/A * D * L

A rule of thumb is to use a detector distance (in mm) equal to the longest unit cell axis (in Å): 100 mm for a longest axis of 100 Å. According to the formula, this gives a spot separation (in mm) equal to the wavelength (in Å). In general this is too conservative, especially when using mirrors, but it gives a reasonable first guess. If you know the approximate spot size that you can achieve with your collimation, then aim for a separation of about 1.5 times the spot size.
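The formula and the rule of thumb fit in a few lines of Python (the 1.5x spot-size target is the guideline from the text; the example numbers are arbitrary):

    def spot_separation(cell_axis, distance, wavelength):
        # approximate separation (mm) of neighbouring spots along an axis of
        # length cell_axis (Angstrom), at detector distance (mm) and wavelength (Angstrom)
        return distance * wavelength / cell_axis

    def minimum_distance(cell_axis, wavelength, spot_size, factor=1.5):
        # detector distance (mm) giving a separation of factor * spot_size (mm)
        return factor * spot_size * cell_axis / wavelength

    print(spot_separation(100, 100, 1.54))   # ~1.5 mm: the rule-of-thumb case
    print(minimum_distance(150, 1.0, 0.3))   # 67.5 mm for a 150 A axis and 0.3 mm spots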

The need for separation between reflections sets a minimum detector distance. However, you may want to use a longer detector distance if you can do so without losing the highest resolution reflections. This reduces the radiation background and uses the detector surface more efficiently. If you do not know the data collection equipment well, don't push it too far, because the display may not be able to clearly show the very weak highest resolution data even if they are there.


Detector swing-out

If your crystal diffracts beyond the edge of the detector, then you should move the detector closer. If this is impossible due to spot overlap or hardware limits, then you should consider using a shorter wavelength (this compresses the diffraction pattern). If this is not an option either, then you have no other choice than to swing out the detector.

The reason I treat using a swing-out angle as a method of last resort is that whenever you have to use it, it means you are not measuring all of the available diffraction data, see Figure 1. This is one of the reasons why I like the large aperture image plates for laboratory X-ray detectors.

Figure 1: Wasted diffraction data
All data in the grey areas are not collected. The pale grey area contains valuable but redundant data. The dark grey area is more serious since it normally contains non-redundant data. This means that you will have to make one or more extra data collection runs, unless symmetry saves you.

From Figure 1 it is clear that you miss part of the diffraction pattern on the side opposite to the swing-out direction (pale grey). In addition, you will miss part of the diffraction pattern perpendicular to the swing-out direction (dark grey). If you use one of the data collection strategies listed in the oscillation range table, then the white region will collect all unique data passing through the top half of the Ewald sphere, whereas the pale grey area is actually redundant as it contains the data that pass through the bottom half of the Ewald sphere (see geometry). Accordingly, you don't have to change your data collection strategy; you just lose redundancy. The only exception is when there is symmetry perpendicular to the oscillation axis: normally 90 degrees of data would suffice, but now you have to collect theta degrees extra to account for the curvature of the Ewald sphere (see here).

The dark grey area generally does not contain redundant information. Symmetry can help though. If you have a 2-, 4-, or 6-fold axis parallel to the spindle then, combined with Friedel symmetry, this generates a mirror plane perpendicular to and bisecting the oscillation axis (see here). So you only have to worry about the dark grey area on one half of the detector (either the top or the bottom half in Figure 1). The dark grey area is also non-unique if it is related to the measured volume by symmetry. This is the case if there is a 3-, 4-, or 6-fold axis perpendicular to the oscillation axis. The higher the symmetry, the more likely it is to cover the dark grey area, see Figure 2.

Figure 2: Effects of a 4-fold perpendicular to the oscillation axis
The measured volume, indicated by the white sphere labeled "detector", is related to the other circular outlines by the 4-fold symmetry. There is still a small missed volume left between circles at high resolution. In this case a slightly larger swing-out might have been better to reduce/eliminate this region.

If you have to swing out the detector, how far should you swing out? With a rectangular detector you simply swing out so that the highest resolution data fall close to the edge of the detector.
For circular detectors things are a bit more complicated. Figure 3 gives three possibilities. Situation A uses the full detector surface to measure data. This is an efficient way of using the detector, but a very inefficient way to measure the highest resolution data. Situation B wastes about a third of the detector surface, but it measures the highest resolution data most efficiently. Situation C is a compromise which uses the detector surface more efficiently than B and still captures a significant fraction of the high resolution data.
Note that swinging out a bit less than in B significantly increases detector surface usage while costing very little in high resolution coverage. Vice versa, swinging out a little more than in A costs very little in wasted detector surface while gaining a lot in high resolution coverage.

Figure 3: Choice of swing-out angle
A) Small swing-out angles use the detector surface efficiently, but do not cover much of the high resolution volume (no dotted line).
B) Large swing-out angles waste a lot of detector surface but cover the high resolution volume efficiently (maximum dotted line).
C) Intermediate swing-out angles already capture a lot of the high resolution volume (significant dotted line) without wasting too much detector surface.
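For planning purposes you can estimate the resolution reached at the detector edge for a given swing-out angle. The sketch below treats the detector as flat, ignores the obliquity of the detector face, and takes the swing-out angle as the 2-theta angle of the detector centre, so it is only a rough estimate:

    import math

    def edge_resolution(wavelength, distance, half_width, swing_out=0.0):
        # wavelength in Angstrom, distance and half_width in mm, swing_out in degrees
        two_theta = math.radians(swing_out) + math.atan(half_width / distance)
        return wavelength / (2.0 * math.sin(two_theta / 2.0))

    print(round(edge_resolution(1.0, 200, 150), 2))        # ~1.6 A with no swing-out
    print(round(edge_resolution(1.0, 200, 150, 20.0), 2))  # ~1.1 A swung out by 20 degrees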

It is up to you to decide which swing-out angle is most appropriate for your project. If you have 3-, 4-, or 6-fold symmetry perpendicular to the oscillation axis or if you have to measure cusp data anyway, then see if you can use a swing-out angle so that the symmetry or the cusp data fill in the missing volume, see Figure 2.


Exposure time

In theory data accuracy improves with longer exposure times, but in practice there are other considerations as well. The first priority should be to get a complete dataset. Given the amount of time available and the minimum number of images to collect, you can set an upper limit on the data collection time per image. You may want to reduce this further in order to prevent radiation damage (for room temperature crystals, or for frozen crystals at intense synchrotron sources). It may therefore be wiser to use shorter exposures and collect more images. The extra redundancy will improve data quality, and if radiation damage renders the latter part of the dataset useless, you hopefully still have enough undamaged data left to make a complete dataset.

The question whether you should push exposure time or redundancy remains controversial. For example, you can collect the minimum oscillation range that covers the unique volume with long exposures per image, or you can collect a much wider oscillation range (or multiple ranges with different crystal orientations) with shorter exposures per degree of oscillation. The latter option more or less guarantees that your dataset will be very complete, it will help scaling, and it will make outlier rejection in the merging stage more reliable. The (psychological) drawback is that higher redundancy and shorter exposures will increase the Rsym statistics, even if the data are actually better. The question really is whether multiple weak observations are better than one strong observation. This may be decided by how well the data processing software can handle the weak data. I can't tell you what's best but I am interested to get your feedback on this.

Instead of talking about exposure time per frame, it may be better to talk about exposure time per degree of oscillation. A different approach to increasing your signal is to decrease the oscillation angle per frame while keeping the exposure time per frame constant. This has some added advantages:

Of course there are some disadvantages too:

The first two points should not really be a concern (you can get 160 GB disks at a reasonable price); the last three depend on your hardware and software.


Low & high resolution data collection

For high resolution data collection one generally collects two or more datasets at different resolutions to prevent strong low resolution reflections from saturating the detector. (Remember that the impact of a reflection on the density map is proportional to its amplitude, so you definitely don't want to lose the overloaded low resolution reflections.) In the past, before crystals were frozen, one collected the high resolution data first to minimize loss of resolution due to radiation damage. A second, low resolution dataset could still be collected afterwards, even if radiation damage had destroyed the highest resolution reflections.
This strategy needs to be reconsidered when using frozen crystals at intense third generation synchrotrons. Freezing has prevented the non-specific radiation damage that, in the pre-freezing era, led to a general increase in B-factor and thus to a loss of resolution. However, at the very high X-ray doses now attainable at third generation synchrotrons we start to observe specific damage, especially to disulphide bonds, carboxylate groups, etc. Even if the resolution itself is not degraded, the changes to the structure cause changes in the diffraction intensities that cannot simply be captured by a scale or temperature factor. So, whereas we can merge high and low resolution data by modeling non-specific radiation damage with a scale and B-factor, we cannot really do the same for specific damage. The damaged and undamaged data have become distinctly different, reflecting the different damaged and undamaged structures they derive from. As a consequence, I believe we should modify our data collection strategy for frozen crystals and collect the low resolution data first. This allows you to collect the low resolution data from an undamaged crystal. Since the low resolution data can be collected with a very limited X-ray dose, the crystal should still be virtually pristine when you start the high resolution data collection.

During the low resolution part of the data collection you want to minimize the radiation dose of your crystal by using short exposures. Since at low resolution diffraction will be strong relative to the background there is not as much need to minimize background scatter. Therefore you can place the beamstop far from the crystal (gaining you the lowest resolution terms). You can also use wider oscillation ranges as overlapping lunes and increased background are less of a problem. You should still move the detector back to improve spot separation and decrease background.

During the high resolution part of the data collection you want to capture as much high resolution data as possible before radiation decay sets in. So start data collection at the optimal orientation (you should know where that is from the low resolution dataset) and use a sufficient but not excessive exposure time. If redundancy is less of a concern you should consider using the multi-wedge method. You can also move the beamstop as close to the crystal as practically possible. This will reduce background and "bleeding" of heavily overloaded reflections on some CCD detectors (bleeding causes signal to spill from overloaded pixels into neighbouring pixels, forming linear streaks).


Wavelength

One of the advantages of synchrotrons is that you have access to a whole range of wavelengths, rather than the fixed wavelength of a conventional X-ray source. This is especially useful when you are collecting anomalous data from a heavy atom scatterer: by exploiting absorption edge effects you can maximize the anomalous signal. More general factors in choosing a wavelength are:

Advantages of a short wavelength

Disadvantages of a short wavelength


Micro crystals

There is great interest in improving diffraction from microcrystals. The main focus has been on creating smaller and more brilliant beams to match the size of the crystals. However, from what you have learned so far it should be clear that boosting the signal (by increasing beam brilliance) is not the only approach, and its utility is limited by radiation damage. Especially for small crystals radiation damage is a primary concern, as the X-ray dose per unit cell has to be high to reach a diffraction signal similar to that of a larger crystal. It is therefore paramount to extract as much signal as possible from a minimal exposure. That means focusing on reducing background scatter and detection noise.

I also find it interesting to consider the choice of wavelength for microcrystals. One of the disadvantages of longer wavelengths is increased absorption in the crystal; without a proper absorption correction this will affect data accuracy. However, for microcrystals absorption is minimal even at longer wavelengths, so one of the big disadvantages isn't a real concern. Conversely, one of the big advantages is that diffracting power increases as the cube of the wavelength. So doubling the wavelength increases the number of diffracted photons by a factor of 8! If radiation damage does not also go up by a factor of 8, then longer wavelengths should help microcrystal diffraction.

To get the full benefit of longer wavelengths we should use some form of counting detector. The reason is that the signal of an integrating detector is proportional to both the number of photons and the energy per photon. Doubling the wavelength therefore generates "only" four times the signal (8 times more photons, but half the energy per photon). A counting detector would realise the full 8-fold increase in signal. It is also the best detector in situations where noise has to be minimized, which is exactly the case for microcrystals.
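The argument can be summarized in a few lines (the lambda-cubed scaling is the one quoted above; the detector models are idealized):

    def relative_signal(wavelength_factor, detector="counting"):
        # diffracted photons scale roughly as wavelength^3
        photons = wavelength_factor ** 3
        if detector == "counting":
            return photons                      # a counter records every photon equally
        return photons / wavelength_factor      # an integrator records energy, ~1/wavelength per photon

    print(relative_signal(2, "counting"))       # 8: the full gain
    print(relative_signal(2, "integrating"))    # 4: half the gain is lost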


X-ray polarization & the Lorentz factor

X-rays can be polarized, especially in synchrotrons. This is important since crystals diffract more strongly in the direction perpendicular to the polarization. The effect is much stronger for large diffraction angles so it mostly affects the high resolution data. In synchrotrons, X-rays are polarized in the plane of the ring, so crystals diffract better in the vertical direction.

There is a second effect, the Lorentz factor, that also affects diffraction strength. The Lorentz factor describes the different speeds with which reflections pass through the Ewald sphere. Reflections that cut straight through the Ewald sphere pass quickly, whereas reflections that barely graze it pass slowly; the latter are located along a central line parallel to the oscillation axis and cannot be measured very accurately. For this reason it is beneficial to place the rotation axis parallel to the plane of polarization. This lets the reflections with the smallest Lorentz correction, the most accurately measured ones, benefit most from the polarization effect. Now you know why all synchrotrons, which produce horizontally polarized X-rays, have horizontal oscillation axes.

In case you are curious how this affects your data, don't worry, the data processing software will apply polarization and Lorentz corrections for you.


MAD data collection


When you have a crystal with an anomalous scatterer and you want to use the anomalous signal for MAD phasing, then you have to take extra precautions to ensure data quality. The anomalous signal is very small and can easily drown in the noise. In addition, you want to minimize systematic differences as much as possible. That means if one reflection is slightly underestimated it is best to have its Friedel mate also slightly underestimated. This way the anomalous difference, which you need for phasing, is more accurate.

One source of systematic differences is time. It is therefore best to collect Friedel mates simultaneously, or within a short time span. When you have 2-, 4-, or 6-fold symmetry and you can align the symmetry axis with the oscillation axis, you will measure Friedel mates simultaneously on the same image (see here). Another trick that has been used is to collect a small wedge of data (say 5 degrees), then rotate the crystal 175 degrees and collect another wedge of 5 degrees. The two wedges are 180 degrees apart and therefore cover reciprocal space volumes related by Friedel symmetry. It is best to ask the staff at the synchrotron whether this practice is recommended; depending on the setup it may not be necessary, or it may be too difficult to do.

Another source of problems is outliers. Sometimes a reflection is measured incorrectly, leading to gross over- or underestimation of its intensity. When calculating the anomalous difference between two Friedel mates, outliers will result in very large differences, which can have a major impact on MAD phasing. One way to minimize this risk is to collect highly redundant data: the outlier can then be detected statistically and rejected.

Finally, since we need to collect all Friedel mates, we have to adapt our data collection strategies. For P1 we found that normally 180 degrees of data is sufficient to capture all unique reflections. For MAD data, however, we have to collect at least two-theta degrees more. When there is a 2-, 4-, or 6-fold axis parallel to the oscillation axis you can use the same strategy as before (collect 360/N degrees of data), and when such an axis is perpendicular to the oscillation axis you have to rotate 180 degrees.
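The rules above can be collected into a small helper; it encodes only the cases discussed in this paragraph and takes the maximum diffraction angle (two-theta, in degrees) as input:

    def mad_rotation_range(two_theta_max, axis_order=1, axis_parallel=True):
        if axis_order == 1:               # P1: 180 degrees plus two-theta extra
            return 180.0 + two_theta_max
        if axis_parallel:                 # 2-, 4- or 6-fold along the oscillation axis
            return 360.0 / axis_order
        return 180.0                      # even-fold axis perpendicular to the oscillation axis

    print(mad_rotation_range(40.0))                                      # 220 degrees
    print(mad_rotation_range(40.0, axis_order=4))                        # 90 degrees
    print(mad_rotation_range(40.0, axis_order=2, axis_parallel=False))   # 180 degrees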