In this introductory class you will learn the difference between data and information and the basic ideas involved in transforming one into the other. If you have not given the matter much thought, you may have a vague idea that data and information are somewhat different but would have trouble pinpointing what sets them apart. Do not despair: even if you have devoted a lot of effort to thinking about data and information, you will still have trouble pinpointing the differences. Data and information differ deeply but also subtly. Understanding the depth and subtlety of these differences is the fundamental goal of this class.
From a practical point of view, we say that we have data when we have quantitative measurements of a phenomenon of interest, and we say that we have information when we have answered a question that we consider relevant. If, for example, we are given a recording of a musical instrument and are asked to find out the note that the instrument is playing, the recording is data, and the names of the note and the instrument are information. If we are given an image and are asked to find the borders of the objects in the image, the image is data and the borders are information. And if we are given a set of pictures of faces and are asked to identify a person of interest, the images are data and the identity of the person of interest is information.
The distinctions above are clear and anything but subtle. However, a purist would argue that the discussion above dwells on distinctions without differences, and at a fundamental level the truth would be on the purist's side. When we identify the instrument in the musical waveform, the borders in the image, or a particular person in a set of faces, we are not ''creating'' information but rather ''uncovering'' information that was already present in the data. This process of uncovering information is important, but when all is said and done there’s no escaping the fact that data ''is'' information.
The practitioner and purist perspectives may be different, but the last word in this controversy belongs to the Dodo Bird of Alice’s Adventures in Wonderland: “Everybody has won and all must have prizes.” The different perspectives can be reconciled if we agree that what is important is the ''processing'' of information. The practitioner may be thinking of transforming data into information and the purist of decanting simpler statements. However, both can agree on the fundamental importance of processing signals and information to uncover patterns of interest. This is what you will learn to do in this class.
From Data to Information
If we can agree on the importance of processing information, the first question that arises is how we process information. The answer is lengthy: it will take 14 weeks and 42 lectures to give. But to give a taste of what is involved, let us reconsider the problem of identifying the note and the instrument in a musical recording. The original data looks a little like the signal on the left in Figure 1. You can see this signal oscillating slowly up and down, but this pattern of variation is obscured by some faster variations. If you are asked to identify the note that is being played or the instrument that is playing it, the faster variations prevent you from answering these questions. The signal on the right is the information that we extract from this data. The faster variations have been removed so that we can identify the precise pattern of slow variation. It is now easy to answer the questions about the note that is being played and the instrument that is playing it. We say that the signal on the left is noisy and that the signal on the right has been filtered, or cleaned.
(:comment FIGURE BEGIN :)
[-'''Figure 1.''' The signal on the left is an example of what data looks like. You can tell that there is a pattern where the signal is moving up and down slowly, but the pattern is obscured by some rapid variations. The signal on the right is the information that has been extracted from this data.-]
(:comment FIGURE END :)
Notice that we are not explaining how the note and the instrument are identified once we recover the clean signal from the noisy signal. But this is not the difficult part. Once we have the signal on the right, we just compare it with other clean signals that represent different notes and different instruments, and we settle for the note and instrument that most resemble the signal on the right. The difficult part of identifying information is deciding which part of the signal has to be removed. Of course, we know that we need to remove the faster variations because this is what our brains do to identify the pattern. But how do we tell an artificial system to separate the slow variations from the faster variations?
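The comparison step, the easy part, can be sketched in a few lines of Python with NumPy. This is only an illustration under strong assumptions: pure sine tones stand in for clean recordings, all signals have the same length and are aligned in time, and resemblance is measured with a normalized inner product. The names @@make_tone@@, @@similarity@@, @@identify@@, and the @@templates@@ library are hypothetical and not part of the class material.

```python
import numpy as np

def make_tone(freq_hz, duration_s=0.5, rate_hz=8000):
    """Return a clean sine tone, a stand-in for a clean recording of one note."""
    t = np.arange(int(duration_s * rate_hz)) / rate_hz
    return np.sin(2 * np.pi * freq_hz * t)

def similarity(x, y):
    """Normalized inner product (cosine similarity) of two equal-length signals."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def identify(clean_signal, templates):
    """Return the label of the template that most resembles clean_signal."""
    return max(templates, key=lambda label: similarity(clean_signal, templates[label]))

# Hypothetical reference library: one clean template per note.
templates = {
    "A4": make_tone(440.0),
    "C5": make_tone(523.25),
    "E5": make_tone(659.25),
}

# A "recovered" clean signal: a C5 tone played more softly than the template.
recovered = 0.7 * make_tone(523.25)
best = identify(recovered, templates)
print(best)
```

Because the similarity is normalized, the softer amplitude of the recovered signal does not affect the match. The harder question, how to obtain the clean signal in the first place, remains.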
[-'''Figure 2.''' Fourier transform of the signal in Figure 1. The Fourier transform expresses the signal as a sum of oscillations of different frequencies. This figure shows the magnitude of the different oscillations that have to be superimposed to obtain the signal in Figure 1. It is clear where the information lies.-]
The answer to this question is given by the Fourier transform, a tool that allows you to rewrite a function as a sum of oscillations of different frequencies. If we compute the Fourier transform of the left signal in Figure 1, the transform looks like the signal we show in Figure 2. The interpretation of that figure is that the signal on the left of Figure 1 is dominated by the (slow) oscillations that appear with large magnitudes in Figure 2. The (fast) oscillations that appear with small magnitudes in Figure 2 are not as relevant to the pattern of variation. It is then reasonable to say that the slow oscillations represent information and that the fast oscillations do not. Reasonable or not, this is true. If we remove the fast oscillations and reconstruct the signal whose slow oscillations coincide with the slow oscillations in Figure 1 (that is, if we reconstruct the signal whose Fourier transform is the signal on the right plot in Figure 2), we obtain the right plot in Figure 1.
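The filtering procedure just described can be sketched in Python with NumPy. This is a minimal illustration, not the method used to produce the figures: the signal is synthetic, and the sampling rate, the three frequencies, and the cutoff @@cutoff_hz@@ are all assumptions chosen so that the example is easy to check.

```python
import numpy as np

rate_hz = 1000                      # samples per second (assumed)
t = np.arange(rate_hz) / rate_hz    # one second of sample times

# Synthetic data: a slow 5 Hz oscillation plus weaker, faster oscillations.
slow = np.sin(2 * np.pi * 5 * t)
fast = 0.3 * np.sin(2 * np.pi * 120 * t) + 0.2 * np.sin(2 * np.pi * 230 * t)
noisy = slow + fast

# Fourier transform: one complex coefficient per oscillation frequency.
spectrum = np.fft.rfft(noisy)
freqs_hz = np.fft.rfftfreq(noisy.size, d=1 / rate_hz)

# The slow oscillation has the largest magnitude in the transform.
dominant_hz = freqs_hz[np.argmax(np.abs(spectrum))]

# Remove the fast oscillations by zeroing coefficients above a cutoff.
cutoff_hz = 20                      # hypothetical cutoff frequency
filtered_spectrum = np.where(freqs_hz <= cutoff_hz, spectrum, 0)

# Inverse transform: reconstruct the signal that keeps only slow oscillations.
filtered = np.fft.irfft(filtered_spectrum, n=noisy.size)

print(dominant_hz)
print(np.max(np.abs(filtered - slow)))
```

In this idealized setting the filtered signal matches the slow oscillation up to rounding error, because every component completes a whole number of cycles in the observation window. With real recordings the separation is less clean, and choosing what to remove is part of the problem.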
Beyond time signals
The example discussed above is of a time signal that we process with a Fourier transform. The first half of this class will deal with time signals and different flavors of the Fourier transform. Time signals are not the most exciting but they are still very important and the most common. They are also the ones that are easiest to understand and the ones where our intuition about oscillations of different frequencies is most natural.
In the second half of the class we ask whether similar transformations can be applied to other types of signals, with similar results regarding the extraction of information from raw data. The answer to this question is affirmative. We will see that similar transforms can be defined for images, signals with arbitrary correlation structure, and signals supported on graphs. You can refer to the [[ProspectiveStudents/Syllabus | syllabus]] for more details about these more generic transforms.