Requirements
My original requirement for the hardware was to use it as a simple, voice-activated recorder to record phone calls. Later, I realized that I could do the caller-ID decoding as part of the voice recorder's processing, thus freeing up two serial ports (used by the hardware caller-ID boxes). (Soon, I will be doing call-progress tone detection and DTMF decoding, but that's currently left as an exercise for the reader.)
Figure 2 presents the big picture of the system. Both lines go into a sound card and are sampled at 8 kHz. The samples then go into the voice-activated recorder, which listens for activity. When it detects activity (as defined by the signal levels going above a certain threshold, for a certain period of time), it begins recording. While it's recording, it feeds samples into the software caller-ID FSK modem (there's no point in feeding samples through the modem if there's no signal).
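The activity test described above (levels above a threshold for a certain period of time) can be sketched as follows. This is only an illustration, not the recorder's actual code: the function name, the threshold value, the 10 ms block size, and the 50 ms minimum duration are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second, as in the article


def detect_activity(samples, threshold=1000, block=80, min_blocks=5):
    """Return the sample index at which activity starts, or None.

    Hypothetical parameters: block=80 samples is 10 ms at 8 kHz, and
    min_blocks=5 requires the level to stay high for 50 ms.
    """
    run = 0  # consecutive blocks whose peak exceeded the threshold
    for start in range(0, len(samples) - block + 1, block):
        peak = max(abs(s) for s in samples[start:start + block])
        if peak > threshold:
            run += 1
            if run >= min_blocks:
                # Report the start of the first loud block in the run.
                return start - (min_blocks - 1) * block
        else:
            run = 0  # the level dropped, so start counting again
    return None


def tone(n, freq=1200, amplitude=5000):
    """Generate n samples of a sine tone (test signal only)."""
    return [int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
            for i in range(n)]
```

Using the peak of each short block, rather than individual samples, keeps the zero crossings of an audio signal from resetting the count.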
Each sample is run through the FSK modem, and if a zero or a one is detected, the bit is then sent into a software UART ("Universal Asynchronous Receiver Transmitter," but we're using just the receiver part). The software UART looks for the start bit, and when it gets it, accumulates eight more bits, and constructs the byte. The bytes are then accumulated in a buffer. When a sufficient number of bytes have been accumulated, the buffer is passed to the caller-ID event decoder, which analyzes the buffer for caller-ID information (date, time, phone number, caller's name, message waiting, and so on), and stores the information. A logger is waiting for information from the caller-ID event decoder, and logs the information to a text file.
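A minimal sketch of the receive-only software UART described above might look like the following. It assumes the modem delivers exactly one bit per bit period, that the line idles at 1, and that data bits arrive least-significant bit first (the usual convention for asynchronous serial framing). The class and method names are hypothetical, not taken from the actual implementation.

```python
class SoftUartRx:
    """Receive-only software UART: start bit, eight data bits, then a byte."""

    def __init__(self):
        self.bits_left = 0           # data bits still expected; 0 means idle
        self.value = 0               # byte being assembled
        self.received = bytearray()  # buffer of completed bytes

    def feed(self, bit):
        """Accept one bit (0 or 1) from the FSK modem."""
        if self.bits_left == 0:
            if bit == 0:             # a 0 on an idle line is the start bit
                self.bits_left = 8
                self.value = 0
            return                   # 1s while idle (including stop bits) are ignored
        # Data bits arrive LSB first, so the first one lands in bit 0.
        self.value |= bit << (8 - self.bits_left)
        self.bits_left -= 1
        if self.bits_left == 0:
            self.received.append(self.value)


def frame(byte):
    """Build the bit sequence for one byte: start bit, 8 data bits, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


uart = SoftUartRx()
for b in [1, 1, 1] + frame(ord("O")) + frame(ord("K")) + [1, 1]:
    uart.feed(b)
```

After the loop, `uart.received` holds the two bytes that were framed. The stop bit is not verified in this sketch; a more defensive receiver would check it and discard bytes with framing errors.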
Now turn to the FSK modem part of the system, keeping in mind the constraints affecting the real-time processing of the samples.
A sample rate of 8 kHz means that a sample arrives every 125 microseconds. In the past, before sound cards, an A/D (analog-to-digital) converter would simply present the digital value of each sample when the conversion was finished. This meant that something had to happen every 125 microseconds. The constraint was that the system had to provide a reliable, preemptive scheduler that could run a task every 125 microseconds.
Beyond how often something had to happen, the next issue was how much work had to be done each time. Would you perform the FSK calculations as each sample arrived, or simply buffer the data? If you buffered the data, deferring the FSK calculations until later, then the only way to perform the FSK work would be with a reliable preemptive scheduler that interrupted your FSK calculations in order to store subsequent samples.
However, times have changed. Today's sound cards buffer the samples, so we don't need to worry about incurring scheduling overhead (context switches) for each and every sample. The hardware simply informs the kernel (via an interrupt) that a buffer is full and should be emptied. As a result, you can access blocks of samples, rather than being forced to get the samples one at a time from the hardware. The advantage is that you invoke a single kernel call to transfer an entire block of samples (and then process them individually), rather than invoking a kernel call for each and every sample.
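The block-at-a-time pattern can be sketched as follows. This is an illustration under stated assumptions: the device is modeled as any file-like object with a `read()` method, the samples are taken to be 16-bit signed little-endian, and the 256-sample block size and the function names are made up for the example.

```python
import io
import struct

BLOCK_SAMPLES = 256      # hypothetical block size
BYTES_PER_SAMPLE = 2     # assumes 16-bit signed samples


def process_blocks(device, handle_sample, block_samples=BLOCK_SAMPLES):
    """Read whole blocks from `device`, then handle the samples one by one.

    Returns the number of reads that delivered data, to show that the
    number of calls is far smaller than the number of samples.
    """
    reads = 0
    while True:
        data = device.read(block_samples * BYTES_PER_SAMPLE)
        if len(data) < BYTES_PER_SAMPLE:
            break                      # end of data
        reads += 1
        usable = len(data) - len(data) % BYTES_PER_SAMPLE
        for (sample,) in struct.iter_unpack("<h", data[:usable]):
            handle_sample(sample)      # per-sample work happens in user space
    return reads


# Stand-in for the sound device: 1000 samples held in memory.
source = list(range(-500, 500))
fake_device = io.BytesIO(struct.pack("<1000h", *source))

collected = []
read_count = process_blocks(fake_device, collected.append)
```

Here 1000 samples are delivered in four reads instead of 1000. At 8 kHz, a 256-sample block spans 32 milliseconds, so the per-call overhead is incurred once every 32 milliseconds rather than once every 125 microseconds.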
Another constraint is the actual amount of processing required to perform the FSK algorithm. If you are right on the edge in terms of CPU processing power to perform the calculations, then adding context-switching overhead is not going to help. By eliminating (or vastly reducing) one constraint, you've bought yourself more time to perform the FSK work.