Portlandia Cloud Services

Voice & FAX 503-690-2700 sales@portlandiacloudservices.com http://www.portlandiacloudservices.com

Using Voicemodems for network monitoring

My interest in Voicemodems started in February of 2015, ironically because of a wireless cellular company: Verizon! Up until the end of January, Verizon operated a Telelocator Alphanumeric Protocol (TAP) gateway for use by its customers. TAP is a very old protocol dating from the early 90’s, and its primary purpose is to allow computers to dial into a modem and send a page. This dates from the pre-cell-phone era when Very Important People carried beepers or pagers. To send a page you would type it into a small TAP keyboard which would dial a TAP gateway and upload the page, which would then be broadcast out to a beeper.

When cellular phones hit the scene the cellular carriers all adopted paging, in an effort to put the beeper/pager companies out of business. Their selling model was “why carry a pager when you can carry a whole phone?” This is why cell phones today send SMS messages, in fact. But at that time the cell companies just wanted to eat into the beeper market share with a better product. So, all of them built TAP gateways which would accept the incoming pages from the TAP keyboards and send them out to cell phones. The TAP keyboard would dial into one of these and it would work the same as a dedicated beeper company.

Over the years the cell companies began installing email-to-page gateways. For example, to send a page to Verizon you simply email 1234567890@vtext.com, which sends the SMS page to the phone number 1234567890. Many cell companies created webmail forms on their sites that allowed people to use a web browser to replace their old TAP keyboards. This was great when you had a human sending the page out. The problem was that not all pages were being sent by humans.

It turns out that email-to-page and TAP gateways are perfect for monitoring systems. If you have a critical piece of machinery you can instrument it and run all the instrumentation to a monitoring computer that will page a tech if something goes wrong with it. If you have a server room, for example, you can monitor your servers and page out when any of them go offline or have problems. If you have a website you can monitor that and page the webmaster when it goes offline.

Typically, the easiest way for a monitoring system to send a page is to have it send an email to an email-to-SMS gateway email address. There are a plethora of free and commercial monitoring programs out there that do this. For example, a simple and free one that runs on Windows is Ping 4 Life (http://sourceforge.net/projects/pingforlife/). This can be configured to email a pager if a server goes down. A more expensive program is What’s Up Gold, located at http://www.whatsupgold.com/. On the Unix/Linux platform a very popular free one is Big Sister (http://www.bigsister.ch/). All of these can be configured to send out emails when events happen, and those emails can go to an email-to-paging gateway.
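As a rough illustration of what such a monitoring program does when it pages through an email-to-SMS gateway, here is a minimal Python sketch. It is not taken from any of the programs named above. The gateway domain, sender address, 160-character limit, and the function name build_page are all placeholders chosen for the example.

```python
from email.message import EmailMessage


def build_page(phone_number, gateway_domain, sender, text, limit=160):
    """Build an alert email addressed to an email-to-SMS gateway.

    The gateway address is simply <phone number>@<gateway domain>.
    The text is truncated because SMS messages are short; 160 is an
    assumed limit, not a value taken from any carrier's documentation.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{phone_number}@{gateway_domain}"
    msg["Subject"] = "ALERT"
    msg.set_content(text[:limit])
    return msg


# Hypothetical values for illustration only.
page = build_page(
    "1234567890",
    "gateway.example.com",
    "monitor@example.com",
    "mailserver is not answering ping",
)

# To actually send it, something like the following would be used,
# assuming a mail server is reachable on localhost:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(page)
```

Note that the final step depends on a working mail server and Internet connection, which is exactly the weakness discussed next.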

But what happens if you want to monitor the email server that you are using to send out the monitoring emails? Or when you want to monitor the Internet connection itself? That is where the older TAP gateways come into play. All you have to do is install a modem into the monitoring computer and have it dial into a TAP gateway and send out a page. It CAN’T send a page to an email-to-SMS gateway when its Internet connection is down, but it can still dial a modem. And that is precisely the system that I had set up years ago and used to monitor servers, until January of 2015, when Verizon closed the gateway down.

It was obvious something had to be done to fix the problem. Verizon themselves suggested that I go to a service that runs a TAP to paging gateway. I dismissed this immediately. I had been paying for the TAP gateway as part of my cell phone bill. Why should I have to start paying another amount over and above that?

Clearly, one answer was to have the monitoring system dial the notification cell phone numbers directly. With a standard plain old modem it was child’s play to write a script that would dial a cell phone, and wait for someone to answer, then send a bunch of DTMF digits in a row. The tech answering the phone would see the call from the paging system, hear a bunch of digit 1’s for example, then the paging system would hang up. The tech would then know that the mailserver was offline and the monitoring system was unable to email warning pages out. Different tones could be used to indicate different kinds of problems. It was a kludgey system but it got past the immediate problem.
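The dial-and-send-tones trick can be sketched with standard Hayes dial modifiers. This is an assumed reconstruction, not the script used for this article: a comma in a dial string pauses (2 seconds each by default on Hayes-compatible modems, set by register S8), and a trailing semicolon returns the modem to command mode after dialing instead of trying to negotiate a data connection. The function name and the default pause and repeat counts are hypothetical.

```python
def dtmf_alert_dial_string(number, code, repeats=4, pause_commas=6):
    """Build an AT dial string that calls a phone, pauses, then sends DTMF.

    number       -- phone number to dial (digits only)
    code         -- DTMF digit(s) that identify the kind of problem
    repeats      -- how many times the code is repeated
    pause_commas -- number of ',' pauses giving the callee time to answer;
                    a plain data modem cannot tell when a person picks up,
                    so this is a fixed guess, not real answer detection
    """
    if not number.isdigit():
        raise ValueError("number must contain digits only")
    if not code.isdigit():
        raise ValueError("code must contain digits only")
    tones = code * repeats
    # 'ATDT' = tone dial, ',' = pause, ';' = stay in command mode afterwards
    return f"ATDT{number}{',' * pause_commas}{tones};"


# Code "1" might mean "mailserver offline"; the number is a placeholder.
command = dtmf_alert_dial_string("5035550100", "1")
# The script would write command + "\r" to the modem, wait for OK,
# then send "ATH\r" to hang up.
```

A different code per failure type gives the tech a crude but workable way to tell problems apart by ear.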

I knew that to do something more sophisticated would require better hardware, and so that is when I decided to investigate voicemodems. Indeed, the commercial monitoring programs (like What’s Up) had already anticipated this problem and support voicemodems already in this fashion. However, there are no free Windows monitoring programs out there with this capability, and I use Unix for my monitoring platform anyway.


First, let’s look at what a voicemodem is. A Voicemodem is a regular modem that has an additional piece of hardware in it that allows it to convert a data stream to a voice, and back. Voicemodems became very popular in the late 1990’s because people could buy them and install them into a Windows system and build a very sophisticated answering machine. Someone calling a home could, for example, get a menu and press 1 to leave a message for one member of the family, press 2 for another member, and so on. These were also very popular with smaller businesses who had a single phone line that was not that busy, and they wanted to have an auto-attendant on – and still are in fact. A business can have a central phone line with a voicemodem/answering computer on it that can take voice messages and email them to employees who can then reply with phone calls from their cell phones. Most modern phone system voicemails can do this as well.

Over the years, however, Voicemodems fell out of favor for a number of reasons. First was hardware inadequacies. Most earlier voicemodems had very small internal buffers and so when they were operated on a PC that was running a pre-emptive multitasking operating system, unless certain precautions were taken, sound quality was terrible. As a result, people who had a very good experience with a voicemodem and a particular answering machine program under Windows 3.1 and 95 discovered audio problems when they upgraded to Windows XP.

Second, there are 2 incompatible command protocols out there for voicemodems, the AT# set and the AT+ set. AT# was developed first and was used by Rockwell/US Robotics/Conexant; the AT+ protocol was developed later, was used by Lucent, and appears in modems built on the Agere and LSI chipsets. AT+V is documented as ANSI/TIA/EIA standard IS-101 entitled “Facsimile Digital Interfaces-Voice Control Interim Standard for Asynchronous DCE.” A follow-up to this specification is PN-3131 by TIA Technical Subcommittee TR-29.2. Many if not most Winmodems support this but no voicemodems that support AT+ completely implement IS-101. As a result, programs that worked with a specific voicemodem wouldn’t generally work with a different one, and programs that were written to work with voicemodems in general often wouldn’t work properly with models that hadn’t been used during development of the software.
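One practical consequence is that software has to find out which command family a modem speaks before it can do anything with voice. A common approach is to ask the modem which service classes it supports: IS-101 style modems answer AT+FCLASS=? and Rockwell style modems answer AT#CLS=?, and in both cases class 8 means voice. The Python sketch below only parses the replies. The reply strings are made-up examples, real modems format them in varying ways, and the function names are hypothetical.

```python
import re


def parse_class_list(response):
    """Return the set of class numbers found in a modem's reply text.

    Skips the echoed command and the final OK/ERROR result line, then
    collects every integer on the remaining lines. Ranges such as '0-2'
    are not expanded; that is enough to detect class 8.
    """
    classes = set()
    for line in response.splitlines():
        line = line.strip()
        if not line or line in ("OK", "ERROR") or line.upper().startswith("AT"):
            continue
        classes.update(int(n) for n in re.findall(r"\d+", line))
    return classes


def voice_command_family(plus_reply, hash_reply):
    """Guess the voice command family from the two query replies.

    plus_reply -- text returned for AT+FCLASS=?
    hash_reply -- text returned for AT#CLS=?
    Returns 'AT+V', 'AT#V', or None if neither reply lists class 8.
    """
    if 8 in parse_class_list(plus_reply):
        return "AT+V"
    if 8 in parse_class_list(hash_reply):
        return "AT#V"
    return None


# Made-up replies for illustration.
family = voice_command_family("ERROR", "0,1,2,8\r\nOK")
```

Even when a modem reports class 8, the quirks described above mean the individual voice commands still have to be tested model by model.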

Third, Microsoft decided to support Voicemodems in the Windows operating system through Unimodem /V. Winmodem manufacturers took advantage of this architecture to produce voicemodems that worked by hooking them into the soundcard system in the PC. Early voicemodem chipsets on Winmodem cards physically connected to internal soundcards with cables and later chipsets (like the Conexant HCF) connected to the sound card through the PCI bus. This setup used a Unimodem sound driver that programs could write audio to using the TAPI interface along with a vendor-supplied driver that would get the sound card to communicate with the voicemodem. However, for this setup to work properly, the voicemodem INF files had to be properly defined with all voice AT commands in it so Unimodem /V would know how to handle the modem and know it was a voicemodem. Many voicemodems had the hardware but lacked proper INF files. And Unimodem /V also did not account for all of the oddball quirks in different voicemodems so even if their inf file was correct there still might be strange bugs. The TAPI interface itself went through several versions and caused problems for programmers who tried using it. Last, the decision to encourage combining the soundcard and the modem into a single chip and support this with Unimodem greatly increased the number of bugs in the implementation.

Linux/Unix also had the same issue, as its voicemodem support is primarily handled by a software package named vgetty, which had similar issues with quirks in voicemodems. But it was not as serious because the Winmodem support in Linux never advanced to the point that it ever got voice working properly on a Winmodem. Linuxant was testing voice support in its Conexant HSF and HCF drivers for Linux but never released it. As a result, the only way under Linux/Unix to send and receive voice over a Voicemodem is through the serial port, or through the emulated serial port that a USB voicemodem (like the currently selling US Robotics USB Voicemodem) creates.

Ultimately, though, the biggest reason Voicemodems’ popularity waned was the rise of Internet Telephony and the need for FXO & FXS hardware support on a telephone line. Voicemodems are essentially FXO devices: they act like “telephone stations”. They can receive ringing signals, and make and receive calls. But they lack the hardware needed to generate ringing signals, so they cannot operate as FXS devices. Basically what that meant was you could not plug a plain old telephone, for example a home or small office telephone set, into the modem and use it to make and receive calls over the Internet. Voicemodems were limited to acting as Telephone Answering Devices (TADs). Because of this, someone seriously investing in computerized telephony would typically spend their money on a Telephone Adapter that could be programmed to be either FXO or FXS, so not only could they use it as a TAD, they could also use it with the most popular free Software Telephone System available, named Asterisk. A Telephone Adapter could do both jobs; a Voicemodem could only do one.

Why Use a Voicemodem

Regardless of Voicemodems’ inadequacies, they have one huge overriding factor that made them imperative for me to use: cost. Because the used market is flooded with elderly dialup 56k modems that people are hoping to sell (mainly to get rid of), it is a buyer’s paradise. After Verizon hung me out to dry by closing its TAP gateway, I ultimately ended up buying 3 separate voicemodems for my experimentation, and I found a 4th one in a scrap PC. The modems I obtained were: an internal US Robotics Model 0642 (AKA US Robotics 56K Voice Win 1806) Winmodem (manufactured in 1995 and supported by included drivers in Windows XP and downloadable ones from the USR website), total cost $5; an internal Diamond Multimedia SupraMax V.92 PCI Pro voicemodem based on the Conexant HCF chipset (manufactured in 1996 and supported by a difficult-to-find Windows XP driver written by Diamond), total cost free; an external US Robotics 56k Voice Faxmodem model 0525, support model# 005605-00, total cost $14; and an external Zoom Faxmodem V.92 Model 3049L, total cost $15. Both internal modems were Winmodems intended to be tested with a copy of What’s Up Gold running on Windows XP. Both external modems were intended to be tested on my FreeBSD Unix-based monitoring system, and also tested under Windows.

PC requirements for sound quality over the serial port

Voicemodems use internal codecs (coder/decoders) that convert a compressed data stream to voice, or voice to a compressed data stream. Because Serial Port Voicemodems are character-based devices (they accept and provide data a byte at a time), they must assemble the incoming characters that comprise a data stream into blocks of audio data, which can be decompressed into voice audio which is then sent out over the phone line. (Winmodems can use the internal DMA mechanism supported by Unimodem and so don’t have this limitation.) To enable this, serial voicemodems use an internal buffer. When transmitting voice over the phone line, characters stream into the modem from the computer and are stored in the buffer, then the codec in the modem grabs a block of data from the buffer and decompresses it into audio which is played out over the phone line. When receiving audio on the phone line, the codec records audio for a short period, then compresses it into a data block that is placed in the buffer and then sent a character at a time to the computer.

In this scenario the main requirement for good sound quality is that the internal modem buffer does not overflow (when getting audio from the phone line) or underflow (when transmitting audio to the phone line). Audio must be sent and received over the phone line in a precise timeframe or there will be gaps, and these gaps are audible as static clicks and pops. I found this issue to be one of the central problems with voicemodem sound quality under Unix/Linux. Part of this is because the earliest voicemodem chipset manufacturer, Rockwell, assumed that any computer feeding data to a voicemodem chipset would be able to IMMEDIATELY drop what it was doing and service the COM port when a character arrived from the modem or was sent to the modem. This was not an unreasonable assumption, as this was during the days of MS-DOS and Windows 3.1/95, when a DOS driver or VxD com port driver could grab the com port and freeze the rest of the machine while it was servicing the port. So, the early Rockwell programming data mentions that the serial port baud rate needed to be 38400 bps. This was apparently accepted as some kind of canon by Voicemodem software programmers because it worked with the operating system and hardware of the time.

Today, however, we have operating systems (Windows, Linux) that spend long periods off doing other things and when a character arrives from the serial port, the OS does not immediately service it. As a result, it’s critical that the serial interface to an external voicemodem be run at the highest speed the port will allow, typically 115200bps. It is also important that a fast CPU be in use on the server the voicemodem is attached to if the server is going to be multitasking and doing other things. It is also critical that the Hardware Flow Control lines on the serial port are used. And last, the sound file that is sent should be as small as possible.
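The arithmetic behind the port speed advice can be worked through with a few lines of Python. This is a back-of-envelope sketch under stated assumptions: 10 bits on the wire per byte (8 data bits plus start and stop bits), no allowance for escape bytes the voice protocol adds to the stream, and example rates picked for illustration. The function names are hypothetical.

```python
def serial_bytes_per_second(baud, bits_per_char=10):
    """Bytes per second a serial link can carry (8N1 = 10 bits per byte)."""
    return baud / bits_per_char


def audio_bytes_per_second(sample_rate_hz, bits_per_sample):
    """Bytes per second of compressed mono audio the codec consumes."""
    return sample_rate_hz * bits_per_sample / 8


def headroom(baud, sample_rate_hz, bits_per_sample):
    """Ratio of link capacity to audio demand; below 1.0 the buffer
    must eventually underflow, and values near 1.0 leave no slack for
    the operating system to be late servicing the port."""
    return serial_bytes_per_second(baud) / audio_bytes_per_second(
        sample_rate_hz, bits_per_sample
    )


# 8-bit samples at 8000 Hz need 8000 bytes/s; 115200 bps carries 11520.
fast_link = headroom(115200, 8000, 8)   # 1.44
# 4-bit ADPCM at 8000 Hz needs 4000 bytes/s; 38400 bps carries only 3840.
slow_link = headroom(38400, 8000, 4)    # 0.96
# The same slow link just covers 4-bit samples at 7200 Hz (3600 bytes/s).
slow_7200 = headroom(38400, 7200, 4)
```

Even the best case here leaves well under a factor of two of slack for 8-bit audio, which is why flow control and a lightly loaded CPU matter as much as the port speed itself.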

Types of voice compression

Today we have a plethora of audio standards for compression, from .mp3 to .au. However, for voice work with voicemodems we must feed the sound file to the voicemodem in a special raw format. Under Windows, Unimodem /V handles this. Under Unix, the software program “pvftormd” that is part of the vgetty distribution is used to do this. Sound files must be converted into pvf format for this program to use them, so a second program, wavtopvf, that converts from Windows-formatted .wav files is used. pvftormd supports a number of different voicemodem encoding standards; examples are V253modem (8 bit PCM), which works with the Zoom voicemodem, and US Robotics 4 (G.721 ADPCM). The modems also support a limited number of sampling rates. The US Robotics modem supports a single sampling rate of 8 kHz. The Zoom modem supports 3 sampling rates: 7200, 8000, and 11025 Hz. The 7200 sampling rate was standard on the original Rockwell voicemodem chipsets, but it isn’t as easy to create sound files at that rate. The last thing needed is the compression method. The US Robotics modem supports ADPCM with 4 bits per sample; the Zoom supports both 8-bit linear and 16-bit linear as well as A-law and u-law (which we cannot use). Note that all of these are monaural.

Creating the warning sound files

When I originally started my TAP replacement/notification project I recorded the warning messages. However, this has 2 major drawbacks. First, the project is stuck with a fixed warning message. I could record a warning that a specific mailserver was offline, but then if I wanted to monitor anything else, I could not do it without re-recording the alerting message or recording a new one. Second, because the voice was recorded, it had more background noise and static in it. So, I decided to install espeak and use that with text. The big advantage of doing this is that I can have the monitoring system send text messages containing anything, which are then read over the phone line to the remote tech’s cell phone. If you want to use recorded messages, though, you need to record them as monaural, 8 kHz. Windows Sound Recorder (in Windows XP and earlier Windows) can do this, but you must set the file format first, then save it, then start the recording. Otherwise it records at the full 44 kHz and then samples it down if you try changing the sampling rate during File Save As, and the quality is not as good.
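Before a recorded message goes into the conversion tools it is worth confirming that it really is mono at the expected sampling rate. The sketch below uses Python’s standard wave module to do that. It is an illustrative helper, not part of vgetty or of the system described here, and the function names are hypothetical. The second function only exists to produce test input in memory.

```python
import io
import wave


def check_wav_for_modem(source, wanted_rate=8000):
    """Return a list of problems that would make a WAV unsuitable.

    source may be a filename or an open binary file object.
    An empty list means the file is mono at the wanted sampling rate.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getframerate() != wanted_rate:
            problems.append(
                f"{wav.getframerate()} Hz, expected {wanted_rate} Hz"
            )
    return problems


def make_silent_wav(rate, channels, seconds=1):
    """Build a silent 16-bit WAV in memory, for trying the checker out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
    buf.seek(0)
    return buf


good = check_wav_for_modem(make_silent_wav(8000, 1))    # no problems
bad = check_wav_for_modem(make_silent_wav(44100, 2))    # two problems
```

The same check is useful on synthesized speech, since a text-to-speech program may write its output at a different sampling rate than the modem expects.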

Dialing Issues

Another issue I ran into had to do with the phone line itself. My setup is behind a phone system where it is necessary to dial 9 for an outside line. While adding the 9 to the dialing script was not a problem, what was a problem is that the phone system itself produces its own dialtone when the phone is taken off-hook, and only after 9 is pressed do you hear the dialtone from the outside line. For some reason, perhaps due to its tone, the internal dialtone was not recognized as a dialtone by the US Robotics modem. So, I had to add X3 to make the modem blind-dial. Interestingly, the Zoom modem DID recognize the dialtone.
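The two adjustments, the outside-line prefix and blind dialing, amount to a small change in the command sequence sent to the modem. The Python sketch below shows one way to build that sequence. It assumes standard Hayes behavior, where X3 turns dialtone detection off while keeping busy detection and X4 turns both on, and it uses a comma to pause after the 9. The function name and phone number are placeholders, and the voice-mode setup commands are left out.

```python
def dial_commands(number, outside_prefix="9", blind_dial=True):
    """Return the AT commands to dial out through a phone system.

    outside_prefix -- digit(s) that select an outside line, or "" if none
    blind_dial     -- True sends ATX3 (do not wait for a dialtone),
                      False sends ATX4 (wait for dialtone before dialing)
    """
    setup = "ATX3" if blind_dial else "ATX4"
    if outside_prefix:
        # The comma pauses so the outside line has time to come up.
        dial = f"ATDT{outside_prefix},{number}"
    else:
        dial = f"ATDT{number}"
    return [setup, dial]


# A modem that does not recognize the phone system's dialtone:
usr_commands = dial_commands("5035550100")
# A modem that does recognize it can keep dialtone detection on:
zoom_commands = dial_commands("5035550100", blind_dial=False)
```

Keeping the choice as a per-modem setting avoids giving up dialtone detection on hardware where it works.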

Windows voicemodems

Once I had my hardware assembled, the first thing I tried was installing the 2 internal PCI voicemodems in my Windows system and attempting to use them with What’s Up. My copy of What’s Up cannot do text-to-speech; it can only transmit pre-recorded voice messages (although it will read the .wav files directly without having to convert them). Unfortunately, although both modems installed properly and loaded the Unimodem Full Duplex telephony audio driver, neither worked with What’s Up without crashing. I dug around and found 2 other Winmodems that were also voicemodems: one was the BCM (Broadcom) V.92 56K Voicemodem, the other was the Creative DI5630-4 Winmodem using another HCF chipset. Neither of these worked either.

It was obvious that this kind of result was pretty standard for Winmodems used as voicemodems. All of these modems worked as 56k dialup modems, and all loaded the Unimodem half (or full) duplex audio driver. Most of them were sold in the past bundled with a crude telephone answering machine program that was only tested with a specific model of modem and a specific operating system. For example, the US Robotics Winmodem was bundled with Rapidcomm Voice for Windows 98. Also, most Winmodems are “controllerless”. For example, the USR Voicemodem I tried is a controllerless Winmodem, meaning that the codec and DSP are implemented in a Windows driver. The CPU of the machine the Winmodem is in must be fast enough to run this. These are all typical Winmodem issues and are exacerbated when attempting to use them in voice applications.

The last thing that is important with older Winmodems is that Microsoft included a lot of generic drivers in Windows XP for dialup modems. When I installed the Supra modem it was detected as a Conexant HCF Winmodem and Windows XP did NOT load the Voice driver for it. I had to run the Windows XP driver installer which replaced the generic HCF modem with the full voicemodem driver for the modem.

Ultimately, I abandoned further efforts to use a Voicemodem under Windows. It is clear that the only way to successfully do so is to either write your own TAPI program (a complex undertaking, although enough code examples exist on the Internet to figure it out; you can start here: http://www.microsoft.com/msj/archive/S408.aspx) or to purchase a commercial program that works with a Voicemodem and then test it with an assortment of different Voicemodems and different PCs until you get something stable, with the understanding that in 2015 the developer is likely making so little money on sales of any TAPI program that they will be pretty unmotivated to fix any bugs you may find. Or, you can find an old copy of Rapidcomm Voice and a US Robotics modem and run it under Windows 98.

Using Voicemodems under Unix/Linux

Using internal Winmodems under Linux/Unix is full of pitfalls. Most drivers don’t compile on newer Linux kernels although some people have had success with Conexant HCF modems under Ubuntu 12, running the 3.2 or earlier kernels (later kernels seem not to work) detailed here http://ubuntuforums.org/showthread.php?t=1903439 and here https://help.ubuntu.com/community/DialupModemHowto/Conexant. That is the most current mention I have seen of successful use of the old binary Linux driver that Conexant wrote back in the mid 2000’s. But, this driver never supported Voice in the first place.

Using the External Voicemodems under Linux/Unix was much more fruitful, both external modems worked, and the sound quality from the old USR modem was actually quite good. I did have to run firmware updates on each modem. (I pulled out my 10 year old Windows 98 laptop from storage to run the firmware updater) I did ultimately end up creating a monitoring system that had the capability of sending out email notifications, texts, and directly dialing my cell phone and reading out a warning message. This is documented in another article titled “Build a FreeBSD and Linux network monitoring system”


Ultimately, the story of Voicemodems under both Windows and Linux/Unix is a story of missed opportunities, software that was written for no purpose, and hardware incompatibilities that served to pigeonhole the various telephone software packages that were written. The Unimodem /V and TAPI system calls and architecture served mainly to enhance Winmodems as Voice devices under Windows, yet presented so many bugs and problems in those function calls that software developers could never write reliable general purpose voice applications using them. The vgetty libraries under Unix/Linux were premature and never supported enough different modem models out of the box, with the same end result as on the Windows side: general purpose voice applications couldn’t be written to be reliable. If someone bypasses the vgetty libraries and vgetty software on Unix/Linux and goes directly to the modem hardware they can create a reliable system, and if a developer narrows their Windows application to a specific modem and PC hardware, they can do the same thing under Windows. But beyond that, computerized telephony has abandoned Voicemodems and gone to dedicated FXS/FXO devices.