Stopping Robocalls with a CAPTCHA - Node.js
This guide is available for Relay V2 and for Relay V4. You are currently viewing the guide for Relay V4. To switch to the older guide (which isn't maintained anymore), please use the selection field below:
Robocalling and spam calls have been increasing in number over the past few years. Only in the US, there were 165.1 million robocalls placed in 2020, an average of 14.1 per person, including children and people who do not have a phone! SignalWire can help with its communication technology, which allows us to easily create a robocall protection service.
By the end of this guide, we will have a fully-functional call forwarding service with a voice CAPTCHA to determine if the caller is a human. If they are, it forwards the call to your phone number or another number you configured as the destination.
A CAPTCHA is an automated mechanism used to determine if the user of a service is a human or a machine. You have certainly interacted with visual ones such as "pick all of the pictures with a boat in it" on websites. In this application, we ask the caller for the result of the sum of two random numbers.
In case the incoming caller is determined to be a spammer, it is sent straight to Lenny (more on that below), and it is flagged as a robo-call if someone tries to answer the CAPTCHA three times and fails. If the caller solves the CAPTCHA, their call gets forwarded to the configured private phone number for the DID.
Once the calls are connected, the user that received the call on their private number can press **
on their DTMF keypad at any time:
this will instantly flag the caller as spammer and add it to the database.
That way, if a human unwanted caller makes it through the CAPTCHA, they can still be banned.
What is Lenny?
Lenny is one of the most widely known anti-spam chatbots, designed to waste the time of telemarketers.
It is a set of connected audio files, spoken in a somewhat-Australian accent, that uses generic phrases such as "Are you there?" to lure a spammer into a long conversation about its "family", a supposed very smart daughter, or other topics. The average time wasted for a spam call is over 10 minutes, and it is also very fun to listen to recordings.
The bot itself is simple in its ingenuity, but setting up your own version has always been complicated due to needing some telephony infrastructure and a bit of logic. The SignalWire Communication API makes it easy to do.
What do I need to run this code?
As with all SignalWire scripts, you will need your API credentials (SignalWire Space URL, Project ID, and API token) from your SignalWire Space. For more information on how to find this information, you can read more about Navigating your SignalWire Space!
The application uses node-persist, a simple file-based database, to keep track of flagged numbers and automatically reject calls. In a production application, you would maybe use a different database such as PostgreSQL. Every phone number is saved and remembered, so any callers who you want to receive calls from will automatically get through the second time they dial-in. Spammers, on the other hand, will just be sent to have a chat with Lenny!
Lastly, you will also need to have the SignalWire Relay SDK installed.
The application database is persistent, so you will have to remove the .node-persist
folder in the directory to reset the database if you would like to test multiple times with the same number, or your call will be handled automatically as a spammer or a human depending on how you responded the first time.
Find the code for this application on GitHub!
Configuring the Code
Start by copying the env.example
file to a file named .env
, and fill in the necessary information.
The application needs a SignalWire API token. You can sign up here, then put the Project ID and Token in the .env
file as SIGNALWIRE_PROJECT_ID
and SIGNALWIRE_API_TOKEN
.
You also need to configure a phone number where legitimate calls will be forwarded. This can be any number (i.e., your personal number), and you can set it up in the .env
file as MY_NUMBER
.
Finally, you need a DID (phone number) that people will call instead of dialing your own number directly. For this, buy a phone number from your SignalWire Dashboard, then configure it to handle incoming calls using Relay Application
, with the same Relay context name that you configure in .env
as SIGNALWIRE_CONTEXT
. By default, the example config file uses captcha
as context name.
Running the Application
If you are running the application locally, run npm install
followed by npm start
.
You can also run the application via Docker, by first building the image with docker build -t nodelenny .
followed by docker run -it --rm -v ``pwd``/.node-persist:/app/.node-persist --name nodelenny --env-file .env nodelenny
.
If you prefer, you can just run sh run_docker.sh
in your shell and the container will be built and started for you. To test the code, give a call to the phone number you set up above and prepare for a simple math quiz... unless you are a robot!
Code Breakdown
Let's now walk through the more interesting code snippets.
Overall Organization
The application is composed of four files:
- index.js, the entry point,
- captcha.js, for the implementation of the captcha,
- transfer.js, for connecting to the real phone number after the captcha is solved, and
- lenny.js, for having fun with human spammers
The possible paths that a call can take are defined in index.js in the form of a state machine:
+--> HANGUP <---+
| |
START -+--> CAPTCHA ---+ (captcha.js)
| | |
| v |
+--> TRANSFER --+ (transfer.js)
| | |
| v |
+---> LENNY ----+ (lenny.js)
Right after the application is started, we create a Relay Consumer to connect with SignalWire:
const consumer = new RelayConsumer({
project: process.env.SIGNALWIRE_PROJECT_ID,
token: process.env.SIGNALWIRE_API_TOKEN,
contexts: [process.env.SIGNALWIRE_CONTEXT],
onIncomingCall: onIncomingCallHandler,
});
consumer.run();
Whenever someone calls a number which we have associated to the same context specified in SIGNALWIRE_CONTEXT
, the Relay Consumer will call our onIncomingCallHandler
function. In there, we first answer the call:
const answerResult = await call.answer();
We start recording:
const recordAction = await call.recordAsync({
direction: "both",
initial_timeout: 10,
end_silence_timeout: 0,
stereo: true,
});
console.log("Recording the call to " + recordAction.url);
And we go into the START
state. From there, is the caller is not known yet (i.e., is not in our db), we will switch to the CAPTCHA
state.
The CAPTCHA
The CAPTCHA implementation can be found in the captcha.js
file. In there, we first play a text-to-speech message to assert the current state of the call:
await call.playTTS({
text: "Hello! To complete your call, Please verify you are a human, by dialing or speaking the answer to this simple question.",
});
Then we generate two digits at random, and we prompt the user for the sum of those numbers:
const digit1 = Math.floor(Math.random() * 10);
const digit2 = Math.floor(Math.random() * 10);
const expectedAnswer = digit1 + digit2;
const result = await call.promptTTS({
type: "both", // Collect both digits and speech
digits_max: 2, // Max digits to collect
digit_timeout: 1.0, // Timeout in seconds between each digit
digits_terminators: "#", // DTMF digit that will end the recording
end_silence_timeout: 1.0, // How much silence to wait for end of speech.
speech_hints: ["denoise=true", ...Array(19).keys()], // Array of expected phrases to detect.
text: "What is " + digit1 + " plus " + digit2 + "?", // Our TTS prompt
});
At this point the user will hear, for example: "What is 3 plus 9?". They can answer either by speaking, or by using the keypad. We find the answer in the result
variable:
if (result.result.trim() == expectedAnswer) {
await call.playTTS({
text: "That is correct. You are probably a human, enjoy your call!",
});
storage.set(call.from, { isHuman: true });
return true;
}
Transfering the Call
If the captcha is solved successfully, or if the caller is a known legitimate human, they are transferred to the real phone number. In transfer.js
, you can see how to connect the currently active call (call
) to our real phone number:
const dial = await call.connect({
type: "phone",
to: destinationNumber,
from: call.from,
timeout: 30,
});
At this point we have:
call
, the original call, anddial.call
, the nested call to our real phone number
Now comes the interesting part. Say that a human spammer has been able to get to this point. We want to let the user mark them as a spammer while they're speaking. We'll do that by dialing **
. As soon as the user dials **
, we hangup the nested phone call and connect the spammer to Lenny for an endless meaningless conversation.
First, we connect an event to detect the digits that the user dials:
// Detect if the user presses '**'. If so, mark caller as spammer.
let dialed = "";
dial.call.on("detect.update", async (call, params) => {
if (params.detect.type === "digit") {
const digit = params.detect.params.event;
console.log("Dialed", digit);
dialed += digit;
if (dialed.endsWith("**")) {
console.log("Marking as scammer");
await storage.set(call.from, { isHuman: true, isScammer: true });
dial.call.hangup(); // Hangup the nested call
}
}
});
Then, we start the actual detection of the digits:
// Start the asynchronous detection
dial.call.detectAsync({
type: "digit",
timeout: 0,
});
The detection runs asynchronously while the user speaks.
Lenny
We are showing how to transfer the call to Lenny just for teaching purposes! In real applications, you may want to skip this step, otherwise you will be charged for all the minutes during which Lenny is talking!
If the current caller has been marked as a spammer by dialing **
, or if they were a known spammer to begin with, the result is the same: they are sent to Lenny.
Lenny consists in a sequence of audio files with the voice of a convincing old man. The sentences that Lenny uses are general enough to be appropriate for a large set of contexts: it is going to take a while before the spammer realizes they're talking with a record player!
How does Lenny know when to start speaking, so to avoid talking over the other person? Surprisingly, Lenny is quite simple. First, we simply play an audio (we pick audio files in sequence):
await call.playAudio({
url: lennyConfig.soundBase + "/" + lennyConfig.responses[responseId],
volume: +20,
});
Then, in order to wait for whatever the other person is replying, we use prompt
(the same function we used for the captcha), using as audio a light background noise:
await call.prompt({
type: "speech",
end_silence_timeout: 1.0,
media: [
{
type: "audio",
url: lennyConfig.soundBase + "/" + lennyConfig.background,
},
],
});
And that's it!
Conclusion
Now you have a powerful and persistent way to prevent robocalls from hitting your personal phone or your businesses' phones, brought to you by the power of the SignalWire Communication APIs. To see a live demonstration of this application at work, feel free to watch our previous LIVEWire on it below.
Get Started With SignalWire
You can sign up for a new SignalWire Space here! If you sign up for the first time, your account will start in trial mode, which you can exit by making a manual top up of $5.00. You can find more information on the Trial Mode resource page.
You can find more information, including where to get your credentials and how to set up the phone number, in the Getting Started with Relay guide.
Here are some other resources that may be helpful for you: