[P&R#1] How to Design an Internet Bot
I remember the time I tried to purchase tickets to a concert. The interminable wait as I stared at the timer counting down on my computer screen. The intense euphoria as I dreamed of the amazing time I'd have at the concert. The excruciating anxiety as the timer approached the five-second mark. And the bitter disappointment as I clicked submit and the website froze.
Sorry, tickets are sold out.
It sucks to be late on the internet! Be it missing out on concert tickets, failing to register for a class, or commenting "first!" on a YouTube video after someone else has already commented, being late on the internet sucks.
And so if you're interested in not being late on the internet, I'd like to introduce you to bots.
What's a bot?
A bot is a software application that runs automated tasks over the internet. Typically, bots perform tasks that are simple and repetitive much faster than a person could.
Take, for example, 16-year-old Remus, who earns about USD 20,000 each month reselling shoes. Remus uses bots to speed through the sneaker-buying process, and this gives him an advantage over hundreds of other resellers!
Setting the stage
If we compare my experience with that of someone who did get those tickets, what's the difference, really? Well, it's simple.
The website received that person's booking request before receiving mine!
You see, the typical ticket booking process (or most other booking processes for that matter) looks like this.
First, the user fills in a form. The form cannot be submitted until a specified time, which is when the web server starts accepting booking requests. Let's suppose that in my example, the web server starts accepting booking requests at midnight. At midnight, when the user clicks submit, the web browser formulates an HTTP request and sends it to the web server hosting the booking website.
These requests are processed on a first-come, first-served basis, and the users associated with the first 𝓧 requests (assuming 𝓧 tickets are available) get their tickets.
What did I learn?
Here are some lessons from building my own bot.
1. Non-browser requests
Back to our example, when I clicked submit at 12am, several things happened.
The web browser formulated an HTTP request, placed the bytes on the wire and sent the request to the web server (which, for all we know, could be located in another country). By the time the web server actually received the request, it might already have been 20 seconds past midnight.
20 seconds — that's a decade on the internet time scale!
To have the best shot at those tickets, we'll need to send our request before the stroke of midnight. But how's that possible when I can only click submit at midnight?
If you know what the request is supposed to look like, you don't have to wait for the web browser to formulate one for you. Instead, you can formulate the request yourself, without a browser, using a tool such as Postman. You can then send this request to the web server before others can submit theirs via a web browser.
And this increases the chances that the web server will receive your request before receiving theirs.
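As a rough illustration of what "formulating the request yourself" can look like in code, here is a minimal Python sketch using only the standard library. The URL and the form fields (`name`, `quantity`) are made up for this example; a real booking site will have its own endpoint and fields, which you would have to read off the actual form.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, for illustration only.
BOOKING_URL = "https://tickets.example.com/book"


def build_booking_request(name: str, quantity: int) -> Request:
    """Build (but do not send) the POST request a browser would produce on submit."""
    body = urlencode({"name": name, "quantity": quantity}).encode("utf-8")
    return Request(
        BOOKING_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_booking_request("Alice", 2)
print(request.get_method(), request.full_url)
print(request.data)
```

Nothing is sent here; the request object is only assembled. Passing it to `urllib.request.urlopen` would put it on the wire, and that call can be made whenever the program decides, not when a submit button becomes clickable.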
There's just one issue though. Since requests that are received before 12am are ignored, and since we only submit one request, how can we ensure that the web server receives our request only after 12am?
This is where multithreading comes in.
2. Multithreading
In computer architecture, multithreading is the ability of a central processing unit (CPU), or a single core in a multi-core processor, to provide multiple threads of execution concurrently, supported by the operating system. For our bot, what matters is the software side of this: a program can run several threads at once, so it need not wait for one request to finish before sending the next.
Multithreading allows us to send multiple requests to the server, with each request handled by its own thread of execution. And the beauty of multithreading is that it's okay for requests to hit the server before 12am, as long as at least one request hits the server just after 12am.
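Here is a minimal Python sketch of the idea, assuming one thread per request. The `send_request` function is a stand-in I made up for this example: it only sleeps briefly to mimic a network round trip, where a real bot would make an HTTP call.

```python
import threading
import time


def send_request(request_id: int) -> str:
    """Hypothetical stand-in for a real HTTP call; returns a fake status."""
    time.sleep(0.01)  # pretend this is the network round trip
    return f"request {request_id}: sent"


def fire_requests(count: int, interval: float) -> list[str]:
    """Start one thread per request, `interval` seconds apart, without waiting for replies."""
    results = [""] * count

    def worker(i: int) -> None:
        results[i] = send_request(i)

    threads = []
    for i in range(count):
        thread = threading.Thread(target=worker, args=(i,))
        thread.start()  # the request is now in flight
        threads.append(thread)
        time.sleep(interval)  # wait before launching the next one
    for thread in threads:
        thread.join()  # collect every response before returning
    return results


responses = fire_requests(count=5, interval=0.005)
print(responses)
```

Because each request runs on its own thread, the loop can launch the next one while earlier ones are still waiting for a response.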
Think of this as juggling balls in the air.
When I submit a request, I'm tossing a ball in the air. While I could wait for that ball to fall before tossing it again (similar to how we submit requests by waiting for a response before hitting the refresh key), I could instead toss multiple balls. Each ball is similar to a new web browser tab. I could cycle through the tabs hitting submit on each one, before cycling through the tabs again to view the results.
Now, think of a bot as the star juggler from Cirque du Soleil. It's built for one and only one purpose: to keep as many balls in the air as possible.
This requires the bot to send requests to the web server at precise intervals (T1) while simultaneously processing responses returned by the web server.
If we assume that the time for each request to reach the web server is constant, and that we start sending before the server opens, then the gap between when the web server starts accepting requests and when it first receives an acceptable request (T2) cannot exceed T1. And since T1 can be extremely small (e.g. 100 milliseconds), we can drastically reduce T2.
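To make the T1 and T2 relationship concrete, here is a small worked calculation in Python. All the numbers (opening time, first send time, latency) are invented for illustration, and the function name is my own. Times are in whole milliseconds so the arithmetic stays exact.

```python
def first_accepted_delay(open_time: int, first_send: int,
                         interval: int, latency: int) -> int:
    """Return T2 in ms: how long after `open_time` the first acceptable request arrives.

    Requests are sent at first_send, first_send + interval, ... and each one
    takes a constant `latency` ms to reach the server.
    """
    first_arrival = first_send + latency
    if first_arrival >= open_time:
        # Even the first request arrives after opening, so no earlier one is wasted.
        return first_arrival - open_time
    # Ceiling division: how many intervals until an arrival lands at or after open_time.
    steps = -(-(open_time - first_arrival) // interval)
    return first_arrival + steps * interval - open_time


# Assumed numbers: server opens at t = 60,000 ms, bot starts at t = 55,030 ms,
# T1 = 100 ms between requests, 250 ms for a request to reach the server.
t2 = first_accepted_delay(open_time=60_000, first_send=55_030,
                          interval=100, latency=250)
print(t2)  # 80
```

With these numbers the arrivals are at 55,280 ms, 55,380 ms and so on, and the first one at or after 60,000 ms is at 60,080 ms, so T2 is 80 ms, which is below T1. Shrinking the interval shrinks the worst case with it.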
We're well on our way to being early on the internet.
3. Authentication
Let's suppose our ticket booking website requires authentication. In other words, users need to prove that they are who they say they are before they can submit a booking request to the web server.
This post provides a fantastic overview of the various authentication methods.
For this example, let's further stipulate that the web server uses cookie-based authentication. This means that users must provide some form of credentials (e.g. a password or a 2FA code), which the server validates. If they are valid, the server sends a Set-Cookie header in the response. The browser puts the cookie into a cookie jar, and the cookie is sent along with every subsequent request to the web server in the Cookie HTTP header.
Only with this header can I view the form and submit a booking request.
To deal with cookie-based authentication, our bot must emulate the authentication process. This is where Chrome DevTools comes in handy; we can use it to analyse the requests and responses exchanged between the web browser and the web server.
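As a sketch of the cookie round trip a bot has to reproduce, the Python snippet below takes the value of a Set-Cookie response header and builds the matching Cookie request header. The cookie name and value (`session_id=abc123`) are invented for this example; a real site chooses its own.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value: str) -> str:
    """Turn a Set-Cookie response value into the Cookie request value to send back."""
    jar = SimpleCookie()
    # Attributes such as Path and HttpOnly are parsed but not echoed back.
    jar.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical value a server might send after a successful login.
set_cookie = "session_id=abc123; Path=/; HttpOnly"
cookie_header = cookie_header_from_set_cookie(set_cookie)
print(cookie_header)  # session_id=abc123
```

This is spelled out by hand only to show what happens. In practice, Python's `http.cookiejar.CookieJar` combined with `urllib.request.HTTPCookieProcessor` can store and resend cookies automatically, much like a browser's cookie jar.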
But beware! If authentication is required, the web server now knows who is submitting the requests. And since the use of bots is often not well accepted (and may even be considered illegal), utmost care must be taken to obscure any evidence that you are using a bot.
Conclusion
Although I'm really excited by the use of bots, we must understand that it is still our responsibility to ensure that the bots we develop are not used for nefarious purposes. Besides that, I hope you've learnt a thing or two about bots, and that you're one step closer to reaping the benefits of being early on the internet!