Async Ruby - RubyConf 2021 talk transcript

November 13, 2021

This talk was given at RubyConf 2021. Below is the slightly edited talk transcript. You can also watch the video.

1. Introduction
2. Asynchronous programming
    2.1. Asynchronous programming benefits
3. Ruby is synchronous
    3.1. Ruby threads
    3.2. Ruby thread downsides
4. Async Ruby
    4.1. Async gem
    4.2. Async ecosystem
5. Basic example
    5.1. Async tasks
    5.2. Async program structure
6. Advanced example
    6.1. URI.open
    6.2. HTTParty
    6.3. Redis
    6.4. SSH
    6.5. SQL queries
    6.6. Blocking operations
    6.7. Spawning processes
7. Advanced scaling example
    7.1. Scalability limits
8. Understanding Async Ruby
    8.1. Event reactor
    8.2. Fibers
    8.3. Fiber scheduler
9. Common questions
    9.1. Async Rails?
    9.2. Production ready?
    9.3. How to get started?
10. Async Ruby creator
11. Conclusion

Introduction

Async Ruby is an awesome addition to the Ruby language. It's been available for some time now, but relatively few people know about it and it has stayed outside the Ruby mainstream.

The goal is to show you, at a high level, what Async Ruby is about. Whether you're a beginner or an advanced rubyist, I hope to show you something you didn't know about Ruby.

We're going to go through a couple simple examples that show the power of asynchronous programming, and we'll also explain the core concepts of how it all works.

I've been a Ruby programmer for 10 years now and this is, in my opinion, by far the most exciting addition to the Ruby language during this time.

My name is Bruno Sutic. I'm an Async Ruby early adopter, and I've made a couple small contributions to it. You can find me on GitHub as @bruno-. You can also find my contact info on my webpage, brunosutic.com.

Asynchronous programming

Before jumping into Async Ruby, let's explore what async really means. What is asynchronous programming?

It's commonly accepted that JavaScript brought async programming to the mainstream developer's consciousness, so it would be fitting to explain asynchronous programming with a simple JavaScript example. I assume a lot of you have written at least a little JavaScript, because it's so unavoidable these days.

Let's look at this example:

fetch("https://httpbin.org/delay/2").then((res) => {
  console.log(`Status is ${res.status}`)
})
console.log("runs first")

We make a simple HTTP GET request to httpbin.org.
We register a callback on the returned promise. It runs when the response is received and just prints the response status.
On the last (4th) line of this example, we print a string.

The output, shown below, is expected:

runs first
Status is 200

This is the simplest example of an async program, in which we typically make an I/O request, and then something happens later in a callback when the request is complete.

One thing to note in the output here is:

First, the program runs the code on the last line and prints the string.
Later, when the request is done, it prints the response status.

If you think about it, it's unusual for simple programs to run backwards, such as:

line 1
line 4
then back to line 2

To us, developers, and humans, programs that run top-to-bottom are easier to understand.

The point I'm trying to make here is: async programs are harder to follow and understand. Programs that run top-to-bottom, synchronous programs, are easier to comprehend.

In the case of JavaScript, as programs become more complex, they may end up in an infamous state called "callback hell" or "promise hell", or even "async await hell".

Asynchronous programming benefits

So then, why would we want to make our programs asynchronous? Why not just stick to a linear, top-to-bottom approach?

The answer is simple: performance. To understand this, let's look at the following example with JavaScript pseudo-code:

fetch("https://httpbin.org/delay/2").then(...)
fetch("https://httpbin.org/delay/2").then(...)
fetch("https://httpbin.org/delay/2").then(...)

Here, we're making 3 HTTP GET requests, and each one takes 2 seconds to run. How long will this whole program run? Surprise, surprise - the program will run for 2 seconds total!

In this example we're firing 3 HTTP requests at practically the same time. The trick is that waiting for the responses happens in parallel. Asynchronous programming enables this to happen, and that's how we achieve these big performance gains.

Ruby is synchronous

If we look at the equivalent code in Ruby, we'll see that the same example takes 3x longer to run.

require "open-uri"

URI.open("https://httpbin.org/delay/2")
URI.open("https://httpbin.org/delay/2")
URI.open("https://httpbin.org/delay/2")

In this case the math is predictable: 3 x 2 = 6 seconds. The reason for this is that there's no parallel waiting on the responses. Ruby is synchronous.

Ruby threads

So, how do you make 3, or 5, or 100 requests in Ruby more performant? You use threads.

This example shows how to speed up our program with 3 requests in Ruby.

require "open-uri"

1.upto(3).map {
  Thread.new do
    URI.open("https://httpbin.org/delay/2")
  end
}.each(&:join)

The whole program finishes in 2 seconds!
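To try the same idea without depending on the network, here is a minimal sketch that swaps the HTTP call for a short sleep. The fake_request and run_in_threads helpers are hypothetical names introduced only for this illustration. It also shows how Thread#value lets you collect each thread's result.

```ruby
# Stand-in for a slow HTTP call (hypothetical helper, not part of open-uri).
def fake_request(id, delay)
  sleep delay
  "response #{id}"
end

# Runs `count` fake requests in separate threads; returns results and duration.
def run_in_threads(count, delay)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = 1.upto(count).map do |id|
    Thread.new { fake_request(id, delay) }
  end
  # Thread#value joins the thread and returns the block's return value.
  results = threads.map(&:value)
  [results, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

results, elapsed = run_in_threads(3, 0.2)
puts results.inspect
puts "Duration: #{elapsed.round(2)}"
```

The three fake requests wait at the same time, so the duration should be close to one delay rather than three.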

Ruby thread downsides

And now you may be wondering: Ruby isn't asynchronous by design, but it has threads, so are we good?

If you've done any real-world programs with raw Ruby threads, then you've probably learned that threads in Ruby are hard.

There are two specific problems with them:

Language-level race conditions: These are particularly nasty and hard to debug. This type of problem can occur with even the simplest of thread programs.
Maximum number of threads: This matters when you want to make a large number of parallel requests.

I just tried maxing out the number of threads on my machine, which is a mid-range laptop. The maximum number of threads I could spawn was 2048. That may seem like a lot, but if you have, say, a million HTTP requests to make, that number of threads is not sufficient.
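To make the race condition problem concrete, here is a small sketch of a check-then-act race. The Inventory class and items_sold helper are made up for this illustration, and the sleep is an artificial pause that widens the race window so the problem shows up on every run instead of once in a while.

```ruby
class Inventory
  attr_reader :sold

  def initialize(stock)
    @stock = stock
    @sold = 0
    @lock = Mutex.new
  end

  # Unsafe: other threads can run between the check and the update.
  def buy_unsafe
    return unless @stock > 0
    sleep 0.1 # artificial pause that widens the race window
    @stock -= 1
    @sold += 1
  end

  # Safe: the mutex turns check-and-update into one indivisible step.
  def buy_safe
    @lock.synchronize { buy_unsafe }
  end
end

# Five threads compete for a single item; returns how many "bought" it.
def items_sold(buy_method)
  inventory = Inventory.new(1)
  5.times.map { Thread.new { inventory.public_send(buy_method) } }.each(&:join)
  inventory.sold
end

puts "unsafe: sold #{items_sold(:buy_unsafe)} of 1 item in stock"
puts "safe:   sold #{items_sold(:buy_safe)} of 1 item in stock"
```

In the unsafe version every thread passes the stock check before any of them updates it, so more than one item gets sold. The mutex fixes it, but you have to remember to use it everywhere the shared state is touched.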

Async Ruby

Async Ruby is a new type of concurrency in Ruby. If you ever think "I want to do multiple things at the same time in Ruby", then Async may be a good fit.

Here are a couple of examples:

Serving more requests per second with the same hardware.
Making more requests with your API client at the same time.
Handling more websocket connections concurrently.

Ruby has a couple options when you want to do multiple things at the same time:

Processes
Ractors
Threads
Async

Async is the new addition to the above list.

Async gem

So, how do you run Async Ruby? Async is just a gem, and you install it with gem install async - that's it.

It's a very nice gem: Matz invited it into Ruby's standard library, although the invite has not yet been accepted.

The gem creator is Samuel Williams, a Ruby core committer. He also wrote the Fiber scheduler, an important Ruby 3.0 feature that makes Async integrate with Ruby in a super-nice way.

So, you can kinda feel that the Ruby core team, including Matz himself, are backing this gem.

Async ecosystem

Async Ruby is also an ecosystem of gems. Here's a couple of them:

async-http: A powerful HTTP client.
async-await: Adds some syntax sugar to Async.
falcon: A highly scalable asynchronous HTTP server built around the Async core.
async-redis: Redis client.
async-websocket: The name says it all.

This talk focuses on the core async gem and the accompanying Ruby language integration.

Basic example

Let's do an Async Ruby example that is equivalent to the JavaScript example we had before.

require "async"
require "async/http/internet"

time = Time.now

Async do |task|
  internet = Async::HTTP::Internet.new

  task.async do
    internet.get("https://httpbin.org/delay/2")
  end

  task.async do
    internet.get("https://httpbin.org/delay/2")
  end

  task.async do
    internet.get("https://httpbin.org/delay/2")
  end
end

puts "Duration: #{Time.now - time}"

In this example we're using the async-http gem. The only thing you have to know about it is that it's an HTTP client. You call get on it, and it makes a request.

The actual code starts with a capitalized Async - a Kernel method that takes a block. All the asynchronous code in a Ruby program is always wrapped in an Async block.

Async tasks

Async Ruby has a concept of tasks, and we spin up multiple tasks when we want to make things run concurrently. In this example we're running three requests at the same time.

And just like in the previous JavaScript example, all three requests start at virtually the same time. The big win is that waiting on the responses happens in parallel. The example output confirms this:

Duration: 2.428274121

The total running time of this example is slightly more than the expected 2 seconds because of the network latency.

Async program structure

The basic example above shows the general structure of Async Ruby programs:

You start with an Async block that is passed a main task.
That main task is usually used to spawn more Async sub-tasks.
These sub-tasks run concurrently to each other and to the main task.

Just to make it explicitly clear: Async tasks can be nested indefinitely. So, a task block is passed a sub-task, which can again create a sub-sub-task, etc.

Another thing to clarify is that it's all just Ruby. Async does not contain any DSL - nor does it do gimmicks, like monkey patching. The previous example performs only HTTP requests within tasks, but you can run any Ruby code anywhere - in a main task or sub-tasks. It's just regular Ruby with method calls and blocks.

Advanced example

Hopefully the above example has given you a positive first impression of Async Ruby. Once you get a little used to how things work, you see it's actually really neat, and the performance benefits are awesome. Let's now see another code example. If you're not impressed yet, this may just blow. your. mind.

URI.open

You may not have liked that we're using a new HTTP client in the first example. The truth is, you can use Ruby's URI.open to achieve the same result.

require "async"
require "open-uri"

start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    URI.open("https://httpbin.org/delay/2")
  end
end

puts "Duration: #{Time.now - start}"

The example output:

Duration: 2.440876417

Here, we see that two requests triggered with URI.open are completed in about 2 seconds. Since it's the same result as before, we know the requests ran at the same time.

HTTParty

But URI.open may also not be your favorite tool. The brilliant thing about Async Ruby is that any HTTP client is supported. Let's try running the HTTParty gem and see how that works.

require "async"
require "open-uri"
require "httparty"

start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end
end

puts "Duration: #{Time.now - start}"

And the output is:

Duration: 2.415048833

Ok, the program ran in about 2 seconds, which means both requests ran concurrently.

Redis

So far, we've only seen examples making HTTP requests. But what about other network requests? Let's try Redis, which has its own protocol built on top of TCP.

This example extends the previous one with another task at the bottom of Async block.

require "async"
require "open-uri"
require "httparty"
require "redis"

start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end

  task.async do
    Redis.new.blpop("abc123", 2)
  end
end

puts "Duration: #{Time.now - start}"

The added Redis command runs for 2 seconds before returning.

We run the example:

Duration: 2.410222604

It completes in about 2 seconds. Wow! We can also make Redis commands asynchronous.

In fact, any I/O operation can be made asynchronous. All existing, synchronous code is fully compatible with Async. You don't have to use async-only gems, like async-http or async-redis. You can just continue using the libraries you are already familiar with.

SSH

Let's add another example to the mix. I'll use net-ssh gem to execute an SSH command on the remote server.

This example extends the previous one with another task at the bottom of Async block.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"

start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end

  task.async do
    Redis.new.blpop("abc123", 2)
  end

  task.async do
    Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
  end
end

puts "Duration: #{Time.now - start}"

This SSH command runs sleep 1.5 on the target server. Because of some overhead, it finishes in about 2 seconds total.

And the output is:

Duration: 2.400152144

Ok, there you have it. We added SSH to the mix and it works seamlessly with other network requests.

SQL queries

You may be wondering - what about databases? We connect to the databases over the network. Does Async support SQL queries?

I'll use sequel gem to check if asynchronous database operations are supported. The query added to the example takes exactly 2 seconds to run.

This example extends the previous one with another task at the bottom of Async block.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"
require "sequel"

DB = Sequel.postgres
Sequel.extension(:fiber_concurrency)
start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end

  task.async do
    Redis.new.blpop("abc123", 2)
  end

  task.async do
    Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
  end

  task.async do
    DB.run("SELECT pg_sleep(2)")
  end
end

puts "Duration: #{Time.now - start}"

The output:

Duration: 2.465881664

And yes, database queries are supported as well. Cool, right?

Blocking operations

Let's see another example that uses Ruby's sleep method.

This example extends the previous one with another task at the bottom of Async block.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"
require "sequel"

DB = Sequel.postgres
Sequel.extension(:fiber_concurrency)
start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end

  task.async do
    Redis.new.blpop("abc123", 2)
  end

  task.async do
    Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
  end

  task.async do
    DB.run("SELECT pg_sleep(2)")
  end

  task.async do
    sleep 2
  end
end

puts "Duration: #{Time.now - start}"

What do you expect this sleep will do? Will it increase the total example duration by 2 seconds? Let's check the output:

Duration: 2.397805105

The whole program runs in about 2 seconds, which indicates this sleep ran concurrently with other tasks. Nice! So, not only can we run network I/O asynchronously, we can also run other blocking operations async.

Spawning processes

What other, often used, blocking operations do we run in Ruby? How about we try spawning new child processes?

This example extends the previous one with another task at the bottom of Async block.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"
require "sequel"

DB = Sequel.postgres
Sequel.extension(:fiber_concurrency)
start = Time.now

Async do |task|
  task.async do
    URI.open("https://httpbin.org/delay/2")
  end

  task.async do
    HTTParty.get("https://httpbin.org/delay/2")
  end

  task.async do
    Redis.new.blpop("abc123", 2)
  end

  task.async do
    Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
  end

  task.async do
    DB.run("SELECT pg_sleep(2)")
  end

  task.async do
    sleep 2
  end

  task.async do
    `sleep 2`
  end
end

puts "Duration: #{Time.now - start}"

I'm using a sleep system command in this example. Don't get confused, this is actually running an external system command. It could be any other executable, but I chose sleep so I can easily control the duration.

The output is:

Duration: 2.396816366

And there you have it: system commands can run async as well.

Advanced scaling example

We've covered a lot so far, and hopefully these features look exciting to you. You saw something new, something really innovative in Ruby. But that's not all. Let me show you how easily Async Ruby scales.

This example extends the previous one by wrapping the content of the Async block in 10.times.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"
require "sequel"

DB = Sequel.postgres(max_connections: 10)
Sequel.extension(:fiber_concurrency)
start = Time.now

Async do |task|
  10.times do
    task.async do
      URI.open("https://httpbin.org/delay/2")
    end

    task.async do
      HTTParty.get("https://httpbin.org/delay/2")
    end

    task.async do
      Redis.new.blpop("abc123", 2)
    end

    # task.async do
    #   Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
    # end

    task.async do
      DB.run("SELECT pg_sleep(2)")
    end

    task.async do
      sleep 2
    end

    task.async do
      `sleep 2`
    end
  end
end

puts "Duration: #{Time.now - start}"

Quick note about Net::SSH. I had to remove that operation because I couldn't figure out the correct SSH configuration for this example.

Let's see how long this example runs:

Duration: 2.82646708

Under 3 seconds! Yes, we're running 60 tasks, each lasting 2 seconds, and the total run time of the example is about 2.8 seconds.

How about cranking things up? How about we repeat this 100 times? Let's see what happens.

This example is almost the same as the previous one, but the content of the Async block is repeated with 100.times.

require "async"
require "open-uri"
require "httparty"
require "redis"
require "net/ssh"
require "sequel"

DB = Sequel.postgres(max_connections: 100)
Sequel.extension(:fiber_concurrency)
start = Time.now

Async do |task|
  100.times do
    task.async do
      URI.open("https://httpbin.org/delay/2")
    end

    task.async do
      HTTParty.get("https://httpbin.org/delay/2")
    end

    task.async do
      Redis.new.blpop("abc123", 2)
    end

    # task.async do
    #   Net::SSH.start("164.90.237.21").exec!("sleep 1.5")
    # end

    task.async do
      DB.run("SELECT pg_sleep(2)")
    end

    task.async do
      sleep 2
    end

    task.async do
      `sleep 2`
    end
  end
end

puts "Duration: #{Time.now - start}"

We're now running 600 concurrent operations. Example duration is:

Duration: 3.753404045

The total program run time increased by a second because of the overhead of establishing so many network connections. Still, I find this pretty impressive.

So, there you have it: easy scaling with Async. You can crank the numbers up, but in my case the Redis server and the PostgreSQL database started complaining, so I left it at that.

Scalability limits

You can argue we could make the last example work with threads - creating 600 threads. I think that's really pushing the limits with threads. My hunch is the thread scheduling overhead would be just too high. When using threads, it's more common to limit the number of threads to, say, 50 or 100.

On the other hand, 600 concurrent Async tasks is a common thing to do. The upper limit on the number of Async tasks per process is in the single-digit millions. Some users have successfully done that.

This limit, of course, depends on the system and what you're trying to do. For example, if you're making or receiving network requests, you'll probably run out of ports at 40-50 thousand concurrent tasks, unless you play with your network settings.

In any case, I hope that you get the idea that Async Ruby is a very, very powerful tool.

Understanding Async Ruby

To me, the biggest part of the magic is running 3 HTTP requests with URI.open. With vanilla Ruby that takes 6 seconds. And then, by using the same method within the Async block, the program runs for 2 seconds.

It's the same with other examples: sleep, Redis etc. They all normally run in a blocking way, but then inside an Async block they work asynchronously. It's a great example of keeping Ruby code fully backwards compatible. But how does that work?

There's a lot to learn about Async Ruby, but I think there are 3 main concepts to understand:

Event reactor
Fibers
Fiber scheduler

Each of these 3 topics is very broad, so I'll just provide a summary here.

Event reactor

The event reactor is sometimes called by other names: event system or event loop. Every async implementation, in every language, say JavaScript, always has some kind of event reactor behind it.

Async Ruby is no exception. The current version of the async gem uses the nio4r gem as an event reactor backend. nio4r in turn uses libev to wrap the system's native APIs - epoll on Linux, kqueue on macOS etc.

What does the event reactor do? It efficiently waits for I/O events. When an event happens, it performs an action we programmed it to do. On a very high level:

We make an HTTP request and then we wait.
The event reactor notifies us when the response for that request is ready and can be read from the underlying socket.
We read from the socket.

These notifications are very efficient with resource usage and allow for high scalability. For example, if you hear a server can handle 10 thousand connections at the same time or a crawler can make a large number of concurrent requests - an event reactor is probably the technology behind that.
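To make the idea more tangible, here is a toy reactor built on Ruby's IO.select. This is only a sketch under simplifying assumptions: ToyReactor and reactor_demo are made-up names, pipes stand in for sockets, helper threads play the role of remote servers, and IO.select is used instead of epoll or kqueue. It is not how the async gem is implemented.

```ruby
# A toy event reactor: wait on several IO objects at once and run a
# callback for whichever becomes readable first.
class ToyReactor
  def initialize
    @callbacks = {} # IO => block to call when that IO is readable
  end

  def on_readable(io, &callback)
    @callbacks[io] = callback
  end

  def run
    until @callbacks.empty?
      # IO.select blocks until at least one registered IO has data.
      ready, = IO.select(@callbacks.keys)
      ready.each do |io|
        callback = @callbacks.delete(io)
        callback.call(io.read_nonblock(1024))
      end
    end
  end
end

def reactor_demo
  events = []
  reactor = ToyReactor.new
  slow_reader, slow_writer = IO.pipe
  fast_reader, fast_writer = IO.pipe

  reactor.on_readable(slow_reader) { |data| events << data }
  reactor.on_readable(fast_reader) { |data| events << data }

  # Helper threads simulate servers that answer after a delay.
  writers = [
    Thread.new { sleep 0.2; slow_writer.write("slow response") },
    Thread.new { sleep 0.1; fast_writer.write("fast response") }
  ]

  reactor.run
  writers.each(&:join)
  [slow_reader, slow_writer, fast_reader, fast_writer].each(&:close)
  events
end

p reactor_demo
```

The callbacks fire in the order the data arrives, not in the order they were registered, and the single reactor loop waits on both pipes at once.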

Fibers

You saw that Async has tasks. Tasks are just wrappers around fibers. The event reactor drives the execution of these fibers. For example:

When a response in task 1 is ready, the event reactor resumes task or fiber number 1.
Later on, when response in task 2 is ready, it resumes task or fiber number 2.

You get the idea.

Because fibers are registered with the event reactor, we get a really nice property: code within a single task behaves completely synchronously. This means you can read it top-to-bottom. This is huge! It means our Async programs are easy to write and understand.

The code behaves asynchronously only if you use task.async. There's no way you can get "callback hell" with Async Ruby.
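Here's a small sketch of that property using plain Ruby fibers, with no gems involved. The build_task and run_round_robin helpers are hypothetical, and a simple round-robin loop stands in for the event reactor. Each Fiber.yield marks a spot where a real task would wait for I/O.

```ruby
# Inside each fiber the steps read top to bottom, even though the
# fibers take turns running.
def build_task(name, log)
  Fiber.new do
    log << "#{name}: send request"
    Fiber.yield # pause here; a reactor would resume us when I/O is ready
    log << "#{name}: read response"
    Fiber.yield
    log << "#{name}: done"
    :done # the value returned by the final resume
  end
end

def run_round_robin
  log = []
  fibers = [build_task("task 1", log), build_task("task 2", log)]

  # Stand-in for the event reactor: resume each fiber in turn and
  # drop the ones that have finished.
  until fibers.empty?
    fibers.reject! { |fiber| fiber.resume == :done }
  end
  log
end

puts run_round_robin
```

The log interleaves the two tasks, yet the code of each task is an ordinary sequence of statements with no callbacks.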

Fiber scheduler

The last piece of the puzzle, and the last big concept, is the fiber scheduler. The fiber scheduler was listed as one of the big Ruby 3.0 features. It provides hooks for blocking functions inside Ruby. Examples of those blocking functions are:

Waiting for an I/O read or write
Waiting on a sleep method to finish

In essence, fiber scheduler turns blocking behavior into non-blocking inside an Async context.

Let's take the sleep method for example. If you're running sleep 2 in an Async block, instead of blocking the whole program for 2 seconds, the fiber scheduler will run that sleep in a non-blocking fashion. It will use event reactor's timing features to effectively sleep in one task, while allowing other tasks to run during that time.
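To show how such a hook can work, here is a deliberately tiny scheduler that only knows how to handle sleep. It's a sketch, not the scheduler used by the async gem: SleepOnlyScheduler and run_sleepers are made-up names, the I/O hooks are stubs, and it needs Ruby 3.0 or newer.

```ruby
class SleepOnlyScheduler
  def initialize
    @sleepers = {} # fiber => monotonic time at which to wake it
  end

  # Hook: called instead of blocking when a non-blocking fiber calls
  # sleep. This sketch only supports sleep with a duration.
  def kernel_sleep(duration)
    @sleepers[Fiber.current] = now + duration
    Fiber.yield # hand control back so other fibers can run
  end

  # Hook behind Fiber.schedule: create a non-blocking fiber and start it.
  def fiber(&block)
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  # The "event loop": wake each fiber once its sleep has expired.
  def run
    until @sleepers.empty?
      fiber, wake_at = @sleepers.min_by { |_, time| time }
      delay = wake_at - now
      sleep(delay) if delay > 0 # a real sleep: the main fiber is blocking
      @sleepers.delete(fiber)
      fiber.resume
    end
  end

  def close
    run
  end

  # Required by the scheduler interface, but not used in this sketch.
  def block(blocker, timeout = nil) = raise(NotImplementedError)
  def unblock(blocker, fiber) = raise(NotImplementedError)
  def io_wait(io, events, timeout) = raise(NotImplementedError)

  private

  def now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

def run_sleepers
  log = []
  scheduler = SleepOnlyScheduler.new
  Fiber.set_scheduler(scheduler)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  Fiber.schedule { sleep 0.2; log << "slow" }
  Fiber.schedule { sleep 0.1; log << "fast" }
  scheduler.run

  [log, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
ensure
  Fiber.set_scheduler(nil)
end

log, elapsed = run_sleepers
puts log.inspect
puts "Duration: #{elapsed.round(2)}"
```

Both fibers call the ordinary sleep method, but the two sleeps overlap, so the total duration should be close to the longer sleep rather than the sum of both.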

Now you know the big benefit of fiber scheduler. Along with fibers and event reactor, it makes Async Ruby seem like magic.

Common questions

Async Rails?

It's time for the big question: does Async work with Ruby on Rails? The answer is: currently, no. The reason is that ActiveRecord needs more work to support the async gem.

Production ready?

Another big question you may have: is Async Ruby production ready? The answer to that question is - yes! Async Ruby is production ready and a number of people are running it in production. Everyone using it has nothing but praise for Async.

As an example, Trevor Turk recently started using async on helloweather.com. They've replaced Puma and Typhoeus::Hydra with Falcon and Async::HTTP. They immediately cut their server costs to one third, and their overall system is now more stable.

How to get started?

If you're excited about what you've seen so far, you're probably asking: how do I get started? I think the single best starting point to learn Async Ruby is the async gem's GitHub repo, github.com/socketry/async. From there you'll find a link to the project documentation.

Async Ruby creator

I've already mentioned Samuel Williams, but I think it doesn't hurt to say this again: he is the sole creator of Async Ruby and the Async ecosystem, and a Ruby core committer who implemented the fiber scheduler feature.

Huge thanks to Samuel! He's making an awesome contribution to all of us Ruby developers.

Conclusion

I hope you liked what you saw in this talk. Async is an exciting new addition to Ruby. It's a whole new type of concurrency added to the language. As you saw, it's super powerful and very scalable.

This changes what's possible with Ruby. It changed the way I think about designing programs and apps.

One of the best things is that it does not make any of the existing code obsolete.

Just like Ruby itself, Async Ruby is beautifully designed and a joy to use.

Happy hacking with Async Ruby!