tbucket documentation¶

tbucket 1.1.0 last updated Feb 03, 2018

Installation¶

This module depends on apsw, which is excellent but persnickety about build configuration. The intended way to install apsw is via the system package manager. The PyPI releases of apsw are not maintained by its author, but rather by an unrelated third party. As of writing, these releases are undermaintained.

To avoid pulling broken builds from PyPI, apsw is not included in tbucket’s dependencies. You should install apsw from your system package manager, or follow apsw’s pip-based build instructions.

TL;DR:

Do one of the following:

sudo apt-get install python-apsw
pip install tbucket

or:

pip install --user https://github.com/rogerbinns/apsw/releases/download/3.22.0-r1/apsw-3.22.0-r1.zip \
--global-option=fetch --global-option=--version --global-option=3.22.0 --global-option=--all \
--global-option=build --global-option=--enable-all-extensions
pip install tbucket

For more information, see apsw’s documentation.

Documentation¶

tbucket: some stateful classes for “token bucket” style rate limiting.

In a token bucket setup, each token represents some costly action, such as making one API call or sending one byte of data. We also have a “bucket” of tokens, which represents how many actions we’re allowed to take. The bucket fills up with tokens according to various rate-limiting semantics. When we’re about to take an action, we remove a token from the bucket. If the bucket is empty, we must wait for it to refill.

These classes all use a SQLite database to store state. They take a “key” parameter which identifies the bucket within the database. This way, token bucket state may be shared between many different processes on a single machine.

This library is tailored for the case of calling various APIs found in the wild which have low rate limits. Our goal is to closely model the algorithm behind an API’s rate limiter, so we can make as many calls as possible as early as possible, without going over the limit.

Note

This library treats SQLite’s transaction semantics as a white box. In general, user-visible functions will initiate a BEGIN IMMEDIATE transaction.

class tbucket.TokenBucket(path, key, rate, period)¶

Bases: object

A “classic” token bucket rate limiter.

In the classic implementation, the bucket fills at a constant and continuous rate, up to a maximum.

The configuration parameters are rate and period. The bucket fills at rate / period, with rate being the maximum value.

The state of the bucket is represented as a (tokens, timestamp) tuple. This state gets returned from most functions.

This class will create a table called “tbf” in the database to store state. The “key” attribute gets used as the primary key within the table. It only stores one row per bucket, consisting of the most recent (tokens, timestamp) state tuple.

path¶: The path to the sqlite database.

key¶: A unique key for this bucket within the database.

rate¶: The maximum number of tokens.

period¶: The time for the bucket to reach the maximum number of tokens. The bucket refills at a rate of rate / period.

consume(n, leave=None)¶

Consume tokens, waiting for them if necessary.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state. The transaction is only used for updating state and won’t be held while waiting for tokens.

Parameters:	n – The number of tokens to consume. leave – A number of tokens. Only successfully consume tokens once we would be able to leave this many behind.
Returns:	A (tokens, timestamp) tuple.

db¶: A thread-local apsw.Connection.

peek()¶

Peek at the current number of tokens, and update the state.

This will perform a BEGIN IMMEDIATE transaction on the database.

Returns:	A (tokens, timestamp) tuple.

set(tokens, timestamp=None)¶

Explicitly set the number of tokens.

This will perform a BEGIN IMMEDIATE transaction on the database.

Parameters:	tokens – The number of tokens. timestamp – The time at which the tokens are measured. If None, defaults to now.
Returns:	A (tokens, timestamp) tuple.

try_consume(n, leave=None)¶

Try to consume some tokens.

If there are fewer tokens available than the number requested, this function will just signal failure, without waiting for a token.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state.

Parameters:	n – The number of tokens to try to consume. leave – A number of tokens. Only succeed if we would have this many tokens left over, after n are consumed.
Returns:	A (success, tokens, timestamp) tuple.

class tbucket.ScheduledTokenBucket(path, key, rate, period)¶

Bases: tbucket.TokenBucket

A token bucket which resets to a fixed number of tokens at regular intervals.

The bucket will be filled whenever now % period == 0. There is currently no support for any offsets to this schedule.

When filled, the bucket will be reset to have rate tokens.

The state of the bucket is represented as a (tokens, timestamp) tuple. This state gets returned from most functions.

path¶: The path to the sqlite database.

key¶: A unique key for this bucket within the database.

rate¶: The number of tokens the bucket will be reset to.

period¶: How often the bucket is reset.

consume(n, leave=None)¶

Consume tokens, waiting for them if necessary.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state. The transaction is only used for updating state and won’t be held while waiting for tokens.

Parameters:	n – The number of tokens to consume. leave – A number of tokens. Only successfully consume tokens once we would be able to leave this many behind.
Returns:	A (tokens, timestamp) tuple.

db¶: A thread-local apsw.Connection.

peek()¶

Peek at the current number of tokens, and update the state.

This will perform a BEGIN IMMEDIATE transaction on the database.

Returns:	A (tokens, timestamp) tuple.

set(tokens, timestamp=None)¶

Explicitly set the number of tokens.

This will perform a BEGIN IMMEDIATE transaction on the database.

Parameters:	tokens – The number of tokens. timestamp – The time at which the tokens are measured. If None, defaults to now.
Returns:	A (tokens, timestamp) tuple.

try_consume(n, leave=None)¶

Try to consume some tokens.

If there are fewer tokens available than the number requested, this function will just signal failure, without waiting for a token.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state.

Parameters:	n – The number of tokens to try to consume. leave – A number of tokens. Only succeed if we would have this many tokens left over, after n are consumed.
Returns:	A (success, tokens, timestamp) tuple.

class tbucket.TimeSeriesTokenBucket(path, key, rate, period, trim_func=None)¶

Bases: tbucket.TokenBucket

A token bucket which tracks the exact timestamps of tokens withdrawn.

This token bucket implementation enforces that exactly N tokens may be consumed in any window of the target size.

However, this implementation requires tracking more state. The classic implementation keeps a single tuple for state, but this implementation must track the timestamps of at least the last N tokens.

The bucket has tokens available whenever fewer than rate tokens have been consumed in the last period.

This class will create a table called “ts_token_bucket” in the database to store state. The “key” attribute gets used as a key in this table. It will store one row per token consumed, but by default will only store rate tokens (see trim()).

path¶: The path to the sqlite database.

key¶: A unique key for this bucket within the database.

rate¶: The number of tokens the bucket will be reset to.

period¶: How often the bucket is reset.

consume(n, leave=None)¶

Consume tokens, waiting for them if necessary.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state. The transaction is only used for updating state and won’t be held while waiting for tokens.

Parameters:

n – The number of tokens to consume.
leave – A number of tokens. Only successfully consume tokens once we would be able to leave this many behind.

Returns:

A tuple of (tokens, list_of_timestamps, query_Time). The number of: tokens will always be rate - len(list_of_timestamps).

db¶: A thread-local apsw.Connection.

estimate(n, query_time=None)¶

Estimate the timestamp at which we would have a number of tokens.

This will perform a SAVEPOINT/RELEASE on the database.

Parameters:	n – The number of tokens we need. query_time – The time as of which the query is made. If None, defaults to now.
Returns:	The timestamp at which we would have n tokens available.

mutate(mutator, query_time=None)¶

Mutate the set of token timestamps recorded in a recent period.

This will get the list of token timestamps that occurred from query_time - period and query_time, pass them to a mutator function. The mutator should return a new set of timestamps, which must all be within the same window. This function will then update the database so that the returned timestamps will be the only timestamps that exist within the window.

Will perform a BEGIN IMMEDIATE transaction on the database. The transaction will be held while mutator is called.

Parameters:	mutator – A function which takes (list_of_timestamps, query_time), and returns a new list of timestamps. query_time – A target query time. The target window will be from query_time - period to query_time. If None, defaults to now. This value will be passed to the mutator function.

peek(query_time=None)¶

Peek at the recorded token timestamps in a recent window.

This will perform a SAVEPOINT/RELEASE on the database.

Parameters:	query_time – The target query time. If None, defaults to now.
Returns:	A tuple of (tokens, list_of_timestamps, query_time). The number of tokens will always be rate - len(list_of_timestamps).

record(*times)¶

Record new token timestamps.

Will perform a BEGIN IMMEDIATE transaction on the database.

Parameters:	*times – A list of timestamps when some tokens were given out.

set(n, query_time=None, fill=None, prune=None)¶

Updates the set of token timestamps recorded in a recent window such that there are exactly n.

This is useful for the case when you know n tokens are available, but don’t have an exact history of token timestamps. For example, if you get a “call limit exceeded” error from an API which uses this style of call tracking, you know that there must be rate token timestamps should be recorded in the last period, leaving 0 available now.

This function helps make guesses when mutating a recent window in this case.

If it needs to record new token timestamps, by default, it will record them all at query_time. In other words, the default is to guess that some tokens were consumed just now. This is the most conservative guess, as it means the consumer will need to wait the longest possible time before consuming more.

If it needs to remove some token timestamps, by default it will remove them at random.

Will perform a BEGIN IMMEDIATE transaction on the database. The transaction will be held while fill or prune are called.

Parameters:

Parameters:	n – The number of tokens that should exist in the recent window of query_time - period to query_time. query_time – The end of the window. If None, defaults to now. fill – A function to guess when some token timestamps might have occurred, if new ones must be recorded. Takes (list_of_timestamps, query_time, n) as arguments, and must return a new list of timestamps of length n. The new timestamps should all be within query_time - preiod and query_time. The default returns [query_time] * n. prune – A function to guess which timestamps should be deleted, if some must be removed. Takes (list_of_timestamps, query_time, n) as arguments, and must return a new list of timestamps of length n, where each timestamp occurs in the old list at least as many times as in the new one. The default returns random.sample(list_of_timestamps, n).
Returns:	The new list of timestamps in the window.

n – The number of tokens that should exist in the recent window of query_time - period to query_time.
query_time – The end of the window. If None, defaults to now.
fill – A function to guess when some token timestamps might have occurred, if new ones must be recorded. Takes (list_of_timestamps, query_time, n) as arguments, and must return a new list of timestamps of length n. The new timestamps should all be within query_time - preiod and query_time. The default returns [query_time] * n.
prune – A function to guess which timestamps should be deleted, if some must be removed. Takes (list_of_timestamps, query_time, n) as arguments, and must return a new list of timestamps of length n, where each timestamp occurs in the old list at least as many times as in the new one. The default returns random.sample(list_of_timestamps, n).

Returns:

The new list of timestamps in the window.

try_consume(n, leave=None)¶

Try to consume some tokens.

If there are fewer tokens available than the number requested, this function will just signal failure, without waiting for a token.

This will perform a BEGIN IMMEDIATE transaction on the database while querying and updating state.

Parameters:	n – The number of tokens to try to consume. leave – A number of tokens. Only succeed if we would have this many tokens left over, after n are consumed.
Returns:	A (success, tokens, list_of_timestamps, query_time) tuple.