API

birding.spout.DispatchSpout()

Factory to dispatch spout class based on config.

class birding.spout.TermCycleSpout
initialize(stormconf, context)

Initialization steps:

  1. Prepare sequence of terms based on config: TermCycleSpout/terms.
next_tuple()

Next tuple steps:

  1. Emit (term, timestamp) for the next term in the sequence, with the current UTC time.
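The spout's two steps amount to cycling through the configured terms and stamping each with the current UTC time. A minimal sketch outside the Storm runtime follows; the generator name `cycle_terms` is hypothetical, and the ISO 8601 timestamp format is an assumption, since the emitted timestamp format is not specified here.

```python
import itertools
from datetime import datetime, timezone


def cycle_terms(terms):
    """Yield (term, timestamp) pairs, cycling through terms indefinitely.

    Hypothetical helper mirroring TermCycleSpout's documented steps;
    the ISO 8601 UTC timestamp format is an assumption.
    """
    if not terms:
        raise ValueError("terms must be non-empty")
    # initialize(): prepare an endless sequence over the configured terms.
    sequence = itertools.cycle(terms)
    while True:
        # next_tuple(): pair the next term with the current UTC time.
        yield next(sequence), datetime.now(timezone.utc).isoformat()


# Usage: the first five tuples for two configured terms.
emitted = list(itertools.islice(cycle_terms(["mocking", "bird"]), 5))
```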
class birding.bolt.TwitterSearchBolt
initialize(conf, ctx)

Initialization steps:

  1. Get search_manager_from_config().
  2. Prepare to track searched terms so as to avoid redundant searches.
process(tup)

Process steps:

  1. Stream in (term, timestamp).
  2. Perform search() on term.
  3. Emit (term, timestamp, search_result).
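One way to realize "track searched terms" is a map from each term to the time it was last searched, skipping repeats inside a time window. Because TermCycleSpout keeps re-emitting the same terms, a window rather than a permanent record lets a term be searched again later. This is a sketch under assumptions: the class `SearchDeduper`, the injected `search` and `emit` callables, and the 300-second window are hypothetical stand-ins, not the bolt's actual internals.

```python
import time


class SearchDeduper:
    """Hypothetical sketch: skip terms already searched within `window` seconds."""

    def __init__(self, search, emit, window=300.0, clock=time.monotonic):
        self.search = search  # assumed callable: term -> search result
        self.emit = emit      # assumed callable standing in for Bolt.emit
        self.window = window
        self.clock = clock
        self.searched = {}    # term -> clock time of its last search

    def process(self, term, timestamp):
        now = self.clock()
        last = self.searched.get(term)
        if last is not None and now - last < self.window:
            return  # redundant search; emit nothing
        result = self.search(term)
        self.searched[term] = now
        self.emit([term, timestamp, result])


# Usage with a fake clock so the window is easy to see.
fake_now = [0.0]
emitted = []
bolt = SearchDeduper(search=lambda q: {"q": q}, emit=emitted.append,
                     clock=lambda: fake_now[0])
bolt.process("bird", "t1")  # searched
bolt.process("bird", "t2")  # skipped: inside the window
fake_now[0] = 301.0
bolt.process("bird", "t3")  # searched again
```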
class birding.bolt.TwitterLookupBolt
initialize(conf, ctx)

Initialization steps:

  1. Get search_manager_from_config().
process(tup)

Process steps:

  1. Stream in (term, timestamp, search_result).
  2. Perform lookup_search_result().
  3. Emit (term, timestamp, lookup_result).
class birding.bolt.ElasticsearchIndexBolt[source]
initialize(conf, ctx)

Initialization steps:

  1. Prepare the Elasticsearch connection, including details for indexing.
process(tup)

Process steps:

  1. Index the third positional value of the input tuple into Elasticsearch.
class birding.bolt.ResultTopicBolt
initialize(conf, ctx)

Initialization steps:

  1. Connect to Kafka.
  2. Prepare Kafka producer for tweet topic.
  3. Prepare to track tweets published to topic, to avoid redundant data.
process(tup)

Process steps:

  1. Stream the third positional value of the input tuple into the Kafka topic.
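ResultTopicBolt's dedup-then-publish step can be sketched with an injected `produce` callable standing in for a Kafka producer. The sketch assumes the third positional value is a list of tweet dicts carrying Twitter's `id_str` field; the class name `TopicPublisher` and the JSON encoding are illustrative assumptions.

```python
import json


class TopicPublisher:
    """Hypothetical sketch of publishing only not-yet-seen tweets to a topic."""

    def __init__(self, produce):
        self.produce = produce  # assumed callable: bytes -> None (stands in for a producer)
        self.published = set()  # id_str values already sent to the topic

    def process(self, values):
        tweets = values[2]  # third positional value: assumed list of tweet dicts
        for tweet in tweets:
            tweet_id = tweet["id_str"]
            if tweet_id in self.published:
                continue  # skip redundant data
            self.produce(json.dumps(tweet).encode("utf-8"))
            self.published.add(tweet_id)


# Usage: the second tuple repeats tweet "2", which is not republished.
sent = []
publisher = TopicPublisher(sent.append)
publisher.process(["bird", "t1", [{"id_str": "1"}, {"id_str": "2"}]])
publisher.process(["bird", "t2", [{"id_str": "2"}, {"id_str": "3"}]])
```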
birding.search.search_manager_from_config()

Get a SearchManager instance dynamically based on config.

config is a dictionary containing class and init keys as defined in birding.config.
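A sketch of config-driven instantiation, assuming the `class` value is a dotted import path and `init` holds keyword arguments for the constructor. The helper name `instance_from_config` and the override order (config `init` over caller defaults) are assumptions, not birding's documented behavior.

```python
import importlib


def instance_from_config(config, **default_init):
    """Hypothetical: build an object from {'class': 'pkg.mod.Name', 'init': {...}}."""
    module_name, _, class_name = config["class"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    init = dict(default_init)
    init.update(config.get("init") or {})  # config values override caller defaults
    return cls(**init)


# Usage with a standard-library class in place of a SearchManager subclass.
manager = instance_from_config({"class": "collections.OrderedDict", "init": {"a": 1}}, b=2)
```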

class birding.search.SearchManager

Abstract base class for service object to search for tweets.

lookup(id_list, **kw)

Look up a list of statuses and return results directly from source.

Input can be any sequence of numeric or string values representing status IDs.

lookup_search_result(result, **kw)

Perform lookup() on return value of search().

search(q=None, **kw)

Search for q and return results directly from source.
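The abstract interface can be sketched with Python's `abc` module. This is a sketch, not birding's source: it marks all three methods abstract (the real base class may provide a default `lookup_search_result`), and `InMemorySearchManager` is a hypothetical concrete manager used only to show the search-then-lookup flow.

```python
import abc


class SearchManager(abc.ABC):
    """Sketch of the documented interface; not birding's source."""

    @abc.abstractmethod
    def search(self, q=None, **kw):
        """Search for q and return results directly from the source."""

    @abc.abstractmethod
    def lookup(self, id_list, **kw):
        """Look up a sequence of status IDs and return results directly."""

    @abc.abstractmethod
    def lookup_search_result(self, result, **kw):
        """Perform lookup() on the return value of search()."""


class InMemorySearchManager(SearchManager):
    """Hypothetical manager over a {status_id: text} dict, for illustration."""

    def __init__(self, statuses):
        self.statuses = statuses

    def search(self, q=None, **kw):
        # Return matching IDs; a real source would return its own result format.
        return [sid for sid, text in self.statuses.items() if q and q in text]

    def lookup(self, id_list, **kw):
        return [{"id": sid, "text": self.statuses[sid]} for sid in id_list]

    def lookup_search_result(self, result, **kw):
        return self.lookup(result, **kw)


manager = InMemorySearchManager({1: "a bird sang", 2: "a cat slept"})
hydrated = manager.lookup_search_result(manager.search(q="bird"))
```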

class birding.twitter.Twitter(format=u'json', domain=u'api.twitter.com', secure=True, auth=None, api_version=<class 'twitter.api._DEFAULT'>, retry=False)
classmethod from_oauth_file(filepath=None)

Get an object bound to the Twitter API using your own credentials.

The twitter library ships with a twitter command that uses PIN OAuth. Generate your own OAuth credentials by running twitter from the shell, which will open a browser window to authenticate you. After you run it successfully, even just once, you will have a credential file at ~/.twitter_oauth.

This factory function reuses your credential file to get a Twitter object. (Really, this code is just lifted from the twitter.cmdline module to minimize OAuth dancing.)

class birding.twitter.TwitterSearchManager(twitter)

Service object to provide fully-hydrated tweets given a search query.

static dump(result)

Dump result into a string, useful for debugging.

lookup(id_list, **kw)

Look up a list of statuses and return results directly from Twitter.

Input can be any sequence of numeric or string values representing Twitter status IDs.

lookup_search_result(result, **kw)

Perform lookup() on return value of search().
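Twitter's `statuses/lookup` endpoint takes IDs as a single comma-separated `id` parameter (up to 100 per request), and `search/tweets` responses list tweets under a `statuses` key. Assuming those response shapes, the ID normalization and search-to-lookup handoff described above might look like the sketch below; `id_batches`, `ids_from_search_result`, and the injected `fetch` callable are hypothetical.

```python
def id_batches(id_list, size=100):
    """Hypothetical: yield comma-separated ID strings, at most `size` IDs each."""
    ids = [str(status_id).strip() for status_id in id_list]
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


def ids_from_search_result(result):
    """Hypothetical: collect status IDs from a search/tweets-style response."""
    return [status["id_str"] for status in result.get("statuses", [])]


def lookup_search_result(fetch, result):
    """Sketch: look up every status found in a search result, batch by batch.

    `fetch` is an assumed callable taking one comma-separated ID string and
    returning a list of hydrated statuses.
    """
    hydrated = []
    for batch in id_batches(ids_from_search_result(result)):
        hydrated.extend(fetch(batch))
    return hydrated


# Usage with a fake fetch that echoes IDs back as minimal status dicts.
search_result = {"statuses": [{"id_str": "10"}, {"id_str": "20"}]}
statuses = lookup_search_result(lambda ids: [{"id_str": i} for i in ids.split(",")],
                                search_result)
```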

search(q=None, **kw)

Search Twitter for q and return results directly from Twitter.

birding.twitter.TwitterSearchManagerFromOAuth()

Build TwitterSearchManager from user OAuth file.

Arguments are passed to birding.twitter.Twitter.from_oauth_file().

class birding.gnip.Gnip(base_url, stream, username, password, **params)

Simple binding to the Gnip search API.

search(q, **kw)

Search Gnip for given query, returning deserialized response.

session_class

alias of Session

class birding.gnip.GnipSearchManager(*a, **kw)

Service object to provide fully-hydrated tweets given a search query.

static dump(result)

Dump result into a string, useful for debugging.

lookup(id_list, **kw)

Not implemented.

lookup_search_result(result, **kw)

Do almost nothing; just pass the results through.

search(q, **kw)

Search Gnip for q and return results directly from Gnip.

birding.config.get_config(filepath=None, default_loader=None, on_missing=None)

Get a dict for the current birding configuration.

The resulting dictionary is fully populated with defaults, such that all valid keys will resolve to valid values. Invalid and extra values in the configuration result in an exception.
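Being "fully populated with defaults" while rejecting extra values can be sketched as a recursive overlay of user values onto the defaults that raises on unknown keys. Assumptions: the `overlay` function and `ConfigError` type are hypothetical, and this sketch checks only for unknown keys, not for invalid values.

```python
class ConfigError(ValueError):
    """Hypothetical error type for unknown configuration keys."""


def overlay(defaults, overrides, path=()):
    """Return defaults with overrides applied recursively; reject unknown keys."""
    merged = dict(defaults)  # shallow copy; untouched nested dicts are shared
    for key, value in overrides.items():
        here = path + (str(key),)
        if key not in defaults:
            raise ConfigError("unknown configuration key: " + ".".join(here))
        if isinstance(defaults[key], dict) and isinstance(value, dict):
            merged[key] = overlay(defaults[key], value, here)
        else:
            merged[key] = value
    return merged


# "TermCycleSpout"/"terms" follows the key named earlier; "example" is hypothetical.
defaults = {"TermCycleSpout": {"terms": ["mocking", "bird"]},
            "example": {"a": 1, "b": 2}}
config = overlay(defaults, {"example": {"b": 3}})
```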

See Configuring birding (module-level docstring) for a discussion of how birding configuration works, including filepath loading. Note that a missing file at a non-default filepath set via an environment variable raises an OSError, whereas a missing file at the default filepath is ignored.

This function caches its return values so as to parse configuration only once per set of inputs. As such, treat the resulting dictionary as read-only so as not to accidentally write values that other references to the same dictionary will see.
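The read-only advice follows from caching: every call with the same arguments hands back the same dictionary object, so a write through one reference is visible through all of them. The sketch below shows this with `functools.lru_cache` and a hypothetical stand-in parser. Wrapping the result in `types.MappingProxyType` is one way to enforce read-only access at runtime, but it is only an illustration (get_config is documented to return a plain dict), and it protects the top level only.

```python
import functools
import types


def _parse_config(filepath):
    """Hypothetical stand-in for YAML parsing; returns a new dict on every call."""
    return {"TermCycleSpout": {"terms": ["mocking", "bird"]}, "source": filepath}


@functools.lru_cache(maxsize=None)
def cached_config(filepath=None):
    # Parsed once per distinct argument; later calls return the same object.
    return _parse_config(filepath)


@functools.lru_cache(maxsize=None)
def cached_config_readonly(filepath=None):
    # Same caching, but top-level writes raise TypeError (nested dicts stay mutable).
    return types.MappingProxyType(_parse_config(filepath))


first = cached_config()
first["source"] = "overwritten"  # an accidental write...
second = cached_config()         # ...is seen by every later caller
```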

Parameters:
  • filepath (str) – path to birding configuration YAML file.
  • default_loader (callable) – callable which returns a file-like object with YAML data of default configuration values.
  • on_missing (callable) – callback to call when the file is missing.
Returns:
  dict of current birding configuration; treat as read-only.
Return type:
  dict

birding.shelf.shelf_from_config()

Get a Shelf instance dynamically based on config.

config is a dictionary containing shelf_* keys as defined in birding.config.

class birding.shelf.Shelf

Abstract base class for a shelf to track – but not iterate – values.

Provides a dict interface.

clear()

Remove all items from the shelf.

delitem(key)

Remove an item from the shelf.

getitem(key)

Get an item’s value from the shelf or raise KeyError(key).

pack(key, value)

Pack value given to setitem, inverse of unpack.

setitem(key, value)

Set an item on the shelf, with the given value.

unpack(key, value)

Unpack value from getitem.

This is useful for Shelf implementations which require metadata be stored with the shelved values, in which case pack should implement the inverse operation. By default, the value is simply passed through without modification. The unpack implementation is called on __getitem__ and therefore can raise KeyError if packed metadata indicates that a value is invalid.
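A minimal sketch of how a shelf can route its dict interface through `pack` and `unpack`, using `collections.abc.MutableMapping`. The class `DictShelf` is hypothetical; it assumes `__setitem__` calls `pack` and `__getitem__` calls `unpack` (the latter as stated above), and it refuses iteration to mirror "track, but not iterate". Because `MutableMapping.__contains__` is built on `__getitem__`, a `KeyError` raised by `unpack` makes `key in shelf` return False.

```python
import collections.abc


class DictShelf(collections.abc.MutableMapping):
    """Hypothetical in-memory shelf showing how pack/unpack wrap storage."""

    def __init__(self):
        self._data = {}

    # Storage primitives, named after the Shelf methods documented above.
    def getitem(self, key):
        return self._data[key]  # raises KeyError(key) if absent

    def setitem(self, key, value):
        self._data[key] = value

    def delitem(self, key):
        del self._data[key]

    def clear(self):
        self._data.clear()

    # Default pack/unpack: pass values through unchanged.
    def pack(self, key, value):
        return value

    def unpack(self, key, value):
        return value

    # dict interface, routed through pack/unpack.
    def __getitem__(self, key):
        return self.unpack(key, self.getitem(key))

    def __setitem__(self, key, value):
        self.setitem(key, self.pack(key, value))

    def __delitem__(self, key):
        self.delitem(key)

    def __iter__(self):
        raise NotImplementedError("shelves track values but do not iterate them")

    def __len__(self):
        return len(self._data)


shelf = DictShelf()
shelf["term"] = "2016-01-01T00:00:00"
```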

class birding.shelf.FreshPacker

Mixin for pack/unpack implementation to expire shelf content.

expire_after = 300

Number of seconds after which values are no longer fresh.

freshness()

Clock function to use for freshness packing/unpacking.

is_fresh(freshness)

Return False if given freshness value has expired, else True.

pack(key, value)

Pack value with metadata on its freshness.

set_expiration(expire_after)

Set a new expiration for freshness of all unpacked values.

unpack(key, value)

Unpack and return value only if it is fresh.
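A sketch of the freshness mixin: `pack` stores a `(freshness, value)` pair and `unpack` raises `KeyError` once the pair is older than `expire_after`. The names `FreshPackerSketch` and `FreshDict`, the tuple layout, and the use of `time.time` as the clock are assumptions for illustration, not birding's implementation.

```python
import collections.abc
import time


class FreshPackerSketch:
    """Hypothetical mixin: pack values with a freshness stamp, expire stale ones."""

    expire_after = 300  # seconds; same default as documented for FreshPacker

    def freshness(self):
        return time.time()  # wall-clock seconds; an assumption for this sketch

    def is_fresh(self, freshness):
        return self.freshness() - freshness <= self.expire_after

    def set_expiration(self, expire_after):
        self.expire_after = expire_after

    def pack(self, key, value):
        return (self.freshness(), value)

    def unpack(self, key, value):
        stamp, inner = value
        if not self.is_fresh(stamp):
            raise KeyError(key)  # stale entries read as missing
        return inner


class FreshDict(FreshPackerSketch, collections.abc.MutableMapping):
    """Hypothetical dict-backed store using the mixin (iteration includes stale keys)."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self.unpack(key, self._data[key])

    def __setitem__(self, key, value):
        self._data[key] = self.pack(key, value)

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

Overriding `freshness()` in a subclass swaps in a controllable clock, which is handy for testing expiry without waiting.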

class birding.shelf.LRUShelf(maxsize=1000)

An in-memory Least-Recently Used shelf holding up to maxsize items.
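An LRU store can be sketched with `collections.OrderedDict`: reads and writes move a key to the end, and inserts evict from the front once the size exceeds `maxsize`. This is one standard way to build an LRU; whether `LRUShelf` is built this way is not stated here, and the class `LRUStore` is hypothetical.

```python
from collections import OrderedDict


class LRUStore:
    """Hypothetical sketch of an in-memory least-recently-used store."""

    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def getitem(self, key):
        value = self._data[key]      # KeyError(key) if absent
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def setitem(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used

    def delitem(self, key):
        del self._data[key]

    def clear(self):
        self._data.clear()


store = LRUStore(maxsize=2)
store.setitem("a", 1)
store.setitem("b", 2)
store.getitem("a")     # "a" becomes most recently used
store.setitem("c", 3)  # evicts "b", the least recently used
```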

class birding.shelf.FreshLRUShelf(maxsize=1000)

A Least-Recently Used shelf which expires values.

class birding.shelf.ElasticsearchShelf(index='shelf', doc_type='shelf', **elasticsearch_init)

A shelf implemented using an Elasticsearch index.

class birding.shelf.FreshElasticsearchShelf(index='shelf', doc_type='shelf', **elasticsearch_init)

A shelf implemented using Elasticsearch which expires values.