Configuring birding
birding uses a validated configuration file for runtime details. Configuration files use YAML. Every value has a default (listed below), which can be overridden by a value of the same name in the configuration file. By default, the configuration file is birding.yml in the current working directory. If needed, set the BIRDING_CONF environment variable to the path of a different configuration file.
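The lookup order described above can be sketched in Python as follows. This is a minimal illustration, not birding's own code: the function resolve_config_path and its environ and cwd parameters are hypothetical, added so the lookup can be exercised without touching the real process environment.

```python
import os
from pathlib import Path

# Hypothetical sketch of the lookup order; birding's real loader may differ.
DEFAULT_FILENAME = "birding.yml"
ENV_VAR = "BIRDING_CONF"


def resolve_config_path(environ=None, cwd=None):
    """Return the path of the configuration file to load.

    A non-empty BIRDING_CONF wins; otherwise fall back to birding.yml
    in the current working directory.
    """
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    override = environ.get(ENV_VAR)
    if override:
        return Path(override)
    return cwd / DEFAULT_FILENAME


if __name__ == "__main__":
    print(resolve_config_path())
```

For example, with BIRDING_CONF=/etc/birding/prod.yml set, this sketch returns that path regardless of the working directory.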
The configuration file covers details of birding itself, not of Storm. Storm-related details belong in the project's topology definition.
When a configuration value is a Python dotted name, it is a string reference to the Python object to import. In general, when the value is just an object name without a full namespace, it is assumed to be in the relevant birding namespace; e.g. LRUShelf is assumed to be birding.shelf.LRUShelf. The corresponding *_init configuration values specify keyword (not positional) arguments to be passed to the class constructor.
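The name resolution and *_init convention can be illustrated with a short sketch built on importlib. The helpers import_name and build are hypothetical, and the sketch assumes the caller already knows which birding namespace applies to a bare name (for LRUShelf that would be birding.shelf); birding's actual loader may resolve names differently.

```python
import importlib


def import_name(name, default_namespace):
    """Resolve a dotted name, or a bare name, to a Python object.

    Hypothetical helper: a bare name such as "LRUShelf" is looked up in
    default_namespace (e.g. "birding.shelf").
    """
    if "." not in name:
        name = f"{default_namespace}.{name}"
    # "birding.shelf.LRUShelf" -> module "birding.shelf", attribute "LRUShelf"
    module_name, _, attr = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def build(class_name, init, default_namespace):
    """Instantiate the class named by a *_class value with its *_init value."""
    cls = import_name(class_name, default_namespace)
    # *_init values are passed as keyword arguments only, never positionally.
    return cls(**(init or {}))
```

Using standard-library classes in place of birding's, build("datetime.timedelta", {"seconds": 300}, "datetime") makes the same keyword-only constructor call that a shelf_class / shelf_init pair would make for a shelf.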
See Using birding in production for further discussion on configuration in production environments.
For advanced API usage, see get_config(). The config includes an Appendix section for any additional values not known to birding; these values are available in config['Appendix'] and bypass all validation. This is useful for code that uses birding's config loader and needs to define its own values.
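The Appendix behavior can be sketched as a validator that checks everything except the Appendix section. This is a simplified, hypothetical illustration: validate, ConfigError and KNOWN_SECTIONS are invented names, and the check only rejects unknown top-level keys, whereas birding's real validation may be stricter.

```python
import copy

# Hypothetical: top-level sections taken from the defaults listed below.
KNOWN_SECTIONS = {
    "Spout", "TermCycleSpout", "SearchManager", "TwitterSearchBolt",
    "ElasticsearchIndexBolt", "ResultTopicBolt",
}


class ConfigError(ValueError):
    """Raised when a top-level key is neither known nor under Appendix."""


def validate(config):
    """Reject unknown top-level keys, but pass Appendix through unchecked."""
    config = copy.deepcopy(config)
    appendix = config.pop("Appendix", None) or {}
    unknown = set(config) - KNOWN_SECTIONS
    if unknown:
        raise ConfigError(f"unknown configuration keys: {sorted(unknown)}")
    # Appendix contents are never inspected, so any structure is allowed.
    config["Appendix"] = appendix
    return config
```

Under this sketch, application-specific settings placed under Appendix (for instance a hypothetical my_app section) survive untouched, while the same section at the top level would be rejected.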
Defaults:
Spout: TermCycleSpout
TermCycleSpout:
  terms:
  - real-time analytics
  - apache storm
  - pypi
SearchManager:
  class: birding.twitter.TwitterSearchManagerFromOAuth
  init: {}
TwitterSearchBolt:
  shelf_class: FreshLRUShelf
  shelf_init: {}
  shelf_expiration: 300
ElasticsearchIndexBolt:
  elasticsearch_class: elasticsearch.Elasticsearch
  elasticsearch_init:
    hosts:
    - localhost: 9200
  index: tweet
  doc_type: tweet
ResultTopicBolt:
  kafka_class: pykafka.KafkaClient
  kafka_init:
    hosts: 127.0.0.1:9092 # comma-separated list of hosts
  topic: tweet
  shelf_class: ElasticsearchShelf
  shelf_init: {}
  shelf_expiration: null
Appendix: {}
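Overriding a default only requires setting the same name in the configuration file. Assuming the loader merges the user file into the defaults key by key (an assumption; the exact merge rules are not stated here), the effect can be sketched in Python. DEFAULTS below is a subset of the defaults above, written as the dict PyYAML would produce, and merge is a hypothetical helper.

```python
import copy

# Subset of the defaults above, as plain Python data.
DEFAULTS = {
    "Spout": "TermCycleSpout",
    "TermCycleSpout": {"terms": ["real-time analytics", "apache storm", "pypi"]},
    "TwitterSearchBolt": {
        "shelf_class": "FreshLRUShelf",
        "shelf_init": {},
        "shelf_expiration": 300,
    },
    "Appendix": {},
}


def merge(defaults, overrides):
    """Return a copy of defaults with overrides applied recursively.

    Hypothetical merge rule: nested mappings are merged key by key; any
    other value (including lists) replaces the default outright.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this sketch, a user file containing only a TwitterSearchBolt section with shelf_expiration: 60 keeps FreshLRUShelf as the shelf class while shortening the expiration, and a terms list replaces the default terms rather than extending them.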