Getting text from Twitter and making it accessible to a Python app


(Daniel Hollands) #1

@Twisted and I are going to work on a Raspberry Pi project together - specifically, we’re looking to use the UnicornHAT to display scrolling text taken from @mentions sent to its Twitter account.

Now, the bit which you might assume is the hard bit - specifically, getting messages to scroll across the LEDs - is probably the easy bit, thanks to a tutorial on how to do exactly that in the Raspberry Pi Projects book.

I’m not all that concerned about getting the messages from Twitter either, as I’m sure there will be a Python library to make that easy.

No, the bit which concerns me is the linking of the two things above - i.e. taking the messages from Twitter and putting them into a buffer, ready to be served up, one after the other, by the LED script. It is this part of the process for which I’m looking for help.

My idea for solving this problem is to use a CSV file as a buffer - every couple of minutes the Twitter script will grab all the new @mentions from Twitter and put them at the bottom of a CSV file. At the same time, the LED script will watch this file for changes, and any time it has data in it, it’ll grab (and remove) the top line to process it.

I have no idea if this is a good idea, or if it can be improved in some way, but it’s the best I have, so I’d welcome any ideas.

Cheers.

(PS, two extensions to this idea, which I’m sure I’ll be looking for help on at some point - the first is to have an animation play on the HAT while it’s idle and waiting for messages - and the second is to build it into an attractive display unit - but first things first)


(Steve Pitchford) #2

Hi.

I’ve done something very similar - CSV as a form of IPC is crude, but with flock it tends to work well for this sort of purpose. System V shared memory or a cache (redis?) are alternatives, but possibly overkill.

In order to avoid errors due to a failing consumer, you could keep the display script’s access to the CSV read-only and have it maintain some manner of last-record pointer. As long as the writer keeps the file down to a few hundred lines, the overheads in this use case should be minimal.
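To make that concrete, here’s a rough Python sketch of the flock-plus-pointer idea (the file names and CSV layout are placeholders, not anything from the project):

import csv
import fcntl

BUFFER = "tweets.csv"      # hypothetical shared buffer
POINTER = "last_read.txt"  # how far the display script has got

# Writer side: append new mentions under an exclusive lock.
def append_tweets(rows):
    with open(BUFFER, "a", newline="") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        csv.writer(f).writerows(rows)
        fcntl.flock(f, fcntl.LOCK_UN)

# Reader side: never modifies the buffer, just advances a pointer.
def unread_tweets():
    try:
        seen = int(open(POINTER).read())
    except (IOError, ValueError):
        seen = 0
    with open(BUFFER, newline="") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        rows = list(csv.reader(f))
        fcntl.flock(f, fcntl.LOCK_UN)
    open(POINTER, "w").write(str(len(rows)))
    return rows[seen:]

The one wrinkle: if the writer ever truncates the file to keep it small, the pointer needs resetting at the same time.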

Hope that helps.

Steve


(Steve Jalim) #3

Sounds like a good use case for a message queue - the tweets are pumped into it and the LED script pops each one off as and when it arrives.

I’m a big fan of http://python-rq.org/
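For what it’s worth, the core of it is only a few lines. A rough sketch, assuming a local redis-server and your own display_tweet function (both placeholders, not project code):

# tasks.py - the worker has to be able to import the job function
from redis import Redis
from rq import Queue

def display_tweet(text):
    print(text)  # stand-in for handing the text to the LED code

q = Queue(connection=Redis())  # assumes redis-server on localhost

# producer side: enqueue each new mention as it arrives
q.enqueue(display_tweet, "@somebody hello!")

A separate worker process (started with RQ’s worker command) then pops the jobs off and runs display_tweet for each one.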

PS I used tweepy in the past to get tweets


(Andy Wootton) #4

Having never coded on Unix or Linux, I was going to suggest Unix pipes between two processes with polling for the next message, but as these are discrete messages, a queue sounds like a better idea. Not much point glueing the messages together just to cut them up again.


(Marc Cooper) #5

Yup, what @stevejalim said. It sounds like a queuing problem. So… let’s make it interesting :slightly_smiling:

It appears that redis is available on the Pi, and redis lists have a handy blocking operation, brpoplpush, which gives you a blocking queue.

That means that you can lpush each tweet to a redis list, and read the list in a blocking loop, something like:

Read tweets and put them in a redis list (works in any process)

$redis.lpush TWEETS_LIST_KEY, tweet_text

Process tweet queue/list daemon

while $running do
  $redis.brpoplpush(TWEETS_LIST_KEY, TWEET_KEY) # this will block until a tweet arrives
  handle_tweet($redis.lpop(TWEET_KEY))
end
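Since the project’s in Python, roughly the same thing with redis-py would look like this (handle_tweet being a placeholder for the LED code):

import redis

r = redis.Redis()             # assumes redis-server on localhost

TWEETS_LIST = "tweets"        # pending queue
CURRENT = "tweet:processing"  # single-item slot for the tweet in hand

def push_tweet(text):  # producer - works in any process
    r.lpush(TWEETS_LIST, text)

def run():  # consumer daemon
    while True:
        # atomically move the oldest tweet across; blocks until one arrives
        r.brpoplpush(TWEETS_LIST, CURRENT, timeout=0)
        handle_tweet(r.lpop(CURRENT).decode("utf-8"))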

(Steve Jalim) #6

rq uses redis and basically does just what you sketched out there, plus plugs in a mechanism for going from an arbitrary Python object (e.g. a Tweet object) to a storable string. It also has a handy failure/retry queue. I love it


(Marc Cooper) #7

Heh, sounds like what I’ve done in ruby. I use redis for so many things. It solves so many problems. Rather, it creates so many more design options that most languages can’t solve easily. (Probably why I like elixir so much :wink: )


(Daniel Hollands) #8

This sounds like a winner to me. I don’t have much experience with redis, but this sounds like the perfect opportunity to learn.

@Twisted and I were having a play with python-twitter using the shell last night, and even made our first tweet using it:

(So it might not be as good as “What hath God wrought?”, but it’s better than nothing)

But I think it’s probably worth having a look at tweepy as well, to see how it compares.


(Steve Jalim) #9

Go with python-twitter - it works, move on :slight_smile:


(Stuart Langridge) #10

You certainly can use a queue of your choice; that’s a pretty standard way to do this (you are literally queueing up the tweets for display). Me, personally, I’d write each tweet as a separate file in a directory, with filenames that increase alphabetically (you could do this by time, or (I think) by using the tweet IDs themselves). That way you don’t need to care about file locking, and maintaining your place in the “queue” is easy; if babble crashes, just start again from the first file.
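A rough sketch of what I mean, with zero-padded tweet IDs as filenames so alphabetical order matches arrival order (the directory name is just a placeholder):

import os

QUEUE_DIR = "incoming"  # hypothetical spool directory

def write_tweet(tweet_id, text):
    name = "%020d.txt" % tweet_id
    tmp = os.path.join(QUEUE_DIR, "." + name)  # hidden while half-written
    with open(tmp, "w") as f:
        f.write(text)
    os.rename(tmp, os.path.join(QUEUE_DIR, name))  # atomic publish

def next_tweet():
    names = sorted(n for n in os.listdir(QUEUE_DIR) if not n.startswith("."))
    if not names:
        return None
    path = os.path.join(QUEUE_DIR, names[0])
    with open(path) as f:
        text = f.read()
    os.remove(path)  # done with it once displayed
    return text

Writing to a dot-file and renaming means the reader can never see a half-written tweet; rename is atomic on the same filesystem.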


(Daniel Hollands) #11

That’s an even better idea - I guess I’d just dump the contents of the JSON into the file, allowing for even less faffing about.


(Daniel Thompson) #12

+1 for that approach. Maybe add a timestamp to the filenames, just to be safe.

In theory queuing tools are simple, but inevitably complication creeps in. Processes need to be kept running, so you add supervisord. Security needs to be configured, endpoints need to be named correctly, and gems/npms need to be installed, so you end up with config files for each environment and Ansible files for deploying everything you need. Then you write bash scripts for querying the queue, diagnosing issues, purging the queue, sending a test message, etc.

OTOH, files and folders just work. The ‘service’ (i.e. the OS) is always running, API support is mature and well documented in every language, it preserves state across shutdowns really well, and you get a bunch of diagnostic tools for free (like ‘ls’ and ‘cat’).


(Daniel Hollands) #13

I also like the idea that you can have multiple folders to represent status - even if only a ‘backlog’ and a ‘processed’ - should we want to keep a record of what we’ve already displayed. I’m not sure if there’s much need for that, but it’s an option.
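I’m picturing that as a single os.rename - atomic on the same filesystem, so a crash can’t leave a tweet in both folders (folder names hypothetical):

import os

def mark_processed(name):
    os.rename(os.path.join("backlog", name),
              os.path.join("processed", name))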


(Marc Cooper) #14

This is interesting. Where folk are seeing simplicity in files and folders, I’m seeing unnecessary complexity. Where folk are seeing complexity in queues, I’m seeing simplicity. (I speak as someone who cut his teeth on COBOL file-based batch systems.)

It’s true that files and folders just work, and so do queues.


(Steve Jalim) #15

That’s just what I was thinking. We should definitely meet up @auxbuss :slight_smile:


(Andy Wootton) #16

I think this is interesting too. I started with card-image files, and all my programming experience was on VMS, which had a record-based file-system. The thing that put me off my Unix pipes idea (implemented as hidden files, I think?) was the Unix byte-stream not having automatic record structuring like queues do. Are we seeing different things as ‘the natural way to do things’? I’m having a similar problem getting my head around the cultural differences of functional programming.

I don’t like the idea of tree-structured status because that only allows binary choices. I’ve just seen a relationship between flags and hash-tags for the first time.


(Andy Wootton) #18

I’ve realised that last bit might not make much sense to someone who never met a dinosaur. VMS system calls loved passing parameters as bitmaps. You could pass 32 flags in a single ‘long-word’. Procedure calls were expensive, compared to Unix, because better :slightly_smiling: . Hash-tags effectively allow you to be in 2 branches of a tree at once, like in Gmail, just as bits could be ‘chunked’ to pass 2, 4, 8… bits at once, as values or masks to choose values, such as system, group, owner, world security, for read, write, execute or delete, in a 16-bit word.


(Stuart Langridge) #19

For this specific example, I think the extra effort needed to run a queue at all (how does it restart on reboot? On crash? Does it keep all queued items when crashing? What if the process hangs but doesn’t crash?) outweighs the minimal benefits you get from it; that’s why I recommended the filesystem instead. For a different example (for instance, if you needed multiple queue consumers), a queue would be loads better (if you use the filesystem for that, you end up reinventing a queue). But this isn’t really a queue; it’s a one-producer one-consumer FIFO. It doesn’t need any of the cleverness that a queue process provides.


(Marc Cooper) #20

There is effort in creating a queue. A tiny amount of effort — one line in a while loop. And there is also effort in reading and writing a CSV. There’s always effort, of course. In this case, I suggested redis, so that would need to be installed, which is also a tiny bit of effort.

There are going to be two processes whether we use CSVs or a queue, and either could fail, as could the entire machine or its power supply. No-one specified robustness, but a degree can be presumed. So, sure, there are failure cases, but I see no significant difference between the two cases.

So, I’m not disregarding robustness as premature optimisation, but it’s not a priority. If robustness was a priority, we’d likely not use a pi. YMMV :wink:

The implementation I suggested is a queue, but it’s also FIFO. Though, since there is only one producer, it doesn’t really matter.

One additional point, if loss of the tweet message were acceptable on failure of the machine (we don’t know), then the queue solution could run entirely in memory, thus requiring no disk management or maintenance. It would just work, unless it didn’t!

As you suggest, if we are likely to be doing more stuff with this system, a more flexible solution is preferable, and the queue idea is better.


(Daniel Hollands) #21

I’m keen on playing with redis, as I have very little experience with it, and this would provide the perfect opportunity to do so - but I also have some additional requirements (thought up over the past week) which I’d like the solution to take into account.

  • Ideally, I want to avoid showing the same tweet twice, so I’m going to need some way of storing the timestamp/ID of the last tweet from the previous batch fetched from the API.

  • Additionally, I want the ability to power off the machine (to swap the SD card for another one, and play with the UnicornHAT for other projects), and have it resume from where it left off once I power it back on, taking into account (and thus displaying) any tweets which might have been sent while it was turned off.

I’ve not given these too much thought as of yet, but I’d imagine a file-based solution, using a standard file-naming convention, could solve them both - whereas I’m not sure that redis can (although, as discussed, I’m new to redis, so I’m not entirely sure what it can do).
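Thinking aloud, I suppose both boil down to persisting the ID of the newest tweet we’ve handled and passing it back on the next poll. Something like this with python-twitter, perhaps (the file name is a placeholder, and I’m assuming GetMentions takes a since_id argument - worth checking against the docs):

import twitter

STATE_FILE = "last_id.txt"  # lives on the SD card, so survives power-off

# api = twitter.Api(consumer_key, consumer_secret,
#                   access_token_key, access_token_secret)

def fetch_new_mentions(api):
    try:
        last_id = int(open(STATE_FILE).read())
    except (IOError, ValueError):
        last_id = None
    # since_id asks Twitter only for tweets newer than the stored one,
    # so nothing shows twice, and anything sent while the Pi was off
    # gets picked up on the first poll after it comes back
    mentions = api.GetMentions(since_id=last_id)
    if mentions:
        open(STATE_FILE, "w").write(str(max(m.id for m in mentions)))
    return mentions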

As always, feedback welcome.