Getting text from Twitter and making it accessible to a Python app

(Steve Jalim) #22

Redis persists to disk at intervals and has the equivalent of a relational DB’s write-ahead logging (docs here), so it’s not out of the running :slight_smile:
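Here is a minimal sketch of switching on the append-only file at runtime. It assumes the redis-py client, or any object with the same `config_set`/`config_get` methods. `enable_append_only_file` is a hypothetical helper name, not part of redis-py. `appendonly` and `appendfsync` are real Redis configuration directives.

```python
from typing import Any, Dict

# Valid values for Redis's `appendfsync` directive.
FSYNC_POLICIES = {"always", "everysec", "no"}


def enable_append_only_file(client: Any, fsync_policy: str = "everysec") -> Dict[str, str]:
    """Switch on Redis's append-only file (AOF) on a running server.

    `client` is assumed to be a redis-py `redis.Redis` instance (or anything
    with the same `config_set`/`config_get` methods). Returns the settings
    the server reports after the change.
    """
    if fsync_policy not in FSYNC_POLICIES:
        raise ValueError(f"unknown appendfsync policy: {fsync_policy!r}")
    # Log every write to the AOF, and fsync it to disk according to `fsync_policy`.
    client.config_set("appendonly", "yes")
    client.config_set("appendfsync", fsync_policy)
    # CONFIG SET changes only the running server; write it to redis.conf
    # (for example with CONFIG REWRITE) if it should survive a restart.
    settings: Dict[str, str] = {}
    for name in ("appendonly", "appendfsync"):
        settings.update(client.config_get(name))
    return settings
```

Against a real server this would be called as `enable_append_only_file(redis.Redis())`. The `everysec` policy risks losing roughly the last second of writes on a crash, in exchange for far less disk I/O than `always`.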

(Stuart Langridge) #23

You jest, surely? Don’t write your own queue! There are a bunch of great queueing servers out there. I suspect you didn’t mean that, though, and instead meant a while loop which talks to the queueing server – redis or whatever, as you note. Even so, I think you’re undervaluing the effort required in configuration and daemon management: you need to install your queue manager (not hard), and then configure it to work the way you want. That bit is hard, because you have to understand the queue software, how its config works, and what that means. Do you want the queue to be preserved if the queue manager crashes? How do you configure your software to do that? To restore the queue after a dirty restart? All of this stuff is possible; none of it is trivial. Redis is better at this than most, because Salvatore works really, really hard on this particular aspect, but you still have to understand it quite well to configure it right. That’s why, for example, running the queue purely in memory isn’t a good idea – on a crash you don’t just lose the message you’re working on, you lose every message in the queue, which is not ideal at all.

Basically, I’m of the opinion that using the right tool for the job is, of course, the right thing to do… but using a very powerful tool to meet a very simple need is often extremely difficult, precisely because you have to work out how to configure all the power out of the tool so that it only meets your simple use case!
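The "while loop which talks to the queueing server" could be as small as the sketch below. It assumes redis-py and uses a Redis list as the queue. `enqueue_tweet`, `run_worker` and the `tweets` key are hypothetical names for illustration. `RPUSH` appends to the tail of the list, and `BLPOP` blocks until an item arrives at the head or the timeout expires. Together they make the list behave as a FIFO queue.

```python
import json
from typing import Any, Callable, Optional


def enqueue_tweet(client: Any, queue: str, tweet: dict) -> None:
    """Producer side: append a tweet, serialised as JSON, to the tail of a Redis list."""
    client.rpush(queue, json.dumps(tweet))


def run_worker(
    client: Any,
    queue: str,
    handle: Callable[[dict], None],
    timeout: int = 5,
    max_idle_polls: Optional[int] = None,
) -> int:
    """Consumer side: pop tweets from the head of the list and pass each to `handle`.

    `client` is anything with redis-py's `blpop` method (e.g. `redis.Redis()`).
    Returns the number of items processed. If `max_idle_polls` is set, the loop
    stops after that many consecutive empty polls; otherwise it runs forever.
    """
    processed = 0
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        # BLPOP returns (key, value), or None if `timeout` seconds pass with no item.
        item = client.blpop([queue], timeout=timeout)
        if item is None:
            idle += 1
            continue
        idle = 0
        _, raw = item
        handle(json.loads(raw))
        processed += 1
    return processed
```

In production you would pass `redis.Redis()` and leave `max_idle_polls` as `None`. This simple loop still has the weakness described above. An item that has been popped but not yet handled is lost if the worker crashes mid-`handle`. Redis's `LMOVE`/`BLMOVE` commands, which move each item onto a separate "processing" list, are the usual way to close that gap.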

However, I should note here that I have form for thinking “this project doesn’t need anything complex like Django, I’ll just lash something together”, and then, six months down the line, realising that the project is still going, is more complex now than it was then, and wishing I had the Django admin lying around. So there is a reasonable case for using something with way more features than you need now, because you might need them later and adding them later will be really hard. This idea directly contravenes the whole “keep it simple, stupid” approach, and deciding where the boundary lies is something that I’m working on getting better at myself!

(Marc Cooper) #24

<\sigh> Clearly.

I’m really not.

I said:

There is no need to turn off redis’ persistence.

See above. I use this approach all the time.

This isn’t over-engineering. A powerful tool is fine when it’s simple to use (and, as in this case, has a negligible overhead). No-one’s suggesting rabbitmq or even AWS SQS.

That also solves the initial problem, both simply and elegantly.

My tag line is: obsessed with simplicity. As I mentioned earlier, I regard the CSV solution as more complex.

My day job is, basically, helping businesses get out of the mess they’ve created. Typically, that means a business that has become successful without any focus on the quality of its software (by which I chiefly mean maintainability) and that now finds itself painted into a corner, where the limiting factor has become its ability to adapt that software.

After lack of tests, the biggest problem I encounter is home-brewed solutions that have grown out of all proportion to their original use and on which the whole system depends. You probably know the rest of the story: no-one knows how deep the rabbit hole goes, and no-one wants to touch it.

On that basis, when developing, I practice a kind of “YAGNI, but you probably will”. If the overhead to provide vastly more flexibility and functionality is very small, then it’s worth doing from the outset.

I should add that we are considering redis here, which I regard as ubiquitous nowadays.

(Stuart Langridge) #25

I think we may be talking at cross-purposes here. Let me try and explain what I mean a different way.

If you already know and like redis, sure, definitely use it! But if you’re coming at this from the perspective of “do I need to use a queue? Which queue?” then you don’t know whether to pick redis, or beanstalkd, or rabbitmq, or SQS, or celery, or, or, or. And you don’t know whether redis has persistence enabled by default, or whether beanstalkd does, or whether rabbitmq does… and you don’t know how to find that out without a relatively deep dive into the documentation for each of those systems, which is a lot of work… and you don’t know how to ensure that the queue server starts on boot, and restarts on crash, and restores its data automatically in both of those situations, and how to configure how much memory it uses, and …

As I say: redis does a pretty good job of sensible defaults for most of these things. But if you’re not already a semi-experienced redis user, you don’t know this. And even if you are, you don’t know it about beanstalkd, and so you don’t know how to choose which is best for your current project.
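Instead of digging through the docs, you can ask a particular server what it is actually doing. A minimal sketch, again assuming redis-py; `persistence_summary` is a hypothetical helper. `CONFIG GET save` returns the snapshot rules as space-separated pairs of "seconds" and "number of changed keys". `CONFIG GET appendonly` reports whether the append-only file is on.

```python
from typing import Any, Dict, List, Tuple


def persistence_summary(client: Any) -> Dict[str, object]:
    """Report how (and whether) a running Redis server persists data to disk.

    `client` is assumed to be a redis-py `redis.Redis` instance (or anything
    with the same `config_get` method, which returns a dict of name -> value).
    """
    save = client.config_get("save").get("save", "")
    appendonly = client.config_get("appendonly").get("appendonly", "no")
    fields = save.split()
    # Each rule means: snapshot after N seconds if at least M keys have changed.
    rules: List[Tuple[int, int]] = [
        (int(fields[i]), int(fields[i + 1])) for i in range(0, len(fields) - 1, 2)
    ]
    return {
        "rdb_snapshots": bool(rules),  # an empty `save` disables RDB snapshots
        "snapshot_rules": rules,
        "aof_enabled": appendonly == "yes",
    }
```

Take a server whose `save` setting is `3600 1 300 100 60 10000`. For it, this reports the rules `[(3600, 1), (300, 100), (60, 10000)]`, meaning "snapshot after an hour if at least one key changed", and so on.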

I think that there is a huge amount of sheer decision paralysis which comes from not knowing anything about an area and having to choose between all the different possibilities there. That’s quite a terrifying experience, for most people. I don’t think that using redis is overengineering. But I do think that it’s difficult to get to the stage where you know enough about it to be confident that it’s actually doing what you want, and normally getting to that stage involves getting bitten a bunch of times when you thought it was doing a thing and actually it wasn’t. As noted, Salvatore really cares about the default experience and so works hard on this; others don’t, anywhere near as much.

(Marc Cooper) #26

I guess we’re miles off topic now :slight_smile: I pretty much agree with what you’ve said, though I believe you overstate the complexity, and I don’t share your view of how hard redis is to learn. It’s a k/v store with tons of extras; learn them as you go. (Geez, beanstalkd, folk I talk to keep bringing that up.)

For sure, knowledge and experience are a bonus. However, @LimeBlast asked a question, and I suggested a solution. I’m more than happy to extend that to knowledge sharing. I genuinely wish more folk in our business asked questions and persisted. If everyone goes through trial and error on each problem, then progress will be excruciatingly slow.

At bottom, I’d prefer (wish?) folk didn’t revert to CSV (or even flat file) solutions as the default “simple” solution. I don’t believe they are simple, in most cases. (Caveat: CSV solutions are valid, especially for exports for users requiring spreadsheet data, and for slow-moving, large datasets that can be compressed. And yup, flat files are cool too. I use them :slight_smile: )

(Stuart Langridge) #27

That’s fair comment, indeed. :slight_smile: We have different philosophies on how to set up a project, I think, but that doesn’t make either of them wrong, and I suspect this discussion has probably been enlightening to anyone wondering which path to go down – some will go your way and some mine, and we can grab a beer and talk about who was right at some point :slight_smile:

I'm a Ruby girl, in a Python world (sung to the tune of "Barbie Girl")
(Marc Cooper) #28

:thumbsup: (meet the minimum character count)

(Andy Wootton) #29

I’m not sure how I tripped over this thread again, but since I first saw it, I’ve watched this

It contains some interesting discussion on ‘simple’, ‘complex’ and ‘easy’.

You are in a maze of twisty braids…