Interesting application of AI in database indexing

(Greg Robson) #1

Interesting paper by MIT and Google. You can easily get the gist from the introduction and results table.

tl;dr – by using AI to learn the structure of the data (is it dispersed or narrow, regular or random, etc.), they let the model decide which indexing algorithm to use, with the idea that it could index different parts of the data in different ways if it thought that was best. Results were a 50%+ speed boost and a ~90% reduction in index size in some cases! Apparently GPU speed increases and dedicated TensorFlow hardware make the AI part really quick to compute now.
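If this is the learned-index-structures idea, the core trick is surprisingly simple: train a model that predicts where a key sits in the sorted data, then do a small bounded search around the prediction. Here's a minimal Python sketch of that idea (my own toy version with a single linear fit, not the paper's hierarchy of models; the class and names are just for illustration):

```python
# Toy sketch of a "learned index": fit a tiny model that maps a key to its
# approximate position in a sorted array, then correct with a bounded search
# over the model's known worst-case error. Illustration only.
import bisect
import numpy as np

class LearnedIndex:
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        positions = np.arange(len(self.keys))
        # "Learn" the key -> position mapping with a simple linear fit;
        # the real thing would use a hierarchy of small models.
        self.slope, self.intercept = np.polyfit(self.keys, positions, 1)
        predicted = self.slope * self.keys + self.intercept
        # Record the worst-case prediction error (plus a little slack)
        # so lookups stay exact.
        self.max_err = int(np.ceil(np.max(np.abs(predicted - positions)))) + 1

    def lookup(self, key):
        guess = int(self.slope * key + self.intercept)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Search only inside the error band instead of the whole array.
        i = lo + bisect.bisect_left(self.keys[lo:hi], key)
        return i if i < len(self.keys) and self.keys[i] == key else None

# Usage: roughly uniform keys are where a learned model shines.
idx = LearnedIndex(np.random.uniform(0, 1e6, 100_000))
print(idx.lookup(idx.keys[1234]))  # -> 1234
```

In this toy version the whole "index" is a couple of floats plus an error bound, which hints at where the big size reductions could come from.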

Considering that AI can now become a world-champion chess player from scratch in a matter of hours, I would think that this is something that’s going to become quite relevant!

(Andy Wootton) #2

I certainly wouldn’t start training for a career in database optimisation.

(Andy Wootton) #3

When I worked on DEC VMS, the native database was Rdb, which was bought by Oracle. They immediately started porting Rdb features into Oracle and applying pressure on customers to move over. I think automatic index optimisation was one of the main reasons our DBAs were reluctant to move.

I just checked what happened and was amazed to discover the product is still alive, under Oracle since 1994!
HP didn't manage to get people off VMS either, at least in mission-critical operations. VMS was always good for distributed databases because it can do cluster-wide record-level locking in the distributed file system.

I think the Unix version, DFS, died along with object brokers and the Distributed Computing Environment (DCE).
I’m not angry, just disappointed that we can’t have nice things.

(Michael Almyros) #4

This is fascinating. At work we used PGHero to optimise indexes for PostgreSQL. I have been studying deep learning for a few weeks now, and one of the things I am curious about is how these tools find the data and decide what to do with it.
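For what it's worth, as far as I know tools like PGHero mostly read PostgreSQL's own statistics views rather than the table data itself. Here's a rough sketch of the kind of query involved (not PGHero's actual code, just the pg_stat_user_indexes view such tools are built on), assuming psycopg2 and a placeholder connection string:

```python
# Sketch of the kind of statistics an index-advice tool reads from PostgreSQL.
# Not PGHero's code; it just queries the same catalog view such tools rely on.
import psycopg2  # assumed available; any Postgres driver would do

UNUSED_INDEXES_SQL = """
    SELECT schemaname, relname AS table_name, indexrelname AS index_name,
           idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0            -- never scanned since stats were last reset
    ORDER BY pg_relation_size(indexrelid) DESC;
"""

def report_unused_indexes(dsn):
    """Print indexes that have never been scanned (candidates to review)."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(UNUSED_INDEXES_SQL)
        for schema, table, index, scans, size in cur.fetchall():
            print(f"{schema}.{table}.{index}: {scans} scans, {size}")

# report_unused_indexes("dbname=myapp user=me")  # placeholder DSN
```

(An "unused" index can still be backing a unique constraint, so the output is a starting point for a human, not an automatic drop list.)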

(Andy Wootton) #5

I'm sure a good DBA could tell you. They've been hand-tweaking indexes for decades. I first came across these dark arts on IBM mainframes: they used stats tools on disk blocks and 3D graphs to identify hot spots, then experience, but that's learning, isn't it? What data the DB logs will be documented in the docs or the source code. You also need usage patterns over time, in case the critical periods were non-typical, and you have to be careful not to optimise when there isn't any problem.

I worked in an environment where disk space use had a number of cycles. I was instructed to install a corporate tool which allowed high-water limits to be set as alarms. It was useless: we needed to know that a lower peak, earlier in the day, was higher than normal, to give us time to react before high tide. I imagine RDBMS indexes would be an order of magnitude more complex.
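To make that high-water point concrete, here's a toy Python comparison (numbers and threshold invented for the example) between a static threshold alarm and checking each reading against what's normal for that time of day:

```python
# Toy illustration: a static high-water alarm only fires at the final peak,
# while comparing each reading to the typical value for that hour flags the
# unusually high early peak in time to react. All numbers are made up.
typical_by_hour = [40, 42, 45, 60, 55, 50, 70, 95]   # normal daily cycle (% disk used)
today           = [41, 43, 44, 75, 56, 51, 72, 99]   # hour 3 is abnormally high
HIGH_WATER = 90          # static alarm threshold
TOLERANCE = 10           # how far above "normal for this hour" is worrying

for hour, (normal, actual) in enumerate(zip(typical_by_hour, today)):
    if actual >= HIGH_WATER:
        print(f"hour {hour}: static alarm fires at {actual}% (too late to react)")
    elif actual > normal + TOLERANCE:
        print(f"hour {hour}: {actual}% vs usual {normal}%, early warning before high tide")
```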

(Andy Wootton) #6

@malmyros I've just re-read my reply and realised it may have sounded rude if you are a DBA. Sorry. I meant more 'good old': someone who had to do it by hand when there were no tools. I knew a couple, though no-one else ever really knew whether they were making things better or worse.