Dejan Levec

Not so random thoughts about random

For one of my websites, I needed solution for randomly displaying different rows from MySQL database. As probably everyone knows, the easiest solution is to use following SQL:

select * from table order by rand()

There are no problems with using that, except one: It’s REALLY SLOW. This might not be a problem with table with only few rows, but with bigger tables it really decreases the performance.

After spending a few hours of Googling and testing different methods, I find one, that works really great.

Following query is a lot faster than order by rand() and works without a problem:

SELECT * FROM Table T JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;

I found this solution on Jay Paroline’s blog: http://wanderr.com/jay/order-by-slow/2008/01/30/

I wondered if there is an even faster approach to get random row and then it hit me. So simple solution and it works even faster.

Why use only MySQL if we’re building a website? Why wouldn’t we rather use MySQL in connection with PHP (or other language) to get random result?

Solution

$x = mysql_fetch_array(mysql_query(“SELECT COUNT(*) FROM table”));

$y = $x[0]-1;

$x = rand(0,$y);

$query = mysql_query(“SELECT * FROM table LIMIT “.$x.”, 1″);

Explanation
Firstly, we get count of all rows in table (this value can be memcached if we want even better performance).

Then we subtract 1 from total count and a get random number between 0 and (n-1).

After that we simply use limit to select 1 row at position $x.

I implemented this on my site and everything was a bit faster, but this solution can only fetch one row at a time. In my case I have to display 10 – 40 random rows at once.

So I searched for SQL to randomly get multiple rows and I found following solution:

mysql_query(“SELECT * FROM (
SELECT @cnt := COUNT( * ) +1, @lim :=10
FROM table
)vars
STRAIGHT_JOIN (
SELECT table . * , @lim := @lim -1
FROM table
WHERE (
@cnt := @cnt -1
)
AND RAND( “.time(0).” ) < @lim / @cnt
)i”);

Again we need to use something else than MySQL to get random seed in this case current time in seconds is used.

This is a great solution, but still slow.

I thought of using PHP’s rand to randomly select rows but in that case I will have to have sequential IDs and because of that we won’t be able to delete rows somewhere in the middle.

So I ended up with following PHP code:

Multiple rows solution

$query = mysql_query(“select id from table”);

$ids = array();

while($id = mysql_fetch_query($query)) {
$ids[] = $id[0];
}

$temp = ”;

for($i = 0; $i < 40; $i++) {
$temp .= $ids[rand(0, count($ids)-1)].’,’;
}

$temp = substr($t, 0, strlen($t)-1);

$query = mysql_query(“select * from table where id in (“.$t.”)”);

Explanation
This code might seem a bit harder but it’s really easy.
First of all we get all IDs from table and put it in array $ids.

Then we use for loop to select 40 random IDs from array and add it to $temp variable and add comma, to separate the.

After that we remove last comma and then select rows from table where id matches ones from $temp.

Result
In my case I have about 50 000 rows and I needed to display 40 random rows. With previous solution with joins the page executed in about 0.7 to 0.8 seconds.
With my solution I speed it up to about 0.1 – 0.15 seconds of execution time. This is about 85% performance improvement.

Leave a Reply

Your email address will not be published. Required fields are marked *