SQL Help
I don't think we have a thread
Reply
Hello SQL-knowing chums!

I'm banging my head against a wall on this one. I'm getting close, but it's just not giving me the right answer.

Gaywood and Grim... have been doing a load of marathons lately! They've stored all their race information in this table, 'races':

Quote:
Racer | RaceDate | Position | Completed
Grim... 10/3/16 4 Y
Grim... 12/3/16 4 Y
Grim... 1/2/16 1 Y
Grim... 10/5/16 NULL N
Grim... 5/4/16 6 Y
Gaywood 6/4/16 NULL N
Gaywood 17/1/16 2 Y
Gaywood 10/9/16 4 Y
Gaywood 4/8/16 4 Y
Gaywood 10/3/16 4 Y



Now, they're trying to get some friends to join in. So they've got another table too for all their friends, 'friends'. Most of their friends haven't done any races yet.

Quote:
Racer
Gaywood
Grim...
Curiosity
Bobbyaro


Now, they're pretty competitive, so they only care about the most recent results. Also they want to annoy their friends who haven't joined in yet. So this is the output they want:

Quote:
Racer | Position | Completed
Grim... NULL N
Gaywood 4 Y
Curiosity NUL:L NULL
Bobbyaro NULL NULL


This is proving challenging. Lots of having to do groupings, but then the groupings bugger up the filtering by most recent, etc. This is what I've got so far:

Code:
select f.Racer, r.Position, r.Completed
from   (select Racer, MAX(Races.[RaceDate]) as RecentRace, Position, Completed
             from races) group by races.Racer, Races.Position, Races.Completed) as r
right outer join friends as f
on r.Racer=f.Racer
group by f.Racer, r.Position, r.Completed


It's horridly complicated, probably wildly inefficient, and it still doesn't bloody work properly. Thoughts?
What output are you trying for? I can't see why Grim... is NULL.

[edit] Oh, most recent, right? Hang on.
[edit] Wait, that won't work
I'd be tempted to break the task up a little (because I'm crap at SQL mainly). Does it work up to the point you include the ones who haven't run yet?

Of course the real answer is to stop people running marathons.
This seems to be it.

Code:
SELECT * FROM friends f
LEFT JOIN (SELECT * FROM races ORDER BY RaceDate DESC LIMIT 1) r ON r.Racer = f.Racer
Grim... wrote:
This seems to be it.

Code:
SELECT * FROM friends f
LEFT JOIN (SELECT * FROM races ORDER BY RaceDate DESC LIMIT 1) r ON r.Racer = f.Racer


I think that works for this data set, because the subquery returns Gaywood's race and it's the overall most recent race - but because that subquery returns only one result, if Grim... had completed his most recent race I don't think it would return that info, would it?
devilman wrote:
I'd be tempted to break the task up a little (because I'm crap at SQL mainly). Does it work up to the point you include the ones who haven't run yet?

Of course the real answer is to stop people running marathons.


Yeah - I mean I could just iterate through the values in friends and run a query for each one, but the real world usage for this will have hundreds of 'friends' so it seems pretty inefficient.
Just get Myp to run it. Only 2 friends makes it easy.
RemindMe! Tomorrow
Can you make a sqlfiddle out of it?
Cras wrote:
Grim... wrote:
This seems to be it.

Code:
SELECT * FROM friends f
LEFT JOIN (SELECT * FROM races ORDER BY RaceDate DESC LIMIT 1) r ON r.Racer = f.Racer


I think that works for this data set, because the subquery returns Gaywood's race and it's the overall most recent race - but because that subquery returns only one result, if Grim... had completed his most recent race I don't think it would return that info, would it?

Should do. That's the order of the subquery.
I mean LIMIT 1 means the subquery can only contain 1 row. So when you join that with the friends table you can only possibly have 1 row that contains any data in the results.
It contains one row per row, if that makes sense.
Wtf.
Marathon whats?
Note that on a large dataset I suspect it would be horrendously slow.

Bro do you even unique identifier?
Am I sorely misunderstanding how something in there works then? It doesn't rerun the subquery for each value of racer, does it?
Grim... wrote:
Note that on a large dataset I suspect it would be horrendously slow.

Bro do you even unique identifier?


How would an ID help, in this instance?
Cras wrote:
Am I sorely misunderstanding how something in there works then? It doesn't rerun the subquery for each value of racer, does it?

Yup. It'll optimise hard, but that's essentially what it does.

A unique identifier on the friends table would let you join with an integer rather than a text field and index way better.

Even making a CRC 32 hash would help.

Obviously I'm assuming you have a rather larger dataset than the one presented.
Also, note that I'm testing in MySql, and I assume you're using MSSql. If it doesn't work with subqueries in the same way it's a bit silly though.
Marathon wank sessions? Marathon burger eating sessions? Marathon firefly watching? WHAT
Grim... wrote:
Also, note that I'm testing in MySql, and I assume you're using MSSql. If it doesn't work with subqueries in the same way it's a bit silly though.


It bloody should do. It uses TOP instead of LIMIT but fuck all difference there. I'm just astonished that I'm actually that wrong about how subqueries work.
Is this for song wars?
Cras wrote:
Grim... wrote:
Also, note that I'm testing in MySql, and I assume you're using MSSql. If it doesn't work with subqueries in the same way it's a bit silly though.


It bloody should do. It uses TOP instead of LIMIT but fuck all difference there. I'm just astonished that I'm actually that wrong about how subqueries work.

It's a fucking odd use of one, but it does work.
I'm convinced you're wrong, but I can't be arsed checking until tomorrow :D
I am intrigued as to whether this works but have no pint based interest.
As I expected, everything except the data for the 1 most recent race is null in the output.

Unless mySQL really does do subqueries totally differently, that ain't gonna work.
Do you want the most recent race for each unique racer where each racer could have a different most recent race date, or the most recent race for the whole result set that is shared by different racers?
Mr Russell wrote:
Do you want the most recent race for each unique racer where each racer could have a different most recent race date, or the most recent race for the whole result set that is shared by different racers?


The former. So the resulting output will have one line for each racer, and their corresponding most recent race results if they have ever done a race
I done a fiddle - can the URL just be shared?
Cras wrote:
I done a fiddle - can the URL just be shared?

I think you copy and paste it, and any edits to that generate a new URL
Although fiddle appears to be proper fucked
Does something like this work to get started, or have I really misunderstood it?

Code:
SELECT Races.Racer, f.RaceDate FROM Races
LEFT JOIN (
SELECT Racer, MAX(RaceDate) as RaceDate from Friends
Group BY Racer
) f ON f.racer = Races.Racer
Mr Russell wrote:
Does something like this work to get started, or have I really misunderstood it?

Code:
SELECT Races.Racer, f.RaceDate FROM Races
LEFT JOIN (
SELECT Racer, MAX(RaceDate) as RaceDate from Friends
Group BY Racer
) f ON f.racer = Races.Racer


That would work absolutely fine - but it won't get you the rest of the info, and that's where it starts falling apart.
In that case I simply can't see a way of doing it without adding a unique ID to the racing table.

Code:
SELECT * FROM friends f
LEFT JOIN (
    SELECT * FROM races r
    WHERE id = (
        SELECT id FROM races r2
        WHERE r.racer = r2.racer
        ORDER BY r2.racedate DESC LIMIT 1
    )
) r3 ON r3.racer = f.racer
Here's a SQLFiddle schema, if it helps anyone else.

ZOMG Spoiler! Click here to view!
CREATE TABLE races
(`Racer` varchar(7), `RaceDate` varchar(7), `Position` varchar(4), `Completed` varchar(1))
;

INSERT INTO races
(`Racer`, `RaceDate`, `Position`, `Completed`)
VALUES
('Grim...', '10/3/16', '4', 'Y'),
('Grim...', '12/3/16', '4', 'Y'),
('Grim...', '1/2/16', '1', 'Y'),
('Grim...', '10/5/16', NULL, 'N'),
('Grim...', '5/4/16', '6', 'Y'),
('Gaywood', '6/4/16', NULL, 'N'),
('Gaywood', '17/1/16', '2', 'Y'),
('Gaywood', '10/9/16', '4', 'Y'),
('Gaywood', '4/8/16', '4', 'Y'),
('Gaywood', '10/3/16', '4', 'Y')
;


CREATE TABLE friends
(`Racer` varchar(9))
;

INSERT INTO friends
(`Racer`)
VALUES
('Gaywood'),
('Grim...'),
('Curiosity'),
('Bobbyaro')
;


Edit -- oh, I see, you can't actually run any queries.
It'd help more if SQL fiddle wasn't broken :(
Cras wrote:
Mr Russell wrote:
Does something like this work to get started, or have I really misunderstood it?

Code:
SELECT Races.Racer, f.RaceDate FROM Races
LEFT JOIN (
SELECT Racer, MAX(RaceDate) as RaceDate from Friends
Group BY Racer
) f ON f.racer = Races.Racer


That would work absolutely fine - but it won't get you the rest of the info, and that's where it starts falling apart.

OK, so you need to rank the whole races table but for each racer.
This StackOverflow question looks similar:
http://stackoverflow.com/questions/7473 ... h-customer

And this one looks to be very similar to how Grim... was suggesting to do it:
http://stackoverflow.com/questions/3313 ... h-customer
Wait, shit, I think I've got the filthiest idea about how to do it without an ID table.
Check out this shit:

Code:
SELECT * FROM friends f
LEFT JOIN (
    SELECT * FROM races r
    WHERE CONCAT(r.racer, r.racedate) = (
        SELECT CONCAT(r2.racer, r2.racedate) FROM races r2
        WHERE r.racer = r2.racer
        ORDER BY r2.racedate DESC LIMIT 1
    )
) r3 ON r3.racer = f.racer


Probably going to run into all sorts of trouble if you have two races on the same day, mind.

[edit] Oh, except then it won't order it anyway, so I don't think it would actually be a problem.
Grim... wrote:
Check out this shit:

Code:
SELECT * FROM friends f
LEFT JOIN (
    SELECT * FROM races r
    WHERE CONCAT(r.racer, r.racedate) = (
        SELECT CONCAT(r2.racer, r2.racedate) FROM races r2
        WHERE r.racer = r2.racer
        ORDER BY r2.racedate DESC LIMIT 1
    )
) r3 ON r3.racer = f.racer


Probably going to run into all sorts of trouble if you have two races on the same day, mind.


Works a treat, but yeah it's a little bit filthy :D

The real world data has that data actually be a timestamp which goes down to the millisecond, so clashes are highly unlikely.

Thanks all!
Have you tried more inner joins? Inner joins always help.

I know SQL
Cras wrote:
The real world data has that data actually be a timestamp which goes down to the millisecond, so clashes are highly unlikely.

Quoted for posterity when your data goes down in flames because this assumption was incorrect.
Doctor Glyndwr wrote:
Cras wrote:
The real world data has that data actually be a timestamp which goes down to the millisecond, so clashes are highly unlikely.

Quoted for posterity when your data goes down in flames because this assumption was incorrect.

I don't think it matters - the date is key for the ordering, so if you've got matching dates it's not going to work anyway.

That said - ID columns, motherfucker!

Lonewolves wrote:
Have you tried more inner joins? Inner joins always help.

An inner join would be the opposite of what you want, as you need the null rows.
Grim... wrote:
Lonewolves wrote:
Have you tried more inner joins? Inner joins always help.

An inner join would be the opposite of what you want, as you need the null rows.

Listen, leave this to the sequel experts
Lonewolves wrote:
Grim... wrote:
Lonewolves wrote:
Have you tried more inner joins? Inner joins always help.

An inner join would be the opposite of what you want, as you need the null rows.

Listen, leave this to the sequel experts

k, soz
Grim... wrote:
That said - ID columns, motherfucker!


I could add an ID column easily - but it amuses me more not to.
Have you tried truncating the table?
I'm reminded of a recent request from one of my colleagues in Canada who need a custom data extract doing. I told him I'd need to look how the data was structured, but before I'd had chance to look into it, I got this reply

Quote:
Do you mind querying select * from the database to see what we could have?


A little knowledge is a dangerous thing.
Lonewolves wrote:
Have you tried truncating the table?

Hey, I know about that! If you have queries that are running slow, truncating the tables they're run on makes future ones way faster!
Page 1 of 3 [ 137 posts ]