I am doing SQL for more than ten years now, but I just cannot solve this

problem...:

I have the following master-data three-level table structure (A is the top

master, C is the lowest detail):

A <--FK-- B <--FK-- C

The master A has a field over that I want to SUM. But I want only the rows

in A that have a grandchild in C with a specific field content OR that do

not have a grandchild in C at all.

So I did:

SELECT SUM(A.X) FROM A LEFT JOIN B LEFT JOIN C WHERE C.Y = ... OR C.Y IS

NULL

Obviously that MUST be the wrong value, since the rows of A are multiplied

for each existing B and C row! It only calculates correctly if there is not

more than one child and grandchild.

But what to do? How to tell the SQL engine that it shall add the value of A

only once for each "real" row of A?

THIS is not correct:

SELECT SUM(DISTINCT A.X) FROM A LEFT JOIN B LEFT JOIN C WHERE C.Y = ... OR

C.Y IS NULL

NEITHER is this:

SELECT DISTINCT SUM(A.X) FROM A LEFT JOIN B LEFT JOIN C WHERE C.Y = ... OR

C.Y IS NULL

But how to do it?

I am totally confused! Please help me! :-)

Thanks

Markus

Forgot to mention:

SELECT SUM(AX) FROM A WHERE AID IN(SELECT DISTINCT AID FROM A LEFT JOIN B

LEFT JOIN C WHERE C.CY = ... OR C.CY IS NULL)

That obviously will return the correct result, but (a) I think it is way too

complex and (b) it will not work in real world (outside this simplified

quiz) since actually I need to SUM(AX) inside of a group, and the group is

to be built from a column in table B (GROUP BY B.BZ)... :-(

Thanks

Markus

"Markus KARG" <karg@quipsy.de> schrieb im Newsbeitrag

news:48ee1573$1@forums-1-dub...

Markus,

What about

SELECT SUM(AX)

FROM A

WHERE (SELECT CASE count(DISTINCT C.CY)

WHEN 0 THEN ...

WHEN 1 THEN max(C.CY)

ELSE NULL

END

FROM B LEFT JOIN C

WHERE B.AID = A.AID)

= ...

?

I did not test it, but apart from some typo removing and fine tuning it

should run.

It does not look absolutely pretty, but from an efficiency point of view,

should be more or less the same as the two LEFT JOINs that you used.

The CASE expression possibly needs some explanation:

It evaluates the number of distinct values of CY.

The WHEN 0 case is the case that no grandchild exists, so set the

subselect output directly to the expected value, which means the main

WHERE clause succeeds.

The WHEN 1 clause sets the subselect result to the unique existing value

which might or might not be your searched value "...", but whatever it is,

the main WHERE should deliver the correct result.

The ELSE case just makes the main WHERE condition to be UNKNOWN, and hence

the case that there are several different values for grandchildren is

excluded form the final result.

All the above assumes that CY cannot be NULL. If this is not the case,

then some code extensions would be necessary to deal with the NULLs which

are ignored by the count(DISTINCT C.CY).

Frank

Am 09.10.2008, 17:00 Uhr, schrieb Markus KARG <karg@quipsy.de>:

Correction:

The LEFT should be left out from the inner join, as it could introduce

NULLs.

Frank

Am 09.10.2008, 18:42 Uhr, schrieb Frank Ploessel

<fpl...@d_e.i_m_s_h_e_a_l_t_h.c_o_m>:

Frank,

thank you for this proposal! I need all the help that I can get!

In fact I do not understand what you want to reach with this quite complex

syntax that would be in any case superiour to my proposal above, but

nevertheless, I think it will not work in the real world -- due to the same

fact I mentioned in my own answer above: In reality, in need to have not

only a single sum over all, but actually a sum per group, and the group must

be built from a field in B. Since your most outer SELECT (the one with the

SUM(AX)) is not containing B, that group cannot be built -- and when adding

it, you're screwed again thanks to the "magic multiplication" of rows...

So your proposal might work for this simple quiz case (which ignores the

needed GROUP BY), it will not solve the general question: How to sum over

the LEFT table while to group over the MIDDLE table and filter on the RIGHT

table (maybe I should have started with the complete complexity in my first

posting to point out the real problem).

What I cannot believe is: This is such a simple question that thousands of

people must have everyday. But it seems there is no solution. BTW, when

dropping C and "just" filtering AND grouping on B, then SELECT DISTINCT

SUM(AX) is actually correctly working -- what I do not understand at all,

since the DISTINCT has nothing to do since there is only one single result

row. But with the original case (having three tables but not two) it

doesn't. Pretty strange.

I mean, how do others solve the problem "SUM(LEFT TABLE) and GROUP BY(middle

table) and FILTER ON(right table)"???

If I wouldn't need the SUM but just the COUNT, then I could use a simple

COUNT(DISTINCT A.PK), with no more needed tricks. But for SUM and other

aggregates, there seems to be no solution. Something like SUM(AX WITH

DISTINCT(A.PK)) would be great...

Also, do we really need ASA11 tricks like CASE and all that...? Isn't there

an ANSI SQL solution...? :-)

Thanks a lot! :-)

Markus

"Frank Ploessel" <fpl...@d_e.i_m_s_h_e_a_l_t_h.c_o_m> schrieb im Newsbeitrag

news:op.uirnvgq2j0bybf@bonw01164.internal.imsglobal.com...

Markus,

CASE expressions are ANSI standard, and are available in SQL Anywhere

since a long time already, probably version 5 or earlier. And "partial

groupings" are sometimes a bit tricky in SQL. If the DISTINCT gave the

correct result, then this is just coincidence, I would not expect this to

be the case in all situations.

As you still did not state your problem in every detail, I try to

re-formulate the it as far as I understand your requirement:

You have a foreign key relationship from A (parent) to B (children), and

one from B to C (grandchildren of A).

You want to sum column A.AX, grouped by column B.BX for records from table

A where there either is no grandchild in C or all existing grandchildren

have the value '...'.

Please correct me if this is not what you intended.

If the above is correct, the following statement should give you the

required result:

SELECT sum(AX), BX

FROM A

LEFT JOIN (SELECT B.BX as bx,

count(distinct C.CY) as cy_cnt,

max(C.CY) as cy

FROM B LEFT JOIN C

GROUP BY B.BX

) as bc

WHERE bx IS NOT NULL // there is at least one child

AND ((cy_cnt = 0 AND cy IS NOT NULL ) // there is no grandchild

OR (cy_cnt = 1 AND cy = ...) // there are only grandchildren...

) // ...fulfilling the condition

GROUP BY BX

This avoids the multiplication of rows by already grouping the derived

table.

Frank

Frank,

thank you so much for your kind help.

First of all, yes, you correctly understood the core problem.

But the reality is larger scaled: We have about eight tables in the real

world, so there are grand-grand-grand-grand....-children to filter upon,

grand-grand-children to GROUP BY, while still SUMming over the top master

(the same schematics of the problem, but more LEFT JOINed tables between the

main actors of parent, child, and grandchild). So it is doubtful whether

your proposal can be scaled to that level (at least I do not see how)? Still

I think that we all think too complex and there might be a simple trick. I

come to this conclusion because: If the sides would be reversed (SUMming on

the grandest child, filtering on the topmost parent) then it is as simple as

SELECT SUM(CX) FROM A LEFT JOIN B LEFT JOIN C GROUP BY A_PK. So I really

wonder whether it is really so complex to just "turn the sides"... :-(

And unfortunately your solution does not work: When running your proposal,

SQL Anywhere says that there is no way to join table A with the virtual

table BC. :-(

I'm really thinking whether I will just write a stored procedure for that...

But damned, there MUST be a simple solution that I just do not see... I'm

sure there is just a knot in my brain. ;-)

Thanks!

Markus

"Frank Ploessel" <fpl...@d_e.i_m_s_h_e_a_l_t_h.c_o_m> schrieb im Newsbeitrag

news:op.uirteampj0bybf@computer...

Markus,

CASE expressions are ANSI standard, and are available in SQL Anywhere

since a long time already, probably version 5 or earlier. And "partial

groupings" are sometimes a bit tricky in SQL. If the DISTINCT gave the

correct result, then this is just coincidence, I would not expect this to

be the case in all situations.

As you still did not state your problem in every detail, I try to

re-formulate the it as far as I understand your requirement:

You have a foreign key relationship from A (parent) to B (children), and

one from B to C (grandchildren of A).

You want to sum column A.AX, grouped by column B.BX for records from table

A where there either is no grandchild in C or all existing grandchildren

have the value '...'.

Please correct me if this is not what you intended.

If the above is correct, the following statement should give you the

required result:

SELECT sum(AX), BX

FROM A

LEFT JOIN (SELECT B.BX as bx,

count(distinct C.CY) as cy_cnt,

max(C.CY) as cy

FROM B LEFT JOIN C

GROUP BY B.BX

) as bc

WHERE bx IS NOT NULL // there is at least one child

AND ((cy_cnt = 0 AND cy IS NOT NULL ) // there is no grandchild

OR (cy_cnt = 1 AND cy = ...) // there are only grandchildren...

) // ...fulfilling the condition

GROUP BY BX

This avoids the multiplication of rows by already grouping the derived

table.

Frank

Markus,

Just some more general thoughts before the weekend:

In all of your approaches, I see you use table A in a direct join with B

and C, and join this to a second instance of A. This is not necessary, as

B already contains the key to join to A.

Sorry, as I did not try my last proposal in practice, I did not realize

you have to add B.Aid to the select and the group by list of the derived

table so that you can join this to A using

A LEFT JOIN (...) bc ON A.AId = bc.Aid

Some other approaches to what you want to achieve would be:

* You can have a select with a join between A and the rest of tables and

"group away" all the multiple records, and add Ax to the GROUP BY list.

Then based on this intermediate result (which may be a derived table, a

view, or a temporary table), you issue a second group by, this time

summing.

This would be similar to my last approach, just extending the derived

table to include A.

* You can also use a constuct like

sum(Ax) / count(*)

in the first step before building the final sum instead of adding Ax to

the GROUP BY list. This essentially should deliver the same result, as

long as you are sure that the primary key of A is in the group by list of

this inner group by, as then Ax in the different groups would be the same

base record multiplied several times.

* Maybe something like FIRST_VALUE(Ax) OVER(PARTITION BY Aid ORDER BY 1)

could deliver a similar result. But I do not have much experience with

this, as FIRST_VALUE() was only introduced in version 10.0.1.

All approaches that I know use two levels of grouping, first up to the

combination of records of A mutliplied only by the intended groups from

other tables, and then the final grouping. In my first approach, the inner

grouping was just implicitely as you had left out the requirement of

grouping by a value from B and so the grouping to at most one recor was

enough, in the other cases it is explicitely.

As I see, with normal SQL methods, there does not seem to be absolutely

simple way to achieve what you want.

Something that would be really helpful in situations like this but which

is not yet(?) standard SQL would be a construct like

sum(Ax for distinct Aid)

(just my proposal for a new syntax, which is a kind of extension of the

"distinct" syntax and would only add Ax once for each different value of

Aid).

An obvious issue would be what the engine should do if there are different

values of Ax for the same value of Aid in a general case where Aid would

not be the primary key. But maybe this could be left to the optimizer then

chosing an arbitrary value, like e. g. the order of items in the result of

list() if you do not use an ORDER BY. And if this construct could be used

with window specifications, that would be really great!

Maybe we should raise a discussion about this in the

product_futures_discussion newsgroup?

I would support this, and have also several real world examples where this

would be very useful.

Frank

Am 10.10.2008, 09:01 Uhr, schrieb Markus KARG <karg@quipsy.de>:

Frank,

I think my last proposal from Thursday is exactly what you described:

SELECT BX, SUM(AX) FROM (SELECT DISTINCT AID, AX, BX FROM A LEFT JOIN B LEFT

JOIN C WHERE CX IS NULL OR CX = ?) X GROUP BY BX

The inner SELECT DISTINCT does the same as you wanted to do with "inner

grouping": It removes the duplicated A-Rows. Since DISTINCT is shorter to

write than GROUP BY, I will keep this one. :-)

For your idea of new features: Yes, I would support the idea of "SUM(X

DISTINCT Y)" syntax. Will you start the thread? I then comment your posting

there.

Thanks for your kind help!

Markus

"Frank Ploessel" <fpl...@d_e.i_m_s_h_e_a_l_t_h.c_o_m> schrieb im Newsbeitrag

news:op.uitgrhi4j0bybf@bonw01164.internal.imsglobal.com...

Markus,

I just posted the proposal in the other newsgroup.

Frank

Am 13.10.2008, 08:41 Uhr, schrieb Markus KARG <karg@quipsy.de>:

Another partial solution would be:

SELECT SUM(AX) FROM (SELECT DISTINCT AID, AX FROM A LEFT JOIN B LEFT JOIN C)

X

It returns the correct result, but in real world, outside this simplified

quiz sample, it will not work: As mentioned earlier in this thread, in real

world, the sum must be calculated not over all but over a group, and the

group must be built from a field in B.

So still the question is:

How to SELECT SUM from A with GROUP BY from B and WHERE in C...?

Is this really impossible...?

Thanks!

Markus

"Markus KARG" <karg@quipsy.de> schrieb im Newsbeitrag

news:48ee1573$1@forums-1-dub...

Frank,

I think I found the simple solution that I wanted to badly. What do you

think about this one:

SELECT BX, SUM(AX) FROM (SELECT DISTINCT AID, AX, BX FROM A LEFT JOIN B LEFT

JOIN C WHERE CX IS NULL OR CX = ?) X GROUP BY BX

It seems to work correctly and it looks correctly. Do you think I missed

some issues that could arise or did we finally see the light? :-)

Thanks!

Markus

"Markus KARG" <karg@quipsy.de> schrieb im Newsbeitrag

news:48ee1573$1@forums-1-dub...

It's a bit late, but wanting to join the quiz.

Ever thought of a construction like this:

SELECT BX, SUM(IF CX IS NULL OR CX = ? THEN AX ENDIF) FROM A LEFT JOIN B

LEFT JOIN C GROUP BY BX

Arco,

yes, it's really rather late... I hardly can't remember what the problem was

about. ;-)

Regards

Markus

"Sybase" <arco.witter@quadrant.nl> schrieb im Newsbeitrag

news:4962fe45$1@forums-1-dub...