
Rows to columns: finding a workable solution

3 posts in Performance and Tuning. Last posting was on 2012-04-07 22:31:07.0Z
Paul Posted on 2012-04-06 14:26:38.0Z
Sender: 7c28.4f7ef12c.1804289383@sybase.com
From: Paul
Newsgroups: sybase.public.ase.performance+tuning
Subject: Rows to columns: finding a workable solution
X-Mailer: WebNews to Mail Gateway v1.1t
Message-ID: <4f7efd1e.7d95.1681692777@sybase.com>
NNTP-Posting-Host: 172.20.134.41
X-Original-NNTP-Posting-Host: 172.20.134.41
Date: 6 Apr 2012 07:26:38 -0700
X-Trace: forums-1-dub 1333722398 172.20.134.41 (6 Apr 2012 07:26:38 -0700)
X-Original-Trace: 6 Apr 2012 07:26:38 -0700, 172.20.134.41
Lines: 145
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.performance+tuning:13116
Article PK: 91691

-----------------------------------------------------------
SCHEMA
-----------------------------------------------------------
CREATE TABLE reports
(
    report_id INTEGER NOT NULL PRIMARY KEY,
    report    VARCHAR(50) NOT NULL UNIQUE
    -- ...
);

CREATE TABLE fields
(
    field_id INTEGER NOT NULL PRIMARY KEY,
    field    VARCHAR(50) NOT NULL UNIQUE
    -- ...
);

CREATE TABLE report_fields
(
    report_id INTEGER NOT NULL REFERENCES reports,
    field_id  INTEGER NOT NULL REFERENCES fields,
    -- ...

    CONSTRAINT pk_report_fields
        PRIMARY KEY (report_id, field_id)
);

CREATE TABLE report_data1
(
    record_id INTEGER NOT NULL PRIMARY KEY,
    field1    VARCHAR(100),
    field2    VARCHAR(100),
    field3    INTEGER,
    field4    NUMERIC(25, 9),
    field5    DATETIME,
    field6    DATETIME
    -- ...
);

CREATE TABLE report_data2
(
    record_id INTEGER NOT NULL REFERENCES records,
    field_id  INTEGER NOT NULL REFERENCES fields,

    -- populate 1 (potentially, but unlikely, more) of these
    -- columns depending on the data type:
    "integer" INTEGER,
    "numeric" NUMERIC(25, 9),
    "date"    DATETIME,
    "text"    VARCHAR(100),

    CONSTRAINT pk_report_data2
        PRIMARY KEY (record_id, field_id)
);
-----------------------------------------------------------
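
To make the layout concrete: report_data2 holds one row per
(record, field) pair, with only the column matching the
field's data type populated. For example (the field_ids and
values here are invented purely for illustration):

-----------------------------------------------------------
-- illustrative data only: field_ids and values are made up
INSERT INTO fields (field_id, field) VALUES (1, 'account_no');
INSERT INTO fields (field_id, field) VALUES (2, 'note');
INSERT INTO fields (field_id, field) VALUES (3, 'status');

-- record 1001 pivots to: 101555 | 'first follow-up' | 'confirmed'
INSERT INTO report_data2 (record_id, field_id, "integer")
VALUES (1001, 1, 101555);
INSERT INTO report_data2 (record_id, field_id, "text")
VALUES (1001, 2, 'first follow-up');
INSERT INTO report_data2 (record_id, field_id, "text")
VALUES (1001, 3, 'confirmed');
-----------------------------------------------------------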

I know of two ways of converting rows to columns required to
build a report:

-----------------------------------------------------------
Version 1
-----------------------------------------------------------
SELECT D.record_id,
       D.field1     "column 1",
       D.field3     "column 2",
       D.field5     "column 3",
       D1."integer" "column 4",
       D2."text"    "column 5",
       D3."text"    "column 6"
  FROM report_data1 D
  JOIN report_data2 D1 ON D1.record_id = D.record_id
                      AND D1.field_id  = 1
  JOIN report_data2 D2 ON D2.record_id = D.record_id
                      AND D2.field_id  = 2
  JOIN report_data2 D3 ON D3.record_id = D.record_id
                      AND D3.field_id  = 3
 WHERE D.field5 BETWEEN TIMESTAMP '2012-02-01 00:00:00'
                    AND TIMESTAMP '2012-02-29 23:59:59'
   AND D1."integer" = 101555
   AND D3."text"    = 'confirmed'

-----------------------------------------------------------
Version 2
-----------------------------------------------------------
SELECT D1.record_id,
       D1.field1 "column 1",
       D1.field3 "column 2",
       D1.field5 "column 3",
       MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) "column 4",
       MIN(CASE D2.field_id WHEN 2 THEN D2."text"    END) "column 5",
       MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) "column 6"
  FROM report_data1 D1
  JOIN report_data2 D2 ON D2.record_id = D1.record_id
 WHERE D1.field5 BETWEEN TIMESTAMP '2012-02-01 00:00:00'
                     AND TIMESTAMP '2012-02-29 23:59:59'
 GROUP BY D1.record_id
HAVING MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) = 101555
   AND MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) = 'confirmed'
-----------------------------------------------------------

Version 1 is not workable: the number of columns may reach
90, whereas Sybase limits the number of tables in a query
to 50. Version 2 does not scale: when roughly 50,000 rows
are expected it may return in under 3 minutes, but at just
over 170,000 it never returns, eventually failing for lack
of space in tempdb.

Interestingly, the filtering conditions work fine in both
versions, although version 1 is, predictably, somewhat
better. Ultimately it is the number of columns that is the
problem. Version 1 simply does not work because of the
number of tables required. With version 2, Sybase creates
many work tables (their number depends on the number of
columns), and I could not find a way to convince it that
the null-skipping MIN() is not expected to find more than
one value. I tried the PLAN clause, but it provides no way
to reduce the number of work tables.

The questions, then: is there another way of converting
rows to columns? If not, is there a way to make the above
work? The current solution extracts data from Sybase
piecemeal and then completes the query outside the
database, so to speak, but since the tables are written to
while they are being read, the resulting datasets are
inconsistent.
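
For completeness, a third variant I have sketched but not
tested: materialize the fixed columns into a temp table
with SELECT INTO, then fill the pivoted columns in passes
with UPDATE ... FROM, a few fields per statement, so no
single query comes near the 50-table limit. Something like
this (assuming SET QUOTED_IDENTIFIER ON, which the queries
above already require):

-----------------------------------------------------------
-- sketch only, untested
SELECT D.record_id,
       D.field1 "column 1",
       D.field3 "column 2",
       D.field5 "column 3",
       CONVERT(INTEGER, NULL)      "column 4",
       CONVERT(VARCHAR(100), NULL) "column 5",
       CONVERT(VARCHAR(100), NULL) "column 6"
  INTO #pivot
  FROM report_data1 D
 WHERE D.field5 BETWEEN '2012-02-01 00:00:00'
                    AND '2012-02-29 23:59:59'

-- one statement per field (or per small batch of fields)
UPDATE #pivot
   SET "column 4" = D2."integer"
  FROM #pivot, report_data2 D2
 WHERE D2.record_id = #pivot.record_id
   AND D2.field_id  = 1

UPDATE #pivot
   SET "column 5" = D2."text"
  FROM #pivot, report_data2 D2
 WHERE D2.record_id = #pivot.record_id
   AND D2.field_id  = 2

UPDATE #pivot
   SET "column 6" = D2."text"
  FROM #pivot, report_data2 D2
 WHERE D2.record_id = #pivot.record_id
   AND D2.field_id  = 3

-- apply the row filters once all columns are in place
SELECT * FROM #pivot
 WHERE "column 4" = 101555
   AND "column 6" = 'confirmed'
-----------------------------------------------------------

I do not know whether the repeated scans of report_data2
would behave any better than the MIN(CASE ...) work tables,
though.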


"Mark A. Parsons" <iron_horse Posted on 2012-04-06 21:08:51.0Z
From: "Mark A. Parsons" <iron_horse@no_spamola.compuserve.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101207 Lightning/1.0b2 Thunderbird/3.1.7
MIME-Version: 1.0
Newsgroups: sybase.public.ase.performance+tuning
Subject: Re: Rows to columns: finding a workable solution
References: <4f7efd1e.7d95.1681692777@sybase.com>
In-Reply-To: <4f7efd1e.7d95.1681692777@sybase.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: vip152.sybase.com
X-Original-NNTP-Posting-Host: vip152.sybase.com
Message-ID: <4f7f5b63$1@forums-1-dub>
Date: 6 Apr 2012 14:08:51 -0700
X-Trace: forums-1-dub 1333746531 10.22.241.152 (6 Apr 2012 14:08:51 -0700)
X-Original-Trace: 6 Apr 2012 14:08:51 -0700, vip152.sybase.com
Lines: 155
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.performance+tuning:13117
Article PK: 91606

Why do you want to convert from rows to columns in the first place? [Trying to understand what your actual requirement
is, and why you can't use the row-oriented data.]

How do you determine the number of columns? Is this something the end user chooses (eg, is the number of columns
determined dynamically based on end-user choices)?

Is the underlying data changing often or is it fairly static?

On 04/06/2012 08:26, Paul wrote:
> <snip>


Paul Posted on 2012-04-07 22:31:07.0Z
Sender: 4efd.4f80bb94.1804289383@sybase.com
From: Paul
Newsgroups: sybase.public.ase.performance+tuning
Subject: Re: Rows to columns: finding a workable solution
X-Mailer: WebNews to Mail Gateway v1.1t
Message-ID: <4f80c02b.4f8a.1681692777@sybase.com>
References: <4f7f5b63$1@forums-1-dub>
NNTP-Posting-Host: 172.20.134.41
X-Original-NNTP-Posting-Host: 172.20.134.41
Date: 7 Apr 2012 15:31:07 -0700
X-Trace: forums-1-dub 1333837867 172.20.134.41 (7 Apr 2012 15:31:07 -0700)
X-Original-Trace: 7 Apr 2012 15:31:07 -0700, 172.20.134.41
Lines: 47
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.performance+tuning:13118
Article PK: 155120


> Why do you want to convert from rows to columns in the
> first place? [Trying to understand what your actual
> requirement is, and why you can't use the row-oriented
> data.]

The output is a report in the form of CSV written to a file
with columns and rows corresponding to the ones in the two
query examples.

> How do you determine the number of columns? Is this
> something the end user chooses (eg, is the number of
> columns determined dynamically based on end-user choices)?

Indeed, users determine which fields go into which reports -
in other words, they populate the equivalent of the
report_fields table.

> Is the underlying data changing often or is it fairly
> static?

Report structures do not change often, although users can
change them whenever they like. Report data, by contrast,
seems to change often enough to cause problems when
retrieved piecemeal.

...

I think I have found a way to make it work, at least for
now. In Version 2 I replaced this:

MIN(CASE D2.field_id WHEN 2 THEN D2."text" END)

with this:

MIN(CASE D2.field_id WHEN 2 THEN LEFT(D2."text", 31) END)
(31 is an example)

and did the same for all "text" fields, limiting each to
the maximum number of characters expected for the
corresponding field_id (D2.field_id above). These maxima
are unlikely to change often but can always be updated if
necessary. In the specific example I was looking at, this
reduced the number of work tables required from eight or
nine to one or two, which made all the difference.
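
For reference, this is what Version 2 looks like with the
caps in place (the 31 and 9 below are just example limits):

-----------------------------------------------------------
SELECT D1.record_id,
       D1.field1 "column 1",
       D1.field3 "column 2",
       D1.field5 "column 3",
       MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) "column 4",
       -- each cap is the maximum length expected for that field_id
       MIN(CASE D2.field_id WHEN 2 THEN LEFT(D2."text", 31) END) "column 5",
       MIN(CASE D2.field_id WHEN 3 THEN LEFT(D2."text", 9)  END) "column 6"
  FROM report_data1 D1
  JOIN report_data2 D2 ON D2.record_id = D1.record_id
 WHERE D1.field5 BETWEEN TIMESTAMP '2012-02-01 00:00:00'
                     AND TIMESTAMP '2012-02-29 23:59:59'
 GROUP BY D1.record_id
HAVING MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) = 101555
   AND MIN(CASE D2.field_id WHEN 3 THEN LEFT(D2."text", 9) END) = 'confirmed'
-----------------------------------------------------------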

I can do very little about the current table structure or
user behaviour.