Sybase NNTP forums - End Of Life (EOL)

The NNTP forums from Sybase - forums.sybase.com - are now closed.

All new questions should be directed to the appropriate forum at the SAP Community Network (SCN).

Individual products have links to the respective forums on SCN, or you can go to SCN and search for your product in the search box (upper right corner) to find your specific developer center.

"Shrinking the blob" (or text) columns.

10 posts in Product Futures Discussion Last posting was on 2002-05-07 05:11:26.0Z
Matt Rogish Posted on 2002-04-12 19:56:54.0Z
From: "Matt Rogish" <matt@fanhome.com>
Subject: "Shrinking the blob" (or text) columns.
Date: Fri, 12 Apr 2002 15:56:54 -0400
Lines: 15
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <6#uAz0l4BHA.278@forums.sybase.com>
Newsgroups: sybase.public.ase.product_futures_discussion
NNTP-Posting-Host: ip134-053-064-186.s64.muohio.edu 134.53.64.186
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:539
Article PK: 93709

We use text datatypes in our application quite a bit. The problem is that
even the smallest amount of entered data is padded to 2K (does it follow
page size, so that with 4K pages the multiple is 4K?!). We have some 3.4
million rows which all are NOT NULL text (since without the page text it's
pretty much useless! :)). So, regardless of the amount of actual text we
have what, 6.6GB eaten up? It would be really nice if text columns had this
restriction removed and *actually took up the same amount of space as data
entered*. Or maybe lower the padding to x *bytes* instead of *kilobytes*.

Thanks,

--
Matt


Anthony Mandic Posted on 2002-04-12 23:38:10.0Z
Message-ID: <3CB76FE2.3E086675@start.com.au>
Date: Sat, 13 Apr 2002 09:38:10 +1000
From: Anthony Mandic <sp_am_block@start.com.au>
Organization: Mandic Consulting Pty. Ltd.
X-Mailer: Mozilla 4.61 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 31
NNTP-Posting-Host: DC-24-99.bpb.bigpond.com 203.40.24.99
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:531
Article PK: 93699


Matt Rogish wrote:
>
> We use text datatypes in our application quite a bit. The problem is that
> even the smallest amount of entered data is padded to 2K (does it follow
> page size, so that with 4K pages the multiple is 4K?!).

Its on a page basis. But the really weird thing is that ASE can
only store 1800 bytes per 2K page. I'm not sure what happens to
the other 248 bytes but a normal 2K page can store 1962 bytes or
so, so something odd is going on.

> We have some 3.4
> million rows which all are NOT NULL text (since without the page text it's
> pretty much useless! :)). So, regardless of the amount of actual text we
> have what, 6.6GB eaten up? It would be really nice if text columns had this
> restriction removed and *actually took up the same amount of space as data
> entered*. Or maybe lower the padding to x *bytes* instead of *kilobytes*.

The text pointers are page offsets, so it has to be page boundaries
unfortunately. You could try Michael's suggestion (its one I would
recommend). Try analysing the lengths of the strings and see if its
worthwhile splitting them up. On ASE 12.5 you could look at using
larger pages sizes and longer varchars.

-am © 2002


Sethu Posted on 2002-05-07 05:11:26.0Z
Message-ID: <3CD761FE.1090909@sybase.com>
Date: Mon, 06 May 2002 22:11:26 -0700
From: Sethu <sethu@sybase.com>
Organization: Sybase, Inc.
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4.1) Gecko/20020314 Netscape6/6.2.2
X-Accept-Language: en-us
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 9
NNTP-Posting-Host: 10.22.120.54
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:500
Article PK: 93669

The text page has some control information for about ~180 bytes.

Sethu

>
> Its on a page basis. But the really weird thing is that ASE can
> only store 1800 bytes per 2K page. I'm not sure what happens to
> the other 248 bytes but a normal 2K page can store 1962 bytes or
> so, so something odd is going on.
>


Matt Rogish Posted on 2002-04-15 06:32:55.0Z
From: "Matt Rogish" <matt@fanhome.com>
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au>
Subject: Re: "Shrinking the blob" (or text) columns.
Date: Mon, 15 Apr 2002 02:32:55 -0400
Lines: 53
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <E9tFecE5BHA.309@forums.sybase.com>
Newsgroups: sybase.public.ase.product_futures_discussion
NNTP-Posting-Host: dhcp16469119.woh.rr.com 24.164.69.119
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:529
Article PK: 93696


"Anthony Mandic" <sp_am_block@start.com.au> wrote in message
news:3CB76FE2.3E086675@start.com.au...
> Its on a page basis. But the really weird thing is that ASE can
> only store 1800 bytes per 2K page. I'm not sure what happens to
> the other 248 bytes but a normal 2K page can store 1962 bytes or
> so, so something odd is going on.

I remember reading that some time ago. As I recall the docs don't say where
the other bytes wander off to. :)

> The text pointers are page offsets, so it has to be page boundaries
> unfortunately.
> -am © 2002

Shoot. I was reading my MS SQL Server 7 Internals book to see how they do
it, and they store in 8K pages not in a linked-list but in a B-tree. Kind
of interesting, although they say the benefit arises when you access data in
the middle of the string.. I'm not sure why you'd ever not want all of the
text (or at least in our case we want all of it) so I don't know how 'cool'
of a feature that is or not... However it says that multiple rows of
text/image data can share a page to help keep storage costs down (e.g.
instead of 8K * 2 pages for two posts which contain less than 8K of data
there would only be a single 8K page shared for both).

SQL 2000 gives you the option to store in the row itself which is useful if
you're always accessing the text data along with the rows you scan (which we
do).

"Michael Peppler" <mpeppler@peppler.org> wrote in message
news:rRk08Om4BHA.298@forums.sybase.com...
> For message boards this will likely gain you a lot, as messages tend to
> be pretty short.
>
> Downside - I suspect that it blows away your full-text search.


Yes, yes it would. :(

All-in-all disk is cheap compared to CPU, so it is not something I'm going
to lose sleep over -- especially if there exist performance reasons why 2K
linked-lists are better. :) We have ASE 12.5 so could make VARCHAR's which
are really big but that would fubar EFTS as well. However having this
limitation seems contrary to everything I know about design.. Essentially
you waste space with every row unless you happen to have text which is a
multiple of 1800 bytes!!

Thanks again all,

--
Matt


Anthony Mandic Posted on 2002-04-15 11:05:51.0Z
Message-ID: <3CBAB40F.DCF36BE8@start.com.au>
Date: Mon, 15 Apr 2002 21:05:51 +1000
From: Anthony Mandic <sp_am_block@start.com.au>
Organization: Mandic Consulting Pty. Ltd.
X-Mailer: Mozilla 4.61 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au> <E9tFecE5BHA.309@forums.sybase.com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 55
NNTP-Posting-Host: DC-24-8.bpb.bigpond.com 203.40.24.8
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:528
Article PK: 93698


Matt Rogish wrote:

> I remember reading that some time ago. As I recall the docs don't say where
> the other bytes wander off to. :)

Yes, its rather odd. Since a text link in a row is only 16 bytes,
why is more needed on a text page to link to its next page in
the chain.

> Shoot. I was reading my MS SQL Server 7 Internals book to see how they do
> it, and they store in 8K pages not in a linked-list but in a B-tree. Kind
> of interesting, although they say the benefit arises when you access data in
> the middle of the string.. I'm not sure why you'd ever not want all of the
> text (or at least in our case we want all of it) so I don't know how 'cool'
> of a feature that is or not... However it says that multiple rows of
> text/image data can share a page to help keep storage costs down (e.g.
> instead of 8K * 2 pages for two posts which contain less than 8K of data
> there would only be a single 8K page shared for both).

Naturally we couldn't expact MS to implement anything sensibly,
intelligibly and coherently. Personally, I can't understand why
they keep pumping out DB interface after DB interface. Either
each one is boned in some way (perhaps deliberately designed that
way) or they like to keep book authors in business (and click
monkeys chasing certifications).

> SQL 2000 gives you the option to store in the row itself which is useful if
> you're always accessing the text data along with the rows you scan (which we
> do).

If your text lengths aren't too large you could look at ASE 12.5's
larger page sizes and large varchar fields. Unlike text, these are
also searchable.

> All-in-all disk is cheap compared to CPU, so it is not something I'm going
> to lose sleep over -- especially if there exist performance reasons why 2K
> linked-lists are better. :) We have ASE 12.5 so could make VARCHAR's which
> are really big but that would fubar EFTS as well. However having this
> limitation seems contrary to everything I know about design.. Essentially
> you waste space with every row unless you happen to have text which is a
> multiple of 1800 bytes!!

Yes.

-am © 2002


Marc Zampetti Posted on 2002-04-15 13:57:31.0Z
Message-ID: <3CBADC4B.602@aol.com>
Date: Mon, 15 Apr 2002 09:57:31 -0400
From: Marc Zampetti <zampmarc@aol.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4.1) Gecko/20020314 Netscape6/6.2.2
X-Accept-Language: en-us
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au> <E9tFecE5BHA.309@forums.sybase.com> <3CBAB40F.DCF36BE8@start.com.au>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 115
NNTP-Posting-Host: pix-fw.wan.aol.com 152.163.190.1
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:526
Article PK: 93707


Anthony Mandic wrote:

> Matt Rogish wrote:
>
>
>>I remember reading that some time ago. As I recall the docs don't say where
>>the other bytes wander off to. :)
>>
>
> Yes, its rather odd. Since a text link in a row is only 16 bytes,
> why is more needed on a text page to link to its next page in
> the chain.
>
>

First, lets clear up some confusion. The text/image column on a row is a
pointer to the first text page in the chain. The blobs are stored on
their own pages, not the pages that the rest of the row is stored on.
I'm not going to debate the logic of this, basically, blobs in all
RDBMS's have issues, its the nature of trying to store unstructured in a
structured manner. So, if you insert some about of space (including the
empty string), ASE has to create a text page. So, the overhead is not
the text pointer in the row, but the text page itself.
Now, every text page is a standard page. For a 2K server, thats 2048.
Remove the standard page header, which is 32 bytes. That leaves 2016
possible bytes for the data. Well, everything has to be logged (or at
least support that, and there is overhead there. The log has to fit on a
standard page, with a page header). Removing all that gets to the 1962
number that we all know a love. I don't remember the specifics of
overhead. For the first text/image column, there is also the TIPSA area,
which is basically a structure on the first text/image page that
describes the chain, like how long it is, and some other information
that I don't recall. This is where the 1800 limit comes in, I think.
Now, starting in 12.0, Sybase added a new i-node like structure (similar
to the B-Tree that MS apparently has) to allow better access to the page
chain. I don't remember the overhead for this.
Also, a text/image page can only be a member of a single chain, so if
there is space left on the page at the end of the chain, then that is
waste. However, you will never have gaps in the middle of the chain, so
at most, the waste you will have is (Max Data on Page) - ((Size of a
Character) or 1 Byte), since the worse case is having to put a single
character (which may take more than one byte) or a single byte on the
last page.


>>Shoot. I was reading my MS SQL Server 7 Internals book to see how they do
>>it, and they store in 8K pages not in a linked-list but in a B-tree. Kind
>>of interesting, although they say the benefit arises when you access data in
>>the middle of the string.. I'm not sure why you'd ever not want all of the
>>text (or at least in our case we want all of it) so I don't know how 'cool'
>>of a feature that is or not... However it says that multiple rows of
>>text/image data can share a page to help keep storage costs down (e.g.
>>instead of 8K * 2 pages for two posts which contain less than 8K of data
>>there would only be a single 8K page shared for both).
>>
>
> Naturally we couldn't expact MS to implement anything sensibly,
> intelligibly and coherently. Personally, I can't understand why
> they keep pumping out DB interface after DB interface. Either
> each one is boned in some way (perhaps deliberately designed that
> way) or they like to keep book authors in business (and click
> monkeys chasing certifications).
>
>
>>SQL 2000 gives you the option to store in the row itself which is useful if
>>you're always accessing the text data along with the rows you scan (which we
>>do).
>>
>
> If your text lengths aren't too large you could look at ASE 12.5's
> larger page sizes and large varchar fields. Unlike text, these are
> also searchable.
>
>
>>All-in-all disk is cheap compared to CPU, so it is not something I'm going
>>to lose sleep over -- especially if there exist performance reasons why 2K
>>linked-lists are better. :) We have ASE 12.5 so could make VARCHAR's which
>>are really big but that would fubar EFTS as well. However having this
>>limitation seems contrary to everything I know about design.. Essentially
>>you waste space with every row unless you happen to have text which is a
>>multiple of 1800 bytes!!
>>
>
> Yes.
>
> -am © 2002
>


Not sure why you talk about not being able to use EFTS with either
varchar or text columns. EFTS can index almost any data in the row,
including float, int, and numeric as well as char/varchar/text. You can
also use LIKE on any character data, so I'm not sure about the comment
"unlike text, these are also searchable".

Marc Zampetti


Anthony Mandic Posted on 2002-04-16 11:37:17.0Z
Message-ID: <3CBC0CED.4DE4F8A0@start.com.au>
Date: Tue, 16 Apr 2002 21:37:17 +1000
From: Anthony Mandic <sp_am_block@start.com.au>
Organization: Mandic Consulting Pty. Ltd.
X-Mailer: Mozilla 4.61 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au> <E9tFecE5BHA.309@forums.sybase.com> <3CBAB40F.DCF36BE8@start.com.au> <3CBADC4B.602@aol.com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 40
NNTP-Posting-Host: 203.3.176.10
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:524
Article PK: 93692


Marc Zampetti wrote:

> Now, every text page is a standard page. For a 2K server, thats 2048.
> Remove the standard page header, which is 32 bytes. That leaves 2016
> possible bytes for the data. Well, everything has to be logged (or at
> least support that, and there is overhead there. The log has to fit on a
> standard page, with a page header). Removing all that gets to the 1962
> number that we all know a love. I don't remember the specifics of
> overhead. For the first text/image column, there is also the TIPSA area,
> which is basically a structure on the first text/image page that
> describes the chain, like how long it is, and some other information
> that I don't recall. This is where the 1800 limit comes in, I think.

Ah, that's it, now I remember. Its due to log pages.

> > If your text lengths aren't too large you could look at ASE 12.5's
> > larger page sizes and large varchar fields. Unlike text, these are
> > also searchable.
...
> Not sure why you talk about not being able to use EFTS with either
> varchar or text columns. EFTS can index almost any data in the row,
> including float, int, and numeric as well as char/varchar/text. You can
> also use LIKE on any character data, so I'm not sure about the comment
> "unlike text, these are also searchable".

I meant without the aid of EFTS. If you aren't using it, then
the larger varchar fields are more viable since you can do likes
etc. With EFTS, it doesn't matter except that EFTS incurs its
own overhead.

-am © 2002


Matt Rogish Posted on 2002-04-15 18:17:38.0Z
From: "Matt Rogish" <matt@fanhome.com>
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au> <E9tFecE5BHA.309@forums.sybase.com> <3CBAB40F.DCF36BE8@start.com.au> <3CBADC4B.602@aol.com>
Subject: Re: "Shrinking the blob" (or text) columns.
Date: Mon, 15 Apr 2002 14:17:38 -0400
Lines: 24
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <4yYhirK5BHA.257@forums.sybase.com>
Newsgroups: sybase.public.ase.product_futures_discussion
NNTP-Posting-Host: ip134-053-064-186.s64.muohio.edu 134.53.64.186
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:525
Article PK: 93693


"Marc Zampetti" <zampmarc@aol.com> wrote in message
news:3CBADC4B.602@aol.com...
> First, lets clear up some confusion.

Thanks for the clarification Marc!

> Not sure why you talk about not being able to use EFTS with either
> varchar or text columns. EFTS can index almost any data in the row,
> including float, int, and numeric as well as char/varchar/text. You can
> also use LIKE on any character data, so I'm not sure about the comment
> "unlike text, these are also searchable".
>
> Marc Zampetti

Well in our case it would take a lot of extra work to combine two indexes
together (if we did the varchar( 1600 ) + text approach on a single row) or
multiple rows (if we had a chain of varchars). So it wouldn't work very
well. :)

--
Matt


Anthony Mandic Posted on 2002-04-16 11:40:36.0Z
Message-ID: <3CBC0DB4.AE940F10@start.com.au>
Date: Tue, 16 Apr 2002 21:40:36 +1000
From: Anthony Mandic <sp_am_block@start.com.au>
Organization: Mandic Consulting Pty. Ltd.
X-Mailer: Mozilla 4.61 [en] (WinNT; I)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: "Shrinking the blob" (or text) columns.
References: <6#uAz0l4BHA.278@forums.sybase.com> <3CB76FE2.3E086675@start.com.au> <E9tFecE5BHA.309@forums.sybase.com> <3CBAB40F.DCF36BE8@start.com.au> <3CBADC4B.602@aol.com> <4yYhirK5BHA.257@forums.sybase.com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 15
NNTP-Posting-Host: 203.3.176.10
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:523
Article PK: 93689


Matt Rogish wrote:

> Well in our case it would take a lot of extra work to combine two indexes
> together (if we did the varchar( 1600 ) + text approach on a single row) or
> multiple rows (if we had a chain of varchars). So it wouldn't work very
> well. :)

I was thinking along the lines of 16K page size for ASE server-wide
and then using the maximum varchar length allowed with that page
size (once you take off the other space required for each of your
table's rows). Would that be enough or would your text be larger
than that?

-am © 2002


Michael Peppler Posted on 2002-04-12 20:42:37.0Z
From: Michael Peppler <mpeppler@peppler.org>
Subject: Re: "Shrinking the blob" (or text) columns.
Date: Fri, 12 Apr 2002 13:42:37 -0700
References: <6#uAz0l4BHA.278@forums.sybase.com>
User-Agent: Pan/0.11.2 (Unix)
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Comment-To: "Matt Rogish" <matt@fanhome.com>
Message-ID: <rRk08Om4BHA.298@forums.sybase.com>
Newsgroups: sybase.public.ase.product_futures_discussion
Lines: 40
NNTP-Posting-Host: gw.peppler.org 206.55.243.57
Path: forums-1-dub!forums-master.sybase.com!forums.sybase.com!not-for-mail
Xref: forums-1-dub sybase.public.ase.product_futures_discussion:538
Article PK: 93706


On Fri, 12 Apr 2002 12:56:54 -0700, Matt Rogish wrote:

> We use text datatypes in our application quite a bit. The problem is
> that even the smallest amount of entered data is padded to 2K (does it
> follow page size, so that with 4K pages the multiple is 4K?!). We have
> some 3.4 million rows which all are NOT NULL text (since without the
> page text it's pretty much useless! :)). So, regardless of the amount
> of actual text we have what, 6.6GB eaten up? It would be really nice if
> text columns had this restriction removed and *actually took up the same
> amount of space as data entered*. Or maybe lower the padding to x
> *bytes* instead of *kilobytes*.

I've used the following technique - not great but allows you to limit the
waste somewhat:

If the string to be inserted is less than 255 (or, if you are on 12.5,
less than the widest column you can get onto your page size and still
accomodate the rest of your row), insert it into a varchar(xxx) and leave
the TEXT column NULL.
If the string is longer, insert it into the TEXT column instead.

Now when you query, do something like select ... isnull(short_col,
text_col) ...

For message boards this will likely gain you a lot, as messages tend to
be pretty short.

Downside - I suspect that it blows away your full-text search.

Michael
--
Michael Peppler Data Migrations, Inc.
mpeppler@peppler.org *or* mpeppler@mbay.net
http://www.mbay.net/~mpeppler
International Sybase User Group: http://www.isug.com