Data Warehousing and Business Intelligence

21 June 2015

Varchar(255)

Filed under: Analysis Services — Vincent Rainardi @ 6:45 am

I’ve always wondered why people put 255 as the length of the varchar, not 200, 250 or 300. I know it’s from 2^8, but why use 2^8? I understand one of the reason that people argue is that “to match the binary nature of the data store” so we can safe space/not wasting space.

SQL Server 2005, 2008, 2012 and 2014 all has the storage size as the length of data entered + 2 bytes. So whether we create the column as varchar(255) or varchar(258), when the column contains “406 Middleton Street” it uses 22 bytes and when it contains “10 High Street” it uses 17 bytes. It is that simple: bytes = length + 2.

Now that I’m taking a role as a data architect, I feel obliged to know this thing. I need to know what is the right data type to use, is it varchar(200) or varchar(255). And the answer to this question is very simple: use varchar(200). Or 100, or 300. Just go with what is required, and we don’t need to pay attention to 255 (or 511, or 64, or 131). These 2^n-1 numbers don’t have any roles in the storage of SQL Server database. Just ignore them.

But where is this 255 convention come from? Yes it is from history / historical reason, but what is it? Yes it is 2^8-1, but what is it? What was the historical reason? What happened in the past, and when, and where?

Proposal #1: because of UTF-8 (link)
No that doesn’t make sense. UTF-8 (link) is 8 bits, not bytes.

Proposal #2: because in SQL Server 6.5 the maximum length for varchar was 255 (link, same page as above)
Bingo! This makes sense. And it is true (link). In SQL Server up to 6.5 the max length was 255, both for char and varchar.

#2 is why we don’t see people uses the other 2^n-1 like 131 and 511. It makes sense.

I know it is old story, but it is good to remember that up to version 6.5, SQL Server code was still Sybase (link), i.e. Microsoft SQL Server 1.0 = Sybase SQL Server 3.0. The brand was Microsoft but the code was Sybase. It is in Microsoft SQL Server 7.0 that this Sybase code base was changed (link).

Advertisements

1 Comment »

  1. […] Address: nvarchar(300). This cover street1 (70), street2 (70), street 3 (70), local area (50), but not city/town, state/county/provice, post code and country. This is for international address, not country specific. I don’t believe in 255 (link). […]

    Pingback by Data Types for Common Columns | Data Warehousing and Business Intelligence — 19 September 2015 @ 4:50 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: