SQL Server's Journal (original) (raw)

Tuesday, November 30th, 2004
10:17a - Data Warehousing whens and whys (mildly obfuscating for confidentiality) I maintain a database of retail location information. Stuff like address, phone number, etc. for said retail locations. I like to think of data warehousing possibilities since that is a field I may be immersed in soon (and because I find it to be interesting), even though this particular database is not in a data warehouse. So here's the thing I'm contemplating at the moment, though I'm having trouble getting it all into words:First, lets assume that my table looks like this, and I'll load some data in it: LeaseIdAddressAddressStartDateAddressEndDate 101234 Main Street2000-01-019999-12-31 202345 Your Mall Blvd Suite 1002000-01-019999-12-31 _(Note above that I've learned a couple of ways to capture time periods: one is to put NULLs in the enddate column, the other is to put a far-flung date (preferably the largest the system can handle). I do the latter, because with the former you end up CASEing the nulls so you can do comparisons anyway. I've seen arguments for both, but the arguments eventually just say, pick one and use it consistently.)_Say a location changes their address. It seems to me that there are two facts I need to capture with three circumstances. First of all, we need to change the address. All right, in this system we just add a new row. We'll move location 20 to a new suite within their mall, and they did it on May 1st of 2004. LeaseIdAddressAddressStartDateAddressEndDate 101234 Main Street2000-01-019999-12-31 202345 Your Mall Blvd Suite 1002000-01-012004-04-30 202345 Your Mall Blvd Suite 2052004-05-019999-12-31 Peachy! Assuming you query this table with a where clause that has the current date (i.e. CURRENT_TIMESTAMP), the next query after that row is inserted will show the new address. But here's the thing I'm wondering: do you need to capture when the change was made to the database in addition to when the new address was valid, and if so what's the best way to do it? For instance, what if this data wasn't entered until October 1st? That means that the address was being incorrectly reported all that time. But if you just look at the data in November, it looks like it's been fine all that time.It seems that capturing when the change was made would be valuable at minimum to answer the question, "why have all the shipments been going to the wrong address?" It seems intuitive to me that there are other questions that could be answered, though I can't think of any at this second. This is an easy example for illustration. I'm thinking that this would be a much bigger deal with something like sales. So you report to your investors that your retail location sold four thousand dollars worth of goods in a month, say from a table that has an entry of location sales by day. But the fact is that your cash registers were screwed up and your sales didn't get transmitted for a full week. It seems like you need to be able to say that "hey, this was what we said but this is what it really is," or "we're going to report those lost sales in the next month" or something like that. It seems like you have to set everything up so that you can both query what was "real" at a given point in time and also query what the database "said" was real at a given point in time.Is that so? If so, what's the best way to do it? I'm going to find that out. current mood: productive (1 comment \|comment on this)
11:21a - Ghost characters... I've got a join which compares many columns within a table against another table. The problem is that there are two columns in the first table which are behaving... badly. Both are varchar fields. For some records this field has a value in it (one-digit number or letter), and for others it is supposedly blank.I can test one of these blank fields with things like:ISNULL(filter6)LEN(filter6) = 0filter6 = ''filter6 = ' 'And none of those evaluate to true. However the field is supposedly blank. I copy and paste it into a text file and there's nothing there. I tested it using SELECT LEN(filter6) and got a value of 1. I want to set this value to an empty string, so that it won't screw up my joins, however for the life of me I have no idea what the hell this thing is. For reference the data in this table comes from a bcp'd text file with fixed-length fields using a format file. None of the other columns have any problems; weather they're left empty or not... just filter6 and filter7. I've also tried rtrim(ltrim()) on it to see if I can pull out any leading/trailing spaces and opening the original file in either notepad or vi to see if I can find out where the hell this came from... nothing.So, I tried to convert it a bit to see what I could get out of it.SELECT CONVERT(INT, filter6) FROM tableReturns:Server: Msg 245, Level 16, State 1, Line 1Syntax error converting the varchar value '.' to a column of data type int.But as you can probably guess, trying to update on a value of '.' doesn't really pan out either. Any ideas? current mood: confused (8 comments \|comment on this)

Tuesday, November 30th, 2004

10:17a - Data Warehousing whens and whys (mildly obfuscating for confidentiality) I maintain a database of retail location information. Stuff like address, phone number, etc. for said retail locations. I like to think of data warehousing possibilities since that is a field I may be immersed in soon (and because I find it to be interesting), even though this particular database is not in a data warehouse. So here's the thing I'm contemplating at the moment, though I'm having trouble getting it all into words:First, lets assume that my table looks like this, and I'll load some data in it: LeaseIdAddressAddressStartDateAddressEndDate 101234 Main Street2000-01-019999-12-31 202345 Your Mall Blvd Suite 1002000-01-019999-12-31 _(Note above that I've learned a couple of ways to capture time periods: one is to put NULLs in the enddate column, the other is to put a far-flung date (preferably the largest the system can handle). I do the latter, because with the former you end up CASEing the nulls so you can do comparisons anyway. I've seen arguments for both, but the arguments eventually just say, pick one and use it consistently.)_Say a location changes their address. It seems to me that there are two facts I need to capture with three circumstances. First of all, we need to change the address. All right, in this system we just add a new row. We'll move location 20 to a new suite within their mall, and they did it on May 1st of 2004. LeaseIdAddressAddressStartDateAddressEndDate 101234 Main Street2000-01-019999-12-31 202345 Your Mall Blvd Suite 1002000-01-012004-04-30 202345 Your Mall Blvd Suite 2052004-05-019999-12-31 Peachy! Assuming you query this table with a where clause that has the current date (i.e. CURRENT_TIMESTAMP), the next query after that row is inserted will show the new address. But here's the thing I'm wondering: do you need to capture when the change was made to the database in addition to when the new address was valid, and if so what's the best way to do it? For instance, what if this data wasn't entered until October 1st? That means that the address was being incorrectly reported all that time. But if you just look at the data in November, it looks like it's been fine all that time.It seems that capturing when the change was made would be valuable at minimum to answer the question, "why have all the shipments been going to the wrong address?" It seems intuitive to me that there are other questions that could be answered, though I can't think of any at this second. This is an easy example for illustration. I'm thinking that this would be a much bigger deal with something like sales. So you report to your investors that your retail location sold four thousand dollars worth of goods in a month, say from a table that has an entry of location sales by day. But the fact is that your cash registers were screwed up and your sales didn't get transmitted for a full week. It seems like you need to be able to say that "hey, this was what we said but this is what it really is," or "we're going to report those lost sales in the next month" or something like that. It seems like you have to set everything up so that you can both query what was "real" at a given point in time and also query what the database "said" was real at a given point in time.Is that so? If so, what's the best way to do it? I'm going to find that out. current mood:

productive (1 comment |comment on this)

11:21a - Ghost characters... I've got a join which compares many columns within a table against another table. The problem is that there are two columns in the first table which are behaving... badly. Both are varchar fields. For some records this field has a value in it (one-digit number or letter), and for others it is supposedly blank.I can test one of these blank fields with things like:ISNULL(filter6)LEN(filter6) = 0filter6 = ''filter6 = ' 'And none of those evaluate to true. However the field is supposedly blank. I copy and paste it into a text file and there's nothing there. I tested it using SELECT LEN(filter6) and got a value of 1. I want to set this value to an empty string, so that it won't screw up my joins, however for the life of me I have no idea what the hell this thing is. For reference the data in this table comes from a bcp'd text file with fixed-length fields using a format file. None of the other columns have any problems; weather they're left empty or not... just filter6 and filter7. I've also tried rtrim(ltrim()) on it to see if I can pull out any leading/trailing spaces and opening the original file in either notepad or vi to see if I can find out where the hell this came from... nothing.So, I tried to convert it a bit to see what I could get out of it.SELECT CONVERT(INT, filter6) FROM tableReturns:Server: Msg 245, Level 16, State 1, Line 1Syntax error converting the varchar value '.' to a column of data type int.But as you can probably guess, trying to update on a value of '.' doesn't really pan out either. Any ideas? current mood:

confused (8 comments |comment on this)