Monday, September 26, 2016

Tuning Very Large SQL-Server Databases - Part 3 (Index Optimization - Clustered / Non-Clustered)

This is the third in a series of three SQL-Server best practice tuning guides to help you achieve optimal performance, speed, and efficiency from your Very Large SQL-Server Databases.  Although these techniques apply to Microsoft SQL Server, they will generally also apply to other relational database management systems (RDBMS) like MySQL, Oracle, DB2, Postgres and others.

The series now continues with a focus on optimizing physical data-storage requirements (i.e., space reduction) by making smart use of Clustered Indexes when possible.

Tuning Very Large SQL-Server Databases — Part 3

In Part 1 of SQL-Server Large Database Performance Optimization Tuning and Part 2 of SQL-Server Large Database Performance Optimization Tuning, I discussed the need to Reduce Physical Storage Requirements and use proper data-normalization as the first foundation-step in achieving optimal SQL-Server processing speed, and explored WHY a reduction in physical storage requirements is so important. Whether the physical data-storage is hard-drive storage, RAM-device storage, SSDs (Solid State Disks), RAID arrays, or SAN / NAS networked storage, the need for speed implies a need to reduce storage requirements regardless.

Note: the database performance tuning concepts discussed in this series thus far apply nearly universally to all mainstream RDBMS (Relational Database Management System) products — not just Microsoft SQL-Server, but also the likes of MySQL, PostgreSQL, Oracle, IBM DB2, and others. And some of the concepts discussed herein (clustered indexes and such) apply to many other database solutions too, since the concept is not unique to SQL-Server.

Now I will discuss some specific database-tuning strategies with respect to indexes in SQL Server relational databases.

Clustered vs. non-Clustered Indexes in SQL-Server

One of the most important decisions to make when defining an indexing structure and strategy throughout your SQL-Server databases is how to best make use of Clustered and/or non-Clustered indexes on individual tables.

Knowing how, and when, to best use a clustered index can make a serious difference when performance tuning large SQL-Server databases. Basically, if a key-column is ALWAYS increasing (in sort order) as data is added, and that column is relatively "narrow" (in bytes), make it CLUSTERED and you'll save a LOT of space, since the data for the Clustered version of an index is already PART of the same file. This is why applying clustered indexing to pre-ordered data is a great fit, and why applying clustered indexes to monotonically increasing values (like those in IDENTITY columns) makes a lot of sense — the table's data is physically ordered by those values, and appending values to the end of a table can be done quite efficiently in most cases.

Certainly you can still use clustered indexes for data that is not pre-sorted or otherwise ordered already but, depending on the situation, this can lead to significant inefficiencies: SQL-Server will need to physically re-order the table's data as values are added, in order to keep the table arranged by the clustered-index column(s) — which can make disks thrash like crazy. Keep this in mind, especially in high-volume and large-database scenarios.
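As a minimal sketch of the pattern described above (the table and column names here are hypothetical, not from any real schema), clustering on a narrow, ever-increasing IDENTITY column looks like this:

```sql
-- Hypothetical OrderLines table: the narrow, monotonically increasing
-- IDENTITY column is a natural fit for the clustered index, so new rows
-- are appended at the logical end of the table rather than forcing
-- page splits and physical re-ordering mid-table.
CREATE TABLE dbo.OrderLines
(
    OrderLineID  INT IDENTITY(1,1) NOT NULL,
    OrderID      INT               NOT NULL,
    ProductID    INT               NOT NULL,
    Quantity     SMALLINT          NOT NULL,
    UnitPrice    SMALLMONEY        NOT NULL,
    CONSTRAINT PK_OrderLines PRIMARY KEY CLUSTERED (OrderLineID)
);
```

Because the clustered index *is* the table's physical row order, the 4-byte OrderLineID key is also what every non-clustered index on this table will carry as its row locator — another reason a narrow cluster key pays off.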

Here are some further points to consider regarding Clustered vs. non-Clustered Indexes and performance in SQL-Server:

  • As mentioned before, if there is less data (actual bytes) to move, regardless of what speed your system can move this data (whether in RAM or to and from disk), SQL-Server will still perform the same operation in less time — and, there is a good chance the elapsed processing time will decrease by a percentage greater than the percentage of storage-space you have saved...
  • ... so, choose an indexing strategy that prevents unnecessary data-movement operations, and/or minimizes data-movement. This will involve understanding what percentage of time your application spends performing INSERTS as opposed to SELECTS, UPDATES, and DELETES. Why does this matter? Well...
  • beware that UPDATES to Clustered-Index Columns can be painfully slow operations... perhaps the reasons are obvious; if not, let me know and perhaps I can expand upon this article in the future. Basically, just think about how, if a table is physically ordered by the column(s) that you are altering, the entire table is subject to physical re-ordering when those column value(s) change.
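One way to see whether out-of-order inserts or clustered-key updates are physically churning a table is to check index fragmentation. A sketch (substitute your own database and table names — `dbo.OrderLines` here is a placeholder):

```sql
-- Report fragmentation for every index on a (hypothetical) OrderLines
-- table; heavy fragmentation on the clustered index suggests cluster-key
-- values are NOT arriving in order, causing page splits and re-ordering.
SELECT  i.name                          AS IndexName,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.OrderLines'), NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id  = ps.index_id;
```

A clustered index that stays near-zero fragmentation under heavy insert load is a good sign the cluster key was chosen well.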


REAL-WORLD SQL-SERVER TUNING EXAMPLE(S)

Now I will quickly discuss a real-world (i.e., production-environment) time-savings and performance comparison I have witnessed before and after optimizing database indexing strategies, so you can get a feel for the magnitude of performance increases that are possible:

Before Tuning

I encountered a rather substantial SQL-Server database at a client site that had an Order-Lines table with millions of rows of data residing in it. The client had a custom application that performed inserts, updates, and deletions on this OrderLines table on a regular basis, and with a rather high volume of transactions.

And, though the custom application's GUI was responding within the threshold of "acceptable response time" at a few seconds (up to as much as 10-15 seconds under the heaviest loads throughout the day), the delay was bothersome (and costly) as hourly workers had to basically slow down and wait for the application to "catch up to them".


Tuning Performed

A quick examination of the database stored procedures used to perform Order-Line additions, updates, and deletes revealed which database tables were involved in the order-processing maintenance business process (and related application GUI forms), and which tables, columns, and indexes these procedures interacted with. Further digging revealed that the primary Order-Lines table already had a clustered index, applied to a group of columns that, together, uniquely identified each OrderLine row.

I also noticed that the table in question already had an OrderLineID column (a unique, monotonically increasing, auto-increment IDENTITY value). Perhaps it was added at a later point in the software development and database design process — who knows — but it jumped out as an easy, immediate, on-the-spot fix opportunity that would deliver serious performance improvements.

I simply altered the table's index definitions to make the IDENTITY column the Clustered index, and the multi-column identifying column-group a non-clustered index — that is, the new post-tuning setup clustered on the Identity values instead of on a combination of other columns' values, and what used to be the clustered index became a multi-column non-clustered index.
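In rough T-SQL terms, the change looked something like the following. These names are hypothetical reconstructions (not the client's actual schema), and on a live multi-million-row table you would schedule this during a maintenance window:

```sql
-- 1) Drop the old multi-column clustered index.
DROP INDEX IX_OrderLines_Composite ON dbo.OrderLines;

-- 2) Re-cluster on the narrow, monotonically increasing IDENTITY column.
CREATE UNIQUE CLUSTERED INDEX IX_OrderLines_ID
    ON dbo.OrderLines (OrderLineID);

-- 3) Re-create the old identifying column-group as a non-clustered
--    UNIQUE index, so lookups by those columns stay fast and the
--    uniqueness guarantee is preserved.
CREATE UNIQUE NONCLUSTERED INDEX IX_OrderLines_Composite
    ON dbo.OrderLines (OrderID, LineNumber, ProductID);
```

Note the ordering matters: dropping the clustered index first means the table is briefly a heap, and every non-clustered index is rebuilt when the new cluster is created — another reason to do this in one planned operation.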

After Tuning

The results were dramatic, and GUI operation responsiveness became nearly instant (as compared to multi-second response times). I also had to verify (as you always should) that my SQL Server Database index-tuning changes did not negatively impact the performance of OTHER application functions — like reporting operations, ad-hoc queries, web-application access to this same data, and so on.

This testing proved out OK, as no regressions were encountered; and, thanks to the opportunity to do this regression testing, I was even able to make a few other index-tuning changes that delivered additional speed gains and performance improvements to various portions of the database applications. This is just one simple case of looking for "low hanging fruit" to pick when performance tuning with indexing strategy considerations in focus.
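When verifying that index changes have not hurt other parts of an application, SQL-Server's index-usage statistics can help show which indexes are actually being read versus merely maintained. A sketch (object names are placeholders, and these DMV counters reset on instance restart):

```sql
-- How often each index on the table is used for seeks/scans/lookups,
-- versus how often it must be updated; a rarely-read but frequently
-- updated index is a candidate for further review.
SELECT  i.name AS IndexName,
        us.user_seeks,
        us.user_scans,
        us.user_lookups,
        us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON us.object_id   = i.object_id
       AND us.index_id    = i.index_id
       AND us.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID('dbo.OrderLines');
```

This does not replace actually re-running the reports and ad-hoc queries, but it quickly flags indexes whose usage pattern changed after the tuning.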

I hope you have found this a useful series of Microsoft SQL-Server database tuning and optimization information. Feel free to leave a comment to contact me if your business has a particular SQL performance-tuning challenge, or if you wish to leave any suggestions, point out errata, etc. Enjoy!

by Mike Eberhart (originally published: Jun-2007; updated Sep-2016)

Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

Sunday, September 25, 2016

Tuning Very Large SQL-Server Databases - Part 2 (Physical Storage Optimization - Continued)

This is the second in a series of three SQL-Server best practice tuning guides to help you achieve optimal performance, speed, and efficiency from your Very Large SQL-Server Databases.  Although these techniques apply to Microsoft SQL Server, they will generally also apply to other relational database management systems (RDBMS) like MySQL, Oracle, DB2, Postgres and others.

The series continues with a focus on optimizing physical data-storage requirements (i.e., space reduction), since it is perhaps the most important overall tuning consideration for large databases.

Tuning Very Large SQL-Server Databases — Part 2

In Part 1 of SQL-Server Large Database Performance Optimization Tuning, I discussed the need to Reduce Physical Storage Requirements and use proper data-normalization as the first foundation-step in achieving optimal SQL-Server processing Speed. I am now going to go deeper into explaining the reasons why this is so important and how it can make considerable speed and performance improvements possible in SQL Server relational databases.

Your SQL Database Server has Finite Resources

I do not care how modern or powerful your database server is, how many processors it has, or how much physical memory (RAM) or hard drive space it has. The fact is, your SQL-Server machine has finite resources and bandwidth / processing-power, and it is utterly foolish to waste that power on pushing around extra data when it is not necessary.

In my years of dealing with large databases for various clients, I repeatedly see management more willing to throw huge sums of money at a "hardware solution" to a performance problem instead of applying some common sense database tuning and optimization techniques. Attempting to fix a performance problem by just throwing ever larger and more powerful computers at it may work in some cases, but in many situations it will just act to postpone the inevitable performance implosion that is lurking in poorly optimized databases. Certainly your database server equipment needs to be up to the task, but the "task" should not include performing completely avoidable disk I/O and extraneous processing.

I don't want to get too deep into how to configure an optimal SQL-Server hardware solution here, but the general things you will need to be aware of when tuning and optimizing for very large SQL databases include:
  • First and foremost, you will need to understand the various (interacting) limitations of Microsoft Operating Systems and Microsoft SQL Server versions. Both the Windows Server Operating System and the database product impose limits on the physical resources that can be utilized by your database application.  There are "Standard" and "Enterprise" (and other) versions of each product, and they each impose limitations on the number of processors and RAM that will be made available to your database, not to mention the availability or lack of other substantial features of the database or OS software itself.
  • Hardware: When I originally wrote this how-to article back in 2007, CPUs were nowhere near as advanced as they are in 2016; RAM was considerably more expensive, as were disk drives; SSDs were seen as ridiculously expensive; and things like virtual servers and cloud-based SQL-Server solutions were rather in their infancy. Regardless, some basics remain the same about how to choose appropriate hardware for your SQL-Server requirements.

    Processors (CPUs) — SQL Server can really benefit from multiple CPUs and processor cores. At a minimum, you should have a dual-processor server if you really want to improve performance.

    Likewise, RAM is your friend, and SQL-Server's friend too. Get as much as you can afford, and be sure to couple the version of SQL-Server and Windows Server with the amount of RAM in your system.

    Disk Space — you had better have a RAID array of some sort, or an ultra high-speed SAN or such for your physical storage device (if you are not simply using some hosted cloud solution that takes care of all this for you). And, though you will need a large amount of space, increasing the physical number of hard-drives (whether old-school spindles, or modern SSDs) is how to really start boosting performance. I.e., if you have your database on one or two huge disks, instead of a dozen or more (probably smaller) drives in an array that in aggregate offers the space you need, chances are you'll quickly reach the limitations of the disk subsystem's throughput.

    Note: our main focus is to keep physical disk I/O to a minimum, and use RAM wherever possible since physical disk I/O, even on the fastest drive arrays, is many orders of magnitude slower than going to RAM for data.

    Network — if you'll be moving large quantities of data to and from the database server machine(s), you best have a modern ultra-high-speed network in place between those machines.


Moving Data around is Costly, and occurs within the Confines of your Hardware

I've focused on minimizing the data-storage requirements in your massive SQL-Server database for the sole purpose of increasing performance. The primary way this strategy improves performance is by reducing the amount of data that is moved around, especially to and from (slow) disk drives.

Here are some explanations for how minimizing data-storage requirements increases performance and speed within your SQL Server database solution:
  • Just think about it: if you have less data (actual bytes) to move, regardless of what speed your system can move this data (whether in RAM or to and from disk), your machine will be able to perform the same operation in less time — and, there is a good chance the elapsed processing time will decrease by a percentage greater than the percentage of storage-space you have saved...
  • ...this is especially true if the reduced data-size causes operations that otherwise were unable to be performed in RAM to now fit within the confines of your server's RAM. If disk-swapping has been eliminated, performance improvements will be stellar.
  • Every SQL-Server operation, whether you are performing a SELECT, INSERT, UPDATE, or DELETE, must move data between RAM and physical storage. Smaller data equals faster movement and higher row-throughput per time-slice. Period. I am not ignoring caching here either: keep in mind that even data retrieved from a cache resides in some type of physical storage, even if that is just another location in RAM, and the data still has to move around.
  • If an operation is logged, as are any operations that modify data, the amount of data that must be written to the log will also be reduced, and thus speed up your database operation. Optimizing your database design so as to reduce physical storage requirements can therefore also help keep your transaction-log size under control.
  • Index-widths (in bytes) should be smaller, thus reducing storage requirements, and even more so allowing much more index information to be cached in RAM at any given time — this is crucially important for the speed of JOIN operations, sorting, and much more.
  • Any comparison operations will be faster. And, all sorts of operations are comparison operations, including sorts (ORDER BY). Things like JOIN operators, ORDER BY clauses, and WHERE conditions all require comparisons to be performed. And, the less data to compare, the smaller the amount of resources (RAM / disk) required to perform the comparison. If you've done a great job of reducing physical data-storage requirements, these operations will all run much faster.
  • Backup jobs, and even replication, will run much faster, since there is much less data to write out to tape or copy to another storage device. Likewise, data-recovery and transfer is much more efficient.

REAL-WORLD SQL-SERVER TUNING EXAMPLE(S)

Now I will quickly discuss a real-world (i.e., production-environment) time-savings and performance comparison I have witnessed before and after optimizing data-storage requirements, so you can get a feel for the magnitude of performance increases that are possible:

Before Tuning

I was confronted with a SQL-Server database a client had, where they basically "hit a wall" with their internally designed database and stored procedures. Their (relatively new) database was growing at such a phenomenal pace, both in row-counts and physical size, that their database-server hardware was being pushed to its limits, and some reporting processes were unable to finish in less than a couple hours.

There were a few core tables, each in excess of 50 million rows, that were involved in the queries which were so long running. The total database size, with indexes, was pushing past 45 Gigabytes, and growing by nearly 10 Gigabytes a month (and, Indexes were about ⅓ of the total DB size). This thing was a serious time-bomb ready to implode, and it was (of course) meant to be a data-warehouse critical to management-reporting in a Fortune-100 firm.

Tuning Performed

This was one of the largest databases I had ever seen built by someone with absolutely no clue how to properly normalize data and use the minimum amount of storage space possible. Though the few core tables had relatively "narrow" (byte-width) rows (just a couple hundred characters per row), a large part of each row was used as an index (NON-Clustered), and the bulk of the columns in this index were VarChar fields where the columns were mostly filled with data on each row. There were columns like "DivisionName" and "OfficeName" and the like in this Index, each containing data like "Midwest US" and "Houston, Texas Branch". I couldn't even believe it when I saw it.

So, I began by ripping those Varchars out, and replacing them with a TinyInt foreign-key to a Divisions table I made (there were only like eight total values!), and a SmallInt foreign-key to an Offices table I made (with less than a thousand offices). Needless to say, the database size took a large step downward in size. I continued with this type of work...
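The normalization step looked roughly like this (a simplified sketch with hypothetical table and column names — the actual data migration/backfill is omitted):

```sql
-- Small lookup tables with narrow integer keys.
CREATE TABLE dbo.Divisions
(
    DivisionID   TINYINT     NOT NULL PRIMARY KEY,  -- only ~8 values existed
    DivisionName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Offices
(
    OfficeID   SMALLINT      NOT NULL PRIMARY KEY,  -- fewer than 1,000 offices
    OfficeName VARCHAR(100)  NOT NULL
);

-- Add narrow foreign-key columns to the huge fact table (nullable at
-- first so existing rows are valid; backfill from the old VarChar
-- columns, then make them NOT NULL and drop the wide columns).
ALTER TABLE dbo.CoreFactTable ADD DivisionID TINYINT NULL
    CONSTRAINT FK_Fact_Division REFERENCES dbo.Divisions (DivisionID);
ALTER TABLE dbo.CoreFactTable ADD OfficeID SMALLINT NULL
    CONSTRAINT FK_Fact_Office REFERENCES dbo.Offices (OfficeID);
```

Replacing a repeated value like "Houston, Texas Branch" (21 characters per row) with a 2-byte SmallInt, across 50+ million rows, saves on the order of a gigabyte for that one column alone — before even counting the index savings.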

After Tuning

By the time I was done, the database was only a third of its original size. And, I rewrote the reporting process that was taking hours to run so it now joined on the various Integer columns (vs. all the varchars) and only brought in the "lookup data" (Office Names, etc.) when needed at the final output stage.

As soon as I broke through the size-threshold where much more of the information required to perform complex JOIN conditions could fit in the database server's 4 Gigabytes of RAM (which was relatively large RAM back in 2007), performance was now incredible. Procedures that took hours to run before were now running in a minute or less.

This all required very few coding changes — just some logical application of normalization techniques. The biggest challenge was getting enough processing time to convert the existing production database to the new layout, since its poor design and massive size made alterations a very slow process; at first, the production server didn't even have enough disk-space to allow the conversion to take place!

More such examples to come, as I move forward with SQL Server Performance tuning through proper database design, normalization, and even Transact-SQL coding...

Continue with Part 3 of SQL-Server Database Tuning now.


by Mike Eberhart (originally published: May-2007; updated Sep-2016)


Saturday, September 24, 2016

Tuning Very Large SQL-Server Databases - Part 1 (Physical Storage Optimization)

This is the first in a series of three SQL-Server best practice tuning guides to help you achieve optimal performance, speed, and efficiency from your Very Large SQL-Server Databases.  Although these techniques apply to Microsoft SQL Server, they will generally also apply to other relational database management systems (RDBMS) like MySQL, Oracle, DB2, Postgres and others.

The series begins with a focus on optimizing physical data-storage requirements (i.e., space reduction), since it is perhaps the most important overall tuning consideration for large databases.

Tuning Very Large SQL-Server Databases — Part 1

SQL-Server Performance Optimization and SQL-Server Speed Optimization should always be a primary consideration during the database design phase, regardless of the size of the relational database — with the exception perhaps of "tiny" or "trivial" databases where future growth is guaranteed not to be an issue. And, very large databases (in excess of a Gigabyte of data, or millions of rows of data in any particular table) definitely require the SQL-Server database designer / administrator to consider the ramifications of proper (or improper) database design. Perhaps this seems obvious, but given my many years of experience and encounters with large corporate databases, what should be obvious is anything but obvious to many developers, designers, and administrators that implement these massive databases.

Most of this discussion applies to all modern Microsoft SQL-Server versions (SQL 2008, 2012, 2014, 2016) as well as versions as early as SQL-Server 7.0, 2000, 2005. The newest versions of SQL-Server, especially the full-featured Enterprise editions, may also offer you additional features of relevance like built-in data-compression. But, even fancy features like data-compression are no substitute for proper design.

Reduce Physical Storage Requirements

The first thing I always make sure of when designing databases, and when tuning large SQL-Server databases for performance, is that physical storage space is not being wasted on data-types that are overly precise or excessively large for their intended function, and that data is properly normalized. If you don't understand the concepts of normalization, please study up on them before reading further (or pretending you can do your job as a SQL-Server developer or DBA).

Keep in mind that this discussion is all to be considered in the context of very large databases, where there will be millions of rows of data in one or more tables. In fact, you may be working with tens of millions or hundreds of millions of rows of data. As such, every byte of storage saved on a per-row basis will potentially result in Megabytes saved (in reduced physical storage), or perhaps even gigabytes saved in the overall table and database size.

I will now present some very simple rules for reducing physical storage requirements and some (storage space) savings-opportunity examples to consider:
  • Do not use NVarchar or NChar (vs. Varchar and Char respectively) unless you need Unicode character support. Using NVarchar (or NCHAR) where a standard Varchar/Char will suffice will double the storage requirements for a column, since Unicode characters map to double-byte pairs (to accommodate all those foreign language characters and extended symbols, etc).

    Savings Opportunity: I have encountered NVarChar use on columns holding an Alphanumeric "TransactionID" across millions of rows of data, where that AlphaNumeric value most certainly contains only the 10 digits and the standard 26 letters of the English alphabet. Instant space savings!
  • Use SmallDateTime instead of DateTime, unless you really need the precision. SmallDateTime fields are accurate to the nearest minute, whereas full DateTime types are accurate to 3.33 millisecond intervals. And, that extra precision comes with the price tag of double the storage — SmallDateTime requires 4 Bytes instead of 8 Bytes for DateTime.

    Savings Opportunities: if you see fields like "AccountingPeriodBeginDate", it should be obvious that an accounting period is going to start on a particular day boundary, and SmallDateTime will more than suffice. I also regularly see fields like "LastChangeDate" on tables in large databases where, upon inquiring, nobody can provide any reason why any accuracy beyond a particular minute is needed, but yet the fields are defined as full DateTime fields.
  • Use the smallest Integer data-types that will safely and comfortably handle all possible values in an integer-data column. SQL-Server provides a few datatypes for this, including bigint, int, smallint, and tinyint. Each has their storage limitations and storage-space requirements. TinyInt takes just one byte, but can only store values ranging from 0-255 (i.e., 2^8 values), whereas SmallInt is two bytes in length, and can store 2^16 values, Int is 4-bytes and holds 2^32 values, and BigInt is 8-bytes and holds 2^64 values (see the pattern?)

    Savings Opportunity: look for columns that have an obviously small number of values, but whose column-definitions are grossly oversized. E.g., if you see a field like (accounting) "Quarter" defined as an Int, TinyInt can easily hold the four possible values (Quarter 1, 2, 3, or 4) and saves 3 bytes/row. Or, perhaps you'll see UserID defined as an Int where there's no way in the next 100 years the application will have over 10,000 or 20,000 users — this sure looks like a good situation to downsize to SmallInt and save half the space!
  • As with Integer types, the money data-types come in a large and small size — Money and SmallMoney respectively; the first of which takes 8-bytes to store, and the latter which takes only 4 bytes. SmallMoney values are limited to roughly +/-214,000.0000 compared to hundreds of trillions for Money.

    Savings Opportunity: if you have per-unit Costs, Charges, or the like stored in your database columns where the price is never going to even begin to approach that $214,000 value, use SmallMoney and cut your storage in half. E.g., if you are storing the price of items sold in your convenience-store locations, or even the price-per-gallon of gasoline you sell, this limitation should suffice for the foreseeable future.
  • Decimal and Numeric data-types provide a bit more complexity when analyzing the proper "best" size for the fields. See the SQL-Server Books Online to size appropriately, since the maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point, will determine the storage-bytes required. You want to size your columns in a way that they will always be large enough to hold the values you will place in them, without over-sizing them.
  • Variable-length fields where fixed-length should be used. Varchar fields require 2 extra bytes in addition to the defined maximum field-width, in order to store the count of characters used. Varchars are great when there really is heavy variation in the actual length of the data being stored. But, they do not make sense if you'll always be filling all bytes in the field (or even all bytes minus two).

    Savings Opportunity: perhaps one of the most ridiculous things you will encounter is a one or two character field that is defined as Varchar(1) or (2). This is one sure way to make a one-character field take 3-bytes of storage per row. I have seen this repeatedly with things like "Flag" fields or "Code" fields people design into table structures. I personally think the RDBMS should not allow a user to even define Variable-width columns of just one or two characters.

    Note: SQL-Server has one oddity to watch out for to ensure that Char fields are truly treated as fixed-length fields — as stated in the Books Online: "If SET ANSI_PADDING is OFF when either CREATE TABLE or ALTER TABLE is executed, a char column that is defined as NULL is handled as Varchar"... just so you are aware (note: I find this wacky to say the least - if I define something as Char, I really mean Char and not Varchar)
  • Look for obvious cases where proper data-normalization is being ignored. Quite often, you'll encounter massive data-repetition in tables with millions or tens/hundreds of millions of rows. The key to speed is to make that repetitive data take up as little space as possible (without employing some data-compression algorithm that is going to kill performance on save/retrieves - in case you were thinking about such a thing).

    Savings Opportunity: start by looking for plainly blatant, ignorant denormalization. I have seen so many massive tables where something like "LastUpdateUser" is a 10 or 20 character (if not Unicode character) field, in databases where those last-update "Users" are a finite group of individuals that can be counted on both hands. Talk about a waste! If you encounter this type of situation, it's time to create an "UpdateUsers" table (or whatever) with an Integer primary-key (in this example, a TinyInt) and a field called "UserName" or whatever, and replace the "LastUpdateUser" column on the massive-rows table with a "LastUpdateUserID" TinyInt column (foreign-key to the new UpdateUsers table). This will save a ton of space! And, there is nearly zero cost in joining to such a small table later if you really need to get the name associated with the ID.

    Another Savings Opportunity: there is another technique I have employed to tune very large tables (in excess of 50 million rows/table) that may get a bit tougher to initially grasp. But, this technique can be a huge space and processing-time saver. It involves a bit of data-analysis, and thinking "out of the box". Basically, I will look for data-patterns that repeat, and whose unique (pattern) values are finite though not instantly apparent — especially when the patterns span multiple columns on the same table. I've routinely discovered that the unique values from a combination of few fields may only number in the thousands, even though the total table rows exceeds 50 or 100 million. When I find these, and establish the reason for the pattern (and ensure my logic will accommodate future conditions too), I will pull the unique field-combinations into a "lookup" table and create a small FK (SmallInt or Int quite often) to point to those combinations (and replace the column(s) whose data I have moved to a lookup table). Why? I'll get into this in the next section about why to do any of this space-tuning, but the bottom line is speed and efficiency. If you have a particular situation you need tuning help with, contact me.
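Several of the narrowing techniques above can be expressed as straightforward ALTER statements. The following is illustrative only — the table and column names are hypothetical, and on a real database you would first verify that no existing values exceed the new, smaller type's range:

```sql
-- Unicode not needed for a digits-plus-English-letters TransactionID:
ALTER TABLE dbo.Transactions
    ALTER COLUMN TransactionID VARCHAR(20) NOT NULL;   -- was NVARCHAR(20)

-- Minute accuracy is plenty for an accounting-period boundary:
ALTER TABLE dbo.AccountingPeriods
    ALTER COLUMN AccountingPeriodBeginDate SMALLDATETIME NOT NULL;  -- was DATETIME

-- Four possible quarters fit easily in a TINYINT (saves 3 bytes/row):
ALTER TABLE dbo.AccountingPeriods
    ALTER COLUMN [Quarter] TINYINT NOT NULL;           -- was INT

-- Per-unit prices far below the roughly $214,000 SmallMoney ceiling:
ALTER TABLE dbo.Items
    ALTER COLUMN UnitPrice SMALLMONEY NOT NULL;        -- was MONEY

-- A one-character flag should be fixed-width, not Varchar(1):
ALTER TABLE dbo.Items
    ALTER COLUMN StatusFlag CHAR(1) NOT NULL;          -- was VARCHAR(1)
```

Keep in mind that altering a column's type on a huge table is itself a potentially slow, logged operation (and indexes referencing the column must be dropped first), so batch these changes into a planned maintenance window.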

If you begin with the above space-savings guidelines, chances are you'll find a way to make some significant storage-space reductions in your SQL-Server databases. And, you will have made the first steps towards performing some serious speed-optimizations in your database as well.

One note of caution when making any database changes: be sure you understand the ramifications beyond the scope of the database. Many of the changes I've outlined will be more or less transparent to other applications, depending on what (languages) the applications are developed in, and how the applications access the data (note: I always recommend that applications do not have direct access to the data-tables, but rather access data through stored procedures — for many reasons, one of which is to allow simplified maintenance to the underlying database tables without impacting other applications).

If you don't know what side-effects your changes could cause for graphical user interfaces (GUI applications), reporting applications, and so on, then be sure to involve someone who does. E.g., when changing Int fields to SmallInt, a Borland (aka CodeGear) Delphi-based program most likely won't care, but C# data-bindings may have issues on an ASP.NET page or such. So, be careful. From what I have seen, both Delphi and C# seem fine with SmallDateTime in place of full DateTime fields in many cases, but it can also depend on how customized an application is.

Computing / Estimating Storage Requirements

To realize how much potential for storage-space savings there is, you will want to be able to quickly rough-out some approximations based on the columns in table(s), the number of bytes used per-column, and the number of rows per table.

In theory, you should be able to quickly estimate a table's (physical storage) size just by knowing the number of rows and multiplying that by the total of all columns' storage requirements. This can be put into a simple formula something like this (I didn't put ALL data types in here, but you should get the idea), where you multiply the number of rows in a table by:
(NVarChar-field-widths-in-aggregate * 2 + #-of-NVarChar-fields * 2) +
(VarChar-field-widths-in-aggregate + #-of-VarChar-fields * 2) +
(NChar-field-widths-in-aggregate * 2) +
(Char-field-widths-in-aggregate) +
BigInts*8 + Ints*4 + SmallInts*2 + TinyInts*1 +
DateTimes*8 + SmallDateTimes*4 +
Money*8 + SmallMoney*4 +
(Total-bit-fields DIV 8 + 1)
This calculation does not take into consideration keys or indexes, but it will give you the approximate data-space requirements. And it's a good way to quickly estimate reductions in storage space too. For the data types I didn't cover, see the "Data Types (Transact-SQL)" section of the Microsoft SQL-Server Books Online, which describes how much storage each data type requires.
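As a quick sanity-check, the same arithmetic is easy to script. The following is a minimal sketch (the function name and keyword parameters are my own inventions for illustration, not from any library), mirroring the formula above; like the formula, it ignores key/index overhead, page padding, and row headers:

```python
def estimate_table_bytes(rows, nvarchar_widths=(), varchar_widths=(),
                         nchar_widths=(), char_widths=(),
                         bigints=0, ints=0, smallints=0, tinyints=0,
                         datetimes=0, smalldatetimes=0,
                         money=0, smallmoney=0, bit_fields=0):
    """Rough per-table data-space estimate, per the formula above."""
    row_bytes = (
        sum(w * 2 + 2 for w in nvarchar_widths)  # 2 bytes/char + 2-byte length
        + sum(w + 2 for w in varchar_widths)     # 1 byte/char + 2-byte length
        + 2 * sum(nchar_widths)
        + sum(char_widths)
        + bigints * 8 + ints * 4 + smallints * 2 + tinyints
        + datetimes * 8 + smalldatetimes * 4
        + money * 8 + smallmoney * 4
        + (bit_fields // 8 + 1 if bit_fields else 0)
    )
    return rows * row_bytes

# E.g., 50M rows with one Int key, one DateTime, and a VarChar(50):
big = estimate_table_bytes(50_000_000, varchar_widths=(50,), ints=1, datetimes=1)
# Swapping DateTime -> SmallDateTime and Int -> SmallInt saves 6 bytes/row:
small = estimate_table_bytes(50_000_000, varchar_widths=(50,), smallints=1,
                             smalldatetimes=1)
print((big - small) / 1_000_000)  # ~300 MB saved on one table
```

Running a few what-if scenarios like this before altering a schema makes it easy to see which column-type changes are actually worth the migration effort.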

Continue with Part 2 of SQL-Server Database Tuning now.


by Mike Eberhart (originally published: May-2007; updated Sep-2016)

Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

Friday, September 23, 2016

SQL SERVER Function (UDF) to Parse / Split string of delimited Integers into Table

SQL Server UDF to Parse / Split string of delimited Integers into Table

Here is an incredibly useful Microsoft SQL-Server Transact-SQL (T-SQL) user-defined function (UDF) for splitting a string of delimited integer values into a table result.

This is very handy if you have a list of integer index or primary-key values for which you want to retrieve rows from a table. Just pass a concatenated, delimited string of integer values to this function, and JOIN to the results of this function or use IN within the SELECT condition, etc.

For example, @vIDs is a string of comma-separated values, like '1,2,4,22,101' :

CREATE PROCEDURE uspDoWhatever
    @vIDs    VARCHAR(8000)
AS

SELECT [column list]
  FROM [SomeTable]
  WHERE [intColumnName] IN (SELECT * from dbo.udfGetDelimitedIntegerListAsTable(@vIDs, ','))

This comes in very handy when working with one-to-many relationships in UIs, whether web-based or otherwise, and other applications.

I will add links here to additional related functions and procedures.

SQL-Server User Defined Function (UDF) Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2016
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- This function takes a string of delimited numbers and explodes them out into a table.  
-- By default, the delimiter is set to '|' if another is not specified in the second parameter.
--
-- The function also selects any "missing" values as NULL.
-- E.g., if a comma-delimited string to parse is '1,,,2' the function assumes each (missing) 
-- value between consecutive commas represents a NULL value.
-- The calling code can turn NULLs into whatever is needed.
--
-- NOTE: There is no error testing for malformed strings that contain non-numeric values other
-- than the defined delimiter.  Also, it is currently set up for INT values at most (vs. BIGINT).
--
-- SOME EXAMPLES / TEST CASES:
-- ===========================
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable](NULL, ',');       --No rows returned
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable](NULL, NULL);      --No rows returned
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1', NULL);       --No rows returned
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('', ',');         --No rows returned
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1', DEFAULT);    --Returns ONE value
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1|2|3', DEFAULT);--Returns THREE values
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1,2,3', ',');    --Returns THREE values
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1,', ',');       --Return '1' and NULL
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1,2,', ',');     --Two values and NULL
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable](',', ',');        --Return TWO NULLs
--SELECT * FROM dbo.[udfGetDelimitedIntegerListAsTable]('1,2,3', DEFAULT);--Conversion Error!
--**********************************************************************************************
CREATE FUNCTION [udfGetDelimitedIntegerListAsTable]
(
  @IntegerList  VARCHAR(MAX),
  @Delimiter    CHAR(1) = '|'
)
RETURNS @IntegersTable TABLE (IntegerFromList  INT)
AS
BEGIN
  IF (@IntegerList IS NULL) OR (LEN(@IntegerList) = 0) OR (@Delimiter IS NULL) RETURN;

  DECLARE  @DelimPos INT = PATINDEX('%' + @Delimiter + '%', @IntegerList);

  WHILE @DelimPos <> 0
  BEGIN
    --If nothing between delims, save as NULL
    IF LEN(SUBSTRING(@IntegerList, 1, @DelimPos - 1)) = 0
      INSERT INTO @IntegersTable(IntegerFromList)  VALUES(NULL);
    ELSE
      INSERT INTO @IntegersTable(IntegerFromList)
      VALUES(CONVERT(INT, SUBSTRING(@IntegerList, 1, @DelimPos - 1)));

    SET @IntegerList = SUBSTRING(@IntegerList, @DelimPos + 1, LEN(@IntegerList));
    SET @DelimPos = PATINDEX('%' + @Delimiter + '%', @IntegerList);
  END --While...

  --If no additional chars after a final delim, treat as an additional NULL
  IF LEN(@IntegerList) = 0
    INSERT INTO @IntegersTable(IntegerFromList)  VALUES(NULL);
  ELSE
    INSERT INTO @IntegersTable(IntegerFromList)  VALUES(CONVERT(INT, @IntegerList));

  RETURN;

END --Function



Wednesday, September 21, 2016

SQL-Server Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 2 - Multiple Simultaneous Running-Totals

SET-BASED Multiple Simultaneous Running Subtotals in SQL

Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?

Here I present a cursorless method of implementing such functionality in a SQL-Server query, stored procedure, or user-defined function (UDF) that works with all versions of SQL-Server (2005, 2008, 2008 R2, 2012, 2014, 2016, ...) without requiring the SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals, and it is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.

This example builds off of Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 1, which introduces the techniques shown here (and more), and includes more discussion of the solution(s) — see the comments in the example source code.

SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using the SQL-Server 2012 OVER and PARTITION BY (windowing) clauses, which are ANSI-standard SQL now available to Microsoft Transact-SQL query writers. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.

Note: for this example, I used the SQL Server 2012 AdventureWorks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and averaged just under 3 seconds to execute within a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.

SQL-Server Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2016
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @starttime DATETIME = GetDate();

--Accumulator variables
DECLARE @RunningTotal           MONEY = 0.0;
DECLARE @RunningTranDtSubtotal  MONEY = 0.0;
DECLARE @RunningProductSubtotal MONEY = 0.0;

--Our break-level-value variables (Data-Types match those of the columns we are comparing to)
--Initialize these to some values that will NOT exist in each break-column's actual data.
DECLARE @SubTotalBreakValue1    CHAR(8) = '18010101';
DECLARE @SubTotalBreakValue2    INT = -1;


--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
    UniqueID                INT IDENTITY NOT NULL PRIMARY KEY,
    SubtotalBreakColumn1    CHAR(8),
    SubtotalBreakColumn2    INT,
    ActualCost              SMALLMONEY,
    RunningTranDtSubtotal   MONEY,
    RunningProductSubtotal  MONEY,
    RunningTotal            MONEY
);

--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
    SubtotalBreakColumn1,
    SubtotalBreakColumn2,
    ActualCost,
    RunningTranDtSubtotal,
    RunningProductSubtotal,
    RunningTotal
)
SELECT
    dbo.[udfConvertDateToString](TransactionDate),
    ProductID,
    ActualCost,
    0,
    0,
    0
FROM
    Production.TransactionHistory
ORDER BY
    TransactionDate, ProductID    --Insert into table in our subtotal-break order (IMPORTANT!)
;

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- See SET-BASED RUNNING SUBTOTALS Example 1, which this builds upon as discussed 
-- in the comments therein.
--**********************************************************************************************
UPDATE
    @ResultsTable
SET
    @RunningTotal           = RunningTotal          = @RunningTotal + ActualCost,
    @RunningTranDtSubtotal  = RunningTranDtSubtotal =
        CASE 
            WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1 
                THEN @RunningTranDtSubtotal + ActualCost
            ELSE ActualCost 
        END,
    @RunningProductSubtotal = RunningProductSubtotal =
    CASE 
        WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND 
                @SubTotalBreakValue2 = SubtotalBreakColumn2  ) 
            THEN @RunningProductSubtotal + ActualCost 
        ELSE ActualCost 
    END,
    @SubTotalBreakValue1    = SubtotalBreakColumn1,
    @SubTotalBreakValue2    = SubtotalBreakColumn2
;

--Output the results... format to fit printout better.
SELECT
    UniqueID,
    SubtotalBreakColumn1                    AS 'DateBreak',
    CONVERT(CHAR(6),SubtotalBreakColumn2)   AS 'ProdID',
    ActualCost,
    RunningTranDtSubtotal                   AS 'RunningDtSub',
    RunningProductSubtotal                  AS 'RunningProdSub',
    RunningTotal
FROM
    @ResultsTable;


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--**********************************************************************************************
--Using the AdventureWorks DB (with SQL2005), you'll see a large set with data like this:
--**********************************************************************************************
--UniqueID DateBreak ProdID ActualCost   RunningDtSub      RunningProdSub    RunningTotal
---------- --------- ------ ------------ ----------------- ----------------- ---------------------
--1        20030901  3      0.00         0.00              0.00              0.00
--2        20030901  316    0.00         0.00              0.00              0.00
--...
--...
--25       20030901  534    0.00         0.00              0.00              0.00
--26       20030901  707    20.994       20.994            20.994            20.994
--27       20030901  707    20.994       41.988            41.988            41.988
--28       20030901  707    20.994       62.982            62.982            62.982
--...
--...
--75       20030901  707    34.99        1105.684          1105.684          1105.684
--76       20030901  707    34.99        1140.674          1140.674          1140.674
--77       20030901  708    34.99        1175.664          34.99             1175.664
--78       20030901  708    34.99        1210.654          69.98             1210.654
--79       20030901  708    34.99        1245.644          104.97            1245.644
--...
--...
--3942     20030901  999    388.7928     1674280.9813      7775.856          1674280.9813
--3943     20030901  999    388.7928     1674669.7741      8164.6488         1674669.7741
--3944     20030901  999    388.7928     1675058.5669      8553.4416         1675058.5669
--3945     20030902  3      0.00         0.00              0.00              1675058.5669
--3946     20030902  316    0.00         0.00              0.00              1675058.5669
--...
--...
--113437   20040903  929    37.086      1086.456          37.086             27307057.0617
--113438   20040903  930    43.0395     1129.4955         43.0395            27307100.1012
--113439   20040903  931    34.3455     1163.841          34.3455            27307134.4467
--113440   20040903  931    34.8705     1198.7115         69.216             27307169.3172
--113441   20040903  932    39.7635     1238.475          39.7635            27307209.0807
--113442   20040903  932    39.2385     1277.7135         79.002             27307248.3192
--113443   20040903  948    82.8345     1360.548          82.8345            27307331.1537


--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #2 - Multiple Simultaneous Running-Totals
--
-- Average run-time against source data count of 113K rows in AdventureWorks2012 is decent: 
-- 2980ms (3.0 seconds)
--**********************************************************************************************



--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using 
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

--DECLARE @starttime DATETIME = GetDate();
SET  @starttime = GetDate();

SELECT
    dbo.[udfConvertDateToString](TransactionDate) AS DateBreak, --show without zero'd HHMMSS info
    ProductID AS ProdID, 
    ActualCost, 
    SUM(ActualCost) OVER
        (
        PARTITION BY TransactionDate
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningDtSub,
    SUM(ActualCost) OVER
        (
        PARTITION BY ProductID
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningProdSub,
    SUM(ActualCost) OVER
        (
        PARTITION BY @@SERVERNAME
        ORDER BY TransactionDate, ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningTotal
FROM
    Production.TransactionHistory
ORDER BY
    --Use @@SERVERNAME as first column in order by so we can create a "partition" of the ENTIRE set for Running-TOTAL.
    --ORDER BY requires non-constant-values for columns, and @@SERVERNAME yields a "constant" (without being flagged as such) 
    @@SERVERNAME,
    TransactionDate, 
    ProductID;

PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Multiple-Running-Subtotals / Total Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTION) BASED Running-Total/Running-Subtotals Technique, 
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 1220ms (1.2 sec) average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME. 
--
-- But, for a non-native and/or pre-SQL2012 solution, our original approach we laid out (above)
-- is perhaps as performant as possible while being relatively easy to implement. 
--**********************************************************************************************



Tuesday, September 20, 2016

SQL-Server Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 1

SET-BASED Running Subtotals in SQL

Have you ever wanted to perform a set-based running-subtotals and/or running-totals operation in SQL-Server (i.e., have row-level accumulators store a running total or running subtotal?) Want the solution to not use cursors? Don't have access to SQL-Server 2012 or newer yet?

Here I present a cursorless method of implementing such functionality in a SQL-Server query, stored procedure, or user-defined function (UDF) that works with all versions of SQL-Server (2005, 2008, 2008 R2, 2012, 2014, 2016, ...) without requiring the SQL2012-specific OVER / PARTITION BY functionality. I have seen running totals done before, with varied methods, but what I have not seen much about elsewhere is how to do running subtotals. The code that appears below does both running totals and running subtotals, and it is easily adapted to do multi-column break-value running subtotals, as noted in the code comments.

SQL-Server 2012 (or newer) Note: once SQL-Server 2012 became widely available, this example was updated to show how to implement the running totals and running subtotals using the SQL-Server 2012 OVER and PARTITION BY (windowing) clauses, which are ANSI-standard SQL now available to Microsoft Transact-SQL query writers. See the lower portion of the source code for a solution using these modern SQL features. Included are performance (run-time) comparison values for both solution methods.

Note: for this example, I used the SQL Server 2012 AdventureWorks sample database, which provided me with a test case of approximately 113,000 rows against which I perform the running-subtotal (by product level) logic and do my row-level accumulator manipulations (using SET logic). This query is relatively efficient and took just 2 seconds to execute within a VM (virtual machine) on my single-CPU Intel Core-i7 development desktop machine.

For a more complex running-totals / running-subtotals example that builds off this one, see Set-Based Running Subtotals (i.e., Row-Level Accumulators) Example 2 — Multiple Simultaneous Running-SubTotals, which builds upon the techniques shown here.

SQL-Server Source Code

--********************************************************************************
--This source code is Copyright (c) 2007-2016
--     Author: Mike Eberhart
--
--I hereby release this code under the terms of the MIT License (for freeware).
--
--Permission is hereby granted, free of charge, to any person obtaining a copy
--of this software and associated documentation files (the "Software"), to deal
--in the Software without restriction, including without limitation the rights
--to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
--copies of the Software, and to permit persons to whom the Software is
--furnished to do so, subject to the following conditions:
--
--The above copyright notice and this permission notice shall be included in
--all copies or substantial portions of the Software.
--
--THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
--IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
--FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
--AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
--LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
--OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
--THE SOFTWARE.
--********************************************************************************
--**********************************************************************************************
-- BEGIN: SET-BASED Running-Subtotals Technique #1
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @starttime DATETIME = GetDate();

--Accumulator variables
DECLARE @RunningTotal    MONEY = 0.0;
DECLARE @RunningSubtotal MONEY = 0.0;

--Our break-level-value variable (Data-Type should match that of the column we are comparing)
--Initialize this to some value that will NOT exist in the break-column's actual data.
DECLARE @SubTotalBreakValue1    INT = -1;

--Our work-table, where we can ensure the ORDER with which we later access the data, being
--by default, the order of the PRIMARY KEY declared here in UniqueID column.
DECLARE @ResultsTable TABLE
(
    UniqueID                INT IDENTITY NOT NULL PRIMARY KEY,
    SubtotalBreakColumn1    INT,
    ActualCost              MONEY,
    RunningSubtotal         MONEY,
    RunningTotal            MONEY
);

--Insert all values to be totaled, into our work table in the REQUIRED BREAK-LEVEL(S) ORDER
INSERT INTO @ResultsTable
(
    SubtotalBreakColumn1,
    ActualCost,
    RunningSubtotal,
    RunningTotal
)
SELECT
    ProductID,
    ActualCost,
    0,
    0
FROM
    Production.TransactionHistory
ORDER BY
    ProductID    --Insert into table in our subtotal-breaking order (IMPORTANT!)
;

--**********************************************************************************************
-- ALGORITHM EXPLANATION:
-- We will be updating the row-level accumulators now through an UPDATE statement.
--
-- The subtotal logic relies on the fact that our break-value-detection variable(s) 
-- (@SubTotalBreakValue1,[...]) have their value(s) updated AFTER the current row's running 
-- total and subtotal variable values have had their values calculated and updated.
-- The sequence of variable and column updates here is important, since the variable(s) for
-- break-detection comparisons must have access to "prior row" values essentially, during the
-- CASE statement that determines whether to "reset" a running-subtotal variable at a break.
--
-- Additionally, you will notice the running-total and running-subtotal variables are SET at
-- the same time the table-column value is assigned, taking advantage of the order of operation
-- within the statement, whereby we can update both the variable(s) and the column(s) with any 
-- newly calculated running-total/subtotal(s).
--
-- This logic can easily be extended to support subtotal-breaks that rely on multiple-column 
-- values simply by:
-- 1) declaring additional break-value variables
-- 2) adding additional column(s) to the work-table for those break-levels and/or any
--     additional subtotal-accumulator column(s) you desire
-- 3) altering the INSERT (into work-table) to also ORDER BY the additional break-column(s)
-- 4) altering the CASE statement used in the UPDATE to compare more than one column prior to 
--    resetting the Subtotal accumulator.  E.g.,:
-- CASE
--  WHEN ( @SubTotalBreakValue1 = SubtotalBreakColumn1 AND 
--    @SubTotalBreakValue2 = SubtotalBreakColumn2 AND...) 
--   THEN @RunningSubtotal + ActualCost 
-- ELSE 0
--**********************************************************************************************
UPDATE
    @ResultsTable
SET
    @RunningTotal       = RunningTotal      = @RunningTotal + ActualCost,
    @RunningSubtotal    = RunningSubtotal   =
        CASE 
            WHEN @SubTotalBreakValue1 = SubtotalBreakColumn1 
                THEN @RunningSubtotal + ActualCost
            ELSE ActualCost 
        END,
    @SubTotalBreakValue1= SubtotalBreakColumn1
;

--Output the results...
SELECT 
    SubtotalBreakColumn1 AS ProductID, ActualCost, RunningSubtotal, RunningTotal 
FROM 
    @ResultsTable
ORDER BY
    UniqueID
;

--**********************************************************************************************
--Using the AdventureWorks2012 DB, you'll see a large result set with data like this:
--**********************************************************************************************
--ProductID ActualCost   RunningSubtotal RunningTotal
------------- ---------- --------------- ------------
--1         50.2635      50.2635         50.2635
--1         50.2635      100.527         100.527
--1         50.2635      150.7905        150.7905
--1         50.2635      201.054         201.054
--1         50.2635      251.3175        251.3175
--...
--... skip forward to first product break now...
--... 
--1         50.2635      2211.594        2211.594
--1         50.2635      2261.8575       2261.8575
--2         41.916       41.916          2303.7735
--2         41.916       83.832          2345.6895
--...
--... and take a look at the end of the entire result set now...
--...
--999       388.7928     286518.694      27306402.3709
--999       388.7928     286907.4868     27306791.1637
--999       0.00         286907.4868     27306791.1637
--999       539.99       287447.4768     27307331.1537


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('Custom Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: SET-BASED Running-Subtotals Technique #1
--
-- Average runtime against source data count of 113K rows in AdventureWorks2012 is decent: 
-- 2100ms (2.1 sec) when selecting the results for output
-- 1560ms (1.56 sec) when calculating but not selecting results for output
--**********************************************************************************************



--**********************************************************************************************
--**********************************************************************************************
-- BEGIN: alternative SET-BASED Running-Subtotal and Running-Totals Technique using 
-- "OVER" (windowing) functionality introduced in SQL-Server 2012.
--**********************************************************************************************
--**********************************************************************************************
CHECKPOINT; --to produce a cold buffer cache
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- Remove all buffers from the buffer pool / clear data cache
DBCC FREEPROCCACHE    WITH NO_INFOMSGS; -- Clears the procedure (plan) cache

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

--DECLARE @starttime DATETIME = GetDate();
SET  @starttime = GetDate();

SELECT
    ProductID, 
    ActualCost,
    SUM(ActualCost) OVER
        (
        PARTITION BY ProductID
        ORDER BY ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningSubtotal,
    SUM(ActualCost) OVER
        (
        PARTITION BY @@SERVERNAME
        ORDER BY ProductID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS RunningTotal
FROM
    Production.TransactionHistory
ORDER BY
    --@@SERVERNAME appears as the first ORDER BY column (and in the PARTITION BY above)
    --because these clauses reject constant expressions; this function returns the same value
    --for every row (without being flagged as a constant), creating a "partition" of the ENTIRE set.
    @@SERVERNAME, 
    ProductID;


PRINT dbo.[udfUtility_PrintElapsedRuntimeDebugInfo]('OVER / PARTITION BY Running-Subtotal Example', @starttime, DEFAULT);
--SELECT DateDiff(ms, @starttime, GetDate()); --Display elapsed Milliseconds 
--**********************************************************************************************
-- END: "OVER" (WINDOWING FUNCTION) BASED Running-Total/Running-Subtotal Technique, 
-- which produces same result (NOTE: requires SQL-Server 2012+)
--
-- 610ms average : FAST!
-- Being "native" functionality in SQL-Server 2012, it is not surprising this approach produces
-- the SPEEDIEST EXECUTION TIME. 
--
-- But, for a non-native and/or pre-SQL2012 solution, the original approach laid out (above)
-- is perhaps as performant as possible while remaining relatively easy to implement. 
--**********************************************************************************************
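For readers outside SQL Server (or stuck pre-2012), the semantics of the two windowed SUMs above reduce to a grouped cumulative sum. Here is a minimal Python sketch of the same logic; the function and variable names are mine, and the input is assumed to arrive pre-sorted by ProductID:

```python
def running_sums(rows):
    """rows: iterable of (product_id, actual_cost) tuples, ordered by product_id.
    Returns (product_id, actual_cost, running_subtotal, running_total) tuples,
    mirroring SUM(...) OVER (PARTITION BY ProductID ...) and the whole-set running SUM."""
    subtotal = total = 0.0
    prev = object()  # sentinel: guarantees the first row starts a new partition
    out = []
    for pid, cost in rows:
        if pid != prev:    # product break: new PARTITION BY ProductID window
            subtotal = 0.0
            prev = pid
        subtotal += cost   # accumulates within the current product only
        total += cost      # single "partition" spanning the entire set
        out.append((pid, cost, subtotal, total))
    return out

# Example: subtotal resets at the product break; total keeps accumulating.
print(running_sums([(1, 10.0), (1, 5.0), (2, 2.0)]))
# [(1, 10.0, 10.0, 10.0), (1, 5.0, 15.0, 15.0), (2, 2.0, 2.0, 17.0)]
```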


Continue to read this Software Development and Technology Blog for computer programming, software development, and technology Techniques, How-To's, Fixes, Reviews, and News — focused on Dart Language, SQL Server, Nvidia CUDA, VMware, TypeScript, SVG, other technology tips and how-to's, and my varied political and economic opinions.

Tuesday, August 30, 2016

Apple Computer: EU / Ireland Taxation and Jobs in 2016

European Commission orders Apple to Pay €13 billion ($14.5 billion) in back taxes

This is certainly one of the bigger financial news stories today.  And, Apple's official response to this EU judgement is rather predictable.  As Business Insider (BI) reports:
"Apple's official statement on the EU ruling against its Irish tax arrangements tells you all you need to know about what is at stake: You can have taxes, or you can have jobs, but Apple is in no mood to deliver both."
Really? Can Ireland, or any other jurisdiction, truly choose between jobs and taxes?
If so, let's do the math and make the right choice: take the TAXES!

Jobs or Taxes: Taxes Win!


According to that same BI article:
"In the 15 years since [the 1991 tax deal with Ireland], Apple has created thousands of jobs in Ireland. By 2015 it had 5,000 employees in the country [Ireland]."
OK, now the math, people:
What is $14.5 billion divided by the number of current employees [i.e., apparently circa 5,000 employees] (never mind that Apple did not carry this many employees for the entirety of the 15 years in question), then divided by those 15 years?  Well, it is a staggering $193,333 Dollars [173,333 Euros] per employee per year for 15 years; and again, this assumes the same headcount for the entire period, which was not truly the case.

So, forget the jobs and take the taxes! According to 2014 statistics, the average Irish worker only earned 34,465.85 Euros per year.  The Irish government could apply the €13 billion in taxes levied upon Apple toward, essentially, full career-length payouts to each and every one of those 5,000 current Apple workers, as this is enough money to pay each current worker nearly 87,000 Euros per year for the next 30 years!
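The arithmetic behind these figures is easy to reproduce; a quick Python check using the article's own numbers:

```python
back_taxes_usd = 14.5e9   # $14.5 billion judgement
back_taxes_eur = 13.0e9   # 13 billion euros
employees = 5000          # Apple's Irish headcount per the BI article
years = 15

# Per employee, per year, over the 15-year period in question:
print(round(back_taxes_usd / employees / years))  # 193333 dollars
print(round(back_taxes_eur / employees / years))  # 173333 euros

# Or spread as a 30-year "career-length" payout per current worker:
print(round(back_taxes_eur / employees / 30))     # 86667 euros/year, i.e. nearly 87,000
```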

The numbers are simply staggering!  Sure, you can argue that the periphery businesses built up around the Apple offices would lose some trade, but with those 5,000 people earning well over double the average annual Irish wage, they would still have plenty of money to spend on all sorts of things. Periphery businesses around larger employers make much of their revenue from what employees spend, and these employees would still have substantial income to spend, though perhaps not in the exact same neighborhoods.  As for any effect on building offices and facilities, those same people would probably build homes and/or start businesses of their own (at least, the ones that don't simply want to do nothing for 30 years), and Ireland currently has a housing shortage too.

Update: it may be easier to grasp the magnitude of this Apple tax if I show how much €13 billion amounts to when spread equally among every man, woman, and child in Ireland: it is still huge!  Ireland's population is approximately 6.4 million persons, so the math is quite easy: it works out to just over €2,000 for each and every Irish citizen... so a family of 4 would get roughly €8,000 in direct benefit if this proposed Apple tax settlement were simply distributed to the citizens equally.  That is extraordinary, and it really demonstrates the size of the settlement and how much tax Apple has saved, at the apparent cost of the average citizen, over the past decade or more.
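The per-capita version of the math checks out the same way (using the population figure quoted in this post):

```python
settlement_eur = 13.0e9
population = 6.4e6        # population figure used in this post

per_person = settlement_eur / population
print(round(per_person))      # 2031 -- "just over 2,000 euros" per person
print(round(per_person * 4))  # 8125 -- roughly 8,000 euros for a family of four
```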

US Tax Policy to Blame

The bottom line is this: such tax situations exist because the United States government, owned by and beholden to corporations and wealthy elites that can abuse existing tax law through shell companies and the like, refuses to simply implement a corporate tax on foreign-earned income that accrues and is owed immediately, just as tax on domestically-earned income is.  I.e., the USA needs to get rid of this ridiculous offshore zero-tax perpetual "holding-ground" for US corporate profits.

As a United States individual, you cannot simply park your earnings overseas and avoid paying taxes!  In fact, it is the extreme opposite.  Thanks to legislation like FATCA, you are instead burdened by extraordinary reporting requirements on any foreign income, and failure by an individual to declare their income and/or pay taxes upon any income recognized from offshore holdings/investments during a calendar year will lead to truly massive penalties and fines in addition to the tax owed.  So, why have corporations, and individuals able to use corporations, partnerships, and the like to hold massive portions of their wealth overseas, received such preferential treatment whereby their income is not taxable until they physically move it back into the United States?

Therein lies the obvious problem and the obvious reason that was already illuminated earlier: the US government tax-policy has been hijacked by powerful multinationals and ultra-wealthy individuals.
