Status

Goings-On

Working hard at my new job!

Graduated! B.A. in Applied Linguistics from PSU.

Spending way too little time reading comics, playing video games, and exercising...

Lingnik.com RSS Feed: Twitter Doings

@Xaosopher Nooooo! Have you become the dreaded monolith, here to inform the planet that you are here to destroy it since we haven't behaved? 2:43am Jun 26
@Xaosopher 243% of ntnl avg for fuel. But I do it to work at a hydro plant! That makes it all ok... Right? 2:24am Apr 3
@Xaosopher It was humorous! Katherine's old roommate got us screening tix on Thursday. 2:23am Mar 13
Touchdwn in Spokane, headed 2 Grand Coulee for the day. 7th largest hydroelec dam in the world, largest US power plant & concrete structure. 1:55am Nov 9

-kthx,lingnik@twitter

Feature Detection in The Visual Pattern Recognition of Reading

Giving a guest lecture in Dr. Thomas Dieterich's class on Linguistics and Cognitive Science at PSU this Thursday on The Visual Pattern Recognition of Reading. To that end, I'm releasing the paper I wrote last year on the topic, as well as my slides and notes, into the Creative Commons. FWIW.

Creative Commons License
Evidence and Consequences of Feature Detection in The Visual Pattern Recognition of Reading by Taylor J. Meek is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
These original publications may be accessed electronically from http://www.lingnik.com/visual-pattern-recognition

A Fast(er) Transact-SQL String Split() Function

Microsoft SQL Server 2005 has no SPLIT(InputString,Delimiter) function. For example, if you had some CSV (comma-separated-values) data like 'Toyota,Tacoma,2009,$20000', there is no built-in way in Transact-SQL to split that up into its discrete parts. There are other solutions out there that attempt to split this data up for you into either a table like (ColumnNumber Integer,ColumnData VarChar) with each row being a column in your single string, and then letting you PIVOT those into columns (CarMfg,CarMake,CarYear,CarCost).

Some of the other functions out there that I could find would iterate through the entire string character-by-character, seeking the next instance of @Delimiter within @InputString, marking that point, and using SUBSTRING() to go backwards in the string to the previous instance of @Delimiter to retrieve the string. Each time it found a delimiter, it would insert the column value and column number into a table variable @ResultTable.

This works for small strings, but as the length of each column within your string grows, this method becomes inefficient, especially when you're working with a large number of strings. My solution addresses this:

DECLARE @ResultTable TABLE (Col Integer, Val VarChar(255));
DECLARE @InputString VarChar(8000);
DECLARE @Delimiter VarChar(50);
SET @InputString = '1,2,3,4,5,6,7,8,9,0';
SET @Delimiter = ',';
----
DECLARE @Start Integer, @NextDelimiter Integer, @Length Integer, @Number Integer;
SET @InputString = @InputString + @Delimiter;
SET @Number = 1;
SET @Start = 1;
SET @Length = CHARINDEX(@Delimiter,@InputString,0) - 1;
WHILE @Start <= LEN(@InputString)
BEGIN
    INSERT @ResultTable VALUES(@Number, SUBSTRING(@InputString, @Start, @Length));
    SET @Number = @Number + 1;
    SET @Start = @Start + @Length + LEN(@Delimiter);
    SET @Length = CHARINDEX(@Delimiter,@InputString,@Start) - @Start;
END

Here are my results from testing these two methods:

Fields Rows FieldLength Intarweb
Method
My
Method
15 50000 1 18s 17s
15 50000 10 26s 17s
15 50000 100 149s 22s
100 50000 1 61s 56s
100 50000 10 200s 116s

As you can see, when the size of each column is the same, there really is no benefit, and as the number of columns grows, it does so exponentially. However, with the method described earlier, as the size of each column grows, it just gets ugly.

Syndicate content