All,

The below question was asked in one of the sql forum. I just
thought how to get solution for this kind of requirements in
sybase and came here to get the feasible solution.

Requirement:

Logic to measure the score is below:

there are a list of pre defined good and bad words with a
score of 1 and -1 respectively.
for each tweet, remove punctuation marks from text.
compare words from each tweet with the predefined words
list.
get score based on matched words.

Here is the sample DDL.

Predefined words and score:

CREATE TABLE #Words
(
Id int identity primary key
, Word char(10)
, Score int)

INSERT #Words
(Word, Score)
VALUES ('Good',1)
, ('Awesome', 1)
, ('Super', 1)
, ('Bad', -1)
, ('Fail', -1)
, ('Dirty', -1)


Tweets:

CREATE TABLE #Text
(Id int identity primary key
, [Text] varchar(140))

INSERT #Text
([Text])
VALUES
('New Bond movie is #awesome!')
, ('I hear dirty reviews. Product X is a fail. #fail')
, ('I am neutral!!!')


Result:

CREATE TABLE #Result
([Text] varchar(140), Score int)

INSERT #Result
VALUES
('New Bond movie is #awesome!',1)
, ('I hear dirty reviews. Product X is a fail. #fail',3)
, ('I am neutral!!!',0)


SELECT *
FROM #Result


For example, score for 'New Bond movie is #awesome!' is 1
because after removing punctuation mark (!) word awesome
matches with a word in the Words table and score is 1.
Score for 'I hear dirty reviews. Product X is a fail. #fail'
= 3 because of the words dirty, fail, and fail (after
removing #).

Query should be able to perform with a huge data set,
approximately 100K rows.


Inputs are welcome!