The question below was asked on one of the SQL forums. I
wondered how to solve this kind of requirement in Sybase
and came here looking for a feasible solution.
The logic for measuring the score is as follows:
There is a list of predefined good and bad words with
scores of 1 and -1 respectively.
For each tweet, remove the punctuation marks from the text.
Compare the words from each tweet with the predefined words.
Get the score based on the matched words.
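To pin down what the expected scores should be, here is a minimal sketch of the steps above in Python. It is only a reference for checking results, not a Sybase solution. The names WORD_SCORES and score_tweet are made up for illustration, and I am assuming that matching is case-insensitive, that each punctuation mark is replaced by a space rather than deleted, and that only whole words count.

```python
import string

# Predefined words and scores, copied from the sample #Words rows in this post.
# Keys are lowercased because case-insensitive matching is assumed.
WORD_SCORES = {"awesome": 1, "super": 1, "bad": -1, "fail": -1, "dirty": -1}

# Translation table mapping every ASCII punctuation character to a space
# (assumption: punctuation is replaced by a space, not simply deleted).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def score_tweet(text):
    """Return the sum of the scores of all predefined words found in the text."""
    # Lowercase, strip punctuation, then split on whitespace into whole words.
    words = text.lower().translate(_PUNCT_TO_SPACE).split()
    # Every occurrence counts, so a word that appears twice is scored twice.
    return sum(WORD_SCORES.get(word, 0) for word in words)


if __name__ == "__main__":
    for tweet in (
        "New Bond movie is #awesome!",
        "I hear dirty reviews. Product X is a fail. #fail",
        "I am neutral!!!",
    ):
        print(score_tweet(tweet), tweet)
```

With the three sample tweets in this post it returns 1, -3 and 0, since dirty and fail each count as -1 and fail appears twice.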
Here is the sample DDL.
Predefined words and scores:
CREATE TABLE #Words
(Id int identity primary key
, Word char(10)
, Score int)
INSERT INTO #Words (Word, Score) VALUES
('Awesome', 1)
, ('Super', 1)
, ('Bad', -1)
, ('Fail', -1)
, ('Dirty', -1)
CREATE TABLE #Text
(Id int identity primary key
, [Text] varchar(140))
INSERT INTO #Text ([Text]) VALUES
('New Bond movie is #awesome!')
, ('I hear dirty reviews. Product X is a fail. #fail')
, ('I am neutral!!!')
CREATE TABLE #Result
([Text] varchar(140), Score int)
INSERT INTO #Result ([Text], Score) VALUES
('New Bond movie is #awesome!', 1)
, ('I hear dirty reviews. Product X is a fail. #fail', -3)
, ('I am neutral!!!', 0)
For example, the score for 'New Bond movie is #awesome!' is 1
because, after removing the punctuation marks (# and !), the word
awesome matches a word in the Words table whose score is 1.
The score for 'I hear dirty reviews. Product X is a fail. #fail'
is -3 because of the words dirty, fail, and fail (after
removing punctuation marks), each of which has a score of -1.
The query should perform well on a large data set,
approximately 100K rows.
Inputs are welcome!
Subject: Word Analysis in Sybase
Date: 20 Jan 2013 21:12:46 -0800
Xref: forums-1-dub sybase.public.ase.general:31664
Article PK: 1159175