KCOCCO
Google Refine and Translate 
Wondering what Japan is saying about Fusion-io?
Tweets can be hard enough to decipher, let alone translated tweets...


Here are some tips on translating tweets with Google Refine and the Google Translate REST API:

- When importing data into Google Refine, remember to specify UTF-8 character encoding.
- Get a Google Translate API key.
- From the tweet text column's drop-down menu, select Edit column > Add column by fetching URLs... and enter:
"https://www.googleapis.com/language/translate/v2?key=<access-key...goes here>&target=en&q=" + escape(value.substring(0,140),"url")

- This creates a new column containing the translation JSON. To parse it, use Edit column > Add column based on this column... on the new column.
- This expression extracts the translated text:
value.parseJson()["data"]["translations"][0]["translatedText"]
- This expression extracts the detected source language code:
value.parseJson()["data"]["translations"][0]["detectedSourceLanguage"]
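For anyone who wants the same lookup outside Refine, here is a minimal Python sketch of the steps above: building the fetch URL and pulling the two fields out of the JSON. It assumes the v2 endpoint from the Refine expression still answers in the `data.translations[0]` shape the GREL expressions expect; the function names are hypothetical, and `fetch_translation` needs a real API key.

```python
import json
import urllib.parse
import urllib.request
from typing import Tuple

# Endpoint copied from the Refine expression above; Google may have moved it since.
TRANSLATE_URL = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(api_key: str, text: str, target: str = "en") -> str:
    """Mirror the GREL expression: key, target language, and the first
    140 characters of the tweet, URL-escaped as the q parameter."""
    params = {"key": api_key, "target": target, "q": text[:140]}
    # urlencode escapes values with quote_plus (spaces become '+')
    return TRANSLATE_URL + "?" + urllib.parse.urlencode(params)


def parse_translation(body: str) -> Tuple[str, str]:
    """Extract the same two fields as the parseJson() expressions."""
    first = json.loads(body)["data"]["translations"][0]
    return first["translatedText"], first.get("detectedSourceLanguage", "")


def fetch_translation(api_key: str, text: str) -> Tuple[str, str]:
    """Network call; requires a valid API key."""
    with urllib.request.urlopen(build_translate_url(api_key, text)) as resp:
        return parse_translation(resp.read().decode("utf-8"))
```

`urlencode` escapes with `quote_plus`, which is close to, but not guaranteed identical to, Refine's `escape(..., "url")`.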

Comparing Twitter's user language, DataSift's prediction, and Google Translate's detected language gave some wildly different results. Check out the Google Refine text facets below:
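A text facet is essentially a count of each distinct value in a column. The minimal Python sketch below reproduces that for three language columns and adds a pairwise agreement rate, which puts a number on how differently the sources label the same tweets. The column names (`twitter_lang`, `datasift_lang`, `google_lang`) are hypothetical stand-ins for however the three labels were stored.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical column names for the three language labels on each tweet.
SOURCES = ["twitter_lang", "datasift_lang", "google_lang"]


def facet(rows: List[Dict[str, str]], column: str) -> Counter:
    """Count each distinct value in a column, like a Refine text facet."""
    return Counter(row.get(column, "") for row in rows)


def pairwise_agreement(
    rows: List[Dict[str, str]], sources: List[str] = SOURCES
) -> Dict[Tuple[str, str], float]:
    """Fraction of rows on which each pair of sources gives the same code."""
    result = {}
    for a, b in combinations(sources, 2):
        same = sum(1 for row in rows if row.get(a) == row.get(b))
        result[(a, b)] = same / len(rows) if rows else 0.0
    return result
```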



Got Dirty Data? Google Refine it! 
If you work with data (JSON, XML, and especially dirty data), you must check out
Google Refine. Thanks, Google!



Watch parts 2 and 3 for the real fun...
Fusion-io Brand Tracking 

Tough day of trading for FIO 25.50 -4.84 (-15.95%)!
I started my brand-tracking data capture for FIO today, capturing all (yep, all) tweets, Facebook posts, and a few other sources. An interesting day to start, following Fusion-io's Q2 results last night.
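The FIO quote reports the change relative to the previous close, which works out to 25.50 + 4.84 = 30.34, so the drop is 4.84 / 30.34, or about 15.95%. A tiny Python sketch of that check (the function name is hypothetical):

```python
def percent_change(close: float, change: float) -> float:
    """Percent change relative to the previous close (close minus change)."""
    previous_close = close - change
    return change / previous_close * 100


# FIO figures from the quote above
print(f"{percent_change(25.50, -4.84):.2f}%")  # prints -15.95%
```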

Tweet from the user with the highest Klout score, 61, with 9,409 Twitter followers:
regvulture - The Register:
"Fusion-io revenues flash upwards: Market punishes Fusion for margin fall. Server flash array vendor Fusion-io saw http://t.co/YNbdov3G"

Tweet from the user with the most Twitter followers, 11,331, with a Klout score of 52:
SeekingAlpha:
"Disappointing Margins Crush Fusion-io: Storage Wars Heat Up http://t.co/EZyZnyBR $OCZ $STEC $FIO"

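Picking the two tweets above amounts to a pair of max-by-key queries over the captured records. A minimal Python sketch, using only the two authors and the stats quoted in the post; the record layout, field names, and `top_by` helper are hypothetical.

```python
from typing import Dict, List

# The two authors quoted above, with the stats given in the post.
tweets: List[Dict] = [
    {"user": "regvulture", "klout": 61, "followers": 9409},
    {"user": "SeekingAlpha", "klout": 52, "followers": 11331},
]


def top_by(rows: List[Dict], field: str) -> Dict:
    """Return the record with the largest value in the given field."""
    return max(rows, key=lambda row: row[field])


print(top_by(tweets, "klout")["user"])      # regvulture
print(top_by(tweets, "followers")["user"])  # SeekingAlpha
```

With a full capture, the same call over thousands of records shows whether reach (followers) and influence (Klout) point at the same authors, which here they do not.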
Each article mentions some possible FIO competition: The Register mentions "EMC's Project Lightning" and SeekingAlpha mentions "OCZ Tech (OCZ)". I wonder who may be caught in the Innovator's Dilemma. Is EMC really not putting its A+ team and priority into flash because its MBAs made a graph showing flash cannibalizing their core business model? If EMC doesn't cannibalize/disrupt its business model, someone else will. Is Fusion-io brushing off OCZ's lower-sticker-cost flash solutions because they don't have the fastest X-per-Y performance and only serve a niche that is not as profitable... for now? OCZ < Fusion-io < EMC =?

I am personally still bullish* on FIO. The public-company curse of balancing cash flow for public display every three months must be a nightmare for a startup. Brand building, engaging new clients and verticals in proofs of concept, tweaking manufacturing for new products: I would expect these investments to pay off.

Data capture is running; predictive model building starts soon...

* I am a CS guy, not a stock guy. I own shares of FIO.

LucidCharts + Wordle = Google Prediction API Graphic 


I highly recommend LucidCharts for flowcharts and simple graphics; very slick! The word cloud was created with the classic Wordle. The above is a draft graphic for a research project using CrowdFlower to build a training dataset for Google Prediction API's cloud machine learning systems. Looking forward to crunching marketing data in my next modeling project!
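A minimal Python sketch of turning crowd-labeled tweets into a training file. Two assumptions, both ours rather than the project's: multiple contributor judgments per tweet are collapsed by a simple majority vote, and the output follows the label-first CSV layout (target value in the first column, features after it) described in the Prediction API documentation. The input tuple shape and function names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def majority_labels(
    judgments: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """judgments: (tweet_id, tweet_text, label) triples, one per contributor.
    Returns {tweet_id: (text, winning_label)} by simple majority vote."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    texts: Dict[str, str] = {}
    for tweet_id, text, label in judgments:
        votes[tweet_id][label] += 1
        texts[tweet_id] = text
    return {tid: (texts[tid], c.most_common(1)[0][0]) for tid, c in votes.items()}


def to_training_csv(labeled: Dict[str, Tuple[str, str]]) -> str:
    """One row per tweet: label first, then the tweet text, all fields quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for text, label in labeled.values():
        writer.writerow([label, text])
    return buf.getvalue()
```

Majority vote is the simplest aggregation; weighting contributors by their accuracy on gold questions would be a natural refinement.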


200 Terabyte Home Movie 


Some amazing graphical representations of data! Here is the company that was started from this work: Bluefin Labs. The days of Nielsen television-metering boxes are over. My latest project is in the world of machine learning with Google Prediction API: currently modeling a large dataset of CrowdFlower-labeled tweets. Fun with data, more to come...


