A simple couchdb info widget for wordpress

I wrote a small WordPress widget to display some version details of the CouchDB server I’m running. Writing a WordPress widget wasn’t too hard (I used this excellent tutorial to get started), and parsing CouchDB’s JSON responses is even simpler.

The code for the widget (put it in ‘wp-content/plugins’ and enable the plugin afterwards) looks like this:

[php]
<?php
/*
Plugin Name: Couch-Info
Plugin URI: http://log4p.com
Description: Shows information of the configured couchdb server
Version: 0.1
Author: Peter Maas
Author URI: http://log4p.com
*/

add_action("widgets_init", array('Couch_Info_Widget', 'register'));

class Couch_Info_Widget {
    // Renders the settings form shown in the admin interface.
    function control(){
        $data = get_option('couchdb_info_widget');
        ?>
        <!-- form fields omitted; only the labels remain -->
        Configure CouchDB location
        Port
        <?php
    }

    // Retrieves the server details and renders the widget itself.
    function widget($args){
        // ... the code that queries CouchDB is omitted here ...
        ?>
        CouchDB
        I’m running CouchDB of the current SVN trunk and update once in a while. Current version:
        My CouchDB instance contains all articles of this blog. All numbers in this widget are retrieved from CouchDB as well
        <?php
        echo $args['after_widget'];
    }

    // Registers the widget and its control form with WordPress.
    function register(){
        register_sidebar_widget('CouchDB Info', array('Couch_Info_Widget', 'widget'));
        register_widget_control('CouchDB Info', array('Couch_Info_Widget', 'control'));
    }
}
?>
[/php]

As you can see, it is a single class with two functions that are registered with WordPress: ‘widget’ retrieves the needed data and renders the actual widget, and ‘control’ creates the form used to edit the widget’s settings. This is what the control form looks like in the admin interface:

Widget controlpanel (right bottom)

The output of the ‘widget’ function should be visible at the bottom of the right sidebar.
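To give an idea of how little parsing is involved, here is a minimal sketch in JavaScript (not the PHP used by the widget). It assumes the server info comes from the CouchDB root URL, which answers with a small JSON document containing `couchdb` and `version` fields; the response body is hard-coded with a made-up version number, and `describeServer` is a hypothetical helper name.

```javascript
// Sample body as returned by GET / on a CouchDB server (the version value is made up).
const sampleResponse = '{"couchdb":"Welcome","version":"0.10.0"}';

// Hypothetical helper: pick out the fields a widget like this one would display.
function describeServer(body) {
  const info = JSON.parse(body);
  return { greeting: info.couchdb, version: info.version };
}

const details = describeServer(sampleResponse);
console.log("CouchDB version: " + details.version);
```

In the real widget the body would come from an HTTP request to the configured host and port instead of a string literal.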

Posted in couchdb, php, wordpress | 2 Comments

3v12 api from Scala

I rewrote the code from my previous post in Scala, with one minor difference: I’m not using an RSS API here. Scala has native XML support, which makes writing basic RSS a breeze:

[code]
package v12_rss

import scala.xml._

// define some case classes as a simple model for the rss feed we're going to build
case class Channel(title: String, link: String, description: String, items: List[Item]) {
  def toXML =
    <channel>
      <title>{title}</title>
      <link>{link}</link>
      <description>{description}</description>
      {items.map{_.toXML}}
    </channel>
}

case class Item(title: String, link: String, description: String, enclosure: Enclosure) {
  def toXML =
    <item>
      <title>{title}</title>
      <link>{link}</link>
      <description>{description}</description>
      {enclosure.toXML}
    </item>
}

case class Enclosure(url: String) {
  // RSS 2.0 enclosure; length and type mirror the Ruby version from the previous post
  def toXML = <enclosure url={url} length="-1" type="image/jpeg"/>
}

// Helper for working with URN values from the API
case class Urn(urn: String) {
  // drop the first value ('urn'), it is not interesting
  def comps = urn.split(":").slice(1, 4)

  def src = comps(0)
  def mediaType = comps(1)
  def number = comps(2)
}

object Main {

  def main(args: Array[String]): Unit = {
    val groupElem = retrieveGroup(41129661)
    println(
      <rss version="2.0">
        {new Channel(title(groupElem), "http://3voor12.vpro.nl/tv/", shortTitle(groupElem), itemize(groupElem)).toXML}
      </rss>
    )
  }

  def retrieveGroup(num: Int): Elem = {
    XML.load("http://3voor12.vpro.nl/api/media/1/rest/group/" + num + ".xml")
  }

  // short for getting an attribute as string
  def attr(node: Node, name: String) = node.attribute(name).get.text

  // retrieve the first title field from the given XML
  def title(elem: Elem): String = (elem \\ "title").head.text

  // retrieve the first shortTitle field from the given XML
  def shortTitle(elem: Elem): String = (elem \\ "shortTitle").head.text

  // determine the image url for the given media xml
  def imageUrl(elem: Elem): String = "http://images.vpro.nl/images/" + new Urn(attr((elem \\ "image").head, "urn")).number

  // build a feed item for each of the first 15 media entries in the group
  def itemize(group: Elem): List[Item] = {
    val items = (group \\ "media").slice(0, 15).map { m =>
      val urn = new Urn(attr(m, "urn"))
      println("http://3voor12.vpro.nl/api/media/1/rest/" + urn.mediaType + "/" + urn.number + ".xml")
      val media = XML.load("http://3voor12.vpro.nl/api/media/1/rest/" + urn.mediaType + "/" + urn.number + ".xml")

      new Item(title(media), "http://3voor12.vpro.nl/tv/#/41129661/" + urn.number, shortTitle(media), new Enclosure(imageUrl(media)))
    }

    items.toList
  }
}
[/code]
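The URN handling is the only non-obvious step in the code above, so here is a small standalone sketch of it, written in JavaScript for illustration. It assumes URNs of the form `urn:<source>:<type>:<number>`, which is what the `Urn` helper implies; the example URN is made up, and `parseUrn` and `urnToApiUrl` are hypothetical names.

```javascript
const API_URL_BASE = "http://3voor12.vpro.nl/api/media/1/rest/";

// Split a URN into its parts, skipping the leading "urn" component.
function parseUrn(urn) {
  const [, src, mediaType, number] = urn.split(":");
  return { src, mediaType, number };
}

// Build the API URL for the entity a URN points to.
function urnToApiUrl(urn) {
  const { mediaType, number } = parseUrn(urn);
  return API_URL_BASE + mediaType + "/" + number + ".xml";
}

// Made-up URN, only here to show the shape of the data.
const exampleUrn = "urn:vpro:program:12345";
console.log(urnToApiUrl(exampleUrn));
```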

Posted in 3voor12, api, scala | 1 Comment

Using the 3voor12 api to get an RSS feed of your favorite playlist

A couple of months ago we developed 3voor12TV. During festivals like Noorderslag, Pinkpop and (upcoming) Lowlands the 3voor12 crew tries to get as much high quality (h264) material online in the shortest time possible.

The application was developed in ActionScript 3.0 on top of a public API. Well, it is public, but there is no public documentation yet. Sorry ;-) The 3voor12 Pinkpop mashup was the first real use of the API.

Whilst waiting for new videos I found myself refreshing the player over and over again, checking for new content.

I decided to automate this using the API. The Ruby script below uses the API to create a simple RSS feed with images and a direct link to each concert:

[ruby]
require 'rubygems'
require 'json'
require 'open-uri'
require 'rss'

API_URL_BASE = "http://3voor12.vpro.nl/api/media/1/rest/"
PLAYLIST_ID = "41129661"

# convert a urn to the API url of the entity it identifies
def urn_to_api_url(urn)
  urn_parts = urn.split(":") # urn contains source, entity type and unique number
  "#{API_URL_BASE}#{urn_parts[2]}/#{urn_parts[3]}.json" # bypass content negotiation, force json formatted responses
end

# convert a urn to the url of the item in the 3voor12TV player
def urn_to_url(urn)
  urn_parts = urn.split(":")
  "http://3voor12.vpro.nl/tv/\#/#{PLAYLIST_ID}/#{urn_parts[3]}"
end

# retrieve and parse a playlist (group)
playlist = JSON.load(open("#{API_URL_BASE}group/#{PLAYLIST_ID}.json"))
# extract the playlist items
programUrns = playlist['group']['members']['member'].map{|m| m['media']['@urn']}

# create playlist
rss = RSS::Maker.make("2.0") do |maker|
  maker.channel.title = "3voor12tv :: #{playlist['group']['title']}"
  maker.channel.description = playlist['group']['shortTitle']
  maker.channel.about = "http://3voor12.vpro.nl/tv/"
  maker.channel.link = "http://3voor12.vpro.nl/tv/"
  # retrieve the playlist items
  programUrns[0..15].each do |urn|
    maker.items.new_item do |item|
      program = JSON.load(open(urn_to_api_url(urn)))['program']
      img_url = "http://images.vpro.nl/images/#{program['relatedImages']['relatedImage']['image']['@id']}+s(320)"

      item.title = program['title']
      # the original appended an HTML snippet to the description; that markup is not preserved here
      item.description = program['synopsis'] || program['title']

      item.link = urn_to_url(urn)

      enclosure = maker.items.last.enclosure
      enclosure.url = img_url
      enclosure.length = -1
      enclosure.type = "image/jpeg"
    end
  end
end

# write to disk
File.open("3v12feed.xml","w") do |f|
  f.write(rss)
end
[/ruby]

I have a cronjob executing the script once in a while. The resulting feed is available at:

http://feeds2.feedburner.com/3voor12tvPinkpop

Posted in 3voor12, api, ruby | 3 Comments

Simple fulltext analysis in couchdb

In my previous post I presented a simple map function to query the WordPress articles I imported into CouchDB. That map function looked at the categories/terms manually assigned to the articles. I decided to take this a step further and analyze the actual text of the posts to extract keywords.

I created a very simple parser which:

  • Strips out HTML
  • Removes (English) stopwords
  • Counts the number of occurrences of each word to provide a hint for ‘scoring’ results

The mapping code looks like this:

[javascript]
// true if the array holds the given value
Array.prototype.contains = function(obj) {
  var i = this.length;
  while (i--) {
    if (this[i] === obj) {
      return true;
    }
  }
  return false;
}

// number of times the given value occurs in the array
Array.prototype.count = function(obj) {
  var count = 0;
  var i = this.length;
  while (i--) {
    if (this[i] === obj) {
      count++;
    }
  }

  return count;
}

function stripHTML(w){
  return w.replace(/(<([^>]+)>)|nbsp/ig,"");
}

function stripNonWords(w){
  return w.replace(/[^a-zA-Z]+/ig," ");
}

stopwords = ['a','about','above','across','after','afterwards','again','against','all','almost','alone','along','already','also','although','always','am','among','amongst','amoungst','amount','an','and','another','any','anyhow','anyone','anything','anyway','anywhere','are','around','as','at','back','be','became','because','become','becomes','becoming','been','before','beforehand','behind','being','below','beside','besides','between','beyond','bill','both','bottom','but','by','call','can','cannot','cant','co','computer','con','could','couldnt','cry','de','describe','detail','do','done','down','due','during','each','eg','eight','either','eleven','else','elsewhere','empty','enough','etc','even','ever','every','everyone','everything','everywhere','except','few','fifteen','fify','fill','find','fire','first','five','for','former','formerly','forty','found','four','from','front','full','further','get','give','go','had','has','hasnt','have','he','hence','her','here','hereafter','hereby','herein','hereupon','hers','herself','him','himself','his','how','however','hundred','i','ie','if','in','inc','indeed','interest','into','is','it','its','itself','keep','last','latter','latterly','least','less','ltd','made','many','may','me','meanwhile','might','mill','mine','more','moreover','most','mostly','move','much','must','my','myself','name','namely','neither','never','nevertheless','next','nine','no','nobody','none','noone','nor','not','nothing','now','nowhere','of','off','often','on','once','one','only','onto','or','other','others','otherwise','our','ours','ourselves','out','over','own','part','per','perhaps','please','put','rather','re','same','see','seem','seemed','seeming','seems','serious','several','she','should','show','side','since','sincere','six','sixty','so','some','somehow','someone','something','sometime','sometimes','somewhere','still','such','system','take','ten','than','that','the','their','them','themselves','then','thence','there','thereafter','thereby','therefore'
,'therein','thereupon','these','they','thick','thin','third','this','those','though','three','through','throughout','thru','thus','to','together','too','top','toward','towards','twelve','twenty','two','un','under','until','up','upon','us','very','via','was','we','well','were','what','whatever','when','whence','whenever','where','whereafter','whereas','whereby','wherein','whereupon','wherever','whether','which','while','whither','who','whoever','whole','whom','whose','why','will','with','within','without','would','yet','you','your','yours','yourself','yourselves'];

map = function(doc) {
  var body = stripNonWords(stripHTML(doc.body)).toLowerCase();
  var terms = [];
  var words = body.split(/\s+/);

  var i = words.length;
  while (i--) {
    var word = words[i];
    // skip short words and stopwords
    if(word.length > 2 && !stopwords.contains(word)) {
      // handle every distinct word only once
      if(!terms.contains(word)){
        terms.push(word);
        var weight = words.count(word);
        // only emit words which occur more than once
        if(weight > 1) {
          emit([word, weight], {title: doc.title});
        }
      }
    }
  }
}
[/javascript]

The resulting view can be queried in the same way as the previous one I described:

http://log4p.com:5984/articles/_design/split/_view/withoutStopWords?startkey=["groovy",{}]&endkey=["groovy",0]&descending=true

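  • startkey=["groovy",{}] – the highest key which may be returned; in CouchDB’s collation order {} sorts after every number, so it acts as an upper bound
  • endkey=["groovy",0] – the lowest key to return
  • descending=true – order direction (highest weight first)
  • limit=10 – max number of results to return (optional, not used in the URL above)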

Calling the URL above will return posts containing the word ‘groovy’ ordered by the number of occurrences:

[javascript]
{"total_rows":3527,"offset":2253,"rows":[
{"id":"301","key":["groovy",11],"value":{"title":"Grails – Soap"}},
{"id":"432","key":["groovy",9],"value":{"title":"Running your griffon application in fullscreen mode"}},
{"id":"362","key":["groovy",8],"value":{"title":"Using propertyMissing to enhance Date (in Groovy)"}},
{"id":"380","key":["groovy",7],"value":{"title":"How Elvis showed me a neat way of using operators in Ruby"}},
{"id":"232","key":["groovy",7],"value":{"title":"Spring and scripting languages… don’t go together?"}},
{"id":"278","key":["groovy",6],"value":{"title":"Grails – associations"}},
{"id":"361","key":["groovy",5],"value":{"title":"Ranges with dates (in Groovy)"}}
]}
[/javascript]
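On the consuming side, each row carries the word and its weight in the key and the post title in the value. A minimal sketch of turning such a response into a list a template could render, using two rows copied from the output above; `toResults` is a hypothetical helper name, not part of my templates:

```javascript
// Two rows taken from the response shown above.
const viewResponse = {
  total_rows: 3527,
  offset: 2253,
  rows: [
    { id: "301", key: ["groovy", 11], value: { title: "Grails – Soap" } },
    { id: "432", key: ["groovy", 9], value: { title: "Running your griffon application in fullscreen mode" } }
  ]
};

// Flatten each row into the fields needed for display.
function toResults(response) {
  return response.rows.map(function (row) {
    return { id: row.id, title: row.value.title, weight: row.key[1] };
  });
}

const results = toResults(viewResponse);
results.forEach(function (r) {
  console.log(r.weight + "x " + r.title);
});
```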

I modified my WordPress templates to use this view now and it seems to yield better results.

Note
One thing I noticed while writing the mapping function is that altering JavaScript’s Array prototype (i.e. adding my contains and count methods to it) seems to result in unpredictable problems. Still investigating.

update
I probably made a mistake with the prototype extensions; I refactored the code and it works now. The code above has been updated.
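One well-known way in which prototype extensions cause surprises, offered here as a possible explanation rather than a diagnosis of what went wrong in my case: methods assigned to Array.prototype are enumerable, so any `for ... in` loop over an array also visits them. The sketch below shows the effect and a standalone helper that avoids it; `containsValue` is a hypothetical name.

```javascript
const terms = ["java", "groovy"];

// Extend the prototype by plain assignment, as in the mapping code.
Array.prototype.contains = function (obj) {
  return this.indexOf(obj) !== -1;
};

// A for...in loop now sees the added method next to the array indices.
const seenKeys = [];
for (const key in terms) {
  seenKeys.push(key);
}
console.log(seenKeys); // includes "contains" as well as the indices

// Clean up, and use a plain function instead of touching the prototype.
delete Array.prototype.contains;

function containsValue(list, obj) {
  return list.indexOf(obj) !== -1;
}

console.log(containsValue(terms, "groovy"));
```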

Posted in couchdb, fulltext analysis, javascript | 1 Comment

WordPress, Couchdb and Ruby

I did a small test to see how complex it would be to put a WordPress database into CouchDB. This might not seem very useful, and as long as WordPress is the only source of data it isn’t. In the future, however, other applications would also publish content to the same database.

To get my posts into CouchDB I wrote the following Ruby script (disclaimer: this is a quick and dirty hack, don’t use it in a production environment):

[ruby]
require 'rubygems'
require 'mysql'
require 'json'
require 'couchdb.rb'

database = Mysql.real_connect("localhost", ===database user===, ===database pass===, ===database name===)

# utility function for storing articles
def store_article(couchdb_server, id, article)
  begin
    # an existing document can only be overwritten when its current _rev is supplied
    existing = couchdb_server.get("/articles/#{id}")
    if existing.code == '200'
      article["_rev"] = JSON.parse(existing.body)["_rev"]
    end
  rescue
    # ignore for now…
  end

  couchdb_server.put("/articles/#{id}", article.to_json)
end

puts "connected to #{database}"

# the query returns one row per post/term combination (a post with three terms
# yields three rows); grouping will be done afterwards.
# The query will only return published blogposts.
res = database.query("select
    p.id as id,
    p.post_title,
    p.post_content,
    t.name
  from
    wp_posts p
    join wp_term_relationships tr on tr.object_id = p.id
    join wp_term_taxonomy wtt on wtt.term_taxonomy_id = tr.term_taxonomy_id
    join wp_terms t on t.term_id = wtt.term_id
  where
    post_type = 'post'
    and post_status = 'publish'
")

# Convert the results to the internal datastructure
data = Hash.new()
while row = res.fetch_row do
  post_id = row[0].to_i
  post = data[post_id] ? data[post_id] : {:terms => []}

  post[:title] = row[1]
  post[:body] = row[2]
  post[:terms] << row[3]

  data[post_id] = post
end
puts "#{res.num_rows} posts queried, posting to couchdb"
res.free

# setup the couchdb class and post all articles
couchdb_server = Couch::Server.new("log4p.com", "5984")
data.each do |k,v|
  store_article(couchdb_server, k,v)
end
[/ruby]

As you can see, the bulk of the code is in the data retrieval SQL and the conversion. The Couch module was taken from the CouchDB wiki and provides some really basic wrappers for the CouchDB REST interface.
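The part of store_article that is specific to CouchDB is the revision handling: an existing document can only be overwritten if the PUT carries its current `_rev`. Below is a minimal sketch of just that step, in JavaScript and without any network calls. The response objects are made-up stand-ins for what the Couch wrapper returns, and `withCurrentRevision` is a hypothetical name.

```javascript
// Copy the article and add the current revision when the document already exists.
function withCurrentRevision(article, existing) {
  const doc = Object.assign({}, article);
  if (existing && existing.code === "200") {
    doc._rev = JSON.parse(existing.body)._rev;
  }
  return doc;
}

const article = { title: "ADP1", terms: ["gadgets", "android", "g1"] };

// Stand-ins for the responses to GET /articles/454 (found) and a missing document.
const found = { code: "200", body: '{"_id":"454","_rev":"1-888454205"}' };
const missing = { code: "404", body: '{"error":"not_found"}' };

const updateDoc = withCurrentRevision(article, found);   // carries _rev, so it updates
const createDoc = withCurrentRevision(article, missing); // no _rev, so it creates

console.log(JSON.stringify(updateDoc));
console.log(JSON.stringify(createDoc));
```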

After executing the script above, all blog posts are stored in CouchDB in JSON format:

[javascript]
{
  "_id": "454",
  "_rev": "1-888454205",
  "terms": [
    "gadgets",
    "android",
    "g1"
  ],
  "body": "…..",
  "title": "ADP1"
}
[/javascript]
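To make the conversion from query rows to a document like this more concrete, here is the grouping step on its own, sketched in JavaScript with made-up rows. Each row is one post/term combination, in the column order of the SQL query (id, title, body, term); `groupRows` is a hypothetical name.

```javascript
// Made-up result rows: one row per post/term combination.
const rows = [
  ["454", "ADP1", "…", "gadgets"],
  ["454", "ADP1", "…", "android"],
  ["454", "ADP1", "…", "g1"],
  ["596", "Oracle buys Sun…", "…", "java"]
];

// Collect the rows into one object per post, gathering the terms in a list.
function groupRows(resultRows) {
  const posts = {};
  resultRows.forEach(function (row) {
    const id = parseInt(row[0], 10);
    if (!posts[id]) {
      posts[id] = { terms: [] };
    }
    posts[id].title = row[1];
    posts[id].body = row[2];
    posts[id].terms.push(row[3]);
  });
  return posts;
}

const grouped = groupRows(rows);
console.log(JSON.stringify(grouped[454]));
```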

One thing I wanted to do was to create a simple API to retrieve articles based on their category. To do this I created this simple view in CouchDB:

[javascript]
function(doc) {
  // emit one row per term, so posts can be looked up by category
  for (var i = 0; i < doc.terms.length; i++) {
    emit([doc.terms[i], parseInt(doc._id, 10)], {title: doc.title});
  }
}
[/javascript]

This emits one row per term for each post, which makes it possible to query like this:

http://log4p.com:5984/articles/_design/list/_view/category?startkey=["java",{}]&endkey=["java",0]&descending=true&limit=10

Ouch, that’s a lot of parameters! Here’s what they do:

  • startkey=["java",{}] – the highest key which may be returned; {} sorts after every number in CouchDB’s collation order, so it works like a numerical infinity ;)
  • endkey=["java",0] – the lowest key to return
  • descending=true – order direction
  • limit=10 – max number of results to return

The query should return posts like this:

[javascript]
{"total_rows":403,"offset":151,"rows":[
{"id":"600","key":["java",600],"value":{"title":"Binding mmbase nodes to strongly typed object graphs"}},
{"id":"596","key":["java",596],"value":{"title":"Oracle buys Sun…"}},
{"id":"577","key":["java",577],"value":{"title":"Composited objects with shared id’s in Hibernate"}},
{"id":"555","key":["java",555],"value":{"title":"CouchDB meetup in Amsterdam"}},
{"id":"467","key":["java",467],"value":{"title":"Ioke @ Amsterdam.rb"}},
{"id":"428","key":["java",428],"value":{"title":"I want closures \"bolted on to Java\""}},
{"id":"424","key":["java",424],"value":{"title":"Review: \"Clean Code: A handbook of agile software craftmanship\""}},
{"id":"397","key":["java",397],"value":{"title":"JavaOne 2008 – Summary & Reflection"}},
{"id":"381","key":["java",381],"value":{"title":"Closures and the return of the return"}},
{"id":"359","key":["java",359],"value":{"title":"CPD with maven2 and PMD"}}
]}
[/javascript]
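Since the startkey and endkey values are JSON, the query string is easiest to assemble by serializing the keys and URL-encoding them. A minimal sketch, assuming the view URL used above; `categoryViewUrl` is a hypothetical helper name:

```javascript
// Build the view URL for one category: newest posts first, at most `limit` rows.
function categoryViewUrl(viewUrl, category, limit) {
  // With descending=true the start key is the upper bound and the end key the lower bound.
  const startkey = JSON.stringify([category, {}]);
  const endkey = JSON.stringify([category, 0]);
  return viewUrl +
    "?startkey=" + encodeURIComponent(startkey) +
    "&endkey=" + encodeURIComponent(endkey) +
    "&descending=true&limit=" + limit;
}

const url = categoryViewUrl(
  "http://log4p.com:5984/articles/_design/list/_view/category", "java", 10);
console.log(url);
```

Encoding the keys keeps the URL valid for category names that contain spaces or other reserved characters.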

Just to test the API I wrote the following code (see it in action underneath the ‘full’ post view) and added it to the single_post view of my blog:

[php]
<?php
require_once("class_couchdb.php");
$couchdb = new CouchDB('articles', '79.170.94.41', 5984);

try {
    $result = $couchdb->send('_design/list/_view/category?limit=10&startkey=["' . $category->name . '",{}]&endkey=["' . $category->name . '",0]&descending=true');
    // here we get the decoded json from the response
    $all_docs = $result->getBody(true);

    foreach($all_docs->rows as $r => $row) { ?>
        …
[/php]

Posted in couchdb, ruby, wordpress | 3 Comments

Binding mmbase nodes to strongly typed object graphs

    In past years I’ve spent quite some time converting MMBase node graphs to strongly typed object graphs. One of the reasons for doing this is to define ‘meta’ models on top of the cloud that answer questions like ‘What is a newsitem?’ (i.e. which rules need to be applied to get all the needed data from MMBase).

    Due to recent developments within the VPRO I decided to have another go at it, and I came up with a working prototype which I think might be useful to others, or on which others might be able to provide valuable feedback!

    The small framework I created is annotation based; one specifies the bindings to MMBase using annotations:

    [java]
    // --------- NewsItem.java
    @Entity(builder = "news", root = true)
    public class NewsItem {
        private Long number;
        private String title;
        private String subtitle;
        private String credits;

        @Field(nodeField = "intro")
        private String description;
        private String body;

        @Embedded(builder = "mmevents", field = "start", convertor = EpochDateConvertor.class)
        private Date created;

        @PosRel(orderDirection = Direction.DESC, queryDirection = QueryDirection.BOTH)
        private List<Image> image;

        @Rel(orderDirection = Direction.DESC, orderField = "value", queryDirection = QueryDirection.DESTINATION)
        private List<Tag> tag;
    }
    [/java]

    [java]
    // --------- Image.java
    @Entity(builder = "images")
    public class Image {
        private Long number;
        private String title;
    }
    [/java]

    [java]
    // --------- Tag.java
    @Entity(builder = "tags")
    public class Tag {
        private Long number;
        private String value;
    }
    [/java]

    The implementation is still in the concept phase, but as you can see it is already possible to define mappings for:

    • associations (works for typed collections only)
    • fields (populated by default; the @Field annotation is used to override properties)
    • embedded values from one-to-one associations, which are treated as embedded objects

    Note: at the moment I’m only considering read operations.

    Entity definitions are automatically retrieved at startup of a simple MMBase module (using Spring’s ClassPathBeanDefinitionScanner), after which binding can be done as follows:

    [java]
    NewsItem item = (NewsItem) populator.unmarshallNode(newsItemNode, "news");
    [/java]

    There is still a lot of ground to cover, but the basics work, and the populator class is still less than 200 lines of code! No public source code yet, but I’d be more than happy to contribute it or make it available in the near future if others are interested.

    Looking forward to ideas, criticism etc.

    Posted in java, mmbase | 2 Comments

    Oracle buys Sun…

    After the news of IBM pulling back from the deal to buy Sun Microsystems, rumors about other possible candidates spread across the web. Oracle was on top of most lists. Today the news hit the web: Oracle actually bought Sun (well, they’re still in the process of sorting out a lot of legal stuff).

    I don’t know why, but it makes me feel sad. Although I didn’t like everything Sun came up with they made and/or own a lot of stuff I like:

    • Java/JVM
    • MySQL
    • Netbeans
    • Glassfish
    • JRuby/Jython
    • Solaris (yes, I’ve cursed it as well)
    • ZFS
    • Project Looking Glass (great in its time)

    For sure they did some dreadful things as well:

    • Java Desktop System
    • JCreator
    • The JSF spec
    • java.util.Date
    • Endless Sun Spot demos
    • JavaFX

    Still, I sort of like them, which is something Oracle never managed to achieve. Hopefully good will come from it, but I really think Oracle will gain far too much power on the Java side of the story not to cause agony in the community. What will happen to the JPA spec? What about JRuby? Time for a fork?

    Posted in java, oracle | 3 Comments

    Composited objects with shared id’s in Hibernate

    One of the models I developed in a previous project used embedded/embeddable annotations to create a composite object. The (simplified) object model looks like this:

    avattributes object model

    The embedded/embeddable solution would store the entire graph in a single table, all attributes flattened to one row:

    avattributes_table

    That works great as long as you don’t mind minor annoyances like:

    • when loaded, embedded objects are filled with null values if no attributes are present in the database.
    • you cannot define constraints which only need to be satisfied if the entire object exists (i.e. if I set VideoAttributes, its codec must not be null)

    Or in other words: only do this if the objects are always supposed to be present, which in the case above is simply not true. If you would like to describe audio-only content you don’t want to set the ‘required’ video codec.

    My current project is a spin-off of the project mentioned above and uses some concepts of the model on a new database. I decided to alter the mappings and give Audio- and VideoAttributes their own table. This proved to be somewhat harder than I expected:

    • replace the embeddable annotation of Audio- and VideoAttributes, making them entities
    • since Audio- and VideoAttributes are entities now, they need an id
    • add a reference to the AVAttributes object to Audio- and VideoAttributes
    • create a bi-directional one-to-one mapping between AVAttributes and Audio- and VideoAttributes, with mappedBy on the Audio- and VideoAttributes side
    • add cascade annotations to the Audio- and VideoAttributes in the AVAttributes class

    I started out with generated ids for the Audio- and VideoAttributes. Hibernate will then generate two columns in the AVAttributes table containing foreign keys to the Audio- and VideoAttributes tables. These synthetic ids seemed a bit unnecessary IMHO, since the three objects could simply share the id of the AVAttributes object. It took me a while to figure out how to do this. I ended up with the following annotations:

    AVAttributes
    [java]
    @Entity
    public class AVAttributes implements Serializable {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @OneToOne(optional = true)
    @PrimaryKeyJoinColumn
    @org.hibernate.annotations.Cascade({
    org.hibernate.annotations.CascadeType.ALL,
    org.hibernate.annotations.CascadeType.DELETE_ORPHAN
    })
    private AudioAttributes audioAttributes;

    @OneToOne(optional = true)
    @PrimaryKeyJoinColumn
    @org.hibernate.annotations.Cascade({
    org.hibernate.annotations.CascadeType.ALL,
    org.hibernate.annotations.CascadeType.DELETE_ORPHAN
    })
    private VideoAttributes videoAttributes;
    }
    [/java]

    AudioAttributes (Same annotations used for VideoAttributes)
    [java]
    @Entity
    public class AudioAttributes implements Serializable {
    @Id
    @GeneratedValue(generator="fk")
    @GenericGenerator(name="fk",strategy="foreign",parameters = {
    @Parameter(name="property",value="avAttributes")
    })
    private Long id;

    @OneToOne(mappedBy = "audioAttributes", optional = false)
    private AVAttributes avAttributes;
    }
    [/java]

    The generated table structure looks like this:

    avattributes_split_table

    Ids are automatically cascaded when an AVAttributes object is persisted. And yes, it’s a lot of annotations, and I even stripped out the JAXB2 annotations which are also present.

    Posted in java | Leave a comment

    Android presentation

    Tonight I presented the Android platform at Finalist’s quarterly tech meeting in Rotterdam. Tomorrow I’ll present it again in Eindhoven. I think the presentation went fairly well and the attending colleagues managed to get a glimpse of the possibilities of the Android platform. Feel free to flip through the slides (I’m afraid I don’t have a video of the live demos):

    Android
    Posted in android | 2 Comments

    CouchDB meetup in Amsterdam

    Tonight I went to the CouchDB meetup in Amsterdam (‘In De Wildeman’) to discuss the architecture I’m designing for upcoming VPRO projects (more on that in a following blogpost). We had a really nice discussion about mostly the ‘edges’ of what CouchDB can do. Impressive numbers, that’s for sure! Big sites/companies are showing interest (craigslist, yahoo, myspace, facebook, BBC).

    Posted in java | 1 Comment