May 24th, 2008
Recently Eduardo Lundgren pinged me wondering if I had an alternative solution for injecting wbr tags inside a long word.
The wbr tag tells the browser where a possible line break can be inserted, should the need arise. (Opera has some problems rendering them correctly, but this can be rectified with some CSS.) By adding wbr tags into words at strategic locations you can allow a content area to resize gracefully while still being readable.
I looked at his simplified approach for a moment and came up with this:
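function wbr(str, num) {
  // Find runs of `num` word characters that are followed by at least one
  // more word character, and put a <wbr> between the two.
  // (`char` was a reserved word in older JavaScript, so the parameter is named `next`.)
  return str.replace(new RegExp("(\\w{" + num + "})(\\w)", "g"), function (all, text, next) {
    return text + "<wbr>" + next;
  });
}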
You would use it like so:
wbr("Hello everyone how are you doing? " +
    "I'm writing an extravagantly long string.", 6);
// => "Hello everyo<wbr>ne how are you doing? I'm writin<wbr>g an extrav<wbr>agantly long string."
Now, this is an incredibly simple solution, and breaks like writin<wbr>g are quite undesirable. After writing the above I did some more digging and read an article about the various hyphenation algorithms that exist.
In that article I found a recent JavaScript library that provides a full solution (breaking at appropriate places in multiple languages). Of course, the resulting code weighs in at about 80kb (a 15kb base library plus a 65kb English word library), so you'll need to carefully consider whether that solution is appropriate for your situation.
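Short of a full hyphenation library, one cheap mitigation for fragments like writin<wbr>g is to refuse to break unless a minimum number of characters would remain after the break. Here's a minimal sketch of that idea; `wbrMin` and its `minTail` parameter are hypothetical names I'm using for illustration, not part of any library. It uses a lookahead so the trailing characters are checked but not consumed. It still knows nothing about syllables, so treat it as a heuristic rather than real hyphenation.

```javascript
// Hypothetical variant of wbr(): insert a <wbr> after every `num` word
// characters, but only if at least `minTail` word characters follow it.
// The lookahead (?=...) tests the tail without consuming it, so the next
// match starts right after the inserted break.
function wbrMin(str, num, minTail) {
  var chunk = new RegExp("\\w{" + num + "}(?=\\w{" + minTail + "})", "g");
  return str.replace(chunk, function (text) {
    return text + "<wbr>";
  });
}

wbrMin("Hello everyone how are you doing? I'm writing an extravagantly long string.", 6, 2);
// => "Hello everyo<wbr>ne how are you doing? I'm writing an extrav<wbr>agantly long string."
```

Raising `minTail` to 3 would also suppress the everyo<wbr>ne break, at the cost of leaving more long words unbroken.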
Tags: javascript, words, language
August 4th, 2005
The other day I was working on a new application that needed to process large batches of words, as comprehensively as possible. After some quick searches I found that there are (unsurprisingly) a number of freely available dictionary and word list files on the Internet.
The first repository I tried was one hosted on SourceForge, simply called 'Wordlist'. Many of the lists hosted there are spell-checker centric, but the 12 Dicts package, in particular, was rather comprehensive. It originally contained 12 dictionaries, a number that has since been pruned down. The package includes a number of different dictionaries: some contain old English words, some have hyphenated words, some have acronyms, etc. You need to use the grid they provide to determine which dictionary is best suited to you. After doing some work with this list, however, I determined that it simply wasn't comprehensive enough for me (at 74,000 words).
After some more digging I came across the public domain list called ENABLE, which is overwhelmingly comprehensive. It is used in just about every word game on the planet and contains approximately 173,000 words! The list is very clear-cut and has no restrictions imposed on the words it contains. If you need a word list for any of your upcoming projects, I highly recommend it!
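Once you have a list like this, the usual first step is to load it into a structure with fast membership tests. Here's a minimal sketch, assuming the list is plain text with one word per line (the layout I'm assuming for ENABLE); `parseWordList` is a hypothetical helper, and lowercasing every entry is my own choice so lookups are case-insensitive.

```javascript
// Hypothetical helper: turn the text of a newline-delimited word list
// into a Set so that lookups like words.has("aah") are fast.
function parseWordList(text) {
  var words = new Set();
  // Accept both Unix (\n) and Windows (\r\n) line endings.
  text.split(/\r?\n/).forEach(function (line) {
    var word = line.trim().toLowerCase();
    if (word) {
      // Skip blank lines; store everything else lowercased.
      words.add(word);
    }
  });
  return words;
}

var words = parseWordList("aa\r\nAah\n\n  aahed  \n");
console.log(words.size, words.has("aah")); // prints "3 true"
```

In Node.js you could pass it the file contents read with `fs.readFileSync(path, "utf8")`, where `path` points at wherever you saved your copy of the list. A Set keeps lookups roughly constant-time even at 173,000 entries.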
Tags: data, words, dictionary