Speculative searching
(by Tom Wilson, posted at 2:43 PM)
Prompted by Amir's message, I took a look at the site and at others, especially GoogleRankings, where I found that the journal site ranks 104th in the top 1,000 in searches for 'knowledge management' and 224th for 'information management'; the World list of departments... ranks 126th in searches for 'information management' and the 'nonsense' paper ranks 22nd in searches for 'knowledge management'.
These are just pointless facts that will enable you to delight and baffle your friends :-) And, of course, a reminder that publishing in Information Research is sure to get you noticed :-)
Speculative Search Game (Google Game)
(by Amir Michail, posted at 12:00 AM)
A game where you predict which web pages will rank more highly on Google in the future! The output of the game will be used to build the Speculative Search Engine that ranks those web pages more highly today.
http://www.cse.unsw.edu.au/~amichail/spec/
|
Google - again, and other things.
(by Tom Wilson, posted at 11:53 PM)
Google has been much in the news as a result of its venture into the digital library - on a huge scale. Today's Observer (one of the so-called 'broadsheet' Sunday papers in the UK, for those who don't know it, and part of the Guardian family) has an article in its business section on Google's latest venture, in which John Naughton refers to Howard Rheingold's seminal work on the virtual community:
Many years ago, Howard Rheingold, who was one of the first people to understand the social potential of cyberspace, posed an interesting question: 'Where is the Library of Congress, when it's on your laptop?' To most people at the time, it seemed a meaningless question. What lay behind it, however, was an attempt to think through a profound consequence of a networked society - what Frances Cairncross later dubbed 'the death of distance'.
Naughton also notes:
Once upon a time, being learned involved holding a lot of knowledge and information in one's head. Are we moving towards a world where the important thing is not what you know, but how to find it?
an idea expressed many long years ago by Dr. Johnson (as reported by James Boswell in 1791):
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
which is also a very neat definition of the difference between 'knowledge' and 'information' :-)
Google was also the subject of one of Fortune's long articles last week. The focus was on the share price and the likelihood of investors getting a return (the verdict seemed to be, 'Be cautious'), but, among other things, it has some interesting material on the competition.
Thanks also to Gerry McKiernan and the ASIS-L mailing list for bringing another Google item to my attention, this time in the New York Times (you'll need to register to read the article, but registration is free).
The article contains a nice story about the irreplaceability of the physical book – for some purposes:
Mr. Jimerson said, 'A scanned image will only tell you some things, and the sheer volume of records makes scanning everything difficult'. But he added that he supported Google's plan in theory. 'I recall the story of a gentleman being in a library and watching a researcher sniff books', he said. 'It turned out that the aroma of vinegar was still embedded in those that had been treated with vinegar to prevent cholera during an epidemic'.
Thanks to Gerry also for another item in the New York Times, this time on Firefox. With Pennsylvania State University telling everyone on campus to switch from Internet Explorer, it would seem that Microsoft has a little problem on its hands, one that may result in a policy switch, unless arrogance holds sway in Redmond. If there is a policy switch, it would require IE to be rewritten from the ground up, so in the meantime Firefox may advance by leaps and bounds. Try it: my guess is that, if you are an IE user, you'll need less than ten minutes with the new rival (well, not so new, if you've been using it for the past couple of years in its development phase) to convince you to switch.
|
Odds and ends
(by Tom Wilson, posted at 5:58 PM)
I've been working in Oporto for the past week with little chance to catch up on current developments, so here's my backlog:
- There's news of IBM's efforts to develop information retrieval systems for use in corporate networks, rather than on the Internet. It comes a little late to this sector, with Google Desktop and a new version of Copernic already in play. My guess is that IBM is likely to make the usual technology-led errors in producing a system, that is, greater complexity in preparing search formulations than users are likely to buy, and not enough work behind the interface to interpret relatively simple formulations. Corporate files also suffer from a very difficult problem for information retrieval, one that was described to me many years ago on a visit to Shell - a North Sea drilling platform could be identified in documents by a project code-name, by geographical coordinates, by the designation assigned once the platform was in use, such as 'Platform Alpha' or by a phrase such as, 'the project'.
- The International Telecommunication Union has produced a press release headed 'Low Cost Broadband and Internet Access Essential to Information Society', with a link to 'Best Practice Guidelines for the Promotion of Low Cost Broadband and Internet Connectivity'. This document lists some very worthy aims, but one wonders whether competition and regulation are really likely to deliver low prices. In many countries the national PTT or the dominant controller of existing wires can effectively control access to the necessary exchanges and so on; in these circumstances something stronger than 'regulation' may be needed. As for competition: well, we have that in fuel supply to the garage forecourt, but I don't see too much impact on price.
- The big news for libraries, of course, was that Google is in the process of scanning millions of books in the libraries of Harvard, Stanford and Michigan universities, in the New York Public Library, and the Bodleian Library in Oxford. Other contributions to the debate about this initiative can be found here, and here, and at the Wall Street Journal (setting aside its neocon bias for a change!)
|
Google Scholar again
(by Tom Wilson, posted at 10:44 AM)
As we might expect, Google Scholar has raised a lot of interest. There's an interesting Weblog entry from a guy who works for Ingenta, on working with Google to enable content to be 'crawled' - rather 'techie' for a non-nerd like myself, but interesting nonetheless.
Search Engine Watch also has an item - a moan about the lack of documentation, so that we don't know what Google Scholar actually covers - a very necessary moan, particularly when students these days seem to believe that if they can't find something by using Google, it doesn't exist.
I haven't used 'Scholar' much yet, but I don't like the output form: for some totally irrational reason, I'm happy to put up with it for a Web search, but the format doesn't fit my conception of what output relating to the scholarly literature should look like. I'll have to take a closer look and figure out why I have this reaction.
|
Another Google initiative
(by Tom Wilson, posted at 9:30 PM)
Those folk at Google are certainly stirring things up with the launch of 'Google Scholar (Beta)', a variant of the search engine for accessing the scholarly literature.
According to the New York Times (you'll need to register):
The engineer who led the project, Anurag Acharya, said the company had received broad cooperation from academic, scientific and technical publishers like the Association of Computing Machinery, Nature, the Institute of Electrical and Electronics Engineers and the Online Computer Library Center.
The new Google service, which includes a listing of scientific citations as well as ways to find materials at libraries that are not online, will not initially include the text advertisements that are shown on standard pages for Google search results.
Testing something like this is rather tricky when the coverage is unknown. However, I tried a simple but slightly obscure search phrase, "colliery spoil", and got a list of 146 items. Some are listed as 'CITATION', for example:
[CITATION] Effective passive treatment of aluminium-rich, acidic colliery spoil drainage using a compost
- Web Search
PL Younger, TP Curtis, A Jarvis, R Pennell - Cited by 10
Journal of the Chartered Institution of Water and ..., 1997
Click on the 'Web search' link and, as you see, it does just that; click on the 'Cited by 10' link and you are given a list of the ten sources that have cited this item, with the same layout and more links to items that cite the cited items - one could get rather dizzy going through this lot!
Other items in the original list are links to information on the Web, although not always the complete document. For example, this link leads to an abstract in PubMed, not to the original document:
Substrate characterisation for a subsurface reactive barrier to treat colliery spoil leachate
PW Amos, PL Younger - Cited by 4
Substrate characterisation for a subsurface reactive barrier to treat colliery spoil leachate. Amos PW, Younger PL. FaberMaunsell ...
Water Research, 2003 - ncbi.nlm.nih.gov
The 146 items consisted of 35 Citation entries and 111 Web links.
|
Re: Alternative browsers
(by Seth Dillingham, posted at 12:00 AM)
On 11/15/04, Tom Wilson said:
>Opera has many of the same features as FireFox (and had them
>earlier) and it does some things better; but I like the way
>FireFox does tabs better, even though its inability to stop sites
>from launching windows without the navigation bar is frustrating.
Actually, it can do that, but it's a hidden preference.
In your browser, go to this url: "about:config". (No http: or anything, just exactly "about:config".)
At the top of the long list of preferences that it shows, there is a textbox. Paste this into that textbox (without the quotes): "dom.disable_window_open".
Double-click on the line that says "dom.disable_window_open_feature.titlebar". That will change the value from false to true. From now on, when a web page opens a new window, it will be unable to hide the toolbar.
Seth
Alternative browsers
(by Tom Wilson, posted at 9:36 AM)
There's an interesting little discussion going on at ZD-Net about the open source browser, FireFox. One of the staff writers is bidding farewell to Internet Explorer and, as one or two of the discussants ask, "Why's it taken you so long?"
I've been using alternative browsers since Opera first appeared and I now use FireFox most of the time, although it's something of a toss-up between these two: Opera has many of the same features as FireFox (and had them earlier) and it does some things better; but I like the way FireFox does tabs better, even though its inability to stop sites from launching windows without the navigation bar is frustrating. Also, FireFox is free, whereas Opera costs money unless you are prepared to put up with ads - not a lot, but...
With Opera and FireFox in the market I just don't understand why anyone uses IE any longer, other than for those sites that seem to imagine that nothing else exists.
|
Microsoft's new search engine
(by Tom Wilson, posted at 11:18 AM)
There's been a burst of interest on the Net about Microsoft's new search technology, which can be found in beta form at MSN Search, but it doesn't look all that great to me.
"The release of our beta is a huge step towards delivering the information consumers are looking for online, faster," says a Microsoft spokesman. However, my test is where Information Research appears when I search for it and, on this basis, MSN Search lags behind the others. For example, when I searched for "information research", the Weblog was the first relevant item to appear, at the bottom of the first page of results. The journal site didn't appear until page six, as the last item on the page. One problem with MSN Search is that it appears to ignore word order: there seemed to be as many occurrences of "research information" as of the phrase "information research", which doesn't seem very intelligent to me.
For comparison, here's a table of results with other search engines:
| Search engine | Page number | Non-sponsored position |
| Alltheweb | 1 | 2 |
| Alta Vista | 1 | 1 |
| AOL Search | 1 | 1 |
| Ask Jeeves | 1 | 3 |
| Excite | 1 | 1 |
| Gigablast | 1 | 1 |
| Google | 1 | 1 |
| HotBot | 1 | 1 |
| Lycos | 1 | 2 |
| Teoma | 1 | 3 |
| Yahoo | 1 | 2 |
By this little test, the new MSN engine doesn't show up very well!
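The word-order complaint about MSN Search comes down to the difference between matching a query as a phrase and matching it as an unordered bag of words. A toy Python sketch of the two tests, assuming simple lower-case tokenisation and two invented example documents; this is not how MSN Search or any other engine is actually implemented:

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(query: str, document: str) -> bool:
    """Bag-of-words match: every query term occurs somewhere, in any order."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))


def matches_phrase(query: str, document: str) -> bool:
    """Phrase match: the query terms occur next to each other, in query order."""
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


# Two invented documents: only the first contains the phrase itself.
docs = [
    "Information Research: an international electronic journal",
    "Research information for grant applicants",
]

for doc in docs:
    print(matches_all_terms("information research", doc),
          matches_phrase("information research", doc), "|", doc)
```

Both documents pass the bag-of-words test, but only the first passes the phrase test; an engine that in effect applies only the first test will mix 'research information' pages in with 'information research' ones.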
|
Odds and ends
(by Tom Wilson, posted at 11:00 PM)
Here's an interesting little item on Google.
TechWeb Today points to a new TechEncyclopedia, with 20,000 terms. Curiously, this doesn't display correctly in FireFox, although when I download the page to look at the code, the downloaded version displays perfectly well. Something strange going on here!
|
Weblogs and other things
(by Tom Wilson, posted at 9:38 PM)
Weblogs
My thanks to folks, on and off the Weblog, who've written to encourage me to keep the Weblog going; I'll plod on, now that I know it has some effect. Carol Cahill kindly says:
Our library probably wouldn't have a wireless Internet connection if my interest hadn't first been piqued by your Weblog. Now we have a four-laptop wireless training lab and patrons can come in and connect with their own computers.
Which I think is rather better than a citation in a journal :-)
"The Chief's" comments on Weblog membership counts are also interesting, as are the usage stats for the Weblog: last year 13,776 hits and this year, so far, 13,588, with the combined 27,364 hits distributed over the continents as follows:
| Rank | Continent | Hits | Share of total |
| 1. | North America | 10,780 | 39.4% |
| 2. | Europe | 10,565 | 38.6% |
| 3. | Asia | 2,961 | 10.8% |
| 4. | Australia | 1,735 | 6.3% |
| 5. | Africa | 415 | 1.5% |
| 6. | South America | 277 | 1.0% |
| 7. | Central America | 133 | 0.5% |
| | Unknown | 498 | 1.8% |
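A quick Python check confirms that the shares in the table are percentages of the combined total for both years (13,776 + 13,588), not of this year's figure alone. It uses only the figures in the table:

```python
# Hits by continent, as given in the table.
hits = {
    "North America": 10_780,
    "Europe": 10_565,
    "Asia": 2_961,
    "Australia": 1_735,
    "Africa": 415,
    "South America": 277,
    "Central America": 133,
    "Unknown": 498,
}

total = sum(hits.values())
print(f"Total hits: {total:,}")

# Each region's share of the combined total, rounded to one decimal place.
shares = {region: round(100 * count / total, 1) for region, count in hits.items()}
for region, share in shares.items():
    print(f"{region:<16} {hits[region]:>7,} {share:5.1f}%")
```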
Yahoo! does a Google
News today of Yahoo!'s purchase of an e-mail start-up by the name of Bloomba (why does the Internet generate so many silly names? Scope for a PhD dissertation here!). I'd never heard of Bloomba before, but it is an e-mail client, rather than a Web-based service. Reviews suggest that its killer feature is its search capability: it indexes your mail as you receive it, including the contents of attachments. No one seems to know what plans Yahoo! has for the system. The original parent company, Stata Labs, says:
What does Yahoo! plan to do with the technology as a result of the acquisition?
At this time we do not have any announcements about the ongoing plans for the technology or the specifics of the transaction.
A case of 'Watch this space' - well, not this one, since I can't guarantee that I'll spot an announcement, but perhaps the Yahoo! site - and while you are there, you might like to take a look at MySearch.
|
Odds and ends
(by Tom Wilson, posted at 10:35 AM)
The Weblog
It seems that my suspicions about the lack of general interest in the IR Weblog are confirmed :-) I've been contributing very little over the past month and so far no one has asked, 'Where are you?'
New issue of the journal
The latest issue, Volume 10 No. 1, is now on the site. This one has the first batch of papers from the Information Seeking in Context conference, held in Dublin last month. The other half will be published in the January 2005 issue. I finally got round to checking on what logs were available on the server and discovered that, since 8 October (which is when the analysis software appears to have kicked in), there have been about 280,000 hits on the InformationR.net site, most of them on the journal. This is considerably beyond my own estimates from the various counters. InformationR.net is the sixth most 'popular' virtual domain on the University's servers.
Voice over Internet Protocol (VoIP)
VoIP appears to be building up nicely. I finally got round to using it, along with colleagues in the AIMTech research group at Leeds University Business School. The voice quality, using Skype, is generally pretty good - not quite as good as the best landline, but good enough considering that it's free. I've also tried the SkypeOut service, which connects to landline numbers pretty well anywhere in the world and to mobile phones in some countries. You can connect to landlines in Western Europe, North America, Australia and New Zealand for 1.7 Euro cents a minute (£0.0118 or $0.02129) - mobiles cost a good deal more. Connection with landlines can be variable - sometimes connection is lost and in one case there was no voice connection at all. No doubt, with the interest being expressed, these problems will get ironed out.
Of course, governments and the big telecoms companies get very edgy over VoIP - here's a communication process where they may not be able to make any money, unless they REGULATE. Naturally, it is the USA where these concerns are raised.
It had to happen: "Boingo, Vonage Sign VoWi-Fi Pact"
Google again
A couple of things about Google - first, you'll find a review of its e-mail service, Gmail, in the latest issue of the journal. Secondly, I'm also trying out its 'desktop search' program - this enables you to do a Google search on your hard disc. It also checks your hard disc when you do a Web search - useful for bringing to your attention those items you'd forgotten you'd ever written!
|
Google in the news
(by Tom Wilson, posted at 4:44 PM)
Google is in the news again - on the 5th October it issued a 'new features' message to users of Gmail, to the effect that it was trialling a new mail forwarding system, which would be free during the trial. This prompted commentators to speculate on what other features of Google in general would become revenue streams.
As it happens, I've been using Gmail as a beta user for the past couple of months (a review will appear in the October issue of Information Research) and I'm now hooked on it. Its 1 GB of storage, its use of 'labels' to index messages, and its grouping of messages into 'conversations' make it a real winner.
|
New book
(by Tom Wilson, posted at 4:18 PM)
Congratulations to one of our Editorial Board members, Amanda Spink, for her new book, jointly authored with Bernard Jansen: "Web Search: Public Searching of the Web" - you can find details at the publisher's Website.
|
Popular papers in Information Research
(by Tom Wilson, posted at 8:42 PM)
Having recently published a new issue of Information Research, I thought it was time to find out how the ranking by 'hits per month' was standing. So here's the latest table. We see that some very recent papers appear to have struck a chord, while some of the oldest papers are still going strong.
|
AI and search engines
(by Tom Wilson, posted at 2:11 PM)
A highly favourable item in the Guardian Online supplement on a new search engine, blinkx, sent me off to its Website to check it out. blinkx uses, so we are told, an AI technique rather than page-ranking à la Google, and it searches not only the Web, news services and Weblogs, but also your hard disc. From one of the file names in the downloaded system, I suspect that the engine behind blinkx is Autonomy.
The Website includes an option to try out the beta version of blinkx optimised for broadband users and I discovered something rather odd. The PR claims that "blinkx understands your question and presents you with links as you search." - but the system obviously uses stop words. How can a question be understood if the stop words include terms of significance to the user?
Specifically, I searched for 'Information Research', expecting the journal site to pop up fairly quickly - no: only things on 'research' appeared. Similarly, when I used 'information behaviour', only 'behaviour' was used as a search term, and for 'information science', only 'science'. Not much use in the information management sector, then! The give-away is that the terms used in the search are highlighted and in all cases, where 'information' did appear in an item, it was not highlighted.
'Information' on its own may or may not be a useful search term - certainly it would generate millions of hits, but when used in compounds such as those mentioned, the concept so formed has much greater specificity. As long as AI systems continue to fail to recognize concepts and their semantic significance, they will fail to produce a search system that is a significant improvement on Google.
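The effect described above can be reproduced with a toy query processor in Python. Both the stop list and the list of known phrases are hypothetical, chosen only to mimic the behaviour observed; nothing here reflects blinkx's or Autonomy's actual code. The second function shows one simple way of recognising a compound concept before stop words are stripped:

```python
# Hypothetical stop list: "information" is included only to reproduce
# the behaviour observed in the searches described above.
STOP_WORDS = {"a", "an", "and", "of", "the", "information"}

# Hypothetical list of compound concepts that should never be split.
KNOWN_PHRASES = {"information research", "information behaviour", "information science"}


def naive_terms(query: str) -> list[str]:
    """Drop every stop word, wherever it occurs in the query."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def phrase_aware_terms(query: str) -> list[str]:
    """Keep a recognised compound concept whole; otherwise fall back to naive removal."""
    normalised = " ".join(query.lower().split())
    if normalised in KNOWN_PHRASES:
        return [normalised]
    return naive_terms(normalised)


for q in ["Information Research", "information behaviour", "information science"]:
    print(q, "->", naive_terms(q), "vs", phrase_aware_terms(q))
```

A fixed phrase list is, of course, the crudest possible form of concept recognition; it only illustrates the point that the compound, not its parts, is the unit of meaning.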
|
Google
(by Tom Wilson, posted at 9:51 PM)
Google is also hitting the news this week - with new services announced and, in today's Guardian, a big article about Google's intention to offer a free e-mail service to compete with Yahoo! and Hotmail, offering a gigabyte of storage - way above the limits of the other two. I'll join that! Get more on this from Google itself.
|
A Google Game
(by Tom Wilson, posted at 4:35 PM)
There are all kinds of games you can play with Google, including the well-known 'Googlewhacking'. I don't know whether I've invented this one, which I discovered accidentally.
It is very simple: just hit a few keys haphazardly, for example, "l;kd", in the search box of Yahoo and see what turns up. The aim is to put in something that returns nothing - which is surprisingly difficult! That combination, for example, turned up more than one and a half million hits! Even entered as a phrase, it produced almost 20,000.
The string ";we[kear'k" resulted in 34 hits, largely as a result of the existence of an author called "K. Kear". However, as a phrase, it produced zero - so it can be done. Remember, however, that the entry of symbols should be haphazard: just let your fingers do the choosing.
|
In the news...
(by Tom Wilson, posted at 10:30 AM)
An interesting item on wireless in the public library from LIS News.com
...and a longer piece on IT in public libraries from D-Lib Magazine
Turning to the University sector, I picked this up from Seb's Open Research - a couple of courses at the University of Prince Edward Island are using Weblogs as resource pages and for communication. Here's one on 'Networking, knowledge and the digital age'.
And here's an interesting one! I initiated a debate on the JESSE list some time back on the extent to which Web citation was beginning to overtake journal citation as a performance tool. I then found that this had been picked up by a couple of researchers (Vaughan and Shaw, Bibliographic and Web citations: what is the difference? JASIST, 54(14), 2003, 1313-1322) and now ISI is getting together with NEC: Thomson ISI and NEC Team Up to Index Web-based Scholarship
PHILADELPHIA & LONDON & PRINCETON, N.J.--(BUSINESS WIRE)--Feb. 25, 2004--Today, Thomson ISI and NEC Laboratories America (NEC) announced their collaboration to create a comprehensive, multidisciplinary citation index for Web-based scholarly resources. The new Web Citation Index(TM) will combine a suite of technologies developed by NEC, including "autonomous citation indexing" tools from NEC's CiteSeer environment, with the capabilities underlying ISI Web of Knowledge(SM). Thomson ISI editors will carefully monitor the quality of this new resource to ensure all indexed material meets the Thomson ISI high-quality standards.
During 2004, Thomson ISI and NEC will operate a pilot of the new resource to receive feedback from the scientific and scholarly community. Full access to the index is projected for early 2005.
When fully operational, the new resource will be a unique content collection within ISI Web of Knowledge. It will complement the Thomson ISI Web of Science®, and provide researchers with a new gateway to discovery -- using citation relationships among Web-based documents, such as pre-prints, proceedings, and "open access" research publications
OK - that's enough for now - I've got to go off to talk with the people at Orange about mobile technologies.
|
Search engines and the FT
(by Tom Wilson, posted at 10:02 AM)
I didn't get to the Saturday issue of the FT before this morning and there I found a leader item on search engines. I don't think I've seen a newspaper leader on the subject in the UK before. The item is 'Online searching: who's feeling lucky?' - available on the FT web site, but only to subscribers. The main point about the article is the suggestion that with the limited number of search engines available, or rather, the dominance of Google, there's a need for 'one fully transparent search engine, preferably maintained in the academic realm.' Isn't it curious how the advocates of capitalism always find a role for the public sphere when they want something unbiased? :-) The suggestion was made originally by Google's founders, Sergey Brin and Larry Page, in a research paper, but I haven't been able to locate it on the Web.
Good luck to the FT, but the chances of any university in the UK taking up the challenge to provide a 'fully transparent search engine' are pretty remote. The number of institutions pursuing serious information retrieval research can be counted on the fingers of one hand, with fingers to spare, and the institutions are so deeply mired in managerialism that the probability of selfless public service is slim. Everything these days must have an 'income stream', nothing is done for nothing, and the tentacles of central government's assessment procedures stretch everywhere.
|
Search engines
(by Tom Wilson, posted at 7:27 PM)
Old news now - two days old - that Yahoo! has dropped Google as its search engine in favour of its own search engine, provided by Inktomi. So I wondered how it compared. I searched for "Information Research" using both and, surprise, links to the journal were 1st and 2nd in Yahoo! Search and also in Google. Not much difference there. So, I searched for "case-based reasoning" at ".edu" sites. In the first 20 links for each search engine, only five institutions were duplicated, and from these five institutions only four Web pages were duplicated. It would seem, then, that the two engines are doing different things and that, if you want a reasonably comprehensive coverage of a topic, it would be a good idea to use both.
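The comparison just described, counting the institutions and the exact pages shared by the first 20 results from each engine, can be sketched in Python. The result lists below are invented placeholders, not the actual search output, and 'institution' is simplified to the last two parts of the host name, which is an assumption that suits .edu addresses but not every domain:

```python
from urllib.parse import urlparse


def institution(url: str) -> str:
    """Reduce a URL to its last two host-name labels, e.g. 'www.cs.alpha.edu' -> 'alpha.edu'."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def compare(results_a: list[str], results_b: list[str], top_n: int = 20):
    """Return the institutions and the exact pages shared by the top_n results of two engines."""
    a, b = results_a[:top_n], results_b[:top_n]
    shared_institutions = {institution(u) for u in a} & {institution(u) for u in b}
    shared_pages = set(a) & set(b)
    return shared_institutions, shared_pages


# Invented placeholder results, not the actual search output.
yahoo = [
    "http://www.cs.alpha.edu/cbr/",
    "http://www.beta.edu/~smith/cbr.html",
    "http://gamma.edu/ai/",
]
google = [
    "http://www.cs.alpha.edu/cbr/",
    "http://www.beta.edu/courses/ai/",
    "http://delta.edu/cbr/",
]

institutions, pages = compare(yahoo, google)
print(sorted(institutions))
print(sorted(pages))
```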
|
NewzCrawler
(by Tom Wilson, posted at 2:45 PM)
Having used the news aggregator, NewzCrawler, for some months now, I finally decided, when the evaluation period came to an end, that I can't live without it - and the $24.95 seems a modest price to pay. It isn't perfect, but then what software is?
The reason for needing a news aggregator, if you still haven't cottoned on, is the increasing popularity of RSS feeds, which provide the raw material for aggregators. A recent development at Yahoo! makes RSS feeds available for news searches. For example, if you want to pick up every mention of Tony Blair (heaven forfend) that occurs in the news sources covered by Yahoo!, use this URL in your aggregator. Read about this development at Jeremy Zawodny's Weblog.
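At its core, what an aggregator does with each feed can be sketched with Python's standard-library XML parser. The feed below is an invented stand-in rather than the Yahoo! news-search feed mentioned above; a real aggregator would also fetch feeds over HTTP, remember which items it has already shown, and cope with Atom and older RSS variants:

```python
import xml.etree.ElementTree as ET

# A tiny, invented RSS 2.0 document standing in for a real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news search</title>
    <item>
      <title>First headline</title>
      <link>http://news.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://news.example.com/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in feed_items(SAMPLE_FEED):
    print(title, "-", link)
```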
My aggregator now has links to fifty news and information sources - it's continually growing and continually being weeded as I find new things and get rid of dross - of which there is much!
Search engines
(by Tom Wilson, posted at 2:20 PM)
There's a useful account of developments in search engines during 2003 at Sitepoint - I was pointed there from the Logos Weblog, which has some interesting stuff. I liked this comment from the section on the future:
Watch Microsoft carefully. If a new Microsoft-based search initiative gets off the ground this year, you can bet it will be well funded and well promoted. Site owners can benefit from first-mover advantages in getting listed. If you can become an early expert in the new search technology, your site and traffic could soar.
|
Stuff you don't need to know
(by Tom Wilson, posted at 5:59 PM)
As everyone knows, Information Research uses the Atomz.com search engine - which is made freely available. I've just been experimenting with restricting the pages that are scanned, but it didn't work out. However, in the process I had to ask for the site to be re-indexed (this normally happens automatically every Sunday night) and the log for the indexing tells me that 417 pages have been indexed containing 1,546,605 words.
Wow - 1.5 million words - I had no idea that we'd published as much as that. Now, that includes contents pages (which I was trying to mask) and the editorials but, nevertheless, that's a lot of words. And, given the volume of hits, it seems that people find them useful words.
While I was at it, I checked on the language used in the searching: here are the search strings used last month:
| Frequency | Search string |
| 18 | knowledge management |
| 12 | information management |
| 11 | data mining |
| 8 | electronic resources |
| 8 | information conciousness |
| 7 | cko and failure |
| 7 | communication |
| 7 | digital library |
| 7 | information retrieval |
| 7 | management |
| 6 | cko |
| 6 | e publishing |
| 6 | information literacy |
| 6 | information seeking behaviour |
| 6 | online public access catalog |
| 6 | outsourcing |
| 6 | pattern of communication adopted by marketing department in industrial goods sec |
| 6 | search engine |
| 5 | company libraries |
Some of this strikes me as odd and must be the result of some people clicking on the 'Go' button more than once.
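If repeated clicks on 'Go' are the cause, a log analyser could count a query only once when the same visitor resubmits it within a short interval. A minimal Python sketch, assuming a hypothetical log of (visitor, time in seconds, query) records and an arbitrary 60-second window; the Atomz logs themselves may not record visitors at all:

```python
from collections import Counter


def count_queries(log: list[tuple[str, int, str]], window_seconds: int = 60) -> Counter:
    """Count search strings, treating a resubmission of the same query by the
    same visitor within window_seconds of their previous submission as a
    repeat click rather than a new search."""
    last_seen: dict[tuple[str, str], int] = {}
    counts: Counter = Counter()
    for visitor, timestamp, query in sorted(log, key=lambda entry: entry[1]):
        key = (visitor, query.strip().lower())
        previous = last_seen.get(key)
        if previous is None or timestamp - previous > window_seconds:
            counts[key[1]] += 1
        # Updating on every submission means a run of rapid clicks counts once.
        last_seen[key] = timestamp
    return counts


# Hypothetical log: visitor v1 clicks 'Go' three times in ten seconds.
log = [
    ("v1", 0, "cko"),
    ("v1", 5, "cko"),
    ("v1", 9, "cko"),
    ("v2", 20, "cko"),
    ("v1", 500, "cko"),
]
print(count_queries(log))
```

Five raw submissions become three counted searches: one burst from v1, one from v2, and v1's later, separate search.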
|
Google and InformationR.net
(by Tom Wilson, posted at 5:54 PM)
As a result of getting O'Reilly's 'Google hacks' for review, I've tweaked one of the examples to provide a site-search feature for InformationR.net. Try it out and let me know what you think about it.
|
Hitting the site...
(by Tom Wilson, posted at 7:48 PM)
I imagine that most readers of Information Research will be aware of the counter on the top page. What they may not know is that I regularly collect information from the counter service on where the hits are coming from. My 'harvest' now totals 4,158 hits - collected since 1 November 2002 - and shows hits arriving at the site from referring sites (almost 500 of them). Only a few sites account for 2.0% or more of the 4,158 and I show them in the table below.
It's a curious list consisting of a variety of organized resource 'directories', like BUBL, together with one other e-journal, an academic site hosted by the Department of Communication at the University of Washington, the search engine, Google, and one item in a newsletter about search engines.
The last of these - Searchday from Search Engine Watch - demonstrated the impact of certain sources: the item was published on 27 May 2003 and it immediately led to a peak in the hits curve, and hits from that page have been arriving ever since, to the effect that it now accounts for 2.5% of all the hits on the top page.
The Directory of Open Access Journals also illustrates how a new site can have an immediate impact on traffic - I don't recall when the hits first appeared, but it was only earlier this year, and it now accounts for almost 3% of the total.
The data on Google are a bit of a cheat - in fact, if one takes all 28 Google sites (from www.google.ae to www.google.sk), the search engine in its different manifestations accounts for 7.55% of the total hits.
Searching and the Weblog
(by Prof. Tom Wilson, posted at 5:02 PM)
Now there's a funny thing - it seems that a number of people are hitting the
Weblog through searching for something completely different, and yet decide to
have a look.
For example, someone searching for 'Captain Stabbin' on msn.com found the link
to my item on the Nigerian scam at number 28 on the output list - yet still
clicked on the link to the log. I can't imagine the naive user doing that, so I
assume that it must have been someone who recalled seeing my message on the log
and wanted to find it again.
Similarly, someone searching for 'Internet 2' on Google found the Weblog link at number eight in the list, yet followed it. And another Google search, for 'Joint use libraries' AND 'Syllabus', produced 14 items, of which the Weblog item was number 13 - and yet that was followed. Surely these are cases of people wanting to find things they'd seen before.
It seems unlikely, however, that someone searching Google, again, for "What is the official name from the standards organization of the 11mbs wireless networking standard?" would have seen a specific message on this topic. In fact, the link to the Weblog was number two on the list and led to the 'Wireless' channel of the log - nothing there to answer the question.
Most curious of all, however, was a search for 'sugar daddy phenomenon' - and, lo and behold, the Weblog item that includes all three words is number two on the list. This time it was the 'Electronic publishing' channel of the log, which includes an item on the open access 'phenomenon', posted on 12 September 2003, with a request for a 'sugar daddy' to support the journal.
Curious indeed are the ways of search engines and people - you can check this out at the counter service.
Incidentally, of 21 hits from search engine results pages, 16 came from some variant of Google.
Thinking of buying Google?
(by Tom Wilson, posted at 4:10 PM)
Check out the Fortune article.
|
Information Research and SSCI
(by Tom Wilson, posted at 3:32 PM)
I've just taken the time to check Web of Science and it seems that all items in Information Research from Volume 8 no. 1 have now been indexed there. I look forward to ever-increasing hits :-) Speaking of which... the current hits on the top page now exceed last year's total by more than 10,000.
|
Pricing Google
(by Tom Wilson, posted at 5:29 PM)
The possibility of privately-owned Google going public is giving financial analysts the trembles.
Wharton School of Management has a nice piece on it.
|
Odds and ends
(by Tom Wilson, posted at 1:12 PM)
Current Cites is an electronic publication I've drawn attention to before. Here are a couple of items that interested me:
I'm in the process of reviewing the latest version of EndNote, the bibliography organizer, and this version has a new feature, linking to the original source through the OpenURL protocol - coincidentally, Current Cites draws attention to an interview in the OCLC Newsletter with Herbert Van de Sompel, the originator of the protocol and a key figure in the Open Archives Initiative.
The other piece is from First Monday, the e-journal that is just a little younger than Information Research :-) This paper concerns 'open content' - that is, what you are reading now, and what you read in every new issue of Information Research. Magnus Cedergren, the author of 'Open content and value creation', states in the abstract:
In this paper, I consider open content as an important development track in the media landscape of tomorrow. I define open content as content possible for others to improve and redistribute and/or content that is produced without any consideration of immediate financial reward often collectively within a virtual community. The open content phenomenon can to some extent be compared to the phenomenon of open source. Production within a virtual community is one possible source of open content. Another possible source is content in the public domain. This could be sound, pictures, movies or texts that have no copyright, in legal terms.
and in the body of the paper he looks at three examples of open content:
All in all, an interesting paper.
|
The Visual Thesaurus
(by Tom Wilson, posted at 1:45 PM)
John Holgate has drawn the attention of IR-Discuss members to the:
Plumb Visual Thesaurus developed since 1996 in the Princeton University Concept Labs. IMO it's the biggest breakthrough in semantics since Carnap invented 'intension'.
It is interesting that the VT's 'view' of the concept information comes directly from
the Princetonian definition:
'a message sent and received that reduces the receiver's uncertainty' (ho hum)
but it also separates out facts/documents/data from 'selective information'
(a la Shannon communication theory) and the entropy/ectropy strand beloved
of the physicists.
The strange little entity labelled 'info', which is appearing more and more in
biology circles is, perhaps fittingly, without a definition.
I suggest you try playing with 'knowledge' and 'experience' for good measure
and see how meanings appear to have their own momentum and relationships -
like in the world beyond thesauri and dictionaries.
Thanks for that, John.
|
More odds and ends
(by Tom Wilson, posted at 9:48 PM)
Grahame Gould drew my attention to the fact that the Free-Conversant server has been down and the last lot of 'Odds and ends' was not reachable - so, as compensation, here's another lot. I've no idea why the server was down, not having had any information about it.
Music piracy is hitting the headlines again. Regular aficionados of this site may recall an earlier message on the subject, and today the music industry won plaudits for suing a 71-year-old grandfather and a 12-year-old child. That's the way to do it, guys - go for the soft targets. Naturally, it has been picked up by other Weblogs, and The Shifted Librarian raises a point or two.
The whole thing makes another item from Techdirt all the more interesting: apparently the music industry is using file-sharing networks it abhors to collect market research data.
On the search front, there's a rather curious hybrid at Anacubis, described as an integration of:
...the Amazon and Google search APIs with the anacubisTM Viewer to deliver an innovative and powerful new way to browse the extensive catalogue of books, CD, DVDs and videos for sale at Amazon.com - and then explore related information amongst Google's 3 billion plus web documents
The demo worked fine the first time I used it, but refused to perform again. Try it out anyway; you never know your luck. I'm not sure who it is intended for - perhaps simply to show that the Anacubis visualisation software works - but I'm always chary of visualising searches, given the way people search and the limited responses they are happy with. Pictures are not always worth a thousand words. Thanks to ResearchBuzz for that one.
|
Google again
(by Tom Wilson, posted at 2:38 PM)
News from Search Engine Watch on the issue of who's got the biggest index. [Perhaps there'll be a new burst of spam - 'Proven ways to increase your index size!!!!!!!!!']
As Danny Sullivan, the author, says:
Size figures have long been used as a surrogate for the missing relevancy figures that the search engine industry as a whole has failed to provide. Size figures are also a bad surrogate, because more pages in no way guarantees better results.
However, it's easy to use the number of pages indexed (even if you aren't telling the truth - see the article) in the publicity battle, so I guess it will keep on going.
|
Monday morning
(by Tom Wilson, posted at 8:09 AM)
Here's an interesting item from Current Cites - that useful alerting system for things about information and information technology: it concerns a new book from O'Reilly on 'Amazon Hacks', describing the tricks you can get up to in searching for books using the very powerful search engine at Amazon.com.
This particular 'hack' discusses the advanced search possibilities, which go well beyond the typical Boolean search. Read all about it at the book's Web site.
On the Weblogging front, there's a dispute brewing up about RSS - Really Simple Syndication, or whatever use you want to make of those initials. The dispute surrounds the future of RSS and is too complicated to summarise here, so go look at the CNET News site.
|

This work is licensed under a
Creative Commons License.
This site managed with Conversant, © Copyright 2008 Macrobyte Resources
|
Channels
Digital Libraries
Education
Electronic publishing
Freedom of information
Information Management
Intellectual Property
Internet
Knowledge management
Personal
Records management
Resources
Searching
Software
Technology
Weblogs
Wireless
Words
|