Information Research Weblog









10/22/2004
Weblogs and other things (by Tom Wilson, posted at 9:38 PM)

Weblogs

My thanks to folks, on and off the Weblog, who've written to encourage me to keep the Weblog going; I'll plod on now that I know it has some effect. Carol Cahill kindly says:

Our library probably wouldn't have a wireless Internet connection if my interest hadn't first been piqued by your Weblog. Now we have a four-laptop wireless training lab and patrons can come in and connect with their own computers.

Which I think is rather better than a citation in a journal :-)

"The Chief's" comments on Weblog membership counts are also interesting, as are the usage stats for the Weblog: last year 13,776 hits and, this year so far, 13,588, with the combined 27,364 hits distributed over the continents as follows:

1. North America 10,780 39.4%
2. Europe 10,565 38.6%
3. Asia 2,961 10.8%
4. Australia 1,735 6.3%
5. Africa 415 1.5%
6. South America 277 1.0%
7. Central America 133 0.5%
Unknown 498 1.8%
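The percentages in the table are shares of the two years' combined hits (13,776 + 13,588 = 27,364), not of this year's alone. A minimal Python sketch that recomputes them from the counts above:

```python
# Hit counts by continent, copied from the table above.
hits = {
    "North America": 10_780,
    "Europe": 10_565,
    "Asia": 2_961,
    "Australia": 1_735,
    "Africa": 415,
    "South America": 277,
    "Central America": 133,
    "Unknown": 498,
}

def shares(counts):
    """Return the total and each region's share of it, as a percentage to 1 dp."""
    total = sum(counts.values())
    return total, {region: round(100 * n / total, 1) for region, n in counts.items()}

total, pct = shares(hits)
print(total)                 # 27364, i.e. 13,776 + 13,588
print(pct["North America"])  # 39.4
```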

Yahoo! does a Google

News today of Yahoo!'s purchase of an e-mail start-up by the name of Bloomba (why does the Internet generate so many silly names? Scope for a PhD dissertation here!). I'd never heard of Bloomba before, but it is an e-mail client rather than a Web-based service. Reviews suggest that its killer feature is its search capability: it indexes your mail as you receive it, including the contents of attachments. No one seems to know what plans Yahoo! has for the system. The original parent company, Statalabs, says:

What does Yahoo! plan to do with the technology as a result of the acquisition?
At this time we do not have any announcements about the ongoing plans for the technology or the specifics of the transaction.

A case of 'Watch this space' - well, not this one, since I can't guarantee that I'll spot an announcement, but perhaps the Yahoo! site - and while you are there, you might like to take a look at MySearch.



10/20/2004
Odds and ends (by Tom Wilson, posted at 10:35 AM)

The Weblog

It seems that my suspicions about the lack of general interest in the IR Weblog are confirmed :-) I've been contributing very little over the past month and so far no one has asked, 'Where are you?'

New issue of the journal

The latest issue, Volume 10 No. 1, is now on the site. This one has the first batch of papers from the Information Seeking in Context conference, held in Dublin last month. The other half will be published in the January 2005 issue. I finally got round to checking what logs were available on the server and discovered that, since the 8th of October (which is when the analysis software appears to have kicked in), there have been about 280,000 hits on the InformationR.net site, most of them on the journal. This is considerably beyond my own estimates from the various counters. InformationR.net is the sixth most 'popular' virtual domain on the University's servers.

Voice over Internet Protocol (VoIP)

VoIP appears to be building up nicely. I finally got round to using it, along with colleagues in the AIMTech research group at Leeds University Business School. The voice quality, using Skype, is generally pretty good - not quite as good as the best landline, but good enough considering that it's free. I've also tried the SkypeOut service, which connects to landline numbers pretty well anywhere in the world and to mobile phones in some. You can connect to landlines in Western Europe, North America, Australia and New Zealand for 1.7 Euro cents a minute (£0.0118 or $0.02129) - mobiles cost a good deal more. Connection with landlines can be variable - sometimes connection is lost and in one case there was no voice connection at all. No doubt, with the interest being expressed, these problems will get ironed out.
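Working the quoted SkypeOut landline rate through for a longer call, and backing out the exchange rates implied by the three prices, is simple arithmetic. A small sketch, assuming a flat per-minute charge (how SkypeOut actually rounds call durations is not stated above) and using a 60-minute call purely as an example:

```python
# Per-minute SkypeOut landline rate as quoted above, in three currencies.
RATE_EUR = 0.017    # 1.7 euro cents
RATE_GBP = 0.0118
RATE_USD = 0.02129

def call_cost(minutes, rate_per_minute):
    """Cost of a call at a flat per-minute rate, rounded to two decimal places."""
    return round(minutes * rate_per_minute, 2)

for label, rate in [("EUR", RATE_EUR), ("GBP", RATE_GBP), ("USD", RATE_USD)]:
    print(label, call_cost(60, rate))

# Exchange rates implied by the three quoted prices (pounds and dollars per euro).
print(round(RATE_GBP / RATE_EUR, 3), round(RATE_USD / RATE_EUR, 3))  # 0.694 1.252
```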

Of course, governments and the big telecoms companies get very edgy over VoIP: here's a communication process from which they may not be able to make any money, unless they REGULATE. Naturally, it is in the USA that these concerns are being raised.

It had to happen: "Boingo, Vonage Sign VoWi-Fi Pact"

Google again

A couple of things about Google - first, you'll find a review of its e-mail service, Gmail, in the latest issue of the journal. Secondly, I'm also trying out its 'desktop search' program - this enables you to do a Google search on your hard disc. It also checks your hard disc when you do a Web search - useful for bringing to your attention those items you'd forgotten you'd ever written!



10/7/2004
Google in the news (by Tom Wilson, posted at 4:44 PM)

Google is in the news again - on the 5th October it issued a 'new features' message to users of Gmail, to the effect that it was trialling a new mail forwarding system, which would be free during the trial. This prompted commentators to speculate on what other features of Google in general would become revenue streams.

As it happens, I've been using Gmail as a beta user for the past couple of months; a review will appear in the October issue of Information Research, and I'm now hooked on it. Its 1GB filestore, its use of 'labels' to index messages, and its grouping of messages into 'conversations' make it a real winner.



8/26/2004
New book (by Tom Wilson, posted at 4:18 PM)
Congratulations to one of our Editorial Board members, Amanda Spink, for her new book, jointly authored with Bernard Jansen: "Web Search: Public Searching of the Web" - you can find details at the publisher's Website.


8/2/2004
Popular papers in Information Research (by Tom Wilson, posted at 8:42 PM)

Having recently published a new issue of Information Research, I thought it was time to find out how the ranking by 'hits per month' was standing. So here's the latest table. We see that some very recent papers appear to have struck a chord, while some of the oldest papers are still going strong.

T.D. Wilson The nonsense of knowledge management
     Issue 8.1 Total hits 39553 Months active 24 Hits per month 1648.04
Terrence A. Brooks The nature of meaning in the age of Google
     Issue 9.3 Total hits 5204 Months active 4 Hits per month 1301.00
Jannica Heinström Five personality dimensions and their influence on information behaviour
      Issue 9.1 Total hits 9090 Months active 9 Hits per month 1010.00
P. Riding, S.P. Fowell, and P.C.M. Levy An action research approach to curriculum development
     Issue 1.1 Total hits 16507 Months active 19 Hits per month 868.79
Chun Wei Choo Environmental scanning as information seeking and organizational learning
     Issue 7.1 Total hits 19950 Months active 33 Hits per month 604.55
France Bouthillier and Kathleen Shearer Understanding knowledge management and information management: the need for an empirical perspective
      Issue 8.1 Total hits 12594 Months active 21 Hits per month 599.71
Zita Correia and Tom Wilson Scanning the business environment for information: a grounded theory approach
     Issue 2.4 Total hits 11000 Months active 19 Hits per month 578.95
Paul M. Hildreth and Chris Kimble The duality of knowledge
      Issue 8.1 Total hits 11984 Months active 21 Hits per month 570.67
Kalervo Järvelin and T.D. Wilson On conceptual models for information seeking and retrieval research
      Issue 9.1 Total hits 4819 Months active 9 Hits per month 535.44
Terrence A. Brooks Web search: how the Web has changed information retrieval.
     Issue 8.3 Total hits 7452 Months active 15 Hits per month 496.80
V. Mistry and Bob Usherwood Total quality management, British Standard accreditation, Investors in People and academic libraries
     Issue 1.3 Total hits 9414 Months active 19 Hits per month 495.47
Maija-Leena Huotari and T.D. Wilson Determining organizational information needs: the Critical Success Factors approach
     Issue 6.3 Total hits 18498 Months active 39 Hits per month 474.31
Sirje Virkus Information literacy in Europe: a literature review
     Issue 8.4 Total hits 5152 Months active 12 Hits per month 429.33
Kirsti Nilsen The Library Visit Study: user experiences at the virtual reference desk
     Issue 9.2 Total hits 2312 Months active 6 Hits per month 385.33
Barbara Niedźwiedzka A proposed general model of information behaviour
     Issue 9.1 Total hits 3021 Months active 9 Hits per month 335.67
Joyce Kirk Information in organisations: directions for information management
     Issue 4.3 Total hits 20602 Months active 63 Hits per month 327.02
Leonard J. Ponzi and Michael Koenig Knowledge management: another management fad?
     Issue 8.1 Total hits 6826 Months active 21 Hits per month 325.05
Shrianjani Marie (Gina) de Alwis and Susan Ellen Higgins Information as a tool for management decision making: a case study of Singapore
     Issue 7.1 Total hits 10054 Months active 33 Hits per month 304.67
Wallace Koehler A longitudinal study of Web pages continued: a consideration of document persistence
     Issue 9.2 Total hits 1797 Months active 6 Hits per month 299.50
Bo-Christer Björk Open access to scientific publications - an analysis of the barriers to change
     Issue 9.2 Total hits 1776 Months active 6 Hits per month 296.00
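The ranking divides each paper's total hits by the number of months it has been active, which is why a recent paper such as Brooks's can outrank an older one with far more total hits. A small sketch of that calculation, using four rows from the table:

```python
# (author, issue, total hits, months active) for four rows of the table above.
papers = [
    ("T.D. Wilson", "8.1", 39553, 24),
    ("Terrence A. Brooks", "9.3", 5204, 4),
    ("Jannica Heinström", "9.1", 9090, 9),
    ("Joyce Kirk", "4.3", 20602, 63),
]

def rank_by_rate(rows):
    """Sort rows by hits per month (total hits / months active), highest first."""
    rated = [(author, issue, round(hits / months, 2))
             for author, issue, hits, months in rows]
    return sorted(rated, key=lambda row: row[2], reverse=True)

for author, issue, rate in rank_by_rate(papers):
    print(f"{author:<20} {issue:>4} {rate:>8.2f}")
```

Joyce Kirk's paper has the second-highest total of the four but comes last once its 63 months on the site are taken into account.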



7/17/2004
AI and search engines (by Tom Wilson, posted at 2:11 PM)

A highly favourable item on a new search engine, blinkx, in the Guardian Online supplement, sent me off to its Website to check it out. blinkx uses, so we are told, an AI technique rather than page-ranking a la Google, and it searches not only the Web, news services, and Weblogs, but also your hard disc. From one of the file names on the downloaded system I suspect that the engine behind blinkx is Autonomy.

The Website includes an option to try out the beta version of blinkx optimised for broadband users and I discovered something rather odd. The PR claims that "blinkx understands your question and presents you with links as you search." - but the system obviously uses stop words. How can a question be understood if the stop words include terms of significance to the user?

Specifically, I searched for 'Information Research', expecting the journal site to pop up fairly quickly - no: only things on 'research' appeared. Similarly, when I used 'information behaviour', only 'behaviour' was used as a search term, and for 'information science', only 'science'. Not much use in the information management sector, then! The give-away is that the terms used in the search are highlighted and in all cases, where 'information' did appear in an item, it was not highlighted.

'Information' on its own may or may not be a useful search term - certainly it would generate millions of hits, but when used in compounds such as those mentioned, the concept so formed has much greater specificity. As long as AI systems continue to fail to recognize concepts and their semantic significance, they will fail to produce a search system that is a significant improvement on Google.
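To make the point concrete, here is a toy Python sketch contrasting plain stop-word removal with a version that first keeps known compounds intact. The stop list and the phrase list are assumptions for illustration only; nothing here is blinkx's actual code or vocabulary.

```python
# Hypothetical stop list: 'information' is included only because the behaviour
# described above suggests blinkx treated it as a stop word.
STOP_WORDS = {"the", "of", "a", "and", "information"}

# Hypothetical list of two-word concepts that should survive as units.
PHRASES = {("information", "research"), ("information", "behaviour"),
           ("information", "science")}

def naive_terms(query):
    """Drop every stop word, compounds included."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]

def phrase_aware_terms(query):
    """Keep compounds from PHRASES whole, then apply the stop list to the rest."""
    words = query.lower().split()
    terms, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            terms.append(" ".join(pair))
            i += 2
        else:
            if words[i] not in STOP_WORDS:
                terms.append(words[i])
            i += 1
    return terms

print(naive_terms("Information Research"))         # ['research']
print(phrase_aware_terms("Information Research"))  # ['information research']
```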



4/2/2004
Google (by Tom Wilson, posted at 9:51 PM)

Google is also hitting the news this week, with new services announced and, in today's Guardian, a big article about Google's intention to offer a free e-mail service to compete with Yahoo! and Hotmail, offering a gigabyte of storage - way above the limits of the other two. I'll join that! Get more on this from Google itself.



3/30/2004
A Google Game (by Tom Wilson, posted at 4:35 PM)

There are all kinds of games you can play with Google, including the well-known 'Googlewhacking'. I don't know whether I've invented this one, which I discovered accidentally.

It is very simple: just hit a few keys haphazardly, for example, "l;kd", in the search box of Yahoo and see what turns up. The aim is to put in something that returns nothing, which is surprisingly difficult! That combination, for example, turned up more than one and a half million hits! Even entered as a phrase, it produced almost 20,000.

The string ";we[kear'k" resulted in 34 hits, largely as a result of the existence of an author called "K. Kear". However, as a phrase, it produced zero, so it can be done. Remember, however, that the entry of symbols should be haphazard: just let your fingers do the choosing.



2/26/2004
In the news... (by Tom Wilson, posted at 10:30 AM)

An interesting item on wireless in the public library from LIS News.com

...and a longer piece on IT in public libraries from D-Lib Magazine

Turning to the University sector, I picked this up from Seb's Open Research - a couple of courses at Prince Edward Island University are using Weblogs as resource pages and communication. Here's one on 'Networking, knowledge and the digital age'.

And here's an interesting one! I initiated a debate on the JESSE list some time back on the extent to which Web citation was beginning to overtake journal citation as a performance tool. I then found that this had been picked up by a couple of researchers (Vaughan and Shaw, Bibliographic and Web citations: what is the difference? JASIST, 54(14), 2003, 1313-1322) and now ISI is getting together with NEC: Thomson ISI and NEC Team Up to Index Web-based Scholarship

PHILADELPHIA & LONDON & PRINCETON, N.J.--(BUSINESS WIRE)--Feb. 25, 2004--Today, Thomson ISI and NEC Laboratories America (NEC) announced their collaboration to create a comprehensive, multidisciplinary citation index for Web-based scholarly resources. The new Web Citation Index(TM) will combine a suite of technologies developed by NEC, including "autonomous citation indexing" tools from NEC's CiteSeer environment, with the capabilities underlying ISI Web of Knowledge(SM). Thomson ISI editors will carefully monitor the quality of this new resource to ensure all indexed material meets the Thomson ISI high-quality standards.

During 2004, Thomson ISI and NEC will operate a pilot of the new resource to receive feedback from the scientific and scholarly community. Full access to the index is projected for early 2005.

When fully operational, the new resource will be a unique content collection within ISI Web of Knowledge. It will complement the Thomson ISI Web of Science®, and provide researchers with a new gateway to discovery -- using citation relationships among Web-based documents, such as pre-prints, proceedings, and "open access" research publications

OK - that's enough for now - I've got to go off to talk with the people at Orange about mobile technologies.



2/23/2004
Search engines and the FT (by Tom Wilson, posted at 10:02 AM)
I didn't get to the Saturday issue of the FT before this morning and there I found a leader item on search engines. I don't think I've seen a newspaper leader on the subject in the UK before. The item is 'Online searching: who's feeling lucky?' - available on the FT web site, but only to subscribers. The main point about the article is the suggestion that with the limited number of search engines available, or rather, the dominance of Google, there's a need for 'one fully transparent search engine, preferably maintained in the academic realm.' Isn't it curious how the advocates of capitalism always find a role for the public sphere when they want something unbiased? :-) The suggestion was made originally by Google's founders, Sergey Brin and Larry Page, in a research paper, but I haven't been able to locate it on the Web.

Good luck to the FT, but the chances of any university in the UK picking up the challenge to provide a 'fully transparent search engine' are pretty remote. You could count the institutions pursuing serious information retrieval research on the fingers of one hand and still have fingers to spare, and the institutions are so deeply mired in managerialism that the probability of selfless public service is remote. Everything these days must have an 'income stream', nothing is done for nothing, and the tentacles of central government's assessment procedures stretch everywhere.



2/20/2004
Search engines (by Tom Wilson, posted at 7:27 PM)
Old news now - two days old - that Yahoo! has dropped Google as its search engine in favour of its own search engine, provided by Inktomi. So I wondered how it compared. I searched for "Information Research" using both and, surprise, links to the journal were 1st and 2nd in Yahoo! Search and also in Google. Not much difference there. So, I searched for "case-based reasoning" at ".edu" sites. In the first 20 links for each search engine, only five institutions were duplicated, and from these five institutions only four Web pages were duplicated. It would seem, then, that the two engines are doing different things and that, if you want a reasonably comprehensive coverage of a topic, it would be a good idea to use both.
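The comparison amounts to intersecting two result lists, once at the level of institutions and once at the level of individual pages. A rough sketch, treating each host name as an institution (an approximation, since one university may run several hosts); the URLs are made up for illustration:

```python
from urllib.parse import urlparse

def hosts(urls):
    """Set of host names, used here as a rough proxy for 'institution'."""
    return {urlparse(u).netloc.lower() for u in urls}

def overlap(results_a, results_b):
    """Return (number of shared hosts, number of shared exact pages)."""
    shared_hosts = hosts(results_a) & hosts(results_b)
    shared_pages = set(results_a) & set(results_b)
    return len(shared_hosts), len(shared_pages)

# Made-up result lists standing in for the first links from each engine.
google = [
    "http://www.cs.alpha.edu/cbr/",
    "http://www.beta.edu/ai/cbr.html",
    "http://www.gamma.edu/notes.html",
]
yahoo = [
    "http://www.cs.alpha.edu/cbr/",
    "http://www.beta.edu/research/",
    "http://www.delta.edu/cbr.html",
]

print(overlap(google, yahoo))  # (2, 1)
```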


1/18/2004
NewzCrawler (by Tom Wilson, posted at 2:45 PM)

Having used the news aggregator, NewzCrawler, for some months now, I finally decided, when the evaluation period came to an end, that I can't live without it - and the $24.95 seems a modest price to pay. It isn't perfect, but then what software is?

The reason for needing a news aggregator, if you still haven't cottoned on, is the increasing popularity of the RSS feeds that provide its raw material. A recent development at Yahoo! makes RSS feeds available for news searches. For example, if you want to pick up every mention of Tony Blair (heaven forfend) that occurs in the news sources covered by Yahoo!, use this URL in your aggregator. Read about this development at Jeremy Zawodny's Weblog.
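An aggregator's core job is to fetch each feed and pull out its items. A minimal sketch using Python's standard-library XML parser on a made-up RSS 2.0 document (it fetches nothing, and the feed contents are illustrative only; NewzCrawler's own workings are not described here):

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed; a real aggregator would download this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news search</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in items(SAMPLE_FEED):
    print(title, link)
```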

My aggregator now has links to fifty news and information sources - it's continually growing and continually being weeded as I find new things and get rid of dross - of which there is much!

Search engines (by Tom Wilson, posted at 2:20 PM)

There's a useful account of developments in search engines during 2003 at Sitepoint - I was pointed there from the Logos Weblog, which has some interesting stuff. I liked this comment from the section on the future:

Watch Microsoft carefully. If a new Microsoft-based search initiative gets off the ground this year, you can bet it will be well funded and well promoted. Site owners can benefit from first-mover advantages in getting listed. If you can become an early expert in the new search technology, your site and traffic could soar.


1/3/2004
Stuff you don't need to know (by Tom Wilson, posted at 5:59 PM)

As everyone knows, Information Research uses the Atomz.com search engine - which is made freely available. I've just been experimenting with restricting the pages that are scanned, but it didn't work out. However, in the process I had to ask for the site to be re-indexed (this normally happens automatically every Sunday night) and the log for the indexing tells me that 417 pages have been indexed containing 1,546,605 words.

Wow - 1.5 million words - I had no idea that we'd published as much as that. Now, that includes contents pages (which I was trying to mask) and the editorials but, nevertheless, that's a lot of words. And, given the volume of hits, it seems that people find them useful words.

While I was at it, I checked on the language used in the searching: here are the search strings used last month:

Frequency  Search string
18  knowledge management
12  information management
11  data mining
8  electronic resources
8  information conciousness
7  cko and failure
7  communication
7  digital library
7  information retrieval
7  management
6  cko
6  e publishing
6  information literacy
6  information seeking behaviour
6  online public access catalog
6  outsourcing
6  pattern of communication adopted by marketing department in industrial goods sec
6  search engine
5  company libraries

Some of this strikes me as odd and must be the result of some people clicking on the 'Go' button more than once.
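If repeated clicks on 'Go' are the explanation, they could be filtered out before counting. A rough sketch, assuming a hypothetical log of (visitor, time, query) records (Atomz's real log format is not shown here) and treating a repeat of the same visitor's previous query within 30 seconds as a duplicate click:

```python
from collections import Counter

# Hypothetical log records: (visitor id, time in seconds, search string).
log = [
    ("v1", 0, "knowledge management"),
    ("v1", 2, "knowledge management"),    # second click on 'Go'
    ("v1", 3, "knowledge management"),    # third click on 'Go'
    ("v2", 10, "data mining"),
    ("v1", 600, "knowledge management"),  # same visitor, much later: a new search
]

def dedupe(records, window=30):
    """Drop a search that repeats the same visitor's previous query within `window` seconds."""
    last = {}  # visitor id -> (time, query) of that visitor's most recent search
    kept = []
    for visitor, t, query in records:
        prev = last.get(visitor)
        repeat = prev is not None and prev[1] == query and t - prev[0] <= window
        if not repeat:
            kept.append((visitor, t, query))
        last[visitor] = (t, query)
    return kept

print(Counter(q for _, _, q in log))          # raw counts
print(Counter(q for _, _, q in dedupe(log)))  # after removing repeat clicks
```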



12/8/2003
Google and InformationR.net (by Tom Wilson, posted at 5:54 PM)

As a result of getting O'Reilly's 'Google hacks' for review, I've tweaked one of the examples to provide a site-search feature for InformationR.net. Try it out and let me know what you think about it.
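The 'Google hacks' example itself isn't reproduced here, but the usual basis for a site search is Google's `site:` operator added to the user's query. A minimal sketch of building such a results URL (not the book's code; the function name is hypothetical):

```python
from urllib.parse import urlencode

def site_search_url(terms, site="informationr.net"):
    """Build a Google results URL restricted to one site with the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"{terms} site:{site}"})

print(site_search_url("open access"))
```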



12/4/2003
Hitting the site... (by Tom Wilson, posted at 7:48 PM)

I imagine that most readers of Information Research will be aware of the counter on the top page. What they may not know is that I regularly collect information from the counter service on where the hits are coming from. My 'harvest' now totals 4,158 hits - collected since 1 November 2002 - and shows hits arriving at the site from referring sites (almost 500 of them). Only a few sites account for 2.0% or more of the 4,158 and I show them in the table below.

It's a curious list, consisting of a variety of organized resource 'directories', like BUBL, together with one other e-journal, an academic site hosted by the Department of Communication at the University of Washington, the search engine, Google, and one item in a newsletter about search engines.

The last of these - Searchday from Search Engine Watch - demonstrated the impact of certain sources: the item was published on 27 May 2003 and it immediately led to a peak in the hits curve, and hits from that page have been arriving ever since, to the effect that it now accounts for 2.5% of all the hits on the top page.

The Directory of Open Access Journals also illustrates how a new site can have an immediate impact on traffic - I don't recall when the hits first appeared, but it was only earlier this year, and it now accounts for almost 3% of the total.

The data on Google are a bit of a cheat: in fact, if one takes all 28 Google sites (from www.google.ae to www.google.sk), the search engine in its different manifestations accounts for 7.55% of the total hits.








Referring site                                                No. of hits      %
www.com.washington.edu/rccs/links.asp                                  83   2.00
www.doaj.org/links/term1870/term1940/                                 123   2.96
www.libdex.com/journals.html                                          125   3.01
www.shef.ac.uk/uni/academic/I-M/is/publications/index.html            157   3.78
bubl.ac.uk/journals/                                                  200   4.81
libres.curtin.edu.au/periodicals.htm                                   91   2.19
www.searchenginewatch.com/searchday/article.php/2204961               104   2.50
www.google.com/search?                                                194   4.67

Total hits analysed: 4,158 (the sites listed above account for 25.90%)
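Each percentage is a referrer's hits divided by the 4,158 hits analysed, and the 25.90% on the total line is the listed sites' combined 1,077 hits. A short sketch reproducing those figures; the `is_google` helper (a hypothetical name) shows how the national Google sites could be pooled, although only www.google.com appears in this table, so it reproduces the 4.67% rather than the 7.55% for all 28 Google sites:

```python
# Referrer counts copied from the table above.
referrers = {
    "www.com.washington.edu/rccs/links.asp": 83,
    "www.doaj.org/links/term1870/term1940/": 123,
    "www.libdex.com/journals.html": 125,
    "www.shef.ac.uk/uni/academic/I-M/is/publications/index.html": 157,
    "bubl.ac.uk/journals/": 200,
    "libres.curtin.edu.au/periodicals.htm": 91,
    "www.searchenginewatch.com/searchday/article.php/2204961": 104,
    "www.google.com/search?": 194,
}
TOTAL_HITS = 4158  # all hits analysed, including referrers not listed

def percent(hits, total=TOTAL_HITS):
    """Share of all analysed hits, as a percentage rounded to 2 dp."""
    return round(100 * hits / total, 2)

def is_google(referrer):
    """True for any Google site, e.g. www.google.com or www.google.co.uk."""
    host = referrer.split("/", 1)[0].lower()
    return host.startswith("www.google.") or host.startswith("google.")

listed = sum(referrers.values())
google = sum(n for r, n in referrers.items() if is_google(r))
print(listed, percent(listed))  # 1077 25.9
print(google, percent(google))  # 194 4.67
```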

Searching and the Weblog (by Prof. Tom Wilson, posted at 5:02 PM)
Now there's a funny thing - it seems that a number of people are hitting the Weblog through searching for something completely different, and yet decide to have a look.

For example, someone searching for 'Captain Stabbin' on msn.com found the link to my item on the Nigerian scam at number 28 on the output list - yet still clicked on the link to the log. I can't imagine the naive user doing that, so I assume that it must have been someone who recalled seeing my message on the log and wanted to find it again.

Similarly, someone searching for 'Internet 2' on Google found the Weblog link at number eight in the list, yet followed it. And another Google search, for 'Joint use libraries' AND 'Syllabus', resulted in 14 items, of which the Weblog item was number 13, and yet that was followed. Surely these are cases of people wanting to find things they'd seen before.

It seems unlikely, however, that someone searching (again on Google) for "What is the official name from the standards organization of the 11mbs wireless networking standard?" would have seen a specific message on this topic. In fact, the link to the Weblog was number two on the list and led to the 'Wireless' channel of the log, where there is nothing to answer the question.

Most curious of all, however, was a search for 'sugar daddy phenomenon' - and, lo and behold!, the Weblog item that includes all three words is item number two - this time the 'Electronic publishing' channel of the log, which includes an item on the open access 'phenomenon', posted on 12 September 2003, which included a request for a 'sugar daddy' to support the journal.

Curious indeed are the ways of search engines and people - you can check this out at the counter service.

Incidentally, of 21 hits from search engine search outputs, 16 used some variant of Google.

Thinking of buying Google? (by Tom Wilson, posted at 4:10 PM)
Check out the Fortune article.


11/30/2003
Information Research and SSCI (by Tom Wilson, posted at 3:32 PM)
I've just taken the time to check Web of Science and it seems that all items in Information Research from Volume 8 no. 1 have now been indexed there. I look forward to ever-increasing hits :-) Speaking of which... the current hits on the top page now exceed last year's total by more than 10,000.


11/20/2003
Pricing Google (by Tom Wilson, posted at 5:29 PM)
The possibility of privately-owned Google going public is giving financial analysts the trembles.

Wharton School of Management has a nice piece on it.



9/30/2003
Odds and ends (by Tom Wilson, posted at 1:12 PM)

Current Cites is an electronic publication I've drawn attention to before. Here are a couple of items that interested me:

I'm in the process of reviewing the latest version of EndNote, the bibliography organizer, and this version has a new feature, linking to the original source through the OpenURL protocol. Coincidentally, Current Cites draws attention to an interview in the OCLC Newsletter with Herbert Van de Sompel, the originator of the protocol and a key figure in the Open Archives Initiative.
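OpenURL works by passing citation metadata to a local 'link resolver', which then decides where the reader can obtain the item. A rough sketch of building an OpenURL 0.1-style link, using as the example the Vaughan and Shaw JASIST paper; the resolver address is hypothetical (each library runs its own), and EndNote's own implementation is not shown here:

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical link-resolver address; a real one is specific to each library.
RESOLVER = "http://resolver.example.edu/openurl"

def openurl(base, **metadata):
    """Build an OpenURL 0.1-style link: resolver address plus citation key=value pairs."""
    return base + "?" + urlencode(metadata)

link = openurl(
    RESOLVER,
    genre="article",
    aulast="Vaughan",
    atitle="Bibliographic and Web citations: what is the difference?",
    title="JASIST",      # journal title
    volume="54",
    issue="14",
    spage="1313",        # start page
    date="2003",
)
print(link)
```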

The other piece is from First Monday, the e-journal that is just a little younger than Information Research :-) This paper concerns 'open content', that is, what you are reading now, and what you read in every new issue of Information Research. Magnus Cedergren, the author of 'Open content and value creation', states in the abstract:

In this paper, I consider open content as an important development track in the media landscape of tomorrow. I define open content as content possible for others to improve and redistribute and/or content that is produced without any consideration of immediate financial reward — often collectively within a virtual community. The open content phenomenon can to some extent be compared to the phenomenon of open source. Production within a virtual community is one possible source of open content. Another possible source is content in the public domain. This could be sound, pictures, movies or texts that have no copyright, in legal terms.

and in the body of the paper he looks at three examples of open content:

All in all, an interesting paper.



9/11/2003
The Visual Thesaurus (by Tom Wilson, posted at 1:45 PM)

John Holgate has drawn the attention of IR-Discuss members to the:

Plumb Visual Thesaurus developed since 1996 in the Princeton University Concept Labs. IMO it's the biggest breakthrough in semantics since Carnap invented 'intension'.

It is interesting that the VT's 'view' of the concept information comes directly from the Princetonian definition:

'a message sent and received that reduces the receiver's uncertainty' (ho hum)

but it also separates out facts/documents/data from 'selective information' (a la Shannon communication theory) and the entropy/ectropy strand beloved of the physicists.

The strange little entity labelled 'info', which is appearing more and more in biology circles is, perhaps fittingly, without a definition.

I suggest you try playing with 'knowledge' and 'experience' for good measure and see how meanings appear to have their own momentum and relationships - like in the world beyond thesauri and dictionaries.

Thanks for that, John.



9/10/2003
More odds and ends (by Tom Wilson, posted at 9:48 PM)

Grahame Gould drew my attention to the fact that the Free-Conversant server has been down and the last lot of 'Odds and ends' was not reachable - so, as compensation, here's another lot. I've no idea why the server was down, not having had any information about it.

Music piracy is hitting the headlines again. Regular aficionados of this site may recall an earlier message on the subject, and today the music industry won plaudits for suing a 71-year-old grandfather and a 12-year-old child. That's the way to do it, guys - go for the soft targets. Naturally, it has been picked up by the other Weblogs and The Shifted Librarian raises a point or two.

The whole thing makes another item from Techdirt all the more interesting: apparently the music industry is using file-sharing networks it abhors to collect market research data.

On the search front, there's a rather curious hybrid at Anacubis, described as an integration of:

...the Amazon and Google search APIs with the anacubisTM Viewer to deliver an innovative and powerful new way to browse the extensive catalogue of books, CD, DVDs and videos for sale at Amazon.com - and then explore related information amongst Google's 3 billion plus web documents

The demo worked fine the first time I used it, but refused to perform again. Try it out, however; you never know your luck. I'm not sure who it is intended for (perhaps it is simply to show that the Anacubis visualisation software works), but I'm always chary of visualising searches, given the way people search and the limited responses they are happy with. Pictures are not always worth a thousand words. Thanks to ResearchBuzz for that one.



9/3/2003
Google again (by Tom Wilson, posted at 2:38 PM)

News from Search Engine Watch on the issue of who's got the biggest index. [Perhaps there'll be a new burst of spam - 'Proven ways to increase your index size!!!!!!!!!']

As Danny Sullivan, the author, says:

Size figures have long been used as a surrogate for the missing relevancy figures that the search engine industry as a whole has failed to provide. Size figures are also a bad surrogate, because more pages in no way guarantees better results.

However, it's easy to use pages indexed (even if you aren't telling the truth - see the article) in the publicity battle, so I guess it will keep on going.



9/1/2003
Monday morning (by Tom Wilson, posted at 8:09 AM)

Here's an interesting item from Current Cites, that useful alerting system for things about information and information technology: it concerns a new book from O'Reilly on 'Amazon Hacks', describing the tricks you can get up to in searching for books using the very powerful search engine at Amazon.com.

This particular 'hack' discusses the advanced search possibilities, which go well beyond the typical Boolean search. Read all about it at the book's Web site.

On the Weblogging front, there's a dispute brewing up about RSS - Really Simple Syndication, or whatever use you want to make of those initials. The dispute surrounds the future of RSS and is too complicated to summarise here, so go look at the CNET News site.



8/29/2003
Various (by Tom Wilson, posted at 1:16 PM)

It's been a while since I posted to the log: I've been in Sweden for the past week and too busy to give it any time.

I've also been experiencing server problems: I've been unable to access my Webmail box at Sheffield for the past couple of days, so people may have been trying to contact me without my knowing. My Swedish address will serve for anyone who has been trying to reach me - "tom.wilson@hb.se"

I assume that many of you have been infected by the SoBig virus - I received a message from one correspondent saying that he had had 700 messages in one morning. I don't think I had that many, but I certainly had several hundred over the course of last week. It is no comfort to learn (from BBC News) that this has been the fastest proliferating virus of all time.

News on the search front today: my last entry concerned Overture, and now we learn (from CNET News) that Google has expanded its index beyond the 3.2 billion pages claimed by Overture. As the report says:

But since then, Mountain View, Calif.-based Google has quietly leaped ahead again, expanding its database to more than 3.3 billion Web documents by Thursday this week, according to its home page. A Google representative confirmed the change.

"Google raised the number on its home page to accurately reflect the number of Web pages it offers consumers," a representative wrote in an e-mail. The search company's worldwide index now includes 3.3 billion Web documents, 800 million Usenet pages and 400 million images.

On another front, the legal system hit a new high in the UK this week as a result of the Hutton Inquiry. Its Web site is attracting 'upwards of 80,000 visitors a day', according to the Guardian's Online supplement. The transcripts of the hearings into the circumstances surrounding the death of David Kelly make fascinating reading as politicians, their public relations staff and journalists dance around the questions put to them. The big news, of course, concerned Tony Blair's appearance before the Inquiry earlier this week. The jury is out on that performance, but from what I read it was an assured one, with all the glibness of which the man is capable. Whether anything he says these days can be trusted is another matter, and the polls suggest that public appreciation of him has waned considerably.

There is news, with screenshots, of the latest versions of MSoft's new (three years down the road?) operating system, code-named Longhorn, at WinSupersite.com. The thrust appears to be more and more towards multimedia integration, so I guess that's another zillion features that the typical user will make little use of!

Enough for now! Have a good week-end



8/23/2003
Overture search engine (by Tom Wilson, posted at 2:52 PM)

The Overture search engine, bought by Yahoo!, now has an index, courtesy of FAST - also bought by Yahoo! - of, it is claimed, more than 3.2 billion Web pages. (News from Research Buzz)

Ah, but can one find "Information Research", you ask? Well, it seems that one can. My usual test is to see whether the journal comes in the top two or three results when searched for as a phrase or as just the two words. Overture turns up trumps: IR is the first listed 'additional' site. The first site returned is always a sponsored site, i.e., one that is paying to be listed.

There's a twist, however: the IR index page comes up only at number 2 on the US listing. If one selects UK as the country, there are three 'sponsored' sites and the fourth site is the redirection page for IR on the Department of Information Studies site at Sheffield. One can play games like this all day: when I searched from the Netherlands page, the first mention of the journal was at number 3, but that was the catalogue entry at the Royal Library. From the Japan page I found no mention at all. Obviously, these country pages cover sites in the country, rather than international sites - except for the USA, which, I suppose, is thought to be international?

The Research Buzz item asks how Overture is going to wean people off Google - good question. Yahoo! has spent a lot of money acquiring search capabilities so, presumably, it must have a cunning plan.



7/29/2003
News about Northern Light (by Tom Wilson, posted at 4:08 PM)

It seems that Northern Light is planning to bring back its public-use search engine. A note on the Web site says:

If you're looking for the Northern Light web search engine, it is not currently open to the public. We are planning to bring it back later this year. If you would like to be notified when it is available again please sign up for our mailing list.

I used to use Northern Light quite a lot, but it pulled out of the public-use market, for some reason or another, and concentrated on selling its search engine to corporations. Given how the search engine arena has changed since Northern Light 'died', I wonder what motivates the decision to relaunch.

I spotted this item on the ResearchBuzz weblog.



7/18/2003
Hitting on Google. (by Tom Wilson, posted at 10:19 PM)
Slate has an article, Digging for Google Holes, that is critical of various aspects of searching Google, some of which seem pretty lame to me. For example, it claims that synonyms are a problem (big news: aren't they a problem for everybody!?) and cites the fact that:

Search for “apple” on Google, and you have to troll through a couple pages of results before you get anything not directly related to Apple Computer — and it’s a page promoting a public TV show called Newton’s Apple. After that it’s all Mac-related links until Fiona Apple’s home page. You have to sift through 50 results before you reach a link that deals with apples that grow on trees: the home page for the Washington State Apple Growers Association.


Presumably the writer is someone who rarely searches: I put 'apples "Washington State"' into Google and the Washington State Apple Commission came up first, with no trace of an Apple Computer. Shouldn't journalists who write on search engines try to learn something about them?


6/6/2003
Google and Weblogs (by Tom Wilson, posted at 8:24 AM)
The relationship between Weblogs and Google seems to be in the news again. The Register carries an article (again by Andrew Orlowski, who seems to have a thing about Weblogs) on the subject, claiming that searchers are fed up with links to Weblogs cluttering up their search results. I'm not sure about the circumstances under which this occurs, since I have yet to experience the phenomenon; that probably means I'm just searching for serious, boring stuff rather than the latest gossip about Madonna or whoever...

The subject is also tackled in a recent article in The Observer. In it, John Naughton suggests that it is all a matter of professional journalists envying the amateurs, and he points out that much of what the professional hacks write is not available on the Web. His moral?

The moral is: if you want to score with Google, be on the web. Otherwise, go whistle.

That seems fair!



5/19/2003
Google fictions? (by Tom Wilson, posted at 2:52 PM)
Here's a complicated story. Some time back Andrew Orlowski of The Register published a story about Google dropping Weblogs from its searches and, instead, using a special-purpose search engine. I commented on this at the time. A story in Monday's Guardian Online suggests that this is not the case. It all leaves one wondering what is reliable on the Web and in Weblogs. This message is reliable, because I'm just pointing you to pages that exist :-)


5/13/2003
Get your local newspaper... (by Tom Wilson, posted at 12:00 AM)
Google has launched local news pages.

Take a look at Google News UK as well as those for Australia, Canada, New Zealand, and India.

Sample them all and you get an interesting perspective on what is thought important in the different places.


5/12/2003
Google offloads bloggers... in a manner of speaking (by Tom Wilson, posted at 8:17 PM)
The Register reports that Google is to develop a search engine specifically for Weblogs in order to reduce the 'noise' they create when included in the normal search.

The report notes:

"The main problem with blogs is that, as far as Google is concerned, they masquerade as useful information when all they contain is idle chatter," wrote Roddy. "And through some fluke of their evil software, they seem to get indexed really fast, so when a major political or social event happens, Google is noised to the brim with blogs and you have to start at result number 40 or so before you get past the blogs."

Perhaps both Webloggers and others will welcome the move, since it will provide a focused search for the former and reduce that noise for the latter. This happened when Google acquired the Usenet groups and provided a separate search process, so why not with Weblogs?





This work is licensed under a Creative Commons License.


