Firefox and Skype
(by Tom Wilson, posted at 1:00 PM)
No, I'm not suggesting that they are related, but just that there is news out about the two of them.
First, Janco Associates Inc. reports that in the business sector, Firefox now has 10% of the browser market. However, the total market share seems to be about half of that and if MSoft gets its act together in launching a new version of IE, the growth may disappear. Still, Firefox has lots of advantages in terms of customising by add-ins and 'themes', so companies may begin to adopt it, suitably customised, as company standard.
The picture from the point of view of Information Research seems to support Janco's data: this is a table of the distribution of hits over the browser used - a snapshot taken today:
| 1. | Internet Explorer 6.x | 80.3 % |
| 2. | Mozilla Firefox 1.x | 9.1 % |
| 3. | Internet Explorer 5.x | 4.5 % |
| 4. | Netscape 7.x | 1.5 % |
| 5. | Mozilla Firefox | 1.5 % |
| 6. | Mozilla 1.x | 1.5 % |
| | Unknown | 1.5 % |
| | Total | 100.0 % |
The news from Skype is interesting - the first announcement passed me by, since Skype appears not to have informed existing users. Skype 'Out' has been available for some time: you can call a land line from your computer - at low cost - and I use this for some international calls. Now, however, you can also have Skype 'In' - that is, you can have up to 10 telephone numbers assigned, for different countries in the world, which will allow residents of those countries to call your Skype number and get through to your VoIP phone at local phone rates. Very handy if your friends and relatives don't use computers: and also very useful if you are abroad, without a computer, and want to call home - you can call your own local number and make the international call at local rates. The service costs, of course, the princely sum of €30 a year!
|
Google and security
(by Tom Wilson, posted at 2:11 PM)
From the F-Secure computer security people:
F-Secure staff has found a malicious website that utilizes a spelling error when typing the name of the popular search engine - 'Google.com'. If a user opens a malicious website, his/her computer gets hijacked - a lot of different malware gets automatically downloaded and installed: trojan droppers, trojan downloaders, backdoors, a proxy trojan and a spying trojan. Also a few adware-related files are installed.
The name of the malicious website is 'Googkle.com'. PLEASE DO NOT GO TO THIS WEBSITE! Otherwise your computer will get infected! We have reported the case to the authorities.
|
Voting, etc.
(by Tom Wilson, posted at 9:17 AM)
Today's Guardian Online has a couple of interesting items. One is on the way the Web is assisting tactical voting in the UK general election (5th May). Using the Website Tacticalvoter.net, a voter can discover whether or not tactical voting will make a difference in their constituency and then 'swop' their vote with someone else in another constituency. So - a voter in constituency A (let us say a life-long Labour voter) will agree to vote Liberal Democrat in that constituency, while a partner voter (normally a Liberal Democrat voter) will vote Labour in constituency B. The whole process, of course, is designed to give the party with the best chance of getting rid of a Conservative member of parliament an opportunity to do so. As a Professor of politics has said, it's a kind of proportional representation alternative to the actual electoral system we enjoy in Britain - as well as being an interesting example of the power of the Web.
The other item is on Yahoo's MyWeb—which is actually in the Online Weblog:
My Web is a personal search engine that extends users' existing Yahoo! Search experience by providing a simple way to save, recall and share online information with friends and colleagues. My Web enables users to create their own personal online archive by saving their favourite pages, search results, and search history to My Web. In addition, users can share their information with friends and colleagues via integrated tools such as email, instant messenger, and personal networking provided by Yahoo!'s new Yahoo! 360° tool.
Of course, Google has something similar.
|
How Firefox works
(by Tom Wilson, posted at 2:15 PM)
If you haven't yet switched to Firefox, reading the pages from the How Stuff Works site may provide you with some good reasons.
|
Web services
(by Tom Wilson, posted at 3:14 PM)
Readers will probably recall Terry Brooks's short note on Web services from some time back. The subject pops up again on ZDNet in a column from David Berlind on Yahoo offering APIs to its search engine, so that Web services can be built on it by third parties.
Interesting in itself, but Berlind notes that his Weblog shows a Google search:
Although it still refers to the effort as a beta program, Google has been doing this for over two years. For example, if you check out my Transparency Channel, you can see on the lower right-hand side where I have pre-executed a Google search on "media transparency" and included a results box (Google-branded, of course) right on the page. Radio Userland, the blogging solution that I'm testing for review (using my Transparency Channel as the guinea pig) comes with pre-built macros for accessing Google's search APIs via a Web services interface. All you have to do is get a license key from Google (a relatively simple process that requires getting a user ID on Google's systems) and live with the limitation of 1,000 search executions per day. Google has some pretty tight licensing terms. For example, you can't build a commercial service off the company's APIs without asking first (according to the company's FAQ).
The fun thing is that Berlind hasn't done a search for "media transparency", but for 'media AND/OR transparency', with the result that he attributes third position on the output to one 'William J. Bennett' and sixth position to his own Weblog, whereas, in fact, 'William J. Bennett' doesn't appear on the first page of the results and Berlind's Weblog is at fifth position—or, indeed, third position if one removes items two and four, which are pages within sites one and three. Those inverted commas do make a difference :-)
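The difference those inverted commas make can be shown with a toy sketch. This is only an illustration of phrase matching versus any-word matching, not a description of how Google actually ranks or matches; the page titles and function names are invented for the example.

```python
import re

def tokens(text):
    """Lower-case a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matches_any_word(query, document):
    """Unquoted query read as OR: one matching word is enough for a hit."""
    doc_words = set(tokens(document))
    return any(word in doc_words for word in tokens(query))

def matches_phrase(query, document):
    """Quoted query: the words must appear adjacent and in the same order."""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Hypothetical page titles, invented purely for illustration.
pages = [
    "Media transparency and the press",
    "Transparency in government",
    "New media studies",
]

loose = [p for p in pages if matches_any_word("media transparency", p)]
exact = [p for p in pages if matches_phrase("media transparency", p)]
```

Here `loose` picks up all three titles, while `exact` keeps only the one containing the phrase itself, which is why the two kinds of search can put the same page at quite different positions.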
|
Odds and ends
(by Tom Wilson, posted at 12:05 PM)
VoIP
It looks as though VoIP is forging ahead, with Skype announcing a PDA version (Pocket PC rather than Palm, unfortunately), and also a deal with Motorola. Motorola comment:
With over 68 million downloads of their client in the last 18 months, we believe Skype is a natural fit with our vision of simple and seamless connectivity for our consumer customers around the globe.
FireFox
It is announced that Microsoft will launch Internet Explorer 7 as a separate package, and the suggestion is that the success of Firefox has got it worried, since the plan was to keep it integrated with Windows. Molly Wood - columnist for C|Net - suggests that this will kill off FireFox. I wouldn't be too sure. MSoft's reputation for producing insecure, buggy code that doesn't satisfy W3C standards is unlikely to make people confident about a new browser, even if it has all the goodies that FireFox brings. But FireFox may find it difficult to maintain the momentum.
|
Search strings and InformationR.net
(by Tom Wilson, posted at 5:54 PM)
The University of Sheffield server gives me data on the use of the InformationR.net site, monthly. One of the tables shows the search terms used, and the table below shows the top 20 (21 actually, because the last two are tied) for four months from October 2004. The original information only shows the top 50 search strings, and the percentage shown here refers to the total hits of that top 50 - not all hits.
| Search term | No. | % |
| Environment(al) scanning + | 1309 | 12.24 |
| Critical success factor(s) | 906 | 8.47 |
| Information research + | 823 | 7.70 |
| Curriculum development | 807 | 7.55 |
| Qualitative vs. quantitative + | 580 | 5.42 |
| Knowledge management + | 551 | 5.15 |
| Norway | 469 | 4.39 |
| Duality | 356 | 3.33 |
| Resumenes | 333 | 3.11 |
| Conceptual model(s) | 323 | 3.02 |
| Research methods | 287 | 2.68 |
| Business environment | 278 | 2.60 |
| Total Quality Management | 272 | 2.54 |
| Action research | 254 | 2.38 |
| Reference Manager | 199 | 1.86 |
| Research journal(s) | 198 | 1.85 |
| Information explosion | 184 | 1.72 |
| Knowledge | 181 | 1.69 |
| British Standard(s) | 139 | 1.30 |
| Five personality traits + | 137 | 1.28 |
| Management decision making | 137 | 1.28 |
A lot of the search strings suggest that users are looking for 'known items', rather than simply searching in general. For example, 'Resumenes' is the term used for the 'Abstracts in Spanish' page; 'Reference Manager' refers to one of the bibliographic software packages that has been reviewed in the journal, and 'EndNote' and 'Biblioscape' also appear; 'Information Research' and its variants (indicated in the table by +), 'ir' and 'InformationR.net', clearly suggest a search for the site or the journal. Some are more difficult to interpret - 'Environment(al) scanning' may be a general search string, or a search for papers in the special issue on that topic and, similarly, 'Curriculum development' may mean that the user was searching for one of the highly hit papers or that a more general search was intended. 'Duality' is an unusual term, but it makes sense in terms of one of the papers in the 'knowledge management' issue, while 'Five personality traits' and its variants probably reflect searches for Heinström's paper.
It's a subjective point, but I have the impression that these searches of the journal are rather more explicit and focussed than Web searches in general. And I guess that is to be expected—although the occasional oddity pops up, such as the 99 searches using 'Problems of the world'. I know that information scientists can do many things, but this seems to be asking a little too much!
Google Desktop
(by Tom Wilson, posted at 11:22 AM)
I've been using Google Desktop - or rather it has been on my hard disc, unused since October - and I was finding that the system was slowing down. I was also getting a message immediately after booting up to the effect that virtual memory was low. The only thing that I could think of that might be having this effect was Google Desktop, so I've removed it and the problems seem to have disappeared.
When you remove the software a page pops up in your Web browser asking why you've done it and offering some options, one of which was 'My system has slowed down' - or words to that effect - so, obviously, Google is aware that there is a problem.
Presumably it is the continuous indexing that is the problem, and I suppose that the memory needed at start-up is more than my machine can bear. I also noted that Task Manager was telling me that CPU usage was 100% - and that was before I'd launched any application.
Nice idea these desktop search devices, but perhaps the system costs are too high?
|
Speculative searching
(by Tom Wilson, posted at 2:43 PM)
Prompted by Amir's message I took a look at the site and at others - especially GoogleRankings, where I found that in searches for 'knowledge management' the journal site ranks 104th in the top 1,000 and 224th for 'information management'; the World list of departments... ranks 126th in searches for 'information management' and the 'nonsense' paper ranks 22nd in searches for 'knowledge management'.
These are just pointless facts that will enable you to delight and baffle your friends :-) And, of course, a reminder that publishing in Information Research is sure to get you noticed :-)
Speculative Search Game (Google Game)
(by Amir Michail, posted at 12:00 AM)
A game where you predict which web pages will rank more highly on Google in the future! The output of the game will be used to build the Speculative Search Engine that ranks those web pages more highly today.
http://www.cse.unsw.edu.au/~amichail/spec/
|
Google - again, and other things.
(by Tom Wilson, posted at 11:53 PM)
Google has been much in the news as a result of its venture into the digital library - on a huge scale. Today's Observer (one of the so-called 'broadsheet' Sunday papers in the UK, for those who don't know it, and part of the Guardian family) has an article in its business section on Google's latest venture, in which John Naughton refers to Howard Rheingold's seminal work on the virtual community:
Many years ago, Howard Rheingold, who was one of the first people to understand the social potential of cyberspace, posed an interesting question: 'Where is the Library of Congress, when it's on your laptop?' To most people at the time, it seemed a meaningless question. What lay behind it, however, was an attempt to think through a profound consequence of a networked society - what Frances Cairncross later dubbed 'the death of distance'.
Naughton also notes:
Once upon a time, being learned involved holding a lot of knowledge and information in one's head. Are we moving towards a world where the important thing is not what you know, but how to find it?
an idea expressed many long years ago by Dr. Johnson (as reported by James Boswell—in 1791):
Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
which is also a very neat definition of the difference between 'knowledge' and 'information' :-)
Google was also the subject of one of Fortune's long articles last week, too. The focus was on the share price and the probability of investors getting their return (the verdict seemed to be, 'Be cautious'), but, among other things it has some interesting stuff on the competition.
Thanks also to Gerry Mckiernan and the ASIS-L mailing list for bringing another Google item to my attention; this time in the New York Times (you'll need to register to read the article, but registration is free).
The article contains a nice story about the irreplaceability of the physical book – for some purposes:
Mr. Jimerson said, 'A scanned image will only tell you some things, and the sheer volume of records makes scanning everything difficult'. But he added that he supported Google's plan in theory. 'I recall the story of a gentleman being in a library and watching a researcher sniff books', he said. 'It turned out that the aroma of vinegar was still embedded in those that had been treated with vinegar to prevent cholera during an epidemic'.
Thanks to Gerry also for another item in the New York Times – this time on Firefox. With Pennsylvania State University telling everyone on campus to switch from Internet Explorer, it would seem that Microsoft has a little problem on its hands – one that may result in a policy switch, unless arrogance holds sway in Redmond. If there is a policy switch it would require IE to be re-written from the ground up, so Firefox may go ahead by leaps and bounds. Try it—my guess is that, if you are an IE user, you'll need less than ten minutes with the new rival (well, not so new, if you've been using it for the past couple of years in its development phase) to convince you to switch.
|
Odds and ends
(by Tom Wilson, posted at 5:58 PM)
I've been working in Oporto for the past week with little chance to catch up on current developments, so here's my backlog:
- There's news of IBM's efforts to develop information retrieval systems for use in corporate networks, rather than on the Internet. It comes a little late to this sector, with Google Desktop and a new version of Copernic already in play. My guess is that IBM is likely to make the usual technology-led errors in producing a system, that is, greater complexity in preparing search formulations than users are likely to buy, and not enough work behind the interface to interpret relatively simple formulations. Corporate files also suffer from a very difficult problem for information retrieval, one that was described to me many years ago on a visit to Shell - a North Sea drilling platform could be identified in documents by a project code-name, by geographical coordinates, by the designation assigned once the platform was in use, such as 'Platform Alpha' or by a phrase such as, 'the project'.
- The International Telecommunication Union has produced a press release headed, Low Cost Broadband and Internet Access Essential to Information Society with a link to Best Practice Guidelines for the Promotion of Low Cost Broadband and Internet Connectivity. This document lists some very worthy aims, but one wonders whether competition and regulation are really likely to deliver low prices. In many countries the national PTT or the dominant controller of existing wires can effectively control access to the necessary exchanges and so on; in these circumstances something stronger than 'regulation' may be needed. As for competition: well, we have that in fuel supply to the garage forecourt, but I don't see too much impact on price.
- The big news for libraries, of course, was that Google is in the process of scanning millions of books in the libraries of Harvard, Stanford and Michigan universities, in the New York Public Library, and the Bodleian Library in Oxford. Other contributions to the debate about this initiative can be found here, and here, and at the Wall Street Journal (setting aside its neocon bias for a change!)
|
Google Scholar again
(by Tom Wilson, posted at 10:44 AM)
As we might expect, Google Scholar has raised a lot of interest. There's an interesting Weblog entry from a guy who works for Ingenta on working with Google to enable content to be 'crawled'—rather 'techie' for a non-nerd like myself, but interesting nonetheless.
Search Engine Watch also has an item - a moan about the lack of documentation, so that we don't know what Google Scholar actually covers - a very necessary moan, particularly when students these days seem to believe that if they can't find something by using Google, it doesn't exist.
I haven't used 'Scholar' much yet, but I don't like the output form: for some totally irrational reason, I'm happy to put up with it for a Web search, but the format doesn't fit my conception of what output relating to the scholarly literature should look like. I'll have to take a closer look and figure out why I have this reaction.
|
Another Google initiative
(by Tom Wilson, posted at 9:30 PM)
Those folk at Google are certainly stirring things up with the launch of 'Google Scholar (Beta)' a variant of the search engine to access the scholarly literature.
According to the New York Times (you'll need to register):
The engineer who led the project, Anurag Acharya, said the company had received broad cooperation from academic, scientific and technical publishers like the Association of Computing Machinery, Nature, the Institute of Electrical and Electronics Engineers and the Online Computer Library Center.
The new Google service, which includes a listing of scientific citations as well as ways to find materials at libraries that are not online, will not initially include the text advertisements that are shown on standard pages for Google search results.
Testing something like this is rather tricky when the coverage is unknown. However, I tried just a simple, but slightly obscure search phrase, "colliery spoil" and got a list of 146 items. Some are listed as 'CITATION', for example:
[CITATION] Effective passive treatment of aluminium-rich, acidic colliery spoil drainage using a compost ... - Web Search
PL Younger, TP Curtis, A Jarvis, R Pennell - Cited by 10
Journal of the Chartered Institution of Water and ..., 1997
Click on the 'Web search' link and, as you see, it does just that; click on the 'Cited by 10' link and you are given a list of the ten sources that have cited this item, with the same layout and more links to items that cite the cited items—one could get rather dizzy going through this lot!
Other items in the original list are links to information on the Web, although not always the complete document. For example, this link leads to an abstract in PubMed, not to the original document:
Substrate characterisation for a subsurface reactive barrier to treat colliery spoil leachate
PW Amos, PL Younger - Cited by 4
Substrate characterisation for a subsurface reactive barrier to treat colliery
spoil leachate. Amos PW, Younger PL. FaberMaunsell ...
Water Research, 2003 - ncbi.nlm.nih.gov
The 146 items consisted of 35 Citation entries and 111 Web links.
|
Re: Alternative browsers
(by Seth Dillingham, posted at 12:00 AM)
On 11/15/04, Tom Wilson said:
>Opera has many of the same features as FireFox (and had them
>earlier) and it does some things better; but I like the way
>FireFox does tabs better, even though its inability to stop sites
>from launching windows without the navigation bar is frustrating.
Actually, it can do that, but it's a hidden preference.
In your browser, go to this url: "about:config". (No http: or
anything, just exactly "about:config".)
At the top of the long list of preferences that it shows, there is
a textbox. Paste this into that textbox (without the quotes):
"dom.disable_window_open".
Double click on the line that says
"dom.disable_window_open_feature.titlebar". That will change the
value from false to true. From now on, when a web page opens a new
window, it will be unable to hide the toolbar.
Seth
Alternative browsers
(by Tom Wilson, posted at 9:36 AM)
There's an interesting little discussion going on at ZD-Net about the open source browser, FireFox. One of the staff writers is bidding farewell to Internet Explorer and, as one or two of the discussants ask, "Why's it taken you so long?"
I've been using alternative browsers since Opera first appeared and I now use FireFox most of the time - it's something of a toss-up between these two: Opera has many of the same features as FireFox (and had them earlier) and it does some things better; but I like the way FireFox does tabs better, even though its inability to stop sites from launching windows without the navigation bar is frustrating. Also, FireFox is free, whereas Opera, unless you aren't bothered by ads, costs - not a lot, but...
With Opera and FireFox in the market I just don't understand why anyone uses IE any longer, other than for those sites that seem to imagine that nothing else exists.
|
Microsoft's new search engine
(by Tom Wilson, posted at 11:18 AM)
There's been a burst of interest on the Net about Microsoft's new search technology, which can be found in beta form at MSN Search, but it doesn't look all that great to me.
"The release of our beta is a huge step towards delivering the information consumers are looking for online, faster.", says a Microsoft spokesman. However, my test is where Information Research appears when I search for it and, on this basis, MSN Search lags behind others. For example, when I used "information research", the Weblog was the first thing to appear - at the bottom of the first page of results. The journal site didn't appear until page six, when it was the last item on the page. One issue with MSN Search is that it appears to ignore the word order - there seemed to be as many occurrences of "research information" as of the phrase "information research", which doesn't seem very intelligent to me.
For comparison, here's a table of results with other search engines
| Search engine | Page number | Non-sponsored position |
| Alltheweb | 1 | 2 |
| Alta Vista | 1 | 1 |
| AOL Search | 1 | 1 |
| Ask Jeeves | 1 | 3 |
| Excite | 1 | 1 |
| Gigablast | 1 | 1 |
| Google | 1 | 1 |
| HotBot | 1 | 1 |
| Lycos | 1 | 2 |
| Teoma | 1 | 3 |
| Yahoo | 1 | 2 |
By this little test, the new MSN engine doesn't show up very well!
|
Odds and ends
(by Tom Wilson, posted at 11:00 PM)
Here's an interesting little item on Google.
TechWeb Today points to a new TechEncyclopedia, with 20,000 terms. Curiously, this doesn't display correctly in FireFox, although when I download the page to look at the code, the downloaded version displays perfectly well. Something strange going on here!
|
Weblogs and other things
(by Tom Wilson, posted at 9:38 PM)
Weblogs
My thanks to folks, on and off the Weblog, who've written to encourage me to keep the Weblog going—I'll plod on when I know that it has some effect. Carol Cahill kindly says:
Our library probably wouldn't have a wireless Internet connection if my interest hadn't first been piqued by your Weblog. Now we have a four-laptop wireless training lab and patrons can come in and connect with their own computers.
Which I think is rather better than a citation in a journal :-)
"The Chief's" comments on Weblog membership counts are also interesting - as are the usage stats for the Weblog - last year 13,776 hits, this year, so far, 13,588, with those hits distributed over the continents as follows:
| 1. | North-America | 10,780 | 39.4% |
| 2. | Europe | 10,565 | 38.6% |
| 3. | Asia | 2,961 | 10.8% |
| 4. | Australia | 1,735 | 6.3% |
| 5. | Africa | 415 | 1.5% |
| 6. | South America | 277 | 1.0% |
| 7. | Central America | 133 | 0.5% |
| | Unknown | 498 | 1.8% |
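As a quick check on the table above, the percentages can be recomputed from the hit counts. This is just a sketch of the arithmetic; the variable names are my own, and the counts are copied from the table.

```python
# Hit counts by continent, copied from the table above.
hits = {
    "North-America": 10780,
    "Europe": 10565,
    "Asia": 2961,
    "Australia": 1735,
    "Africa": 415,
    "South America": 277,
    "Central America": 133,
    "Unknown": 498,
}

# Total over all rows; this equals last year's 13,776 plus this year's 13,588.
total = sum(hits.values())

# Each continent's share of the total, as a percentage to one decimal place.
shares = {region: round(100 * count / total, 1) for region, count in hits.items()}

for region, share in shares.items():
    print(f"{region:<16} {hits[region]:>7,} {share:>5.1f}%")
```

The rows sum to 27,364, so the table evidently covers both years' hits together rather than this year's alone.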
Yahoo! does a Google
News today of Yahoo!'s purchase of an e-mail start-up, by the name of Bloomba (why does the Internet generate so many silly names? Scope for a PhD dissertation here!). I'd never heard of Bloomba before, but it is an e-mail client, rather than a Web-based service. Reviews suggest that its killer feature is its search capacity; it indexes your mail as you receive it, including what's in attachments. Whatever plans Yahoo! has for the system, no one seems to know. The original parent company, Statalabs, says:
What does Yahoo! plan to do with the technology as a result of the acquisition?
At this time we do not have any announcements about the ongoing plans for the technology or the specifics of the transaction.
A case of 'Watch this space' - well, not this one, since I can't guarantee that I'll spot an announcement, but perhaps the Yahoo! site - and while you are there, you might like to take a look at MySearch.
|
Odds and ends
(by Tom Wilson, posted at 10:35 AM)
The Weblog
It seems that my suspicions about the lack of general interest in the IR Weblog are confirmed :-) I've been contributing very little over the past month and so far no one has asked, 'Where are you?'
New issue of the journal
The latest issue, Volume 10 No. 1, is now on the site. This one has the first batch of papers from the Information Seeking in Context conference, held in Dublin last month. The other half will be published in the January 2005 issue. I finally got round to checking on what logs were available on the server and discovered that, since the 8th October (which is when the analysis software appears to have kicked in), there have been about 280,000 hits on the InformationR.net site - most of which are on the journal. This is considerably beyond my own estimates from the various counters. InformationR.net is the sixth most 'popular' virtual domain on the University's servers.
Voice over Internet Protocol (VoIP)
VoIP appears to be building up nicely. I finally got round to using it, along with colleagues in the AIMTech research group at Leeds University Business School. The voice quality, using Skype, is generally pretty good - not quite as good as the best landline, but good enough considering that it's free. I've also tried the SkypeOut service, which connects to landline numbers pretty well anywhere in the world and to mobile phones in some. You can connect to landlines in Western Europe, North America, Australia and New Zealand for 1.7 Euro cents a minute (£0.0118 or $0.02129) - mobiles cost a good deal more. Connection with landlines can be variable - sometimes connection is lost and in one case there was no voice connection at all. No doubt, with the interest being expressed, these problems will get ironed out.
Of course, governments and the big telecomms companies get very edgy over VoIP - here's a communication process where they may not be able to make any money, unless they REGULATE. Naturally, it is the USA where these concerns are raised.
It had to happen: "Boingo, Vonage Sign VoWi-Fi Pact"
Google again
A couple of things about Google - first, you'll find a review of its e-mail service, Gmail, in the latest issue of the journal. Secondly, I'm also trying out its 'desktop search' program - this enables you to do a Google search on your hard disc. It also checks your hard disc when you do a Web search - useful for bringing to your attention those items you'd forgotten you'd ever written!
|
Google in the news
(by Tom Wilson, posted at 4:44 PM)
Google is in the news again - on the 5th October it issued a 'new features' message to users of Gmail, to the effect that it was trialling a new mail forwarding system, which would be free during the trial. This prompted commentators to speculate on what other features of Google in general would become revenue streams.
As it happens, I've been using Gmail as a beta user for the past couple of months and a review will appear in the October issue of Information Research, and I'm now hooked on it. Its 1Gb filestore, use of 'labels' to index messages, and grouping of messages into 'conversations' make it a real winner.
|
New book
(by Tom Wilson, posted at 4:18 PM)
Congratulations to one of our Editorial Board members, Amanda Spink, for her new book, jointly authored with Bernard Jansen: "Web Search: Public Searching of the Web" - you can find details at the publisher's Website.
|
Popular papers in Information Research
(by Tom Wilson, posted at 8:42 PM)
Having recently published a new issue of Information Research, I thought it was time to find out how the ranking by 'hits per month' was standing. So here's the latest table. We see that some very recent papers appear to have struck a chord, while some of the oldest papers are still going strong.
|
AI and search engines
(by Tom Wilson, posted at 2:11 PM)
A highly favourable item on a new search engine, blinkx, in the Guardian Online supplement sent me off to its Website to check it out. blinkx uses, so we are told, an AI technique rather than page-ranking a la Google and it searches not only the Web, news services, and Weblogs, but also your hard disc. From one of the file names on the downloaded system I suspect that the engine behind blinkx is Autonomy.
The Website includes an option to try out the beta version of blinkx optimised for broadband users and I discovered something rather odd. The PR claims that "blinkx understands your question and presents you with links as you search." - but the system obviously uses stop words. How can a question be understood if the stop words include terms of significance to the user?
Specifically, I searched for 'Information Research', expecting the journal site to pop up fairly quickly - no: only things on 'research' appeared. Similarly, when I used 'information behaviour', only 'behaviour' was used as a search term, and for 'information science', only 'science'. Not much use in the information management sector, then! The give-away is that the terms used in the search are highlighted and in all cases, where 'information' did appear in an item, it was not highlighted.
'Information' on its own may or may not be a useful search term - certainly it would generate millions of hits, but when used in compounds such as those mentioned, the concept so formed has much greater specificity. As long as AI systems continue to fail to recognize concepts and their semantic significance, they will fail to produce a search system that is a significant improvement on Google.
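The effect described above is easy to reproduce in a toy sketch. I have no knowledge of how blinkx actually processes queries; the stop-word list below is an assumption made up for illustration, as are the function names.

```python
# Assumed stop-word list for illustration only; blinkx's real list is unknown.
STOP_WORDS = {"a", "an", "and", "information", "of", "the"}

def effective_terms(query, stop_words=STOP_WORDS):
    """Drop stop words one by one, as a naive engine might."""
    return [w for w in query.lower().split() if w not in stop_words]

def phrase_aware_terms(query, stop_words=STOP_WORDS):
    """One possible alternative: keep a multi-word query whole as a
    compound, and apply stop words only to single-word queries."""
    words = query.lower().split()
    if len(words) > 1:
        return [" ".join(words)]
    return [w for w in words if w not in stop_words]

queries = ["Information Research", "information behaviour", "information science"]
naive = {q: effective_terms(q) for q in queries}
compound = {q: phrase_aware_terms(q) for q in queries}
```

With the naive version the three queries shrink to 'research', 'behaviour' and 'science', which matches what the highlighting showed; the phrase-aware version keeps each compound intact, which is closer to what the searcher meant.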
|
Google
(by Tom Wilson, posted at 9:51 PM)
Google is also hitting the news this week - with new services announced and, in today's Guardian, a big article about Google's intention to offer a free e-mail service to compete with Yahoo! and Hotmail, offering a gigabyte of storage - way above the limits of the other two. I'll join that! Get more on this from Google itself.
|
A Google Game
(by Tom Wilson, posted at 4:35 PM)
There are all kinds of games you can play with Google, including the well-known 'Googlewhacking'. I don't know whether I've invented this one, which I discovered accidentally.
It is very simple: just hit a few keys haphazardly, for example, "l;kd" in the search box of Yahoo and see what turns up. The aim is to put in something that returns nothing - which is surprisingly difficult! That combination, for example, turned up more than one and a half million hits! Even entered as a phrase, it produced almost 20,000.
The string ";we[kear'k" resulted in 34 hits, largely as a result of the existence of an author called "K. Kear". However, as a phrase, it produced zero - so it can be done. Remember, however, that the entry of symbols should be haphazard: just let your fingers do the choosing.
|
In the news...
(by Tom Wilson, posted at 10:30 AM)
An interesting item on wireless in the public library from LIS News.com...
...and a longer piece on IT in public libraries from D-Lib Magazine.
Turning to the University sector, I picked this up from Seb's Open Research - a couple of courses at Prince Edward Island University are using Weblogs as resource pages and communication. Here's one on 'Networking, knowledge and the digital age'.
And here's an interesting one! I initiated a debate on the JESSE list some time back on the extent to which Web citation was beginning to overtake journal citation as a performance tool. I then found that this had been picked up by a couple of researchers (Vaughan and Shaw, Bibliographic and Web citations: what is the difference? JASIST, 54(14), 2003, 1313-1322) and now ISI is getting together with NEC: Thomson ISI and NEC Team Up to Index Web-based Scholarship
PHILADELPHIA & LONDON & PRINCETON, N.J.--(BUSINESS WIRE)--Feb. 25, 2004--Today, Thomson ISI and NEC Laboratories America (NEC) announced their collaboration to create a comprehensive, multidisciplinary citation index for Web-based scholarly resources. The new Web Citation Index(TM) will combine a suite of technologies developed by NEC, including "autonomous citation indexing" tools from NEC's CiteSeer environment, with the capabilities underlying ISI Web of Knowledge(SM). Thomson ISI editors will carefully monitor the quality of this new resource to ensure all indexed material meets the Thomson ISI high-quality standards.
During 2004, Thomson ISI and NEC will operate a pilot of the new resource to receive feedback from the scientific and scholarly community. Full access to the index is projected for early 2005.
When fully operational, the new resource will be a unique content collection within ISI Web of Knowledge. It will complement the Thomson ISI Web of Science®, and provide researchers with a new gateway to discovery -- using citation relationships among Web-based documents, such as pre-prints, proceedings, and "open access" research publications.
OK - that's enough for now - I've got to go off to talk with the people at Orange about mobile technologies.
|
Search engines and the FT
(by Tom Wilson, posted at 10:02 AM)
I didn't get to the Saturday issue of the FT before this morning and there I found a leader item on search engines. I don't think I've seen a newspaper leader on the subject in the UK before. The item is 'Online searching: who's feeling lucky?' - available on the FT web site, but only to subscribers. The main point about the article is the suggestion that with the limited number of search engines available, or rather, the dominance of Google, there's a need for 'one fully transparent search engine, preferably maintained in the academic realm.' Isn't it curious how the advocates of capitalism always find a role for the public sphere when they want something unbiased? :-) The suggestion was made originally by Google's founders, Sergey Brin and Larry Page, in a research paper, but I haven't been able to locate it on the Web.
Good luck to the FT, but the chances of any university in the UK picking up the challenge to provide a 'fully transparent search engine' are pretty remote. You can count the institutions pursuing serious information retrieval research on the fingers of one hand and still have spare capacity, and the institutions are so deeply mired in managerialism that the probability of selfless public service is remote. Everything these days must have an 'income stream', nothing is done for nothing, and the tentacles of central government's assessment procedures stretch everywhere.
|
Search engines
(by Tom Wilson, posted at 7:27 PM)
Old news now - two days old - that Yahoo! has dropped Google as its search engine in favour of its own search engine, provided by Inktomi. So I wondered how it compared. I searched for "Information Research" using both and, surprise, links to the journal were 1st and 2nd in Yahoo! Search and also in Google. Not much difference there. So, I searched for "case-based reasoning" at ".edu" sites. In the first 20 links for each search engine, only five institutions were duplicated, and from these five institutions only four Web pages were duplicated. It would seem, then, that the two engines are doing different things and that, if you want a reasonably comprehensive coverage of a topic, it would be a good idea to use both.
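For anyone who wants to repeat the comparison, the overlap count can be sketched as follows. The URLs are made-up placeholders, not the real output of either engine, and treating the hostname as a stand-in for 'institution' is my own simplifying assumption.

```python
from urllib.parse import urlparse


def host(url):
    """Hostname of a URL, lower-cased; a rough proxy for 'institution'."""
    return urlparse(url).netloc.lower()


def compare(results_a, results_b):
    """Return (shared pages, shared hosts) for two lists of result URLs."""
    shared_pages = set(results_a) & set(results_b)
    shared_hosts = {host(u) for u in results_a} & {host(u) for u in results_b}
    return shared_pages, shared_hosts


# Placeholder result lists, invented for illustration only.
engine_a = [
    "http://www.one.example.edu/cbr.html",
    "http://www.two.example.edu/a.html",
    "http://www.three.example.edu/x.html",
]
engine_b = [
    "http://www.one.example.edu/cbr.html",
    "http://www.two.example.edu/b.html",
    "http://www.four.example.edu/y.html",
]

pages, hosts = compare(engine_a, engine_b)
print(len(pages), "shared pages;", len(hosts), "shared hosts")
```

The same institution can turn up in both lists with different pages, which is why the count of shared hosts can be higher than the count of shared pages.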
|
NewzCrawler
(by Tom Wilson, posted at 2:45 PM)
Having used the news aggregator, NewzCrawler, for some months now, I finally decided, when the evaluation period came to an end, that I can't live without it - and the $24.95 seems a modest price to pay. It isn't perfect, but then what software is?
The need for a news aggregator, assuming that you still haven't cottoned on to it, arises from the increasing popularity of RSS feeds, which provide the raw material for aggregators. A recent development at Yahoo! makes RSS feeds available for news searches. For example, if you want to pick up every mention of Tony Blair (heaven forfend) that occurs in the news sources covered by Yahoo!, use this URL in your aggregator. Read about this development at Jeremy Zawodny's Weblog.
My aggregator now has links to fifty news and information sources - it's continually growing and continually being weeded as I find new things and get rid of dross - of which there is much!
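If you are wondering what an aggregator does with a feed, here is a minimal sketch using Python's standard library. The feed text is a made-up RSS 2.0 sample, not a real Yahoo! feed, and the function name is my own; a real aggregator such as NewzCrawler obviously does a great deal more.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real aggregator would fetch this from a feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example news search</title>
    <item>
      <title>First headline</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in read_items(SAMPLE_FEED):
    print(title, "-", link)
```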
Search engines
(by Tom Wilson, posted at 2:20 PM)
There's a useful account of developments in search engines during 2003 at Sitepoint - I was pointed there from the Logos Weblog, which has some interesting stuff. I liked this comment from the section on the future:
Watch Microsoft carefully. If a new Microsoft-based search initiative gets off the ground this year, you can bet it will be well funded and well promoted. Site owners can benefit from first-mover advantages in getting listed. If you can become an early expert in the new search technology, your site and traffic could soar.
|
Stuff you don't need to know
(by Tom Wilson, posted at 5:59 PM)
As everyone knows, Information Research uses the Atomz.com search engine - which is made freely available. I've just been experimenting with restricting the pages that are scanned, but it didn't work out. However, in the process I had to ask for the site to be re-indexed (this normally happens automatically every Sunday night) and the log for the indexing tells me that 417 pages have been indexed containing 1,546,605 words.
Wow - 1.5 million words - I had no idea that we'd published as much as that. Now, that includes contents pages (which I was trying to mask) and the editorials but, nevertheless, that's a lot of words. And, given the volume of hits, it seems that people find them useful words.
While I was at it, I checked on the language used in the searching: here are the search strings used last month:
| Frequency | Search string |
| 18 | knowledge management |
| 12 | information management |
| 11 | data mining |
| 8 | electronic resources |
| 8 | information conciousness |
| 7 | cko and failure |
| 7 | communication |
| 7 | digital library |
| 7 | information retrieval |
| 7 | management |
| 6 | cko |
| 6 | e publishing |
| 6 | information literacy |
| 6 | information seeking behaviour |
| 6 | online public access catalog |
| 6 | outsourcing |
| 6 | pattern of communication adopted by marketing department in industrial goods sec |
| 6 | search engine |
| 5 | company libraries |
Some of this strikes me as odd and must be the result of some people clicking on the 'Go' button more than once.
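One way to test the repeated-click suspicion would be to collapse identical queries submitted back to back by the same visitor before counting. The sketch below assumes a log of (visitor, query) pairs in time order; that format is hypothetical, since I don't know what the Atomz log actually records, and the entries are invented.

```python
from collections import Counter
from itertools import groupby


def count_queries(log):
    """Return (raw counts, counts with back-to-back repeats collapsed).

    log is a list of (visitor, query) pairs in time order.
    """
    raw = Counter(query for _visitor, query in log)
    # groupby yields one group per run of identical consecutive entries.
    collapsed = Counter(query for (_visitor, query), _run in groupby(log))
    return raw, collapsed


# Invented log entries for illustration only.
log = [
    ("v1", "cko and failure"),
    ("v1", "cko and failure"),
    ("v1", "cko and failure"),
    ("v2", "cko and failure"),
    ("v1", "data mining"),
]

raw, collapsed = count_queries(log)
print(raw["cko and failure"], "raw;", collapsed["cko and failure"], "after collapsing repeats")
```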
|
Google and InformationR.net
(by Tom Wilson, posted at 5:54 PM)
As a result of getting O'Reilly's 'Google hacks' for review, I've tweaked one of the examples to provide a site-search feature for InformationR.net. Try it out and let me know what you think about it.
|
Hitting the site...
(by Tom Wilson, posted at 7:48 PM)
I imagine that most readers of Information Research will be aware of the counter on the top page. What they may not know is that I regularly collect information from the counter service on where the hits are coming from. My 'harvest' now totals 4,158 hits - collected since 1 November 2002 - and shows hits arriving at the site from referring sites (almost 500 of them). Only a few sites account for 2.0% or more of the 4,158 and I show them in the table below.
It's a curious list consisting of a variety of organized resource 'directories', like BUBL, together with one other e-journal, an academic site hosted by the Department of Communication at the University of Washington, the search engine, Google, and one item in a newsletter about search engines.
The last of these - Searchday from Search Engine Watch - demonstrated the impact of certain sources: the item was published on 27 May 2003 and it immediately led to a peak in the hits curve, and hits from that page have been arriving ever since, to the effect that it now accounts for 2.5% of all the hits on the top page.
The Directory of Open Access Journals also illustrates how a new site can have an immediate impact on traffic - I don't recall when the hits first appeared, but it was only earlier this year, and it now accounts for almost 3% of the total.
The data on Google are a bit of a cheat - in fact, if one takes all 28 Google sites (from www.google.ae to www.google.sk), the search engine in its different manifestations accounts for 7.55% of the total hits.
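Adding up the different Google sites is simple enough to sketch. The hit counts below are invented for illustration and are not the real counter data; the only assumption about the real data is that the Google referrers have hostnames beginning with 'www.google.'. (For scale, 7.55% of 4,158 is roughly 314 hits.)

```python
def is_google(hostname):
    """True for hostnames of the form www.google.<something>."""
    return hostname.lower().startswith("www.google.")


def share(counts, predicate):
    """Percentage of all hits that come from hostnames matching predicate."""
    total = sum(counts.values())
    matched = sum(n for hostname, n in counts.items() if predicate(hostname))
    return 100 * matched / total


# Invented referrer counts, for illustration only.
referrers = {
    "www.google.com": 30,
    "www.google.ae": 5,
    "www.google.sk": 5,
    "bubl.ac.uk": 60,
}

print(f"{share(referrers, is_google):.2f}% of hits came from Google sites")
```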
Searching and the Weblog
(by Prof. Tom Wilson, posted at 5:02 PM)
Now there's a funny thing - it seems that a number of people are hitting the
Weblog through searching for something completely different, and yet decide to
have a look.
For example, someone searching for 'Captain Stabbin' on msn.com found the link
to my item on the Nigerian scam at number 28 on the output list - yet still
clicked on the link to the log. I can't imagine the naive user doing that, so I
assume that it must have been someone who recalled seeing my message on the log
and wanted to find it again.
Similarly, someone searching for 'Internet 2' on Google found the Weblog link at
number eight in the list, yet followed it. And another Google search for 'Joint
use libraries' AND 'Syllabus', resulted in 14 items, of which the Weblog item
was number 13 - and yet that was followed. Surely cases of people wanting to
find things they'd seen before.
It seems unlikely, however, that someone searching for "What is the official
name from the standards organization of the 11mbs wireless networking
standard?" again on Google, would have seen a specific message on this topic.
In fact, the link to the Weblog was number two on the list and led to the
'Wireless' channel of the log - nothing there to answer the question.
Most curious of all, however, was a search for 'sugar daddy phenomenon' - and,
lo and behold!, the Weblog item that includes all three words is item number
two - this time the 'Electronic publishing' channel of the log, which includes
an item on the open access 'phenomenon', posted on 12 September 2003, which
included a request for a 'sugar daddy' to support the journal.
Curious indeed are the ways of search engines and people - you can check this
out at the
counter service.
Incidentally, of 21 hits from search engine search outputs, 16 used some variant
of Google.
Thinking of buying Google?
(by Tom Wilson, posted at 4:10 PM)
Check out the Fortune article.
|

This work is licensed under a
Creative Commons License.