Wednesday, June 24, 2009

The age of Citizen cyber-infrastructure

[For more examples of citizen cyber-infrastructure please see http://citizen-science.blogspot.com/. Some excerpts from iSGTW--BSA]
http://www.isgtw.org/?pid=1001877
Opinion - The age of citizen cyberspace
________________________________________

Using LHC@home, particle beam dynamics can be studied with volunteer computing. Image courtesy CERN
(François Grey, one of the key people behind the founding of the present-day iSGTW and a frequent contributor to these pages, argues that with volunteer computing, we are about to embark upon a new era of “citizen science.”)

….PrimeGrid, is tackling a host of numerical challenges, such as finding the longest arithmetic progression of prime numbers (the current record is 25). Professional mathematicians now eagerly collaborate with Rytis to analyze the gems that his volunteers dig up. Yet he funds his project by selling PrimeGrid mugs and t-shirts. In short, Rytis and his online volunteers are a web-enabled version of a venerable tradition: they are “citizen scientists.”
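
To make the PrimeGrid challenge concrete: an arithmetic progression of primes is a run of primes with a constant gap between consecutive terms, and the "record" above presumably refers to the number of terms in the run. The following is only a toy sketch in Python, using naive trial division. It is not PrimeGrid's actual search software, and the function names are made up for illustration.

```python
def is_prime(n: int) -> bool:
    """Naive trial-division primality test (fine for small numbers only)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_progression_length(start: int, step: int) -> int:
    """Count how many consecutive terms start, start+step, ... are prime."""
    if step <= 0:
        raise ValueError("step must be positive")
    length = 0
    term = start
    while is_prime(term):
        length += 1
        term += step
    return length


# 5, 11, 17, 23, 29 are prime; the next term, 35, is not.
print(prime_progression_length(5, 6))   # 5
# 7, 37, 67, 97, 127, 157 are prime; the next term, 187 = 11 * 17, is not.
print(prime_progression_length(7, 30))  # 6
```

Record-length progressions involve far larger numbers and much smarter sieving than this, which is why the search is spread across many volunteer machines.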
There are nearly 100 science projects using such volunteer computing. Like PrimeGrid, most are based on an open-source software platform for volunteer computing called BOINC (see 15 October 2008 iSGTW, “Reaching for the Exa-scale”). Many address topical themes, such as modelling climate change with ClimatePrediction.net, developing drugs for AIDS with FightAids@home, or simulating the spread of malaria with MalariaControl.net. (See 7 May 2008 iSGTW, “WISDOM unplugged: Malaria drug-leads graduate to the wet lab”)

Fundamental science projects are also well represented. Einstein@Home (iSGTW 14 May 2008) analyzes data from gravitational wave detectors, MilkyWay@Home simulates galactic evolution, and LHC@home studies accelerator beam dynamics. Each of these projects has easily attracted tens of thousands of volunteers.

[..]
A new wave of online science projects, which can be described as “volunteer thinking,” takes the idea of participative science to a higher level. A popular example is the project GalaxyZoo, where volunteers can classify images of galaxies from the Sloan Digital Sky Survey as either elliptical or spiral, via a simple web interface. In a matter of months, some 100,000 volunteers classified more than 1 million galaxies. People do this sort of pattern recognition more accurately than any computer algorithm. And by asking many volunteers to classify the same image, their statistical average proves to be more accurate than even a professional astronomer.
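
The idea of combining many volunteer classifications of the same image can be sketched in a few lines of Python. This is a simple majority vote, chosen here for illustration; the excerpt does not say how GalaxyZoo actually combines answers, and the function name and sample votes are hypothetical.

```python
from collections import Counter


def consensus_label(votes):
    """Return the most common label and the fraction of votes it received."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)


# Hypothetical votes from five volunteers for one galaxy image.
votes = ["spiral", "spiral", "elliptical", "spiral", "elliptical"]
label, agreement = consensus_label(votes)
print(label, agreement)  # spiral 0.6
```

The agreement fraction is a rough indicator of how much to trust the consensus: an image with 95% agreement is a safer call than one with 55%.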

[..]
Going one step farther in interactivity, the project Foldit is an online game that scores a player’s ability to fold a protein molecule into a minimal-energy structure. Through a nifty web interface, players can shake, wiggle and stretch different parts of the molecule. Again, people are often much faster at this task than computers, because of their aptitude to reason in three dimensions. And the best protein folders are usually teenage gaming enthusiasts rather than trained biochemists.

Who can benefit from this web-based boom in citizen science? In my view, scientists in the developing world stand to gain most by effectively plugging in to philanthropic resources: the computers and brains of supportive citizens, primarily those in industrialized countries with the necessary equipment and leisure time. A project called Africa@home, which I've been involved in, has trained dozens of African scientists to use BOINC. Some are already developing new volunteer-thinking projects, and the first African BOINC server is running at the University of Cape Town.

A new initiative called Asia@home was launched last month with a workshop at Academia Sinica in Taipei and a seminar at the Institute of High Energy Physics in Beijing, to drum up interest in that region. Asia represents an enormous potential, in terms of both the numbers of people with internet access (more Chinese are now online than Americans) and the high levels of education and interest in science.

To encourage such initiatives further, CERN, the United Nations Institute for Training and Research and the University of Geneva will establish a Citizen Cyberscience Center on 2 July. This will help disseminate volunteer computing in the developing world and encourage new technical approaches. For example, as mobile phones become more powerful they, too, can surely be harnessed.
There are about one billion internet connections on the planet and three billion mobile phones. That represents a huge opportunity for citizen science.

Thursday, April 23, 2009

Free storage in the cloud for scientific datasets

[A number of cloud organizations are offering free hosting for open scientific datasets. If this trend continues to develop it will create a further impetus for “open” data which is a result of government funded research as well as the architecture of research networks and the need for large physical scientific computational facilities. One of the challenges facing researchers in the use of clouds or virtualization is the high cost of storage and the time it takes to ship large datasets over the network. Free storage and collocating large datasets with the computational cloud eliminates both challenges. Here are some pointers to initiatives in this area. Thanks to Richard Ackerman’s blog for this pointer – BSA]


Google provides free storage for datasets
http://www.dailybits.com/google-provides-free-storage-for-scientific-data/
Under Project Palimpsest, Google will be providing free storage and public access to large scientific data sets in what could be a major data organization challenge.
The storage would fill a major need for scientists who want to openly share their data, and would allow citizen scientists access to an unprecedented amount of data to explore. For example, two planned datasets are all 120 terabytes of Hubble Space Telescope data and the images from the Archimedes Palimpsest, the 10th century manuscript that inspired the Google dataset storage project.
The challenge would be in the ways in which Google is able to represent the data to the public. Also, the Trendanalyzer acquisition would come in really handy here. The data source is open to the public, which means that additions can be made to it as well.
There is also a presentation available at SearchEngineJournal on the talk delivered last May. And if you have huge datasets that just won’t get uploaded, Google is providing 3TB disk arrays so that the whole file system for the dataset can be shipped.
-----
Talis Connected Commons
http://scilib.typepad.com/science_library_pad/
Talis Connected Commons is about fostering the Linked Data community, by providing a rich hosting service:
For qualifying data sets, Talis will provide, through the Talis Platform:
• Free hosting of up to 50 million RDF triples and 10Gb of content
• Access to data access services that operate on that data, including data retrieval and text search
• Free access to a public SPARQL endpoint for each dataset.
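
For readers who have not used a SPARQL endpoint: under the W3C SPARQL Protocol, a query can be sent as an HTTP GET with the query text URL-encoded in a `query` parameter. The Python sketch below only builds such a request URL. The endpoint address is a placeholder, not a real Talis URL, and the function name is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sparql_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; substitute the address of the dataset's real endpoint.
ENDPOINT = "http://example.org/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = build_sparql_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Fetching the resulting URL with any HTTP client should return the query results, in a format that depends on the endpoint and the Accept header sent.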
I asked Leigh how this fits with the Talis Project Xiphos initiative, and he explained that Xiphos is a more focussed initiative around "data in the education, library and publishing sectors", whereas Connected Commons is about any kind of data.
Talis, like Amazon, understands that a modern business is about fostering an ecosystem, a combination of shared data and services that can be used as a platform for software development and business development.
------

Amazon Public Datasets service

http://aws.amazon.com/publicdatasets/
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications.
Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. Users can also discuss best practices and solutions in the dedicated Public Data Sets forum.
By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.

Thursday, March 19, 2009

Citizen Science - how you can make a contribution to study of climate change

[From Climate Progress - “the indispensable blog” — Tom Friedman, New York Times –BSA]

http://climateprogress.org/2009/03/19/so-you-want-to-be-a-citizen-scientist/

[..]
Online social networking is no longer just about tagging a picture of your dog on Facebook or announcing to the world what you’re having for dinner on Twitter. Scientific institutions worldwide are beginning to harness the power of online social networking for scientific research. Online communities are an ideal vehicle for matching professional scientists with armies of enthusiastic amateurs. This corps of citizen scientists has the capacity to capture far more data over a vastly expanded geographical spectrum than professional scientists can on their own.
The USA National Phenology Network is one organization that is reaching out to citizen scientists via the Internet. People have used phenology, the study of the timing of lifecycle events of plants and animals, to detect the signs of spring since the early 18th century. The rising threat posed by global warming has spurred scientists to put phenology to another use: to detect the signs of climate change.
Plants and animals are very sensitive to even the smallest changes in their climates. Shifts in the timing of their lifecycle events can therefore be an important indicator in the study of climate change and its effects. Slight changes can have huge repercussions; mutual relationships between species and even entire systems can begin to fall apart.
USA-NPN is asking people across the country to record the phenology of their local flora and then report it online. Amateur hikers and photographers can also participate in NPN’s Project Budburst. They are asked to identify the phenological stage of the flowers and plants they see using information provided by the project’s website. The participants record the location, longitude, and latitude of what they observe. Eventually, Project Budburst will use this information to include real-time mapping with Google maps.
Relying on anonymous volunteers to collect data that will be entered into important scientific databases certainly raises questions about the reliability of the information gathered. Yet it turns out that most of the data is remarkably accurate, and researchers do perform checks on anomalous data. What’s more, the large pool of samples collected by a large group of volunteers diminishes the impact of any faulty data.
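
The article does not say which checks are used on anomalous data, but one common approach can be sketched in Python: flag reports that sit far from the median, using the median absolute deviation (MAD) as the yardstick. The 3.5 cutoff is a conventional rule of thumb, and the function name and sample observations are invented for illustration.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing can be judged anomalous.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical first-bloom reports (day of year) from seven volunteers.
days = [95, 97, 96, 98, 94, 96, 150]
print(flag_anomalies(days))
# [False, False, False, False, False, False, True]
print(median(days))  # 96
```

Note that the median of the reports (96) is barely affected by the one stray value, which illustrates the point about a large pool of samples diluting faulty data.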
This creative new use for social networking also answers critics’ accusations about the frivolity of Facebook, Twitter, and other sites with proof that online networking has the potential to mobilize users to actively participate in innovative programs. Jack Weltzin, executive director of NPN, has said that in the future NPN hopes to make it possible for people to submit their findings via Twitter. NPN, a nonprofit organization, also hopes that iPhone and Facebook applications might be created to more easily facilitate volunteer participation.
Climate change scientists are not the only members of the scientific profession to tap into the potential of these online communities. In addition to tracking climate change, the information participants collect can help scientists predict wildfires and pollen production and monitor droughts as well as detect and control invasive species. Other online projects, such as “The Great World Wide Star Count,” rely on volunteer participation to gauge the level of light pollution across the globe. Several websites are also dedicated to tracking the migratory and breeding patterns of animals such as birds, frogs, and butterflies. All of these observations will augment the databases available to scientists attempting to understand annual fluctuations.

Wednesday, January 14, 2009

Science 2.0: New online tools may revolutionize research

[Excellent article on how Web 2.0 tools are transforming science. The 2 projects mentioned have been funded by CANARIE in the latest NEP program amongst a total of 11 similar projects. For more examples of how web 2.0 is revolutionizing science please see my Citizen Science Blog. Thanks to Richard Ackerman for some of the FriendFeed pointers. Some excerpts from the CBC website – BSA]

http://www.cbc.ca/technology/story/2009/01/08/f-tech-research.html

Citizen Science
http://citizen-science.blogspot.com/

CANARIE NEP program
http://www.canarie.ca/funding/nep/eoi.html

Described as an extension of the internet under the ocean, the Venus Coastal Observatory off Canada's west coast provides oceanographers with a continuous stream of undersea data once accessible only through costly marine expeditions. When its sister facility Neptune Canada launches next summer, the observatories' eight nodes will provide ocean scientists with an unprecedented wealth of information.
Sifting through all that data, however, can be quite a task. So the observatories, with the help of CANARIE Inc., operator of Canada's advanced research network, are developing a set of tools they call Oceans 2.0 to simplify access to the data and help researchers work with it in new ways. Some of their ideas look a lot like such popular consumer websites as Facebook, Flickr, Wikipedia and Digg.
And they're not alone. This set of online interaction technologies called Web 2.0 is finding its way into the scientific community.
Michael Nielsen, a Waterloo, Ont., physicist who is working on a book on the future of science, says online tools could change science to an extent that hasn't happened since the late 17th century, when scientists started publishing their research in scientific journals.
One way to manage the data boom will involve tagging data, much as users of websites like Flickr tag images or readers of blogs and web pages can "Digg" articles they approve. On Oceans 2.0, researchers might attach tags to images or video streams from undersea cameras, identifying sightings of little-known organisms or examples of rare phenomena.
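
As a rough illustration of how tagging turns a pile of data items into something searchable, here is a minimal Python sketch of a tag index. It is an assumption about how such a system could work, not a description of Oceans 2.0; the class name and clip identifiers are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Maps each tag to the set of data items that carry it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item_id, *tags):
        """Attach one or more tags to an item (tags are case-insensitive)."""
        for t in tags:
            self._items_by_tag[t.strip().lower()].add(item_id)

    def find(self, *tags):
        """Return the items that carry every one of the given tags."""
        sets = [self._items_by_tag.get(t.strip().lower(), set()) for t in tags]
        if not sets:
            return set()
        return set.intersection(*sets)


index = TagIndex()
index.tag("clip-001", "Squid", "night")
index.tag("clip-002", "squid")
print(index.find("squid", "night"))  # {'clip-001'}
```

Each tag a researcher adds makes the item discoverable by everyone else, which is how tagging gradually builds up a shared database.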
The Canadian Space Science Data Portal (CSSDP), based at the University of Alberta, is also working on online collaboration tools. Robert Rankin, a University of Alberta physics professor and CSSDP principal investigator, foresees scientists attaching tags to specific data items containing occurrences of a particular process or phenomenon in which researchers are interested.
"You've essentially got a database that has been developed using this tagging process," he says.
If data tagging is analogous to Flickr or Digg, other initiatives look a bit like Facebook.
Pirenne envisions Oceans 2.0 including a Facebook-like social networking site where researchers could create profiles showing what sort of work they do and what expertise they have. When a scientist is working on a project and needs specific expertise — experience in data mining and statistical analysis of oceanographic data, for example — he or she could turn to this facility to find likely collaborators.
"It's a really exciting time," Lok says, "a really active time for Science 2.0."

It got lots of buzz on FriendFeed; there are multiple mentions of it:

http://friendfeed.com/e/b2dc0a15-e076-4d2b-8771-d0e37733077e/Science-2-0-on-CBC/

http://friendfeed.com/e/7649d8b6-9f28-424e-9344-875bf2abfc25/Several-conference-attendees-are-quoted-in-this/

(The conference Eva's referring to is Science Online 2009.)

http://friendfeed.com/e/dcfc7f91-82e9-5aac-c7a4-b2cfea8f6b40/Science-2-0-New-online-tools-may-revolutionize/

http://friendfeed.com/e/333a5973-3aab-a3f7-0b84-993e20b94ce4/Science-2-0-New-online-tools-may-revolutionize/

http://friendfeed.com/e/c1824ca1-93d5-452a-b404-199b5d8e04d3/Nature-Network-in-the-news-Expression-Patterns/

http://friendfeed.com/e/9a2d1d68-fb76-c5a6-c43e-3909d7bebec4/Science-2-0-article-quotes-four-ScienceOnline-09/

http://friendfeed.com/e/556befb8-bfa0-4c3e-8c7a-63b94243bf5e/Science-2-0-article-quotes-four-ScienceOnline-09/

http://friendfeed.com/e/03dca000-9f33-849c-b40a-b22178339428/CBCnews-Article-on-Science2-0/

Tuesday, January 6, 2009

e-science, virtual organisations, data curation

[Thanks to Richard Ackerman for this pointer, and I agree with his assessment there are lots of interesting presentations here –BSA]

A lot of interesting presentations here

Proceedings from the ARL/CNI Fall Forum
October 16-17, 2008
Arlington, Virginia
http://www.arl.org/resources/pubs/fallforumproceedings/forum08proceedings.shtml

E-Science: Trends, Transformations & Responses
E-Science: Trends, Transformations & Responses
Chris Greer, Director, National Coordination Office (NCO) for the multiagency Federal Networking and Information Technology Research and Development (NITRD) Program
Audio [MP3 22 min.] | Slides [PPS 24.7 MB]
A Case Study in E-Science: Building Ecological Informatics Solutions for Multi-Decadal Research
William Michener, Research Professor (Biology) and Associate Director, Long-Term Ecological Research Network Office, University of New Mexico
Audio [MP3 26 min.] | Slides [PPS 8 MB]
Making a Quantum Leap to eResearch Support
Rick Luce, Vice Provost and Director of University Libraries, Emory University Libraries
Audio [MP3 19 min.] | Slides [PDF 2.5 MB]
Data Curation: Issues and Challenges
Transition or Transform? Repositioning the Library for the Petabyte Era
Liz Lyon, Director, UKOLN
Audio [MP3 23 min.] | Slides [PPS 10.3 MB]
Research and Data
Fran Berman, Director of the San Diego Supercomputer Center, UC San Diego, and Co-chair Blue Ribbon Task Force on Sustainable Digital Preservation and Access
Audio [MP3 44 min.] | Slides [PPS 5.5 MB]
Data Curation Issues and Challenges
Sayeed Choudhury, Associate Dean of University Libraries and Hodson Director of the Digital Research and Curation Center, Johns Hopkins University
Audio [MP3 20 min.] | Slides [PPS 424 KB]
Data Curation Panel
Pam Bjornson, Director-General, Canada Institute for Scientific and Technical Information
Audio [MP3 6 min.] | Slides [PPS 1.9 MB]
Supporting Virtual Organizations
The Coming Age of Virtual Organizations: The Early History and Future of Geographically Distributed Collaboration
Thomas A. Finholt, Director, Collaboratory for Research on Electronic Work (CREW) and Research Professor & Associate Dean for Research and Innovation, School of Information, University of Michigan
Audio [MP3 24 min.] | Slides [PPS 7.3 MB]
Cyberinfrastructure for Discovery, Learning, and Engagement: The nanoHUB Experience
Mark Lundstrom, Don and Carol Scifres Distinguished Professor, Director, Network for Computational Nanotechnology, School of Electrical and Computer Engineering, Purdue University
Audio [MP3 22 min.] | Slides [PPS 3.5 MB]
Reactor Panel: Supporting the Virtual Organization: A Role for Libraries?
Medha Devare, Life Sciences and Bioinformatics Librarian, Mann Library, Cornell University
Audio [MP3 20 min.] | Slides [PPS 3.2 MB]
Reactor Panel: Supporting the Virtual Organization
D. Scott Brandt, Associate Dean for Research, Purdue University Library
Audio [MP3 7 min.] | Slides [PPS 4.3 MB]

Tuesday, December 16, 2008

Web 2.0 and social networking for scientists

[Excellent slide deck on various web 2.0 tools available for scientists and researchers, and the impact they are having on research. Thanks to Richard Ackerman for this pointer-BSA]

http://scilib.typepad.com/science_library_pad/2008/12/science-networking-online.html

Tuesday, September 2, 2008

Many eyes

From the New York Times:

Novelties
Lines and Bubbles and Bars, Oh My! New Ways to Sift Data By Anne Eisenberg

PEOPLE share their videos on YouTube and their photos at Flickr. Now they can share more technical types of displays: graphs, charts and other visuals they create to help them analyze data buried in spreadsheets, tables or text.

At an experimental Web site, Many Eyes (www.many-eyes.com), users can upload the data they want to visualize, then try sophisticated tools to generate interactive displays. These might range from maps of relationships in the New Testament to a display of the comparative frequency of words used in speeches by Senators Hillary Rodham Clinton and Barack Obama.

The site was created by scientists at the Watson Research Center of I.B.M. in Cambridge, Mass., to help people publish and discuss graphics in a group. Those who register at the site can comment on one another's work, perhaps visualizing the same information with different tools and discovering unexpected patterns in the data.

Collaboration like this can be an effective way to spur insight, said Pat Hanrahan, a professor of computer science at Stanford whose research includes scientific visualization. "When analyzing information, no single person knows it all," he said. "When you have a group look at data, you protect against bias. You get more perspectives, and this can lead to more reliable decisions."

The site is the brainchild of Martin Wattenberg and Fernanda B. Viégas, two I.B.M. researchers at the Cambridge lab. Dr. Wattenberg, a computer scientist and mathematician, says sophisticated visualization tools have historically been the province of professionals in academia, business and government. "We want to bring visualization to a whole new audience," he said, to people who have had relatively few ways to create and discuss such use of data.

"The conversation about the data is as important as the flow of data from the database," he said.

The Many Eyes site, begun in January 2007, offers 16 ways to present data, from stack graphs and bar charts to diagrams that let people map relationships. TreeMaps, showing information in colorful rectangles, are among the popular tools.

Initially, the site offered only analytical tools like graphs for visualizing numerical data. "The interesting thing we noticed was that users kept trying to upload blog posts, and entire books," Dr. Viégas said, so the site added techniques for unstructured text. One tool, called an interleaved tag cloud, lets users compare side by side the relative frequencies of the words in two passages — for instance, President Bush's State of the Union addresses in 2002 and 2003.
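
The comparison behind such a tag cloud boils down to computing each word's relative frequency in both passages and lining the numbers up side by side. The Python sketch below shows that calculation only. It is not how Many Eyes is implemented, and the function names and sample passages are made up for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def compare_passages(first, second):
    """Map each word to its (first, second) relative frequencies."""
    freq_a = relative_frequencies(first)
    freq_b = relative_frequencies(second)
    return {
        word: (freq_a.get(word, 0.0), freq_b.get(word, 0.0))
        for word in sorted(set(freq_a) | set(freq_b))
    }


# Two tiny stand-in passages; real use would load full speech transcripts.
table = compare_passages("war and peace", "peace peace now")
for word, (a, b) in table.items():
    print(f"{word:6s} {a:.2f} {b:.2f}")
```

A tag cloud would then draw each word at a size proportional to these numbers, one color per passage.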

Almost all the tools are interactive, allowing users to change parameters, zoom in or out or show more information when the mouse moves over an image, Dr. Wattenberg said.

Users can embed images and links to their visualizations in their Web sites or blogs, just as they can embed YouTube videos. "It's great that people can paste in a YouTube video of cats" on their blogs, Dr. Viégas said. "So why not a visual that gives you some insight into the sea of data that surrounds us? I might find one thing; someone else, something completely different, and that's where the conversation starts."

Rich Hoeg, a technology manager who lives in New Hope, Minn., and has a blog at econtent.typepad.com, was so taken with the possibilities for group collaboration that he wrote a tutorial on using Many Eyes as part of his series called "NorthStar Nerd Tutorials."

[snip]