Tree of Life (ToL) Web Project

The Tree of Life Web Project is an ongoing Internet project providing information about the diversity and evolutionary relationships of life on Earth. This collaborative, peer-reviewed project began in 1995 and is written by biologists from around the world. The goal of the Tree of Life Web Project is to provide a page with pictures, text, and other information for every species and for each group of organisms, living or extinct. Connections between Tree of Life web pages follow the phylogenetic branching patterns between groups of organisms, so visitors can browse the hierarchy of life and learn about phylogeny and evolution as well as the characteristics of individual groups.

Sample web page of the Tree of Life Web Project

Links to websites with similar projects are provided in the following list :

N-gram databases & N-gram viewers

Last update : May 13, 2013

An N-gram is a contiguous sequence of n items from a given sequence, collected from a text or speech corpus. An N-gram could be any combination of letters, phonemes, syllables, words or base pairs, according to the application.

An N-gram of size 1 is referred to as a unigram, size 2 as a bigram, and size 3 as a trigram. Larger sizes are referred to by the value of N (four-gram, five-gram, …). N-gram models are widely used in statistical natural language processing. In speech recognition, phonemes and sequences of phonemes are modeled using an N-gram distribution.
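The definition above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `ngrams` and the sample sentence are my own choices for illustration and do not come from any of the tools mentioned in this post.

```python
from collections import Counter


def ngrams(items, n):
    """Return every contiguous run of n items from a sequence, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level N-grams of a short sample sentence.
words = "to be or not to be".split()
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

# Counting how often each bigram occurs is the first step of an N-gram model.
bigram_counts = Counter(bigrams)

# The same function works on letters, because a string is also a sequence.
letter_bigrams = ngrams("tree", 2)
```

The same function covers words, letters or any other items, because an N-gram only depends on the order of the items in the sequence.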

“All Our N-gram are Belong to You” was the title of a post published in August 2006 by Alex Franz and Thorsten Brants on the Google Research Blog. Google believed that the entire research community should benefit from access to the massive amounts of data it had collected by scanning books and by analyzing the web. The data was distributed by the Linguistic Data Consortium (LDC) of the University of Pennsylvania. Four years later (December 2010), Google unveiled an online tool for analyzing the history of the data digitized as part of the Google Books project (N-Gram Viewer). The appeal of the N-gram Viewer was obvious to scholars in the digital humanities, linguistics, and lexicography (professional linguists, historians, and bibliophiles), and casual users also enjoyed generating graphs showing how the use of key words and phrases changed over the past few centuries.

Google Books N-gram Viewer, an addictive tool

Version 2 of the N-Gram Viewer was presented in October 2012 by engineering manager Jon Orwant. A detailed description of how to use the N-Gram Viewer is available at the Google Books website. The longest string that can be analyzed is five words long (a five-gram). Mathematical operators allow you to add, subtract, multiply, and divide the counts of N-grams. Part-of-speech tags are available for advanced use, for example to distinguish between the verb and the noun form of the same word. To make trends more apparent, data can be viewed as a moving average (0 = raw data without smoothing, 3 = default, 50 = maximum). The results are normalized by the number of books published in each year. The data can also be downloaded for further exploration.
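The moving average can be sketched as follows. I assume here that a smoothing value of k means that each year is averaged with up to k neighboring years on either side, and that the window simply shrinks at both ends of the series. The function name `smooth` and the sample values are hypothetical; this is not the Viewer's own code.

```python
def smooth(values, smoothing):
    """Centered moving average.

    Each point is replaced by the mean of itself and up to `smoothing`
    neighbors on either side. Assumption: the window is truncated at the
    edges of the series instead of being padded.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - smoothing)
        end = min(len(values), i + smoothing + 1)
        window = values[start:end]
        result.append(sum(window) / len(window))
    return result


# Hypothetical relative frequencies of a phrase in five consecutive years.
raw = [0.0, 3.0, 0.0, 3.0, 0.0]
unsmoothed = smooth(raw, 0)   # smoothing 0 returns the raw data
smoothed = smooth(raw, 1)     # one neighbor on either side
```

With a smoothing of 0 the series is returned unchanged, which matches the "raw data" setting described above.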

N-Gram data is also provided by other institutions. Some sources are indicated hereafter :

Links to further information about N-grams are provided in the following list :

Compressing Human Knowledge

Marcus Hutter, a German computer scientist and professor at the Australian National University, funded in August 2006 a 50,000 euro cash prize that rewards data compression improvements on the first 100,000,000 characters of a specific version of English Wikipedia (enwik8). Specifically, the prize awards 500 euros for each percent improvement in the compressed size of the file enwik8. The prize baseline was 18,324,887 bytes, achieved by PAQ8F, a free lossless data compression archiver. The contest is open-ended and open to everyone. The ongoing competition is organized by Marcus Hutter, Matt Mahoney and Jim Bowery.
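The rule of 500 euros per percent can be turned into a small calculation. This is only a sketch under my own assumptions: the improvement is measured relative to the previous record (initially the PAQ8F baseline), and any further conditions in the official rules are ignored. The function name `award` is hypothetical.

```python
BASELINE_BYTES = 18_324_887   # PAQ8F baseline quoted in the text
EUROS_PER_PERCENT = 500       # payout per percent improvement, from the text


def award(new_size_bytes, previous_size_bytes=BASELINE_BYTES):
    """Payout in euros for shrinking the compressed size of enwik8.

    Assumption: the improvement is the relative reduction compared with
    the previous record; a larger file earns nothing.
    """
    improvement_percent = (
        100 * (previous_size_bytes - new_size_bytes) / previous_size_bytes
    )
    return max(0.0, EUROS_PER_PERCENT * improvement_percent)


# A hypothetical entry that is exactly 3 percent smaller than the baseline.
three_percent_smaller = BASELINE_BYTES * 0.97
payout = award(three_percent_smaller)
```

Under these assumptions the full 50,000 euros would only be paid for a compressed size of zero bytes, which shows how the prize money and the rule of 500 euros per percent fit together.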

The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems. There is no general method for finding the shortest possible description of a text, because descriptive complexity is not computable. In algorithmic information theory, the measure of the computational resources needed to specify an object is called the Kolmogorov complexity, named after Andrey Nikolaevich Kolmogorov, a Soviet mathematician.
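Although the Kolmogorov complexity itself cannot be computed, the output size of any real compressor gives a computable upper bound on it (up to the fixed size of the decompressor). The following sketch uses zlib from the Python standard library purely as an illustration; it has nothing to do with PAQ8F or the contest software, and the sample data is my own.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data at the highest level."""
    return len(zlib.compress(data, 9))


# 10,000 bytes with an obvious pattern: a short description exists.
repetitive = b"ab" * 5000

# 10,000 random bytes: almost certainly no shorter description exists.
random_like = os.urandom(10000)

size_repetitive = compressed_size(repetitive)
size_random = compressed_size(random_like)
```

The patterned input shrinks to a tiny fraction of its length, while the random input does not shrink at all. A better model of the data gives a tighter bound, which is the link between compression and intelligence that the organizers have in mind.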

EMPATHICA

Last update : August 6, 2013

EMPATHICA is a software program designed at the University of Waterloo, Canada, to help people understand and resolve conflicts. It is based on the hope that increasing empathy between people can help to overcome impasses in disputes in many domains.

EMPATHICA uses the idea of Cognitive-Affective Maps (CAMs) developed by Paul Thagard, Professor of Philosophy and director of Cognitive Sciences at the University of Waterloo, in collaboration with Thomas Homer-Dixon, Scott Findlay, and others. These maps derive from ideas about emotional cognition described in Thagard’s book Hot Thought.

A CAM is a diagram that shows concepts and beliefs along with the emotional values attached to them. It also shows the relationships between concepts that support each other or conflict with each other. CAMs are made up of simple nodes and edges. Nodes can have differing valences based on how a person feels about a concept related to the conflict.
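A CAM can be thought of as a small graph data structure. The sketch below is my own illustration and not EMPATHICA's internal format: the class names, the valence scale from -1.0 to +1.0 and the example concepts are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    valence: float  # assumed scale: -1.0 (negative) to +1.0 (positive)


@dataclass
class Link:
    source: str
    target: str
    supports: bool  # True: the concepts support each other, False: they conflict


@dataclass
class CognitiveAffectiveMap:
    concepts: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_concept(self, name, valence):
        self.concepts[name] = Concept(name, valence)

    def add_link(self, source, target, supports):
        # Both nodes must exist before an edge can connect them.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must exist before linking them")
        self.links.append(Link(source, target, supports))

    def conflicts(self):
        """Return the pairs of concepts that conflict with each other."""
        return [(l.source, l.target) for l in self.links if not l.supports]


# Hypothetical view of one party in a climate debate.
cam = CognitiveAffectiveMap()
cam.add_concept("economic growth", 0.8)
cam.add_concept("emission limits", -0.6)
cam.add_concept("energy security", 0.5)
cam.add_link("economic growth", "emission limits", supports=False)
cam.add_link("economic growth", "energy security", supports=True)
```

Two such maps, one per party, could then be compared concept by concept to find points of agreement and points of contention.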

Cognitive Affective Map modeling the international Climate Change debates

EMPATHICA handles the following web pages :

  • Conflict Management : open, close, view and edit conflicts
  • Conflict Overview : shows the CAMs associated with the conflict
  • Graph Editor : use manipulation tools to create and edit CAMs
  • Correlate : tie together concepts that are in both maps
  • Compare : shows the points of contention and the points of agreement in a conflict
  • Compromise : shows suggestions for the conflict

The program EMPATHICA was a Fourth-Year Design Project for the Software Engineering class of the University of Waterloo. Today I installed the Windows version of EMPATHICA (released in January 2013) on my PC. Great project !

Animal consciousness

Last update : August 6, 2013

Animal consciousness, or animal awareness, has been actively researched for over 100 years, but there has never been an agreement among scientists on whether animals are conscious or not, mainly because of the problem of other minds. As the field of consciousness research evolved and as new techniques and strategies for human and non-human animal research were developed, the question was answered last year.

Francis Crick, an English molecular biologist, biophysicist and neuroscientist, co-discovered the structure of the DNA molecule in 1953, together with James Watson. Francis Crick, James Watson and Maurice Wilkins were jointly awarded the 1962 Nobel Prize for Physiology or Medicine for their discoveries.

In 1994, Francis Crick published the book The Astonishing Hypothesis (The Scientific Search for the Soul) about consciousness. In February 2003, Francis Crick and Christof Koch published A framework for consciousness in the journal Nature Neuroscience. The same year, one year before his death, Francis Crick was one of 21 Nobel Laureates who signed the Humanist Manifesto, published by the American Humanist Association (AHA).

The first annual Francis Crick Memorial Conference took place in July 2012 at the University of Cambridge, UK. The upshot of the meeting was the Cambridge Declaration on Consciousness, signed by Christof Koch, David Edelman, Philip Low, Diana Reiss, Bruno van Swinderen and Jaak Panksepp.

Cartoon by Andrzej Krauze

The Cambridge Declaration concludes that “non-human animals have the neuroanatomical, neurochemical, and neurophysiological substrates of conscious states along with the capacity to exhibit intentional behaviors. Consequently, the weight of evidence indicates that humans are not unique in possessing the neurological substrates that generate consciousness. Non-human animals, including all mammals and birds, and many other creatures, including octopuses, also possess these neurological substrates.”

The related cartoon about animal consciousness, drawn by the Polish-born British cartoonist Andrzej Krauze, was published in the New Scientist article Animals are conscious and should be treated as such, by Marc Bekoff.

ThinkQuest Library

ThinkQuest is a complete online learning environment for primary and secondary schools that helps students develop important 21st century skills, including communication, critical thinking, and technology skills.

ThinkQuest includes:

  • ThinkQuest Projects: A project environment that supports collaborative learning
  • ThinkQuest Competition: Technology competitions that challenge students to solve real-world problems
  • ThinkQuest Library: The world’s largest online repository of student-developed learning projects. Featuring over 8,000 websites, the ThinkQuest Library is visited by over 30 million web learners each year.
  • ThinkQuest Professional Development: Professional development for educators

Alan Turing and Robert Moog Google Doodles

To celebrate Robert Moog’s 78th birthday, Google published on May 23, 2012 an interactive doodle of the analog Moog synthesizer.

Google Doodle : Moog Synthesizer

The doodle was synthesized from a number of smaller components to form a unique instrument. In browsers that support the Web Audio API, the sound is generated natively; in other browsers, the Flash plugin is used. The doodle takes advantage of JavaScript, Closure libraries, CSS3 and tools like Google Web Fonts, the Google+ API, the Google URL Shortener and App Engine.

The Moog doodle was created by Google engineers Reinaldo Aguiar and Rui Lopes and the doodle team lead Ryan Germick.

For Alan Turing’s centennial, Google published one month later (June 23, 2012) an interactive doodle showing a Turing machine. The doodle was designed by software engineers Jered Wierzbicki and Corrie Scalisi and by doodler Sophia Foster-Dimino. The code for this doodle was open sourced and is available at Google Code.

Google Doodle : Turing Machine

A video about the Art & Technology behind Google Doodles is available on YouTube.

Processing software and projects

Last update : January 21, 2014

Processing Software Logo

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Since 2001, Processing has promoted software literacy within the visual arts and visual literacy within technology. Initially created to serve as a software sketchbook and to teach computer programming fundamentals within a visual context, Processing evolved into a development tool for professionals.

Processing is an open project initiated by Ben Fry and Casey Reas. It evolved from ideas explored in the Aesthetics and Computation Group at the MIT Media Lab. The current version is 2.1, released on October 27, 2013.

The following websites help to learn Processing :

Google text to speech (TTS) with Processing

Following the post about Google STT, this post is about Google speech synthesis with Processing. In November 2011, Amnon Owed presented Processing code snippets that make use of Google’s text-to-speech web service. The idea was born in the Processing forum.
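The basic idea is to send the text to a web address and to receive an audio file in return. The sketch below only builds such a request URL and does not contact any server. It is written in Python rather than Processing and is not taken from Amnon Owed's snippets. The endpoint and its parameters `tl` (language) and `q` (text) are assumptions based on the unofficial, undocumented service as it was commonly used at the time; it may have changed or may reject requests.

```python
from urllib.parse import urlencode

# Assumption: unofficial, undocumented endpoint; it may no longer work.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(text, language="en"):
    """Build the request URL for a text in the given language.

    Assumed parameters: "tl" is the language code, "q" is the text to speak.
    """
    return TTS_ENDPOINT + "?" + urlencode({"tl": language, "q": text})


# One hypothetical request per language.
english = tts_url("hello world", "en")
french = tts_url("bonjour le monde", "fr")
german = tts_url("hallo welt", "de")
```

A sketch would then download the audio returned for such a URL and hand it to an audio library for playback.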

The sketch makes use of the Minim library that comes with Processing. Minim is an audio library that uses the JavaSound API, a bit of Tritonus, and Javazoom’s MP3SPI to provide an easy to use audio library for people developing in the Processing environment. The author of Minim is Damien Di Fede (ddf), a creative coder and composer interested in interactive audio art and music games. In November 2009, Damien was joined by Anderson Mills who proposed and co-developed the UGen Framework for the library.

I use the Minim 2.1.0 beta version with this new UGen Framework. I installed the Minim library in the libraries folder of my sketchbook and deleted the integrated 2.0.2 version from the folder modes/java/libraries of Processing (2.0b8).

Today I ran successful trials with the English, French and German Google TTS engines. I am impressed by the results.