Meta tags and web search

Meta tags provide information to all sorts of clients, such as browsers and search engines. A useful contribution how Google interprets meta tags has been published on the Webmaster Central Blog (Official news on crawling and indexing sites for the Google index). Another contribution provides hints how to create good descriptions.

A script how to create dynamic META descriptions for a WordPress Blog has been published by Darren Pangan on his blog. A useful contribution about the meta description tag has been published by Christopher Heng. A script how to extract and display meta tags is available at stackoverflow website.

The standalone XML-sitemap generator provides an option to display the meta description tags in the custom sitemap list.

ShareThis makes sharing easy!

ShareThis is a free one-step sharing tool that saves you time and makes sharing online hassle free. You can share anything on the web to your choice of social bookmarking options, post-to-profile and blog choices, Email, AIM, or even as a text message to a mobile phone.
The ShareBox is a place where registered ShareThis users’ shared content is stored. If you want to retrieve something you have shared in the past, share it again or would like to tag and organize your shares, this can all be done within the ShareBox. In addition, users can save and tag content directly to their ShareBox from within the widget for easy searching later.

I created today a ShareThis account to include the ShareThis widget to my WordPress websites.

class = “notranslate”

Google Translate Widget

Google Translate Widget

If you use the Google Translate widget on your webpages, sometimes there may be some content that you don’t want to translate. You can add class=”notranslate” to any HTML element to prevent that element from being translated. If you have an entire page that should not be translated, you can add:

<meta name=”google” value=”notranslate” />

Prevent caching of dynamically added iframes

I wanted to insert an iframe in a webpage, chosen at random among the files panel_1.html, panel_2.html, panel_3.html.

I first tried to setup the iframe with JavaScript at onload with innerHTML or with document.write in a div container called “content”. An example is shown herafter:

var i = 1 + Math.round(Math.random()*(2));

document.getElementById(‘content’).innerHTML='<iframe src=”panel_’ +i +’.html” width=”100%” height=”100%”></iframe>’;

These solutions worked as expected in the Google Chrome browser, but there were problems either with the Firefox browser or with the IE browser due to caching problems. The iframe was not refreshed despite changing the src data in the iframe tag.

Most proposals found on the web with a Google search did’nt solve the problem on all browsers. By setting the src of the Iframe instead of changing the innerHTML of its container as shown hereafter the iframes are refreshed in the 3 browser’s IE, Firefox and Chrome.

<iframe id=”mypanel” src=”about:blank” width=”100%”
height=”100%” ></iframe>
<script type=”text/javascript”>
var i = 1 + Math.round(Math.random()*(2));
document.getElementById(‘mypanel’).src=’panel_’+i+’.html’
</script>

I would like to add that by using the <object> tag insted of the <iframe> tag, cross-browser problems are even worse.

HTML client-side image maps

Last update : June 24, 2014

You can create different clickable areas within an image using the usemap attribute together with the map tag. The usemap attribute of the img tag points to an map tag containing a description of the clickable areas that the map is divided into. The map tag has a id attribute to define its name, and also declares which areas are clickable by using the area tags.

An example of HTML code is shown below :

<img src="image.jpg"
	width="150"
	height="70"
	usemap="#mymap">
<map id="mymap">
	<area shape="rect"
		   href="start.html"
		   coords="0,0,48,28"
                   alt="blablabla" />

	<area shape="circle"
		   href="previous.html"
		   coords="50,30,10"
                   alt="blablabla" />

	<area shape="poly"
		   href="next.html"
		   coords="60,50,80,30,100,40,50,100"
                   alt="blablabla" />
</map>

The coordinates for the different shapes are :

  • rect (rectangle) : top left x, y ; bottom right x, y
  • circle : center x, y ; radius
  • poly (polygon) : list of x, y coordinates, separated by commas

In XHTML 1.0, the “usemap” attribute in the “img” element is defined as being a URI. This means that # (pound sign) can be used, as it specifies a link to a ‘fragment identifier’ in the current page. For the “map” element, the “id” attribute is required. In XHTML 1.1, the “usemap” attribute for the “map” element is an IDREF rather than a URI. This means that this value must be the same value as the “id” attribute of the other element.

The problem is that the valid code does’nt work in all browsers. If you want an image map that functions, you must write invalid XHTML 1.1 code. More infos about this topic are available at the Randsco website.

The name tag is required besides the id tag for the Firefox and Chrome browser’s. To embed a usemap in another HTML page, the object tag returns a wrong URL with embedded links. A workaround is to use iframes which work in most browsers.

ISMAP is part of the old-fashioned way of making image maps which originally required a server-side file as well as the HTML in the page.

We can add events (that can call a javascript) to the tags inside the image map. The tag supports the onClick, onDblClick, onMouseDown, onMouseUp, onMouseOver, onMouseMove, onMouseOut, onKeyPress, onKeyDown, onKeyUp, onFocus, and onBlur events.

A code example is shown hereafter :

<area shape =”circle” coords =”90,58,3″
onMouseOver=”writeText(‘The mouse is inside the circle.’)”
href =”about.html” target =”_blank” alt=”Blablabla” />

A useful program to set up image maps is Mapedit 4.0 5 from Boutell.com.

Unobtrusive JavaScript

The following HTML code shows the basics of modern javascript programming :

<html>
<head>
<script type=”text/javascript” src=”helloworld.js”></script>
</head>
<body>
<p id=”hello”></p>
</body>
</html>

The concept is to keep the javascript separate from the HTML of the web page and to just use a script tag in the head section of the webpage to link the two together. An id tag is used to identify the part of the body of the web page that should be updated with javascript.

The javascript code is shown below :

function sayHello() {
document.getElementById(‘hello’).innerHTML = ‘Hello World’;
}
window.onload = sayHello;

In the past the function document.write() was used for this purpose, but this is “bad form” and not valid xhtml code. To adhere to strict xhtml code, then it would be better to create elements using document.createElement and element.appendChild (and other native DOM node manipulation functions). For inserting text into an existing element, innerHTML is still acceptable code (and completely cross-browser). The “all DOM” way requires a lot of extra code, is slower, harder to maintain than simpler method and it needs more bandwidth.

“alt” and “title” attributes for images

The alt attribute is designed to be an alternative text description for images. alt text displays before the image is loaded. It is a required element for images and can only be used for image tags.

In contrast, the title attribute can be used for any page element, but it isn’t required for any page element. Many search engine ranking algorithms read the text in title attributes as regular page content.

In the past,  the alt text description was often used as a sort of pop-up tooltip on images. That was because the title attribute wasn’t widely supported. Today most browsers interpret the W3C specifications rather strictly – alt text no longer “pops up” when you run your mouse over an image. You have to use a title attribute for that.

Destroy, delete, unlink files : access denied

Sometimes there are files on a hosted server which can not be deleted by FTP even if you are logged in as the domain administrator. This happens for example when you upload a file using PHP. These files will be marked as being owned by the Apache (or Fast-CGI) user. So when you log into the FTP or the Plesk or Confixx desktop with your own user id, you can’t delete or change the files PHP has uploaded as they are owned by another user.

The only way to delete or change these files is to do it by a PHP script. In PHP you delete files by calling the unlink function.

To delete the file test.txt we simply run a PHP script that is located in the same directory.

<?php
$myFile = “test.txt”;
unlink($myFile);
?>

I had this problem a few days ago when I tried to delete an old wordpress installation on a server and it was not possible to delete the uploads folder cretaed by the wordpress application.

Speak to the web robots

Last update : July 2, 2013
Web Robots (also called crawlers, wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages. For various reasons web robots are not always welcome to access certain web pages.

web robots

Googlebot

A simple method used to exclude web robots from a server is to create a file on the server which specifies an access policy for robots. This file must be accessible via HTTP on the local URL “/robots.txt”. The contents of this file uses two records: user-agent and disallow.

It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there no guarantee that all current and future web robots will use it. Consider it a common facility the majority of web robot authors offer the WWW community to protect web servers against unwanted accesses by their robots.

The latest version of the robots.txt document can be found on http://www.robotstxt.org/orig.html.

Another way to tell web robots what to do is the use of meta tags index and follow. Informations about these meta tags are available at the metatags.info website.

Examples :
index the whole website
<meta name="robots" content="index, follow" />
index the current page and stop there
<meta name="robots" content="index, nofollow" />
ignore the current page, but crawl the other web pages
<meta name="robots" content="noindex, follow" />
ignore the whole website
<meta name="robots" content="noindex, nofollow" />

There are more robot meta tags. Sometimes search engines uses descriptions from the ODP (Open Directory Project) as the title and snippet for a web result. The tag noodp lets you opt out of the ODP title and description. The tag noydir does the same for the Yahoo directory. The tag noarchive prevents serach engines from showing the cached link for a page. The tag nosnippet prevents a snippet from being shown in the search results. The tag noimageindex lets you specify that you do not want your page to appear as the referring page for an image that appears in Google search results.