General talk

A place for all random thoughts and ideas that come across my mind and I can’t find the right “category” to put in…

Image formats review

For all of you designers out there who don't understand exactly how the most common web image formats differ, here is a short explanation in "lay" terms, with examples.
The background is a PNG image that is always the same, and the example images are set over it.

PNG

This is a relatively new file format. It stands for Portable Network Graphics, and its primary characteristic is lossless data compression, meaning that the image doesn't lose any quality when it is saved.
Its other great strength is full alpha support, meaning that each pixel can have its own opacity, from fully transparent to fully opaque.
Below are some examples of PNG images over different backgrounds:

[Images: dice in PNG format, over four different backgrounds]

The PNG file above (100x50px, in a row of 4) weighs 12.4 KB.
As you can see, it displays perfectly whatever the background (stripes, solid color, a background with alpha transparency, or a transparent background).

GIF

The GIF image format is probably the most common one on the web, or at least it has been for the past years. It stands for Graphics Interchange Format, and it has three characteristics that make it unique and sometimes useful:

  • It has lossless data compression (pretty much like PNG), although each image is limited to a palette of 256 colors
  • It supports animations, although they increase file size considerably
  • It supports transparency, but a pixel is either fully transparent or fully opaque; there is no partial (alpha) opacity.

Here is the same demonstration as before, but now with GIF graphics on top:

[Images: dice in GIF format, over four different backgrounds]

To look good, GIF images with transparency often use a surrounding border that matches the background color. As you can see in the example, the image looks just great over white, but over other background colors the transparency looks awful.

A great advantage, though, is that this GIF image weighs 4.04 KB, against the 12.4 KB of the PNG file...

JPEG

JPEG stands for "Joint Photographic Experts Group". That is not the actual name of the format but the name of the committee that created it; since everyone decided to call it this way, let's stick to it ;)

JPEG offers some characteristics that are worth looking at when designing for the web. It uses lossy compression, which basically means that the image loses some quality when it is saved.
Depending on the use, this may not be bad. For example, in this site's design I've used JPEG files for backgrounds, since the quality loss contributes to the "rustic" effect, but JPEG should never be used for logos or anything with sharp edges.

JPEG is a good choice for large photographs, big backgrounds, or other large images.
Here is an example of the dice image at different compression qualities.
Since JPEG has no transparency, the background always shows as white...

High quality

[Images: dice in JPEG format (high quality), over four different backgrounds]

Medium quality

[Images: dice in JPEG format (medium quality), over four different backgrounds]

Low quality

[Images: dice in JPEG format (low quality), over four different backgrounds]

File sizes were:

  • High quality: 31.2 KB
  • Medium quality: 26.2 KB
  • Low quality: 25.0 KB
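
To put the numbers from this post side by side, here is a small Python sketch that expresses each file size relative to the PNG. The sizes are the ones quoted above; the dictionary and function names are only illustrative.

```python
# File sizes quoted in this post, in KB (names are illustrative).
SIZES_KB = {
    "PNG": 12.4,
    "GIF": 4.04,
    "JPEG high": 31.2,
    "JPEG medium": 26.2,
    "JPEG low": 25.0,
}


def relative_to(baseline, sizes):
    """Return every size as a multiple of the baseline format's size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("PNG", SIZES_KB).items():
        print(f"{name}: {ratio:.2f}x the PNG size")
```

For this particular image the GIF comes out at roughly a third of the PNG, while every JPEG is about twice the PNG or more.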

Conclusion

When designing, you should try to use GIFs as often as possible, since they provide the smallest file sizes and better quality than most JPEGs. Where you need alpha transparency, consider switching to PNGs, but otherwise avoid them: they increase page load time considerably...

Have fun designing ;)

Best free hosting site and forum

After being a member of the frihost.com forum for more than two years and writing more than 500 posts, I can say, and justify, that it is absolutely the best web development forum out there.
First of all, you are given the opportunity to get your own free web hosting with them; the requirements are 10 quality posts and staying active in the forums.
The web hosting features are:

  • 250 MB of space for your entire collection of websites.
  • 10 GB of traffic each and every month.
  • 1 short free subdomain to be reachable in the whole world (you.frih.net).
  • PHP and Perl scripting languages to fulfill all your programming needs.
  • The DirectAdmin control panel to manage your web hosting account.
  • No forced advertisements. Your own ads, like Google AdSense, are allowed.
  • Unlimited MySQL databases for all your data.
  • Unlimited subdomains and FTP accounts for all your domains.
  • Unlimited email accounts for your correspondence.
  • Unlimited parked and addon domains for your whole portfolio of websites.

This alone would be enough for most web developers in need of a web host. In addition, there is the frihost forum community, which is the main reason they are able to give out all those goodies ;)

There are over 500,000 active members (inactive ones are deleted after a set period of time); at the moment of writing there were 733,858 articles in the forum, most of which are about web development and gaming... :D

A question posted in the webdev categories of the forum usually takes about one day to be answered. If you have code that doesn't work, you can post the whole thing there, and I assure you that in less than two days someone will have read all of it, found the error and optimized it to work best...

Seriously, I'm writing this review only because I feel completely in debt to that forum, which has always helped me with programming problems along the way. Some members have given me their feedback on this blog's design, I have of course always tried to answer any question posted on the forum, and today I am sharing this great site with the whole world.

http://www.frihost.com
Enjoy,
Alex

Redesign is complete now!

This is an official announcement!
The site's redesign has been completed successfully. In the emotional rush that comes after a finished job, though, I didn't test the new design thoroughly, so there might be errors; they will be corrected as soon as they are found.

I'm not even going to take a look at IE's way of displaying it. I simply don't care. If it looks bad to you because you run IE, get Firefox. This website has perfectly valid HTML markup and CSS, checked with the W3C validators, so I just don't care about IE!
Wow, saying that feels very good! It took a lot of worrying and re-programming out of my way, which is always nice :)

Anyway, I really hope you enjoy this new design, and if you don't, let me know what you don't like and I'll try to change it!
(Even if you like it, consider commenting something like "Hey, nice design, I like it"... it just feels good from time to time...)

I'll go back to posting as usual from now on, so don't stop coming!

First sponsorship in my blog

Personally I think it is quite a good way to pay for bandwidth and space online, since they pay users who write a small review of certain products related to the site's topic from time to time.
You choose the topics and the products, and you write the review yourself. It is quite a good start for all bloggers, I guess...
So that will be it for my announcement. I am going to try out this marketing program, and I'll tell you about my experience and opinion after some weeks of testing.
One thing I would recommend, though, is to have a Google PR of at least 1. As you can see, this blog has none; it is still too early for Google to give me some PR. A good PR will really help you get amazing payments for your reviews, and I'm talking about a PR above 3 to be "eligible" for those campaigns. Then comes the question: should I give up my blogging space for ads? I guess it is not so bad. If you want, and have set that option, you can always publish the review right before you post a real article, so it never really appears on your homepage...
So, I'll tell you how it went after a while!
Have fun developing ;)

Writing a good robots.txt file | SEO Tips

Search bots crawl each URL, and the first thing they look for at a site's root is the robots.txt file. So by writing our own robots.txt file we can change the search bots' behaviour and tell them where they may search and index and where they may not. Imagine we have private folders on our website, for example a folder or a file containing e-mail addresses we don't want published; we can keep search engine robots away with a few simple commands in the robots.txt file. Here we go:

Introduction

We use the /robots.txt file to give instructions about our site to web robots; this is called The Robots Exclusion Protocol.

Put simply, robots.txt is a very simple text file that is placed in our root directory, for example http://urbanoalvarez.es/robots.txt. This file tells search engines and other robots which areas of our site they are allowed to visit and index.
The only thing you must take into account is that ONLY one robots.txt file is allowed on our site, and ONLY in the root directory (where our home page is).

TRUE: http://urbanoalvarez.es/robots.txt (Works)

FALSE: http://urbanoalvarez.es/images/robots.txt (Doesn't work)
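
Since the file always lives at the root, its address can be worked out from any page URL on the same site. Here is a minimal Python sketch of that idea; the function name `robots_url` is my own, and the sample page URL is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page inside a subfolder of the site.
    print(robots_url("http://urbanoalvarez.es/images/dice.png"))
```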

All major search engine spiders respect this file, but unfortunately most spambots (email collectors, harvesters) do not. If you want security on your site, or if you have files or content to hide, you have to actually put the files in a protected directory; you can't rely on the robots.txt file alone.

Setting up our file

So what programs do we need to create it? Just good ol' Notepad or any other text editor. All we need to do is create a new text file and rename it! Attention: the name has to be "robots.txt"; it cannot be "robot.txt", "Robot.txt" or "robots.TXT".
Simple: no caps, and "robots" in the plural!

Writing the rules

Now that we are starting to write in it, a simple robots.txt looks like this:

User-agent: *
Disallow:

The "User-agent: *" line means this section applies to all robots; the wildcard "*" means all bots. The empty "Disallow:" line tells the robots that they can go anywhere they want.

User-agent: *
Disallow: /

A wildcard "*" is used in this one too, so all bots must read it. But here there is a little difference: a slash "/" in the Disallow line, which means "don't allow anything to be crawled", so the bots won't crawl your website (the good ones, of course ;)

If we want all the bots to read a section, we put the wildcard "*" in the User-agent line. When we leave the Disallow: line blank, it means "come crawl my site, you bots!", and when there is a slash, it means "keep out!". Simple. This is the simplest setup; now we can learn how to let some bots crawl and keep others out.
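
If you want to check how a well-behaved bot would read these two files, Python's standard library includes a robots.txt parser. The sketch below is only an illustration: the helper name `allowed`, the bot name "anybot" and the example.com URL are my own placeholders.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # blank Disallow: crawl everything
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # slash: crawl nothing


def allowed(robots_txt, user_agent, url):
    """Parse the robots.txt text and say whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"
    print("allow-all file:", allowed(ALLOW_ALL, "anybot", url))
    print("block-all file:", allowed(BLOCK_ALL, "anybot", url))
```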

Advanced rules

The User-agent line is where we define which bot a section applies to. For example, if we want the Google bot to crawl the site but not the Yahoo bot, how will our text file look?

User-agent: googlebot
Disallow:

User-agent: yahoo-slurp
Disallow: /

In this sample, we named googlebot and left its Disallow line blank, so we told it to crawl the website. In the second block we named the Yahoo bot, but its Disallow line has a slash, so we told it to go away.
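
The same per-bot file can be tried out with Python's standard-library parser. This is a sketch under assumptions: the bot names are the ones used in this post, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The two-block file from this post: Google may crawl, Yahoo may not.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: yahoo-slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "http://example.com/index.html"  # placeholder URL
google_ok = parser.can_fetch("googlebot", URL)
yahoo_ok = parser.can_fetch("yahoo-slurp", URL)

if __name__ == "__main__":
    print("googlebot may crawl:", google_ok)
    print("yahoo-slurp may crawl:", yahoo_ok)
```

A bot that is not named in any block, and for which there is no "*" block, is treated by this parser as allowed.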

Now we are going to learn how to keep some folders of our site from being crawled by search engine spiders while leaving others open. For this, we change the value in the Disallow line. For example, say we have two folders in our domain, /images and /emails. We want /images to be crawled but not /emails. Then the text file would look like:

User-agent: *
Disallow: /emails/

As we can see, we addressed all the robots and excluded the /emails folder; the rest of the website can still be crawled.
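
As a quick check of the folder rule, this Python sketch filters a list of paths down to the ones a polite bot could crawl. The function name `crawlable`, the bot name and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /emails/\n"


def crawlable(robots_txt, paths, user_agent="anybot"):
    """Return only the paths that user_agent is allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Hypothetical paths on the site.
    paths = ["/index.html", "/images/dice.png", "/emails/list.txt"]
    print(crawlable(ROBOTS_TXT, paths))
```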

Common samples

Here are a few samples to make it clearer.
To exclude all folders from all the bots:

User-agent: *
Disallow: /

To exclude a single folder from all the bots:

User-agent: *
Disallow: /emails/

To exclude all folders from one bot:

User-agent: googlebot
Disallow: /

User-agent: *
Disallow:

To allow just one bot to crawl the site:

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /

To allow all the bots to see all the folders:

User-agent: *
Disallow:

Important tips

After learning these, I believe you guys got it. Now there are a few rules that we should know. For most bots we can't use a wildcard "*" in the Disallow line, because they don't understand it (Googlebot and MSNBot do), so a line like "Disallow: /emails/*.htm" is not valid for most bots. Another rule is that you have to write separate User-agent and Disallow lines for each specific bot, and a new Disallow line for each directory that you want to exclude. "User-agent: googlebot, yahoobot" and "Disallow: /emails, /images" are not valid.
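
The comma mistake is easy to catch automatically. Here is a rough Python sketch of such a check; it is not part of any official tool, the function name `find_comma_lists` is my own, and it only looks at User-agent and Disallow lines.

```python
def find_comma_lists(robots_txt):
    """Return the line numbers of User-agent/Disallow lines that list
    several values separated by commas, which is not valid."""
    problems = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        field, colon, value = line.partition(":")
        is_rule = field.strip().lower() in ("user-agent", "disallow")
        if colon and is_rule and "," in value:
            problems.append(number)
    return problems


if __name__ == "__main__":
    bad = "User-agent: googlebot, yahoobot\nDisallow: /emails, /images\n"
    print(find_comma_lists(bad))
```

The fix is always the same: one value per line, repeating the User-agent or Disallow line as many times as needed.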

Robots can ignore your /robots.txt. Especially malware or spam harvesting robots that scan the web for security vulnerabilities and email addresses will pay no attention.
The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use. So don't try to use /robots.txt to hide information.

Is it possible to allow just one file or folder to be crawled and the rest not? The original robots.txt standard has no Allow line (some major crawlers, Googlebot among them, accept "Allow:" as an extension), but in effect it can be done. How? Put all the files that you don't want to be seen in one folder and disallow it.
For example, "Disallow: /files_that_I_dont_want_to_share/"

Major robots

Major Known Spiders

Googlebot (Google), Googlebot-Image (Google Image Search), MSNBot (MSN), Slurp (Yahoo), Yahoo-Blogs, Mozilla/2.0 (compatible; Ask Jeeves/Teoma), Gigabot (Gigablast), Scrubby (Scrub The Web), Robozilla (DMOZ)

Google

Google allows the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To exclude all files of a specific type (for example, to keep .jpg images but not .gif images), you'd use the following robots.txt entry:

User-agent: Googlebot-Image
Disallow: /*.gif$
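
To get a feel for how such a pattern matches, here is a small Python sketch that turns a Disallow pattern into a regular expression, following the description above ("*" for any characters, a final "$" for the end of the name). It is my own reading of those rules, not Google's actual implementation, and the function names and sample paths are illustrative.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern with '*' and an optional trailing '$'
    into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(pattern, path):
    """True if the path matches the pattern, starting from its beginning."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/images/dice.gif", "/images/dice.jpg"):
        print(path, is_disallowed("/*.gif$", path))
```

Without the final "$" a pattern behaves as a prefix, so "/*.gif" would also match a path such as "/images/dice.gif?size=big".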

Yahoo

Yahoo also has a few specific commands, including the "Crawl-delay: xx" instruction, where "xx" is the minimum delay in seconds between successive crawler accesses. Yahoo's default crawl-delay value is 1 second. If the crawl rate is a problem for your server, you can raise the delay to 5, 20, or whatever value is comfortable for your server.

Setting a crawl-delay of 20 seconds for Yahoo-Blogs/v3.9 would look something like:

User-agent: Yahoo-Blogs/v3.9
Crawl-delay: 20
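
Python's standard-library parser (version 3.6 or later) can read the Crawl-delay value back. A small sketch, with one assumption stated loudly: I use the plain bot name "Slurp" from the spider list in this post instead of the versioned "Yahoo-Blogs/v3.9" string, because this parser drops everything after the slash in the requesting bot's name before comparing, so a versioned entry in the file would not be matched.

```python
from urllib.robotparser import RobotFileParser

# Assumption: plain "Slurp" name instead of a versioned agent string.
ROBOTS_TXT = "User-agent: Slurp\nCrawl-delay: 20\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

slurp_delay = parser.crawl_delay("Slurp")      # delay in seconds for Slurp
other_delay = parser.crawl_delay("googlebot")  # no matching block in the file

if __name__ == "__main__":
    print("Slurp should wait", slurp_delay, "seconds between requests")
    print("googlebot delay:", other_delay)
```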

Ask / Teoma

Supports the crawl-delay command.

MSN Search

Supports the crawl-delay command. Also allows wildcard behavior:

User-agent: msnbot
Disallow: /*.[file extension]$

(the "$" is required, in order to mark the end of the file name)

Examples:

User-agent: msnbot
Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.exe$

Why do I want a Robots.txt?

There are several reasons why you would want to control a robot's visit to your site:

  1. It saves your bandwidth - the spider won't visit areas where there is no useful information (your cgi-bin, images, etc)
  2. It gives you a very basic level of protection - although it's not very good security, it will keep people from easily finding stuff you don't want easily accessible via search engines. They actually have to visit your site and go to the directory instead of finding it on Google, MSN, Yahoo or Teoma.
  3. It cleans up your logs - every time a search engine visits your site it requests the robots.txt, which can happen several times a day. If you don't have one it generates a "404 Not Found" error each time. It's hard to wade through all of these to find genuine errors at the end of the month.
  4. It can prevent spam and penalties associated with duplicate content. Let's say you have a high speed and low speed version of your site, or a landing page intended for use with advertising campaigns. If this content duplicates other content on your site you can find yourself in ill-favor with some search engines. You can use the robots.txt file to prevent the content from being indexed, and therefore avoid issues. Some webmasters also use it to exclude "test" or "development" areas of a website that are not ready for public viewing yet.
  5. It's good programming policy. Pros have a robots.txt. Amateurs don't. What group do you want your site to be in? This is more of an ego/image thing than a "real" reason, but in competitive areas or when applying for a job it can make a difference. Some employers may consider not hiring a webmaster who didn't know how to use one, on the assumption that they may not know other, more critical things as well. Many feel it's sloppy and unprofessional not to use one.

So, as a website owner, you need to put the file in the right place on your web server for the resulting URL to work. Usually that is the same place where you put your website's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

Major Search Engine Bots - Spiders Names

Google = googlebot
MSN Search = msnbot
Yahoo = yahoo-slurp
Ask/Teoma = teoma
GigaBlast = gigabot
Scrub The Web = scrubby
DMOZ Checker = robozilla
Nutch = nutch
Alexa/Wayback = ia_archiver
Baidu = baiduspider

Specific Special Bots:

Google Image = googlebot-image
Yahoo MM = yahoo-mmcrawler
MSN PicSearch = psbot
SingingFish = asterias
Yahoo Blogs = yahoo-blogs/v3.9

Main source of information can be found in this forum post by Paskall.
Feel free to ask any question, or correct my mistakes,
Cheers
