How Browsers Work – Part 1: The DOM

Following my university courses on compilers and operating systems, I decided to propose another hands-on project. Instead of coding a web server, I would try to build a VERY simple browser.

As I’m writing my final report, I decided to dust off this blog and tell you about some of my findings.

First consideration: I won’t do the network part. The whole process will start with local HTML and CSS files.

Second consideration: I’ll write it all in Rust, a language I’ve been falling in love with since I finished my internship at Mozilla. So, any code reviews are welcome.

Third consideration: I’m trying to take the common parts of WebKit, Blink, Gecko and Servo into account. I won’t dive into the specifics of any of these engines.

Now, let’s get started.

How to construct the Object Model?

Basically, a well-formed* HTML file goes through four steps before we get a complete DOM tree:

1 – Conversion: The browser reads the raw bytes of HTML off the disk and turns them into characters based on the file’s encoding. Note: this part is going to be simplified in my toy engine, I’m just laying out the truth for y’all.

2 – Tokenizing: The browser converts strings of characters into tokens, e.g. “<html>”, “<body>”, “<p>”.

3 – Lexing: The aforementioned tokens are converted into objects. These objects hold their properties and rules. Again, my version is gonna be waaay too simple.

4 – DOM construction: HTML markup defines relationships between different tags, so the created objects get linked in a tree that makes the hierarchy between them explicit.
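To make steps 1 and 2 a bit more concrete, here is a minimal sketch of what they could look like. It is not the toy engine’s actual code: the `Token` enum and the `decode` and `tokenize` functions are names I made up for illustration, the input is assumed to be UTF-8, and attributes, comments and malformed markup are simply ignored.

```rust
// Hypothetical token type for this sketch; not the toy engine's real code.
#[derive(Debug, PartialEq)]
enum Token {
    OpenTag(String),
    CloseTag(String),
    Text(String),
}

// Step 1 (conversion): decode raw bytes, assuming the file is UTF-8.
// Invalid sequences are replaced instead of being reported as errors.
fn decode(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

// Step 2 (tokenizing): split the characters into tag and text tokens.
// Attributes, comments and malformed input are not handled here.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        if rest.starts_with('<') {
            // Everything up to the next '>' is the tag; stop if it is missing.
            let end = match rest.find('>') {
                Some(i) => i,
                None => break,
            };
            let inner = rest[1..end].trim();
            if let Some(name) = inner.strip_prefix('/') {
                tokens.push(Token::CloseTag(name.trim().to_string()));
            } else {
                tokens.push(Token::OpenTag(inner.to_string()));
            }
            rest = &rest[end + 1..];
        } else {
            // Everything up to the next '<' is text; whitespace-only runs are dropped.
            let end = rest.find('<').unwrap_or(rest.len());
            let text = rest[..end].trim();
            if !text.is_empty() {
                tokens.push(Token::Text(text.to_string()));
            }
            rest = &rest[end..];
        }
    }
    tokens
}

fn main() {
    let source = decode(b"<html><body><p>Hello</p></body></html>");
    for token in tokenize(&source) {
        println!("{:?}", token);
    }
}
```

A real engine uses a state machine that also copes with attributes, entities and broken markup, so treat this only as the general shape of the idea.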

My data structure is going to be something like**:

use std::collections::HashMap;

pub type AttrMap = HashMap<String, String>;

pub struct Node {
    // common to all nodes
    pub children: Vec<Node>,

    // specific to each node type
    pub node_type: NodeType,
}

pub enum NodeType {
    Element(ElementData),
    Text(String),
}

pub struct ElementData {
    pub tag_name: String,
    pub attributes: AttrMap,
}

After these 4 steps the output is going to be a marvellous DOM tree.
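As a rough sketch of steps 3 and 4, the code below turns a list of tokens into nodes and links them into a tree using a stack of open elements. The `Token` enum, the `elem` helper and `build_tree` are hypothetical names introduced just for this example, and the sketch assumes well-formed input where every open tag has a matching close tag. The node types are repeated only so the example compiles on its own.

```rust
use std::collections::HashMap;

// Same shapes as the data structure above, repeated so the sketch is self-contained.
type AttrMap = HashMap<String, String>;

#[derive(Debug)]
struct Node {
    children: Vec<Node>,
    node_type: NodeType,
}

#[derive(Debug)]
enum NodeType {
    Element(ElementData),
    Text(String),
}

#[derive(Debug)]
struct ElementData {
    tag_name: String,
    attributes: AttrMap,
}

// Hypothetical token type, standing in for the output of the tokenizing step.
enum Token {
    OpenTag(String),
    CloseTag(String),
    Text(String),
}

// Hypothetical helper: an element node with no attributes or children yet.
fn elem(name: &str) -> Node {
    Node {
        children: Vec::new(),
        node_type: NodeType::Element(ElementData {
            tag_name: name.to_string(),
            attributes: AttrMap::new(),
        }),
    }
}

// Steps 3 and 4: turn tokens into nodes and link them into a tree.
// Assumes well-formed input; close-tag names are not checked against open tags.
fn build_tree(tokens: Vec<Token>) -> Option<Node> {
    let mut stack: Vec<Node> = Vec::new(); // elements that are still open
    let mut root = None;
    for token in tokens {
        match token {
            Token::OpenTag(name) => stack.push(elem(&name)),
            Token::Text(text) => {
                // Text becomes a leaf of the innermost open element.
                if let Some(parent) = stack.last_mut() {
                    parent.children.push(Node {
                        children: Vec::new(),
                        node_type: NodeType::Text(text),
                    });
                }
            }
            Token::CloseTag(_) => {
                // A finished element is attached to its parent, or becomes the root.
                if let Some(done) = stack.pop() {
                    match stack.last_mut() {
                        Some(parent) => parent.children.push(done),
                        None => root = Some(done),
                    }
                }
            }
        }
    }
    root
}

fn main() {
    let tokens = vec![
        Token::OpenTag("html".to_string()),
        Token::OpenTag("p".to_string()),
        Token::Text("Hello".to_string()),
        Token::CloseTag("p".to_string()),
        Token::CloseTag("html".to_string()),
    ];
    println!("{:#?}", build_tree(tokens));
}
```

Because `children` owns its nodes (`Vec<Node>`), a child can only be attached once it is complete, which is why the element is pushed into its parent on the close tag rather than on the open tag.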

*Well-formed, because in real life browsers have rules in place to display even messed-up HTML files.

**How to add code snippets to wordpress?
