Holberton > Articles by: Julien Barbier

We are opening a new Holberton School campus in Paris, France, in September

It’s been a year since we have announced the opening of a special cohort in France, alongside a €1,000,000 scholarship to help people impacted by the economic consequences of the pandemic and resulting unemployment rate.

A lot has happened since then. Our French students are very happy and have finished their first year. Since then, some of them are working, while others are starting their second year of specialization. In the meantime, we have partnered with Actual Group, and we have opened two additional campuses, one in Lille, and one in Laval. This partnership brings a lot of value to Holberton School Students:  while Holberton School will ensure technical excellence amongst the students, Actual Group will help place students on the labor market.

Given the overwhelmingly positive feedback, as well as the lack of highly skilled software engineers in France, Actual Group is announcing, today, that they are opening a new Holberton School campus, in Paris. This will bring the total number of Holberton Schools around the world to 27 in September.

This opening feels very special for several reasons.

First, Paris is one of the major tech hubs in Europe, home to many disruptive startups and entrepreneurs. It is the home of the world’s largest startup campus, STATION F, as well as many incredible universities: the “traditional”, world renowned, Sorbonne University, Science Po, École Normale Supérieure… as well as the “non-traditional” École 42, Epitech and École de Guerre Economique, which I am an alumni of. It is with great enthusiasm that we will join this incredible and growing ecosystem.

Second, the partnership with Actual Group is a big value-add to our students. Given the Actual Group position as one of the French leaders in staffing and recruiting, they will be able to place students very fast, drastically reducing the placement time between graduation and placement, which is crucial to students.

As a result, seeing this strategic partnership grow that fast – three new schools in one year! – makes us really proud of working with Actual Group.

Opening the Paris campus is very special to me. I was born and raised in a southern suburb of Paris, in Val-de-Marne, 94. As a kid, I always dreamed of living in Paris. So, as soon as I graduated and got my first full-time job as a software engineer in 2006, I moved there, sharing an apartment with a friend. To me, moving to Paris was one of the happiest moments in my life, because it was a symbol: I could now access a life that was only a blurry dream before I became a software engineer. And it was at this very moment that I understood that education, if brought to the many, the right way, could change many lives for the better. It planted the seed that grew inside me for many years until I co-founded Holberton in 2015, in order to give more suburban kids and adults the opportunity to get a job that will allow them to unlock a new and better life for themselves and their families. Paris became my symbol, my proof, that I could do anything. It shattered all conscious and unconscious limitations and unlocked my life of entrepreneurship. Few months after moving to Paris, I co-founded my first company and started my journey as an entrepreneur in France, and then in the United States.

Last but not least, Paris is where I met my wife and fell in love.

After 9 years living abroad, in Miami and in Silicon Valley, the Holberton School opening in Paris revives all these memories and makes me both proud and joyful.

Welcome to all new Parisian students, I hope you will enjoy this new Holberton School campus, and I am looking forward to hearing your stories and to see you move to your own versions of Paris when you graduate.

Holberton School to open 8 new schools in Mexico, in partnership with Universidad Anáhuac Mayab

We are thrilled to announce that we have partnered with the Anáhuac university to expand in Mexico. 8 new schools will open in the country within the next 2 years. Mérida, Yucatán, will be the first city to host a new Holberton School campus in September this year. This will bring the total number of Holberton Schools around the world to 26 in September.

With more than 50 years of history, Anahuac University is currently recognized as one of the 3 best universities in Mexico, according to the QS World University Rankings 2020. It has eight campuses around the country and five in the rest of the world. More than 90 thousand alumni attest to the mission of the Universidad Anáhuac: Contribute to the comprehensive training of positive action leaders and promote the development of the person and the society. The Anahuac educational model allows the person’s integral development by embracing the professional, intellectual, human, social, and spiritual dimensions, which is well-aligned with Holberton’s values and mission.

After Mérida, the next Holberton School campus is scheduled to open in January 2022 in Aguascalientes, followed by 6 additional campuses in 2022 and 2023.

Until all the campuses are up and running, and to open the Holberton School’s education to as many students as possible, students from all over Mexico will be able to study online through Anáhuac Online.

In addition, Anáhuac university will offer scholarships to students of the first cohort.

According to PwC, Mexico’s working-age population is expected to grow by almost 10 million in the next decade, considerably boosting the supply of available talent. But at the same time, the country lags far behind advanced economies in terms of overall talent quality. Thanks to this partnership, Anáhuac aims to allow many more students to access in-demand and well-paying jobs and contribute to the economic development of the region by increasing the overall talent quality of Mexico.

We can’t wait to meet the first Holberton School Mérida students!

Holberton School opens 9 new schools in the Middle East.

We are thrilled to announce that we are expanding to the Middle East area! 9 new schools will open in 9 new countries in July this year. This will bring the total number of Holberton Schools around the world to 25.

Our first school in the Middle East opened in Beirut, Lebanon, in March 2020. The Holberton School Beirut has since sparked interest from other countries in the region. Today, we are very excited to announce the opening of Holberton Schools in Jordan, Egypt, Iraq, United Arab Emirates, Saudi Arabia, Qatar, Oman, Bahrain, and Kuwait.

The 9 new locations, as well as Holberton School Lebanon, will be regrouped into a new entity called Holberton School Middle East. It will be operated and managed by StarTechEUS. Alexandre Harkous will serve as Chairman of Holberton School Middle East.

This massive expansion will play a key role in the economic growth in the region.

“There is a high demand and need for qualified tech talent, and a tech education is the key to remaining competitive in a global economy. Our students have exceptional talents, and our goal is to develop those gifts into excellences while supporting their unique needs.” – Alexandre Harkous, Chairman of Holberton School Middle East. 

The First-of-its-kind Tech initiative will allow the region to provide best in class tech curriculum seeking to empower Arab technology-driven students, and giving them the opportunity to grow, thrive, and realize their potential by equipping them with the required skills to compete in the local, regional and international markets.

“We’re focused on developing strategies and programs for young Arabs with tech skills that encourage inclusion and a passion for Technology. We’re very excited about this potential social impact and our Holberton School expansion.” –  Alexandre Harkous, Chairman of Holberton School Middle East.

The 9 new campuses will start online first and will gradually open physical spaces as soon as COVID-19 allows it.

Holberton School Middle East plans to train 10,000 new students in software engineering and computer science by 2025.

To learn more about Holberton School Middle East, you can visit www.holberton-me.com.

Holberton School, WeWork, and the Future of Work

In the past 18 months, the COVID-19 pandemic has accelerated movements that had started many years ago, slowly but steadily. With no other choice than to embrace change, people and companies had to adapt quickly to the new world COVID-19 had created. One of these movements that was happening in the background was the idea that one day, remote work will be the norm because it makes so much sense at so many levels.

In the US, vaccinations have been working well, and fortunately, we are seeing a deep in the number of cases. This is great news for all of us, and hopefully, we will see the same trend soon all over the world. While remote was hard at the beginning for many of us, we had the time to really measure all its advantages. And it is not a surprise to us that we see both companies and employees embracing it.

Even though it is now possible to go back to the office, many companies such as Dropbox have become remote-first companies. At the same time, we see employees quitting instead of coming back to the workplace. Thanks to technology, many of us can now work from anywhere, and who wouldn’t take the opportunity to stop wasting time commuting? It also has allowed people to relocate next to their loved ones, to more sustainable regions, less expensive and/or more welcoming states and cities. In the US, Miami is one of the big winners of this trend.

Not only that but there are a lot of big underrated positives to remote work. For instance, remote work enables disabled people. Remote work enables caregivers.

It also allows people from poor countries to have access to better jobs and, by extension, a better life. Latin American workers are now going to compete with US workers, and the same will happen with African workers competing with European workers. While this might not sound like good news for silicon valley developers, it will allow companies to attract talent on a global scale.

Big VC and big money believe that this is not temporary and that this is actually the beginning of a new era for work. So they have invested hundreds of millions of dollars in remote-work-focused companies. Remote work is here to stay, and companies like Teamflow, Firstbase, Deel, and others are riding that megatrend, solving specific challenges remote work is coming with.

But being remote doesn’t mean workers should be alone at home all week. Actually, one of the many other things that the COVID-19 pandemic highlighted is that we crave for social interaction. We are social animals first; this is what defines us, this is how we survive, how we evolved, and we need to interact with each other. The lack of interaction did bring anxiety and depression to a new high everywhere. People want remote jobs, but with in-person social interactions. They want to move, be able to go as often as they want to an office next to where home is, to recharge their social and energy batteries, to interact, brainstorm, and have fun too.

That is where and why WeWork will make its come back. And it will be a strong comeback. While many are saying the company is dead, we believe that it is actually one of the important pieces of this new remote-work era and economy. It will allow workers to work from wherever they like and have access to a community near them. It will allow companies to offer their employees an office that will be flexible and fully managed by WeWork. And with the flexible office comes the people and the social interaction, 100% on-demand for the employees. It comes with access to a network, in-person friendship… life! Companies which will offer this to their employees will be able to compete to hire the best.

It is in this context that Holberton Schools operator and partner Coderise has partnered with WeWork to offer our students access to WeWork spaces around the world. The alliance will initially benefit students from Colombia and the Netherlands. Eventually, we would like to expand to all other countries where Holberton Schools operate. In Africa, another Holberton partner, Sayna, has also started to work with other co-working spaces in Africa to give access to an office and internet to their students.

This is a big net positive for our students, who will now be able to work from any WeWork location with their peers. Not only that, but they will be directly connected to the current workforce: developers, managers, and companies who are looking for talent will see first-hand what our students are capable of, and we believe that it will help tremendously with placement. This opens the doors to many and deeper opportunities between Holberton School and WeWork.

We believe that remote work is the future. Teams will be more and more remote and distributed. And we believe that remote doesn’t mean alone. People, workers or students, will always need in-person interactions to perform better. Thanks to this partnership, our students will be able to experience this future of work while learning and get used to a flexible and distributed work style, giving them a clear advantage while searching for a job once they graduate.

Last but not least, this partnership will also enable Holberton Schools to open micro-campuses. It will permit us to increase accessibility because many more students will be able to access a campus around the world.

For all the reasons stated above, we are more than excited about this partnership. “Do what you love,” and indeed, we can see that our students are already loving it!


#HolbiesAtWeWork

Holberton doubles down on its Machine Learning program

While developing the tools and programs for education in the 21st century, we constantly challenge ourselves to look into the future of our society as well as the future of the job market. We are now at the beginning of the 4th industrial revolution, which is the convergence of many disruptive technologies. One of them is Machine Learning.

Machine learning has been able to revolutionize so many fields, ranging from medicine to social media, to food, to security. And this trend has accelerated with COVID-19. While yesterday, every company had to become a digital company to survive, tomorrow they will have to become machine learning companies. Already, on LinkedIn alone, you can find more than 60,000 machine learning jobs open in the US. Many are companies that you would expect to find, like Twitter or Tik Tok, but more and more non-traditional tech companies are investing in machine learning and recruiting machine learning engineers: even companies like McDonald’s. According to LinkedIn, machine learning has created one of the biggest employment opportunities of 2021. Machine learning hiring since 2019 is up by 32%.

But while many alternative educational institutions, both online and offline, have done a great job at opening the doors to programming, 99% of machine learning hires have a bachelor’s degree or higher. The field of machine learning is still not accessible to most aspirants, and it suffers from a huge lack of diversity that has already sparked many ethical controversies. The machine-learning community needs more diversity in the field. They are creating our future and we want that future to reflect the values and attitudes of everyone.

Very early on, we introduced a machine learning curriculum to our Holberton School students. Since then, we have constantly improved our curriculum and tools in order to offer the best machine learning education possible. The initial version of our curriculum was led by Alexa Orrico and our community of advisors from Silicon Valley and beyond.

Machine learning is not as easy as learning a new programming language. To learn machine learning to a depth of understanding required by industry, students need to study mathematics, programming, and machine learning itself, which is actually itself a collection of many sub-categories. Learning everything at the same time was a big challenge for our first students, and from that experience and thanks to their feedback, we now have improved both the program and the tools for both partners and students.

Starting a few weeks ago, a new Machine Learning & Mathematics team joined us to build on the great curriculum Alexa started, and better support machine learning students at Holberton.

Today, I’d like to officially welcome the new Machine Learning & Mathematics team: Florian Bolgar, Fares Nadjar, Myriam Azzouz, Oumaima Merhbene. The team will be led by Mohamed Amine Ben Amor.

Oleksandra Fedorova from Google and Greggory Renard from xBrain and The Frontier Development Lab (an applied research accelerator based at NASA ARC and the SETI Institute), will act as working advisors to make sure the curriculum is of the highest quality and meets industry needs.

In addition, and because machine learning is by far the hardest Holberton School specialization, we are now offering more support and more live sessions to our Holberton Schools students. For each project:

  • At mid project, we have a live session of up to 3 hours focusing on theory, Mathematics and resources
  • At the end of the project, PLD will become:
    • Live coding session of up to two hours
    • Live support on Slack for any questions during the rest of the PLD days

Because our partners’ schools spans over many time zones, we have also doubled all these live sessions:

  • One for the EMEA
  • One for the Americas

Finally, we will also organize regular events (fireside chat most of the time) with machine learning professionals from around the world.

Please find a few words about the new ML & Math team:

Mohamed Amine Ben Amor – team ML lead

Amine owns a PhD in Mathematics. He used to be the director of the Mathematics department of the Tunisian campus of the University Paris Dauphine for almost 5 years and is now both assistant master at France National Education as well as campus director of Holberton Tunis. 

Before that, Amine was an entrepreneur for 6 years, using Mathematics to help companies.

Amine has 10 publications in ML and Mathematics under his belt, and 2 more to be released this year.

Amine is a former student from Ecole Polytechnique. One of the best engineering schools in the world. He also has several distinctions.

Florian Bolgar – Senior

Florian owns a PhD in Astrophysics. He has been teaching Mathematics since 2013 and is currently working as ML expert in France. He also is also fluent in Python and Django and has been doing freelance work as a software engineer since early 2019.

Florian owns a “few” degrees from the best French universities:

  • PhD in Astrophysics from University Pierre et Marie Curie
  • Master in Physics from École Normale Supérieure (one of the best university in the world)
  • Master in Astrophysics from University Paris Diderot
  • Aggregation in Mathematics from École Normale Supérieure
  • Bachelor in Physics from École Normale Supérieure

Fares Nadjar – Senior

Fares has a triple competency as actuary (an actuary is a professional using probability and statistics in the fields of finance and insurance), data scientist, and software engineer.

He owns a:

  • Master in Computer Science and Intelligent Systems from University Paris Dauphine
  • Master in Actuary from University Paris Dauphine
  • Bachelor in Mathematic and Computer Science 

Myriam Azzouz – Tutor

Myriam just graduated From Paris Dauphine PSL university where she earned:

  • A Master in Artificial Intelligence, Systems and Data. This master is co-accredited by 

University Paris Dauphine, École Normale Supérieure of Paris and Mines ParisTech

  • A Master in Big Data
  • A Bachelor in Applied Mathematics

Myriam was mentored by Professor Tristan Cazenave. 

Myriam applied her skills in ML as a Machine Learning Engineer Intern in different companies, including Criteo, Equinor, Sagemcom. During her last internship in London (UK) she built an automated pipeline and deployed models to help people and companies extract insights from their data. 

Myriam is also volunteering, helping kids to better learn and love Mathematics.

Oumaima Merhbene – Tutor

Oumaima also graduated From Paris Dauphine PSL university where she earned:

  • A Master in Artificial Intelligence, Systems and Data. This master is co-accredited by 

University Paris Dauphine, École Normale Supérieure of Paris and Mines ParisTech

  • A Master in Big Data
  • A Bachelor in Applied Mathematics

She spent her last 6 months at the LAMSADE research laboratory (Decision Support, Operational Research, Business Intelligence, Decision Theory and Artificial Intelligence) at CNRS / Paris-Dauphine where she was supervised by both Tristan Cazenave and Florian Sikora. During this internship, she used a combination of Monte Carlo Tree Search and Deep Learning Techniques to solve graph optimisation problems. Her publication will be soon released at the Monte Carlo Search workshop. 

She also had a 3-month internship at the Pasteur Institute research laboratory, where she worked on predicting new drug-disease associations using Neural Networks (autoencoder) to identify potential new usages for existing drugs. 

About Tristan Cazenave

Tristan Cazenave is one of the most famous Machine Learning specialists. He wrote 225 papers on Machine Learning. With Bruno Bouzy, Tristan Cazenave invented the best approach to Computer Go. In 2005, He wrote a Monte-Carlo based Phantom Go program competitive with human Go players. An improved version won gold medals at the 2007, 2008 and 2009 computer Olympiads.

About Florian Sikora

Florian Sikora is interested in “Hard Problems”. He is working on Graph theory and Machine Learning solving methods on them. He wrote several softwares and several research papers. He is also Associate Professor at The Paris Dauphine University (1st French university).

Oleksandra Fedorova – advisor

Oleksandra is an ML software engineer at Google, in Sunnyvale (CA) where she works on the Dialogflow project (Natural Language Processing made easy for developers).

Previously Sasha attended school 42 in Silicon Valley where she was also an AI technical coordinator, after which she worked in different startups applying her skill as ML and full-stack software engineer before joining Google.

Gregory Renard – senior advisor

Greg is a star in the world of ML. He is an investor and an advisor at Holberton, Inc, and had talked to students many times in the past. His resume is super impressive:

  • Chief AI officer and head of applied Machine Learning and Deep Learning at xBrain (think Siri for cars) in Menlo Park (CA), since 10+ years
  • He is also on the committee and ML lead at the Frontier Development Lab, co-created by the NASA and the SETI, since 2016
  • Greg is an AI advisor for the French and Belgium governments

Recently, Greg also volunteered with the office of Science and Technology Policy at the White House, where he – with the help of two other Machine Learning specialists – trained an NLP hybrid model in order to help the administration to look for relevant information in the research papers and on the web. Daily, they are issuing a quantitative report and periodically a qualitative report of the publications on the COVID-19 cure. They automated semantic collection and analysis of nearly 5 million VIDOC-19 related items per day (representing more than 1 To raw data text per 3 weeks).

In addition to Holberton’s staff, Coderise, our partner in Colombia also has recruited ML specialists who will be available to help students in Colombia:

  • Mario Morales, Senior Research Scientist and MsC and Associate Faculty in Columbia University
  • Robrecht Jurriaans, CTO at Bomberbot and MSc in Artificial Intelligence from University of Amsterdam 
  • Cristian Garcia, Software Developer (Machine Learning) at Quansight

Machine learning has so much potential, and it’s so important to create more ways for students to learn about it. I am beyond excited to have such an awesome team joining us to continue improving our ML program. In combination with our OS of Education, our Holberton School partners have also been able to offer the ML specialization as a stand-alone to developers around the world, but also, universities are now able to offer our ML projects to their students. With this new tool under their belt, our students will be able to create a better future for everyone. I can’t wait to see what they will create!

Code review: string concatenation in C

Today, we are going to go through the following code.

The very interesting thing about this code, is that when compiled and ran, it seems it is working just fine. But actually it is not working properly and there is a big problem with it 🙂 And by going through it we are going to learn a ton of things.

At Holberton, we have a strict coding style for each programming language we are using. Let’s start by applying our coding style for C.
main() should be written main(void). Is that a huge mistake? No. But this forces us to be more structured, and always explicitly write everything.
– We don’t want to initialize our variables at the same time as we are declaring them (Exceptions can apply for arrays.)
– Also we want to group our variable declarations together.
– No empty lines within the code. Only one empty line between declaration and code.
So the code should look like this:

It’s cleaner, more professional, follows the style of the school (remember that in any company we have to follow the coding style of the company, so it’s important that we get into the habit of following strictly one particular style).

We are using the function printf without including its prototype. So when we will compile, we will get a warning from the compiler, even without additional flags.

In order to know the prototype of a function, we can always look at its man page. In this case, man 3 printf.

The man page gives us the prototype and what header to include. We can use either to indicate to the compiler what is the prototype of the function printf (and make the warning go away). We don’t strictly have to include the header, we can simply include the prototype itself. Like so:

But it’s a good habit to include the header (which includes the prototype).

Now let’s see what is happening in the program and why it is working but not really. We will go step by step through the code and look at what happens in memory. Let’s start with the declarations:

At this point, this is what the virtual memory looks like (we are going to assume we are working on a 64-bit, Linux machine):

Note: In the value line, I do show the letters for each byte of the arrays aa and bb, but what is actually stored in the virtual memory is the ascii code of this letter. (man ascii).
I also added some colors to make sure we can see the limit of each space reserved in the memory by each variable.

– The string literals are copied into the addresses of the arrays. The arrays have been automatically sized (the compiler can do that because it knows the size of the string literals to copy).
a and b are pointers so on a 64-bit machine they take 8 bytes in memory.
– The variable aa, is an array of chars of size 14 bytes (14 chars, so 14 * sizeof(char) = 14 bytes).
– At this point the variables a and b have a value but we do not know what it is. The next two lines of code will initialize them.

After these two lines of code, a points to the first letter of the array aa (so it contains the address of the first letter of the array aa, which is also the address of the array aa) and b points to the first letter of the array bb. This is what the virtual memory looks like:

So far so good. With the next lines of code we are going at the end of the “string” (remember there is no type string in C). This code is correct. So at the end of the while loop, a points to the \0 of the array aa.

At the end of this while loop, the virtual memory looks like this:

The next lines of code are the following:

The above loop copies the content of the array bb (remember, b points to the first char contained in bb) at the end of the array aa (as the variable a, at the beginning of the loop, points to the last char (\0) of the array aa). And that is both what we wanted the code to do, AND the problem 🙂

The content of bb is copied, one char at a time, starting from the memory address 19 (in our example). But, our variable aa ENDS at 19 too. That means that we are writing the content of bb AFTER the variable aa, not inside. After 12 iterations, the virtual memory (in our example) looks like this:

In red, we have written 11 bytes outside of the memory reserved for aa, and will continue to do so via the loop for another 10 bytes. The problem of course, is that we are probably replacing the values of other variables, or writing in a memory address that we do not have write access (and will get a beautiful Segmentation Fault). In this particular case, the program still runs “properly” and without warning (because we are unlucky), and as a result, we don’t realize that we are making a mistake.

In fact, in this example, we are actually “destroying” our array bb. Let’s modify a bit the program in order to check that out:

It seems like we changed bb by concatenating it to aa. But bb is not 1 char “shorter”, it still takes the same size in memory, but its content has changed. It is happening, because in the actual virtual memory of our running process the two arrays are next to each other, like so:

I removed the vars a and b for clarity.

So when we are concatenating bb to aa, we are doing this (concatenated letters in pink):

After this concatenation, bb size doesn’t change, but now the content has changed, and it “seems” it was shifted to the left by 1 char. But that’s because the - of the beginning is now part of aa as the last letter in the reserved memory for aa. Note that bb now has two \0, the one copied, and the initial one.

THE END 🙂 If you would like to learn more about the virtual memory, you can read these articles:

To finish with, I would like to thank the author of this code, because thanks to them we learn a ton of things!

“Experience is simply the name we give our mistakes.” Oscar Wilde

Happy coding!

Live coding sessions ++

Students love them – we listened and are bringing more live coding sessions!

While the COVID19 pandemic has been easing in some Holberton communities, the trend isn’t going in the right direction for most of them. Many of our students have been sheltering and studying from home, for months.

We were able to quickly move Holberton education online, without any interruptions for students, and have been adding features to make Holberton online education better. Among many features and initiatives, one is particularly adored by our student community: the weekly live-coding session.

These sessions were historically done by San Francisco Resident Software Engineer Kristen Loyd and I. For 1 to 4 hours, we would whiteboard and code live 1 or more projects that students had worked on, streaming it to our students in all campuses across the world.

Live coding sessions allow students to witness the thought-process of a senior software engineer solving a problem and how it is then implemented into code. Students can ask questions at any time and clarify concepts they may be unsure of. Students enjoy the learning benefits, but they also get to hang out with fellow students from all across the world – which can be a mood booster – in a time where we are all far from one another.

Given the success of these live coding sessions, we came together as a community and are now very happy to bring this to the next level. Starting next week, we will offer even more live sessions every week, with more formats!  These live coding sessions will be given by software engineers in residence from Holberton campuses around the world.

As an example, here is next week schedule:

  • Monday: printf concepts review with 🇨🇴Holberton Medellín SWE Fredy Mena Andrade (this is a project where students get to code their own version of printf)
  • Tuesday: SQL JOINs with 🇺🇾Holberton Montevideo SWE Javier Valenzani
  • Wednesday: Binary trees with 🇺🇸Holberton San Francisco SWE Kristen Loyd
  • Thursday: Introduction to assembly with 🇨🇴Holberton Bogotá SWE Nicolaz Pérez
  • Friday: Singly-linked lists with Holberton 🇺🇸Tulsa SWE Derek Webb

With at least one software engineer in residence for every Holberton campus, more campuses will mean more live coding sessions! With locations all across the world, it also means that students can learn from different points of view, ways of thinking, with more diversity of background, culture, and experience!

These live coding sessions are extracurricular. They are following our belief of helping one another and learning from each other – live-coding sessions are an embodiment of what our education is: project and peer-based learning.

If you are a software developer and want to do a live coding session with Holberton folks from all across the world, please reach out to me! Keep coding!

Hack the virtual memory: the stack, registers and assembly code

This is the fifth chapter in a series about virtual memory. The goal is to learn some CS basics in a different and more practical way.

If you missed the previous chapters, you should probably start there:

The Stack

As we have seen in chapter 2, the stack resides at the high end of memory and grows downward. But how does it work exactly? How does it translate into assembly code? What are the registers used? In this chapter we will have a closer look at how the stack works, and how the program automatically allocates and de-allocates local variables.

Once we understand this, we will be able to play a bit with it, and hijack the flow of our program. Ready? Let’s start!

Note: We will talk only about the user stack, as opposed to the kernel stack

Prerequisites

In order to fully understand this article, you will need to know:

  • The basics of the C programming language (especially pointers)

Environment

All scripts and programs have been tested on the following system:

  • Ubuntu
    • Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Tools used:
    • gcc
    • gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
    • objdump
    • GNU objdump (GNU Binutils for Ubuntu) 2.2

Everything we cover will be true for this system/environment, but may be different on another system

Automatic allocation

Let’s first look at a very simple program that has one function that uses one variable (0-main.c):

#include <stdio.h>

int main(void)
{
    int a;

    a = 972;
    printf("a = %d\n", a);
    return (0);
}

Let’s compile this program and disassemble it using objdump:

holberton$ gcc 0-main.c
holberton$ objdump -d -j .text -M intel

The assembly code produced for our main function is the following:

000000000040052d <main>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10
  400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc
  40053c:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  40053f:       89 c6                   mov    esi,eax
  400541:       bf e4 05 40 00          mov    edi,0x4005e4
  400546:       b8 00 00 00 00          mov    eax,0x0
  40054b:       e8 c0 fe ff ff          call   400410 <printf@plt>
  400550:       b8 00 00 00 00          mov    eax,0x0
  400555:       c9                      leave  
  400556:       c3                      ret    
  400557:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  40055e:       00 00 

Let’s focus on the first three lines for now:

000000000040052d <main>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10

The first lines of the function main refers to rbp and rsp; these are special purpose registers. rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer, which points to the top of the current stack frame.

Let’s decompose step by step what is happening here. This is the state of the stack when we enter the function main before the first instruction is run:

the stack

  • push rbp instruction pushes the value of the register rbp onto the stack. Because it “pushes” onto the stack, now the value of rsp is the memory address of the new top of the stack. The stack and the registers now look like this:

the stack

  • mov rbp, rsp copies the value of the stack pointer rsp to the base pointer rbp -> rpb and rsp now both point to the top of the stack

the stack

  • sub rsp, 0x10 creates a space to store values of local variables. The space between rbp and rsp is this space. Note that this space is large enough to store our variable of type integer

the stack

We have just created a space in memory – on the stack – for our local variables. This space is called a stack frame. Every function that has local variables will use a stack frame to store those variables.

Using local variables

The fourth line of assembly code of our main function is the following:

  400535:       c7 45 fc cc 03 00 00    mov    DWORD PTR [rbp-0x4],0x3cc

0x3cc is actually the value 972 in hexadecimal. This line corresponds to our C-code line:

a = 972;

mov DWORD PTR [rbp-0x4],0x3cc is setting the memory at address rbp - 4 to 972. [rbp - 4] IS our local variable a. The computer doesn’t actually know the name of the variable we use in our code, it simply refers to memory addresses on the stack.

This is the state of the stack and the registers after this operation:

the stack

leave, Automatic de-allocation

If we look now at the end of the function, we will find this:

  400555:       c9                      leave  

The instruction leave sets rsp to rbp, and then pops the top of the stack into rbp.

the stack

the stack

Because we pushed the previous value of rbp onto the stack when we entered the function, rbp is now set to the previous value of rbp. This is how:

  • The local variables are “de-allocated”, and
  • the stack frame of the previous function is restored before we leave the current function.

The state of the stack and the registers rbp and rsp are restored to the same state as when we entered our main function.

Playing with the stack

When the variables are automatically de-allocated from the stack, they are not completely “destroyed”. Their values are still in memory, and this space will potentially be used by other functions.

This is why it is important to initialize your variables when you write your code, because otherwise, they will take whatever value there is on the stack at the moment when the program is running.

Let’s consider the following C code (1-main.c):

#include <stdio.h>

void func1(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 972;
     c = a + b;
     printf("a = %d, b = %d, c = %d\n", a, b, c);
}

void func2(void)
{
     int a;
     int b;
     int c;

     printf("a = %d, b = %d, c = %d\n", a, b, c);
}

int main(void)
{
    func1();
    func2();
    return (0);
}

As you can see, func2 does not set the values of its local vaiables a, b and c, yet if we compile and run this program it will print…

holberton$ gcc 1-main.c && ./a.out 
a = 98, b = 972, c = 1070
a = 98, b = 972, c = 1070
holberton$ 

… the same variable values of func1! This is because of how the stack works. The two functions declared the same amount of variables, with the same type, in the same order. Their stack frames are exactly the same. When func1 ends, the memory where the values of its local variables reside are not cleared – only rsp is incremented.
As a consequence, when we call func2 its stack frame sits at exactly the same place of the previous func1 stack frame, and the local variables of func2 have the same values of the local variables of func1 when we left func1.

Let’s examine the assembly code to prove it:

holberton$ objdump -d -j .text -M intel
000000000040052d <func1>:
  40052d:       55                      push   rbp
  40052e:       48 89 e5                mov    rbp,rsp
  400531:       48 83 ec 10             sub    rsp,0x10
  400535:       c7 45 f4 62 00 00 00    mov    DWORD PTR [rbp-0xc],0x62
  40053c:       c7 45 f8 cc 03 00 00    mov    DWORD PTR [rbp-0x8],0x3cc
  400543:       8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
  400546:       8b 55 f4                mov    edx,DWORD PTR [rbp-0xc]
  400549:       01 d0                   add    eax,edx
  40054b:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  40054e:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
  400551:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  400554:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  400557:       89 c6                   mov    esi,eax
  400559:       bf 34 06 40 00          mov    edi,0x400634
  40055e:       b8 00 00 00 00          mov    eax,0x0
  400563:       e8 a8 fe ff ff          call   400410 <printf@plt>
  400568:       c9                      leave  
  400569:       c3                      ret    

000000000040056a <func2>:
  40056a:       55                      push   rbp
  40056b:       48 89 e5                mov    rbp,rsp
  40056e:       48 83 ec 10             sub    rsp,0x10
  400572:       8b 4d fc                mov    ecx,DWORD PTR [rbp-0x4]
  400575:       8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  400578:       8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  40057b:       89 c6                   mov    esi,eax
  40057d:       bf 34 06 40 00          mov    edi,0x400634
  400582:       b8 00 00 00 00          mov    eax,0x0
  400587:       e8 84 fe ff ff          call   400410 <printf@plt>
  40058c:       c9                      leave  
  40058d:       c3                      ret  

000000000040058e <main>:
  40058e:       55                      push   rbp
  40058f:       48 89 e5                mov    rbp,rsp
  400592:       e8 96 ff ff ff          call   40052d <func1>
  400597:       e8 ce ff ff ff          call   40056a <func2>
  40059c:       b8 00 00 00 00          mov    eax,0x0
  4005a1:       5d                      pop    rbp
  4005a2:       c3                      ret    
  4005a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  4005aa:       00 00 00 
  4005ad:       0f 1f 00                nop    DWORD PTR [rax]

As you can see, the way the stack frame is formed is always consistent. In our two functions, the size of the stack frame is the same since the local variables are the same.

push   rbp
mov    rbp,rsp
sub    rsp,0x10

And both functions end with the leave statement.

The variables a, b and c are referenced the same way in the two functions:

  • a lies at memory address rbp - 0xc
  • b lies at memory address rbp - 0x8
  • c lies at memory address rbp - 0x4

Note that the order of those variables on the stack is not the same as the order of those variables in our code. The compiler orders them as it wants, so you should never assume the order of your local variables in the stack.

So, this is the state of the stack and the registers rbp and rsp before we leave func1:

the stack

When we leave the function func1, we hit the instruction leave; as previously explained, this is the state of the stack, rbp and rsp right before returning to the function main:

the stack

So when we enter func2, the local variables are set to whatever sits in memory on the stack, and that is why their values are the same as the local variables of the function func1.

the stack

ret

You might have noticed that all our example functions end with the instruction ret. ret pops the return address from stack and jumps there. When functions are called the program uses the instruction call to push the return address before it jumps to the first instruction of the function called.
This is how the program is able to call a function and then return from said function the calling function to execute its next instruction.

So this means that there are more than just variables on the stack, there are also memory addresses of instructions. Let’s revisit our 1-main.c code.

When the main function calls func1,

  400592:       e8 96 ff ff ff          call   40052d <func1>

it pushes the memory address of the next instruction onto the stack, and then jumps to func1.
As a consequence, before executing any instructions in func1, the top of the stack contains this address, so rsp points to this value.

the stack

After the stack frame of func1 is formed, the stack looks like this:

the stack

Wrapping everything up

Given what we just learned, we can directly use rbp to directly access all our local variables (without using the C variables!), as well as the saved rbp value on the stack and the return address values of our functions.

To do so in C, we can use:

    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

Here is the listing of the program 2-main.c:

#include <stdio.h>

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}

Getting the values of the variables

the stack

From our previous discoveries, we know that our variables are referenced via rbp – 0xX:

  • a is at rbp - 0xc
  • b is at rbp - 0x8
  • c is at rbp - 0x4

So in order to get the values of those variables, we need to dereference rbp. For the variable a:

  • cast our variable rbp to a char *: (char *)rbp
  • subtract the correct amount of bytes to get the address of where the variable is in memory: (char *)rbp) - 0xc
  • cast it again to a pointer pointing to an int since a is of type int: (int *)(((char *)rbp) - 0xc)
  • and dereference it to get the value sitting at this address: *(int *)(((char *)rbp) - 0xc)

The saved rbp value

the stack

Looking at the above diagram, the current rbp directly points to the saved rbp, so we simply have to cast our variable rbp to a pointer to an unsigned long int and dereference it: *(unsigned long int *)rbp.

The return address value

the stack

The return address value is right before the saved previous rbp on the stack. rbp is 8 bytes long, so we simply need to add 8 to the current value of rbp to get the address where this return value is on the stack. This is how we do it:

  • cast our variable rbp to a char *: (char *)rbp
  • add 8 to this value: ((char *)rbp + 8)
  • cast it to point to an unsigned long int: (unsigned long int *)((char *)rbp + 8)
  • dereference it to get the value at this address: *(unsigned long int *)((char *)rbp + 8)

The output of our program

holberton$ gcc 2-main.c && ./a.out 
main, rpb = 7ffc78e71b70
main, rsp = 7ffc78e71b70
a = 98, b = 972, c = 1070
func1, rpb = 7ffc78e71b60
func1, rsp = 7ffc78e71b50
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7ffc78e71b70
func1, return address value = 400697
func2, a = 98, b = 972, c = 1070
func2, rpb = 7ffc78e71b60
func2, rsp = 7ffc78e71b50
holberton$

We can see that:

  • from func1 we can access all our variables correctly via rbp
  • from func1 we can get the rbp of the function main
  • we confirm that func1 and func2 do have the same rbp and rsp values
  • the difference between rsp and rbp is 0x10, as seen in the assembly code (sub rsp,0x10)
  • in the main function, rsp == rbp because there are no local variables

The return address from func1 is 0x400697. Let’s double check this assumption by disassembling the program. If we are correct, this should be the address of the instruction right after the call of func1 in the main function.

holberton$ objdump -d -j .text -M intel | less
0000000000400664 <main>:
  400664:       55                      push   rbp
  400665:       48 89 e5                mov    rbp,rsp
  400668:       48 89 e8                mov    rax,rbp
  40066b:       48 89 c6                mov    rsi,rax
  40066e:       bf 3b 08 40 00          mov    edi,0x40083b
  400673:       b8 00 00 00 00          mov    eax,0x0
  400678:       e8 93 fd ff ff          call   400410 <printf@plt>
  40067d:       48 89 e0                mov    rax,rsp
  400680:       48 89 c6                mov    rsi,rax
  400683:       bf 4c 08 40 00          mov    edi,0x40084c
  400688:       b8 00 00 00 00          mov    eax,0x0
  40068d:       e8 7e fd ff ff          call   400410 <printf@plt>
  400692:       e8 96 fe ff ff          call   40052d <func1>
  400697:       e8 7a ff ff ff          call   400616 <func2>
  40069c:       b8 00 00 00 00          mov    eax,0x0
  4006a1:       5d                      pop    rbp
  4006a2:       c3                      ret    
  4006a3:       66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  4006aa:       00 00 00 
  4006ad:       0f 1f 00                nop    DWORD PTR [rax]

And yes! \o/

Hack the stack!

Now that we know where to find the return address on the stack, what if we were to modify this value? Could we alter the flow of a program and make func1 return to somewhere else? Let’s add a new function, called bye to our program (3-main.c):

#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}

Let’s see at which address the code of this function starts:

holberton$ gcc 3-main.c && objdump -d -j .text -M intel | less
00000000004005bd <bye>:
  4005bd:       55                      push   rbp
  4005be:       48 89 e5                mov    rbp,rsp
  4005c1:       bf d8 07 40 00          mov    edi,0x4007d8
  4005c6:       e8 b5 fe ff ff          call   400480 <puts@plt>
  4005cb:       bf 62 00 00 00          mov    edi,0x62
  4005d0:       e8 eb fe ff ff          call   4004c0 <exit@plt>

Now let’s replace the return address on the stack from the func1 function with the address of the beginning of the function bye, 4005bd (4-main.c):

#include <stdio.h>
#include <stdlib.h>

void bye(void)
{
    printf("[x] I am in the function bye!\n");
    exit(98);
}

void func1(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    a = 98;
    b = 972;
    c = a + b;
    printf("a = %d, b = %d, c = %d\n", a, b, c);
    printf("func1, rpb = %lx\n", rbp);
    printf("func1, rsp = %lx\n", rsp);
    printf("func1, a = %d\n", *(int *)(((char *)rbp) - 0xc) );
    printf("func1, b = %d\n", *(int *)(((char *)rbp) - 0x8) );
    printf("func1, c = %d\n", *(int *)(((char *)rbp) - 0x4) );
    printf("func1, previous rbp value = %lx\n", *(unsigned long int *)rbp );
    printf("func1, return address value = %lx\n", *(unsigned long int *)((char *)rbp + 8) );
    /* hack the stack! */
    *(unsigned long int *)((char *)rbp + 8) = 0x4005bd;
}

void func2(void)
{
    int a;
    int b;
    int c;
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("func2, a = %d, b = %d, c = %d\n", a, b, c);
    printf("func2, rpb = %lx\n", rbp);
    printf("func2, rsp = %lx\n", rsp);
}

int main(void)
{
    register long rsp asm ("rsp");
    register long rbp asm ("rbp");

    printf("main, rpb = %lx\n", rbp);
    printf("main, rsp = %lx\n", rsp);
    func1();
    func2();
    return (0);
}
holberton$ gcc 4-main.c && ./a.out
main, rpb = 7fff62ef1b60
main, rsp = 7fff62ef1b60
a = 98, b = 972, c = 1070
func1, rpb = 7fff62ef1b50
func1, rsp = 7fff62ef1b40
func1, a = 98
func1, b = 972
func1, c = 1070
func1, previous rbp value = 7fff62ef1b60
func1, return address value = 40074d
[x] I am in the function bye!
holberton$ echo $?
98
holberton$ 

We have called the function bye, without calling it! ?

Outro

I hope that you enjoyed this and learned a couple of things about the stack. As usual, this will be continued! Let me know if you have anything you would like me to cover in the next chapter.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (X-main.c files) for programs created in this tutorial.

Read more about the virtual memory

Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the fifth chapter in our series on the virtual memory. If you missed the previous ones, here are the links to them:

Many thanks to Naomi for proof-reading! ?

Hack the Virtual Memory: malloc, the heap & the program break

Hack the VM!

This is the fourth chapter in a series around virtual memory. The goal is to learn some CS basics, but in a different and more practical way.

If you missed the previous chapters, you should probably start there:

The heap

In this chapter we will look at the heap and malloc in order to answer some of the questions we ended with at the end of the previous chapter:

  • Why doesn’t our allocated memory start at the very beginning of the heap (0x2050010 vs 02050000)? What are those first 16 bytes used for?
  • Is the heap actually growing upwards?

Prerequisites

In order to fully understand this article, you will need to know:

Environment

All scripts and programs have been tested on the following system:

  • Ubuntu
    • Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Tools used:

  • gcc
    • gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  • glibc 2.19 (see version.c if you need to check your glibc version)
  • strace
    • strace — version 4.8

Everything we will write will be true for this system/environment, but may be different on another system

We will also go through the Linux source code. If you are on Ubuntu, you can download the sources of your current kernel by running this command:

apt-get source linux-image-$(uname -r)

malloc

malloc is the common function used to dynamically allocate memory. This memory is allocated on the “heap”.
Note: malloc is not a system call.

From man malloc:

[...] allocate dynamic memory[...]
void *malloc(size_t size);
[...]
The malloc() function allocates size bytes and returns a pointer to the allocated memory.

No malloc, no [heap]

Let’s look at memory regions of a process that does not call malloc (0-main.c).

#include <stdlib.h>
#include <stdio.h>

/**
 * main - do nothing
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    getchar();
    return (EXIT_SUCCESS);
}

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 0-main.c -o 0
julien@holberton:~/holberton/w/hackthevm3$ ./0 

Quick reminder (1/3): the memory regions of a process are listed in the /proc/[pid]/maps file. As a result, we first need to know the PID of the process. That is done using the ps command; the second column of ps aux output will give us the PID of the process. Please read chapter 0 to learn more.

julien@holberton:/tmp$ ps aux | grep \ \./0$
julien     3638  0.0  0.0   4200   648 pts/9    S+   12:01   0:00 ./0

Quick reminder (2/3): from the above output, we can see that the PID of the process we want to look at is 3638. As a result, the maps file will be found in the directory /proc/3638.

julien@holberton:/tmp$ cd /proc/3638

Quick reminder (3/3): The maps file contains the memory regions of the process. The format of each line in this file is:
address perms offset dev inode pathname

julien@holberton:/proc/3638$ cat maps
00400000-00401000 r-xp 00000000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00600000-00601000 r--p 00000000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
00601000-00602000 rw-p 00001000 08:01 174583                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/0
7f38f87d7000-7f38f8991000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8991000-7f38f8b91000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b91000-7f38f8b95000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b95000-7f38f8b97000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f38f8b97000-7f38f8b9c000 rw-p 00000000 00:00 0 
7f38f8b9c000-7f38f8bbf000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8da3000-7f38f8da6000 rw-p 00000000 00:00 0 
7f38f8dbb000-7f38f8dbe000 rw-p 00000000 00:00 0 
7f38f8dbe000-7f38f8dbf000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dbf000-7f38f8dc0000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f38f8dc0000-7f38f8dc1000 rw-p 00000000 00:00 0 
7ffdd85c5000-7ffdd85e6000 rw-p 00000000 00:00 0                          [stack]
7ffdd85f2000-7ffdd85f4000 r--p 00000000 00:00 0                          [vvar]
7ffdd85f4000-7ffdd85f6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:/proc/3638$ 

Note: hackthevm3 is a symbolic link to hack_the_virtual_memory/03. The Heap/

-> As we can see from the above maps file, there’s no [heap] region allocated.

malloc(x)

Let’s do the same but with a program that calls malloc (1-main.c):

#include <stdio.h>
#include <stdlib.h>

/**
 * main - 1 call to malloc
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    malloc(1);
    getchar();
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 1-main.c -o 1
julien@holberton:~/holberton/w/hackthevm3$ ./1 

julien@holberton:/proc/3638$ ps aux | grep \ \./1$
julien     3718  0.0  0.0   4332   660 pts/9    S+   12:09   0:00 ./1
julien@holberton:/proc/3638$ cd /proc/3718
julien@holberton:/proc/3718$ cat maps 
00400000-00401000 r-xp 00000000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00600000-00601000 r--p 00000000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
00601000-00602000 rw-p 00001000 08:01 176964                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/1
01195000-011b6000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/3718$ 

-> the [heap] is here.

Let’s check the return value of malloc to make sure the returned address is in the heap region (2-main.c):

#include <stdio.h>
#include <stdlib.h>

/**
 * main - prints the malloc returned address
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    p = malloc(1);
    printf("%p\n", p);
    getchar();
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 2-main.c -o 2
julien@holberton:~/holberton/w/hackthevm3$ ./2 
0x24d6010

julien@holberton:/proc/3718$ ps aux | grep \ \./2$
julien     3834  0.0  0.0   4336   676 pts/9    S+   12:48   0:00 ./2
julien@holberton:/proc/3718$ cd /proc/3834
julien@holberton:/proc/3834$ cat maps
00400000-00401000 r-xp 00000000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00600000-00601000 r--p 00000000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
00601000-00602000 rw-p 00001000 08:01 176966                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/2
024d6000-024f7000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/3834$ 

-> 024d6000 <0x24d6010 < 024f7000

The returned address is inside the heap region. And as we have seen in the previous chapter, the returned address does not start exactly at the beginning of the region; we’ll see why later.

strace, brk and sbrk

malloc is a “regular” function (as opposed to a system call), so it must call some kind of syscall in order to manipulate the heap. Let’s use strace to find out.

strace is a program used to trace system calls and signals. Any program will always use a few syscalls before your main function is executed. In order to know which syscalls are used by malloc, we will add a write syscall before and after the call to malloc(3-main.c).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - let's find out which syscall malloc is using
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    write(1, "BEFORE MALLOC\n", 14);
    p = malloc(1);
    write(1, "AFTER MALLOC\n", 13);
    printf("%p\n", p);
    getchar();
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 3-main.c -o 3
julien@holberton:~/holberton/w/hackthevm3$ strace ./3 
execve("./3", ["./3"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC\n", 14BEFORE MALLOC
)         = 14
brk(0)                                  = 0xe70000
brk(0xe91000)                           = 0xe91000
write(1, "AFTER MALLOC\n", 13AFTER MALLOC
)          = 13
...
read(0, 

From the above listing we can focus on this:

brk(0)                                  = 0xe70000
brk(0xe91000)                           = 0xe91000

-> malloc is using the brk system call in order to manipulate the heap. From brk man page (man brk), we can see what this system call is doing:

...
       int brk(void *addr);
       void *sbrk(intptr_t increment);
...
DESCRIPTION
       brk() and sbrk() change the location of the program  break,  which  defines
       the end of the process's data segment (i.e., the program break is the first
       location after the end of the uninitialized data segment).  Increasing  the
       program  break has the effect of allocating memory to the process; decreas‐
       ing the break deallocates memory.

       brk() sets the end of the data segment to the value specified by addr, when
       that  value  is  reasonable,  the system has enough memory, and the process
       does not exceed its maximum data size (see setrlimit(2)).

       sbrk() increments the program's data space  by  increment  bytes.   Calling
       sbrk()  with  an increment of 0 can be used to find the current location of
       the program break.

The program break is the address of the first location beyond the current end of the data region of the program in the virual memory.

program break before the call to malloc / brk

By increasing the value of the program break, via brk or sbrk, the function malloc creates a new space that can then be used by the process to dynamically allocate memory (using malloc).

program break after the malloc / brk call

So the heap is actually an extension of the data segment of the program.

The first call to brk (brk(0)) returns the current address of the program break to malloc. And the second call is the one that actually creates new memory (since 0xe91000 > 0xe70000) by increasing the value of the program break. In the above example, the heap is now starting at 0xe70000 and ends at 0xe91000. Let’s double check with the /proc/[PID]/maps file:

julien@holberton:/proc/3855$ ps aux | grep \ \./3$
julien     4011  0.0  0.0   4748   708 pts/9    S+   13:04   0:00 strace ./3
julien     4014  0.0  0.0   4336   644 pts/9    S+   13:04   0:00 ./3
julien@holberton:/proc/3855$ cd /proc/4014
julien@holberton:/proc/4014$ cat maps 
00400000-00401000 r-xp 00000000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00600000-00601000 r--p 00000000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00601000-00602000 rw-p 00001000 08:01 176967                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/3
00e70000-00e91000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/4014$ 

-> 00e70000-00e91000 rw-p 00000000 00:00 0 [heap] matches the pointers returned back to malloc by brk.

That’s great, but wait, why didmalloc increment the heap by 00e9100000e70000 = 0x21000 or 135168 bytes, when we only asked for only 1 byte?

Many mallocs

What will happen if we call malloc several times? (4-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - many calls to malloc
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;

    write(1, "BEFORE MALLOC #0\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #0\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #1\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #1\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #2\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #2\n", 16);
    printf("%p\n", p);

    write(1, "BEFORE MALLOC #3\n", 17);
    p = malloc(1024);
    write(1, "AFTER MALLOC #3\n", 16);
    printf("%p\n", p);

    getchar();
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 4-main.c -o 4
julien@holberton:~/holberton/w/hackthevm3$ strace ./4 
execve("./4", ["./4"], [/* 61 vars */]) = 0
...
write(1, "BEFORE MALLOC #0\n", 17BEFORE MALLOC #0
)      = 17
brk(0)                                  = 0x1314000
brk(0x1335000)                          = 0x1335000
write(1, "AFTER MALLOC #0\n", 16AFTER MALLOC #0
)       = 16
...
write(1, "0x1314010\n", 100x1314010
)             = 10
write(1, "BEFORE MALLOC #1\n", 17BEFORE MALLOC #1
)      = 17
write(1, "AFTER MALLOC #1\n", 16AFTER MALLOC #1
)       = 16
write(1, "0x1314420\n", 100x1314420
)             = 10
write(1, "BEFORE MALLOC #2\n", 17BEFORE MALLOC #2
)      = 17
write(1, "AFTER MALLOC #2\n", 16AFTER MALLOC #2
)       = 16
write(1, "0x1314830\n", 100x1314830
)             = 10
write(1, "BEFORE MALLOC #3\n", 17BEFORE MALLOC #3
)      = 17
write(1, "AFTER MALLOC #3\n", 16AFTER MALLOC #3
)       = 16
write(1, "0x1314c40\n", 100x1314c40
)             = 10
...
read(0, 

-> malloc is NOT calling brk each time we call it.

The first time, malloc creates a new space (the heap) for the program (by increasing the program break location). The following times, malloc uses the same space to give our program “new” chunks of memory. Those “new” chunks of memory are part of the memory previously allocated using brk. This way, malloc doesn’t have to use syscalls (brk) every time we call it, and thus it makes malloc – and our programs using malloc – faster. It also allows malloc and free to optimize the usage of the memory.

Let’s double check that we have only one heap, allocated by the first call to brk:

julien@holberton:/proc/4014$ ps aux | grep \ \./4$
julien     4169  0.0  0.0   4748   688 pts/9    S+   13:33   0:00 strace ./4
julien     4172  0.0  0.0   4336   656 pts/9    S+   13:33   0:00 ./4
julien@holberton:/proc/4014$ cd /proc/4172
julien@holberton:/proc/4172$ cat maps
00400000-00401000 r-xp 00000000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00600000-00601000 r--p 00000000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
00601000-00602000 rw-p 00001000 08:01 176973                             /home/julien/holberton/w/hack_the_virtual_memory/03. The Heap/4
01314000-01335000 rw-p 00000000 00:00 0                                  [heap]
7f4a3f2c4000-7f4a3f47e000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f47e000-7f4a3f67e000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f67e000-7f4a3f682000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f682000-7f4a3f684000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f4a3f684000-7f4a3f689000 rw-p 00000000 00:00 0 
7f4a3f689000-7f4a3f6ac000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f890000-7f4a3f893000 rw-p 00000000 00:00 0 
7f4a3f8a7000-7f4a3f8ab000 rw-p 00000000 00:00 0 
7f4a3f8ab000-7f4a3f8ac000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ac000-7f4a3f8ad000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f4a3f8ad000-7f4a3f8ae000 rw-p 00000000 00:00 0 
7ffd1ba73000-7ffd1ba94000 rw-p 00000000 00:00 0                          [stack]
7ffd1bbed000-7ffd1bbef000 r--p 00000000 00:00 0                          [vvar]
7ffd1bbef000-7ffd1bbf1000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:/proc/4172$ 

-> We have only one [heap] and the addresses match those returned by sbrk: 0x1314000 & 0x1335000

Naive malloc

Based on the above, and assuming we won’t ever need to free anything, we can now write our own (naive) version of malloc, that would move the program break each time it is called.

#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * malloc - naive version of malloc: dynamically allocates memory on the heap using sbrk                         
 * @size: number of bytes to allocate                                                          
 *                                                                                             
 * Return: the memory address newly allocated, or NULL on error                                
 *                                                                                             
 * Note: don't do this at home :)                                                              
 */
void *malloc(size_t size)
{
    void *previous_break;

    previous_break = sbrk(size);
    /* check for error */
    if (previous_break == (void *) -1)
    {
        /* on error malloc returns NULL */
        return (NULL);
    }
    return (previous_break);
}

The 0x10 lost bytes

If we look at the output of the previous program (4-main.c), we can see that the first memory address returned by malloc doesn’t start at the beginning of the heap, but 0x10 bytes after: 0x1314010 vs 0x1314000. Also, when we call malloc(1024) a second time, the address should be 0x1314010 (the returned value of the first call to malloc) + 1024 (or 0x400 in hexadecimal, since the first call to malloc was asking for 1024 bytes) = 0x1318010. But the return value of the second call to malloc is 0x1314420. We have lost 0x10 bytes again! Same goes for the subsequent calls.

Let’s look at what we can find inside those “lost” 0x10-byte memory spaces (5-main.c) and whether the memory loss stays constant:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - the 0x10 lost bytes
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        printf("%p\n", p);
        printf("bytes at %p:\n", (void *)((char *)p - 0x10));
        pmem((char *)p - 0x10, 0x10);
    }
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 5-main.c -o 5
julien@holberton:~/holberton/w/hackthevm3$ ./5
0x1fa8010
bytes at 0x1fa8000:
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x1fa8420
bytes at 0x1fa8410:
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x1fa8c30
bytes at 0x1fa8c20:
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x1fa9840
bytes at 0x1fa9830:
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x1faa850
bytes at 0x1faa840:
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x1fabc60
bytes at 0x1fabc50:
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x1fad470
bytes at 0x1fad460:
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x1faf080
bytes at 0x1faf070:
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x1fb1090
bytes at 0x1fb1080:
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x1fb34a0
bytes at 0x1fb3490:
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
julien@holberton:~/holberton/w/hackthevm3$ 

There is one clear pattern: the size of the malloc’ed memory chunk is always found in the preceding 0x10 bytes. For instance, the first malloc call is malloc’ing 1024 (0x0400) bytes and we can find 11 04 00 00 00 00 00 00 in the preceding 0x10 bytes. Those last bytes represent the number 0x 00 00 00 00 00 00 04 11 = 0x400 (1024) + 0x10 (the block size preceding those 1024 bytes + 1 (we’ll talk about this “+1” later in this chapter). If we look at each 0x10 bytes preceding the addresses returned by malloc, they all contain the size of the chunk of memory asked to malloc + 0x10 + 1.

At this point, given what we said and saw earlier, we can probably guess that those 0x10 bytes are a sort of data structure used by malloc (and free) to deal with the heap. And indeed, even though we don’t understand everything yet, we can already use this data structure to go from one malloc’ed chunk of memory to the other (6-main.c) as long as we have the address of the beginning of the heap (and as long as we have never called free):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - using the 0x10 bytes to jump to next malloc'ed chunks
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    void *heap_start;
    size_t size_of_the_block;

    heap_start = sbrk(0);
    write(1, "START\n", 6);
    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1)); 
        *((int *)p) = i;
        printf("%p: [%i]\n", p, i);
    }
    p = heap_start;
    for (i = 0; i < 10; i++)
    {
        pmem(p, 0x10);
        size_of_the_block = *((size_t *)((char *)p + 8)) - 1;
        printf("%p: [%i] - size = %lu\n",
              (void *)((char *)p + 0x10),
              *((int *)((char *)p + 0x10)),
              size_of_the_block);
        p = (void *)((char *)p + size_of_the_block);
    }
    write(1, "END\n", 4);
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 6-main.c -o 6
julien@holberton:~/holberton/w/hackthevm3$ ./6 
START
0x9e6010: [0]
0x9e6420: [1]
0x9e6c30: [2]
0x9e7840: [3]
0x9e8850: [4]
0x9e9c60: [5]
0x9eb470: [6]
0x9ed080: [7]
0x9ef090: [8]
0x9f14a0: [9]
00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
0x9e6010: [0] - size = 1040
00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
0x9e6420: [1] - size = 2064
00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
0x9e6c30: [2] - size = 3088
00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
0x9e7840: [3] - size = 4112
00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00
0x9e8850: [4] - size = 5136
00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
0x9e9c60: [5] - size = 6160
00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
0x9eb470: [6] - size = 7184
00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
0x9ed080: [7] - size = 8208
00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00
0x9ef090: [8] - size = 9232
00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
0x9f14a0: [9] - size = 10256
END
julien@holberton:~/holberton/w/hackthevm3$ 

One of our open questions from the previous chapter is now answered: malloc is using 0x10 additional bytes for each malloc’ed memory block to store the size of the block.

0x10 bytes preceeding malloc

This data will actually be used by free to save it to a list of available blocks for future calls to malloc.

But our study also raises a new question: what are the first 8 bytes of the 16 (0x10 in hexadecimal) bytes used for? It seems to always be zero. Is it just padding?

RTFSC

At this stage, we probably want to check the source code of malloc to confirm what we just found (malloc.c from the glibc).

1055 /*
1056       malloc_chunk details:
1057    
1058        (The following includes lightly edited explanations by Colin Plumb.)
1059    
1060        Chunks of memory are maintained using a `boundary tag' method as
1061        described in e.g., Knuth or Standish.  (See the paper by Paul
1062        Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
1063        survey of such techniques.)  Sizes of free chunks are stored both
1064        in the front of each chunk and at the end.  This makes
1065        consolidating fragmented chunks into bigger chunks very fast.  The
1066        size fields also hold bits representing whether chunks are free or
1067        in use.
1068    
1069        An allocated chunk looks like this:
1070    
1071    
1072        chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1073                |             Size of previous chunk, if unallocated (P clear)  |
1074                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075                |             Size of chunk, in bytes                     |A|M|P|
1076          mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077                |             User data starts here...                          .
1078                .                                                               .
1079                .             (malloc_usable_size() bytes)                      .
1080                .                                                               |
1081    nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1082                |             (size of chunk, but used for application data)    |
1083                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084                |             Size of next chunk, in bytes                |A|0|1|
1085                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086    
1087        Where "chunk" is the front of the chunk for the purpose of most of
1088        the malloc code, but "mem" is the pointer that is returned to the
1089        user.  "Nextchunk" is the beginning of the next contiguous chunk.

-> We were correct \o/. Right before the address returned by malloc to the user, we have two variables:

  • Size of previous chunk, if unallocated: we never free’d any chunks so that is why it was always 0
  • Size of chunk, in bytes

Let’s free some chunks to confirm that the first 8 bytes are used the way the source code describes it (7-main.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - confirm the source code
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    size_t size_of_the_chunk;
    size_t size_of_the_previous_chunk;
    void *chunks[10];

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        chunks[i] = (void *)((char *)p - 0x10);
        printf("%p\n", p);
    }
    free((char *)(chunks[3]) + 0x10);
    free((char *)(chunks[7]) + 0x10);
    for (i = 0; i < 10; i++)
    {
        p = chunks[i];
        printf("chunks[%d]: ", i);
        pmem(p, 0x10);
        size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1;
        size_of_the_previous_chunk = *((size_t *)((char *)p));
        printf("chunks[%d]: %p, size = %li, prev = %li\n",
              i, p, size_of_the_chunk, size_of_the_previous_chunk);
    }
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 7-main.c -o 7
julien@holberton:~/holberton/w/hackthevm3$ ./7
0x1536010
0x1536420
0x1536c30
0x1537840
0x1538850
0x1539c60
0x153b470
0x153d080
0x153f090
0x15414a0
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1536000, size = 1040, prev = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1536410, size = 2064, prev = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1536c20, size = 3088, prev = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1537830, size = 4112, prev = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1538840, size = 5135, prev = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1539c50, size = 6160, prev = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x153b460, size = 7184, prev = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x153d070, size = 8208, prev = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x153f080, size = 9231, prev = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x1541490, size = 10256, prev = 0
julien@holberton:~/holberton/w/hackthevm3$ 

As we can see from the above listing, when the previous chunk has been free’d, the malloc chunk’s first 8 bytes contain the size of the previous unallocated chunk. So the correct representation of a malloc chunk is the following:

malloc chunk

Also, it seems that the first bit of the next 8 bytes (containing the size of the current chunk) serves as a flag to check if the previous chunk is used (1) or not (0). So the correct updated version of our program should be written this way (8-main.c):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - updating with correct checks
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    int i;
    size_t size_of_the_chunk;
    size_t size_of_the_previous_chunk;
    void *chunks[10];
    char prev_used;

    for (i = 0; i < 10; i++)
    {
        p = malloc(1024 * (i + 1));
        chunks[i] = (void *)((char *)p - 0x10);
    }
    free((char *)(chunks[3]) + 0x10);
    free((char *)(chunks[7]) + 0x10);
    for (i = 0; i < 10; i++)
    {
        p = chunks[i];
        printf("chunks[%d]: ", i);
        pmem(p, 0x10);
        size_of_the_chunk = *((size_t *)((char *)p + 8));
        prev_used = size_of_the_chunk & 1;
        size_of_the_chunk -= prev_used;
        size_of_the_previous_chunk = *((size_t *)((char *)p));
        printf("chunks[%d]: %p, size = %li, prev (%s) = %li\n",
              i, p, size_of_the_chunk,
              (prev_used? "allocated": "unallocated"), size_of_the_previous_chunk);
    }
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 8-main.c -o 8
julien@holberton:~/holberton/w/hackthevm3$ ./8 
chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
chunks[0]: 0x1031000, size = 1040, prev (allocated) = 0
chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
chunks[1]: 0x1031410, size = 2064, prev (allocated) = 0
chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
chunks[2]: 0x1031c20, size = 3088, prev (allocated) = 0
chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
chunks[3]: 0x1032830, size = 4112, prev (allocated) = 0
chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
chunks[4]: 0x1033840, size = 5136, prev (unallocated) = 4112
chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
chunks[5]: 0x1034c50, size = 6160, prev (allocated) = 0
chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
chunks[6]: 0x1036460, size = 7184, prev (allocated) = 0
chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
chunks[7]: 0x1038070, size = 8208, prev (allocated) = 0
chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
chunks[8]: 0x103a080, size = 9232, prev (unallocated) = 8208
chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
chunks[9]: 0x103c490, size = 10256, prev (allocated) = 0
julien@holberton:~/holberton/w/hackthevm3$ 

Is the heap actually growing upwards?

The last question left unanswered is: “Is the heap actually growing upwards?”. From the brk man page, it seems so:

DESCRIPTION
       brk() and sbrk() change the location of the program break, which defines the end  of  the
       process's  data  segment  (i.e., the program break is the first location after the end of
       the uninitialized data segment).  Increasing the program break has the effect of allocat‐
       ing memory to the process; decreasing the break deallocates memory.

Let’s check! (9-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**
 * main - moving the program break
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int i;

    write(1, "START\n", 6);
    malloc(1);
    getchar();
    write(1, "LOOP\n", 5);
    for (i = 0; i < 0x25000 / 1024; i++)
    {
        malloc(1024);
    }
    write(1, "END\n", 4);
    getchar();
    return (EXIT_SUCCESS);
}

Now let’s confirm this assumption with strace:

julien@holberton:~/holberton/w/hackthevm3$ strace ./9 
execve("./9", ["./9"], [/* 61 vars */]) = 0
...
write(1, "START\n", 6START
)                  = 6
brk(0)                                  = 0x1fd8000
brk(0x1ff9000)                          = 0x1ff9000
...
write(1, "LOOP\n", 5LOOP
)                   = 5
brk(0x201a000)                          = 0x201a000
write(1, "END\n", 4END
)                    = 4
...
julien@holberton:~/holberton/w/hackthevm3$ 

clearly, malloc made only two calls to brk to increase the allocated space on the heap. And the second call is using a higher memory address argument (0x201a000 > 0x1ff9000). The second syscall was triggered when the space on the heap was too small to host all the malloc calls.

Let’s double check with /proc.

julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 9-main.c -o 9
julien@holberton:~/holberton/w/hackthevm3$ ./9
START

julien@holberton:/proc/7855$ ps aux | grep \ \./9$
julien     7972  0.0  0.0   4332   684 pts/9    S+   19:08   0:00 ./9
julien@holberton:/proc/7855$ cd /proc/7972
julien@holberton:/proc/7972$ cat maps
...
00901000-00922000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$ 

-> 00901000-00922000 rw-p 00000000 00:00 0 [heap]
Let’s hit Enter and look at the [heap] again:

LOOP
END

julien@holberton:/proc/7972$ cat maps
...
00901000-00943000 rw-p 00000000 00:00 0                                  [heap]
...
julien@holberton:/proc/7972$ 

-> 00901000-00943000 rw-p 00000000 00:00 0 [heap]
The beginning of the heap is still the same, but the size has increased upwards from 00922000 to 00943000.

The Address Space Layout Randomisation (ASLR)

You may have noticed something “strange” in the /proc/pid/maps listing above, that we want to study:

The program break is the address of the first location beyond the current end of the data region – so the address of the first location beyond the executable in the virtual memory. As a consequence, the heap should start right after the end of the executable in memory. As you can see in all above listing, it is NOT the case. The only thing that is true is that the heap is always the next memory region after the executable, which makes sense since the heap is actually part of the data segment of the executable itself. Also, if we look even closer, the memory gap size between the executable and the heap is never the same:

Format of the following lines: [PID of the above maps listings]: address of the beginning of the [heap] – address of the end of the executable = memory gap size

  • [3718]: 01195000 – 00602000 = b93000
  • [3834]: 024d6000 – 00602000 = 1ed4000
  • [4014]: 00e70000 – 00602000 = 86e000
  • [4172]: 01314000 – 00602000 = d12000
  • [7972]: 00901000 – 00602000 = 2ff000

It seems that this gap size is random, and indeed, it is. If we look at the ELF binary loader source code (fs/binfmt_elf.c) we can find this:

        if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
                current->mm->brk = current->mm->start_brk =
                        arch_randomize_brk(current->mm);
#ifdef compat_brk_randomized
                current->brk_randomized = 1;
#endif
        }

where current->mm->brk is the address of the program break. The arch_randomize_brk function can be found in the arch/x86/kernel/process.c file:

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
        unsigned long range_end = mm->brk + 0x02000000;
        return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}

The randomize_range returns a start address such that:

    [...... <range> .....]
  start                  end

Source code of the randomize_range function (drivers/char/random.c):

/*
 * randomize_range() returns a start address such that
 *
 *    [...... <range> .....]
 *  start                  end
 *
 * a <range> with size "len" starting at the return value is inside in the
 * area defined by [start, end], but is otherwise randomized.
 */
unsigned long
randomize_range(unsigned long start, unsigned long end, unsigned long len)
{
        unsigned long range = end - len - start;

        if (end <= start + len)
                return 0;
        return PAGE_ALIGN(get_random_int() % range + start);
}

As a result, the offset between the data section of the executable and the program break initial position when the process runs can have a size of anywhere between 0 and 0x02000000. This randomization is known as Address Space Layout Randomisation (ASLR). ASLR is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the positions of the heap and the stack.

The updated VM diagram

With all the above in mind, we can now update our VM diagram:

Virtual memory diagram

malloc(0)

Did you ever wonder what was happening when we call malloc with a size of 0? Let’s check! (10-main.c)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/**                                                                                            
 * pmem - print mem                                                                            
 * @p: memory address to start printing from                                                   
 * @bytes: number of bytes to print                                                            
 *                                                                                             
 * Return: nothing                                                                             
 */
void pmem(void *p, unsigned int bytes)
{
    unsigned char *ptr;
    unsigned int i;

    ptr = (unsigned char *)p;
    for (i = 0; i < bytes; i++)
    {
        if (i != 0)
        {
            printf(" ");
        }
        printf("%02x", *(ptr + i));
    }
    printf("\n");
}

/**
 * main - moving the program break
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    void *p;
    size_t size_of_the_chunk;
    char prev_used;

    p = malloc(0);
    printf("%p\n", p);
    pmem((char *)p - 0x10, 0x10);
    size_of_the_chunk = *((size_t *)((char *)p - 8));
    prev_used = size_of_the_chunk & 1;
    size_of_the_chunk -= prev_used;
    printf("chunk size = %li bytes\n", size_of_the_chunk);
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm3$ gcc -Wall -Wextra -pedantic -Werror 10-main.c -o 10
julien@holberton:~/holberton/w/hackthevm3$ ./10
0xd08010
00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00
chunk size = 32 bytes
julien@holberton:~/holberton/w/hackthevm3$ 

-> malloc(0) is actually using 32 bytes, including the first 0x10 bytes.

Again, note that this will not always be the case. From the man page (man malloc):

NULL may also be returned by a successful call to malloc() with a size of zero

Outro

We have learned a couple of things about malloc and the heap. But there is actually more than brk and sbrk. You can try malloc’ing a big chunk of memory, strace it, and look at /proc to learn more before we cover it in a next chapter ?

Also, studying how free works in coordination with malloc is something we haven’t covered yet. If you want to look at it, you will find part of the answer to why the minimum chunk size is 32 (when we ask malloc for 0 bytes) vs 16 (0x10 in hexadecimal) or 0.

As usual, to be continued! Let me know if you have something you would like me to cover in the next chapter.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (naive_malloc.c, version.c & “X-main.c` files) for programs created in this tutorial.

Read more about the virtual memory

Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters!

Many thanks to Tim, Anne and Ian for proof-reading! ?

Hack the Virtual Memory: drawing the VM diagram

Hack The Virtual Memory, chapter 2: Drawing the VM diagram

We previously talked about what you could find in the virtual memory of a process, and where you could find it. Today, we will try to “reconstruct” (part of) the following diagram by making our process print addresses of various elements of the program.

the virtual memory

Prerequisites

In order to fully understand this article, you will need to know:

  • The basics of the C programming language
  • A little bit of assembly (but not required)
  • The very basics of the Linux filesystem and the shell
  • We will also use the /proc/[pid]/maps file (see man proc or read our first article Hack The Virtual Memory, chapter 0: C strings & /proc)

Environment

All scripts and programs have been tested on the following system:

  • Ubuntu
    • Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    • Everything we will write will be true for this system, but may be different on another system

Tools used:

  • gcc
    • gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  • objdump
    • GNU objdump (GNU Binutils for Ubuntu) 2.24
  • udcli
    • udis86 1.7.2
  • bc
    • bc 1.06.95

The stack

The first thing we want to locate in our diagram is the stack. We know that in C, local variables are located on the stack. So if we print the address of a local variable, it should give us an idea on where we would find the stack in the virtual memory. Let’s use this program (main-1.c) to find out:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;

    printf("Address of a: %p\n", (void *)&a);
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-0.c -o 0
julien@holberton:~/holberton/w/hackthevm2$ ./0
Address of a: 0x7ffd14b8bd9c
julien@holberton:~/holberton/w/hackthevm2$ 

This will be our first point of reference when we will compare other elements’ addresses.

The heap

The heap is used when you malloc space for your variables. Let’s add a line to use malloc and see where the memory address returned by malloc is located (main-1.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -pedantic -Werror main-1.c -o 1
julien@holberton:~/holberton/w/hackthevm2$ ./1 
Address of a: 0x7ffd4204c554
Allocated space in the heap: 0x901010
julien@holberton:~/holberton/w/hackthevm2$ 

It’s now clear that the heap (0x901010) is way below the stack (0x7ffd4204c554). At this point we can already draw this diagram:

heap and stack

The executable

Your program is also in the virtual memory. If we print the address of the main function, we should have an idea of where the program is located compared to the stack and the heap. Let’s see if we find it below the heap as expected (main-2.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-2.c -o 2
julien@holberton:~/holberton/w/hackthevm2$ ./2 
Address of a: 0x7ffdced37d74
Allocated space in the heap: 0x2199010
Address of function main: 0x40060d
julien@holberton:~/holberton/w/hackthevm2$ 

It seems that our program (0x40060d) is located below the heap (0x2199010), just as expected.
But let’s make sure that this is the actual code of our program, and not some sort of pointer to another location. Let’s disassemble our program 2 with objdump to look at the “memory address” of the main function:

julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 2 | grep '<main>:' -A 5
000000000040060d <main>:
  40060d:   55                      push   rbp
  40060e:   48 89 e5                mov    rbp,rsp
  400611:   48 83 ec 10             sub    rsp,0x10
  400615:   48 8d 45 f4             lea    rax,[rbp-0xc]
  400619:   48 89 c6                mov    rsi,rax

000000000040060d <main> -> we find the exact same address (0x40060d). If you still have any doubts, you can print the first bytes located at this address, to make sure they match the output of objdump (main-3.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;
    unsigned int i;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    printf("First bytes of the main function:\n\t");
    for (i = 0; i < 15; i++)
    {
        printf("%02x ", ((unsigned char *)main)[i]);
    }
    printf("\n");
    return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-3.c -o 3
julien@holberton:~/holberton/w/hackthevm2$ objdump -M intel -j .text -d 3 | grep '<main>:' -A 5
000000000040064d <main>:
  40064d:   55                      push   rbp
  40064e:   48 89 e5                mov    rbp,rsp
  400651:   48 83 ec 10             sub    rsp,0x10
  400655:   48 8d 45 f0             lea    rax,[rbp-0x10]
  400659:   48 89 c6                mov    rsi,rax
julien@holberton:~/holberton/w/hackthevm2$ ./3 
Address of a: 0x7ffeff0f13b0
Allocated space in the heap: 0x8b3010
Address of function main: 0x40064d
First bytes of the main function:
    55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6 
julien@holberton:~/holberton/w/hackthevm2$ echo "55 48 89 e5 48 83 ec 10 48 8d 45 f0 48 89 c6" | udcli -64 -x -o 40064d
000000000040064d 55               push rbp                
000000000040064e 4889e5           mov rbp, rsp            
0000000000400651 4883ec10         sub rsp, 0x10           
0000000000400655 488d45f0         lea rax, [rbp-0x10]     
0000000000400659 4889c6           mov rsi, rax            
julien@holberton:~/holberton/w/hackthevm2$

-> We can see that we print the same address and the same content. We are now triple sure this is our main function.

You can download the Udis86 Disassembler Library here.

Here is the updated diagram, based on what we have learned:

stack, heap and executable

Command line arguments and environment variables

The main function can take arguments:

  • The command line arguments
    • the first argument of the main function (usually named argc or ac) is the number of command line arguments
    • the second argument of the main function (usually named argv or av) is an array of pointers to the arguments (C strings)
  • The environment variables
    • the third argument of the main function (usally named env or envp) is an array of pointers to the environment variables (C strings)

Let’s see where those elements stand in the virtual memory of our process (main-4.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(int ac, char **av, char **env)
{
        int a;
        void *p;
        int i;

        printf("Address of a: %p\n", (void *)&a);
        p = malloc(98);
        if (p == NULL)
        {
                fprintf(stderr, "Can't malloc\n");
                return (EXIT_FAILURE);
        }
        printf("Allocated space in the heap: %p\n", p);
        printf("Address of function main: %p\n", (void *)main);
        printf("First bytes of the main function:\n\t");
        for (i = 0; i < 15; i++)
        {
                printf("%02x ", ((unsigned char *)main)[i]);
        }
        printf("\n");
        printf("Address of the array of arguments: %p\n", (void *)av);
        printf("Addresses of the arguments:\n\t");
        for (i = 0; i < ac; i++)
        {
                printf("[%s]:%p ", av[i], av[i]);
        }
        printf("\n");
        printf("Address of the array of environment variables: %p\n", (void *)env);
    printf("Address of the first environment variable: %p\n", (void *)(env[0]));
        return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-4.c -o 4
julien@holberton:~/holberton/w/hackthevm2$ ./4 Hello Holberton School!
Address of a: 0x7ffe7d6d8da0
Allocated space in the heap: 0xc8c010
Address of function main: 0x40069d
First bytes of the main function:
    55 48 89 e5 48 83 ec 30 89 7d ec 48 89 75 e0 
Address of the array of arguments: 0x7ffe7d6d8e98
Addresses of the arguments:
    [./4]:0x7ffe7d6da373 [Hello]:0x7ffe7d6da377 [Holberton]:0x7ffe7d6da37d [School!]:0x7ffe7d6da387 
Address of the array of environment variables: 0x7ffe7d6d8ec0
Address of the first environment variables:
    [0x7ffe7d6da38f]:"XDG_VTNR=7"
    [0x7ffe7d6da39a]:"XDG_SESSION_ID=c2"
    [0x7ffe7d6da3ac]:"CLUTTER_IM_MODULE=xim"
julien@holberton:~/holberton/w/hackthevm2$ 

These elements are above the stack as expected, but now we know the exact order: stack (0x7ffe7d6d8da0) < argv (0x7ffe7d6d8e98) < env (0x7ffe7d6d8ec0) < arguments (from 0x7ffe7d6da373 to 0x7ffe7d6da387 + 8 (8 = size of the string "school" + 1 for the '\0' char)) < environment variables (starting at 0x7ffe7d6da38f).

Actually, we can also see that all the command line arguments are next to each other in the memory, and also right next to the environment variables.

Are the argv and env arrays next to each other?

The array argv is 5 elements long (there were 4 arguments from the command line + 1 NULL element at the end (argv always ends with NULL to mark the end of the array)). Each element is a pointer to a char and since we are on a 64-bit machine, a pointer is 8 bytes (if you want to make sure, you can use the C operator sizeof() to get the size of a pointer). As a result our argv array is of size 5 * 8 = 40. 40 in base 10 is 0x28 in base 16. If we add this value to the address of the beginning of the array 0x7ffe7d6d8e98, we get… 0x7ffe7d6d8ec0 (The address of the env array)! So the two arrays are next to each other in memory.

Is the first command line argument stored right after the env array?

In order to check this we need to know the size of the env array. We know that it ends with a NULL pointer, so in order to get the number of its elements we simply need to loop through it, checking if the “current” element is NULL. Here’s the updated C code (main-5.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ ./5 Hello Betty Holberton!
Address of a: 0x7ffc77598acc
Allocated space in the heap: 0x2216010
Address of function main: 0x40069d
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7ffc77598bc8
Addresses of the arguments:
    [./5]:0x7ffc7759a374 [Hello]:0x7ffc7759a378 [Betty]:0x7ffc7759a37e [Holberton!]:0x7ffc7759a384 
Address of the array of environment variables: 0x7ffc77598bf0
Address of the first environment variables:
    [0x7ffc7759a38f]:"XDG_VTNR=7"
    [0x7ffc7759a39a]:"XDG_SESSION_ID=c2"
    [0x7ffc7759a3ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
julien@holberton:~/holberton/w/hackthevm2$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=16
ibase=16
1F0+7FFC77598BF0
7FFC77598DE0
quit
julien@holberton:~/holberton/w/hackthevm2$ 

-> 7FFC77598DE0 != (but still <) 0x7ffc7759a374. So the answer is no ?

Wrapping up

Let’s update our diagram with what we learned.

virtual memory with command line arguments and environment variables

Is the stack really growing downwards?

Let’s call a function and figure this out! If this is true, then the variables of the calling function will be higher in memory than those from the called function (main-6.c).

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * f - print locations of various elements                                                               
 *                                                                                                       
 * Returns: nothing                                                                                      
 */
void f(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 1024;
     c = a * b;
     printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
     printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     f();
     return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-6.c -o 6
julien@holberton:~/holberton/w/hackthevm2$ ./6
Address of a: 0x7ffdae53ea4c
Allocated space in the heap: 0xf32010
Address of function main: 0x4006f9
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7ffdae53eb48
Addresses of the arguments:
    [./6]:0x7ffdae54038b 
Address of the array of environment variables: 0x7ffdae53eb58
Address of the first environment variables:
    [0x7ffdae54038f]:"XDG_VTNR=7"
    [0x7ffdae54039a]:"XDG_SESSION_ID=c2"
    [0x7ffdae5403ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7ffdae53ea04, b = 0x7ffdae53ea08, c = 0x7ffdae53ea0c
julien@holberton:~/holberton/w/hackthevm2$ 

-> True! (address of var a in function f) 0x7ffdae53ea04 < 0x7ffdae53ea4c (address of var a in function main)

We now update our diagram:

stack is growing downwards

/proc

Let’s double check everything we found so far with /proc/[pid]/maps (man proc or refer to the first article in this series to learn about the proc filesystem if you don’t know what it is).

Let’s add a getchar() to our program so that we can look at its “/proc” (main-7.c):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**                                                                                                      
 * f - print locations of various elements                                                               
 *                                                                                                       
 * Returns: nothing                                                                                      
 */
void f(void)
{
     int a;
     int b;
     int c;

     a = 98;
     b = 1024;
     c = a * b;
     printf("[f] a = %d, b = %d, c = a * b = %d\n", a, b, c);
     printf("[f] Adresses of a: %p, b = %p, c = %p\n", (void *)&a, (void *)&b, (void *)&c);
}

/**                                                                                                      
 * main - print locations of various elements                                                            
 *                                                                                                       
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS                                      
 */
int main(int ac, char **av, char **env)
{
     int a;
     void *p;
     int i;
     int size;

     printf("Address of a: %p\n", (void *)&a);
     p = malloc(98);
     if (p == NULL)
     {
          fprintf(stderr, "Can't malloc\n");
          return (EXIT_FAILURE);
     }
     printf("Allocated space in the heap: %p\n", p);
     printf("Address of function main: %p\n", (void *)main);
     printf("First bytes of the main function:\n\t");
     for (i = 0; i < 15; i++)
     {
          printf("%02x ", ((unsigned char *)main)[i]);
     }
     printf("\n");
     printf("Address of the array of arguments: %p\n", (void *)av);
     printf("Addresses of the arguments:\n\t");
     for (i = 0; i < ac; i++)
     {
          printf("[%s]:%p ", av[i], av[i]);
     }
     printf("\n");
     printf("Address of the array of environment variables: %p\n", (void *)env);
     printf("Address of the first environment variables:\n");
     for (i = 0; i < 3; i++)
     {
          printf("\t[%p]:\"%s\"\n", env[i], env[i]);
     }
     /* size of the env array */
     i = 0;
     while (env[i] != NULL)
     {
          i++;
     }
     i++; /* the NULL pointer */
     size = i * sizeof(char *);
     printf("Size of the array env: %d elements -> %d bytes (0x%x)\n", i, size, size);
     f();
     getchar();
     return (EXIT_SUCCESS);
}
julien@holberton:~/holberton/w/hackthevm2$ gcc -Wall -Wextra -Werror main-7.c -o 7
julien@holberton:~/holberton/w/hackthevm2$ ./7 Rona is a Legend SRE
Address of a: 0x7fff16c8146c
Allocated space in the heap: 0x2050010
Address of function main: 0x400739
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 
Address of the array of arguments: 0x7fff16c81568
Addresses of the arguments:
    [./7]:0x7fff16c82376 [Rona]:0x7fff16c8237a [is]:0x7fff16c8237f [a]:0x7fff16c82382 [Legend]:0x7fff16c82384 [SRE]:0x7fff16c8238b 
Address of the array of environment variables: 0x7fff16c815a0
Address of the first environment variables:
    [0x7fff16c8238f]:"XDG_VTNR=7"
    [0x7fff16c8239a]:"XDG_SESSION_ID=c2"
    [0x7fff16c823ac]:"CLUTTER_IM_MODULE=xim"
Size of the array env: 62 elements -> 496 bytes (0x1f0)
[f] a = 98, b = 1024, c = a * b = 100352
[f] Adresses of a: 0x7fff16c81424, b = 0x7fff16c81428, c = 0x7fff16c8142c

julien@holberton:~$ ps aux | grep "./7" | grep -v grep
julien     5788  0.0  0.0   4336   628 pts/8    S+   18:04   0:00 ./7 Rona is a Legend SRE
julien@holberton:~$ cat /proc/5788/maps
00400000-00401000 r-xp 00000000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
00600000-00601000 r--p 00000000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
00601000-00602000 rw-p 00001000 08:01 171828                             /home/julien/holberton/w/hackthevm2/7
02050000-02071000 rw-p 00000000 00:00 0                                  [heap]
7f68caa1c000-7f68cabd6000 r-xp 00000000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cabd6000-7f68cadd6000 ---p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadd6000-7f68cadda000 r--p 001ba000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68cadda000-7f68caddc000 rw-p 001be000 08:01 136253                     /lib/x86_64-linux-gnu/libc-2.19.so
7f68caddc000-7f68cade1000 rw-p 00000000 00:00 0 
7f68cade1000-7f68cae04000 r-xp 00000000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cafe8000-7f68cafeb000 rw-p 00000000 00:00 0 
7f68cafff000-7f68cb003000 rw-p 00000000 00:00 0 
7f68cb003000-7f68cb004000 r--p 00022000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb004000-7f68cb005000 rw-p 00023000 08:01 136229                     /lib/x86_64-linux-gnu/ld-2.19.so
7f68cb005000-7f68cb006000 rw-p 00000000 00:00 0 
7fff16c62000-7fff16c83000 rw-p 00000000 00:00 0                          [stack]
7fff16d07000-7fff16d09000 r--p 00000000 00:00 0                          [vvar]
7fff16d09000-7fff16d0b000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
julien@holberton:~$ 

Let’s check a few things:

  • The stack starts at 7fff16c62000 and ends at 7fff16c83000. Our variables are all inside this region (0x7fff16c8146c, 0x7fff16c81424, 0x7fff16c81428, 0x7fff16c8142c)
  • The heap starts at 02050000 and ends at 02071000. Our allocated memory is in there (0x2050010)
  • Our code (the main function) is located at address 0x400739, so in the following region:
    00400000-00401000 r-xp 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
    It comes from the file /home/julien/holberton/w/hackthevm2/7 (our executable) and this region has execution permissions, which also makes sense.
  • The arguments and environment variables (from 0x7fff16c81568 to 0x7fff16c8238f + 0x1f0) are located in the region starting at 7fff16c62000 and ending at 7fff16c83000… the stack! ? So they are IN the stack, not outside the stack.

This also brings up more questions:

  • Why is our executable “divided” into three different regions with different permissions? What is inside these two regions?
    • 00600000-00601000 r--p 00000000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
    • 00601000-00602000 rw-p 00001000 08:01 171828 /home/julien/holberton/w/hackthevm2/7
  • What are all those other regions?
  • Why our allocated memory does not start at the very beginning of the heap (0x2050010 vs 02050000)? What are those first 16 bytes used for?

There is also another fact that we haven’t checked: Is the heap actually growing upwards?

We’ll find out another day! But before we end this chapter, let’s update our diagram with everything we’ve learned:

the virtual memory

Outro

We have learned a ton of things by simply printing informations from our executables! But we still have open questions that we will explore in a future chapter to complete our diagram of the virtual memory. In the meantime, you should try to find out yourself.

Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

Thank you for reading!

As always, no-one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments if you find anything I missed.

Files

This repo contains the source code (main-X.c files) for programs created in this tutorial.

Read more about the virtual memory

Follow @holbertonschool or @julienbarbier42 on Twitter to get the next chapters! This was the third chapter in our series on the virtual memory.

Many thanks to Tim and SantaCruzDad for English proof-reading! ?