I recently enrolled in the Data Science Specialization on Coursera.
The first class is The Data Scientist’s Toolbox, which requires you to submit a course project. The assignment is designed to make sure that you have done the basic software setup that will get you through the rest of the Data Science Specialization. The course project consists of four parts.
One of the parts requires you to:
- Create a text file called HelloWorld.md
- Add the line “## This is a markdown file” (without the quotation marks) to the document
- Push the document to the datasciencecoursera repo you created on Github
- Submit the link to the HelloWorld.md file on your Github repo.
Despite more than 7 years of experience as a developer, I had never used Git for version control. I had visited GitHub only to download Zend Framework 2 and the Skeleton Application, both of which are available as zip files.
So, with no experience, I had to figure out how to meet all the requirements the proper way.
NOTE: In this article I assume that you have installed Git, created an account on GitHub, and created a repository on GitHub. These things are pretty straightforward to do, which is why I won’t cover them in detail.
Even so, the first thing to do is check that Git is really available. As I have already installed Xcode 6 on my Mac I should have Git, but to make sure let’s open up the Terminal.
NOTE: macOS has the Terminal application pre-installed and you can find it in the Applications -> Utilities folder. You can also quickly access it using Spotlight. The terminal has a variety of uses, but for the purposes of this tutorial we’ll be using a shell (a command language) called Bash. Terminal is already configured to use it.
Type “git --version” in the Terminal window. It will either print the version of Git that is currently installed or, if you don’t have one, the output will be “-bash: git: command not found”.
In case you don’t have Git installed, go to git-scm.com, download it, and install it.
To find out where Git is installed, type “which git”. In my case the path is “/usr/bin/git”.
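The two checks above can be combined into one small snippet. This is only a sketch: it uses the shell's built-in "command -v" instead of "which" (an assumption on my side, chosen because it is available in every POSIX shell), and the messages it prints are made up for illustration.

```shell
# Check whether a git executable can be found on the PATH.
if command -v git >/dev/null 2>&1; then
    # Print the installed version, e.g. "git version 2.x.y".
    git --version
    # Print the location of the executable (/usr/bin/git on the author's Mac).
    command -v git
else
    # Illustrative message only; Bash itself would say "command not found".
    echo "Git is not installed; download it from git-scm.com"
fi
```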
The first thing you should do when you install Git is to set your user name and e-mail address. This is important because every Git commit uses this information, and it’s immutably baked into the commits you start creating:
git config --global user.name "John Doe"
git config --global user.email firstname.lastname@example.org
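If you want to try these settings without touching your real configuration, the same keys can be set without the --global flag, in which case they apply to a single repository only. The sketch below does that in a throwaway folder and then reads the values back. The temporary folder and the address john.doe@example.com are placeholders I made up for the demonstration.

```shell
# Work in a throwaway repository so nothing here changes your real settings.
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Without --global the values are stored in this repository's .git/config only.
git config user.name "John Doe"
git config user.email "john.doe@example.com"   # placeholder address

# Read the values back to confirm that they were saved.
git config user.name
git config user.email
```

Running "git config --global user.name" in the same way reads back the global value once you have set it.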
In order to use Git and later proceed with GitHub you have to identify a folder as a Git repository. You can do that by issuing a “git init” command within a folder.
Let’s create a folder which you can place inside the Documents folder and name it “DataScience” (You can place the folder anywhere you like, and you can call it anything you want).
In your Terminal application type:
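The exact command is not shown here. A likely equivalent, assuming the Documents/DataScience location suggested above, would be:

```shell
# Create the folder (and Documents too, if it does not exist yet).
mkdir -p "$HOME/Documents/DataScience"

# Move into the new folder; the Git commands that follow are run from here.
cd "$HOME/Documents/DataScience"

# Show the current location to confirm where you are.
pwd
```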
Once the folder is created you can initialize it as a Git repository. In the terminal, from inside that folder, issue: git init
NOTE: If you haven’t initialized your working directory as a Git repository, any Git command you try to issue will fail with the error: fatal: Not a git repository (or any of the parent directories): .git
In order to proceed we have to pull the existing repository from GitHub. This is the repository to which we are going to push the HelloWorld.md file required by the course project.
Let’s set up the connection with GitHub. There are two ways to connect: via HTTPS or via SSH. We will cover the HTTPS connection.
Navigate to your GitHub repository, find the text field that contains the HTTPS link, and copy it. In Terminal type:
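To see the difference for yourself, here is a small sketch that runs a Git command before and after "git init". It works in a temporary folder (my assumption, so that your real folders stay untouched), and the "not a repository yet" text is just an illustrative message.

```shell
# Start in an empty throwaway folder that is not a Git repository.
demo="$(mktemp -d)"
cd "$demo"

# Before init: git status fails with "fatal: not a git repository ...".
git status || echo "not a repository yet"

# Turn the folder into a Git repository; this creates a hidden .git folder.
git init -q

# After init: Git recognises the folder and prints "true".
git rev-parse --is-inside-work-tree
```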
git remote add <alias> <yourRepoLink>
Here alias is the connection name that you will use later when addressing your repository (you can give it whatever name you like) and yourRepoLink is the link that you copied a few seconds ago.
In my case it looks like this: git remote add datascience https://github.com/mrgott/datasciencecoursera.git
Now you can type “git remote” and in the list you can see that one remote connection named “datascience” has been added. So later, whenever you need to connect to your repository you will directly use this alias.
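As a sketch of this step, the snippet below adds the remote in a throwaway repository and lists it. The temporary folder is my assumption for the demonstration; the alias and link are the ones from my example above. Adding a remote only records the link, so nothing is contacted over the network yet.

```shell
# Throwaway repository for the demonstration.
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Register the GitHub repository under the alias "datascience".
git remote add datascience https://github.com/mrgott/datasciencecoursera.git

# List the aliases only.
git remote

# List the aliases together with the links used for fetching and pushing.
git remote -v
```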
Now it’s time to pull the existing repository from GitHub into the directory you created.
The command “git pull datascience master” tells Git that you want to pull from the remote repository datascience, which you added earlier, and that you want to download the branch called master.
In the terminal you might encounter the following:
You’re in the text editor, vim! It’s a modal text editor, so you need to:
- Press i to enter insert mode.
- Now you can type your message, as if you were in a normal (non-modal) text editor.
- Press esc to go back to command mode.
- Then type :w followed by enter to save.
- Finally :q followed by enter to quit.
This solution has been taken from the Stack Overflow question “Github locks up mac terminal when using pull command”.
If you navigate to the Documents/DataScience directory you will see that the contents of your remote repository have been downloaded to your local directory. It means that you now have a local copy of your remote repository.
Now it’s time to create the markdown file HelloWorld.md and write the line “## This is a markdown file” to it. You can create it with your favorite text editor or from the Terminal. Let’s do it using the command line.
To create the file, type: touch HelloWorld.md. Then write the line to it by typing: echo "## This is a markdown file" >> HelloWorld.md
When the file is ready you have to tell Git to track it. You do that by adding the file to the staging environment with the “git add HelloWorld.md” command. If you type “git status” you will see that the file HelloWorld.md has been added and is ready to be committed.
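If you would like to rehearse the pull without touching GitHub, the sketch below uses a bare repository on your own disk as a stand-in for the remote. Everything in the first half (the temporary folder, the seed repository, the README.md file, and the John Doe identity) is an assumption made up for the rehearsal; only the last four commands mirror the steps from this article.

```shell
work="$(mktemp -d)"

# Stand-in for the GitHub repository: a bare repo seeded with one commit on master.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
echo "datasciencecoursera" > README.md
git add README.md
git -c user.name="John Doe" -c user.email="john.doe@example.com" commit -q -m "Initial commit"
git push -q "$work/remote.git" master

# The part that mirrors the article: an empty local repository that pulls master.
git init -q "$work/DataScience"
cd "$work/DataScience"
git remote add datascience "$work/remote.git"
git pull -q datascience master

# The files from the remote are now in the local folder.
ls
```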
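A minimal sketch of the file creation step, run in a temporary folder (my assumption, so that you can experiment safely):

```shell
# Throwaway folder for the demonstration.
demo="$(mktemp -d)"
cd "$demo"

# Create an empty file.
touch HelloWorld.md

# Append the required line to the file.
echo "## This is a markdown file" >> HelloWorld.md

# Show the contents to confirm the result.
cat HelloWorld.md
```

Two details are worth knowing. The ">>" operator creates the file if it does not exist, so the touch step is optional. It also appends rather than overwrites, so running the echo command twice leaves the line in the file twice.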
As you work through a project you can keep on adding files to the staging environment until you are ready to add them to a permanent record. The “git status” command will always show you what is going on with Git at any particular moment.
It shows you what’s in the staging queue and which files are currently untracked by Git. For now, you are ready to make a record of your environment. The commit command makes a record of the state of the folder. Whatever was in the staging environment will be stored as a record in the commit log. So let’s commit the file we added to the staging environment:
git commit -m "Enter the message of your commit"
After you run the commit command you can run the “git status” command to check the status of your project, and you’ll get the message:
On branch master
nothing to commit, working directory clean
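The sketch below walks through the stage and commit cycle in a throwaway repository and uses the short form of "git status" to show how the file moves from untracked to staged to committed. The temporary folder, the John Doe identity, and the commit message are assumptions for the demonstration.

```shell
# Throwaway repository with an identity set for this repository only.
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "John Doe"
git config user.email "john.doe@example.com"

echo "## This is a markdown file" >> HelloWorld.md

# Untracked file: prints "?? HelloWorld.md".
git status --short

# Add the file to the staging environment.
git add HelloWorld.md

# Staged file: prints "A  HelloWorld.md".
git status --short

# Record the staged state in the commit log.
git commit -q -m "Add HelloWorld.md"

# Nothing left to commit: prints nothing.
git status --short

# Show the commit log, one line per commit.
git log --oneline
```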
Right now you are ready to push your project to GitHub (your remote repository) and share your files with the world.
To push your files to GitHub you need to issue the command: git push -u <repositoryAliasName> <yourBranch>, where repositoryAliasName is the remote that we created earlier and yourBranch is the name of the local branch you want to push to GitHub. By default it will be master. In my case it looks like this:
git push -u datascience master
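After you run this command you’ll be prompted to enter your GitHub username and password. Enter them and you’ll be presented with the upload status. (GitHub has since stopped accepting account passwords for Git operations over HTTPS; a personal access token is entered in place of the password.)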
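The push can also be rehearsed offline. In the sketch below a bare repository on the local disk stands in for GitHub, so no username or password is requested. The temporary folder, the stand-in remote, the John Doe identity, and the commit message are all assumptions made up for the rehearsal; with the real GitHub link in place of the local path, the commands are the same as in this article.

```shell
work="$(mktemp -d)"

# Stand-in for GitHub: an empty bare repository on the local disk.
git init -q --bare "$work/remote.git"

# Local working repository, as in the article.
git init -q "$work/DataScience"
cd "$work/DataScience"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "John Doe"
git config user.email "john.doe@example.com"
git remote add datascience "$work/remote.git"

# Create, stage, and commit the file.
echo "## This is a markdown file" >> HelloWorld.md
git add HelloWorld.md
git commit -q -m "Add HelloWorld.md"

# -u records datascience/master as the upstream of the local master branch.
git push -q -u datascience master

# List the branches that the remote now has.
git ls-remote --heads datascience
```

Because of the -u flag, later pushes from this branch can be shortened to a plain "git push".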
This is a really simple intro to the world of Git and GitHub. The purpose of this article was to share the steps for creating a file and pushing it to GitHub in order to complete The Data Scientist’s Toolbox project on Coursera.com.
If you want to learn more about Git and GitHub you can watch the video tutorials on Lynda.com: Git Essential Training with Kevin Skoglund and Up and Running with Git and GitHub with Ray Villalobos.