# Laborpraktikum Data Analysis WiSe 2023/2024
The purpose of this course is to provide a hands-on introduction to the foundations of probability, as well as an insight into the field of data analysis with large sets of experimental data. Students will learn to use and understand the basic tools and methods used by researchers who search for astrophysical signals in both the electromagnetic and the gravitational spectrum.
**Join on [STUD IP](https://studip.uni-hannover.de/dispatch.php/course/overview?cid=78c7b37a5454b800d82e105c1ac41e47)**
# Table of Contents
[Resources](resources.md)
1. [Schedule](#schedule)
2. [Setting up](#first-steps)
    1. [Prerequisites](#prerequisites)
    2. [Generate SSH key](#generate-a-ssh-key)
    3. [Clone repository](#get-the-code)
    4. [First commit](#your-first-commit)
    5. [Rebase from upstream](#rebase-from-upstream)
3. [Local environment](#environment)
4. [Update commit](#updating-your-code)
5. [Permission error](#permission-error)
## Introductory information
### Objectives
* Gain familiarity with git, Python, Jupyter notebooks and tools related to data analysis
* Simulate simple probabilistic experiments and problems. Compare experiment outcomes (empirical results) with probability estimations (theoretical results)
* Learn how to document your findings and write a report
* Cope with problems (e.g. software issues, code bugs etc.) and learn how to handle them **independently**
### Rules (VERY IMPORTANT)
* There will be 8 (or maybe 9) sheets: the main concepts and the general structure will be illustrated and amply discussed in class. The remaining work will have to be done at home.
1. **CAVEAT**: Every report must contain explanatory text that neatly explains both theory and code.
- It is strongly suggested not to miss any class. At most one class may be missed.
- Not participating in class while handing in exceptionally good sheets **is not** a good sign
- Python only
- For **any** problem: just contact me -> jasper.martins@aei.mpg.de
- Deadlines: TBA
- Final report: put all sheets together. No sheet may be missing.
## Schedule
**Tuesdays 14:00 to 18:00 Room 103 or 106, Callinstr 38 (AEI main entrance)**
This course awards a total of 4 ECTS credits. Each ECTS credit corresponds to an average of 30 hours of workload. According to this, and based on our planned weekly schedule, for each hour spent in class you are (on average) supposed to study 1.5 hours at home.
More information at: [LEISTUNGSPUNKTE](https://www.uni-hannover.de/de/studium/vor-dem-studium/orientierung-studienentscheidung/studienaufbau#c89762)
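The workload figures above can be checked with a few lines of shell arithmetic. This is a back-of-the-envelope sketch: it assumes that the 1:1.5 ratio of class time to home study holds exactly and that every session lasts the full four hours (14:00 to 18:00). The variable names are made up for illustration.

```shell
ects=4
hours_per_ects=30
total_hours=$((ects * hours_per_ects))

# 1 h in class + 1.5 h at home = 2.5 h of workload per class hour,
# so class time is total / 2.5, written as * 2 / 5 to stay in integers
class_hours=$((total_hours * 2 / 5))
home_hours=$((total_hours - class_hours))
sessions=$((class_hours / 4))   # each session runs 14:00 to 18:00

echo "total: ${total_hours} h, class: ${class_hours} h, home: ${home_hours} h, sessions: ${sessions}"
# prints: total: 120 h, class: 48 h, home: 72 h, sessions: 12
```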
* **17<sup>th</sup> October 2023** Introduction/First steps
* **24<sup>th</sup> October - 7<sup>th</sup> November 2023** Monte Carlo Methods
* **14<sup>th</sup> November 2023** Basic principles of Bayesian inference
* **21<sup>st</sup> - 28<sup>th</sup> November 2023** Combinatorics
* ...
**First chance: TBA / Second chance: TBA - All solutions and report due**
Submit your report online by pushing your code to the repository by the dates mentioned above, 23:59 CEST.
Also include a **.pdf** version of your report.
If pushing to your repository does not work for whatever reason, please print out your report and put it in an envelope, write my name (Jasper Martins) and "Laborpraktikum Data Analysis 2024" on the envelope, and place it in the pigeonhole marked "P" on the first floor of the AEI, opposite room 128, by the dates mentioned above. Your name and matriculation number, as well as your degree programme (e.g. MSc Physics), should be on the first page of the report.
## First Steps
### Prerequisites
#### Get git and use GitLab
* Install [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
* [Login to the University GitLab](https://gitlab.uni-hannover.de) (login with WebSSO)
#### Generate a ssh key
> To make your life easier when you upload the solutions to the next exercises you should now generate on your machine a `ssh` key that will allow you to do operations on your repository without being asked for a username and password each time.
* Full guide at [this link](https://docs.gitlab.com/ee/user/ssh.html#rsa-ssh-keys)
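As a rough sketch of what the linked guide walks through, the commands below generate an RSA key pair and print the public half, which is the part you add to your GitLab account. The key is written to a throwaway directory with an empty passphrase so that the sketch can be run safely as-is; both choices are assumptions for illustration. For your real key you would typically point `key_file` at something like `~/.ssh/id_rsa_uni_gitlab` (the file name used in the Permission error section) and choose a passphrase.

```shell
set -eu

# Throwaway location (assumption for this sketch); a real key usually lives in ~/.ssh
key_dir="$(mktemp -d)"
key_file="$key_dir/id_rsa_uni_gitlab"

# -t rsa -b 4096: 4096-bit RSA key; -C: free-text comment; -f: output file
# -N "": empty passphrase, used here only so that the sketch runs without prompting
ssh-keygen -q -t rsa -b 4096 -C "uni-gitlab-sandbox" -N "" -f "$key_file"

# The private key stays on your machine; the .pub file is what you upload
ls "$key_dir"
cat "$key_file.pub"

# Fingerprint of the key, handy for comparing with what GitLab displays
ssh-keygen -l -f "$key_file.pub"
```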
#### Get the code
* Once you are logged in with your account, fork this repository by pressing the fork button on the upper right corner of this repository's page.

> Now you should have __your own repository__ in your namespace, called `<your_username>/data_analysis_lab_2024`.
* You should also have a ssh key added to your account to continue. If not, use the _'HTTPS'_ link for the repository; you will then be prompted for a username and password every time. Copy the git URL of your repository by going to its GitLab page and clicking on _Code_>_SSH_>_copy_:

* Open a command line/terminal and clone your repository. The command should look something like:
```
git clone git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024.git
```
This will automatically create a new folder called `data_analysis_lab_2024` inside the folder where you ran the command, and will give you an error if such a folder already exists. If you want the folder to have another name, run `git clone git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024 <new_folder_name>`, e.g. `git clone git@gitlab.uni-hannover.de:name.lastname/data_analysis_lab_2024 dalab_2024`. If you move the entire folder after you have cloned it, everything will still work, as the git references are kept in hidden files inside the folder.
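The behaviour described above (default folder name, error on an existing folder, custom folder name, moving the clone) can be tried without network access. In this sketch a local bare repository stands in for your fork on GitLab; the folder names `seed`, `my_lab` and `elsewhere` are invented for illustration, and `git init -b` assumes git 2.28 or newer.

```shell
set -eu
sandbox="$(mktemp -d)" && cd "$sandbox"
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Local stand-in for your fork on GitLab
git init -q -b main seed
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed data_analysis_lab_2024.git

# Default: the folder name is taken from the last part of the URL
git clone -q data_analysis_lab_2024.git
ls -d data_analysis_lab_2024

# Cloning again into the same, now non-empty, folder is refused
if ! git clone -q data_analysis_lab_2024.git 2> /dev/null; then
    echo "clone refused: folder already exists"
fi

# An extra argument chooses a different folder name
git clone -q data_analysis_lab_2024.git my_lab

# Moving the folder keeps the repository intact (the .git folder moves along)
mkdir elsewhere
mv my_lab elsewhere/
git -C elsewhere/my_lab status --short --branch
```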
#### Your first commit
* Create a new file in `data_analysis_lab_2024/solutions/exercise_1.py` and push your changes to your repository.
<details>
<summary>Solution here</summary>
Go to your `data_analysis_lab_2024` folder. Make a new folder called `solutions`:
```
$ mkdir solutions
```
Create a new file called `exercise_1.py` using any method, for example:
```
$ touch solutions/exercise_1.py
```
Check the changes to your repository:
```
$ git status
```
Commit the changes and then push them:
```
$ git add .
$ git commit -m "Saving my changes."
$ git log
$ git push origin main
```
</details>
#### Rebase from upstream
To get new changes that are pushed to __this__ main repository, the simplest way is to add an upstream remote and rebase your code onto it. Before you rebase, commit all local changes that you want to keep. Try it yourself using this [link](https://medium.com/@topspinj/how-to-git-rebase-into-a-forked-repo-c9f05e821c8a).
<details>
<summary>Solution here</summary>
To see which repositories you are tracking, run `git remote -v`. The output will probably look like this:
```
$ git remote -v
origin git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024.git (fetch)
origin git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024.git (push)
```
Because you created the fork from the web interface, you can also get the new changes from the interface. But the better way to do it is to add a _'remote'_ pointing to the repository you forked from (that is, add a short name for the main repository). The textbook name for a repo you forked from is __upstream__.
Add a remote named _upstream_ pointing to *this* repo using: `git remote add upstream git@gitlab.uni-hannover.de:m.jasper.martins/data_analysis_lab_2024.git`. Now when you run `git remote -v` you should see something *like* this:
```
origin git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024.git (fetch)
origin git@gitlab.uni-hannover.de:<your_username>/data_analysis_lab_2024.git (push)
upstream git@gitlab.uni-hannover.de:m.jasper.martins/data_analysis_lab_2024.git (fetch)
upstream git@gitlab.uni-hannover.de:m.jasper.martins/data_analysis_lab_2024.git (push)
```
The best way to pull the new changes is using the `rebase` command. This means that any commits you have made will be _'rebased'_ onto the new changes in the repository you forked from. **IMPORTANT**: Make sure you have committed all your changes before proceeding.
```
$ git status
$ git add .
$ git commit -m "Saving my changes."
$ git log
$ git fetch upstream
$ git rebase upstream/main
$ git log
```
</details>
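The fetch-and-rebase steps above can be rehearsed in a sandbox where local folders stand in for the GitLab repositories. In this sketch `course` plays the main repository, `fork.git` your fork and `student` your local clone. All of these names, as well as `sheet_2.md`, are hypothetical, and `git init -b` assumes git 2.28 or newer.

```shell
set -eu
sandbox="$(mktemp -d)" && cd "$sandbox"
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# 1. Main repository, your fork of it, and your local clone of the fork
git init -q -b main course
git -C course commit -q --allow-empty -m "Initial commit"
git clone -q --bare course fork.git
git clone -q fork.git student

# 2. A new sheet appears in the main repository after you forked it
touch course/sheet_2.md
git -C course add sheet_2.md
git -C course commit -q -m "Add sheet 2"

# 3. Meanwhile you commit your own work in your clone
mkdir student/solutions
touch student/solutions/exercise_1.py
git -C student add .
git -C student commit -q -m "Saving my changes."

# 4. Add the main repository as 'upstream', fetch it and rebase onto it
git -C student remote add upstream "$sandbox/course"
git -C student fetch -q upstream
git -C student rebase -q upstream/main

# Your commit now sits on top of the new upstream commit
git -C student log --oneline
```

After the rebase the `student` folder contains both the new sheet and your solution file, and your commit is the most recent entry in the log.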
### Environment
* [Google Colaboratory](https://colab.research.google.com/)
### Updating your code
Make commits and push your changes often. You don't need to rebase from the upstream each time: you can also download the sheets from the main repository through the web interface.
The best practice is to rebase onto the current state of your remote repository before pushing. Git will give you a set of instructions; make sure you read them.
Check the changes to your repository:
```
$ git status
```
Commit the changes that you want:
```
$ git add .
$ git commit -m "Saving my changes."
```
Rebase your code (just in case):
```
$ git pull --rebase
```
Hopefully there are no conflicts. If there are, try to fix them; sometimes a `git rebase --continue` suffices. After the conflicts are resolved, a summary opens in a text editor (often vim). You just need to check it and close the file with `:q`.
The last step is to push your changes
```
$ git push origin main
```
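The update cycle above, including a conflict, can be rehearsed in a throwaway sandbox before you try it on your real repository. In this sketch local folders stand in for GitLab: `origin.git` plays your repository, and `laptop` and `desktop` are two hypothetical clones that edit the same line. All names and file contents are invented for illustration, and `git init -b` assumes git 2.28 or newer.

```shell
set -eu
sandbox="$(mktemp -d)" && cd "$sandbox"
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Local stand-ins: origin.git is your repository, laptop and desktop are two clones
git init -q -b main seed
echo "draft" > seed/notes.txt
git -C seed add notes.txt
git -C seed commit -q -m "Add notes"
git clone -q --bare seed origin.git
git clone -q origin.git laptop
git clone -q origin.git desktop

# The desktop clone pushes a change first
echo "edited on desktop" > desktop/notes.txt
git -C desktop commit -q -am "Edit notes on desktop"
git -C desktop push -q origin main

# The laptop clone commits a change to the same line, then pulls with rebase
cd laptop
echo "edited on laptop" > notes.txt
git commit -q -am "Saving my changes."
git pull -q --rebase || echo "conflict: fix the file, then continue"

# Resolve the conflict by writing the content you want to keep
echo "edited on desktop and laptop" > notes.txt
git add notes.txt
GIT_EDITOR=true git rebase --continue   # 'true' skips the editor window

# With the rebase finished, the push is a plain fast-forward
git push -q origin main
git log --oneline
```

Setting `GIT_EDITOR=true` only keeps the sketch non-interactive; in normal use the editor opens and you close it as described above.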
### Permission error
Sometimes you may see an error like this:
```
git@gitlab.uni-hannover.de: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
```
This means that most probably your ssh agent is either not running or has forgotten your key.
* From a terminal run: `eval "$(ssh-agent -s)"` to make sure the ssh-agent is running
* Add the key: `ssh-add ~/.ssh/id_rsa_uni_gitlab` - check your own key name!
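To see the two steps above working end to end without touching your real keys, the following sketch starts an agent, loads a throwaway key and lists what the agent holds. The temporary key, its empty passphrase and the variable names are assumptions made for illustration; with your real key you would pass its path to `ssh-add` instead.

```shell
set -eu

# Throwaway key so that the sketch does not depend on your ~/.ssh contents
key_file="$(mktemp -d)/id_rsa_uni_gitlab"
ssh-keygen -q -t rsa -b 4096 -N "" -C "sandbox-key" -f "$key_file"

# Start an agent for this shell; eval exports SSH_AUTH_SOCK and SSH_AGENT_PID
eval "$(ssh-agent -s)" > /dev/null

# Hand the key to the agent, then list the keys it currently holds
ssh-add "$key_file" 2> /dev/null
loaded_keys="$(ssh-add -l)"
echo "$loaded_keys"

# Stop the sandbox agent again (uses SSH_AGENT_PID set above)
ssh-agent -k > /dev/null
```

If `ssh-add -l` reports that the agent has no identities, the agent is running but no key has been loaded yet, so the `ssh-add` step is the one to repeat.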