2020-04-08

Python / Pytorch

3 minutes read (About 439 words)

Pytorch_Logistic_regression

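PyTorch Logistic Regression (Building a Binary Classifier)

Based on the "Deep Learning for Everyone" PyTorch lectures

I built a binary classifier that separates two classes, 0 and 1.
A binary classification problem calls for predicting a probability rather than an unbounded real value, as linear regression does.
To do that, the BinaryClassifier below passes its input through a linear layer followed by a sigmoid function.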

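import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Model definition: a linear layer (8 features -> 1 output) followed by a sigmoid
class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

model = BinaryClassifier()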

With the model for training in place, we load the data and define the optimizer.

xy = np.loadtxt('data/data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

optimizer = optim.SGD(model.parameters(), lr=1)

Finally, we repeat the training step for the given number of epochs.

epochs = 100

for epoch in range(epochs + 1):

    hypothesis = model(x_train)

    loss = F.binary_cross_entropy(hypothesis, y_train)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        prediction = (hypothesis >= torch.FloatTensor([0.5])).float()
        correct_prediction = (prediction == y_train).float()
        accuracy = correct_prediction.sum().item() / len(correct_prediction)

        print('Epoch : {:4d}/{} loss : {:6f} Accuracy : {:2.2f}'.format(
            epoch, epochs, loss.item(), accuracy*100
        ))
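
As a rough sketch of what the loop above measures, the snippet below computes binary cross-entropy and thresholded accuracy in plain Python. The probabilities and labels are made-up illustration values and the helper names are hypothetical; F.binary_cross_entropy with its default mean reduction computes the same quantity on tensors.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def accuracy(probs, targets, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1.0 if p >= threshold else 0.0 for p in probs]
    correct = sum(1 for p, y in zip(preds, targets) if p == y)
    return correct / len(targets)

# Made-up model outputs and labels, for illustration only
probs = [0.9, 0.2, 0.6, 0.4]
targets = [1.0, 0.0, 0.0, 1.0]

bce = binary_cross_entropy(probs, targets)
acc = accuracy(probs, targets)
print(bce)
print(acc)  # 2 of the 4 thresholded predictions match their labels
```

Confident wrong predictions are penalized much more heavily by the loss than by accuracy, which is one reason to print both during training.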

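When reporting accuracy, a prediction counts as class 1 when the hypothesis (predicted) value is 0.5 or greater.
For the binary problem I used the sigmoid function. I will write about this again in the next lecture, but the difference to remember is that classification problems with three or more classes call for the softmax function instead.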
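
To make the sigmoid/softmax relationship concrete, here is a minimal plain-Python sketch (the function names and input values are my own, chosen for illustration). It shows that a sigmoid is the two-class special case of softmax, and that softmax spreads probability over any number of classes.

```python
import math

def sigmoid(z):
    """Map a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Map a list of real values to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.5
two_class = softmax([z, 0.0])
print(sigmoid(z), two_class[0])  # the two values agree

three_class = softmax([2.0, 1.0, 0.1])
print(three_class, sum(three_class))  # one probability per class
```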

Full Code

2020-04-07

Python / Pytorch

4 minutes read (About 638 words)

Pytorch_data_loading_with_DataLoader

Loading Data with the PyTorch DataLoader

Based on the "Deep Learning for Everyone" PyTorch lectures

To read data in chunks of batch_size through a DataLoader, you need to define a class that inherits from Dataset in torch.utils.data.
After creating your own Dataset, override the __len__ and __getitem__ methods.

def __len__(self):
	...

def __getitem__(self, idx):
	...

The Dataset source can be found here, and a Korean-language explanation of DataLoader is available here.
I practiced by building a simple linear regression model and some data.
First, let's create my own Dataset subclass along with its __len__ and __getitem__ methods.

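import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):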

    def __init__(self):
        self.x_train = [[73, 80, 75],
                         [93, 88, 83],
                         [89, 91, 90],
                         [96,98, 100],
                         [73, 66, 70]]
        self.y_train = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x_train)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_train[idx])
        y = torch.FloatTensor(self.y_train[idx])
        return x, y

The MyDataset created this way can be passed to the DataLoader in torch.utils.data, where batch_size controls the batch size.

dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=2, shuffle=True) # shuffle means the data is shuffled before the batches are formed

The object produced by DataLoader is iterable, so its contents can be printed and inspected as follows.

a = iter(dataloader)
print(next(a))
print(next(a))
print(next(a))
#>>>
[tensor([[ 73.,  80.,  75.],
        [ 96.,  98., 100.]]), tensor([[152.],
        [196.]])]
[tensor([[73., 66., 70.],
        [93., 88., 83.]]), tensor([[142.],
        [185.]])]
[tensor([[89., 91., 90.]]), tensor([[180.]])]
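
The batching seen in the output above can be approximated in plain Python. This is only a conceptual sketch, not how DataLoader is actually implemented: TinyDataset and iter_batches are hypothetical names, and the real DataLoader also collates samples into tensors and offers many more options.

```python
import random

class TinyDataset:
    """Stand-in for a Dataset: only __len__ and __getitem__ are needed."""
    def __init__(self):
        self.x = [[73, 80, 75], [93, 88, 83], [89, 91, 90], [96, 98, 100], [73, 66, 70]]
        self.y = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

def iter_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield (xs, ys) lists of at most batch_size samples each."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # shuffle the order of indices
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        samples = [dataset[i] for i in chunk]
        xs = [s[0] for s in samples]
        ys = [s[1] for s in samples]
        yield xs, ys

batches = list(iter_batches(TinyDataset(), batch_size=2, shuffle=True, seed=0))
for xs, ys in batches:
    print(len(xs), xs, ys)
```

With five samples and a batch size of 2, the last batch holds the single leftover sample, just as in the DataLoader output.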

We can see that the data entered in MyDataset is split into the specified batches by the DataLoader and printed correctly.
Now, after creating the linear regression model and optimizer as follows,

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)


model = MultivariateLinearRegressionModel()
optimizer = optim.SGD(model.parameters(), lr=1e-5)

let's load the data through the DataLoader and train.

epochs = 20

for epoch in range(epochs + 1):
    for batch_idx, train in enumerate(dataloader):
        xt, yt = train

        prediction = model(xt)

        loss = F.mse_loss(prediction, yt)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Batch : {}/{} Cost : {:.6f}'.format(
            epoch, epochs, batch_idx+1, len(dataloader), loss.item()
        ))

Notice that the for loop over epochs contains another for loop. The inner loop trains the model on the train data one batch_size chunk at a time.

Full Code

2020-04-06

Python / Pytorch

2 minutes read (About 334 words)

Pytorch_Multivariable_Linear_regression

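PyTorch Multivariable Linear Regression Basics

Based on the "Deep Learning for Everyone" PyTorch lectures

Let's build a simple model that predicts y from two or more x values.
For a simple example, we create x_train data of shape (5,3) and y_train data of shape (5,1).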

import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 83],
                             [89, 91, 90],
                             [96,98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

Since we are building a linear regression model, there are two variables to learn: w and b. Create them with torch.zeros and set requires_grad=True so that they are treated as trainable parameters.

w = torch.zeros((3,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

To compute the loss from the difference between the model's hypothesis and the actual values we use MSE, and we update w and b with an SGD optimizer.

optimizer = torch.optim.SGD([w,b], lr=1e-5)

Train the model for the given number of epochs.

epochs = 20  # assumed value; the post does not show where epochs is defined

for epoch in range(epochs+1):

    hypothesis = x_train.matmul(w) + b # prediction computed by the model

    loss = torch.mean((hypothesis - y_train) ** 2) # loss between the predictions and the training targets

    optimizer.zero_grad() # reset gradients (the parentheses are required to actually call it)
    loss.backward()       # compute gradients
    optimizer.step()      # update w and b

    print('Epoch {:4d}/{} hypothesis : {} Cost : {:.6f}'.format(
        epoch, epochs, hypothesis.squeeze().detach(), loss.item()
    ))
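
To see what the first pass of the loop above computes, here is a minimal plain-Python sketch of the same matrix product and MSE using nested lists instead of tensors. The matmul helper is hypothetical and only meant to make the shapes explicit.

```python
x_train = [[73, 80, 75], [93, 88, 83], [89, 91, 90], [96, 98, 100], [73, 66, 70]]
y_train = [[152], [185], [180], [196], [142]]

w = [[0.0], [0.0], [0.0]]  # shape (3, 1), zero-initialized like torch.zeros((3,1))
b = 0.0

def matmul(a, m):
    """Multiply an (n, k) nested list by a (k, p) nested list."""
    return [[sum(a[i][t] * m[t][j] for t in range(len(m))) for j in range(len(m[0]))]
            for i in range(len(a))]

# (5, 3) x (3, 1) -> (5, 1), then add the bias to every entry
hypothesis = [[v + b for v in row] for row in matmul(x_train, w)]
loss = sum((h[0] - y[0]) ** 2 for h, y in zip(hypothesis, y_train)) / len(y_train)

print(len(hypothesis), len(hypothesis[0]))  # 5 1
print(loss)  # mean of the squared targets, since every prediction is 0
```

Because w and b start at zero, the first reported cost depends only on y_train.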

Full Code

2020-04-02

Python / Pytorch

4 minutes read (About 534 words)

Pytorch_Linear_regression_basics

PyTorch Linear Regression Basics

Based on the "Deep Learning for Everyone" PyTorch lectures

Let's build a simple model that predicts one y from one x.
We create three data points as follows and use them to train a simple linear prediction model.

import torch

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

In a linear model, the y value for a given x (the hypothesis, H(x)) is computed with the following equation.

H(x) = W * x + b

To fit this equation to our data, the values we need to find, that is, the values to learn, are W and b.
Setting requires_grad=True, we can create W and b initialized to 0 as follows.

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

To evaluate our model we will use the Mean Squared Error (MSE).
MSE is the mean of the squared differences between the predictions and the actual values, and can be written as follows.

loss = torch.mean((hypothesis - y_train) ** 2)

Once we have the loss, we use it to improve W and b.
For this we will use stochastic gradient descent (SGD).
We pass the tensors to learn as a list to SGD from the optim library, along with a learning rate, as follows.

optimizer = torch.optim.SGD([w, b], lr=0.01)

With the optimizer defined, W and b are updated in the following order.

optimizer.zero_grad()  # reset gradients
loss.backward()        # compute gradients
optimizer.step()       # update W and b
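
The reason for resetting gradients first is that backward() adds to any gradient already stored on a tensor. The sketch below illustrates this with a toy loss of my own choosing; w.grad.zero_() stands in here for what optimizer.zero_grad() does for the parameters it manages.

```python
import torch

w = torch.zeros(1, requires_grad=True)

def loss_fn(w):
    # Toy loss for illustration: (2w - 4)^2, whose gradient is 4(2w - 4)
    return ((2.0 * w - 4.0) ** 2).sum()

loss_fn(w).backward()
first = w.grad.clone()        # gradient at w = 0

loss_fn(w).backward()         # no reset in between
accumulated = w.grad.clone()  # the second gradient was added to the first

w.grad.zero_()                # reset, as zero_grad() would
loss_fn(w).backward()
fresh = w.grad.clone()        # back to a single gradient

print(first, accumulated, fresh)
```

Skipping the reset makes every update use the sum of all gradients computed so far rather than the current one.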

After training the model this way for 100 iterations, the predicted y at x = 4 comes out close to 8.

x_test = torch.FloatTensor([[4]])
print(x_test * w + b)
#>>> tensor([[7.5598]], grad_fn=<AddBackward0>)
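
For readers who want to see the update rule without autograd, the following plain-Python sketch runs the same gradient descent by hand. It assumes exactly 100 update steps and the same data, zero initialization, and learning rate as above; the gradients are the analytic derivatives of the MSE.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, b = 0.0, 0.0
lr = 0.01
n = len(xs)

for step in range(100):
    # Prediction error for each sample: H(x) - y
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Derivatives of mean((w*x + b - y)^2) with respect to w and b
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    # Plain gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b

prediction = w * 4.0 + b
print(w, b, prediction)
```

Under these assumptions the prediction should land near the 7.56 reported above; more steps bring it closer to 8.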

Full Code

2020-03-29

Python / About Python

4 minutes read (About 566 words)

Python_dataframe_loc_iloc_indexing

Indexing a Python dataframe

A dataframe can be indexed with loc and iloc.
Let's look at both using the following dataframe as an example.

import pandas as pd

df = pd.DataFrame({
    'age' :     [13,17,19,21,23],
    'class' : ['math','science','english','math','science'],
    'city' :    ['A','B','A','C','B'],
    'gender' :  [ 'M', 'F', 'F', 'M', 'M'],
})

print(df)
>>>
   age    class city gender
0   13     math    A      M
1   17  science    B      F
2   19  english    A      F
3   21     math    C      M
4   23  science    B      M

Indexing with loc
- loc indexing can be used in two ways.
  df.loc[row_labels]
  
  or
  
  df.loc[row_labels, column_labels]
- Using this to fetch only the second and third rows looks like this.
  ex1 = df.loc[1:2]
  
  print(ex1)
  >>>
     age    class city gender
  1   17  science    B      F
  2   19  english    A      F
- A boolean Series can be used as the row indexer.
- Using this to select the rows for people living in city A gives the following.
  ex2 = df.loc[df.city == 'A']
  
  print(ex2)
  >>>
     age    class city gender
  0   13     math    A      M
  2   19  english    A      F
- Going one step further, suppose we want a Series holding the gender of the people living in city A.
- In that case, add a column label and get it as follows.
  ex3 = df.loc[df.city == 'A', 'gender']
  
  print(ex3)
  >>>
  0    M
  2    F
- By adding column labels this way, we can also slice the dataframe down to two rows that hold only age and gender.
  ex4 = df.loc[1:2,['age','gender']]
  
  print(ex4)
  >>>
     age gender
  1   17      F
  2   19      F
Indexing with iloc
- iloc stands for integer-location based indexing; unlike loc, it takes integer positions as indexers.
- The last loc example can be written with iloc as follows.
  ex5 = df.iloc[1:3,[0,3]]
  
  print(ex5)
  >>>
     age gender
  1   17      F
  2   19      F
- Comparing the code for ex4 and ex5 shows a difference beyond simply taking integers.
- loc includes the end of the slice, so 1:2 returns rows 1 and 2, whereas iloc excludes the end position, which is why the slice is written as 1:3.
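
The inclusive/exclusive difference is easier to see when the row labels are not integers. The small dataframe below is a made-up example with string labels, used only to separate labels from positions.

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
df = pd.DataFrame({'age': [13, 17, 19]}, index=['a', 'b', 'c'])

by_label = df.loc['a':'b']    # label slice: both endpoints are included
by_position = df.iloc[0:1]    # position slice: the end position is excluded

print(len(by_label), len(by_position))  # 2 1
print(df.loc['a':'b'].equals(df.iloc[0:2]))  # True
```

With the default integer index, labels and positions happen to coincide, which is why 1:2 and 1:3 select the same rows in ex4 and ex5.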

2020-03-24

Python / About Python

3 minutes read (About 510 words)

Python_dataframe_function_apply_method_without_iterrows

Applying a function to each row of a Python dataframe.

When working with a dataframe, there are times when you want to compute a result or run some step for each row.
The first thing that came to mind was DataFrame.iterrows().
But when working with more than ten million rows, I found that iterrows is far too slow.
The solution I found is to write the work as a function and apply it to each row.
Let's build a simple dataframe and take a look.

import pandas as pd

rectangles = [
    { 'Name': "A", 'Age': 17 },
    { 'Name': "B", 'Age': 20 },
    { 'Name': "C", 'Age': 27 }
]

rectangles_df = pd.DataFrame(rectangles)
print(rectangles_df)

Name	Age
A	17
B	20
C	27

Given a dataframe of names and ages, we want to go through each row and decide who is eligible to vote.
We write a function that takes a row and returns True if Age is 18 or over and False otherwise, then apply it to the dataframe.

def vote(row):
    if row['Age'] >= 18:
        return True
    else: return False

rectangles_df["vote"] = rectangles_df.apply(vote, axis=1)

print(rectangles_df)

Name	Age	vote
A	17	False
B	20	True
C	27	True

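As shown above, each row's Age was evaluated to produce a new column called vote.
When applying a function to a dataframe row by row like this, don't forget axis=1, as in df.apply(func, axis=1).
The example above has only three rows, so you won't notice any difference from iterrows. But as the data grows, it is better to avoid iterrows and know how to use apply instead.
If you want to learn a few other approaches besides apply, see the reference.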
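
One such alternative, which may or may not be among those in the reference above, is a vectorized comparison on the whole column. This is a minimal sketch that rebuilds the same dataframe and skips the per-row function call entirely.

```python
import pandas as pd

rectangles_df = pd.DataFrame([
    {'Name': "A", 'Age': 17},
    {'Name': "B", 'Age': 20},
    {'Name': "C", 'Age': 27},
])

# Compare the whole Age column at once instead of calling a function per row
rectangles_df["vote"] = rectangles_df["Age"] >= 18

print(rectangles_df["vote"].tolist())  # [False, True, True]
```

This only works when the rule can be written as column operations; apply remains the more general tool for arbitrary per-row logic.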

2020-03-17

Python / About Python

3 minutes read (About 459 words)

Python_dictionary___missing___method

About the Python dictionary __missing__ method

When using a Python dictionary, looking up a key that does not exist raises the following error.

mine2 = dict()
mine2["dog"] = ["Na", "Mg"]
print(mine2["cat"])

### print >> KeyError: 'cat'

This KeyError can be avoided by creating your own dictionary class that inherits from dict.
- The __missing__ method lets you define what to return when a lookup asks for a key that does not exist. (More precisely, __getitem__ calls __missing__, when it is defined, if the key is absent.)
- Let's create our own dictionary as follows.
  class mydict(dict):
      def __missing__(self, key):
          return []
- Let's put the following data into mydict, which inherits from dict.
  mine = mydict()
  
  mine["dog"] = ["H"]
  mine["tiger"] = ["He, Li"]
  mine["wolf"] = ["Be", "B", "C"]
  
  print(mine)
  
  ### print >> {'dog': ['H'], 'tiger': ['He, Li'], 'wolf': ['Be', 'B', 'C']}
- What happens if we look up "cat" in the mine dictionary above, as we did at the start?
  print(mine["cat"])
  
  ### print >> []
- No KeyError was raised, because the __missing__ method defined on our dictionary returns an empty list when the key is absent.
- Using this, data can also be handled as follows.
  mine["cat"] += ["N", "O", "F", "Ne"]
  
  print(mine)
  
  ### print >> {'dog': ['H'], 'tiger': ['He, Li'], 'wolf': ['Be', 'B', 'C'], 'cat': ['N', 'O', 'F', 'Ne']}
- You might think you could simply enter the data as in the first example, but this approach can be useful when walking through all the data sequentially with a queue or stack.
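
The queue-style traversal mentioned above could look like the sketch below. The (animal, element) pairs are made-up illustration data. It also shows one caveat of this __missing__ implementation, and the standard library's collections.defaultdict as a related alternative.

```python
from collections import defaultdict, deque

class mydict(dict):
    def __missing__(self, key):
        return []  # returned to the caller, but not stored in the dictionary

# Hypothetical (animal, element) pairs processed in order from a queue
queue = deque([("dog", "H"), ("cat", "N"), ("dog", "He"), ("cat", "O")])

grouped = mydict()
while queue:
    animal, element = queue.popleft()
    grouped[animal] += [element]  # reads via __missing__, then assigns back

print(grouped)  # {'dog': ['H', 'He'], 'cat': ['N', 'O']}

# Caveat: append() alone mutates a throwaway list, so nothing is stored
lost = mydict()
lost["cat"].append("N")
print("cat" in lost)  # False

# defaultdict stores the default on first access, so append() works
dd = defaultdict(list)
dd["cat"].append("N")
print(dict(dd))  # {'cat': ['N']}
```

The += form works because it ends with an assignment to the key, whereas a bare append never writes anything back.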
