How can text data be embedded into fixed-size vectors using Python?


TensorFlow is an open-source machine learning framework provided by Google. It is used with Python to implement algorithms, deep learning applications, and more, in both research and production.

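Keras was developed as part of the research effort for project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Keras is a deep learning API written in Python. It is a high-level API with a productive interface for solving machine learning problems. It runs on top of the TensorFlow framework and was built for fast experimentation. It provides the essential abstractions and building blocks needed to develop and encapsulate machine learning solutions.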

Keras is already included in the TensorFlow package. It can be accessed with the lines of code below.

import tensorflow
from tensorflow import keras

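The Keras functional API helps create models that are more flexible than those built with the Sequential API. The functional API can handle models with non-linear topology, shared layers, and multiple inputs and outputs. A deep learning model is usually a directed acyclic graph (DAG) of layers, and the functional API is a way to build that graph of layers.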
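As a rough illustration of layer sharing, the sketch below reuses one Embedding layer for two inputs. The names `text_a`, `text_b`, `shared_embedding`, and `shared_model`, as well as the vocabulary and embedding sizes, are hypothetical and chosen only for this example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
vocab_size = 1000
embed_dim = 16

# A single Embedding layer instance that both branches will reuse.
shared_embedding = layers.Embedding(vocab_size, embed_dim)

# Two variable-length integer sequences as inputs.
text_a = keras.Input(shape=(None,), dtype="int32", name="text_a")
text_b = keras.Input(shape=(None,), dtype="int32", name="text_b")

# Calling the same layer twice shares its weights between the two branches.
encoded_a = layers.GlobalAveragePooling1D()(shared_embedding(text_a))
encoded_b = layers.GlobalAveragePooling1D()(shared_embedding(text_b))

# Merge the branches and attach a single output.
merged = layers.concatenate([encoded_a, encoded_b])
output = layers.Dense(1, name="score")(merged)

shared_model = keras.Model(inputs=[text_a, text_b], outputs=output)

# Random placeholder data; the two inputs may have different sequence lengths.
batch_a = np.random.randint(0, vocab_size, size=(4, 10))
batch_b = np.random.randint(0, vocab_size, size=(4, 7))
scores = shared_model.predict([batch_a, batch_b], verbose=0)
print(scores.shape)  # (4, 1)
```

Because the embedding is created once and called twice, the model holds only one embedding matrix. A Sequential model cannot express this kind of branching graph.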

We are using Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook. The following code snippet embeds every word in the title into a 64-dimensional vector.

Example

from tensorflow import keras
from tensorflow.keras import layers

print("Number of unique issue tags")
num_tags = 12
print("Size of vocabulary while preprocessing text data")
num_words = 10000
print("Number of classes for predictions")
num_classes = 4
# Each text input is a variable-length sequence of integer word indices
title_input = keras.Input(shape=(None,), name="title")
print("Variable length int sequence")
body_input = keras.Input(shape=(None,), name="body")
# The tags input is a fixed-size vector of length num_tags
tags_input = keras.Input(shape=(num_tags,), name="tags")
print("Embed every word in the title to a 64-dimensional vector")
title_features = layers.Embedding(num_words, 64)(title_input)
print("Embed every word in the body into a 64-dimensional vector")
body_features = layers.Embedding(num_words, 64)(body_input)
print("Reduce sequence of embedded words into single 128-dimensional vector")
title_features = layers.LSTM(128)(title_features)
print("Reduce sequence of embedded words into single 32-dimensional vector")
body_features = layers.LSTM(32)(body_features)
print("Merge available features into a single vector by concatenating it")
x = layers.concatenate([title_features, body_features, tags_input])
print("Use logistic regression on top of the features to predict priority and class")
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)
print("Instantiate a model that predicts priority and class")
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

Code credit - https://www.tensorflow.org/guide/keras/functional

Output

Number of unique issue tags
Size of vocabulary while preprocessing text data
Number of classes for predictions
Variable length int sequence
Embed every word in the title to a 64-dimensional vector
Embed every word in the body into a 64-dimensional vector
Reduce sequence of embedded words into single 128-dimensional vector
Reduce sequence of embedded words into single 32-dimensional vector
Merge available features into a single vector by concatenating it
Use logistic regression on top of the features to predict priority and class
Instantiate a model that predicts priority and class
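
To see what the embedding step does in isolation, the following minimal sketch applies an Embedding layer and an LSTM to a tiny batch of integer sequences and prints the resulting shapes. The `titles` array is made-up placeholder data, and it assumes the text has already been converted to word indices and padded to a common length.

```python
import numpy as np
from tensorflow.keras import layers

num_words = 10000  # vocabulary size, as in the example above

# Two hypothetical titles, already converted to integer word indices
# and padded with zeros to the same length (an assumption of this sketch).
titles = np.array([
    [12, 7, 431, 9, 0],
    [5, 88, 2, 0, 0],
])

# Each integer index is looked up in a trainable (num_words x 64) table.
embedding = layers.Embedding(num_words, 64)
embedded = embedding(titles)
print(embedded.shape)  # (2, 5, 64): batch, sequence length, embedding size

# The LSTM reads the 5 embedded words and returns one vector per title.
summary = layers.LSTM(128)(embedded)
print(summary.shape)  # (2, 128)
```

The embedding vectors start out random and only become meaningful once the layer is trained as part of a model.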

Explanation

The functional API can be used to build models that work with multiple inputs and outputs.
This is not possible with the Sequential API.
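
As a quick check of the multi-input, multi-output behaviour, the sketch below rebuilds the same model and runs it on random placeholder data. The batch size and sequence lengths are arbitrary assumptions, and the data carries no meaning; the point is only to show that three inputs go in and two predictions come out.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tags, num_words, num_classes = 12, 10000, 4

# Same three inputs as in the example above.
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")

# Embed each text input, then reduce each sequence to a single vector.
title_features = layers.LSTM(128)(layers.Embedding(num_words, 64)(title_input))
body_features = layers.LSTM(32)(layers.Embedding(num_words, 64)(body_input))
x = layers.concatenate([title_features, body_features, tags_input])

# Two output heads on top of the merged features.
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)

model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

# Random placeholder data (arbitrary batch size and sequence lengths).
batch = 8
title_data = np.random.randint(num_words, size=(batch, 10))
body_data = np.random.randint(num_words, size=(batch, 100))
tags_data = np.random.randint(2, size=(batch, num_tags)).astype("float32")

# Inputs are passed in the same order as in keras.Model(inputs=[...]).
priority, department = model.predict(
    [title_data, body_data, tags_data], verbose=0
)
print(priority.shape, department.shape)  # (8, 1) (8, 4)
```

The model returns one array per output head, in the order given to `outputs`.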