Recurrent neural network

Recurrent neural network (RNN): The structure of an artificial neural network is relatively simple and consists mainly of matrix multiplications. In the first step, the inputs are multiplied by initially random weights, a bias is added, and the result is passed through an activation function; the output values are then used to make a prediction. This step gives an idea of how far the network is from reality.
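The first step described above can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the input values, the choice of tanh as the activation function, and the mean squared error used to measure the distance from reality are all assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, b):
    """One dense layer: multiply by the weights, add the bias, apply the activation."""
    return np.tanh(W @ x + b)

x = np.array([0.5, -1.0, 2.0])     # input vector (hypothetical values)
W = rng.normal(size=(2, 3))        # initially random weights
b = np.zeros(2)                    # bias
target = np.array([1.0, 0.0])      # hypothetical desired output

prediction = forward(x, W, b)
# Mean squared error: one possible measure of how far the prediction is from reality.
error = np.mean((prediction - target) ** 2)
print(prediction, error)
```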

Classical neural network	Recurrent neural network
Input vectors produce output vectors	Handle sequential data efficiently
Do not share information between runs	Remember previous outputs as input
Process a data sequence all at once	Can process very long sequences, element by element
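The right-hand column of the comparison can be illustrated with a minimal recurrent layer that reads a sequence one element at a time and feeds its previous output back in as input. The names `rnn_forward`, `W_x`, `W_h` and `b`, the tanh activation, and all sizes are assumptions chosen for the sketch.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Process a sequence element by element, carrying a hidden state between steps."""
    h = np.zeros(W_h.shape[0])               # initial state: nothing remembered yet
    states = []
    for x in xs:                             # one sequence element per iteration
        h = np.tanh(W_x @ x + W_h @ h + b)   # previous output h re-enters as input
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))                 # hypothetical sequence: 6 steps, 3 features
W_x = rng.normal(size=(4, 3)) * 0.5          # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.5          # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

states = rnn_forward(xs, W_x, W_h, b)
print(states.shape)
```

Because the state `h` is carried across iterations, a change in an early element of the sequence influences every later output, which a classical network applied independently to each element could not do.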

Training a recurrent neural network has to be carried out across every time step, which is very costly in processing time and RAM. This is simplified by "unrolling" the network into as many layers as there are time steps, or data elements, in the training sequence, as if it were a non-recurrent (feed-forward) network. Every unrolled layer shares the same weights, which speeds up the process.
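A rough sketch of unrolling follows, assuming a bias-free tanh recurrence and a squared-error loss on the final state only (both simplifications made for this example). The forward pass builds one layer per time step, and the backward pass walks those layers in reverse, accumulating the gradients into the single shared pair of weight matrices.

```python
import numpy as np

def unrolled_forward(xs, W_x, W_h):
    """Unroll the recurrence: one layer per time step, all sharing W_x and W_h."""
    hs = [np.zeros(W_h.shape[0])]                    # hs[0] is the initial state
    for x in xs:
        hs.append(np.tanh(W_x @ x + W_h @ hs[-1]))
    return hs

def loss_fn(xs, y, W_x, W_h):
    """Squared error between the last hidden state and a target y (assumed loss)."""
    return 0.5 * np.sum((unrolled_forward(xs, W_x, W_h)[-1] - y) ** 2)

def bptt(xs, y, W_x, W_h):
    """Backpropagate through the unrolled layers, summing gradients of the shared weights."""
    hs = unrolled_forward(xs, W_x, W_h)
    dW_x = np.zeros_like(W_x)
    dW_h = np.zeros_like(W_h)
    dh = hs[-1] - y                                  # gradient of the loss w.r.t. the last state
    for t in range(len(xs), 0, -1):                  # walk the layers from last to first
        delta = dh * (1.0 - hs[t] ** 2)              # back through tanh
        dW_x += np.outer(delta, xs[t - 1])           # same W_x in every layer: contributions add up
        dW_h += np.outer(delta, hs[t - 1])           # same W_h in every layer: contributions add up
        dh = W_h.T @ delta                           # pass the gradient to the previous layer
    return dW_x, dW_h
```

The list `hs` stores one activation vector per time step, which is where the memory cost of long sequences comes from.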

The longer the time sequence to analyze, the more layers must be unrolled, so the vanishing gradient problem may appear. This is addressed by incorporating LSTM or GRU layers, which allow backpropagation through time to connect events that lie far apart in the input data without their contribution being diluted across the layers.
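The vanishing gradient can be made concrete with a toy scalar recurrence, h_t = tanh(w * h_{t-1}), chosen here purely for illustration. By the chain rule, the sensitivity of h_t to the initial state is a product of one factor per unrolled layer, and when each factor is smaller than 1 in magnitude the product shrinks geometrically with depth.

```python
import numpy as np

def gradient_through_time(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps in the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad, history = h0, 1.0, []
    for _ in range(steps):
        h = np.tanh(w * h)
        grad *= w * (1.0 - h ** 2)    # chain rule: d tanh(w * h_prev) / d h_prev
        history.append(abs(grad))
    return history

# Hypothetical values: recurrent weight 0.9, initial state 0.5, 50 unrolled layers.
history = gradient_through_time(w=0.9, h0=0.5, steps=50)
print(history[0], history[9], history[49])
```

Each factor is at most |w| in magnitude, so with w = 0.9 the gradient after t steps is bounded by 0.9^t; an event 50 steps in the past barely affects the weight update.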

History

Recurrent neural networks were based on the work of David Rumelhart in 1986.^[1] Hopfield networks, a special kind of recurrent network, were discovered by John Hopfield in 1982. In 1993, a neural history compressor system solved a "Very Deep Learning" task that required more than 1000 layers of a recurrent network unfolded in time.^[2]

LSTM

Long short-term memory (LSTM) neural networks were invented by Hochreiter and Schmidhuber in 1997 and set performance records in several application domains.^[3]

Around 2007, LSTMs started to revolutionize speech recognition, outperforming certain traditional models in the field.^[4] In 2009, an LSTM network trained with Connectionist Temporal Classification (CTC) became the first RNN to win pattern recognition contests, winning several competitions in handwriting recognition.^[5]^[6] In 2014, the Chinese company Baidu used CTC-trained RNNs to beat the 2S09 Switchboard Hub5'00 speech recognition benchmark^[7] without using any traditional speech processing methods.^[8]

LSTMs have also improved large-vocabulary speech recognition^[9]^[10] and text-to-speech synthesis,^[11] and were used in Google Android.^[5]^[12] In 2015, Google's speech recognition reportedly improved its performance by 49%^[citation needed] thanks to a CTC-trained LSTM network.^[13]

LSTMs broke records in machine translation,^[14] language modeling,^[15] and multilingual language processing.^[16] A combination of LSTM with convolutional neural networks (CNNs) improved automatic image captioning.^[17]

References

[1] Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (October 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533-536. Bibcode:1986Natur.323..533R. ISSN 1476-4687. S2CID 205001834. doi:10.1038/323533a0.
[2] Schmidhuber, Jürgen (1993). Habilitation thesis: System modeling and optimization. Page 150.
[3] Hochreiter, Sepp; Schmidhuber, Jürgen (1 November 1997). "Long Short-Term Memory". Neural Computation 9 (8): 1735-1780. PMID 9377276. S2CID 1915014. doi:10.1162/neco.1997.9.8.1735.
[4] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). "An Application of Recurrent Neural Networks to Discriminative Keyword Spotting". Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07 (Berlin, Heidelberg: Springer-Verlag). pp. 220-229. ISBN 978-3-540-74693-5.
[5] Citation error: the content of the reference named schmidhuber2015 is not defined.
[6] Graves, Alex; Schmidhuber, Jürgen (2009). Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; Culotta, Aron, eds. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Neural Information Processing Systems (NIPS) Foundation. pp. 545-552.
[7] Switchboard Hub5'00 speech recognition dataset.
[8] Hannun, Awni; Case, Carl; Casper, Jared; Catanzaro, Bryan; Diamos, Greg; Elsen, Erich; Prenger, Ryan; Satheesh, Sanjeev et al. (2014-12-17). "Deep Speech: Scaling up end-to-end speech recognition". arXiv:1412.5567 [cs.CL].
[9] Citation error: the content of the reference named sak2014 is not defined.
[10] Citation error: the content of the reference named liwu2015 is not defined.
[11] Fan, Bo; Wang, Lijuan; Soong, Frank K.; Xie, Lei (2015). "Photo-Real Talking Head with Deep Bidirectional LSTM". Proceedings of ICASSP 2015.
[12] Zen, Heiga; Sak, Haşim (2015). "Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis". Google.com. ICASSP. pp. 4470-4474.
[13] Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan (September 2015). "Google voice search: faster and more accurate".
[14] Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V. (2014). "Sequence to Sequence Learning with Neural Networks". Electronic Proceedings of the Neural Information Processing Systems Conference 27: 5346. Bibcode:2014arXiv1409.3215S. arXiv:1409.3215.
[15] Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (2016-02-07). "Exploring the Limits of Language Modeling". arXiv:1602.02410 [cs.CL].
[16] Gillick, Dan; Brunk, Cliff; Vinyals, Oriol; Subramanya, Amarnag (2015-11-30). "Multilingual Language Processing From Bytes". arXiv:1512.00103 [cs.CL].
[17] Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru (2014-11-17). "Show and Tell: A Neural Image Caption Generator". arXiv:1411.4555 [cs.CV].
