From post at volker-wysk.de Sun Dec 2 22:46:33 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Mon, 03 Dec 2018 04:46:33 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
Message-ID: <2877096.kHBejDaPxo@desktop>

Hi

I'm trying to reindex my data with Xapian, like this:

export PYTHONPATH=/usr/local/lib/python2.7/dist-packages:/usr/local/share/moin/config
/usr/local/share/moin/server/moin --wiki-url=http://localhost/wiki/ index build --mode=rebuild

This reports an error for all the PDF attachments. Like this:

2018-12-03 04:23:16,639 ERROR MoinMoin.search.builtin:261 Filter application_pdf threw error '[Errno 1] Operation not permitted' for file /usr/local/share/moin/data/pages/Handb(c3bc)cher(2f)Smartphone(20)Sony(20)Ericsson(20)Xperia(20)mini/attachments/Xperia mini_ Ausführliche Bedienungsanleitung.pdf
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/MoinMoin/search/builtin.py", line 257, in contentfilter
    data = execute(self, filename)
  File "/usr/local/lib/python2.7/dist-packages/MoinMoin/filter/application_pdf.py", line 20, in execute
    return execfilter("pdftotext -q -enc UTF-8 %s -", filename)
  File "/usr/local/lib/python2.7/dist-packages/MoinMoin/filter/__init__.py", line 44, in execfilter
    data, errors, rc = exec_cmd(filter_cmd, timeout=300)
  File "/usr/local/lib/python2.7/dist-packages/MoinMoin/util/SubProcess.py", line 31, in exec_cmd
    preexec_fn=None if subprocess.mswindows else os.setsid(),

I CAN access the attachment in question from the wiki.

I have the poppler-utils installed:

desktop ~ % ll /usr/bin/pdftotext
-rwxr-xr-x 1 root root 35024 Aug 28 15:49 /usr/bin/pdftotext*

So I'm wondering why this doesn't work. This might be a bug.

Cheers
Volker

From tw at waldmann-edv.de Mon Dec 3 07:59:00 2018
From: tw at waldmann-edv.de (Thomas Waldmann)
Date: Mon, 3 Dec 2018 13:59:00 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <2877096.kHBejDaPxo@desktop>
References: <2877096.kHBejDaPxo@desktop>
Message-ID:

> [Errno 1] Operation not permitted' for file ...

That is an OS level error.

Either there is a permissions problem on that file or it is locked by
another process (happens frequently on windows, usually not on linux) or ...

Does this also happen if the file has no blanks in the filename?

> I CAN access the attachment in question from the wiki.

Then I guess it is not a "normal" permissions issue.

> I have the poppler-utils installed:
>
> desktop ~ % ll /usr/bin/pdftotext
> -rwxr-xr-x 1 root root 35024 Aug 28 15:49 /usr/bin/pdftotext*

You could try running pdftotext filename.pdf manually and see what happens.

--
GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393

From post at volker-wysk.de Mon Dec 3 08:22:12 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Mon, 03 Dec 2018 14:22:12 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To:
References: <2877096.kHBejDaPxo@desktop>
Message-ID: <3548922.2YeuFtMX1Y@desktop>

On Monday, 3 December 2018, 13:59:00 CET, Thomas Waldmann wrote:
> > [Errno 1] Operation not permitted' for file ...
>
> That is an OS level error.
>
> Either there is a permissions problem on that file or it is locked by
> another process (happens frequently on windows, usually not on linux) or ...

Does that mean that you should stop the Apache server while the indexing takes place? I've tried this, but it's the same.

> Does this also happen if the file has no blanks in the filename?

Yes, it does.

> > I CAN access the attachment in question from the wiki.
>
> Then I guess it is not a "normal" permissions issue.

Yes, it occurred to me too that it looks like a file permission error. I've double-checked the permissions, but they seem okay.
> > I have the poppler-utils installed:
> >
> > desktop ~ % ll /usr/bin/pdftotext
> > -rwxr-xr-x 1 root root 35024 Aug 28 15:49 /usr/bin/pdftotext*
>
> You could try running pdftotext filename.pdf manually and see what happens.

That works.

Bye
Volker

From tw at waldmann-edv.de Mon Dec 3 08:31:25 2018
From: tw at waldmann-edv.de (Thomas Waldmann)
Date: Mon, 3 Dec 2018 14:31:25 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <3548922.2YeuFtMX1Y@desktop>
References: <2877096.kHBejDaPxo@desktop> <3548922.2YeuFtMX1Y@desktop>
Message-ID: <7214c6bb-367c-aabb-f690-869db1caf9ec@waldmann-edv.de>

> Does that mean that you should stop the Apache server while the
> indexing takes place?

No.

> Yes, it occurred to me too that it looks like a file permission error. I've
> double-checked the permissions, but they seem okay.

Is there some additional security framework active on that machine, like
apparmor or selinux or posix ACLs or ...?

If so, I guess you need to check these as well.

--
GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393

From post at volker-wysk.de Mon Dec 3 08:41:58 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Mon, 03 Dec 2018 14:41:58 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <7214c6bb-367c-aabb-f690-869db1caf9ec@waldmann-edv.de>
References: <2877096.kHBejDaPxo@desktop> <3548922.2YeuFtMX1Y@desktop> <7214c6bb-367c-aabb-f690-869db1caf9ec@waldmann-edv.de>
Message-ID: <16989490.PrP3Cfr9sk@desktop>

On Monday, 3 December 2018, 14:31:25 CET, Thomas Waldmann wrote:
> > Yes, it occurred to me too that it looks like a file permission error.
> > I've
> > double-checked the permissions, but they seem okay.
>
> Is there some additional security framework active on that machine, like
> apparmor or selinux or posix ACLs or ...?

I'm not sure what you mean by "additional security framework".
I've got fail2ban running, which uses kernel iptables to block brute force attacks from the outside... I haven't installed any security framework, as far as I know. So, the answer is probably "no".

Bye
V.W.

From tw at waldmann-edv.de Mon Dec 3 09:12:51 2018
From: tw at waldmann-edv.de (Thomas Waldmann)
Date: Mon, 3 Dec 2018 15:12:51 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <16989490.PrP3Cfr9sk@desktop>
References: <2877096.kHBejDaPxo@desktop> <3548922.2YeuFtMX1Y@desktop> <7214c6bb-367c-aabb-f690-869db1caf9ec@waldmann-edv.de> <16989490.PrP3Cfr9sk@desktop>
Message-ID: <2da60c0c-f21f-9144-e042-0863c892b49c@waldmann-edv.de>

>> Is there some additional security framework active on that machine, like
>> apparmor or selinux or posix ACLs or ...?
>
> I'm not sure what you mean by "additional security framework".

Well, I gave some examples. Basically anything that may intercept/forbid
file access.

Ubuntu comes with apparmor, while some redhat (fedora?) dists come with
selinux.

ACLs can be checked with the respective commands, see man acl.

From post at volker-wysk.de Mon Dec 3 09:34:04 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Mon, 03 Dec 2018 15:34:04 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <2da60c0c-f21f-9144-e042-0863c892b49c@waldmann-edv.de>
References: <2877096.kHBejDaPxo@desktop> <16989490.PrP3Cfr9sk@desktop> <2da60c0c-f21f-9144-e042-0863c892b49c@waldmann-edv.de>
Message-ID: <2584444.t8clBnkR17@desktop>

On Monday, 3 December 2018, 15:12:51 CET, Thomas Waldmann wrote:
> >> Is there some additional security framework active on that machine, like
> >> apparmor or selinux or posix ACLs or ...?
> >
> > I'm not sure what you mean by "additional security framework".
>
> Well, I gave some examples. Basically anything that may intercept/forbid
> file access.

And this thing is supposed to block access to the PDF attachments in the MoinMoin database? That doesn't sound very plausible.
Unless MoinMoin itself uses something like this, and it's a bug...

> Ubuntu comes with apparmor, while some redhat (fedora?) dists come with
> selinux.
>
> ACLs can be checked with the respective commands, see man acl.

Okay, thanks for the info. But I can't remember having done anything like that. It's completely new to me.

The missing ability to search in attached PDF files isn't a show stopper for me. I think I'll just leave it this way.

Thanks for your time. :-)

Bye
V.W.

From paul at boddie.org.uk Mon Dec 3 10:24:24 2018
From: paul at boddie.org.uk (Paul Boddie)
Date: Mon, 03 Dec 2018 16:24:24 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <2584444.t8clBnkR17@desktop>
References: <2877096.kHBejDaPxo@desktop> <2da60c0c-f21f-9144-e042-0863c892b49c@waldmann-edv.de> <2584444.t8clBnkR17@desktop>
Message-ID: <2035776.TDmvqtRfec@jeremy>

On Monday 3. December 2018 15.34.04 Volker Wysk wrote:
> On Monday, 3 December 2018, 15:12:51 CET, Thomas Waldmann wrote:
> > >> Is there some additional security framework active on that machine,
> > >> like
> > >> apparmor or selinux or posix ACLs or ...?
> > >
> > > I'm not sure what you mean by "additional security framework".
> >
> > Well, I gave some examples. Basically anything that may intercept/forbid
> > file access.
>
> And this thing is supposed to block access to the PDF attachments in the
> MoinMoin database? That doesn't sound very plausible. Unless MoinMoin itself
> uses something like this, and it's a bug...

I can imagine SELinux causing a problem in a situation like this because it seems like Moin is expected to be able to run another program to perform the indexing (from my brief perusal of this thread). Anyone who has had to configure Web applications for SELinux will confirm that it is easy to miss some kind of permission or other that might be needed, either related to executing other programs or accessing network or file resources.

> > Ubuntu comes with apparmor, while some redhat (fedora?)
> > dists come with selinux.
> >
> > ACLs can be checked with the respective commands, see man acl.
>
> Okay, thanks for the info. But I can't remember having done anything like
> that. It's completely new to me.
>
> The missing ability to search in attached PDF files isn't a show stopper for
> me. I think I'll just leave it this way.
>
> Thanks for your time. :-)

Which distribution are you using? If it is one of the Red Hat family (Fedora, CentOS, RHEL), it is entirely possible that SELinux is switched on by default and that you won't have made any decision about it. So it is worth checking in case you have other problems in future. I cannot comment about AppArmor, but that is also worth investigating for certain distros (Ubuntu, maybe SuSE).

I can't give quick answers about this because I run Debian and don't have these things enabled, but I hope you will be able to investigate further.

Paul

From post at volker-wysk.de Tue Dec 4 05:57:25 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Tue, 04 Dec 2018 11:57:25 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <2035776.TDmvqtRfec@jeremy>
References: <2877096.kHBejDaPxo@desktop> <2584444.t8clBnkR17@desktop> <2035776.TDmvqtRfec@jeremy>
Message-ID: <1562349.f9IzhuHOf5@desktop>

On Monday, 3 December 2018, 16:24:24 CET, Paul Boddie wrote:
> On Monday 3. December 2018 15.34.04 Volker Wysk wrote:
> > On Monday, 3 December 2018, 15:12:51 CET, Thomas Waldmann wrote:
> > > >> Is there some additional security framework active on that machine,
> > > >> like
> > > >> apparmor or selinux or posix ACLs or ...?
> > > >
> > > > I'm not sure what you mean by "additional security framework".
> > >
> > > Well, I gave some examples. Basically anything that may intercept/forbid
> > > file access.
> >
> > And this thing is supposed to block access to the PDF attachments in the
> > MoinMoin database? That doesn't sound very plausible. Unless MoinMoin itself
> > uses something like this, and it's a bug...
>
> I can imagine SELinux causing a problem in a situation like this because it
> seems like Moin is expected to be able to run another program to perform the
> indexing (from my brief perusal of this thread). Anyone who has had to
> configure Web applications for SELinux will confirm that it is easy to miss
> some kind of permission or other that might be needed, either related to
> executing other programs or accessing network or file resources.

MoinMoin isn't part of my Linux distribution. I've installed it from a tarball. I doubt that it has been secured by SELinux or AppArmor.

> > > Ubuntu comes with apparmor, while some redhat (fedora?) dists come with
> > > selinux.
> > >
> > > ACLs can be checked with the respective commands, see man acl.
> >
> > Okay, thanks for the info. But I can't remember having done anything like
> > that. It's completely new to me.
> >
> > The missing ability to search in attached PDF files isn't a show stopper
> > for me. I think I'll just leave it this way.
> >
> > Thanks for your time. :-)
>
> Which distribution are you using?

Kubuntu 18.04.1 LTS

> If it is one of the Red Hat family
> (Fedora, CentOS, RHEL), it is entirely possible that SELinux is switched on
> by default and that you won't have made any decision about it. So it is
> worth checking in case you have other problems in future. I cannot comment
> about AppArmor, but that is also worth investigating for certain distros
> (Ubuntu, maybe SuSE).
>
> I can't give quick answers about this because I run Debian and don't have
> these things enabled, but I hope you will be able to investigate further.

Hmmm... AppArmor *is* installed on my system, and it has active profiles. It must be the default for Kubuntu. I briefly tried to see what is protected by it, but I couldn't find out how. Still, I don't think that it is able to apply to manually installed software.
Bye
Volker

From post at volker-wysk.de Tue Dec 4 06:05:11 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Tue, 04 Dec 2018 12:05:11 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <1562349.f9IzhuHOf5@desktop>
References: <2877096.kHBejDaPxo@desktop> <2035776.TDmvqtRfec@jeremy> <1562349.f9IzhuHOf5@desktop>
Message-ID: <9477888.1A3i2YEpYT@desktop>

On Tuesday, 4 December 2018, 11:57:25 CET, Volker Wysk wrote:
> Still, I don't think that it is able to apply to manually installed
> software.

Because you'd need to manually configure AppArmor/SELinux for it.

Volker

From paul at boddie.org.uk Tue Dec 4 12:00:06 2018
From: paul at boddie.org.uk (Paul Boddie)
Date: Tue, 04 Dec 2018 18:00:06 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <9477888.1A3i2YEpYT@desktop>
References: <2877096.kHBejDaPxo@desktop> <1562349.f9IzhuHOf5@desktop> <9477888.1A3i2YEpYT@desktop>
Message-ID: <1828842.Evud1SF5A6@jeremy>

On Tuesday 4. December 2018 12.05.11 Volker Wysk wrote:
> On Tuesday, 4 December 2018, 11:57:25 CET, Volker Wysk wrote:
> > Still, I don't think that it is able to apply to manually installed
> > software.
>
> Because you'd need to manually configure AppArmor/SELinux for it.

Again, I can't comment on AppArmor, but with SELinux it is enough to be using the default Apache package, for example, to experience problems with manually installed Web applications that need Apache, precisely because those applications run within the Apache context. So, things that want to connect to databases (not Moin, of course) or things that want to write to other parts of the filesystem (like Moin) will definitely cause policy errors.

If you're running Moin standalone then Web server policies won't apply, but then there may be other policies about binding to network ports. So it isn't always easy to start out with a completely unconfined environment.
Paul

From post at volker-wysk.de Tue Dec 4 14:15:11 2018
From: post at volker-wysk.de (Volker Wysk)
Date: Tue, 04 Dec 2018 20:15:11 +0100
Subject: [moin-user] Can't index PDF attachments (bug?)
In-Reply-To: <1828842.Evud1SF5A6@jeremy>
References: <2877096.kHBejDaPxo@desktop> <9477888.1A3i2YEpYT@desktop> <1828842.Evud1SF5A6@jeremy>
Message-ID: <6100038.B2Vf1vGBOu@desktop>

On Tuesday, 4 December 2018, 18:00:06 CET, Paul Boddie wrote:
> On Tuesday 4. December 2018 12.05.11 Volker Wysk wrote:
> > On Tuesday, 4 December 2018, 11:57:25 CET, Volker Wysk wrote:
> > > Still, I don't think that it is able to apply to manually installed
> > > software.
> >
> > Because you'd need to manually configure AppArmor/SELinux for it.
>
> Again, I can't comment on AppArmor, but with SELinux it is enough to be
> using the default Apache package, for example, to experience problems with
> manually installed Web applications that need Apache, precisely because
> those applications run within the Apache context. So, things that want to
> connect to databases (not Moin, of course) or things that want to write to
> other parts of the filesystem (like Moin) will definitely cause policy
> errors.

I see. I've tried stopping AppArmor and doing an indexing run without it. It doesn't help; the "Operation not permitted" errors still occur.

> If you're running Moin standalone then Web server policies won't apply, but
> then there may be other policies about binding to network ports. So it isn't
> always easy to start out with a completely unconfined environment.

Thanks!
Volker
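
A note on the traceback in the first message, which the thread does not pursue: its last frame is the line `preexec_fn=None if subprocess.mswindows else os.setsid(),`. Written with parentheses, `os.setsid()` is called in the indexing process itself while the arguments are being built, instead of being handed over as a callable to run in the child. On Linux, setsid() fails with EPERM ("Operation not permitted") when the caller is already a process group leader, which is typically the case for a command started from an interactive shell. That may explain the error without any file permission problem, but it is an assumption, not something confirmed in the thread. The sketch below is Python 3 (the thread uses Python 2.7), and the function names `setsid_errno_in_group_leader` and `run_in_new_session` are made up for illustration; they are not part of MoinMoin.

```python
import errno
import os
import subprocess
import sys


def setsid_errno_in_group_leader():
    """Call os.setsid() inside a child that is already a process group leader.

    Returns the errno the child reported, or 0 if the call succeeded.
    (Hypothetical helper, for illustration only.)
    """
    child_code = (
        "import os, sys\n"
        "try:\n"
        "    os.setsid()\n"
        "except OSError as exc:\n"
        "    sys.exit(exc.errno)\n"
        "sys.exit(0)\n"
    )
    # start_new_session=True makes the child a session and group leader,
    # similar to a command launched from a shell with job control.
    proc = subprocess.run([sys.executable, "-c", child_code],
                          start_new_session=True)
    return proc.returncode


def run_in_new_session(argv):
    """Pass os.setsid as a callable, so it runs in the forked child only.

    (Hypothetical helper; not the MoinMoin exec_cmd function.)
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            preexec_fn=os.setsid)
    out, _ = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    code = setsid_errno_in_group_leader()
    print("os.setsid() in a group leader:", errno.errorcode.get(code, code))
    print("callable form:",
          run_in_new_session([sys.executable, "-c", "print('ok')"]))
```

If this is the cause, it would fit the observations in the thread: viewing the attachment in the wiki and running pdftotext by hand do not go through this call, and stopping AppArmor would make no difference.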